Multiple Languages in Internet Projects

Strategies for the integration of multiple languages in websites and virtual communities

This article is based on a chapter of my thesis (chapter 3.2, "Sprache", pp. 35-39). The original text is written in German. For quotations or general interest, the German version is also available.

When building a virtual community, one of the most important decisions is which languages to support.

The language understood by most internet users is, unsurprisingly, English. Around 300 million native speakers (cp. Global Reach 2004) are an impressive number on their own, even before counting the people who understand English as a foreign language. Furthermore, the English language area has more users with broadband connections (cp. Internet World Stats 2004).

Slowly, English is losing its dominance. The Chinese language area in particular is growing rapidly (cp. iResearch 2005), and Japanese and Spanish are also gaining importance (cp. Internet World Stats 2005).

Nonetheless, for a German project there is only one foreign language of choice: English. Other emerging languages are not widespread enough to justify the effort of translation. While many employees in Germany speak English, the chance that many of them know another foreign language is very low. Hiring translators for everything is also very costly. On top of that, people are needed who can supervise foreign-language groups in a virtual community.

German as a niche

Using only English in a German-based project would probably lead to accusations of lacking local roots (cp. Michael 2003, p. 9). On top of that, you are usually most confident and experienced when working within your own culture. This is even more important during the founding phase, when the community still has to develop and the founders and operators of the system still have to get a "feeling" for the community.

History shows that most virtual communities, unless they have a large advertising budget, acquire their first wave of members through intensive word-of-mouth advertising (cp. Sotira 2003). If the founders are German, winning over their German friends can be difficult when the platform is available only in English. 79% of all Germans prefer websites to be in their native language (cp. Maceviciute). German users will also feel more attached to the community if their native language makes them "feel at home".

English as an option

Nonetheless, the chance to enter the international market with its enormous user potential is tempting.

The advantage of English is the vast number of people you can reach - still more than with Chinese or Spanish. It is also easier to find staff who know English than staff who know any other foreign language. This eases maintenance and member support for the English version.

Basically, there are three different ways to build a multi-language website:

  1. Different languages available at the same address.
  2. Different languages available at different addresses on the same domain.
  3. Different languages available at different addresses on different domains.

With the first approach, the web server checks the preferred language configured in the user's browser. Depending on this preference, the corresponding language (or a default language) is delivered. Users can change their browser settings to receive their preferred language. Most users, but not all, already use a browser that is configured for their native language. For example, if you have a German Firefox browser, chances are high that your language setting already defaults to German.
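The browser communicates this preference in the HTTP Accept-Language request header. The following is a minimal sketch of how a server could evaluate it, not a complete implementation of HTTP content negotiation. The function names, the supported languages ("de", "en") and the default language are assumptions made for this illustration.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not wanted"
        entries.append((tag.strip().lower(), quality))
    # sorted() is stable, so entries with equal quality keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def pick_language(header, supported=("de", "en"), default="en"):
    """Return the best supported language for the header, or the default."""
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue  # q=0 means the user explicitly rejects this language
        primary = tag.split("-")[0]  # "de-DE" -> "de"
        if primary in supported:
            return primary
    return default


if __name__ == "__main__":
    # A typical header sent by a browser configured for German
    print(pick_language("de-DE,de;q=0.9,en;q=0.8"))
```

A browser without any usable preference (an empty header, or only unsupported languages) falls back to the assumed default language in this sketch.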

Another way is to host different languages at different addresses on the same domain. For example, http://server.org/de/xyz/ could contain the German content, and the English content could be available at http://server.org/en/xyz/. The language has to be chosen only once, either automatically or through user interaction, to guide the user onto the right language path. Afterwards, users stay on this path unless they choose to change the language.
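To illustrate the idea of a language path, here is a small sketch that reads the language from the first path segment and builds the equivalent address in another language. The helper names and the list of supported languages are assumptions for this example; a real project would typically handle this in its web framework's routing.

```python
SUPPORTED_LANGUAGES = ("de", "en")  # assumed for this example


def split_language_path(path):
    """Return (language, rest) for '/de/xyz/', or (None, path) without a known prefix."""
    parts = path.split("/", 2)  # "/de/xyz/" -> ["", "de", "xyz/"]
    if len(parts) >= 2 and parts[0] == "" and parts[1] in SUPPORTED_LANGUAGES:
        rest = "/" + (parts[2] if len(parts) > 2 else "")
        return parts[1], rest
    return None, path


def localized_path(path, language):
    """Return the same content address on the path of the given language."""
    _, rest = split_language_path(path)
    return "/" + language + rest


if __name__ == "__main__":
    print(split_language_path("/de/xyz/"))
    # Target of a "switch to English" link on the German page
    print(localized_path("/de/xyz/", "en"))
```

Because the language is part of the address, every link a user follows or passes on keeps pointing at the same language version.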

The third way is a similar approach: one domain is used for each language. Each domain contains all the content in the corresponding language. For example, domain.de could host the German content and domain.com the English content. According to Nicolas Michael (2003, p. 8), this is the best, although the most expensive, solution. Since he refers to domains hosted on completely separate systems, the cost argument is debatable. It is possible to host several domains on the same server, which reduces the cost to the implementation effort and the yearly registration fees.
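When several domains are served by the same application, the language can be derived from the Host header of each request. The sketch below assumes the two example domains from the text; the mapping, the function name and the fallback language are illustrative assumptions, not part of any particular server software.

```python
# Assumed mapping of example domains to languages
DOMAIN_LANGUAGES = {
    "domain.de": "de",
    "domain.com": "en",
}


def language_for_host(host, default="en"):
    """Map the value of an HTTP Host header to a content language."""
    hostname = host.strip().lower().split(":")[0]  # drop an optional port
    if hostname.startswith("www."):
        hostname = hostname[4:]  # treat www.domain.de like domain.de
    return DOMAIN_LANGUAGES.get(hostname, default)


if __name__ == "__main__":
    print(language_for_host("www.domain.de"))
    print(language_for_host("domain.com:8080"))
```

With such a lookup, one code base and one server can deliver all language versions, so separate hosting per domain is not strictly required.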

Technically, there is not much difference between these three options. Which address scheme one chooses is merely a question of URI design (cp. Lima / Powell n.d.; Berners-Lee 1998; Théreaux et al. 2003).

There is only one - important - catch to the first solution: a URL no longer stands for exactly one piece of content. One user may receive English content, the next one German. It is not possible to link to content in a specific language. If someone wants to send a link to a German friend, for example, it depends on the friend's browser whether the friend actually gets German content or some other language. This is especially troublesome in the context of search engine optimization (SEO). Search engines cannot index more than one language at the same address. Without special preparations, one loses potential search engine traffic. Most common statistics software also cannot differentiate between language settings at the same URL.

To solve these problems, one has to create custom solutions, which mean more effort and more potential complications. Therefore, I do not recommend this option.

Sources

Ahmed, Bashir / Cha, Sung-Hyuk / Tappert, Charles (2004): Language Identification from Text Using N-gram Based Cumulative Frequency Addition
In: Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004

Berners-Lee, Tim (1998): Cool URIs don't change

Cavnar, William B. / Trenkle, John M. (1994): N-Gram-Based Text Categorization
In: Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval

Global Reach (2004): Global Internet Statistics

Grefenstette, Gregory (1995): Comparing two language identification schemes
In: Proceedings of the 3rd International Conference on the Statistical Analysis of Textual Data

Grefenstette, Gregory / Nioche, Julien (2000): Estimation of English and non-English Language Use on the WWW
In: Proceedings of RIAO'2000, "Content-Based Multimedia Information Access", Paris, April 12-14, 2000, pp. 237-246

Internet World Stats (2004): DSL Broadband Internet Subscribers - Top 20 Countries

Internet World Stats (2005): Internet Usage By Language

iResearch (2005): China's Internet Users Top 100 Million

Lima, Joe / Powell, Thomas A. (n.d.): Towards Next Generation URLs

Maceviciute, Elena (n.d.): Multilingual Virtual World: Languages on the Internet

Michael, Nicolas (2003): Sprache im Internet

Sotira, Angelo (2003): The real story behind devART

Théreaux, Olivier / Bournez, Carine / Dubost, Karl / Guild, Ted / Lafon, Yves (2003): Common HTTP Implementation Problems

© 2005 Florian Sander