FutureQuest, Inc.
Foreign Language Customization
Posted on 24 October 2003 05:50 PM

Setting up support for your language on the FutureQuest servers is easy, as FutureQuest has taken great care to provide the latest and most powerful tools available for this purpose. We also comply with the Internet norms, procedures and standards that apply.

Human beings, past and present, on our planet have used a number of languages. There are many reasons why one would want to identify the language used when presenting information.

To configure your FutureQuest® account for a specific language or languages, several areas need attention.

Depending on your language, you will need to find its language tag and the character set (for example ISO-xxxx-xx) that goes with it. With these set correctly, visiting browsers are told which encoding to use and the characters specific to your language are displayed properly for your users.

1. Web Server configuration (Apache)

In order to configure your local language or provide automatic support for more than one language you need to work with your .htaccess file.

The .htaccess file serves as a tool to configure Apache's behavior local to your account. This allows you to set the operational environment of the web server to meet your requirements, one of which is the language negotiation capabilities.

Three directives relate to language negotiation, and all of them can be used in your .htaccess file. As mentioned before, you first need to find the language tag for your language and the character set (iso-xxxx-xx) that belongs to it.

So, for example, if your language is Japanese then you have access to three character sets:

Charset        File extension
EUC-JP         .euc
ISO-2022-JP    .jis
SHIFT_JIS      .sjis
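To see why the charset has to be stated explicitly, the short Python sketch below encodes the same Japanese text with each of the three character sets. The sample text is chosen for illustration only, and the codec names are Python's own spellings of the charsets listed above.

```python
# The same text produces different bytes under each Japanese charset,
# so the browser must be told which one a document actually uses.
SAMPLE_TEXT = "日本語"  # illustrative sample, not from the article

# Charset name as used in this article -> Python codec name
CODECS = {
    "EUC-JP": "euc_jp",
    "ISO-2022-JP": "iso2022_jp",
    "SHIFT_JIS": "shift_jis",
}

encoded = {name: SAMPLE_TEXT.encode(codec) for name, codec in CODECS.items()}

for name, raw in encoded.items():
    # Print the raw bytes in hex so the differences are visible.
    print(name, raw.hex())
```

A browser that guesses the wrong one of these three will show unreadable characters, which is what the settings described in this article are meant to prevent.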

The language tag that Apache uses is normally the two-letter ISO 639 code for the language, which is also what browsers send when they state their preferred language. Continuing with the Japanese example, the language tag is ja. (Do not confuse this with jp, which is the country code for Japan.) The file extension mapped to the tag is your choice; this article uses .jp.

Language    Code    File extension
Japanese    ja      .jp

Now that we have the charset for encoding documents in your language and know how to tell Apache which language tag is our default, we only need to put these parameters into the .htaccess file so that the new settings take effect.

Add the following lines to the .htaccess file:

LanguagePriority ja en
AddCharset EUC-JP .euc
AddCharset ISO-2022-JP .jis
AddCharset SHIFT_JIS .sjis
AddLanguage ja .jp

These settings add the information needed to support the Japanese language together with all three of its character sets, so that the information is presented correctly.

The LanguagePriority directive tells Apache which language takes precedence when the browser does not state a preference: Japanese ("ja") first, then English ("en").

The AddCharset directive maps a file extension to a character set, so that Apache announces the correct encoding for documents written in Japanese.

The AddLanguage directive maps a file extension to a language, here .jp to Japanese.

By doing this you will make sure that your static content managed by Apache will be processed as you expect.

If a visitor's browser is not set up for Japanese, the browser will normally offer to add the software needed to display the character encoding and, if the visitor agrees, install the required files automatically. You can also prepare an English ("en") page, which is displayed automatically when English is the visitor's preferred browser language.

This is where the use of the LanguagePriority directive becomes evident.
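The selection logic can be pictured with the following Python sketch. It is a deliberately simplified model and not Apache's actual negotiation algorithm: the function names are hypothetical, wildcards are ignored, and region subtags such as "en-us" are reduced to their primary tag. It uses ja, the ISO 639 tag that browsers send for Japanese.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, quality) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without q= counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        prefs.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags of equal quality keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_variant(accept_language, available, language_priority):
    """Return the language variant to serve, or None if nothing fits."""
    prefs = parse_accept_language(accept_language)
    if not prefs:
        # The browser stated no preference: fall back to the site's order.
        for tag in language_priority:
            if tag in available:
                return tag
        return None
    for tag, quality in prefs:
        if quality <= 0:
            continue
        primary = tag.split("-")[0]  # simplification: "en-us" -> "en"
        if primary in available:
            return primary
    return None  # assumption: the real server decides what to send here


available = {"ja", "en"}
priority = ["ja", "en"]  # mirrors a LanguagePriority line listing ja, then en
print(choose_variant("", available, priority))
print(choose_variant("en-US,en;q=0.8", available, priority))
```

In this model the priority list only matters when the browser expresses no preference at all; how a real server treats a request it cannot satisfy depends on its version and configuration.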

2. CGI Configuration (scripts, mainly Perl)

If your site uses CGI scripts written in Perl, the script must send a header, before any actual content, that includes the content type. This helps both the server and the browser process the information based on the type of data being sent. A text/html content type, which is what you would use to generate an HTML page as the output of a CGI script, needs the character set added to it, separated by a semicolon, and the header must be followed by a blank line. An example would be:

print "Content-Type: text/html; charset=ISO-2022-JP\r\n\r\n";

Or, if you are using CGI.pm (where $in is your CGI object):

print $in->header(-type=>'text/html', -charset=>'ISO-2022-JP');

The same applies to any other programming language you use, or will use, for CGI. Whenever a script generates dynamic output that ends up as an HTML page, set the charset at the point where the script writes the headers sent to the browser, as in the Perl examples above.
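As an illustration of the same idea in another language, here is a minimal Python sketch. The function name build_response and the sample page are hypothetical; the point is only that the header names the charset and the body is encoded in that same charset.

```python
CHARSET = "ISO-2022-JP"  # the charset used in the Perl examples


def build_response(html_text, charset=CHARSET):
    """Return a complete CGI response (header, blank line, body) as bytes."""
    # The header itself is plain ASCII; the blank line ends the header block.
    header = "Content-Type: text/html; charset=%s\r\n\r\n" % charset
    # The body must really be encoded in the charset the header announces.
    body = html_text.encode(charset)
    return header.encode("ascii") + body


page = "<html><body><p>こんにちは</p></body></html>"  # illustrative content
response = build_response(page)
# In a real CGI script you would write the bytes to standard output:
#   import sys
#   sys.stdout.buffer.write(response)
```

Writing bytes rather than text avoids having the runtime re-encode the output in some other default encoding behind your back.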

3. Static HTML pages with META tags

As discussed earlier, if you correctly set the language priority and add support for the corresponding character set, you can benefit from automatic content negotiation just by using the correct extension. Building on the previous examples, you would name your HTML files index.html.jis, index.html.euc, index.html.sjis or index.html.jp, for example, and Apache would automatically serve the correct one to your visitors' browsers.

Of course, there is always more than one way to do things. If you do not want to use the extensions method and prefer the normal .html extension, you can add the following META tag to the HEAD section of each of your documents:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-2022-JP">

This instructs the browser to switch to the character encoding specified in the META tag.

Finally, if your language is not Japanese and you wish to benefit from this feature, just replace the charset and language tags where appropriate to customize your site for your own language.
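The META tag only helps if the file is really saved in the encoding it names. The Python sketch below is one way to check that; the function names are hypothetical, and a successful decode shows only that the bytes are valid in the declared charset, not that the text is what you intended.

```python
import re

# Finds the charset value inside a META Content-Type declaration.
META_CHARSET = re.compile(rb'charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def declared_charset(raw_html):
    """Return the charset named in the page's bytes, or None if absent."""
    match = META_CHARSET.search(raw_html)
    return match.group(1).decode("ascii") if match else None


def decodes_as_declared(raw_html):
    """True if the page bytes decode cleanly using the declared charset."""
    charset = declared_charset(raw_html)
    if charset is None:
        return False
    try:
        raw_html.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # Invalid bytes for that charset, or a charset name Python lacks.
        return False
    return True


page = ('<META HTTP-EQUIV="Content-Type" '
        'CONTENT="text/html; charset=ISO-2022-JP"><p>日本語</p>')
print(decodes_as_declared(page.encode("iso2022_jp")))  # saved as declared
print(decodes_as_declared(page.encode("shift_jis")))   # saved in another charset
```

In practice you would read the bytes from the file on disk (opened in binary mode) instead of building them in the script.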

4. Email

When sending email whose content is in a language other than English, you must add the correct Content-Type: text/plain; charset="iso-xxxx-xx" line to the header portion of the message in your mailing script, much as you do for web pages. This applies whether the email is a single-part message or a MIME multi-part one; in the latter case you need to specify it on each part.

This line tells the receiving email client which MIME type and character set the message, or message part, uses. As long as the mail program supports them, it should have no problem decoding the content automatically. For example, text/plain; charset="ISO-2022-JP" tells the client that the message is plain text encoded in ISO-2022-JP, a character set used for Japanese.

The implementation depends upon your choice of CGI language. Commonly there is a mail function that helps set the headers and other needed items of an email. You should consult your reference manual to use the correct one.
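As one possible illustration, Python's standard email package can build such a message. This is a sketch for a single-part message only; the helper name build_message and the addresses are hypothetical.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, charset="iso-2022-jp"):
    """Build a single-part text/plain message using the given charset."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # set_content() encodes the body and writes the Content-Type header,
    # including its charset parameter.
    msg.set_content(body, charset=charset)
    return msg


# Hypothetical addresses, for illustration only.
message = build_message(
    "sender@example.com", "reader@example.com", "Test", "こんにちは"
)
print(message["Content-Type"])
```

For a multi-part message the same charset choice would have to be made for each text part, as described above.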

A final word of advice. If you use a combination of static HTML pages and CGI scripts, for example a plain HTML form processed by a CGI script, you need to combine methods. This means adding the charset in three places: in a META tag in your static HTML form, in the CGI script at the point where it sends the content type string to the browser, and in your .htaccess file as explained before.

Further reading can be found in the following links:

http://www.apacheweek.com/features/negotiation
http://httpd.apache.org/docs/content-negotiation.html
http://httpd.apache.org/docs/mod/mod_negotiation.html
http://phpbuilder.com/columns/didimo20010214.php3