i18n + l10n = NLS
Date: Wednesday, January 28 @ 08:33:21 CET
Topic: Help Requests


Internationalization + localization = Native Language Support
i18n (Internationalization)
This is the process of making the program ready to accept the use of languages other than English.
In most programming languages in use today, the program itself is written in English, and PHP is no different.
The input and output, however, need not be English, but some preparations must be made to accept alternative encoding standards. There are hundreds of encoding standards; ISO 8859-1 is one of the most used in *-nuke languages today.
Why UTF-8?
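To see concretely why the choice of encoding matters, here is a small illustration. It is written in Python purely for demonstration (a Nuke module would of course be PHP, but the byte-level behaviour is the same). It decodes one byte under several of the legacy encodings discussed in this article, checks that ASCII-only text has identical bytes in UTF-8, and tests which encodings can hold a string that mixes Latin, Greek, and Cyrillic. The helper names are hypothetical.

```python
# Illustration in Python (not PHP): the same byte means different things in
# different legacy 8-bit encodings, while UTF-8 can hold every script at once.

LEGACY_ENCODINGS = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7", "cp1251"]


def interpret_byte(value: int) -> dict:
    """Decode a single byte under each legacy encoding listed above."""
    raw = bytes([value])
    return {enc: raw.decode(enc) for enc in LEGACY_ENCODINGS}


def ascii_compatible(text: str) -> bool:
    """True if an ASCII-only string has exactly the same bytes in UTF-8."""
    return text.encode("ascii") == text.encode("utf-8")


def fits_in(text: str, encoding: str) -> bool:
    """True if every character of text can be represented in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for enc, char in interpret_byte(0xE8).items():
        print(f"byte 0xE8 as {enc}: {char}")
    print(ascii_compatible("Hello, Nuke!"))  # True
    mixed = "café / Ελληνικά / Русский"
    for enc in ["iso-8859-1", "iso-8859-7", "cp1251", "utf-8"]:
        print(enc, fits_in(mixed, enc))
```

Only UTF-8 passes the mixed-script check; each 8-bit encoding covers one family of languages.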

Before UTF-8 emerged, Linux users all over the world had to use various different language-specific extensions of ASCII. Most popular were ISO 8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5 / CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc. This made the exchange of files difficult and application software had to worry about various small differences between these encodings. Support for these encodings was usually incomplete, untested, and unsatisfactory, because the application developers rarely used all these encodings themselves. Because of these difficulties, major Linux distributors and application developers have now started to phase out these older legacy encodings in favor of UTF-8. UTF-8 support has improved dramatically over the last few years and ever more people now use UTF-8 on a daily basis in:
* text files (source code, HTML files, email messages, etc.)
* file names
* standard input and standard output, pipes
* environment variables
* cut and paste selection buffers
* telnet, modem, and serial port connections to terminal emulators and in any other places where byte sequences used to be interpreted in ASCII.[1]
"Unicode is well on the way to replace ASCII, ISO 8859 and EUC at all levels. It allows you to handle not only text in practically any script and language used on this planet, it also provides you with a comprehensive set of mathematical and technical symbols to simplify scientific information exchange." -Markus Kuhn [1]
Furthermore, he states: "With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that, like Unix, were designed entirely around ASCII. UTF-8 is the way in which Unicode is used under Unix, Linux, and similar systems. It is now time to make sure that you are well familiar with it and that your software supports UTF-8 smoothly."
To use UTF-8 on the web, tell the browser through the HTTP header, the meta content tag, and/or the form controls. All browsers since Navigator 4 accept UTF-8, although text rendered with UTF-8 fonts can look a bit big. Two great attributes of UTF-8 are that you can use multiple languages on one web page with a single encoding, and that it is backwards compatible with ASCII.
l10n (Localization)
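The paragraph below describes how Nuke picks the right language file for each user from a cookie. A rough sketch of that selection logic, written in Python for illustration only, might look like this. The cookie value, the `language/lang-<name>.php` layout, and the helper names are assumptions, not Nuke's actual code.

```python
# Hypothetical sketch (Python, not Nuke's PHP): choose a language file from a
# cookie value, falling back to English. The "language/lang-<name>.php" layout
# and the helper names are assumptions for illustration.

DEFAULT_LANGUAGE = "english"


def choose_language(cookie_value, available):
    """Return the cookie's language if it is an installed one, else the default."""
    # Only names from the installed list are accepted, so a crafted cookie such
    # as "../../config" can never be turned into an arbitrary file path.
    if cookie_value in available:
        return cookie_value
    return DEFAULT_LANGUAGE


def language_file(language: str) -> str:
    """Map a language name to its (assumed) language file path."""
    return f"language/lang-{language}.php"


if __name__ == "__main__":
    installed = {"english", "german", "french", "japanese"}
    for cookie in ["german", "klingon", "../../config", None]:
        print(cookie, "->", language_file(choose_language(cookie, installed)))
```

Checking the cookie against a list of installed languages, rather than building a path from it directly, keeps the fallback simple and avoids loading unexpected files.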
l10n is the process of translating output into individual language files. Nuke scores a big plus here because multilingual support is built in: all you need is the right language files in the right place, and Nuke selects the language through the user's cookie. Numbers and currency must also be taken into account; luckily, PHP has a built-in function for this, setlocale(). We are also accepting input to help update the language files at http://coppermine.findhere.org/modules.php?name=CPGlang, where a web-based form helps get all language variables translated into their native tongues. Many of the admin variables and others are still in English. The form even helps translate itself. I will also be making this module part of the NLI package so others may use it for their modules.
Native Language Support
When a program is properly internationalized (i18n) and localized (l10n), it is said to provide NLS. This should include language detection, so that the user sees the page in the right language the first time; my NLI release will include this feature.
This includes:
  • Locale specific and culture specific conventions (dates, numbers, etc.)
  • Messages in native languages
  • Input method support
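The language detection mentioned above is usually based on the browser's Accept-Language header. The following Python sketch shows one way to do it; it is an illustration, not the NLI implementation. `SUPPORTED` is a hypothetical mapping from language tags to installed Nuke-style language names, and parameters other than a single `q=` value are ignored for simplicity.

```python
# Sketch (Python): pick a page language from the browser's Accept-Language
# header. SUPPORTED is a hypothetical mapping from language tags to the
# Nuke-style language names a site might have installed.

SUPPORTED = {"en": "english", "de": "german", "fr": "french", "ja": "japanese"}


def parse_accept_language(header: str) -> list:
    """Return (tag, q) pairs, most preferred first."""
    entries = []
    for part in header.split(","):
        tag, _, params = part.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        entries.append((tag, q))
    # sorted() is stable, so entries with equal q keep the browser's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def detect_language(header, supported=SUPPORTED, default="english"):
    """Return the best supported language name, or the default."""
    for tag, q in parse_accept_language(header or ""):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        primary = tag.split("-")[0]  # "de-at" -> "de"
        for candidate in (tag, primary):
            if candidate in supported:
                return supported[candidate]
    return default


if __name__ == "__main__":
    print(detect_language("de-AT,de;q=0.9,en;q=0.8"))
    print(detect_language("fr;q=0,ja;q=0.5"))
    print(detect_language(None))
```

A site would typically run this only on the first visit and then store the result in the language cookie, so an explicit user choice always wins over the header.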
So what does this mean to the programmer? Just add a couple of tags and you're on your way? Not exactly. There are a couple of problems here.
Form submission
"Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire [ISO10646] character set."[3]
GET METHOD
When we use the URL to pass parameters in Nuke, we are using the GET method. Any variable that comes from the user or from a language file needs to be passed through rawurlencode(). This is not just about moving to UTF-8: currently 17 Nuke languages use an 8-bit encoding, so your modules should handle ANY character a user or language file may use in a parameter, an uploaded file name, an article title, or wherever else it may appear.
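As an illustration of the encoding step, the Python sketch below uses `urllib.parse.quote()` with `safe=""` as a stand-in for PHP's rawurlencode(): it escapes every byte except ASCII letters, digits, and `-_.~`. The helper names and the example URL are hypothetical. Note that the escaped form depends on the character set the value is encoded in.

```python
# Python stand-in for PHP's rawurlencode(): urllib.parse.quote() with safe=""
# escapes every byte except ASCII letters, digits and "-_.~".
# The helper names and the example URL are hypothetical.
from urllib.parse import quote


def encode_param(value: str, charset: str = "utf-8") -> str:
    """Percent-encode one URL parameter using the given character encoding."""
    return quote(value, safe="", encoding=charset)


def build_url(base: str, params: dict, charset: str = "utf-8") -> str:
    """Join encoded key=value pairs into a GET URL."""
    query = "&".join(
        f"{encode_param(key, charset)}={encode_param(value, charset)}"
        for key, value in params.items()
    )
    return f"{base}?{query}"


if __name__ == "__main__":
    print(encode_param("Café"))                # Caf%C3%A9 (two UTF-8 bytes)
    print(encode_param("Café", "iso-8859-1"))  # Caf%E9 (one Latin-1 byte)
    print(build_url("modules.php", {"name": "News", "title": "a/b c"}))
```

Because "é" produces different escapes in UTF-8 and ISO 8859-1, a module has to encode with the same charset the receiving page expects.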
FORMS WITH GET METHOD
ASCII characters ONLY
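On the receiving side, the same restriction shows up: a GET query string is plain ASCII on the wire, so the server can only rebuild the original text if it decodes with the charset that was used to encode it. A minimal Python sketch, with hypothetical helper names, using the standard `urlencode()` and `parse_qs()` functions:

```python
# Python sketch of the receiving side: a GET query string is plain ASCII,
# so the decoder must know which charset was used to percent-encode it.
from urllib.parse import parse_qs, urlencode


def to_query(params: dict, charset: str = "utf-8") -> str:
    """Encode parameters as an ASCII-only query string (spaces become "+")."""
    return urlencode(params, encoding=charset)


def from_query(query: str, charset: str = "utf-8") -> dict:
    """Decode a query string, assuming it was encoded with charset."""
    return parse_qs(query, encoding=charset)


if __name__ == "__main__":
    query = to_query({"title": "Ελληνικά"})
    print(query, query.isascii())
    print(from_query(query))                # decoded with the matching charset
    print(from_query(query, "iso-8859-1"))  # wrong charset: garbled text
```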
POST METHOD
To use UTF-8 or any encoding other than ISO 8859-1, you need proper form controls, including enctype="multipart/form-data" and accept-charset="UTF-8".[2]
STRINGS
Character encodings that work with PHP:
ISO-8859-*, EUC-JP, UTF-8
Character encodings that do NOT work with PHP:
JIS, SJIS
Character encodings that do not work with PHP may be converted with mbstring's HTTP input/output conversion feature. mbstring is an extension module that may not be enabled in every configuration, and its functions are considered experimental.[4]
References
[1] UTF-8 and Unicode FAQ for Unix/Linux
[2] Form submission
[3] W3 org Form content types
[4] Multi-Byte String Functions





This article comes from NukeCops
http://www.nukecops.com

The URL for this story is:
http://www.nukecops.com/modules.php?name=News&file=article&sid=1481