RE: code pages I think
- Posted by Kat <gertie at PELL.NET> May 20, 2001
- 445 views
On 17 May 2001, at 13:17, Stuart Cox wrote:

> It's relevant because these guys are Euphoria developers and they're trying to
> bridge the communications gap that is forced upon different language
> groups....

It's also helped when someone from Romania came into an IRC channel i haunt, and said something which would be abusive in English, but wasn't in Romanian. The words he chose, or the translator he used, or something, produced a word that was misspelled for the expected English word but correctly spelled as a Romanian word. Since i knew he was from Romania, i was the only one who knew what he meant, even though i don't actually speak that language.

Since i am working on some language processing code, it helps to have a global perspective. With USA president Bush promoting more nuclear power plants, for instance, i am looking to Russian sites for old news (1957-58) on Sverdlovsk, Kyshtym, and Kamensk-Uralskiy, altho i still can't write them in Cyrillic.

> Code pages are arcane.

It also matters when code written by an English-speaking programmer that does heavy string manipulation encounters non-English bytes. There is lots of code in the Eu archives that is written in a way that inherently excludes the use of Cyrillic, for instance, by typecasting out the upper chars (128-255), or ignoring them. I have no idea yet how to apply upper() and lower() to non-English characters.

This has affected my coding, and it's given me more thoughts on items that must be considered in the future, such as internally translating the 8 bits + code page into Unicode, or whatever comes around that's better, or that i devise for my own use. 16-bit Unicode (already called obsolete) will of course break all my existing code, but maybe not by much.

Kat
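
To make the upper()/lower() problem and the "8 bits + code page into Unicode" idea concrete, here is a minimal sketch. It is in Python, not Euphoria, only because Python ships the code page tables. The function names are hypothetical, and cp1251 (Windows Cyrillic) is an assumed code page, since the post doesn't name one.

```python
# Sketch only. cp1251 is an assumption; swap in whatever code page the text uses.

def ascii_only_upper(raw: bytes) -> bytes:
    """Naive upper(): shifts a-z and leaves bytes 128-255 untouched."""
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in raw)

def upper_in_codepage(raw: bytes, codepage: str = "cp1251") -> bytes:
    """Upper-case 8-bit text by going through Unicode and back."""
    out = bytearray()
    for ch in raw.decode(codepage):       # 8 bits + code page -> Unicode
        up = ch.upper()                   # Unicode knows the case mapping
        try:
            out += up.encode(codepage)    # Unicode -> 8 bits again
        except UnicodeEncodeError:
            # The upper-case form does not exist in this code page,
            # so keep the original character rather than fail.
            out += ch.encode(codepage)
    return bytes(out)

sample = "abc Кыштым".encode("cp1251")
naive = ascii_only_upper(sample).decode("cp1251")
aware = upper_in_codepage(sample).decode("cp1251")
print(naive)
print(aware)
```

The naive version only upper-cases the "abc" part, while the code-page-aware one handles the Cyrillic too. The catch is that the program has to know which code page the bytes are in; nothing in the bytes themselves says so.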