OpenEuphoria: Forum: Internationalization

1. Internationalization

Posted by "Cuny, David" <David.Cuny at DSS.CA.GOV> Jun 22, 1998

I've added Jiri's fonts into WinMan, and am now faced with some small
problems. Some feedback would be appreciated:


[The Problem]

I would like to support an international character set, but I'm not sure
how to go about doing it. The ASCII set of characters that I see on my
PC does not come anywhere close to supporting a complete international
character set. For example, the characters:

  <  >
  A  A

are nowhere to be found. I suspect that on other PCs, a different ROM
set is loaded, and the characters appear automagically. This illusion
breaks down when using bitmapped font files, since they reflect the font
set of the user who created them.

I /think/ that the ANSI character set is set up to extend ASCII by
another bit, and adds a lot more characters to the "basic" Latin set.
Unicode adds another byte, and adds a slew of additional characters. I'm
under the impression that a UNICODE pair that looks like this:

   { n, 0 }

maps to an ANSI character {n}.
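
The { n, 0 } guess can be checked with a small sketch. It assumes that a "UNICODE pair" means the two bytes of one 16-bit little-endian code unit (UTF-16LE), and it takes ISO 8859-1 (Latin-1) as the 8-bit set. Under those assumptions the low byte is the 8-bit code and the high byte is 0. Windows code page 1252 agrees with Latin-1 except for codes 128..159, where the pairing stops holding. The function names are illustrative only.

```python
def utf16le_pair(ch):
    """Return the two bytes of one character in UTF-16LE as (low, high)."""
    data = ch.encode("utf-16-le")
    if len(data) != 2:
        raise ValueError("character needs more than one 16-bit unit")
    return data[0], data[1]


def matches_n_zero(byte_value, codepage):
    """True if 8-bit code `byte_value` in `codepage` maps to the pair (n, 0)."""
    ch = bytes([byte_value]).decode(codepage)
    return utf16le_pair(ch) == (byte_value, 0)


# Latin-1: every code 0..255 maps to (n, 0).
print(all(matches_n_zero(n, "latin-1") for n in range(256)))

# Code page 1252: 0xE9 (e with acute) still maps to (0xE9, 0) ...
print(matches_n_zero(0xE9, "cp1252"))

# ... but 0x80 is the euro sign U+20AC, whose byte pair is (0xAC, 0x20).
print(utf16le_pair(bytes([0x80]).decode("cp1252")))
```

So the rule is safe for the Latin-1 range, but a font table keyed on code page 1252 would need an explicit lookup for the 128..159 block.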

X Windows seems to use a compound text format of four bytes, with a
number of different character sets that can be mapped to.

Even if I extended Jiri's font sets to, say, 512 characters mapped to
the ANSI set, there are some other minor problems with the file format
(through no fault of his own) - there is no information as to point
size, family, and other information which would be useful when looking
for a "close" replacement.

As a result, I will probably choose to adopt the McSoft bitmapped font
file format, and ANSI character mapping. 


[The Questions]

1. Does anyone know if the ANSI character set extends to include all
those umlaut/acute/etc. stuff?

2. Is this what the Win32 font set is mapped to?

3. Would choosing to support only ANSI/Win32 character set be unbearable
for some people out there? That would mean no built-in support for:

   Arabic
   Cyrillic
   Greek
   Hebrew
   Kana
   Korean
   Thai
   (etc.),

although one could theoretically select a bitmapped font set of that
flavor (I suspect that's how non-UNICODE Win32 applications do it).


4. Any other suggestions?

Thanks!

2. Re: Internationalization

Posted by JesusC - Jesus Consuegra <jconsuegra at REDESTB.ES> Jun 22, 1998
Last edited Jun 23, 1998

> -----Original Message-----
> From: Euphoria Programming for MS-DOS
> [mailto:EUPHORIA at LISTSERV.MUOHIO.EDU] On behalf of Cuny, David
> Sent: Monday, June 22, 1998 21:19
> To: EUPHORIA at LISTSERV.MUOHIO.EDU
> Subject: Internationalization

> [The Questions]
>
> 1. Does anyone know if the ANSI character set extends to include all
> those umlaut/acute/etc. stuff?

As far as I know, all basic character sets are made up of 256 characters.
Since this is not enough to map all the characters around, there are the
so-called "code pages". Each code page assigns an arbitrary character set to
the 0..255 codes. Complex character sets (like Asian) use different approaches.
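
To make the code page idea concrete, here is a minimal sketch that decodes the same byte under several code pages. The four code pages chosen (DOS 437, Windows 1252, Windows 1251, ISO 8859-7) are just examples, and the helper name is made up for illustration.

```python
CODE_PAGES = ["cp437", "cp1252", "cp1251", "iso8859_7"]


def decode_in_code_pages(byte_value, code_pages=CODE_PAGES):
    """Decode one byte under each code page and return {code page: character}."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in code_pages}


# The single byte 0xE9 is a different character in each code page.
for name, ch in decode_in_code_pages(0xE9).items():
    print(f"{name}: U+{ord(ch):04X}")

# A multi-byte set such as Shift_JIS spends two bytes on one kanji.
print(len("\u65e5".encode("shift_jis")))
```

Byte 0xE9 comes out as a Greek capital theta, a Latin e with acute, a Cyrillic short i, and a Greek small iota respectively, which is why a bitmapped font file only makes sense together with the code page it was drawn for.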

> 2. Is this what the Win32 font set is mapped to?

No. I believe that Win32 uses ISO 8859 characters. (I'm not sure).

> 3. Would choosing to support only ANSI/Win32 character set be unbearable
> for some people out there? That would mean no built-in support for:


Jesus.

3. Internationalization

Posted by "Wallace B. Riley" <wryly at MINDSPRING.COM> Jun 24, 1998

David Cuny's suggestion about internationalization is a good one.  However,
anyone who works on this project should make sure he or she knows something
about the languages and alphabets before the project is cast in concrete.

I was involved in just such a project in the early 1980s.  It had an
egregious error in the formal specification, which would have made the
sponsor a laughing stock for anybody who knew anything about foreign
languages -- as well as the company I was working for, on contract.

The sponsor was an organization that was connecting the electronic catalogs
of a group of libraries.  This required the terminals to be able to generate
and recognize a wide variety of alphabetic characters in many languages that
use alphabets, such as Russian, Arabic, Hebrew, and many others.  These
languages are now taken into account in latter-day ASCII; I'm not sure the
standards had been issued when my project was under way.  They may have been
in preparation, or they may not have been started.  In any case, how the
error I found got past the librarians working for the sponsor amazes me.

One of the "characters" specified for the terminal was a little circle well
above the base line on which all characters were displayed.  In the
specifications, this little circle was called an "angstrom".  I knew darn
well that wasn't the correct name for that little circle; I didn't know what
the correct name was, but I did know what an angstrom is.  I went to a
public library and consulted an elementary Swedish grammar book to find out
what that little circle should be called.

I learned that the Swedish alphabet includes several vowels that the English
alphabet doesn't have.  One of those vowels is an A with a little circle on
top of it, which is different from the A without the circle.  The little
circle is not a separate character in Swedish.  The word "angstrom", derived
from the name of a 19th-century Swedish physicist, is correctly spelled with
the A-with-circle.
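
For what it's worth, Unicode (which postdates that project) records the same distinction. The sketch below uses Python's standard unicodedata module. It shows that the free-standing ring is a combining mark, not a letter, and that both "A plus ring" and the separate angstrom unit symbol normalize to the one Swedish letter. The constant names are my own.

```python
import unicodedata

A_RING = "\u00C5"          # LATIN CAPITAL LETTER A WITH RING ABOVE
COMBINING_RING = "\u030A"  # COMBINING RING ABOVE
ANGSTROM_SIGN = "\u212B"   # ANGSTROM SIGN (the unit symbol)


def nfc(text):
    """Normalize text to Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# The ring on its own is a combining mark; attached to "A" it forms one letter.
print(unicodedata.name(COMBINING_RING))
print(nfc("A" + COMBINING_RING) == A_RING)

# The angstrom unit symbol is canonically the same letter.
print(nfc(ANGSTROM_SIGN) == A_RING)
```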

So help me, I didn't know a thing about Swedish before consulting that
elementary grammar book, and to this day the only thing I remember about
Swedish is that it has those two characters, the "normal" A and the A with
the circle.

I told the people in the company where I was working about this error, but
they said nothing could be done about it, because it was in the specs.  In
later years, I've come to think I should have written to the president of
the company to warn him of this ridiculous mistake.

Later, after I was no longer involved, I heard that the project had gone
down in flames, with a lot of ill will on both sides.  I rather wonder if
the error I found had something to do with it.  I also wonder how many
equally ridiculous errors were specified for characters in other alphabets.

Wally Riley
wryly at mindspring.com

4. Re: Internationalization

Posted by Kasey <kaseyb at GEOCITIES.COM> May 24, 1998
Last edited May 25, 1998

JesusC - Jesus Consuegra wrote:
>
> > -----Original Message-----
> > From: Euphoria Programming for MS-DOS
> > [mailto:EUPHORIA at LISTSERV.MUOHIO.EDU] On behalf of Cuny, David
> > Sent: Monday, June 22, 1998 21:19
> > To: EUPHORIA at LISTSERV.MUOHIO.EDU
> > Subject: Internationalization
>
> > [The Questions]
> >
> > 1. Does anyone know if the ANSI character set extends to include all
> > those umlaut/acute/etc. stuff?
>
> As far as I know, all basic character sets are made up of 256 characters.
> Since this is not enough to map all the characters around, there are the
> so-called "code pages". Each code page assigns an arbitrary character set to
> the 0..255 codes. Complex character sets (like Asian) use different approaches.
>
> > 2. Is this what the Win32 font set is mapped to?
>
> No. I believe that Win32 uses ISO 8859 characters. (I'm not sure).
>
> > 3. Would choosing to support only ANSI/Win32 character set be unbearable
> > for some people out there? That would mean no built-in support for:
>
> Jesus.


                IIRC:
        Unicode is a 32-bit character set, and Win32 and to some degree Win32c
(especially the versions sold outside the US/Canada) support it.
I'm not sure exactly what the differences between the US and
international versions are, though I suspect the US version just defaults
to the non-Unicode versions of the DLL routines and does not have the
extra character set files.

        There are ISO standard mappings for various character sets and for how
they are signaled when more than 1 byte is needed. I think they come
from pre-Unicode (at least on Wintel machines) days. Japanese, I think,
is iso220; not sure, but lots of messages in the anime groups on Usenet
have "gobbledygook" subjects that all begin something like ?=iso220=? or
something.
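
The subjects described above look like RFC 2047 "encoded words", which for Japanese mail and news commonly take the form =?ISO-2022-JP?B?...?=. A minimal sketch, assuming that is the scheme in question: ISO-2022-JP signals the switch to a two-byte set with an escape sequence, and the header wraps the resulting bytes in base64. The helper names are illustrative; decode_header is from Python's standard email package.

```python
import base64
from email.header import decode_header


def encode_subject(text, charset="iso-2022-jp"):
    """Build one RFC 2047 'B' encoded word for `text` (no line folding)."""
    raw = text.encode(charset)
    payload = base64.b64encode(raw).decode("ascii")
    return "=?%s?B?%s?=" % (charset.upper(), payload)


def decode_subject(header_value):
    """Decode a header that may contain RFC 2047 encoded words."""
    parts = []
    for data, charset in decode_header(header_value):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


subject = "\u65e5\u672c\u8a9e"  # three kanji meaning "Japanese language"

# The raw bytes begin with an escape sequence that selects a two-byte set.
raw = subject.encode("iso-2022-jp")
print(raw[:3])

word = encode_subject(subject)
print(word)                             # the "gobbledygook" a plain reader shows
print(decode_subject(word) == subject)  # a charset-aware reader recovers the text
```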

        Try Microsoft's site and maybe the ISO's site, as I'm not too positive
about any of this; it's been a while since I looked into it (and that was
just out of curiosity, not need :).

                Kasey

5. Internationalization

Posted by Shawn Pringle <pringle at techie.com> Feb 12, 2002

I have been using a computer here in Argentina and,
in accordance with the "Spanish" language preference and
other conventions, familiar applications use Spanish.
Somehow they all seem to know which human language
to use. Does anyone here know a way the programmer
can see this in Euphoria?  How do we find out what
preferences were selected by the user on installation
of Windows?  Thanks in advance.

Regards,
Shawn Pringle

6. Re: Internationalization

Posted by jzeitlin at cyburban.com Jan 06, 2002

On Sun, 06 Jan 2002 11:20:39 -0800, Bernie Ryan <xotron at localnet.com>
wrote:

>If you want to use different languages in win32lib, all you
>have to do is use my win32eru to generate include resource
>files with the resource strings and dialog boxes for whatever
>language you want your program to use.

What about for non-Win32 code?  DOS?  Linux?  Future ports to other
non-Win32 OSes?

Again, the routines I'm working on aren't the be-all and end-all of
internationalization.  Just a set of useful routines toward managing the
problem.
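
One platform-neutral way to manage translated strings, sketched here purely for illustration (these are not the routines mentioned above, and every name in the block is hypothetical), is a plain message table with a fallback chain: exact locale, then bare language, then a default language, then the key itself.

```python
# Hypothetical message table: language code -> {message key: text}.
MESSAGES = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "es": {"greeting": "Hola"},
}


def translate(key, lang, catalog=MESSAGES, default_lang="en"):
    """Look up `key` for `lang`, falling back to the bare language code,
    then to `default_lang`, and finally to the key itself."""
    for code in (lang, lang.split("_")[0], default_lang):
        text = catalog.get(code, {}).get(key)
        if text is not None:
            return text
    return key


print(translate("greeting", "es_AR"))  # found via the bare language "es"
print(translate("farewell", "es_AR"))  # missing in "es", falls back to "en"
print(translate("unknown", "fr"))      # nothing found, returns the key
```

Because the table is ordinary data, the same lookup works unchanged on DOS, Linux, or Win32; only the step that discovers the user's language is platform specific.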
--
Jeff Zeitlin
jzeitlin at cyburban.com
(ILink: news without the abuse. Ask via email.)
