Re: More bugs in exw.exe
- Posted by CChris <christian.cuvier at ag?icul?ure.gouv.fr> Jul 18, 2007
Juergen Luethje wrote:
>
> CChris wrote:
>
> > Juergen Luethje wrote:
> >
> >> CChris wrote:
>
> [snipped old text]
>
> >>> Looks like I'm going to import the capitalisation
> >>> table using function AX=#6504 to a sequence and transcode accented
> >>> chars using that. Unless AX=#6522 is considered a decent, bug-free
> >>> substitute - the character tables may differ subtly.
> >>
> >> It seems to me that the official DOS functions to capitalize a string are
> >> INT 21, AX=#6521 and AX=#6522. Here is an example:
> >> <http://www.uv.tietgen.dk/Staff/Mlha/PC/Prog/asm/int/21/6521.htm>
> >>
> >
> > I have played with them last evening, and decided not to use them either.
> > Capitalisation of accented characters is not done or at least not always
> > done - for instance, ü and Ü are not recognised as differing in case only.
>
> It looks as if I've found the reason for this. In my tests,
> INT 21 AX=#6521 properly capitalizes all lowercase German special
> characters in _ASCII_ code. But when I just open an editor on Windows
> and type some text, the text normally is in _ANSI_ code ...
>
>       |      in       | output of INT 21 AX=#6521 (on Win XP)
> ------+---------------+--------------------------------------
>       |     "äöü"     |
> ASCII | {#84,#94,#81} | {#8E,#99,#9A} = "ÄÖÜ" --> correct
> ANSI  | {#E4,#F6,#FC} | {#E5,#F6,#FC} = "åöü" --> wrong
>
> And the problem is that a program hardly knows, whether a file that it
> reads is coded in ASCII or in ANSI.
>

Reminds me of something <smile/>.

> > The only thing that I got working is the one I mentioned, ie working
> > directly with the filename capitalisation table a pointer to which is
> > returned by function #6504. I'm currently streamlining the code, since
> > conditional jumps are usually bad for performance, but basically that
> > fixed the bug.
>
> If this is reliable, then it's fine.
> Did you test it with special
> characters in ASCII code and ANSI code?
>

When processed by a C compiler, the front end sees all strings as ANSI, so the ASCII framework is not a problem. And my method correctly maps â to Â, ü to Ü, é to É and so forth; I didn't try them all. As a result, including both fileü.e and fileÜ.e with the same contents no longer causes "a namespace qualifier is required here".

If you want to run some tests, just run this:

include machine.e

sequence regs, dword
atom strings, addr

strings = allocate_low(5)

-- prepare the DOS service call
regs = repeat(0, 10)
regs[REG_DX] = -1                     -- current country
regs[REG_BX] = -1                     -- current code page
regs[REG_ES] = floor(strings / 16)    -- turn 32 bit ptr into 16:16
regs[REG_DI] = and_bits(strings, #0F)
regs[REG_CX] = 5                      -- buffer size
regs[REG_AX] = #6504                  -- function code

-- request *table from DOS for the current code page and country
regs = dos_interrupt(#21, regs)
-- should be an error check here, will fail under DOS <4.0

-- retrieve 16:16 far pointer and turn it into 32-bit near ptr
dword = peek({strings + 1, 4})
addr = dword[1] + 256 * dword[2] + 16 * dword[3] + 4096 * dword[4]

-- print numerical values in table
? peek({addr + 2, 128})

-- clean up and wait for keypress
free_low(strings)
? machine_func(26, 0)

The printout of numeric character codes will tell you exactly what is mapped to what. The console display using puts() instead of print() is wrong, as expected.

> > Under Windows, there's a CharUpperA() API that does the job straight and
> > hardly ever fails, so using it is certainly the safest way to go.
>
> I always use CharUpperBuffA(), but I really hope the results are the
> same as with CharUpperA().

Exactly the same, and even more so as CharUpperA() transforms the string in place. This can be a problem with multibyte characters (and then CharUpperBuffA() is safer to use).

>       |      in       | output of CharUpperBuffA() (on Win XP)
> ------+---------------+---------------------------------------
>       |     "äöü"     |
> ASCII | {#84,#94,#81} | {#84,#94,#81} = "äöü" --> wrong
> ANSI  | {#E4,#F6,#FC} | {#C4,#D6,#DC} = "ÄÖÜ" --> correct
>
> So yes, it hardly ever fails -- as long as the special characters are
> coded in ANSI. >:->
>
> Regards,
> Juergen
>
> --
> Computers help to solve problems,
> which wouldn't exist without computers.
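
For anyone who wants to experiment with the table approach described above, here is a minimal sketch of how the 128 values printed by the test program might be applied to a string. It is not the code that went into the interpreter. The names upper_oem, uptable and demo are made up for illustration, and the sketch assumes the table holds the upper-case equivalents of character codes 128..255 in order, which is what reading 128 bytes from addr+2 suggests. The demo table is only a stand-in: an identity mapping plus the three code page 850 pairs reported earlier in this thread.

```euphoria
-- Hypothetical helper: upper-case a sequence of DOS code page bytes.
-- 'uptable' is assumed to be the 128 values read with
-- peek({addr+2,128}), i.e. entry 1 belongs to character code 128.
function upper_oem(sequence s, sequence uptable)
    integer c
    for i = 1 to length(s) do
        c = s[i]
        if c >= 'a' and c <= 'z' then
            s[i] = c - 32               -- plain ASCII letters
        elsif c >= 128 and c <= 255 then
            s[i] = uptable[c - 127]     -- accented chars via the table
        end if
    end for
    return s
end function

-- Stand-in table for illustration only (not a real DOS table):
-- identity mapping for codes 128..255 ...
sequence demo
demo = repeat(0, 128)
for i = 1 to 128 do
    demo[i] = i + 127
end for
-- ... plus the three pairs from the INT 21 AX=#6521 table above
demo[#84 - 127] = #8E   -- a-umlaut -> A-umlaut
demo[#94 - 127] = #99   -- o-umlaut -> O-umlaut
demo[#81 - 127] = #9A   -- u-umlaut -> U-umlaut

? upper_oem({#84, #94, #81}, demo)
```

With the stand-in table this should print {142,153,154}, that is {#8E,#99,#9A}. Replacing demo with the sequence returned by peek({addr+2,128}) would use whatever mapping DOS reports for the current country and code page. Note that the lookup only makes sense for text in the DOS code page; ANSI-coded bytes would be mapped as if they were DOS characters, which is the mismatch shown in the tables above.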