Re: More bugs in exw.exe


Juergen Luethje wrote:
> 
> CChris wrote:
> 
> > Juergen Luethje wrote:
> > 
> >> CChris wrote:
> 
> [snipped old text]
> 
> >>> Looks like I'm going to import the capitalisation
> >>> table using function AX=#6504 to a sequence and transcode accented chars
> >>> using that, unless AX=#6522 is considered a decent, bug-free substitute;
> >>> the character tables may differ subtly.
> >> 
> >> It seems to me that the official DOS functions to capitalize a string are
> >> INT 21, AX=#6521 and AX=#6522. Here is an example:
> >> <http://www.uv.tietgen.dk/Staff/Mlha/PC/Prog/asm/int/21/6521.htm>
> >> 
> > 
> > I played with them last evening, and decided not to use them either.
> > Capitalisation of accented characters is not done, or at least not always
> > done; for instance, ü and Ü are not recognised as differing in case only.
> 
> It looks as if I've found the reason for this. In my tests,
> INT 21 AX=#6521 properly capitalizes all lowercase German special
> characters in _ASCII_ code. But when I just open an editor on Windows
> and type some text, the text is normally in _ANSI_ code...
> 
>       |      in       | output of INT 21 AX=#6521 (on Win XP)
> ------+---------------+--------------------------------------
>       |     "äöü"     |
> ASCII | {#84,#94,#81} |  {#8E,#99,#9A} = "ÄÖÜ"  -->  correct
> ANSI  | {#E4,#F6,#FC} |  {#E5,#F6,#FC} = "åöü"  -->  wrong
> 
> And the problem is that a program hardly knows whether a file that it
> reads is coded in ASCII or in ANSI.
> 

Reminds me of something :)
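What the thread calls ASCII here is really the DOS OEM code page, since #84, #94 and #81 lie outside 7-bit ASCII. The first table can be reproduced with a small Python model, offered as a sketch rather than as what DOS does internally. It assumes code page 850 was active (the #E4 to #E5 result is consistent with that: in code page 850, #E4 is "õ" and #E5 is "Õ") and uppercases byte by byte through Python's codecs. The function name upper_in_codepage is made up for illustration.

```python
def upper_in_codepage(data: bytes, codepage: str) -> bytes:
    """Uppercase single-byte text as a codepage-aware routine would.

    Each byte is decoded with `codepage`, uppercased and re-encoded.
    Bytes undefined in the code page, or whose uppercase form does not
    fit back into one byte of that code page (e.g. 'ß' -> 'SS'),
    are left unchanged.
    """
    out = bytearray()
    for b in data:
        try:
            up = bytes([b]).decode(codepage).upper().encode(codepage)
        except (UnicodeDecodeError, UnicodeEncodeError):
            up = bytes([b])
        out += up if len(up) == 1 else bytes([b])
    return bytes(out)


oem = bytes([0x84, 0x94, 0x81])   # "äöü" in the DOS (OEM) code page
ansi = bytes([0xE4, 0xF6, 0xFC])  # "äöü" in Windows-1252 (ANSI)

# Model of INT 21 AX=#6521, assuming code page 850 is active:
print(upper_in_codepage(oem, "cp850").hex(" "))   # 8e 99 9a
print(upper_in_codepage(ansi, "cp850").hex(" "))  # e5 f6 fc
```

Passing "cp1252" instead models an ANSI-aware routine: the OEM bytes then come back unchanged and the ANSI bytes become #C4 #D6 #DC, the same pattern as in the CharUpperBuffA() table.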

> > The only thing that I got working is the one I mentioned, i.e. working
> > directly with the filename capitalisation table, a pointer to which is
> > returned by function #6504. I'm currently streamlining the code, since
> > conditional jumps are usually bad for performance, but basically that
> > fixed the bug.
> 
> If this is reliable, then it's fine. Did you test it with special
> characters in ASCII code and ANSI code?
> 

When processed by a C compiler, the front end sees all strings as ANSI, so the
ASCII side of things is not a problem. And my method correctly maps â to Â, ü to
Ü, é to É and so forth, though I didn't try them all.

As a result, including both fileü.e and fileÜ.e with the same contents no longer
causes "a namespace qualifier is required here".
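Applying that table needs no per-character conditional jump: expand the 128 entries (for #80..#FF) into a full 256-byte map, where only a..z change in the lower half, and translate through it. A Python sketch under stated assumptions: the table #6504 points to is taken to be a size word followed by 128 bytes, which is why the test program peeks at addr+2. hypothetical_cp850_table() is a made-up stand-in for those 128 bytes, derived from code page 850 for demonstration and not read from DOS.

```python
def build_upper_map(dos_table: bytes) -> bytes:
    """Expand a 128-byte DOS uppercase table (entries for #80..#FF)
    into a 256-byte map usable with bytes.translate()."""
    if len(dos_table) != 128:
        raise ValueError("expected 128 entries for characters #80..#FF")
    low = bytes(range(128)).upper()  # ASCII half: only a..z change
    return low + dos_table


def hypothetical_cp850_table() -> bytes:
    # Hypothetical stand-in for the table returned via INT 21 AX=#6504:
    # uppercase each cp850 character if the result is a single cp850 byte.
    table = bytearray()
    for b in range(0x80, 0x100):
        try:
            up = bytes([b]).decode("cp850").upper().encode("cp850")
        except UnicodeEncodeError:
            up = bytes([b])
        table += up if len(up) == 1 else bytes([b])
    return bytes(table)


upper_map = build_upper_map(hypothetical_cp850_table())
name = "fileü.e".encode("cp850")
print(name.translate(upper_map).decode("cp850"))  # FILEÜ.E
```

With such a map, "fileü.e" and "fileÜ.e" translate to the same bytes, which is the effect described above.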

If you want to run some tests, run this with the DOS interpreter (ex.exe), since
dos_interrupt() and allocate_low() are only available on DOS32:
include machine.e

sequence regs,dword
atom strings,addr
strings=allocate_low(5)
-- prepare the DOS service call
regs=repeat(0,10)
regs[REG_DX]=-1 -- current country
regs[REG_BX]=-1 -- current code page
regs[REG_ES]=floor(strings/16) -- turn 32 bit ptr into 16:16
regs[REG_DI]=and_bits(strings,#0F)
regs[REG_CX]=5 -- buffer size
regs[REG_AX]=#6504 -- function code
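-- ask DOS for a pointer to the filename uppercase table
-- for the current code page and country
regs=dos_interrupt(#21,regs)
-- the carry flag (bit 0 of REG_FLAGS) is set if the call failed,
-- which will happen under DOS <4.0
if and_bits(regs[REG_FLAGS],1) then
    puts(1,"INT 21h AX=#6504 failed\n")
    free_low(strings)
    abort(1)
end if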

-- retrieve 16:16 far pointer and turn it into 32-bit near ptr
dword=peek({strings+1,4})
addr=dword[1]+256*dword[2]+16*dword[3]+4096*dword[4]
-- print the 128 table entries (uppercase equivalents of #80..#FF);
-- the word at addr itself holds the table size
?peek({addr+2,128})
-- clean up and wait for keypress
free_low(strings)
?machine_func(26,0)


The printout of numeric character codes tells you exactly what is mapped to
what. Displaying the characters on the console with puts() instead of printing
their codes gives wrong results, as expected.
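The test program converts between a 32-bit linear address and a real-mode segment:offset pair twice: when it passes the buffer in ES:DI (linear = segment*16 + offset, with the offset kept below 16), and when it decodes the far pointer DOS writes back, stored little-endian as an offset word followed by a segment word. That is why dword[1]+256*dword[2] is the offset and 16*dword[3]+4096*dword[4] is 16 times the segment. The same arithmetic as a Python sketch; the function names are made up for illustration.

```python
import struct


def linear_to_seg_off(linear: int) -> tuple[int, int]:
    """Split a linear address below 1 MB into a normalized segment:offset
    pair, like regs[REG_ES] and regs[REG_DI] in the test program."""
    return linear // 16, linear & 0x0F


def far_ptr_to_linear(raw: bytes) -> int:
    """Decode a 4-byte real-mode far pointer (offset word, then segment
    word, both little-endian) into a linear address."""
    offset, segment = struct.unpack("<HH", raw)
    return segment * 16 + offset


seg, off = linear_to_seg_off(0x12345)
print(hex(seg), hex(off))                                        # 0x1234 0x5
print(hex(far_ptr_to_linear(bytes([0x34, 0x12, 0x00, 0xB8]))))  # 0xb9234
```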

> > Under Windows, there's a CharUpperA() API that does the job directly and
> > hardly ever fails, so using it is certainly the safest way to go.
> 
> I always use CharUpperBuffA(), but I really hope the results are the
> same as with CharUpperA(). :)

Exactly the same. Both convert the characters in place; the difference is that
CharUpperA() works on a null-terminated string, while CharUpperBuffA() takes an
explicit length. This can matter with multibyte characters (and then
CharUpperBuffA() is safer to use).

> 
>       |      in       | output of CharUpperBuffA() (on Win XP)
> ------+---------------+---------------------------------------
>       |     "äöü"     |
> ASCII | {#84,#94,#81} |  {#84,#94,#81} = "äöü"  -->  wrong
> ANSI  | {#E4,#F6,#FC} |  {#C4,#D6,#DC} = "ÄÖÜ"  -->  correct
> 
> So yes, it hardly ever fails -- as long as the special characters are
> coded in ANSI. >:->
> 
> Regards,
>    Juergen
> 
> -- 
> Computers help to solve problems,
> which wouldn't exist without computers.
