Re: Character conversion
- Posted by "Juergen Luethje" <j.lue at gmx.de> Oct 23, 2004
Pete Lomax wrote:

> On Fri, 22 Oct 2004 21:43:00 +0200, Juergen Luethje <j.lue at gmx.de>
> wrote:
>
>> Now I downloaded 'nlseu.zip' and compared it:
>> Applied to the whole text of Euphoria/Doc/Library.doc, the nlsLower()
>> function in 'nlseu.zip' takes 310% of the time that the lower() function
>> in 'wildcard.e' uses.
>
> Probably
>
>> Furthermore, nlsLower() is only for Windows.
>
> True[1]
>
> It will convert to lower case not only the usual A-Z, not only the few
> characters in #80..#FF, but also, potentially, if modified to use
> CharLowerW instead of CharLowerA, unicode.

Yes.(*)

> (Much) Slower, yes.

BTW: 'CharLowerW' is a different function, so it may be faster or slower than 'CharLowerA'.

In the meantime I looked at the documentation for 'CharLowerA' at MSDN. Interestingly, there is an alternative way of calling this function: instead of a single character, you can pass it a pointer to a whole string.
-----------------------------------------------------------------
include dll.e
include machine.e

constant user32 = open_dll("user32.dll"),
         CharLowerA = define_c_proc(user32, "CharLowerA", {C_POINTER})

global function lower_winA (sequence text)
    -- text must be a flat string (a sequence of byte values)
    sequence ret
    atom addr

    addr = allocate_string(text)        -- copy text into a zero-terminated C buffer
    c_proc(CharLowerA, {addr})          -- CharLowerA converts the buffer in place
    ret = peek({addr, length(text)})    -- read the converted bytes back
    free(addr)
    return ret
end function
-----------------------------------------------------------------

-- Demo
include nlsEu.ew

sequence s, lo
atom t
integer n

s = "ABCDE"   -- or "abcde"
n = 500000

lo = ""
t = time()
for i = 1 to n do
    lo = nlsLower(s)
end for
t = time()-t
? t

lo = ""
t = time()
for i = 1 to n do
    lo = lower_winA(s)
end for
t = time()-t
? t
==> For flat strings containing 5 or more characters, calling 'CharLowerA' this way (lower_winA() above) is faster on my system than the nlsLower() function in 'nlseu.zip', no matter whether or not the characters are already in lowercase.

Applied to the whole text of Euphoria/Doc/Library.doc, the lower_winA() function above takes only 46% of the time that the lower() function in 'wildcard.e' uses (Win 98)!

Kat, you should probably call 'CharLowerA' this way.

> However, I wanted to point out that it is fundamentally better, at
> least for some purposes.

Yes, thanks to Kat and you for pointing out these functions. As you wrote, 'CharLowerA' doesn't handle Unicode. However, in programs that are intended for world-wide distribution, it has at least 2 advantages over self-written lookup tables:

a) It's easy, e.g. for me, to build a lookup table for the German umlauts. That's sufficient for my private needs. But a program distributed world-wide has to support many languages, and it would be a lot of work to write (and test!) all those lookup tables.

b) 'CharLowerA' automatically takes the active code page into account.

(*) Windows 95/98/Me don't have 'native' support for 'CharLowerW'.

> Regards,
>    Pete
>
> [1] no doubt there is a similar Linux system call.

Agreed, but then we probably need a function such as:

<pseudo eucode>
function cool_lower (object x)
   if platform() = DOS32 then
      if unicode then
         ...
      else
         ...
      end if
   elsif platform() = WIN32 then
      if unicode then
         if WinVersion <= ME then
            ...
         else
            ...
         end if
      else
         ...
      end if
   else
      if unicode then
         ...
      else
         ...
      end if
   end if
end function
</pseudo eucode>

Do we already have such a function?

Regards,
   Juergen
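For comparison with point (a), here is a minimal sketch of the kind of self-written lookup table mentioned there, covering only A-Z and the German umlauts. The name lower_de() and the table layout are hypothetical, not part of 'wildcard.e' or 'nlseu.zip'. It assumes single-byte text in code page 1252 (where the umlauts sit at the same positions as in ISO-8859-1); under any other code page these byte values would be wrong, which is exactly the limitation that point (b) says 'CharLowerA' avoids.

```euphoria
-- Hypothetical sketch: table-driven lowercase conversion for A-Z plus
-- the German umlauts. Assumes single-byte text in code page 1252;
-- other code pages place these letters differently.

sequence lower_table   -- lower_table[c+1] holds the lowercase form of byte c

lower_table = repeat(0, 256)
for c = 0 to 255 do
    lower_table[c+1] = c          -- default: every byte maps to itself
end for
for c = 'A' to 'Z' do
    lower_table[c+1] = c + 32     -- ASCII A-Z -> a-z
end for
lower_table[#C4+1] = #E4          -- A umlaut -> a umlaut
lower_table[#D6+1] = #F6          -- O umlaut -> o umlaut
lower_table[#DC+1] = #FC          -- U umlaut -> u umlaut

global function lower_de(sequence text)
    -- text must be a flat sequence of byte values 0..255
    for i = 1 to length(text) do
        text[i] = lower_table[text[i]+1]
    end for
    return text
end function
```

Each character costs one subscript, but supporting another language means adding (and testing) more table entries for each code page, which is the maintenance burden described in point (a).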