Re: Character conversion


Pete Lomax wrote:

> On Fri, 22 Oct 2004 21:43:00 +0200, Juergen Luethje <j.lue at gmx.de>
> wrote:
>
>> Now I downloaded 'nlseu.zip' and compared it:
>> Applied to the whole text of Euphoria/Doc/Library.doc, the nlsLower()
>> function in 'nlseu.zip' takes 310% of the time that the lower() function
>> in 'wildcard.e' uses.
> Probably
>> Furthermore, nlsLower() is only for Windows.
> True[1]
>
> It will convert to lower case not only the usual A-Z, not only the few
> characters in #80..#FF, but also, potentially, if modified to use
> CharLowerW instead of CharLowerA, unicode.

Yes.(*)

> (Much) Slower, yes.

BTW: 'CharLowerW' is a different function, so it might be faster or
     slower than 'CharLowerA'.

In the meantime I looked at the documentation for 'CharLowerA' at MSDN.
Interestingly, besides converting a single character, the function can
also be given a pointer to a null-terminated string, which it then
converts in place:

-----------------------------------------------------------------
include dll.e
include machine.e

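constant
   user32 = open_dll("user32.dll"),
   -- CharLowerA returns a value, but it is not needed here,
   -- so the routine is defined as a procedure
   CharLowerA = define_c_proc(user32, "CharLowerA", {C_POINTER})

global function lower_winA (sequence text)
   -- text must be a flat sequence of byte values without embedded zeros
   sequence ret
   atom addr

   addr = allocate_string(text)        -- copy text to memory, 0-terminated
   c_proc(CharLowerA, {addr})          -- convert the whole buffer in place
   ret = peek({addr, length(text)})    -- read the converted bytes back
   free(addr)
   return ret
end function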
-----------------------------------------------------------------
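
For contrast, here is a minimal sketch of the other calling form that MSDN
describes: when the high-order word of the argument is zero, 'CharLowerA'
treats the low-order word as a single character and returns the converted
character. The names 'user32_ch', 'CharLowerA_ch' and 'lower_char_winA' are
made up for this illustration, and nothing here is meant to describe how
nlsLower() is actually implemented.

```euphoria
include dll.e

constant
   -- hypothetical names, chosen so they do not clash with the code above
   user32_ch = open_dll("user32.dll"),
   CharLowerA_ch = define_c_func(user32_ch, "CharLowerA", {C_ULONG}, C_ULONG)

global function lower_char_winA (integer ch)
   -- ch must be a single byte value (0..255); the high-order word of the
   -- argument is then zero, so Windows converts one character and
   -- returns the result instead of treating the argument as a pointer
   return c_func(CharLowerA_ch, {ch})
end function
```

Converting a string this way needs one call into the DLL per character,
whereas lower_winA() above makes a single call for the whole string.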

-- Demo
include nlsEu.ew

sequence s, lo
atom t
integer n

s = "ABCDE"      -- or "abcde"
n = 500000

lo = ""
t = time()
for i = 1 to n do
   lo = nlsLower(s)
end for
t = time()-t
? t

lo = ""
t = time()
for i = 1 to n do
   lo = lower_winA(s)
end for
t = time()-t
? t


==> For flat strings containing 5 or more characters, the above
    wrapper around 'CharLowerA' is faster (on my system) than the
    nlsLower(s) that is contained in 'nlseu.zip', regardless of
    whether the characters are already in lowercase.

Applied to the whole text of Euphoria/Doc/Library.doc, the lower_winA()
function above takes only 46% of the time that the lower() function
in 'wildcard.e' uses (Win 98)!

Kat, you probably should call 'CharLowerA' this way. :-)

> However, I wanted to point out that it is fundamentally better, at
> least for some purposes.

Yes, thanks to Kat and you for pointing out these functions.

As you wrote, 'CharLowerA' doesn't handle Unicode. However, in programs
that are intended for world-wide distribution, it has at least 2
advantages over self-written lookup tables:
a) It's easy for me, for example, to build a lookup table for the German
   umlauts. That's sufficient for my private needs. But a program that is
   distributed world-wide has to support many languages. It would be a
   lot of work to write (and test!) all those lookup tables.
b) 'CharLowerA' automatically takes the active code page into account.
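
To make point a) concrete, here is a rough sketch of what such a
self-written lookup table could look like. It assumes that the text is
encoded in Windows-1252 and covers only A-Z plus the three German umlauts.
'lower_table' and 'lower_tbl' are hypothetical names, not part of any
library.

```euphoria
-- Table-driven lowercase for ONE fixed code page (assumed: Windows-1252).
-- Entry i+1 of the table holds the replacement for byte value i.

sequence lower_table
lower_table = repeat(0, 256)

-- start with the identity mapping
for i = 0 to 255 do
   lower_table[i+1] = i
end for

-- ASCII 'A'..'Z' -> 'a'..'z'
for c = 'A' to 'Z' do
   lower_table[c+1] = c + 32
end for

-- German umlauts in Windows-1252
lower_table[#C4+1] = #E4   -- A umlaut
lower_table[#D6+1] = #F6   -- O umlaut
lower_table[#DC+1] = #FC   -- U umlaut

global function lower_tbl (sequence text)
   -- expects a flat sequence of byte values in the range 0..255
   for i = 1 to length(text) do
      text[i] = lower_table[text[i]+1]
   end for
   return text
end function
```

Every additional language or code page would need its own entries (or its
own table) plus testing, which is exactly the work that 'CharLowerA' saves.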



(*) Windows 95/98/Me don't have 'native' support for 'CharLowerW'.

> Regards,
> Pete
> [1] no doubt there is a similar Linux system call.

Agreed, but then we probably need a function such as:

<pseudo eucode>
function cool_lower (object x)
   if platform() = DOS32 then
      if unicode then
         ...
      else
         ...
      end if
   elsif platform() = WIN32 then
      if unicode then
         if WinVersion <= ME then
            ...
         else
            ...
         end if
      else
         ...
      end if
   else
      if unicode then
         ...
      else
         ...
      end if
   end if
end function
</pseudo eucode>

Do we already have such a function? :-)

Regards,
   Juergen
