RE: Small feature request for future EU versions


----- Original Message -----
From: Patrick Barnes <mrtrick at gmail.com>
To: <EUforum at topica.com>
Sent: Tuesday, October 26, 2004 9:05 PM
Subject: Re: Small feature request for future EU versions


>
> What I think he means is this:
>
> constant LA_up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> constant LA_lo = "abcdefghijklmnopqrstuvwxyz"
> constant LA_diff = LA_up - LA_lo
>
> global function case_LA(integer c, object x)
>     integer n
>     if atom(x) then
>         if c then
>             n = find( x, LA_lo )
>             if n then
>                 x += LA_diff[n]
>             end if
>         else
>             n = find( x, LA_up )
>             if n then
>                 x -= LA_diff[n]
>             end if
>         end if
>     else
>         for i = 1 to length(x) do
>             x[i] = case_LA(c, x[i])
>         end for
>     end if
>
>     return x
> end function
>


Well... not exactly this.
find() is a bit slow, because it searches the alphabet sequence for every
character.
What I suggest is having a 256-character sequence that serves as a lookup
table: the character you want to translate is used as an index, and the
element at that position is its translation.
For example, you know that 'A' is equal to 65 (its ASCII value). Assume then
that sequence X contains an 'a' in position 66, a 'b' in position 67, and so
on. So, to translate sequence Z from upper to lower case, you would write:

for i = 1 to length(Z) do
    Z[i] = X[Z[i]+1]
end for

For an atom: C = X[C+1]

Adding 1 is necessary because Euphoria uses 1-origin indexing, while
character codes start at 0. C, by contrast, uses 0-origin indexing, so there
you do not have to add 1.
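
The table X itself has to be built once before the loop above can use it.
A minimal sketch of one way to do that, assuming plain ASCII and flat strings
whose elements are all integers from 0 to 255; the names to_lower_table and
lower_text are made up for illustration and are not from The Archive:

-- Assumption: every element of the text is an integer in 0..255.
sequence to_lower_table
to_lower_table = repeat(0, 256)

-- Start with the identity mapping: code i-1 is stored at position i.
for i = 1 to 256 do
    to_lower_table[i] = i - 1
end for

-- Overwrite the entries for 'A'..'Z' with the matching lower-case letters.
for c = 'A' to 'Z' do
    to_lower_table[c + 1] = c + ('a' - 'A')
end for

-- Hypothetical helper: translate a flat string through the table.
function lower_text(sequence z)
    for i = 1 to length(z) do
        z[i] = to_lower_table[z[i] + 1]
    end for
    return z
end function

puts(1, lower_text("Hello, World") & '\n')   -- prints: hello, world

Because every code has its own entry, letters without a counterpart in the
other case simply keep the identity entry, and an alphabet in another code
page only needs different entries overwritten in the second loop.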

Something similar, but oriented to translating whole files, is one of my
contributions to The Archive; I don't remember under what name at the
moment.

> > I like your suggestion, but I just do not see the solution how to make
> > this way the stable templet for *any* alphabet now.
>
> Works for any alphabet and code page, and is faster, because it
> doesn't have to keep slicing the alphabet sequences.
>
> > Some alphabets have no case at all, some alphabets have different
> > numbers of upper and lower letters.
> > For example, computer Russian has 3 extra letters in upper case, which
> > are absent in Russian canonical grammar.
>
> The limitation of the above function is that LA_up and LA_lo must be
> the same length... what do you mean by 3 extra letters? What if you
> try to convert them to lower case? If it should just leave them as
> upper case, that's fine - just leave them out of the function.
>
>
> ***The above function is completely untested. It should not be used in
> nuclear reactors, medical life support systems, or anywhere where
> failure may cause injury*** smile
> --
> MrTrick
>
