RE: Small feature request for future EU versions
- Posted by "Ricardo M. Forno" <rforno at uyuyuy.com> Oct 28, 2004
----- Original Message -----
From: Patrick Barnes <mrtrick at gmail.com>
To: <EUforum at topica.com>
Sent: Tuesday, October 26, 2004 9:05 PM
Subject: Re: Small feature request for future EU versions

>
> What I think he means is this:
>
> constant LA_up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
> constant LA_lo = "abcdefghijklmnopqrstuvwxyz"
> constant LA_diff = LA_up - LA_lo
>
> global function case_LA(integer c, object x)
>     integer n
>     if atom(x) then
>         if c then
>             n = find( x, LA_lo )
>             if n then
>                 x += LA_diff[n]
>             end if
>         else
>             n = find( x, LA_up )
>             if n then
>                 x -= LA_diff[n]
>             end if
>         end if
>     else
>         for i = 1 to length(x) do
>             x[i] = case_LA(c, x[i])
>         end for
>     end if
>
>     return x
> end function
>

Well... not exactly this. find() is a bit slow. What I suggest is a
256-element sequence used as a lookup table: the code of the character
you want to translate, plus 1, is the index, and the element stored at
that position is the translated character.

For example, you know that 'A' is equal to 65 (its ASCII value). Assume
then that sequence X contains an 'a' in position 66, a 'b' in position 67,
and so on. So, to translate sequence Z from upper to lower case, you
will code:

    for i = 1 to length(Z) do
        Z[i] = X[Z[i]+1]
    end for

For an atom:

    C = X[C+1]

Adding 1 is necessary because Euphoria uses 1-origin indexing. The C
language, by contrast, uses 0-origin indexing, so there you do not have
to add 1.

Something similar, but oriented to translating whole files, is one of my
contributions to The Archive; I don't remember under what name at the
moment.

> > I like your suggestion, but I just do not see the solution how to make
> > this way the stable templet for *any* alphabet now.
>
> Works for any alphabet and code page, and is faster, because it
> doesn't have to keep slicing the alphabet sequences.
>
> > Some alphabets have no case at all, some alphabets have different
> > numbers of upper and lower letters.
> > For example, computer Russian has 3 extra letters in upper case, which
> > are absent in Russian canonical grammar.
>
> The limitation of the above function is that LA_up and LA_lo must be
> the same length... what do you mean by 3 extra letters? What if you
> try to convert them to lower case? If it should just leave them as
> upper case, that's fine - just leave them out of the function.
>
>
> ***The above function is completely untested. It should not be used in
> nuclear reactors, medical life support systems, or anywhere where
> failure may cause injury***
> --
> MrTrick
>
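
The lookup-table idea from the reply above could be sketched roughly as follows. This is only an illustration, not the code from The Archive: the names make_table, translate, LOWER and UPPER are made up for the example, and it assumes every atom passed in is an integer character code from 0 to 255 and that both alphabet strings have the same length.

```euphoria
-- Illustrative sketch only; all names here are hypothetical.
-- Assumes character codes 0..255 and equal-length src/dst alphabets.

-- Build a 256-entry table where table[c+1] is the translation of code c.
function make_table(sequence src, sequence dst)
    sequence t
    t = repeat(0, 256)
    for c = 0 to 255 do
        t[c+1] = c              -- by default a character maps to itself
    end for
    for i = 1 to length(src) do
        t[src[i]+1] = dst[i]    -- +1 because sequences are 1-origin
    end for
    return t
end function

constant LA_up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
constant LA_lo = "abcdefghijklmnopqrstuvwxyz"
constant LOWER = make_table(LA_up, LA_lo)
constant UPPER = make_table(LA_lo, LA_up)

-- Translate an atom, or every atom in a (possibly nested) sequence.
function translate(sequence table, object x)
    if atom(x) then
        return table[x+1]       -- one subscript, no find()
    end if
    for i = 1 to length(x) do
        x[i] = translate(table, x[i])
    end for
    return x
end function

puts(1, translate(LOWER, "Hello, World!") & '\n')   -- prints: hello, world!
```

Characters that have no counterpart in the other case are simply left out of src and dst, so they keep their identity entry in the table and pass through unchanged. Another code page would only need different alphabet strings passed to make_table.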