1. RE: Nat_sort problems
- Posted by Andy Serpa <ac at onehorseshy.com> Aug 11, 2003
- 401 views
Tony Steward wrote: > > > Hello all, > When I use nat_sort on the following sequence My program crashes with > this error > > C:\EUPHORIA\ExInclude\\\\nat_sort.e:171 in function nat_compare() > type_check failure, x2num is 1234567890 > The number simply got too big -- outside integer range. I never considered that. Just change the declarations for "x1num" & "x2num" to atom (from integer) in nat_compare() and it should take care of it. Change them also in nat_compare_str(). Also, a reminder that nat_sort_str() is faster than plain nat_sort() if you know before-hand that all elements to be sorted are character strings (which is the usual case for most apps). -- Andy
2. RE: Nat_sort problems
- Posted by Andy Serpa <ac at onehorseshy.com> Aug 11, 2003
- 413 views
Derek Parnell wrote: > > > Tony, > changing x2num, x1num to integers doesn't work either because atoms > can't hold very large numbers either. > > Another technique will be required for this problem. > > There are big-number libraries available so you might have to use these > but I suspect a different approach to the natural sort algorithm might > be be faster. Just not sure yet. > How big are these numbers gonna be? nat_sort.e is made for sorting filenames & such, not doing string math. It ought to work for anything within atom range (after redeclaring x1num & x2num as atoms), although it won't work with representations (in the original string) like "1e20". However, it should work fine with strings like "100000000000000000000". The original version just left the strings as strings and padded the shorter one with leading zeros for comparison, but it was much slower, at least the way I did it back then...
3. RE: Nat_sort problems
- Posted by Andy Serpa <ac at onehorseshy.com> Aug 11, 2003
- 400 views
Derek Parnell wrote: > > > On Mon, 11 Aug 2003 02:07:43 +0100 (08/11/03 11:07:43) > , Pete Lomax <petelomax at blueyonder.co.uk> wrote: > > > > > On Sat, 9 Aug 2003 12:09:03 +0000, Tony Steward > > <tsteward at dodo.com.au> wrote: > > > >> Hello all, > >> When I use nat_sort on the following sequence My program crashes with > >> this error > >> > >> C:\EUPHORIA\ExInclude\\\\nat_sort.e:171 in function nat_compare() > >> type_check failure, x2num is 1234567890 Below is the actual data order > >> contains. > >> > >> order = {"104/RING", > >> "123456789012345", > > ^ there lies the root. > > Change the definition of x1num and x2num to atom, it works fine. > > That's just a quick fix, though: > > Changing it to an atom doesn't always work. Sure it no longer crashes, > but > it doesn't aways sort correctly either. > > For example, "123456789012345" and "123456789012399" sort to the same > location. > They do? Works fine with my system. ? nat_compare("123456789012345","123456789012345") ? nat_compare("123456789012399","123456789012399") ? nat_compare("123456789012399","123456789012345") ? nat_compare("123456789012345","123456789012399") Output: 0 0 1 -1 ? nat_compare_str("123456789012345","123456789012345") ? nat_compare_str("123456789012399","123456789012399") ? nat_compare_str("123456789012399","123456789012345") ? nat_compare_str("123456789012345","123456789012399") 0 0 1 -1 All correct. Sorting of those strings is correct also. Maybe I fixed a bug and forget to send an update? If you have a specific sequence that doesn't seem to be working, let's see it. Here is my complete version of nat_sort.e (after changing x1num & x2num to atoms): --------------------------------------------------------------------- ----------------------------------------------------------------------- -- <Release notes snipped> include sort.e -- compares any two objects using the natural order algorithm global function nat_compare(object x1, object x2) integer x1pt, x2pt, x1len, x2len, number, comp, string atom x1num, x2num object x1char, x2char if sequence(x1) and sequence(x2) then x1len = length(x1) x2len = length(x2) -- if both are strings -- inline string type-checking for speed string = 1 for i = 1 to x1len do x1char = x1[i] if not integer(x1char) or x1char > 255 or (x1char < ' ' and not find(x1char,"\t\n\r")) then -- not a string string = 0 exit end if end for if string then string = 1 for i = 1 to x2len do x2char = x2[i] if not integer(x2char) or x2char > 255 or (x2char < ' ' and not find(x2char,"\t\n\r")) then -- not a string string = 0 exit end if end for end if if not string then -- recurse for i = 1 to x1len do if i > x2len then return 1 end if comp = nat_compare(x1[i],x2[i]) if comp != 0 then return comp end if end for return compare(x1len,x2len) end if -- both are strings x1pt = 1 x2pt = 1 -- number flag number = 0 while x1pt <= x1len do if number then -- build numbers x1num = x1char - '0' while x1pt < x1len do x1pt += 1 x1char = x1[x1pt] if (x1char >= '0' and x1char <= '9') then x1num = (x1num * 10) + (x1char - '0') else x1pt -= 1 exit end if end while x2num = x2char - '0' while x2pt < x2len do x2pt += 1 x2char = x2[x2pt] if (x2char >= '0' and x2char <= '9') then x2num = (x2num * 10) + (x2char - '0') else x2pt -= 1 exit end if end while -- compare as atoms comp = compare(x1num,x2num) if comp != 0 then return comp end if -- still equal, keep going number = 0 x1pt += 1 x2pt += 1 else if x2pt > x2len then -- equal up to point x2 stops -- x1 is longer, therefore greater return 1 end if x1char = x1[x1pt] x2char = x2[x2pt] if (x1char >= '0' and x1char <= '9') and (x2char >= '0' and x2char <= '9') then -- first digit of number found, set flag number = 1 else comp = compare(x1char,x2char) if comp != 0 then return comp end if x1pt += 1 x2pt += 1 end if end if end while -- equal up to point x1 stops, longer (if either) is greater return compare(x1len,x2len) else -- one or both top-level elements are atoms -- compare normally return compare(x1,x2) end if end function -- compares two character strings using natural ordering -- (eliminates string type checking, recursion, etc. = faster) global function nat_compare_str(sequence x1, sequence x2) integer x1pt, x2pt, x1len, x2len, number, comp atom x1num, x2num object x1char, x2char x1len = length(x1) x2len = length(x2) x1pt = 1 x2pt = 1 -- number flag number = 0 while x1pt <= x1len do if number then -- build numbers x1num = x1char - '0' while x1pt < x1len do x1pt += 1 x1char = x1[x1pt] if (x1char >= '0' and x1char <= '9') then x1num = (x1num * 10) + (x1char - '0') else x1pt -= 1 exit end if end while x2num = x2char - '0' while x2pt < x2len do x2pt += 1 x2char = x2[x2pt] if (x2char >= '0' and x2char <= '9') then x2num = (x2num * 10) + (x2char - '0') else x2pt -= 1 exit end if end while -- compare as atoms comp = compare(x1num,x2num) if comp != 0 then return comp end if -- still equal, keep going number = 0 x1pt += 1 x2pt += 1 else if x2pt > x2len then -- equal up to point x2 stops -- x1 is longer, therefore greater return 1 end if x1char = x1[x1pt] x2char = x2[x2pt] if (x1char >= '0' and x1char <= '9') and (x2char >= '0' and x2char <= '9') then -- first digit of number found, set flag number = 1 else comp = compare(x1char,x2char) if comp != 0 then return comp end if x1pt += 1 x2pt += 1 end if end if end while -- equal up to point x1 stops, longer (if either) is greater return compare(x1len,x2len) end function -- for sorting any sequence of any depth global function nat_sort(sequence s) return custom_sort(routine_id("nat_compare"),s) end function -- for sorting sequences where all elements are character strings global function nat_sort_str(sequence s) return custom_sort(routine_id("nat_compare_str"),s) end function ----------------------------------------------------------------------
4. RE: Nat_sort problems
- Posted by Andy Serpa <ac at onehorseshy.com> Aug 11, 2003
- 393 views
> > You are correct Andy. I made a mistake when I did a quick test. I > displayed the results incorrectly without looking at it very closely. > > Sorry about that. > While we are talking about nat_sort, I offer the following very slight optimization: Declare a constant at the top of the file: constant WHITESPACE = "\t\n\r" and then replace the two instances of "\t\n\r" in nat_compare() with WHITESPACE. I believe Rob once mentioned the fact the such inline use of sequences is not free -- they have to be recreated each time -- so if that is true that ought to speed up the string-checking by a very tiny amount. (String-checking is mainly why nat_compare() is slower than nat_compare_str().) -- Andy
5. RE: Nat_sort problems
- Posted by Andy Serpa <ac at onehorseshy.com> Aug 11, 2003
- 427 views
Robert Craig wrote: > > > Andy Serpa wrote: > > While we are talking about nat_sort, I offer the following very slight > > optimization: > > > > Declare a constant at the top of the file: > > > > constant WHITESPACE = "\t\n\r" > > > > and then replace the two instances of "\t\n\r" in nat_compare() with > > WHITESPACE. I believe Rob once mentioned the fact the such inline use > > of sequences is not free -- they have to be recreated each time ... > > That doesn't apply to strings "...", just more general > sequences that you construct with {....} > > String sequences are created only once, > when the program is parsed (loaded into memory). > After that, a pointer refers to the sequence. > > Of course from a readability/maintainability point of view > using a named constant is better than repeating the same > value in many places. You might also save memory and > improve the hit rate of the data cache, > but now we're getting into really trivial issues. > Aha... is is actually the string designation (use of quotes) that makes the difference, or just any "byte-sized" sequence of integers? When you first mentioned it, it was to do with peek(), using {addr,len} where you've got a 32-bit address and an integer together...
6. RE: Nat_sort problems
- Posted by Andy Serpa <ac at onehorseshy.com> Aug 11, 2003
- 387 views
Robert Craig wrote: > > > Andy Serpa wrote: > > Aha... is is actually the string designation (use of quotes) that makes > > the difference, or just any "byte-sized" sequence of integers? When you > > > > first mentioned it, it was to do with peek(), using {addr,len} where > > you've got a 32-bit address and an integer together... > > At the moment, I save a single copy of each string sequence "..." > at parse time, but I never save a copy of {...} expressions > - they are recreated each time. Pete Lomax was right. > I could do the same for {...} expressions where each element > is a constant value, but I don't. I may do it in the future. > Not complaining... just like to know all the tricks...
7. RE: Nat_sort problems
- Posted by Tony Steward <tsteward at dodo.com.au> Aug 12, 2003
- 387 views
Hello all, Thanks for all the input. I had changed x2num and x1num to atoms before posting my problem, only I didn't realize they were re-declared in a second function. I have now made them atoms in both functions and all is well, thankyou. I never expected one of my customers to enter data in this way, and it will probably never happen again (I hope). So problem solved Thanks Tony Steward > -----Original Message----- > From: Andy Serpa [mailto:ac at onehorseshy.com] > Sent: Monday, 11 August 2003 5:25 PM > To: EUforum > Subject: RE: Nat_sort problems > > > Derek Parnell wrote: > > > > > > Tony, > > changing x2num, x1num to integers doesn't work either because atoms > > can't hold very large numbers either. > > > > Another technique will be required for this problem. > > > > There are big-number libraries available so you might have to use these > > but I suspect a different approach to the natural sort algorithm might > > be be faster. Just not sure yet. > > > > How big are these numbers gonna be? nat_sort.e is made for sorting > filenames & such, not doing string math. It ought to work for anything > within atom range (after redeclaring x1num & x2num as atoms), although > it won't work with representations (in the original string) like "1e20". > However, it should work fine with strings like "100000000000000000000". > The original version just left the strings as strings and padded the > shorter one with leading zeros for comparison, but it was much slower, > at least the way I did it back then... > > --^---------------------------------------------------------------- > This email was sent to: tsteward at dodo.com.au > > > TOPICA - Start your own email discussion group. FREE! >