Re: find/match not working
- Posted by Matt Lewis <matthewwalkerlewis at yahoo.com> Aug 24, 2004
- 483 views
It's of course, very late in the discussion (been out of town), but here is my contribution to the thread. I couldn't find Kat's original code (just her re-code using match), so like everyone else, I can't comment on what the problem is with find. I've never had a problem with it, except for the slowness when a sequence gets too big. Here's a routine (with some modificatins) that I use to maintain a non-duplicated list. It's very fast, because it keeps the items sorted, and uses a binary search to find duplicates. I had some data that contained about 70,000 unique entries over 1,300,000 lines. By switching from find() to this, I went from a load time of ~1.5hr to ~1min. The routine itself does two things. It either finds the item in the list, or adds it to the list in the correct spot.
constant blank_add = repeat( 0, 1024 ) sequence entry_list integer entry_last function no_dup( sequence entry ) integer lo, hi, mid, c, first, last lo = 1 hi = entry_last - 1 mid = floor( (hi+lo)/2 ) while lo < hi do c = compare( entry, entry_list[mid] ) if c > 0 then if lo = mid then lo = hi else lo = mid end if elsif c < 0 then hi = mid else exit end if mid = floor( (lo + hi) / 2 ) end while if mid then c = compare( entry, entry_list[mid] ) else entry_list[1] = entry entry_last += 1 return 1 end if if c and entry_last > length( entry_list ) then entry_list &= blank_add end if if c then if c > 0 then -- needs to be added after current mid += 1 end if if mid = entry_last then entry_list &= blank_add end if entry_list[mid+1..entry_last+1] = entry_list[mid..entry_last] entry_list[mid] = entry entry_last += 1 end if return mid end function entry_list = {} entry_list &= blank_add entry_last = 0 ? no_dup( "one" ) ? no_dup( "two" ) ? no_dup( "three" ) ? no_dup( "one" ) ? no_dup( "two" ) ? no_dup( "three" ) ? no_dup( "one" ) ? no_dup( "two" ) ? no_dup( "three" ) include misc.e pretty_print( 1, entry_list[1..entry_last], {}) include get.e abort(wait_key())
Matt Lewis