Re: find/match not working

new topic     » goto parent     » topic index » view thread      » older message » newer message

It's of course, very late in the discussion (been out of town), but here
is my contribution to the thread.  I couldn't find Kat's original code
(just her re-code using match), so like everyone else, I can't comment on 
what the problem is with find.  I've never had a problem with it, except for
the slowness when a sequence gets too big.

Here's a routine (with some modificatins) that I use to maintain a 
non-duplicated list.  It's very fast, because it keeps the items sorted,
and uses a binary search to find duplicates.  I had some data that 
contained about 70,000 unique entries over 1,300,000 lines.  By switching
from find() to this, I went from a load time of ~1.5hr to ~1min.  The
routine itself does two things.  It either finds the item in the list, or
adds it to the list in the correct spot.

constant blank_add = repeat( 0, 1024 )
sequence entry_list
integer entry_last
function no_dup( sequence entry )
	integer lo, hi, mid, c, first, last
	lo = 1
	hi = entry_last - 1
	mid = floor( (hi+lo)/2 )

	while lo < hi do
		c = compare( entry, entry_list[mid] )
		if c > 0 then
			if lo = mid then
				lo = hi
			else
				lo = mid
			end if
			
		elsif c < 0 then
			hi = mid
		else
			exit
		end if
		mid = floor( (lo + hi) / 2 )
	end while
	
	if mid then
		c = compare( entry, entry_list[mid] )
	else
		entry_list[1] = entry
		entry_last += 1
		return 1
	end if
	
	if c and entry_last > length( entry_list ) then
		entry_list &= blank_add
	end if
	
	if c then
		if c > 0 then
			-- needs to be added after current
			mid += 1
		end if		
		
		if mid = entry_last then
			entry_list &= blank_add
		end if
		entry_list[mid+1..entry_last+1] = entry_list[mid..entry_last]
		
		entry_list[mid] = entry

		entry_last += 1
	end if

	return mid
end function

entry_list = {}
entry_list &= blank_add
entry_last = 0
? no_dup( "one" )
? no_dup( "two" )
? no_dup( "three" )
? no_dup( "one" )
? no_dup( "two" )
? no_dup( "three" )
? no_dup( "one" )
? no_dup( "two" )
? no_dup( "three" )

include misc.e
pretty_print( 1, entry_list[1..entry_last], {})
include get.e
abort(wait_key())


Matt Lewis

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu