1. FastFind() for sorted sequences
- Posted by Jonathan Leger <jleger at amerisafe.com> Jan 15, 2002
- 404 views
Hey folks, In working out an indexing routine for a new Eu database I'm working on, I wrote a routine that you might find useful and thought I'd post it here. It's called FastFind(), and is the equivalent of Find(), but only works on sorted sequences. However, it works _very_ _very_ fast. For example, on my PC, finding 50,000 randomly selected values in a sequences of 50,000 integers took Find() 10 seconds, but only took FastFind() 0.6 seconds. I've pasted in the code with the routine as well as the benchmarking program. Hope you guys find it useful! Jonathan Leger ---START CODE--- include sort.e function fastfind(object x, sequence s) -- Perform a search in a sequence for a given value X. -- Returns location of X in S, or 0 otherwise. -- S must be sorted. -- As you will see when you run the benchmarks, -- using fastfind() on a sorted sequence is magnitudes -- faster than using find(). integer i_half integer i_min integer i_max if length(s) = 0 then -- The sequence length is 0, so return -- a 0 to have user prepend item. return 0 end if if x > s[length(s)] then -- The value we're looking for is greater -- than the last value, so return 0. return 0 elsif x < s[1] then -- The value we're looking for is less -- than the first value, so return 0. return 0 end if i_min = 1 i_max = length(s) while 1 do -- Find center of current sequence portion. i_half = floor((i_min - 1) + ((i_max - (i_min - 1)) / 2)) if x = s[i_half] then -- Found it. Return index. return i_half elsif s[i_half] < x then -- Adjust min to current half plus one. i_min = i_half + 1 else -- Adjust max to current half minus one. i_max = i_half end if -- If min and max are equal, then the -- value is not in the sequence. -- Return 0. if i_min = i_max then return 0 end if end while end function -- Run some comparison benchmarks for fastfind() and find(). integer i_count -- Number of values to run. sequence values -- Some random values. atom n_start -- Start time (for benchmark). atom n_end -- End time. atom n_time -- Seconds elapsed. object dummy -- For benchmark. -- Generate some random values. puts(1, "Generating values...\n") values = {} i_count = 50000 for i = 1 to i_count do values = append(values, rand(i_count)) end for -- Sort the values. puts(1, "Sorting values...\n") values = sort(values) -- Benchmark find(). puts(1, "Benchmarking find()...\n") n_start = time() for i = 1 to i_count do dummy = find(rand(i_count), values) end for n_end = time() n_time = n_end - n_start -- Show results. print(1, i_count) puts(1, " Items. Seconds: ") print(1, n_time) puts(1, "\n") -- Benchmark fastfind(). puts(1, "Benchmarking fastfind()...\n") n_start = time() for i = 1 to i_count do dummy = fastfind(rand(i_count), values) end for n_end = time() n_time = n_end - n_start -- Show results. print(1, i_count) puts(1, " Items. Seconds: ") print(1, n_time) puts(1, "\n")
2. Re: FastFind() for sorted sequences
- Posted by tone.skoda at siol.net Jan 16, 2002
- 445 views
Been there. It's officialy called binary search. And yes it is very fast i have done some benchmarks myself. You can find it also in database.e. ----- Original Message ----- From: "Jonathan Leger" <jleger at amerisafe.com> To: "EUforum" <EUforum at topica.com> Subject: FastFind() for sorted sequences > > Hey folks, > > In working out an indexing routine for a new Eu database I'm working > on, I wrote a routine that you might find useful and thought I'd > post it here. > > It's called FastFind(), and is the equivalent of Find(), but only > works on sorted sequences. However, it works _very_ _very_ fast. > > For example, on my PC, finding 50,000 randomly selected values in a > sequences of 50,000 integers took Find() 10 seconds, but only took > FastFind() 0.6 seconds. I've pasted in the code with the routine as > well as the benchmarking program. Hope you guys find it useful! > > Jonathan Leger > ---START CODE--- > include sort.e > > function fastfind(object x, sequence s) > > -- Perform a search in a sequence for a given value X. > -- Returns location of X in S, or 0 otherwise. > -- S must be sorted. > > -- As you will see when you run the benchmarks, > -- using fastfind() on a sorted sequence is magnitudes > -- faster than using find(). > > integer i_half > integer i_min > integer i_max > > if length(s) = 0 then > -- The sequence length is 0, so return > -- a 0 to have user prepend item. > return 0 > end if > > if x > s[length(s)] then > -- The value we're looking for is greater > -- than the last value, so return 0. > return 0 > elsif x < s[1] then > -- The value we're looking for is less > -- than the first value, so return 0. > return 0 > end if > > i_min = 1 > i_max = length(s) > > while 1 do > -- Find center of current sequence portion. > i_half = floor((i_min - 1) + ((i_max - (i_min - 1)) / 2)) > > if x = s[i_half] then > -- Found it. Return index. > return i_half > elsif s[i_half] < x then > -- Adjust min to current half plus one. > i_min = i_half + 1 > else > -- Adjust max to current half minus one. > i_max = i_half > end if > > -- If min and max are equal, then the > -- value is not in the sequence. > -- Return 0. > if i_min = i_max then > return 0 > end if > end while > > end function > > > -- Run some comparison benchmarks for fastfind() and find(). > > integer i_count -- Number of values to run. > sequence values -- Some random values. > atom n_start -- Start time (for benchmark). > atom n_end -- End time. > atom n_time -- Seconds elapsed. > object dummy -- For benchmark. > > > -- Generate some random values. > puts(1, "Generating values...\n") > values = {} > i_count = 50000 > for i = 1 to i_count do > values = append(values, rand(i_count)) > end for > > -- Sort the values. > puts(1, "Sorting values...\n") > values = sort(values) > <snip> > > >