FastFind() for sorted sequences

new topic     » topic index » view thread      » older message » newer message

Hey folks,

In working out an indexing routine for a new Eu database I'm working
on, I wrote a routine that you might find useful and thought I'd
post it here.

It's called FastFind(), and is the equivalent of Find(), but only
works on sorted sequences.  However, it works _very_ _very_ fast.

For example, on my PC, finding 50,000 randomly selected values in a
sequences of 50,000 integers took Find() 10 seconds, but only took
FastFind() 0.6 seconds.  I've pasted in the code with the routine as
well as the benchmarking program.  Hope you guys find it useful!

Jonathan Leger
---START CODE---
include sort.e

function fastfind(object x, sequence s)
    
    -- Perform a search in a sequence for a given value X.
    -- Returns location of X in S, or 0 otherwise.
    -- S must be sorted.
    
    -- As you will see when you run the benchmarks,
    -- using fastfind() on a sorted sequence is magnitudes
    -- faster than using find().
                
    integer i_half
    integer i_min
    integer i_max

    if length(s) = 0 then
        -- The sequence length is 0, so return
        -- a 0 to have user prepend item.
        return 0    
    end if
    
    if x > s[length(s)] then
        -- The value we're looking for is greater
        -- than the last value, so return 0.
        return 0
    elsif x < s[1] then
        -- The value we're looking for is less
        -- than the first value, so return 0.
        return 0
    end if
    
    i_min = 1
    i_max = length(s)
    
    while 1 do
        -- Find center of current sequence portion.
        i_half = floor((i_min - 1) + ((i_max - (i_min - 1)) / 2))

        if x = s[i_half] then
            -- Found it.  Return index.
            return i_half
        elsif s[i_half] < x then
            -- Adjust min to current half plus one.
            i_min = i_half + 1
        else
            -- Adjust max to current half minus one.
            i_max = i_half
        end if
        
        -- If min and max are equal, then the
        -- value is not in the sequence.  
        -- Return 0.
        if i_min = i_max then            
            return 0
        end if        
    end while
                        
end function



-- Run some comparison benchmarks for fastfind() and find().

integer i_count     -- Number of values to run.
sequence values     -- Some random values.
atom n_start        -- Start time (for benchmark).
atom n_end          -- End time.
atom n_time         -- Seconds elapsed.
object dummy        -- For benchmark.


-- Generate some random values.
puts(1, "Generating values...\n")
values = {}
i_count = 50000
for i = 1 to i_count do
    values = append(values, rand(i_count))
end for

-- Sort the values.
puts(1, "Sorting values...\n")
values = sort(values)

-- Benchmark find().
puts(1, "Benchmarking find()...\n")
n_start = time()
for i = 1 to i_count do
    dummy = find(rand(i_count), values)
end for
n_end = time()
n_time = n_end - n_start

-- Show results.
print(1, i_count)
puts(1, " Items.  Seconds: ")
print(1, n_time)
puts(1, "\n")

-- Benchmark fastfind().
puts(1, "Benchmarking fastfind()...\n")
n_start = time()
for i = 1 to i_count do       
    dummy = fastfind(rand(i_count), values)
end for
n_end = time()
n_time = n_end - n_start

-- Show results.
print(1, i_count)
puts(1, " Items.  Seconds: ")
print(1, n_time)
puts(1, "\n")

new topic     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu