1. FastFind() for sorted sequences

Hey folks,

In working out an indexing routine for a new Eu database I'm working
on, I wrote a routine that you might find useful and thought I'd
post it here.

It's called FastFind(), and is the equivalent of Find(), but only
works on sorted sequences.  However, it works _very_ _very_ fast.

For example, on my PC, finding 50,000 randomly selected values in a
sequences of 50,000 integers took Find() 10 seconds, but only took
FastFind() 0.6 seconds.  I've pasted in the code with the routine as
well as the benchmarking program.  Hope you guys find it useful!

Jonathan Leger
---START CODE---
include sort.e

function fastfind(object x, sequence s)
    
    -- Perform a search in a sequence for a given value X.
    -- Returns location of X in S, or 0 otherwise.
    -- S must be sorted.
    
    -- As you will see when you run the benchmarks,
    -- using fastfind() on a sorted sequence is magnitudes
    -- faster than using find().
                
    integer i_half
    integer i_min
    integer i_max

    if length(s) = 0 then
        -- The sequence length is 0, so return
        -- a 0 to have user prepend item.
        return 0    
    end if
    
    if x > s[length(s)] then
        -- The value we're looking for is greater
        -- than the last value, so return 0.
        return 0
    elsif x < s[1] then
        -- The value we're looking for is less
        -- than the first value, so return 0.
        return 0
    end if
    
    i_min = 1
    i_max = length(s)
    
    while 1 do
        -- Find center of current sequence portion.
        i_half = floor((i_min - 1) + ((i_max - (i_min - 1)) / 2))

        if x = s[i_half] then
            -- Found it.  Return index.
            return i_half
        elsif s[i_half] < x then
            -- Adjust min to current half plus one.
            i_min = i_half + 1
        else
            -- Adjust max to current half minus one.
            i_max = i_half
        end if
        
        -- If min and max are equal, then the
        -- value is not in the sequence.  
        -- Return 0.
        if i_min = i_max then            
            return 0
        end if        
    end while
                        
end function



-- Run some comparison benchmarks for fastfind() and find().

integer i_count     -- Number of values to run.
sequence values     -- Some random values.
atom n_start        -- Start time (for benchmark).
atom n_end          -- End time.
atom n_time         -- Seconds elapsed.
object dummy        -- For benchmark.


-- Generate some random values.
puts(1, "Generating values...\n")
values = {}
i_count = 50000
for i = 1 to i_count do
    values = append(values, rand(i_count))
end for

-- Sort the values.
puts(1, "Sorting values...\n")
values = sort(values)

-- Benchmark find().
puts(1, "Benchmarking find()...\n")
n_start = time()
for i = 1 to i_count do
    dummy = find(rand(i_count), values)
end for
n_end = time()
n_time = n_end - n_start

-- Show results.
print(1, i_count)
puts(1, " Items.  Seconds: ")
print(1, n_time)
puts(1, "\n")

-- Benchmark fastfind().
puts(1, "Benchmarking fastfind()...\n")
n_start = time()
for i = 1 to i_count do       
    dummy = fastfind(rand(i_count), values)
end for
n_end = time()
n_time = n_end - n_start

-- Show results.
print(1, i_count)
puts(1, " Items.  Seconds: ")
print(1, n_time)
puts(1, "\n")

new topic     » topic index » view message » categorize

2. Re: FastFind() for sorted sequences

Been there. It's officialy called binary search. And yes it is very fast i
have done some benchmarks myself. You can find it also in database.e.

----- Original Message -----
From: "Jonathan Leger" <jleger at amerisafe.com>
To: "EUforum" <EUforum at topica.com>
Subject: FastFind() for sorted sequences


>
> Hey folks,
>
> In working out an indexing routine for a new Eu database I'm working
> on, I wrote a routine that you might find useful and thought I'd
> post it here.
>
> It's called FastFind(), and is the equivalent of Find(), but only
> works on sorted sequences.  However, it works _very_ _very_ fast.
>
> For example, on my PC, finding 50,000 randomly selected values in a
> sequences of 50,000 integers took Find() 10 seconds, but only took
> FastFind() 0.6 seconds.  I've pasted in the code with the routine as
> well as the benchmarking program.  Hope you guys find it useful!
>
> Jonathan Leger
> ---START CODE---
> include sort.e
>
> function fastfind(object x, sequence s)
>
>     -- Perform a search in a sequence for a given value X.
>     -- Returns location of X in S, or 0 otherwise.
>     -- S must be sorted.
>
>     -- As you will see when you run the benchmarks,
>     -- using fastfind() on a sorted sequence is magnitudes
>     -- faster than using find().
>
>     integer i_half
>     integer i_min
>     integer i_max
>
>     if length(s) = 0 then
>         -- The sequence length is 0, so return
>         -- a 0 to have user prepend item.
>         return 0
>     end if
>
>     if x > s[length(s)] then
>         -- The value we're looking for is greater
>         -- than the last value, so return 0.
>         return 0
>     elsif x < s[1] then
>         -- The value we're looking for is less
>         -- than the first value, so return 0.
>         return 0
>     end if
>
>     i_min = 1
>     i_max = length(s)
>
>     while 1 do
>         -- Find center of current sequence portion.
>         i_half = floor((i_min - 1) + ((i_max - (i_min - 1)) / 2))
>
>         if x = s[i_half] then
>             -- Found it.  Return index.
>             return i_half
>         elsif s[i_half] < x then
>             -- Adjust min to current half plus one.
>             i_min = i_half + 1
>         else
>             -- Adjust max to current half minus one.
>             i_max = i_half
>         end if
>
>         -- If min and max are equal, then the
>         -- value is not in the sequence.
>         -- Return 0.
>         if i_min = i_max then
>             return 0
>         end if
>     end while
>
> end function
>
>
> -- Run some comparison benchmarks for fastfind() and find().
>
> integer i_count     -- Number of values to run.
> sequence values     -- Some random values.
> atom n_start        -- Start time (for benchmark).
> atom n_end          -- End time.
> atom n_time         -- Seconds elapsed.
> object dummy        -- For benchmark.
>
>
> -- Generate some random values.
> puts(1, "Generating values...\n")
> values = {}
> i_count = 50000
> for i = 1 to i_count do
>     values = append(values, rand(i_count))
> end for
>
> -- Sort the values.
> puts(1, "Sorting values...\n")
> values = sort(values)
>
<snip>

>
>
>

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu