1. RE: Sequence Slicing (Was RE: Tough ciphers?)

Coders often have to slice off the head of a sequence after find()ing some
specific item, so that the next find() is able to locate the next occurrence
of the item. Life would be SO MUCH easier if there was a findfrom() routine
in which we could specify the starting index. This would eliminate a whole
class of slicing operations.

  eg.

    s = "a,i,ab,am,an,as,at,abe,..."
    t = ','
    x = findfrom(1, t, s )
    while x != 0 do
	...
      x = findfrom(x + 1, t, s)
    end while

------
Derek.

-----Original Message-----
From: Robert Craig [mailto:rds at RapidEuphoria.com]
Sent: Friday, 15 March 2002 4:47
To: EUforum
Subject: Re: Sequence Slicing (Was RE: Tough ciphers?)



C.K. Lester writes:
> Anybody got any tips, tricks, or hints about faster slicing of
> sequences?
>
> They are of the pattern
>
>    x = x[a+1..length(x)]
>
> so, I'm assigning a slice of x back to x.
> Should I assign it to another variable instead?

In some cases it will be faster if you assign it to itself.
Given the right conditions (single reference count, fairly large
slice), it will not copy anything. It will simply adjust the
start and end pointers of the sequence.

> What can I do to make a significant difference, if anything?

Obviously you are doing this slice statement
many times inside some kind of loop. You'd have to ask
yourself if that's really necessary. I don't think we can
help you without knowing how this fits into your overall
algorithm.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » topic index » view message » categorize

2. RE: Sequence Slicing (Was RE: Tough ciphers?)

Derek Parnell wrote:
> Life would be SO MUCH easier if there was a findfrom() routine
> in which we could specify the starting index. This would
> eliminate a whole class of slicing operations.

Derek, doesn't the following do what you want?

     s = "a,i,ab,am,an,as,at,abe,..."
     t = ','
     x = find(t, s)
     lastx = x
     while x != 0 do
-- 	...
       x = find(t, s[lastx..length(s)])
       lastx += x
     end while

Something like that...

I actually tried this... finding on a slice of the sequence... but it 
didn't seem to work any faster. :(

new topic     » goto parent     » topic index » view message » categorize

3. RE: Sequence Slicing (Was RE: Tough ciphers?)

The problem that Derek is pointing out, is that you still have to slice 
your sequence to do consecutive finds.

Chris

C. K. Lester wrote:
> 
> Derek Parnell wrote:
> > Life would be SO MUCH easier if there was a findfrom() routine
> > in which we could specify the starting index. This would
> > eliminate a whole class of slicing operations.
> 
> Derek, doesn't the following do what you want?
> 
>      s = "a,i,ab,am,an,as,at,abe,..."
>      t = ','
>      x = find(t, s)
>      lastx = x
>      while x != 0 do
> -- 	...
>        x = find(t, s[lastx..length(s)])
>        lastx += x
>      end while
> 
> Something like that...
> 
> I actually tried this... finding on a slice of the sequence... but it 
> didn't seem to work any faster. :(
> 
>

new topic     » goto parent     » topic index » view message » categorize

4. RE: Sequence Slicing (Was RE: Tough ciphers?)

bensler at mail.com wrote:
> The problem that Derek is pointing out, is that you still
> have to slice your sequence to do consecutive finds.

I guess what he's suggesting is "give us an optimized find on a slice," 
right?

Rob, why not?! :)

new topic     » goto parent     » topic index » view message » categorize

5. RE: Sequence Slicing (Was RE: Tough ciphers?)

On 15 Mar 2002, at 19:21, C. K. Lester wrote:

> 
> bensler at mail.com wrote:
> > The problem that Derek is pointing out, is that you still
> > have to slice your sequence to do consecutive finds.
> 
> I guess what he's suggesting is "give us an optimized find on a slice," 
> right?
> 
> Rob, why not?! :)

Yeas, mirc has this!:
$pos(text,string,N)

 blink

Kat

new topic     » goto parent     » topic index » view message » categorize

6. RE: Sequence Slicing (Was RE: Tough ciphers?)

This is fast and works on any slice of a sequence. Sanity checks could 
be added.

function range_find(object x, sequence s, integer pos1, integer pos2)
 atom found
   found = 0
   while not found and pos1 <= pos2 do
      found = equal(x,s[pos1]) * pos1
      pos1 +=1
   end while
   return found
end function

This is the test that I used. I wasn't concerned with accuracy, because 
the routine beats find() by a long shot.

<TEST>
include file.e

integer fn,fl
fn = open("words.txt","rb")
fl = seek(fn,-1)
fl = where(fn)
if seek(fn,0) then end if

sequence temp     temp = repeat(0,fl)

for i = 1 to fl do
   temp[i] = getc(fn)
end for

close(fn)

function range_find(object x, sequence s, integer pos1, integer pos2)
 atom found
   found = 0
   while not found and pos1 <= pos2 do
      found = equal(x,s[pos1]) * pos1
      pos1 +=1
   end while
   return found
end function

atom t   t=time()
integer count  count = 0
integer f,found
f=0
found=find('\n',temp)
while found != 0 do
   count +=1
   f = found+1
   found = range_find('\n',temp,f,length(temp))
end while
? f
? count
? time()-t
while get_key()=-1 do end while
<END TEST>

The control test used:
found = find('\n',temp[f..length(temp)]

instead of:
found = range_find('\n',temp,f,length(temp))


I didn't try with shorter sequences.

Chris

C. K. Lester wrote:
> bensler at mail.com wrote:
> > The problem that Derek is pointing out, is that you still
> > have to slice your sequence to do consecutive finds.
> 
> I guess what he's suggesting is "give us an optimized find on a slice," 
> right?
> 
> Rob, why not?! :)
> 
>

new topic     » goto parent     » topic index » view message » categorize

7. RE: Sequence Slicing (Was RE: Tough ciphers?)

???

Am I reading this right? Are you saying that this routine
will actually find elements in a sequence faster than
taking a slice of the sequence and using find() to search
the slice?

Rod Jackson


bensler at mail.com wrote:
> This is fast and works on any slice of a sequence. Sanity checks could 
> be added.
> 
> function range_find(object x, sequence s, integer pos1, integer pos2)
>  atom found
>    found = 0
>    while not found and pos1 <= pos2 do
>       found = equal(x,s[pos1]) * pos1
>       pos1 +=1
>    end while
>    return found
> end function
> 
> This is the test that I used. I wasn't concerned with accuracy, because 
> the routine beats find() by a long shot.
> 
> <TEST>
<snip>
> <END TEST>
> 
> The control test used:
> found = find('\n',temp[f..length(temp)]
> 
> instead of:
> found = range_find('\n',temp,f,length(temp))

new topic     » goto parent     » topic index » view message » categorize

8. RE: Sequence Slicing (Was RE: Tough ciphers?)

If I did my test correctly, and you're repetitively searching slices of 
a sequence, yes.

Because of your question, I did a bit of further testing. I've 
discovered, the turning point between the benefit of find() vs 
range_find() is somewhere between 200 to 300 elements.

At 300 elements, iterated 1000000/fl times, range_find() beats find() 
with 2.09 vs 2.58 consistently. The higher the number of elements, the 
larger the difference.

I haven't tested with nested sequences, or sequences with varied data.

Here are both complete tests:
<CONTROL TEST>
constant fl=300
integer fn
fn = open("words.txt","rb")

sequence temp     temp = repeat(0,fl)

for i = 1 to fl do
   temp[i] = getc(fn)
end for

close(fn)

atom t   t=time()
integer count  count = 0
integer f,found
for i = 1 to 10000000/fl do
f=1
found=find('\n',temp)
while found != 0 do
   count +=1
   f +=found
   found = find('\n',temp[f..length(temp)])
end while
end for

? f
? count
? time()-t
while get_key()=-1 do end while
<END CONTROL TEST>

<range_find() TEST>
constant fl = 300
integer fn
fn = open("words.txt","rb")

sequence temp     temp = repeat(0,fl)

for i = 1 to fl do
   temp[i] = getc(fn)
end for

close(fn)

function range_find(object x, sequence s, integer pos1, integer pos2)
 atom found
   found = 0
   while not found and pos1 <= pos2 do
      found = equal(x,s[pos1]) * pos1
      pos1 +=1
   end while
   return found
end function

atom t   t=time()
integer count  count = 0
integer f,found
for i =1 to 10000000/fl do
f=0
found=find('\n',temp)
while found != 0 do
   count +=1
   f =found+1
   found = range_find('\n',temp,f,length(temp))
end while
end for

? f
? count
? time()-t
while get_key()=-1 do end while
<END range_find() TEST>

Chris

Rod Jackson wrote:
> ???
> 
> Am I reading this right? Are you saying that this routine
> will actually find elements in a sequence faster than
> taking a slice of the sequence and using find() to search
> the slice?
> 
> Rod Jackson
> 
> 
> bensler at mail.com wrote:
> > This is fast and works on any slice of a sequence. Sanity checks could 
> > be added.
> > 
> > function range_find(object x, sequence s, integer pos1, integer pos2)
> >  atom found
> >    found = 0
> >    while not found and pos1 <= pos2 do
> >       found = equal(x,s[pos1]) * pos1
> >       pos1 +=1
> >    end while
> >    return found
> > end function
> > 
> > This is the test that I used. I wasn't concerned with accuracy, because 
> > the routine beats find() by a long shot.
> > 
> > <TEST>
> <snip>
> > <END TEST>
> > 
> > The control test used:
> > found = find('\n',temp[f..length(temp)]
> > 
> > instead of:
> > found = range_find('\n',temp,f,length(temp))
> 
>

new topic     » goto parent     » topic index » view message » categorize

9. RE: Sequence Slicing (Was RE: Tough ciphers?)

I should point out, that using sequences of length < 200, range_find() 
is drastically slower. Using 200 elements or less, range_find() clocks 
in at 2 seconds. It doesn't seem to drop less than 2 secs.
fastest time I got for find() was 0.68 using 10 elements.

Chris

bensler at mail.com wrote:
> If I did my test correctly, and you're repetitively searching slices of 
> a sequence, yes.
> 
> Because of your question, I did a bit of further testing. I've 
> discovered, the turning point between the benefit of find() vs 
> range_find() is somewhere between 200 to 300 elements.
> 
> At 300 elements, iterated 1000000/fl times, range_find() beats find() 
> with 2.09 vs 2.58 consistently. The higher the number of elements, the 
> larger the difference.
> 
> I haven't tested with nested sequences, or sequences with varied data.
> 
> Here are both complete tests:
> <CONTROL TEST>
> constant fl=300
> integer fn
> fn = open("words.txt","rb")
> 
> sequence temp     temp = repeat(0,fl)
> 
> for i = 1 to fl do
>    temp[i] = getc(fn)
> end for
> 
> close(fn)
> 
> atom t   t=time()
> integer count  count = 0
> integer f,found
> for i = 1 to 10000000/fl do
> f=1
> found=find('\n',temp)
> while found != 0 do
>    count +=1
>    f +=found
>    found = find('\n',temp[f..length(temp)])
> end while
> end for
> 
> ? f
> ? count
> ? time()-t
> while get_key()=-1 do end while
> <END CONTROL TEST>
> 
> <range_find() TEST>
> constant fl = 300
> integer fn
> fn = open("words.txt","rb")
> 
> sequence temp     temp = repeat(0,fl)
> 
> for i = 1 to fl do
>    temp[i] = getc(fn)
> end for
> 
> close(fn)
> 
> function range_find(object x, sequence s, integer pos1, integer pos2)
>  atom found
>    found = 0
>    while not found and pos1 <= pos2 do
>       found = equal(x,s[pos1]) * pos1
>       pos1 +=1
>    end while
>    return found
> end function
> 
> atom t   t=time()
> integer count  count = 0
> integer f,found
> for i =1 to 10000000/fl do
> f=0
> found=find('\n',temp)
> while found != 0 do
>    count +=1
>    f =found+1
>    found = range_find('\n',temp,f,length(temp))
> end while
> end for
> 
> ? f
> ? count
> ? time()-t
> while get_key()=-1 do end while
> <END range_find() TEST>
> 
> Chris
> 
> Rod Jackson wrote:
> > ???
> > 
> > Am I reading this right? Are you saying that this routine
> > will actually find elements in a sequence faster than
> > taking a slice of the sequence and using find() to search
> > the slice?
> > 
> > Rod Jackson
> > 
<snip>

new topic     » goto parent     » topic index » view message » categorize

10. RE: Sequence Slicing (Was RE: Tough ciphers?)

I just made range_find() even faster.
They even out at 100 elements now, clocking at 1.27 secs. More than 100 
elements, and range_find() has the benefit.

Here it is:

function range_find(object x, sequence s, integer pos1, integer pos2)
 integer pos
   pos = pos1
   while pos <= pos2 do
      if equal(x,s[pos]) then exit end if
      pos +=1
   end while
   if pos > pos2 then return 0
   elsif pos = pos1 then return 0
   else return pos
   end if
end function

Chris

bensler at mail.com wrote:
> I should point out, that using sequences of length < 200, range_find() 
> is drastically slower. Using 200 elements or less, range_find() clocks 
> in at 2 seconds. It doesn't seem to drop less than 2 secs.
> fastest time I got for find() was 0.68 using 10 elements.
> 
> Chris
> 
> bensler at mail.com wrote:
> > If I did my test correctly, and you're repetitively searching slices of 
> > a sequence, yes.
> > 
> > Because of your question, I did a bit of further testing. I've 
> > discovered, the turning point between the benefit of find() vs 
> > range_find() is somewhere between 200 to 300 elements.
> > 
> > At 300 elements, iterated 1000000/fl times, range_find() beats find() 
> > with 2.09 vs 2.58 consistently. The higher the number of elements, the 
> > larger the difference.
> > 
> > I haven't tested with nested sequences, or sequences with varied data.
> > 
> > Here are both complete tests:
> > <CONTROL TEST>
> > constant fl=300
> > integer fn
> > fn = open("words.txt","rb")
> > 
> > sequence temp     temp = repeat(0,fl)
> > 
> > for i = 1 to fl do
> >    temp[i] = getc(fn)
> > end for
> > 
> > close(fn)
> > 
> > atom t   t=time()
> > integer count  count = 0
> > integer f,found
> > for i = 1 to 10000000/fl do
> > f=1
> > found=find('\n',temp)
> > while found != 0 do
> >    count +=1
> >    f +=found
> >    found = find('\n',temp[f..length(temp)])
> > end while
> > end for
> > 
> > ? f
> > ? count
> > ? time()-t
> > while get_key()=-1 do end while
> > <END CONTROL TEST>
> > 
> > <range_find() TEST>
> > constant fl = 300
> > integer fn
> > fn = open("words.txt","rb")
> > 
> > sequence temp     temp = repeat(0,fl)
> > 
> > for i = 1 to fl do
> >    temp[i] = getc(fn)
> > end for
> > 
> > close(fn)
> > 
> > function range_find(object x, sequence s, integer pos1, integer pos2)
> >  atom found
> >    found = 0
> >    while not found and pos1 <= pos2 do
> >       found = equal(x,s[pos1]) * pos1
> >       pos1 +=1
> >    end while
> >    return found
> > end function
> > 
> > atom t   t=time()
> > integer count  count = 0
> > integer f,found
> > for i =1 to 10000000/fl do
> > f=0
> > found=find('\n',temp)
> > while found != 0 do
> >    count +=1
> >    f =found+1
> >    found = range_find('\n',temp,f,length(temp))
> > end while
> > end for
> > 
> > ? f
> > ? count
> > ? time()-t
> > while get_key()=-1 do end while
> > <END range_find() TEST>
> > 
> > Chris
> > 
> > Rod Jackson wrote:
> > > ???
<snip>

new topic     » goto parent     » topic index » view message » categorize

11. RE: Sequence Slicing (Was RE: Tough ciphers?)

On 16 Mar 2002, at 0:19, bensler at mail.com wrote:

> 
> I should point out, that using sequences of length < 200, range_find() 
> is drastically slower. Using 200 elements or less, range_find() clocks 
> in at 2 seconds. It doesn't seem to drop less than 2 secs.
> fastest time I got for find() was 0.68 using 10 elements.

In the spirit of a smart interpreter, how about if Eu switched algorithms in 
find() at the optimum sequence length?

Kat

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu