1. Fast "locate" function

Does any elegant coder out there have a fast "locate" function, similar to
"find", that returns the location of a defined byte sequence inside a large
byte sequence?
As in, something without a "compare" or "match" inside a loop...
"find" seems to be the answer, but it does not work for me. Example:

sequence s1, s2
integer i1
s1 = {#00,#00,#00,#30}
s2 = {#27,#20,#00,#00,#00,#30,#20,#40}
i1 = find(s1,s2)
printf(1,"Marker found at location %d\n",i1)
-- i1 should be 3, but it's 0, as in nothing found.

I'm attempting to extract some 27K records from a 6M (mainframe) file by
looking for s1.
The records are variable length, so my current (slow) solution is a nested
loop containing "compare". The only consistency in my data file is the byte
sequence s1.
Any help would be appreciated.

Alan


2. Re: Fast "locate" function

This is the best I've seen in a while....
 
function find_all_2(object test, sequence data)
        integer ix, jx, len
        sequence result
        result = {}
        ix = 1
        len = length(data)
        jx = find( test, data[ix..len] )
        while jx do
                result &= ix+jx-1
                ix += jx
                jx = find( test, data[ix..len] )
        end while
        return result
end function

Thank Matt Lewis for this one.

If this isn't what you're looking for, or you want to look at some other
interesting twists on the same subject, search prior posts on the mailing
list for the keyword find_all.

Euman
euman at bellsouth.net


----- Original Message ----- 
From: "Alan Oxley" <fizzpop at icon.co.za>
To: "EUforum" <EUforum at topica.com>
Sent: Thursday, December 06, 2001 3:07 AM
Subject: Fast "locate" function




3. Re: Fast "locate" function

> sequence s1, s2
> integer i1
> s1 = {#00,#00,#00,#30}
> s2 = {#27,#20,#00,#00,#00,#30,#20,#40}
> i1 = find(s1,s2)
> printf(1,"Marker found at location %d\n",i1)
> -- i1 should be 3, but its 0, as in nothing found.

BTW, I would use what I just sent you and test whether s1[1] is #00; if it
is, step through s1 and s2 at the same time.

Here's how I use find_all in one of my projects. I'm looking for 0 (zero);
if I find a 0, I test the next element in the sequence to see whether it is
also 0 before I proceed. It may not be the fastest or most appropriate
method, but it hasn't failed me.

The good thing about this routine is that each find() call searches only
from the position after the last hit, which should be faster than other
methods on large sequences of data.

function find_all(object test, sequence data)
integer ix, jx, len
sequence result
  result = {}
  ix = 1
  len = length(data)
  jx = find( test, data[ix..len] )
  while jx do
     result &= {data[ix..ix+jx-2]}
     ix += jx
     jx = find( test, data[ix..len])
     if ix < len and data[ix] = 0 then
        ix += 1
        jx = find( test, data[ix..len])
     end if 
  end while
  return result
end function


4. Re: Fast "locate" function

Howzat Alan,
have you tried the match() function?

  sequence s1, s2
  integer i1
  s1 = {#00,#00,#00,#30}
  s2 = {#27,#20,#00,#00,#00,#30,#20,#40}
  i1 = match(s1,s2)
  printf(1,"Marker found at location %d\n",i1)




5. Re: Fast "locate" function

Hi...
ahem... yes, match() does indeed work...
After some RTFM, I had tried match first, but kept getting type check
errors, so I moved on to try "equal", "compare" etc., all of which involved
looping to get a slice of the large sequence.
My earlier assignments for s1 or s2 during the match attempts must have been
wrong, as per the error messages....
I feel real stuuupid about now...

Thanks Derek!
BTW, Derek, noticing your greeting, are you an ex-South African?
Alan


6. Re: Fast "locate" function

This is a shot in the dark, so be careful: I haven't tested this....

Let me know if it works...

> sequence s1, s2
> integer i1
> s1 = {#00,#00,#00,#30} -- constant data
> s2 = {#27,#20,#00,#00,#00,#30,#20,#40}
> i1 = find(s1,s2)

function find_all(object test, sequence data)
        integer ix, jx, kx, len
        sequence loc
        -- preallocate: there can be at most one hit per element of data
        loc = repeat(0,length(data))
        ix = 1  kx = 0
        len = length(data)
        jx = find( test, data[ix..len] )
        while jx do
                kx += 1
                loc[kx] = ix+jx-1       -- absolute position within data
                ix += jx                -- resume just past this hit
                jx = find( test, data[ix..len] )
        end while
        return loc[1..kx]               -- keep only the entries that were filled
end function

sequence locations
locations = find_all(#00, s2)


7. Re: Fast "locate" function

Or match() is good!

either way.

Euman
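
Putting the match() suggestion together with the find_all loop shape from earlier in the thread, here is a minimal sketch of locating every marker and cutting the data into records. The names match_all and split_records are hypothetical helpers, not library routines, and the sketch assumes each record starts at its marker and runs up to the next one (bytes before the first marker are dropped). The sample data is made up for illustration.

```euphoria
-- Hypothetical helper: positions of every non-overlapping occurrence
-- of marker in data. marker must be a non-empty sequence.
function match_all(sequence marker, sequence data)
    integer ix, jx, len
    sequence result
    result = {}
    ix = 1
    len = length(data)
    while ix <= len do
        jx = match(marker, data[ix..len])
        if jx = 0 then
            exit
        end if
        result &= ix + jx - 1           -- absolute position of this hit
        ix += jx + length(marker) - 1   -- resume just past the marker
    end while
    return result
end function

-- Hypothetical helper: one slice per record, assuming a record starts
-- at a marker and ends just before the next marker (or at end of data).
function split_records(sequence marker, sequence data)
    sequence starts, records
    integer last
    starts = match_all(marker, data)
    records = {}
    for i = 1 to length(starts) do
        if i < length(starts) then
            last = starts[i+1] - 1
        else
            last = length(data)
        end if
        records = append(records, data[starts[i]..last])
    end for
    return records
end function

sequence mark, sample, starts, recs
mark = {#00,#00,#00,#30}
sample = {#27, #00,#00,#00,#30, #41,#42, #00,#00,#00,#30, #43}
starts = match_all(mark, sample)     -- {2,8}
recs = split_records(mark, sample)
-- recs[1] is {#00,#00,#00,#30,#41,#42}
-- recs[2] is {#00,#00,#00,#30,#43}
```

Each data[ix..len] slice copies the rest of the sequence, so on a multi-megabyte file with thousands of markers this may still be slow. Later Euphoria releases provide a way to pass a start index (match_from, for example), which would avoid the copying; it is not used here so that the sketch relies only on match() as shown in this thread.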


