1. find if any members of a set are in another set (blank line finder)?

I want to be able to tell whether a sequence (a line of text) is "empty"
(i.e., has *only* spaces, tabs, CRs, or any combination of those), or whether
it has *any* alpha/num/punctuation content at all.  In other words, a "blank"
line finder.  And it needs to run as quickly as possible.

I thought there might be a spiffy way similar to how:

w = {1, 2, 3} = {1, 2, 4}  gives: w={1,1,0}

and then I could make a sequence of the numbers 33-126 (for all the
al/num/punc), and test any line against that sequence with an "or" in place
of the "="; but as a test,

w = {1, 2, 3} or {1, 2, 4} gives me:  w= {1,1,1}, which I don't understand.
(I'm thinking it means that neither 3 nor 4 are zero.)

So, is there some way to find if any member of a given set is found in
another set, or some different, good (fast) way to find "blank" lines?

Dan Moyer
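
For anyone landing on this question later, here is a minimal sketch of the "is any member of one set in another set" test, using only built-ins (find, length). The name any_member and its parameter names are made up for illustration and do not come from the thread. The two "?" lines just replay the experiments from the post above: relational and logical operators applied to sequences work element by element, and for "or" any nonzero element counts as true.

```euphoria
-- any_member is a hypothetical helper for this sketch.
-- Returns 1 if at least one element of candidates also occurs in pool.
function any_member(sequence candidates, sequence pool)
    for i = 1 to length(candidates) do
        if find(candidates[i], pool) then
            return 1
        end if
    end for
    return 0
end function

? {1, 2, 3} = {1, 2, 4}    -- element-wise comparison
? {1, 2, 3} or {1, 2, 4}   -- element-wise logical or; any nonzero value is true

? any_member({1, 2, 3}, {3, 5, 7})   -- 3 is shared, so this prints 1
? any_member({1, 2, 3}, {4, 5, 6})   -- nothing shared, so this prints 0
```

For the blank-line case, candidates would be the line of text and pool the set of characters that count as content.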

2. Re: find if any members of a set are in another set (blank line finder)?

I haven't tested this yet against text read in from a file, but it seems
like it *might* work.  Does anyone have anything better/faster?  When I put
a tab in sequence "a" below, it didn't like that, but I think a tab in a
file might be different?

Dan Moyer

<code begins>
sequence a, b
a = "      "
b = "this is a line of text!"

function IsBlankLine(sequence aLine)
  sequence w,x
  w = aLine
  x = w
  w = w > 32   -- 1 for each character above 32 (space), else 0
  x = x <= 126 -- 1 for each character up to 126, else 0
  w = w and x  -- 1 only for characters in the printable range 33-126

  if find(1,w) then
     return 0 -- no, is *not* blank line
  else
     return 1 -- yes, is blank line
  end if
end function

if IsBlankLine(a) = 1 then
    puts(1, "yes, \"" & a & "\" is a blank line")
else
    puts(1, "no, \"" & a & "\" is not a blank line")
end if
puts(1, "\n")
if IsBlankLine(b) = 1 then
    puts(1, "yes, \"" & b & "\" is a blank line")
else
    puts(1, "no, \"" & b & "\" is not a blank line")
end if

<code ends>



3. Re: find if any members of a set are in another set (blank line finder)?

On 18 Jul 2002, at 5:59, Dan Moyer wrote:

> 
> I haven't tested this yet against text read in from a file, but it seems
> like it *might* work, does anyone have any thing better/faster?  When I put a
> tab in sequence "a" below, it didn't like that, but I think a tab in a file
> might be different?

A tab is ascii 9, doesn't matter where you get it from.

This is a lil shorter than what you have below:

include strtok-v2.e

sequence line, junk

line = "                 " -- or whatever
junk = parse(line,32) -- 32 is ascii for blank, or ' ' or " "
if length(junk) = 0 then
    -- it's blank
else
    -- it's not blank
end if

Here is a way to see if there is any member of a set in a string:

sequence punctset
punctset = "-_^`'<>][?*\\/}{+&^@!~|;:,."
line = "a b@Cat.d"

junk = parse(line,punctset)
if length(junk) > 1 then
    -- length is how many of punctset in line
else
    -- none of punctset is in line
end if

Kat


4. Re: find if any members of a set are in another set (blank line finder)?

Dan,

Sometimes the direct approach is the simplest and fastest:
---
type blank_line(sequence s)
    for i = 1 to length(s) do
        if not find(s[i], " \t\n") then
            return 0
        end if
    end for
    return 1
end type
---

Colin Taylor


5. Re: find if any members of a set are in another set (blank line finder)?

Colin,

I did think of that, but I haven't made user-defined types before & when I
read the manual, I thought it said that if your program encounters something
in a user-typed variable that isn't the right type, the program *halts* &
gives an error message.  That wouldn't be what I wanted at all!  Did I
misunderstand the manual?

Dan



6. Re: find if any members of a set are in another set (blank line finder)?

Thanks Matt,

I'm gonna have to try your first method to see what it does; I don't get it
from just reading it (ditto the one-liner variation).  I'd considered your last
"test each character in the line against the white_space set" method, but
had just assumed it would be slower than some kind of "test whole set
against whole set" method.

Dan


----- Original Message -----
From: "Matthew Lewis" <matthewwalkerlewis at YAHOO.COM>
To: "EUforum" <EUforum at topica.com>
Sent: Thursday, July 18, 2002 5:43 AM
Subject: RE: find if any members of a set are in another set (blank line finder)?


>
>
> > -----Original Message-----
> > From: Dan Moyer [mailto:DANIELMOYER at prodigy.net]
>
> > I want to be able to discern whether a sequence (a line of text) is
> > "empty" (ie, has *only* spaces or tabs or CR or any combination of
> > those), or if it has *any* alpha/num/punctuation content at all.  In
> > other words, a "blank" line finder.  And it needs to function as
> > quickly as possible.
> >
> > I thought there might be a spiffy way similar to how:
> >
> > w = {1, 2, 3} = {1, 2, 4}  gives: w={1,1,0}
>
> You could do it this way:
>
> -- your line is in sequence line
> constant white_space = { ' ', '\t', '\r', '\n' }
> sequence l
>
> l = repeat(0,length(line))
> for i = 1 to length( white_space ) do
>     l += ( line = white_space[i] )
> end for
>
> if find( 0, l ) then
>     -- something else in there
> end if
>
> Rewritten a-la Carl's one liners:
> not_blank = find(0, (line = ' ') + (line = '\t') + (line = '\r') +
>                     (line = '\n') )
>
> Functionally, but not necessarily speedwise equivalent:
> not_blank = find(0, (line = ' ') or (line = '\t') or (line = '\r') or
>                     (line = '\n') )
>
> > w = {1, 2, 3} or {1, 2, 4} gives me:  w= {1,1,1}, which I
> > don't understand.
> > (I'm thinking it means that neither 3 nor 4 are zero.)
>
> Here's why:
> w = { 1 or 1, 2 or 2, 3 or 4 } = {1,1,1}
>
> > So, is there some way to find if any member of a given set is found in
> > another set, or some different, good (fast) way to find "blank" lines?
>
> I often use this method:
>
> for i = 1 to length(line) do
>     if not find(line[i], white_space) then
>         -- found something else!
>     end if
> end for
>
> This would most likely be faster if the first non-whitespace character were
> toward the beginning of the line.  You might also get different results
> depending on the length of the lines to be tested.  While the first method
> is cool because it uses slick sequence math, I suspect that the last version
> will be the fastest for your purposes.
>
> Matt Lewis
>
>
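
Since the open question is which of Matt's approaches is faster, one way to settle it is to time both on your own data. Below is a rough sketch of such a harness, using only built-ins (find, repeat, time, printf). The function names, REPS, and the sample line are all made up for illustration, and no timings are claimed here; results will depend on line length and on where the first non-whitespace character sits.

```euphoria
constant WHITE_SPACE = " \t\r\n"
constant REPS = 100000   -- arbitrary repetition count for this sketch

-- Sequence-math version: builds 0/1 masks for the whole line, then looks for a 0.
function blank_by_mask(sequence line)
    return not find(0, (line = ' ') or (line = '\t') or (line = '\r') or (line = '\n'))
end function

-- Loop version: returns as soon as it meets a non-whitespace character.
function blank_by_loop(sequence line)
    for i = 1 to length(line) do
        if not find(line[i], WHITE_SPACE) then
            return 0
        end if
    end for
    return 1
end function

sequence sample
atom started
integer result

sample = "x" & repeat(' ', 79)   -- content right at the start of an 80-char line

started = time()
for i = 1 to REPS do
    result = blank_by_mask(sample)
end for
printf(1, "sequence math:   %.2f seconds\n", {time() - started})

started = time()
for i = 1 to REPS do
    result = blank_by_loop(sample)
end for
printf(1, "early-exit loop: %.2f seconds\n", {time() - started})
```

It is worth repeating the run with an all-blank sample, since that is the case where the loop cannot exit early.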


7. Re: find if any members of a set are in another set (blank line finder)?

Thanks Kat,

I'll see if it's any faster than some of the other ways.

tab:  what I meant was, if you're typing characters into a sequence within
your program, and hit the "tab" key while doing so, when that sequence is
encountered by your program, it will error and say something like, "use \t
to put a tab character into a sequence".  But if you have read a *file* into
a sequence, and the file was a text file with a tab in it, I think it will
have the correct tab value there.

Dan
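
A small sketch to check the point about tabs: it writes a line containing the \t escape to a scratch file, reads it back with gets(), and prints the character code seen on each side. The file name tabtest.tmp is an assumption made up for this example; everything else is a built-in (open, puts, gets, close, abort, printf). Both codes should come out as 9, which matches Kat's remark that a tab is ASCII 9 wherever it comes from.

```euphoria
-- Assumption: "tabtest.tmp" is a scratch file name invented for this sketch.
integer fn
object from_file
sequence from_source

from_source = "\tx"     -- \t is how a tab is written inside a string literal

fn = open("tabtest.tmp", "w")
if fn = -1 then
    puts(1, "could not create scratch file\n")
    abort(1)
end if
puts(fn, from_source & "\n")
close(fn)

fn = open("tabtest.tmp", "r")
if fn = -1 then
    puts(1, "could not reopen scratch file\n")
    abort(1)
end if
from_file = gets(fn)    -- a sequence holding the line, or -1 at end of file
close(fn)

if sequence(from_file) then
    printf(1, "source: %d  file: %d\n", {from_source[1], from_file[1]})
end if
```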


8. Re: find if any members of a set are in another set (blank line finder)?

Dan,

A type can be used just like a function that returns TRUE or FALSE.  If it
makes you feel better you can use a function call instead of a type call
(just substitute the word "function" for "type" in the code).  Use it like
this:

    if blank_line(line) then
        -- do this
    else
        -- do that
    end if

Testing character by character will usually be faster than testing the whole
string, since the routine exits at the first non-whitespace character it
finds.

Colin Taylor
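
To make the difference between calling a type and declaring a variable of that type concrete, here is a short sketch. The blank_line type is the one posted earlier in the thread, unchanged; the variable names text and spacer are made up for illustration. As far as I understand the manual, the program only halts when a value that fails the check is assigned to a variable declared with the type.

```euphoria
-- blank_line as posted earlier in the thread.
type blank_line(sequence s)
    for i = 1 to length(s) do
        if not find(s[i], " \t\n") then
            return 0
        end if
    end for
    return 1
end type

sequence text
text = "this is a line of text!"

-- Called like a function, the type just hands back 0 or 1.
if blank_line(text) then
    puts(1, "blank\n")
else
    puts(1, "not blank\n")
end if

-- Declaring a variable of the type is what switches on checking.
blank_line spacer
spacer = "   \t"     -- passes the check
-- spacer = text     -- would stop the program with a type-check failure
```

With text as set above, the if statement takes the "not blank" branch and the program carries on normally.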

