1. my dilemma -- hundreds of record types

Hello Eu folks!

I am attempting to develop an application that will
read records from a daily FTP feed, sort the records
according to type, display pertinent information on
each record, and send that information to subscribers
based on their subscription options.

Each record has a corresponding template with
variable-length fields. The fields are delimited by
tags. Some of the fields are optional, and thus even
the tags themselves do not appear in those records that
do not use them. Here is a view of one of the smaller
templates, with sample information:

<PRESOL>
<DATE>1114
<YEAR>02
<AGENCY>Department of Justice
<OFFICE>Bureau of Prisons
<LOCATION>FCI Talladega
<ZIP>35160
<CLASSCOD>89
<OFFADD>Department of Justice, Bureau of Prisons, FCI
Talladega, 565 East Renfroe Road, Talladega, AL, 35160
<SUBJECT>89 -- Subsistence
<SOLNBR>31303-045
<RESPDATE>121703
<ARCHDATE>01012004
<CONTACT>Ricky D, Contract Specialist, Phone (703)
555-4251, Fax (703) 999-4493, Email rwd at bop.gov 
<DESC>2nd qtr food items
<LINK>
<URL>http://www.yadayada.gov/spg/DOJ/BPR/31303/31303-045/listing.html
<DESC>Link to document.
<SETASIDE>Total Small Business
<POPCOUNTRY>US
<POPZIP>35160
<POPADDRESS>Federal Correctional Institution
565 East Renfroe Road
Talladega,AL
</PRESOL>


There are 9 major template types, but since many of
the fields are optional, the number of different
record variations runs into the hundreds. The only
tags that truly remain constant are the first and last
(<PRESOL> and </PRESOL> in the example above) and a few
others strewn throughout the record.

What is, in your opinion, the best way to read in all
of these record variations so that I can easily work
with and manage the data?

Thank you for your assistance,

Chris


2. Re: my dilemma -- hundreds of record types

On 15 Nov 2002, at 9:29, Chris Saik wrote:

> What is, in your opinion, the best way to read in all
> of these record variations so that I can easily work
> with and manage the data?

Use gets() to read the file into a sequence, and then a simple loop thru the 
sequence, to find the tags you are looking for.

<untested>
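-- repeat for every line of the file, until gets() returns -1
ftpdata = ftpdata & {gets(readfile)}

<untested>
function findtag(sequence tag)
-- return the first line that contains the tag, or "" if there is none
for i = 1 to length(ftpdata) do
  if match(tag,ftpdata[i]) then
    return ftpdata[i]
  end if
end for
return ""
end function

<untested>
function ListAllTags()
-- return each distinct <TAG> that starts a line, in order of first appearance
sequence total, line
integer gt
total = {}
for i = 1 to length(ftpdata) do
  line = ftpdata[i]
  gt = find('>',line)
  if gt and line[1] = '<' then
    if not find(line[1..gt],total) then
      total = append(total,line[1..gt])
    end if
  end if
end for
return total
end function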

Kat


3. Re: my dilemma -- hundreds of record types

Thank you!  This is what I was looking for.

(I'm still learning Eu and feel like a newbie... =)




4. Re: my dilemma -- hundreds of record types

On Friday 15 November 2002 12:48 pm, you wrote:

Kat wrote:

> Use gets() to read the file into a sequence, and then a simple loop thru
> the sequence, to find the tags you are looking for.

Almost :)  gets() reads one line from a file, up to and including the newline.
To get an entire file, which you might as well do if the files aren't really
huge, call it in a loop until it returns -1:

-- (tested)
atom fn
object line, text

fn = open("test.txt","r")

text = {}  -- start with empty buffer
while 1 do  -- loop forever
  line = gets(fn) -- read a line
  if atom(line) then exit -- get out of the forever loop!
  else text = append(text,line) -- add line to buffer
  end if
end while

for i = 1 to length(text) do -- iterate thru the buffer
  if match("<YEAR>",text[i]) then -- looking for the <YEAR> tag
     puts(1,text[i])  -- if tag is found, print the entire line
  end if
end for

Regards,
Irv


5. Re: my dilemma -- hundreds of record types

> Each record has a corresponding template with variable
> length fields. The fields are delimited by tags.  Some
> of the fields are optional, and thus even the tags
> themselves do not appear in those records that do not
> use them.  Here is a view of one of the smaller
> templates, with sample information:

Even though some fields are "optional," could each record keep a slot for
every field? If so, then you'll have one record type, plus a field
indicating what template type it is.

> There are 9 major template types, but since a lot of
> the fields are optional it boosts the number of
> different record variations to hundreds.

Just because a field is optional doesn't mean you have to have a whole
different record type for it. Just leave it blank.
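
A minimal sketch of that idea, under some stated assumptions: FIELD_TAGS and
MakeRecord are hypothetical names, the tag list is shortened to four entries
for illustration, and multi-line fields are ignored to keep it short. Every
record comes out with the same shape, and a field that is missing from the
feed is simply left as an empty string.

```euphoria
-- Sketch only: FIELD_TAGS is a hypothetical, shortened list. A real one
-- would name every tag that can occur in any of the 9 templates.
constant FIELD_TAGS = {"<DATE>", "<YEAR>", "<AGENCY>", "<SETASIDE>"}

-- Build one fixed-shape record: {template type, {one slot per tag}}.
-- lines holds the lines of a single record, as returned by gets().
function MakeRecord(sequence lines)
    sequence slots, line
    integer gt, slot
    slots = repeat("", length(FIELD_TAGS))  -- every field starts blank
    for i = 2 to length(lines) - 1 do       -- skip the first and last tags
        line = lines[i]
        gt = find('>', line)
        if gt and line[1] = '<' then
            slot = find(line[1..gt], FIELD_TAGS)
            if slot then
                -- keep everything after the tag as the field's data
                slots[slot] = line[gt+1..length(line)]
            end if
        end if
    end for
    return {lines[1], slots}
end function

sequence rec
rec = MakeRecord({"<PRESOL>\n", "<DATE>1114\n", "<YEAR>02\n", "</PRESOL>\n"})
puts(1, rec[2][2])   -- the <YEAR> slot; rec[2][3] (<AGENCY>) stays ""
```

Since every record has the same slots in the same order, code that sorts or
displays records can index them by position without first checking which
template variation it is looking at.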


6. Re: my dilemma -- hundreds of record types

<snip>


Thank you Irv.  I'm confused over one thing though...
what if the data following the tag is several lines
long, as in a paragraph?  How would I print the entire
data within the field, and stop when the program
reaches the next tag? 

Thanks for your assistance,

Chris


7. Re: my dilemma -- hundreds of record types

On Friday 15 November 2002 04:30 pm, Chris wrote:

> Thank you Irv.  I'm confused over one thing though...
> what if the data following the tag is several lines
> long, as in a paragraph?  How would I print the entire
> data within the field, and stop when the program
> reaches the next tag?

Hi:

One way would be to find the desired tag, then read lines until you hit the
next <.
Of course, this won't work if there are <'s embedded in the data field.

-- tested

object text

---------------------------------
function LoadFile(sequence name)
---------------------------------
integer fn
object line
sequence lines
fn = open(name,"r")
lines = {}
while 1 do
   line = gets(fn)
   if atom(line) then exit
   else lines = append(lines,line)
   end if
end while
close(fn)
return lines
end function

-----------------------------------------
function Extract(object tag, object text)
-----------------------------------------
integer i
object found
found = ""
i = 1
while i <= length(text) do -- iterate thru the buffer
   if match(tag,text[i]) then -- look for the requested tag
      found = text[i] -- store that line
      i += 1 -- go to next line
      -- stop at the next < or at the end of the buffer
      while i <= length(text) and not match("<",text[i]) do
           found &= text[i]   -- no < there, so add the line to the result
           i += 1 -- go to next line
      end while
   else i += 1
   end if
end while
return found
end function

-----------------------------[ MAIN ]-------------------------------
text = LoadFile("test.txt")
puts(1,Extract("<NAME>",text))

-- end
                  
Hope that helps
Irv


8. Re: my dilemma -- hundreds of record types

On 15 Nov 2002, at 13:30, Chris Saik wrote:

> Thank you Irv.  I'm confused over one thing though...
> what if the data following the tag is several lines
> long, as in a paragraph?  How would I print the entire
> data within the field, and stop when the program
> reaches the next tag? 

Yes Irv, can you do it as easily as gets(), without using gets()?

Kat


9. Re: my dilemma -- hundreds of record types

I think I answered my own question.  The FTP feed
always begins new tags on a new line.  I could just
check the first character of the line that I'm
currently reading in, and if it's a "<" then I can
assume it's a new tag. 

Although, if someone used "<" within the data, and it
just happened to be the first character of the line,
then it wouldn't work... 

hmmm... 
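
A rough sketch of that first-character check, assuming that every tag starts
in column 1 and that no data line starts with "<" (the very case worried
about above). SplitFields is a hypothetical name, and lines is assumed to
hold the lines of one record as returned by gets().

```euphoria
-- Sketch only: a line that starts with '<' opens a new field;
-- any other line is treated as more data for the previous field.
function SplitFields(sequence lines)
    sequence fields, line
    integer gt, last
    fields = {}
    for i = 1 to length(lines) do
        line = lines[i]
        gt = find('>', line)
        if gt and line[1] = '<' then
            -- new tag: store {tag, rest of the line}
            fields = append(fields, {line[1..gt], line[gt+1..length(line)]})
        elsif length(fields) then
            -- no tag: tack the line onto the previous field's data
            last = length(fields)
            fields[last][2] = fields[last][2] & line
        end if
    end for
    return fields
end function

sequence f
f = SplitFields({"<POPADDRESS>Federal Correctional Institution\n",
                 "565 East Renfroe Road\n",
                 "Talladega,AL\n",
                 "</PRESOL>\n"})
puts(1, f[1][2])   -- the whole multi-line address field
```

Each entry of the result is a {tag, data} pair, so a tag that is absent from
a record just never shows up in the list.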



10. Re: my dilemma -- hundreds of record types

On  0, Kat <kat at kogeijin.com> wrote:
> Yeas Irv, can you do it as easily as gets(), without using gets() ?
> 
> Kat
> 

I prefer this myself, actually:

atom fn, char
sequence text, line
 
fn = open("test.txt","r")
 
text = {}
line = {}

while 1 do
	char = getc(fn)
	if char = -1 then
		if length(line) then
			text &= {line}
		end if
		exit
	elsif char = '\n' then
		text &= {line&char}
		line = {}
	else
		line &= char
	end if
end while

for i = 1 to length(text) do
	if match("<YEAR>",text[i]) then
		puts(1,text[i])
	end if
end for



--


11. Re: my dilemma -- hundreds of record types

On Friday 15 November 2002 05:10 pm, you wrote:

> > Thank you Irv.  I'm confused over one thing
> > though...
> > what if the data following the tag is several lines
> > long, as in a paragraph?  How would I print the
> > entire
> > data within the field, and stop when the program
> > reaches the next tag?

That's fixable also:
Use wildcard_match("*<*>*",line) (from wildcard.e) to check whether the line
contains a tag. If it does, treat the first <tag> in the line as the field's
tag, and the rest of the line can be considered data, even if it happens to
have a <, a >, or even both.
If no tag is found, then it must be a continuation of data from the previous
tag, and can just be tacked on. The one case this still gets wrong is a
continuation line that itself contains a < followed later by a >.

Regards,
Irv


12. Re: my dilemma -- hundreds of record types

On 15 Nov 2002, at 14:10, Chris Saik wrote:

> 
> I think I answered my own question.  The FTP feed
> always begins new tags on a new line.  I could just
> check the first character of the line that I'm
> currently reading in, and if it's a "<" then I can
> assume it's a new tag. 
> 
> Although, if someone used "<" within the data, and it
> just happened to be the first character of the line,
> then it wouldn't work... 
> 
> hmmm... 

This is why I used gets(): it breaks on the newline, and I added each field to
the base sequence in a nested way. That makes searching the ftpdata for a
specific tag easy as pie, yet keeps the fields separate. Little databases like
this are so easy in Euphoria.

Kat



13. Re: my dilemma -- hundreds of record types

I have written an HTML parser which has three events: on_start_tag,
on_end_tag and on_data. It isn't on my web page or in the archives yet. If you
want it, I can upload it.


14. Re: my dilemma -- hundreds of record types

Hi C.K.,

I must have had a brain freeze... I was confused since
each record may or may not have all of the tag names
present.  

But now I think I'm on the right track, thanks to you
and Kat.

Chris




15. Re: my dilemma -- hundreds of record types

On Friday 15 November 2002 07:07 pm, you wrote:
>
> On Friday 15 November 2002 05:10 pm, you wrote:
> > > Thank you Irv.  I'm confused over one thing
> > > though...
> > > what if the data following the tag is several lines
> > > long, as in a paragraph?  How would I print the
> > > entire
> > > data within the field, and stop when the program
> > > reaches the next tag?

Forgot the akshul code:

include wildcard.e
object text

---------------------------------
function LoadFile(sequence name)
---------------------------------
integer fn
object line
fn = open(name,"r")
text = {}
while 1 do
   line = gets(fn)
   if atom(line) then exit
   else text = append(text,line)
   end if
end while
return text
end function

-----------------------------------------
function Extract(object tag, object text)
-----------------------------------------
integer i
object found
found = ""
i = 1
while i <= length(text) do -- iterate thru the buffer
   if match(tag,text[i]) then -- look for the requested tag
      found = text[i] -- store that line
      i += 1 -- go to next line
      -- tack on continuation lines until the next tagged line
      while i <= length(text) and
          not wildcard_match("*<*>*",text[i]) do
            found &= text[i]
            i += 1
      end while
   else i += 1
   end if
end while
return found
end function

text = LoadFile("test.txt")
puts(1,Extract("<ADDR>",text))

--- end code: 
--- here's the test file:

<DATE> 11/15/2002
<NAME> Irv Mullins <esq>
   <ADDR>1234 Fifth St
Joizey City
<STATE>NJ
>!!
<!!
<PHONE>555-1212


16. Re: my dilemma -- hundreds of record types

Hello Tone,

Yes please, I would really like to get the HTML parser
you wrote!

And thanks to everyone else who responded to my
questions.  It's been a great learning experience for
me!

Chris


