1. my dilemma -- hundreds of record types
- Posted by Chris Saik <csaik2002 at yahoo.com> Nov 15, 2002
- 431 views
Hello Eu folks!

I am attempting to develop an application that will read records from a daily FTP feed, sort the records according to type, display pertinent information on each record, and send that information to subscribers based on their subscription options.

Each record has a corresponding template with variable-length fields. The fields are delimited by tags. Some of the fields are optional, and thus even the tags themselves do not appear in those records that do not use them. Here is a view of one of the smaller templates, with sample information:

<PRESOL>
<DATE>1114
<YEAR>02
<AGENCY>Department of Justice
<OFFICE>Bureau of Prisons
<LOCATION>FCI Talladega
<ZIP>35160
<CLASSCOD>89
<OFFADD>Department of Justice, Bureau of Prisons, FCI Talladega, 565 East Renfroe Road, Talladega, AL, 35160
<SUBJECT>89 -- Subsistence
<SOLNBR>31303-045
<RESPDATE>121703
<ARCHDATE>01012004
<CONTACT>Ricky D, Contract Specialist, Phone (703) 555-4251, Fax (703) 999-4493, Email rwd at bop.gov
<DESC>2nd qtr food items
<LINK>
<URL>http://www.yadayada.gov/spg/DOJ/BPR/31303/31303-045/listing.html
<DESC>Link to document.
<SETASIDE>Total Small Business
<POPCOUNTRY>US
<POPZIP>35160
<POPADDRESS>Federal Correctional Institution 565 East Renfroe Road Talladega,AL
</PRESOL>

There are 9 major template types, but since a lot of the fields are optional, the number of different record variations runs into the hundreds. The only tags that truly remain constant are the first and last (<PRESOL> and </PRESOL> in the example above) and a few others strewn throughout the record.

What is, in your opinion, the best way to read in all of these record variations so that I can easily work with and manage the data?

Thank you for your assistance,

Chris
2. Re: my dilemma -- hundreds of record types
- Posted by Kat <kat at kogeijin.com> Nov 15, 2002
- 437 views
On 15 Nov 2002, at 9:29, Chris Saik wrote:

> <snip>
> What is, in your opinion, the best way to read in all
> of these record variations so that I can easily work
> with and manage the data?

Use gets() to read the file into a sequence, and then a simple loop thru the sequence, to find the tags you are looking for.

<untested>
    ftpdata = ftpdata & {gets(readfile)}

<untested>
    function findtag(sequence tag)
        -- return the first line that contains the tag, or "" if none does
        for i = 1 to length(ftpdata) do
            if match(tag, ftpdata[i]) then
                return ftpdata[i]
            end if
        end for
        return ""
    end function

<untested>
    function ListAllTags()
        -- collect each distinct <TAG> that appears in ftpdata
        sequence total, temp
        integer lt, gt
        total = {}
        for i = 1 to length(ftpdata) do
            lt = find('<', ftpdata[i])
            gt = find('>', ftpdata[i])
            if lt and gt > lt then
                temp = ftpdata[i][lt..gt]    -- e.g. "<DATE>"
                if not find(temp, total) then
                    total = append(total, temp)
                end if
            end if
        end for
        return total
    end function

Kat
3. Re: my dilemma -- hundreds of record types
- Posted by Chris Saik <csaik2002 at yahoo.com> Nov 15, 2002
- 436 views
Thank you! This is what I was looking for. (I'm still learning Eu and feel like a newbie... =)

--- Kat <kat at kogeijin.com> wrote:
> Use gets() to read the file into a sequence, and then a simple loop thru
> the sequence, to find the tags you are looking for.
4. Re: my dilemma -- hundreds of record types
- Posted by irv at take.maxleft.com Nov 15, 2002
- 440 views
On Friday 15 November 2002 12:48 pm, you wrote:

> On 15 Nov 2002, at 9:29, Chris Saik wrote:
> > <snip>
> > What is, in your opinion, the best way to read in all
> > of these record variations so that I can easily work
> > with and manage the data?

Kat wrote:
> Use gets() to read the file into a sequence, and then a simple loop thru
> the sequence, to find the tags you are looking for.

Almost:) gets() reads one line, up to and including the newline, from a file. To get an entire file, which you might as well do if the files aren't really huge, keep calling it until it returns an atom (-1 at end of file):

    -- (tested)
    atom fn
    object line, text

    fn = open("test.txt","r")

    text = {}                            -- start with empty buffer
    while 1 do                           -- loop forever
        line = gets(fn)                  -- read a line
        if atom(line) then exit          -- get out of the forever loop!
        else text = append(text,line)    -- add line to buffer
        end if
    end while

    for i = 1 to length(text) do         -- iterate thru the buffer
        if match("<YEAR>",text[i]) then  -- looking for the <YEAR> tag
            puts(1,text[i])              -- if tag is found, print the entire line
        end if
    end for

Regards,
Irv
5. Re: my dilemma -- hundreds of record types
- Posted by "C. K. Lester" <cklester at yahoo.com> Nov 15, 2002
- 413 views
> Each record has a corresponding template with variable
> length fields. The fields are delimited by tags. Some
> of the fields are optional, and thus even the tags
> themselves do not appear in those records that do not
> use them.

Even though the fields are "optional," could each template simply reserve a slot for every field it might contain? If so, then you'll have one record type per template, plus a field indicating which template type it is.

> There are 9 major template types, but since a lot of
> the fields are optional it boosts the number of
> different record variations to hundreds.

Just because a field is optional doesn't mean you have to have a whole different record type for it. Just leave it blank.
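The "one record type, blank optional fields" idea can be sketched in Euphoria roughly as follows. This is only an illustration: the short FIELDS list and the helper name make_record() are assumptions made up for the example, not the feed's real template, and the input lines are shown without the trailing newline that gets() would leave on them.

```euphoria
-- Hypothetical field list; a real template would name every tag it allows.
constant FIELDS = {"<DATE>", "<YEAR>", "<AGENCY>", "<ZIP>", "<SETASIDE>"}

-- Build one fixed-layout record: {template type, value 1, value 2, ...}.
-- Any tag that never shows up in the lines simply stays "".
function make_record(sequence rectype, sequence lines)
    sequence rec
    integer pos
    rec = {rectype} & repeat("", length(FIELDS))
    for i = 1 to length(lines) do
        for f = 1 to length(FIELDS) do
            pos = match(FIELDS[f], lines[i])
            if pos = 1 then
                -- keep everything after the tag as the field's value
                rec[f + 1] = lines[i][length(FIELDS[f]) + 1 .. length(lines[i])]
                exit
            end if
        end for
    end for
    return rec
end function

sequence rec
rec = make_record("PRESOL", {"<DATE>1114", "<YEAR>02", "<ZIP>35160"})
for i = 1 to length(rec) do
    puts(1, rec[i] & "\n")   -- blank lines mark the fields this record lacks
end for
```

With this layout every PRESOL record has the same length, so field n always sits at the same index whether or not the feed supplied it.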
6. Re: my dilemma -- hundreds of record types
- Posted by Chris Saik <csaik2002 at yahoo.com> Nov 15, 2002
- 451 views
<snip>

> > Kat wrote:
> > > Use gets() to read the file into a sequence, and then a simple loop thru
> > > the sequence, to find the tags you are looking for.
> >
> > Almost:) gets() reads one line up to the c/r from a file.
> > <snip>

Thank you Irv. I'm confused over one thing though... what if the data following the tag is several lines long, as in a paragraph? How would I print the entire data within the field, and stop when the program reaches the next tag?

Thanks for your assistance,

Chris
7. Re: my dilemma -- hundreds of record types
- Posted by irv at take.maxleft.com Nov 15, 2002
- 426 views
On Friday 15 November 2002 04:30 pm, Chris wrote:

> Thank you Irv. I'm confused over one thing though...
> what if the data following the tag is several lines
> long, as in a paragraph? How would I print the entire
> data within the field, and stop when the program
> reaches the next tag?

Hi:

One way would be to find the desired tag, then read lines until you hit the next <.
Of course, this won't work if there are <'s embedded in the data field.

    -- tested
    object text

    ---------------------------------
    function LoadFile(sequence name)
    ---------------------------------
        integer fn
        object line
        fn = open(name,"r")
        text = {}
        while 1 do
            line = gets(fn)
            if atom(line) then exit
            else text = append(text,line)
            end if
        end while
        return text
    end function

    -----------------------------------------
    function Extract(object tag, object text)
    -----------------------------------------
        integer i
        object found
        found = ""
        i = 1
        while i <= length(text) do           -- iterate thru the buffer
            if match(tag,text[i]) then       -- look for the requested tag
                found = text[i]              -- store that line
                i += 1                       -- go to next line
                -- stop at the next < or at the end of the buffer
                while i <= length(text) and not match("<",text[i]) do
                    found &= text[i]         -- if not there, add next line to buffer
                    i += 1                   -- go to next line
                end while
            else
                i += 1
            end if
        end while
        return found
    end function

    -----------------------------[ MAIN ]-------------------------------
    text = LoadFile("test.txt")
    puts(1,Extract("<NAME>",text))
    -- end

Hope that helps
Irv
8. Re: my dilemma -- hundreds of record types
- Posted by Kat <kat at kogeijin.com> Nov 15, 2002
- 422 views
On 15 Nov 2002, at 13:30, Chris Saik wrote:

> <snip>
> Thank you Irv. I'm confused over one thing though...
> what if the data following the tag is several lines
> long, as in a paragraph? How would I print the entire
> data within the field, and stop when the program
> reaches the next tag?

Yes, Irv, can you do it as easily as gets(), without using gets()?

Kat
9. Re: my dilemma -- hundreds of record types
- Posted by Chris Saik <csaik2002 at yahoo.com> Nov 15, 2002
- 441 views
I think I answered my own question. The FTP feed always begins new tags on a new line. I could just check the first character of the line that I'm currently reading in, and if it's a "<" then I can assume it's a new tag.

Although, if someone used "<" within the data, and it just happened to be the first character of the line, then it wouldn't work...

hmmm...

> Thank you Irv. I'm confused over one thing though...
> <snip>
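The first-character check described above might look something like this in Euphoria. It is a sketch only: starts_tag() and group_fields() are hypothetical helper names, and it assumes the lines were already read with gets(), so each one still ends in a newline. It also has exactly the weakness mentioned above: a data line that happens to begin with "<" is mistaken for a new tag.

```euphoria
-- 1 if the line begins a new field (first character is '<'), else 0.
function starts_tag(sequence line)
    if length(line) > 0 and line[1] = '<' then
        return 1
    end if
    return 0
end function

-- Glue continuation lines onto the field that precedes them,
-- so each element of the result holds one complete field.
function group_fields(sequence lines)
    sequence fields
    fields = {}
    for i = 1 to length(lines) do
        if starts_tag(lines[i]) or length(fields) = 0 then
            fields = append(fields, lines[i])      -- start a new field
        else
            fields[length(fields)] &= lines[i]     -- continue the last one
        end if
    end for
    return fields
end function

sequence f
f = group_fields({"<DESC>2nd qtr\n", "food items\n", "<POPZIP>35160\n"})
printf(1, "%d fields\n", length(f))
puts(1, f[1])   -- the whole <DESC> field, both lines
```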
10. Re: my dilemma -- hundreds of record types
- Posted by jbrown105 at speedymail.org Nov 15, 2002
- 427 views
Kat <kat at kogeijin.com> wrote:

> On 15 Nov 2002, at 13:30, Chris Saik wrote:
> > <snip>
>
> Yeas Irv, can you do it as easily as gets(), without using gets() ?
>
> Kat

I prefer this myself, actually:

    atom fn, char
    sequence text, line

    fn = open("test.txt","r")
    text = {}
    line = {}
    while 1 do
        char = getc(fn)
        if char = -1 then
            if length(line) then
                text &= {line}
            end if
            exit
        elsif char = '\n' then
            text &= {line&char}
            line = {}
        else
            line &= char
        end if
    end while

    for i = 1 to length(text) do
        if match("<YEAR>",text[i]) then
            puts(1,text[i])
        end if
    end for
11. Re: my dilemma -- hundreds of record types
- Posted by irv at take.maxleft.com Nov 15, 2002
- 426 views
- Last edited Nov 16, 2002
On Friday 15 November 2002 05:10 pm, you wrote:

> > Thank you Irv. I'm confused over one thing though...
> > what if the data following the tag is several lines
> > long, as in a paragraph? How would I print the entire
> > data within the field, and stop when the program
> > reaches the next tag?

That's fixable also: use wildcard_match("*<*>*",line) to look for the first tag in the line. If found, then it will automatically be the first <tag> in the line, and the rest of the line can be considered data, even if it happens to have a <, a > or even both.

If no tag is found, then it must be a continuation of data from the previous tag, and can just be tacked on.

Regards,
Irv
12. Re: my dilemma -- hundreds of record types
- Posted by Kat <kat at kogeijin.com> Nov 15, 2002
- 431 views
On 15 Nov 2002, at 14:10, Chris Saik wrote:

> I think I answered my own question. The FTP feed
> always begins new tags on a new line. I could just
> check the first character of the line that I'm
> currently reading in, and if it's a "<" then I can
> assume it's a new tag.
>
> Although, if someone used "<" within the data, and it
> just happened to be the first character of the line,
> then it wouldn't work...
>
> hmmm...

This is why I used gets(): it breaks on the newline, and I added each field to the base sequence in a nested way, which makes searching the ftpdata for a specific tag easy as pie, yet keeps the fields separate. Little databases like this are so easy in Euphoria.

Kat
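One possible reading of "nested" here is that each field becomes a {tag, data} pair inside the base sequence. A rough sketch of that layout follows; split_field() and lookup() are names invented for illustration and are not Kat's actual code, and the sample lines are shown without trailing newlines.

```euphoria
-- Split "<ZIP>35160" into {"ZIP", "35160"}.
-- A line without a leading <TAG> comes back as {"", line}.
function split_field(sequence line)
    integer gt
    gt = find('>', line)
    if length(line) > 0 and line[1] = '<' and gt > 2 then
        return {line[2..gt-1], line[gt+1..length(line)]}
    end if
    return {"", line}
end function

-- Return the data stored under a tag name, or "" if the tag is absent.
function lookup(sequence tag, sequence fields)
    for i = 1 to length(fields) do
        if compare(fields[i][1], tag) = 0 then
            return fields[i][2]
        end if
    end for
    return ""
end function

sequence db
db = {}
db = append(db, split_field("<AGENCY>Department of Justice"))
db = append(db, split_field("<ZIP>35160"))
puts(1, lookup("ZIP", db) & "\n")
```

Because an absent tag just yields "", optional fields need no special handling at lookup time.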
13. Re: my dilemma -- hundreds of record types
- Posted by tone.skoda at gmx.net Nov 16, 2002
- 451 views
I have written an HTML parser which has three events: on_start_tag, on_end_tag and on_data. It isn't on my web page or in the archives yet. If you want it, I can upload it.

----- Original Message -----
From: "Chris Saik" <csaik2002 at yahoo.com>
Sent: Friday, November 15, 2002 10:30 PM
Subject: Re: my dilemma -- hundreds of record types

> <snip>
> Thank you Irv. I'm confused over one thing though...
> what if the data following the tag is several lines
> long, as in a paragraph? How would I print the entire
> data within the field, and stop when the program
> reaches the next tag?
14. Re: my dilemma -- hundreds of record types
- Posted by Chris Saik <csaik2002 at yahoo.com> Nov 15, 2002
- 457 views
Hi C.K.,

I must have had a brain freeze... I was confused since each record may or may not have all of the tag names present. But now I think I'm on the right track, thanks to you and Kat.

Chris

--- "C. K. Lester" <cklester at yahoo.com> wrote:
> <snip>
> Just because a field is optional doesn't mean you
> have to have a whole different record type for it.
> Just leave it blank.
15. Re: my dilemma -- hundreds of record types
- Posted by irv at take.maxleft.com Nov 15, 2002
- 428 views
- Last edited Nov 16, 2002
On Friday 15 November 2002 07:07 pm, you wrote:

> On Friday 15 November 2002 05:10 pm, you wrote:
> > > Thank you Irv. I'm confused over one thing though...
> > > what if the data following the tag is several lines
> > > long, as in a paragraph? How would I print the entire
> > > data within the field, and stop when the program
> > > reaches the next tag?

Forgot the akshul code:

    include wildcard.e
    object text

    ---------------------------------
    function LoadFile(sequence name)
    ---------------------------------
        integer fn
        object line
        fn = open(name,"r")
        text = {}
        while 1 do
            line = gets(fn)
            if atom(line) then exit
            else text = append(text,line)
            end if
        end while
        return text
    end function

    -----------------------------------------
    function Extract(object tag, object text)
    -----------------------------------------
        integer i
        object found
        found = ""
        i = 1
        while i <= length(text) do           -- iterate thru the buffer
            if match(tag,text[i]) then       -- look for the requested tag
                found = text[i]              -- store that line
                i += 1                       -- go to next line
                -- tack on continuation lines until the next <tag> line
                while i <= length(text) and not wildcard_match("*<*>*",text[i]) do
                    found &= text[i]
                    i += 1
                end while
            else
                i += 1
            end if
        end while
        return found
    end function

    text = LoadFile("test.txt")
    puts(1,Extract("<ADDR>",text))
    -- end code

Here's the test file:

    <DATE> 11/15/2002
    <NAME> Irv Mullins <esq>
    <ADDR>1234 Fifth St
    Joizey City
    <STATE>NJ
    >!!
    <!!
    <PHONE>555-1212
16. Re: my dilemma -- hundreds of record types
- Posted by Chris Saik <csaik2002 at yahoo.com> Nov 18, 2002
- 429 views
Hello Tone,

Yes please, I would really like to get the HTML parser you wrote!

And thanks to everyone else who responded to my questions. It's been a great learning experience for me!

Chris

--- tone.skoda at gmx.net wrote:
> I have written a HTML parser which has three events:
> on_start_tag, on_end_tag and on_data. it isnt on my
> web page or in archives yet. if you want it i can
> upload it?
> <snip>