1. Anyone who uses (has used) strTok by Kat
- Posted by John F Dutcher <John_Dutcher at urmc.rochester.edu> Nov 18, 2004
- 679 views
I have the luxury of a mainframe to analyze files I build or revise in Euphoria for correctness. I tried "strTok" to parse a file, find two matching tokens received from an input screen, and make a change to one "field" in the parsed input, all the while "deparsing" and writing all records to a new output file. I then upload both files to the mainframe and compare them with a utility.

Parse/deparse works beautifully, except where multiple consecutive commas (representing "missing" values) occur in the input. There the parse/deparse does not "hold" the consecutive commas, and they are lost on output.

Has anyone dealt with this easily?

Now that my sequence comparison education has been upgraded, I will re-try Derek's sub-routine for the same file exercise.
2. Re: Anyone who uses (has used) strTok by Kat
- Posted by Andy Serpa <ac at onehorseshy.com> Nov 18, 2004
- 684 views
John F Dutcher wrote:
> I have the luxury of a mainframe to analyze files I build or revise
> in Euphoria for correctness. I tried "strTok" to parse a file, find two
> matching tokens received from an input screen....and make a change to
> one "field" in the parsed input, all the time "deparsing" and writing
> all records to a new output file. This is followed by uploading both
> files to the mainframe to compare large files with a utility.
>
> Parse/deparse works so beautifully .. except where multiple commas
> (representing "missing" values) occur in the input. Here the
> parse/deparse does not "hold" the consecutive commas when "deparsed"
> and they are lost upon output.
>
> Has anyone dealt with this easily ??
>
> Now that my sequence comparison education has been upgraded...I will
> re-try Derek's sub-routine for the same file exercise.

Here is the parse() function in its barest form (the strtok version is fancier):

    global function parse(sequence s, integer c)
        integer slen, spt, flag
        sequence parsed

        parsed = {}
        slen = length(s)
        spt = 1
        flag = 0    -- 1 while inside a non-empty token
        for i = 1 to slen do
            if s[i] = c then
                if flag = 1 then
                    parsed = append(parsed, s[spt..i-1])
                    flag = 0
                    spt = i+1
                else
                    spt += 1    -- delimiter with no token before it: skip it
                end if
            else
                flag = 1
            end if
        end for
        if flag = 1 then
            parsed = append(parsed, s[spt..slen])
        end if
        return parsed
    end function

parse() does not preserve empty elements between delimiters. Following is the explode() function, which does:

    global function explode(sequence s, integer c)
        integer slen, spt
        sequence exploded

        -- parse by delimiter, preserve blanks
        exploded = {}
        slen = length(s)
        spt = 1
        for i = 1 to slen do
            if s[i] = c then
                exploded = append(exploded, s[spt..i-1])
                spt = i+1
            end if
        end for
        exploded = append(exploded, s[spt..slen])
        return exploded
    end function

Both of these assume that you are parsing strings with single-character delimiters.
3. Re: Anyone who uses (has used) strTok by Kat
- Posted by "Unkmar" <L3Euphoria at bellsouth.net> Nov 18, 2004
- 679 views
- Last edited Nov 19, 2004
If you are talking about handling CSV data, then my CSV lib handles that without a problem.

unkmar

----- Original Message -----
From: "John F Dutcher" <guest at RapidEuphoria.com>
To: <EUforum at topica.com>
Sent: Thursday, November 18, 2004 11:19 AM
Subject: Anyone who uses (has used) strTok by Kat

> posted by: John F Dutcher <John_Dutcher at urmc.rochester.edu>
>
> I have the luxury of a mainframe to analyze files I build or revise
> in Euphoria for correctness. I tried "strTok" to parse a file, find two
> matching tokens received from an input screen....and make a change to
> one "field" in the parsed input, all the time "deparsing" and writing
> all records to a new output file. This is followed by uploading both
> files to the mainframe to compare large files with a utility.
>
> Parse/deparse works so beautifully .. except where multiple commas
> (representing "missing" values) occur in the input. Here the
> parse/deparse does not "hold" the consecutive commas when "deparsed"
> and they are lost upon output.
>
> Has anyone dealt with this easily ??
>
> Now that my sequence comparison education has been upgraded...I will
> re-try Derek's sub-routine for the same file exercise.
4. Re: Anyone who uses (has used) strTok by Kat
- Posted by John F Dutcher <John_Dutcher at urmc.rochester.edu> Nov 18, 2004
- 687 views
- Last edited Nov 19, 2004
It's certainly true that "explode" nicely preserves the multiple delimiting commas in its returned value.

To no one's surprise, if I write the exploded record to the output file after correcting a "found" sequence within it, I get a conspicuously Euphoria-like nested sequence of atomic values.

Is there an equivalent of something like "implode" to easily remove the braces provided by "explode", so that the corrected record can be written to the output file looking like the others?
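A minimal sketch of such an inverse, assuming the fields are flat strings and the delimiter is a single character, as in explode(). The name implode() is hypothetical here; it is not taken from strtok or from any library mentioned in this thread.

```euphoria
-- Hypothetical helper, not part of strtok: join fields back into one
-- string, putting the delimiter c between consecutive fields.
global function implode(sequence fields, integer c)
    sequence joined

    joined = ""
    for i = 1 to length(fields) do
        if i > 1 then
            joined = joined & c      -- delimiter goes between fields only
        end if
        joined = joined & fields[i]  -- an empty field adds nothing, so ",," survives
    end for
    return joined
end function

-- Usage: {"the","tall","","kat"} is what explode("the,tall,,kat", ',') returns.
-- puts() writes the characters themselves, so no braces appear in the output.
puts(1, implode({"the", "tall", "", "kat"}, ',') & '\n')   -- prints: the,tall,,kat
```

The braces in the output file most likely come from writing the nested sequence with print() or ?, which show a sequence's structure; joining the fields first and writing the result with puts() produces a plain text record.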
5. Re: Anyone who uses (has used) strTok by Kat
- Posted by "Kat" <gertie at visionsix.com> Nov 18, 2004
- 689 views
- Last edited Nov 19, 2004
On 18 Nov 2004, at 8:19, John F Dutcher wrote:

> posted by: John F Dutcher <John_Dutcher at urmc.rochester.edu>
>
> I have the luxury of a mainframe to analyze files I build or revise
> in Euphoria for correctness. I tried "strTok" to parse a file, find two
> matching tokens received from an input screen....and make a change to
> one "field" in the parsed input, all the time "deparsing" and writing
> all records to a new output file. This is followed by uploading both
> files to the mainframe to compare large files with a utility.
>
> Parse/deparse works so beautifully .. except where multiple commas
> (representing "missing" values) occur in the input. Here the
> parse/deparse does not "hold" the consecutive commas when "deparsed"
> and they are lost upon output.
>
> Has anyone dealt with this easily ??

Actually, as the name of the lib implies, possibly too subtly, it was written primarily for string processing, as in natural language processing. One easy way around it is to put in something that signifies nothing, like:

    string2 = parse(replace(string1, ",,", ", ,"), ",")

so if

    string1 = "the,tall,,kat"

then

    string2 = {"the","tall"," ","kat"}

Choose a replacement for "" that sorts below your lowest valid token start character (or above your highest), for best results in sorttok(), depending on whether you want blanks sorted above or below non-empty fields. I currently use this replace code:
    include wildcard.e    -- upper() lives in wildcard.e in Euphoria 2.x

    -- Replace every occurrence of old_ch in st with new_ch, ignoring case.
    -- A "*" in old_ch is treated as a wildcard between the text before it
    -- and the text after it.
    function replace(sequence st, sequence old_ch, sequence new_ch)
        integer k
        sequence newst, old_ch1, old_ch2

        if match("*", old_ch) then
            old_ch1 = old_ch[1..match("*", old_ch)-1]
            old_ch2 = old_ch[match("*", old_ch)+1..length(old_ch)]
            k = match(upper(old_ch1), upper(st))
            newst = ""
            while k and match(upper(old_ch2), upper(st)) do
                newst = newst & st[1..k-1] & new_ch
                st = st[k..length(st)]
                st = st[match(upper(old_ch2), upper(st))+length(old_ch2)..length(st)]
                k = match(upper(old_ch1), upper(st))
            end while
            newst = newst & st
            return newst
        else
            k = match(upper(old_ch), upper(st))
            newst = ""
            while k do
                newst = newst & st[1..k-1] & new_ch
                st = st[k+length(old_ch)..length(st)]
                k = match(upper(old_ch), upper(st))
            end while
            newst = newst & st
            return newst
        end if
    end function
I had considered adding db-type processing or regex code to strtok, but was waiting for user demand. The work-arounds for not having db-specific code in strtok are easy enough, I think, but I could be convinced to add to it.

The lawyer said today that if I am convicted, he will appeal for free and make out my will for free.

Kat
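One caveat with the placeholder approach: if each ",," is rewritten to ",_," with a replace() like the one above, a run of three or more commas seems to need repeated passes, because the scan resumes after each replaced match ("a,,,b" comes out as "a,_,,b" after one pass). The sketch below fills every gap in a single pass instead. It assumes a flat string and a single-character delimiter and filler; the name fill_blanks() is hypothetical and is not part of strtok.

```euphoria
-- Hypothetical helper, not part of strtok: insert a filler character
-- between every pair of adjacent delimiters.
function fill_blanks(sequence s, integer c, integer filler)
    integer prev_was_delim
    sequence filled

    filled = ""
    prev_was_delim = 0
    for i = 1 to length(s) do
        if s[i] = c then
            if prev_was_delim then
                filled = filled & filler   -- empty field found: mark it
            end if
            prev_was_delim = 1
        else
            prev_was_delim = 0
        end if
        filled = filled & s[i]
    end for
    return filled
end function

puts(1, fill_blanks("the,tall,,,kat", ',', '_') & '\n')   -- prints: the,tall,_,_,kat
```

This sketch only marks empty fields between two delimiters. An empty field at the very start or end of a record (a leading or trailing comma) is left unmarked and would need separate handling.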
6. Re: Anyone who uses (has used) strTok by Kat
- Posted by "Kat" <gertie at visionsix.com> Nov 19, 2004
- 676 views
On 18 Nov 2004, at 13:11, John F Dutcher wrote:

> posted by: John F Dutcher <John_Dutcher at urmc.rochester.edu>
>
> It's certainly true that "explode" nicely allows the preservation of the
> multiple delimting commas in its returned value.
>
> To no ones surprise if I write the exploded record to the output file
> after correcting a "found" sequence within it....I get a conspicuous
> Euphoria like nested sequence of atomic values.
>
> Is there an equivalent of something like "implode" to easily remove the
> braces provided by "explode" so that the corrected record can be
> written to the output file looking like the others ??

If there is no problem with the solution I provided for parse() earlier, then do something like this: pick a char not in the data, like "_", or 1. (I have a routine for that, but I do natural language parsing.) When parsing, replace the ,, with ,_, or ,1, and then, after deparse, delete that artificial blank-field token:

    string2 = replace(deparse(string1, ","), "_", "")

or

    string2 = replace(deparse(string1, ","), {1}, "")

Presto, all done.

Kat