1. Anyone who uses (has used) strTok by Kat
I have the luxury of a mainframe to analyze files I build or revise
in Euphoria for correctness. I tried "strTok" to parse a file, find two
matching tokens received from an input screen....and make a change to
one "field" in the parsed input, all the time "deparsing" and writing
all records to a new output file. This is followed by uploading both
files to the mainframe to compare large files with a utility.
Parse/deparse works so beautifully .. except where multiple commas
(representing "missing" values) occur in the input. Here the
parse/deparse does not "hold" the consecutive commas when "deparsed"
and they are lost upon output.
Has anyone dealt with this easily ??
Now that my sequence comparison education has been upgraded...I will
re-try Derek's sub-routine for the same file exercise.
2. Re: Anyone who uses (has used) strTok by Kat
John F Dutcher wrote:
>
>
> I have the luxury of a mainframe to analyze files I build or revise
> in Euphoria for correctness. I tried "strTok" to parse a file, find two
> matching tokens received from an input screen....and make a change to
> one "field" in the parsed input, all the time "deparsing" and writing
> all records to a new output file. This is followed by uploading both
> files to the mainframe to compare large files with a utility.
>
> Parse/deparse works so beautifully .. except where multiple commas
> (representing "missing" values) occur in the input. Here the
> parse/deparse does not "hold" the consecutive commas when "deparsed"
> and they are lost upon output.
>
> Has anyone dealt with this easily ??
>
> Now that my sequence comparison education has been upgraded...I will
> re-try Derek's sub-routine for the same file exercise.
>
Here is the parse() function in its barest form (strtok version is fancier):
global function parse(sequence s, integer c)
    integer slen, spt, flag
    sequence parsed
    parsed = {}
    slen = length(s)
    spt = 1     -- start of the current token
    flag = 0    -- 1 while inside a token
    for i = 1 to slen do
        if s[i] = c then
            if flag = 1 then
                -- delimiter closes a token: keep it
                parsed = append(parsed,s[spt..i-1])
                flag = 0
                spt = i+1
            else
                -- delimiter with no token before it: the empty field is dropped
                spt += 1
            end if
        else
            flag = 1
        end if
    end for
    if flag = 1 then
        parsed = append(parsed,s[spt..slen])
    end if
    return parsed
end function
parse() does not preserve empty elements between delimiters. Following is the
explode() function, which does:
global function explode(sequence s, integer c)
    integer slen, spt
    sequence exploded
    -- parse by delimiter, preserve blanks
    exploded = {}
    slen = length(s)
    spt = 1
    for i = 1 to slen do
        if s[i] = c then
            exploded = append(exploded,s[spt..i-1])
            spt = i+1
        end if
    end for
    exploded = append(exploded,s[spt..slen])
    return exploded
end function
Both of these assume that you are parsing strings with single-character
delimiters...
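For delimiters longer than one character, the same idea can be sketched with match(). This is only an illustration: explode_str() is a hypothetical name, not part of strtok, and it assumes the delimiter is a non-empty sequence (match() rejects an empty one).

```euphoria
-- hypothetical variant of explode() for multi-character delimiters
-- assumes delim is a non-empty sequence
global function explode_str(sequence s, sequence delim)
    sequence exploded
    integer k
    exploded = {}
    k = match(delim, s)
    while k do
        -- keep everything before the delimiter, even if it is empty
        exploded = append(exploded, s[1..k-1])
        -- continue with the text after the delimiter
        s = s[k+length(delim)..length(s)]
        k = match(delim, s)
    end while
    -- whatever is left is the last field
    exploded = append(exploded, s)
    return exploded
end function

sequence fields
fields = explode_str("a::b::::c", "::")
-- fields is {"a","b","","c"}: the empty field between the doubled "::" is kept
? length(fields)    -- prints 4
```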
3. Re: Anyone who uses (has used) strTok by Kat
- Posted by "Unkmar" <L3Euphoria at bellsouth.net> Nov 18, 2004 (last edited Nov 19, 2004)
If you are talking about handling CSV data then my CSV lib handles that
without a problem.
unkmar
----- Original Message -----
From: "John F Dutcher" <guest at RapidEuphoria.com>
To: <EUforum at topica.com>
Sent: Thursday, November 18, 2004 11:19 AM
Subject: Anyone who uses (has used) strTok by Kat
4. Re: Anyone who uses (has used) strTok by Kat
It's certainly true that "explode" nicely allows the preservation of the
multiple delimiting commas in its returned value.
To no one's surprise, if I write the exploded record to the output file
after correcting a "found" sequence within it....I get a conspicuous
Euphoria-like nested sequence of atomic values.
Is there an equivalent of something like "implode" to easily remove the
braces provided by "explode" so that the corrected record can be
written to the output file looking like the others ??
5. Re: Anyone who uses (has used) strTok by Kat
- Posted by "Kat" <gertie at visionsix.com> Nov 18, 2004 (last edited Nov 19, 2004)
On 18 Nov 2004, at 8:19, John F Dutcher wrote:
>
>
> posted by: John F Dutcher <John_Dutcher at urmc.rochester.edu>
>
>
> I have the luxury of a mainframe to analyze files I build or revise
> in Euphoria for correctness. I tried "strTok" to parse a file, find two
> matching tokens received from an input screen....and make a change to
> one "field" in the parsed input, all the time "deparsing" and writing
> all records to a new output file. This is followed by uploading both
> files to the mainframe to compare large files with a utility.
>
> Parse/deparse works so beautifully .. except where multiple commas
> (representing "missing" values) occur in the input. Here the
> parse/deparse does not "hold" the consecutive commas when "deparsed"
> and they are lost upon output.
>
> Has anyone dealt with this easily ??
Actually, like the name of the lib implies, possibly too subtly, it was written
for string processing primarily. As in natural language processing. One easy
way around it is to put in something that signifies nothing, like:
string2 = parse(replace(string1,",,",", ,"),",")
so if
string1 = "the,tall,,kat"
then
string2 = {"the","tall"," ","kat"}
Choose a replacement for "" that sorts below your lowest valid token start
character (or above your highest), for best results in sorttok(), depending on
whether you want blanks sorted above or below non-empty fields.
i use this replace code currently :
include wildcard.e  -- upper() lives here in Euphoria 2.x/3.x

function replace(sequence st,sequence old_ch,sequence new_ch)
    -- case-insensitive replace of old_ch with new_ch in st;
    -- a "*" in old_ch matches whatever lies between the text before and after it
    integer k
    sequence newst, old_ch1, old_ch2
    if match("*",old_ch) then
        old_ch1 = old_ch[1..match("*",old_ch)-1]
        old_ch2 = old_ch[match("*",old_ch)+1..length(old_ch)]
        k = match(upper(old_ch1),upper(st))
        newst = ""
        while k and match(upper(old_ch2),upper(st)) do
            newst = newst & st[1..k-1] & new_ch
            st = st[k..length(st)]
            -- skip past the end of the next old_ch2 match
            st = st[match(upper(old_ch2),upper(st))+length(old_ch2)..length(st)]
            k = match(upper(old_ch1),upper(st))
        end while
        newst = newst & st
        return newst
    else
        k = match(upper(old_ch),upper(st))
        newst = ""
        while k do
            newst = newst & st[1..k-1] & new_ch
            st = st[k+length(old_ch)..length(st)]
            k = match(upper(old_ch),upper(st))
        end while
        newst = newst & st
        return newst
    end if
end function
I had considered adding db-type processing to strtok, or regex code, but
was waiting for user demand. The work-arounds for not having db-specific
code in strtok are easy enough, i think, but i could be convinced to add it.
The lawyer said today if i am convicted, he will appeal for free and make out
my will for free.
Kat
6. Re: Anyone who uses (has used) strTok by Kat
On 18 Nov 2004, at 13:11, John F Dutcher wrote:
>
>
> posted by: John F Dutcher <John_Dutcher at urmc.rochester.edu>
>
> It's certainly true that "explode" nicely allows the preservation of the
> multiple delimiting commas in its returned value.
>
> To no one's surprise, if I write the exploded record to the output file
> after correcting a "found" sequence within it....I get a conspicuous
> Euphoria-like nested sequence of atomic values.
>
> Is there an equivalent of something like "implode" to easily remove the
> braces provided by "explode" so that the corrected record can be
> written to the output file looking like the others ??
If there is no problem with the solution i provided for parse() earlier, then do
something like this:
Pick a char that is not in the data, like "_", or 1.
(i have a routine for that, but i do natural language parsing)
When parsing, replace the ,, with ,_, or ,1,
Then after deparse, delete that artificial blank field token:
string2 = replace(deparse(string1,","),"_","")
or
string2 = replace(deparse(string1,","),{1},"")
presto, all done.
Kat
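For the explode() route asked about earlier in the thread, the reverse step could be sketched roughly as below. implode() here is a hypothetical helper written for illustration, not a routine from strtok, and like explode() it assumes a single-character delimiter and that every field is a flat string.

```euphoria
-- hypothetical helper, not part of strtok:
-- join fields back into one string with a single-character delimiter
global function implode(sequence fields, integer c)
    sequence joined
    joined = {}
    for i = 1 to length(fields) do
        if i > 1 then
            -- a delimiter goes between fields, including before empty ones
            joined = joined & c
        end if
        joined = joined & fields[i]
    end for
    return joined
end function

sequence rec
rec = implode({"the", "tall", "", "kat"}, ',')
puts(1, rec & '\n')    -- the,tall,,kat
```

Because an empty field contributes nothing between its two delimiters, the consecutive commas from the input come back out unchanged, so no placeholder character is needed on this route.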