1. Anyone who uses (has used) strTok by Kat

I have the luxury of a mainframe for checking the correctness of files I
build or revise in Euphoria. I tried "strTok" to parse a file, find two
matching tokens entered on an input screen, and change one "field" in the
parsed input, "deparsing" each record and writing every record to a new
output file as I go. I then upload both files to the mainframe and compare
the two large files with a utility.

Parse/deparse works beautifully, except where multiple consecutive commas
(representing "missing" values) occur in the input. Parse/deparse does not
preserve the consecutive commas when a record is "deparsed", so they are
lost in the output.

Has anyone found an easy way to deal with this?

Now that my sequence-comparison education has been upgraded, I will
retry Derek's subroutine for the same file exercise.


2. Re: Anyone who uses (has used) strTok by Kat

John F Dutcher wrote:
> 
> Parse/deparse works so beautifully  ..  except where multiple commas
> (representing "missing" values) occur in the input. Here the 
> parse/deparse does not "hold" the consecutive commas when "deparsed"
> and they are lost upon output.
> 
Here is the parse() function in its barest form (the strtok version is fancier):


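global function parse(sequence s, integer c)
-- Split s on the single-character delimiter c.
-- Empty fields (adjacent delimiters) are skipped.
integer slen, spt, flag
sequence parsed

    parsed = {}
    slen = length(s)

    spt = 1     -- start index of the current field
    flag = 0    -- 1 while inside a non-empty field
    for i = 1 to slen do
        if s[i] = c then
            if flag = 1 then
                -- end of a field: save it
                parsed = append(parsed,s[spt..i-1])
                flag = 0
                spt = i+1
            else
                -- delimiter with no field before it: skip it
                spt += 1
            end if
        else
            flag = 1
        end if
    end for
    if flag = 1 then
        -- last field, not followed by a delimiter
        parsed = append(parsed,s[spt..slen])
    end if
    return parsed
end function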
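Here is a Python sketch that mirrors the parse() loop above. It is written in Python only so it can be run and checked, and the 0-based indexing is the one structural change. It shows that empty fields disappear, so joining the result with commas cannot bring them back:

```python
def parse(s: str, c: str) -> list[str]:
    """Python sketch of parse() above: split s on the single-character
    delimiter c, skipping empty fields."""
    parsed = []
    spt = 0           # start index of the current field (0-based here)
    in_field = False  # plays the role of 'flag' in the Euphoria code
    for i, ch in enumerate(s):
        if ch == c:
            if in_field:
                parsed.append(s[spt:i])   # end of a non-empty field
                in_field = False
            spt = i + 1                   # next field starts after c
        else:
            in_field = True
    if in_field:
        parsed.append(s[spt:])            # last field, no trailing c
    return parsed

fields = parse("smith,,rochester,,42", ",")
print(fields)            # ['smith', 'rochester', '42']
print(",".join(fields))  # smith,rochester,42  (the empty fields are gone)
```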

parse() does not preserve empty elements between delimiters. The following
explode() function does:

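global function explode(sequence s, integer c)
-- Split s on the single-character delimiter c, preserving empty fields:
-- "a,,b" gives {"a","","b"}.
integer slen, spt
sequence exploded

    exploded = {}
    slen = length(s)

    spt = 1     -- start index of the current field
    for i = 1 to slen do
        if s[i] = c then
            -- s[spt..i-1] is empty when two delimiters are adjacent
            exploded = append(exploded,s[spt..i-1])
            spt = i+1
        end if
    end for
    -- the last field (empty if s ends with a delimiter)
    exploded = append(exploded,s[spt..slen])

    return exploded
end function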
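Here is the same kind of Python sketch for explode(), following the Euphoria loop. For a single-character delimiter the result matches Python's built-in str.split(), which makes a handy cross-check:

```python
def explode(s: str, c: str) -> list[str]:
    """Python sketch of explode() above: split s on the single-character
    delimiter c, keeping the empty fields between adjacent delimiters."""
    exploded = []
    spt = 0                           # start index of the current field
    for i, ch in enumerate(s):
        if ch == c:
            exploded.append(s[spt:i])  # "" when delimiters are adjacent
            spt = i + 1
    exploded.append(s[spt:])          # last field, possibly ""
    return exploded

fields = explode("smith,,rochester,,42", ",")
print(fields)   # ['smith', '', 'rochester', '', '42']
# For a single-character delimiter this agrees with Python's built-in split:
assert fields == "smith,,rochester,,42".split(",")
```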

Both of these assume that you are parsing strings with single-character
delimiters.


3. Re: Anyone who uses (has used) strTok by Kat

If you are talking about handling CSV data, then my CSV lib handles that
without a problem.
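unkmar's library itself isn't shown in the thread. To illustrate what a CSV-aware reader adds beyond explode(), Python's standard csv module keeps empty fields and also treats a quoted field that contains a comma as a single value. The two helper names below are hypothetical:

```python
import csv
import io

def split_csv_record(line: str) -> list[str]:
    """Split one CSV record; empty fields and quoted commas are kept."""
    return next(csv.reader([line]))

def join_csv_record(fields: list[str]) -> str:
    """Write the fields back as one CSV record, quoting only where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(fields)
    return buf.getvalue()[:-1]   # drop the "\n" that writerow() appends

fields = split_csv_record('smith,,"Rochester, NY",,42')
print(fields)                   # ['smith', '', 'Rochester, NY', '', '42']
fields[4] = "43"                # correct one field
print(join_csv_record(fields))  # smith,,"Rochester, NY",,43
```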

    unkmar

----- Original Message ----- 
From: "John F Dutcher" <guest at RapidEuphoria.com>
To: <EUforum at topica.com>
Sent: Thursday, November 18, 2004 11:19 AM
Subject: Anyone who uses (has used) strTok by Kat


> posted by: John F Dutcher <John_Dutcher at urmc.rochester.edu>


4. Re: Anyone who uses (has used) strTok by Kat

It's certainly true that "explode" nicely preserves the multiple
delimiting commas in its returned value.

To no one's surprise, if I write the exploded record to the output file
after correcting a "found" sequence within it, I get a conspicuous
Euphoria-style nested sequence of atomic values.

Is there an equivalent, something like an "implode", to easily remove the
braces added by "explode" so that the corrected record can be
written to the output file looking like the others?
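The question is asking for the inverse of explode(): put the delimiter back between the fields and write the result as one flat string. Here is a minimal Python sketch, where implode is a hypothetical name taken from the question:

```python
def implode(fields: list[str], c: str) -> str:
    """Hypothetical inverse of explode(): join fields with delimiter c.
    Empty fields come back out as adjacent delimiters."""
    out = ""
    for i, field in enumerate(fields):
        if i > 0:
            out += c      # delimiter goes between fields, not after the last
        out += field
    return out

record = "smith,,rochester,,42"
fields = record.split(",")       # same result explode() gives
fields[2] = "Rochester"          # correct the "found" field
print(implode(fields, ","))      # smith,,Rochester,,42
assert implode(record.split(","), ",") == record   # lossless round trip
```

In Python the loop is equivalent to ",".join(fields). Because explode() keeps every empty field, exploding and then imploding with the same delimiter reproduces the original record exactly. That does not hold for parse().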


5. Re: Anyone who uses (has used) strTok by Kat

On 18 Nov 2004, at 8:19, John F Dutcher wrote:

> Parse/deparse works so beautifully  ..  except where multiple commas
> (representing "missing" values) occur in the input. Here the 
> parse/deparse does not "hold" the consecutive commas when "deparsed"
> and they are lost upon output.
> 
> Has anyone dealt with this easily ?? 

Actually, as the name of the lib implies (possibly too subtly), it was written
primarily for string processing, as in natural language processing. One easy
way around it is to put in something that signifies nothing, like:

string2 = parse(replace(string1,",,",", ,"),",")

so if
string1 = "the,tall,,kat"
string2 = {"the","tall"," ","kat"}

Choose a placeholder for the empty field that sorts below your lowest valid
token-start character (or above your highest) for best results in sorttok(),
depending on whether you want blanks sorted above or below non-empty fields.
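Here is a Python sketch of the placeholder idea. It makes explicit two details that the one-liner above glosses over, and both are assumptions of the sketch. First, a single non-overlapping replace turns ",,," into ", ,," and leaves one empty field unmarked, so the replacement is repeated until no ",," remains. Second, an empty first or last field also needs a placeholder. PLACEHOLDER and mark_empty_fields are hypothetical names:

```python
PLACEHOLDER = " "   # assumed never to appear as a real field value

def mark_empty_fields(record: str, c: str = ",") -> str:
    """Insert PLACEHOLDER into every empty field so that a parser which
    skips empty fields (like parse()) still sees them."""
    doubled = c + c
    while doubled in record:      # one pass can't fix runs of 3+ delimiters
        record = record.replace(doubled, c + PLACEHOLDER + c)
    if record.startswith(c):      # empty first field
        record = PLACEHOLDER + record
    if record.endswith(c):        # empty last field
        record = record + PLACEHOLDER
    return record

marked = mark_empty_fields("the,tall,,kat")
print(marked)                               # the,tall, ,kat
# emulate parse(): split and drop empty fields
print([f for f in marked.split(",") if f])  # ['the', 'tall', ' ', 'kat']
```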

I currently use this replace code:

include wildcard.e   -- for upper()

-- Case-insensitive replace of every occurrence of old_ch in st by new_ch.
-- A "*" in old_ch acts as a wildcard: "<*>" replaces everything from a "<"
-- through the next ">" after it.
function replace(sequence st, sequence old_ch, sequence new_ch)
    integer k, j, star
    sequence newst, old_ch1, old_ch2

    newst = ""
    star = match("*", old_ch)
    if star then
        old_ch1 = old_ch[1..star-1]               -- text before the "*"
        old_ch2 = old_ch[star+1..length(old_ch)]  -- text after the "*"

        k = match(upper(old_ch1), upper(st))
        while k do
            -- look for the closing text only after the opening text
            j = match(upper(old_ch2), upper(st[k+length(old_ch1)..length(st)]))
            if j = 0 then
                exit   -- no closing text: leave the rest unchanged
            end if
            newst = newst & st[1..k-1] & new_ch
            st = st[k+length(old_ch1)+j-1+length(old_ch2)..length(st)]
            k = match(upper(old_ch1), upper(st))
        end while
    else
        k = match(upper(old_ch), upper(st))
        while k do
            newst = newst & st[1..k-1] & new_ch
            st = st[k+length(old_ch)..length(st)]
            k = match(upper(old_ch), upper(st))
        end while
    end if
    return newst & st
end function
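For comparison outside Euphoria, roughly the same behavior can be sketched with Python's re module. Matching is case-insensitive, and "*" matches the shortest stretch of text between the two halves. This swaps in regular expressions, so it is not Kat's code, and replace_ci is a hypothetical name:

```python
import re

def replace_ci(st: str, old: str, new: str) -> str:
    """Case-insensitively replace every occurrence of old in st with new.
    A '*' in old matches the shortest stretch of text between the part
    before it and the part after it, e.g. '<*>' matches '<b>' or '</i>'."""
    if "*" in old:
        before, after = old.split("*", 1)
        pattern = re.escape(before) + ".*?" + re.escape(after)
    else:
        pattern = re.escape(old)
    # A function replacement keeps backslashes in new from being interpreted.
    return re.sub(pattern, lambda m: new, st, flags=re.IGNORECASE | re.DOTALL)

print(replace_ci("The TALL kat", "tall", "short"))   # The short kat
print(replace_ci("a <B>bold</b> word", "<*>", ""))   # a bold word
```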



I had considered adding db-type processing or regex code to strtok, but
was waiting for user demand. The work-arounds for not having db-specific
code in strtok are easy enough, I think, but I could be convinced to add it.
The lawyer said today that if I am convicted, he will appeal for free and make
out my will for free.

Kat


6. Re: Anyone who uses (has used) strTok by Kat

On 18 Nov 2004, at 13:11, John F Dutcher wrote:

> Is there an equivalent of something like "implode" to easily remove the
> braces provided by "explode" so that the corrected record can be 
> written to the output file looking like the others ??

If there is no problem with the solution I provided for parse() earlier, then do
something like this:

Pick a character that is not in the data, like "_" or the value 1.
(I have a routine for that, but I do natural language parsing.)
When parsing, replace each ,, with ,_, (or with {',',1,','}),
then after deparsing, delete that artificial blank-field token:

string2 = replace(deparse(string1,","),"_","")
or
string2 = replace(deparse(string1,","),{1},"")

Presto, all done.
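Here is the whole round trip, sketched in Python. It assumes that "_" never occurs in the real data, because the final step deletes every "_" in the record. It also assumes the record does not start or end with an empty field. The helper names mark and unmark are hypothetical. The marking step repeats until no ",," is left, so runs of three or more commas are handled:

```python
PLACEHOLDER = "_"   # assumed never to occur in the real data

def mark(record: str, c: str = ",") -> str:
    """Put PLACEHOLDER into each empty field (repeat for runs of c).
    Assumes the record neither starts nor ends with c."""
    while c + c in record:
        record = record.replace(c + c, c + PLACEHOLDER + c)
    return record

def unmark(record: str) -> str:
    """Delete the artificial placeholder tokens after deparsing."""
    return record.replace(PLACEHOLDER, "")

record = "smith,,rochester,,,42"
fields = [f for f in mark(record).split(",") if f]  # like parse(): skip empties
fields[fields.index("rochester")] = "Rochester"     # the one-field change
print(unmark(",".join(fields)))                     # smith,,Rochester,,,42
```

A safer variant would turn fields that are exactly the placeholder back into "" before joining, instead of deleting every "_" in the deparsed string.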

Kat

