1. help with storing user input

Hello,
What would be an efficient way of separating words in a user-entered
sentence, for example to break apart the words in a sentence? I'm
specifically trying to develop a GOOD algorithm to separate words from a
user-entered sentence and store them as individual sequences. Like:
user input: "mary had a little lamb"
results: sequence 1st_sentence = {"mary","had","a","little","lamb"}
I'm having difficulties skipping whitespace and converting to string.

How would Euphoria do this?

while not end of line
    get one letter at a time until you see a whitespace
    store all letters previous to the encountered whitespace in 1st_sentence
    get one letter at a time until you encounter a whitespace
    append 1st_sentence with all letters previous to the whitespace as a
    new sequence, but not the letters previous to the first whitespace
end while


Can someone please help?
Thanks!!





2. Re: help with storing user input

----- Original Message -----
From: "Jason Dube" <dubetyrant at hotmail.com>
To: "EUforum" <EUforum at topica.com>
Subject: help with storing user input


>
> Hello,
> What would be an efficient way of separating words in a user inputted
> sentence? For example to break apart the words in a sentence. I'm
> specifically trying to develop a GOOD algorithm to separate words from a
> user inputted sentence and store them as individual sequences. Like:
> user input:"mary had a little lamb"
> results:sequence 1st_sentence={"mary","had","a","little","lamb"}
> I'm having difficulties skipping whitespace and converting to string
>
> how would euphoria do this:?
Here are a couple of routines that I use...


-------------------------------------
global function Tokenize(sequence pText, object pWhiteSpace,
                         object pNonword, object pQuotes) --> sequence
-------------------------------------
-- pText is returned as a sequence of 'words'.
-- Each word is delimited by a set of one or more Delimiters


    sequence lTokens
    integer lStartQuote, lEndQuote
    integer lTextLength
    integer lStart
    integer lPos

    -- Validate whitespace parameter
    if atom(pWhiteSpace) then
        if pWhiteSpace = 0 then
            pWhiteSpace = ' ' & 8 & 9 & 10 & 11 & 12 & 13
        else
            pWhiteSpace = {pWhiteSpace}
        end if
    end if

    -- Validate non-word parameter
    if atom(pNonword) then
        if pNonword = 0 then
            pNonword = "`~!@#$%^&*()_-+={[}]|\\:;\"'<,>.?/"
        else
            pNonword = {pNonword}
        end if
    end if

    -- Validate quote marks parameter
    if     sequence(pQuotes) then
        if length(pQuotes) = 0 then
            pQuotes = {{},{},{},{},{}}
        elsif (length(pQuotes) != 5
                or
               atom(pQuotes[1])
                or
               atom(pQuotes[2])
                or
               atom(pQuotes[3])
                or
               length(pQuotes[1]) != length(pQuotes[2])
                or
               atom(pQuotes[4])
                or
               atom(pQuotes[5])
                or
               length(pQuotes[4]) != length(pQuotes[5])
            )
        then
            pQuotes = 0
        end if
    end if

    if atom(pQuotes) then
        if pQuotes = 0 then
            pQuotes = {"\"'`", "\"'`", "\\~","",""}
        else
            pQuotes = {{pQuotes}, {pQuotes},{},{},{}}
        end if
    end if

    -- Initialize
    lTokens = {}
    lStart = 0
    lStartQuote = 0
    lEndQuote = 0
    for i = 1 to length(pText) do
        if lStartQuote != 0 then
            if pText[i] = lEndQuote then
                if find(pText[i - 1], pQuotes[3]) then
                    if i > 2 and find(pText[i - 2], pQuotes[3]) then
                        lTokens = append(lTokens, pText[lStart .. i - 1])
                        lStart = 0
                        lStartQuote = 0
                        lEndQuote = 0
                    end if
                 else
                    lTokens = append(lTokens, pText[lStart .. i - 1])
                    lStart = 0
                    lStartQuote = 0
                    lEndQuote = 0
                 end if
            end if
        else
            lPos =  find(pText[i], pQuotes[1])
            if lPos != 0 then
                lStartQuote = lPos
                lStart = i + 1
                lEndQuote = pQuotes[2][lPos]
            elsif find( pText[i], pWhiteSpace ) then
                if lStart != 0
                then
                    lTokens = append(lTokens, pText[lStart .. i - 1])
                    lStart = 0
                end if
            else
                if lStart = 0
                then
                    lStart = i
                end if

                if find(pText[i], pNonword) > 0
                then
                    if lStart != 0
                    then
                        -- Avoid empty tokens
                        if lStart != i then
                            lTokens = append(lTokens, pText[lStart .. i - 1])
                        end if
                        lStart = 0
                    end if

                    lTokens = append(lTokens, {pText[i]})
                    lStart = 0
                end if
            end if
        end if
    end for

    if lStart != 0
    then
        lTokens = append(lTokens, pText[lStart .. length(pText)])
        lStart = 0
    end if

    return lTokens
end function

-------------------------------------
global function SimpleTokenize(sequence s, object c)
-------------------------------------
-- Returns 's', as a number of words delimited by one or more 'c' objects
    integer slen, spt, i
    sequence parsed

    parsed = {}
    slen = length(s)
    spt = 1

    i = 1
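    while i <= slen do
        -- Skip any run of delimiters before the next word
        while i <= slen and equal(s[i], c) do
            i += 1
        end while
        spt = i
        -- Advance to the end of the word
        while i <= slen and not equal(s[i],c) do
            i += 1
        end while
        -- Trailing delimiters leave an empty slice; do not store it
        if i > spt then
            parsed = append(parsed,s[spt..i-1])
        end if
        i += 1
    end while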

    return parsed
end function
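
For the sentence in the original question, the same kind of scan can be tried end to end from the keyboard. The following is only a minimal sketch, not one of the routines above: split_words is a hypothetical helper that treats space, tab and line-end characters as delimiters, and the input is read with the standard gets(0).

```euphoria
-- Hypothetical helper: split a line into words on spaces, tabs and line ends.
function split_words(sequence line)
    sequence words
    integer start
    words = {}
    start = 0                      -- 0 means "not inside a word"
    for i = 1 to length(line) do
        if find(line[i], " \t\r\n") then
            if start != 0 then     -- a word just ended
                words = append(words, line[start..i-1])
                start = 0
            end if
        elsif start = 0 then       -- a word just began
            start = i
        end if
    end for
    if start != 0 then             -- last word runs to the end of the line
        words = append(words, line[start..length(line)])
    end if
    return words
end function

object line
sequence words

puts(1, "Enter a sentence: ")
line = gets(0)                     -- gets() returns -1 at end of file
puts(1, '\n')
if sequence(line) then
    words = split_words(line)
    for i = 1 to length(words) do
        printf(1, "%d: %s\n", {i, words[i]})
    end for
end if
```

Typing "mary had a little lamb" should list the five words one per line. Because a word is only stored when one has actually been started, repeated or trailing spaces do not produce empty entries.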


----------------
cheers,
Derek Parnell


3. Re: help with storing user input

Wow! Thank you! Definitely going to look at this closely. I'd like to try to
use it in my program. If I have some questions about it, is it OK if I ask?



4. Re: help with storing user input

On 15 May 2003, at 0:55, Jason Dube wrote:

> 
> Hello,
> What would be an efficient way of separating words in a user inputted
> sentence? For example to break apart the words in a sentence. I'm
> specifically trying to develop a GOOD algorithm to separate words from a
> user inputted sentence and store them as individual sequences. Like:
> user input:"mary had a little lamb"
> results:sequence 1st_sentence={"mary","had","a","little","lamb"}
> I'm having difficulties skipping whitespace and converting to string
> 
> how would euphoria do this:?

parsedline = parse(input," ")


5. Re: help with storing user input

On 15 May 2003, at 2:37, gertie at visionsix.com wrote:

> 
> On 15 May 2003, at 0:55, Jason Dube wrote:
> 
> > 
> > Hello,
> > What would be an efficient way of separating words in a user inputted
> > sentence? For example to break apart the words in a sentence. I'm
> > specifically trying to develop a GOOD algorithm to separate words from
> > a user inputted sentence and store them as individual sequences. Like:
> > user input:"mary had a little lamb"
> > results:sequence 1st_sentence={"mary","had","a","little","lamb"}
> > I'm having difficulties skipping whitespace and converting to string
> > 
> > how would euphoria do this:?
> 
> parsedline = parse(input," ")

You can also do:
parsedline = parse(input," ,.;:'")

or other punctuation. The problem with some of these shows up in numbers,
like "1,234.5", which contains both the comma and the period, or in quoted
speech: ""blah", he said sadly" becomes {"blah","he","said","sadly"}, which
carries much less info.
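
As far as I know, parse() is not one of Euphoria's built-in routines, so it presumably comes from a separate string library. To make the punctuation problem concrete, here is a hedged stand-in: split_any is a hypothetical name, and it may not match the real parse() in every detail (for example in how runs of delimiters are treated).

```euphoria
-- Hypothetical stand-in for parse(): split text on any character in delims.
function split_any(sequence text, sequence delims)
    sequence parts, word
    parts = {}
    word = ""
    for i = 1 to length(text) do
        if find(text[i], delims) then
            if length(word) then           -- close the word in progress
                parts = append(parts, word)
                word = ""
            end if
        else
            word = append(word, text[i])   -- grow the current word
        end if
    end for
    if length(word) then                   -- last word has no delimiter after it
        parts = append(parts, word)
    end if
    return parts
end function

? length(split_any("mary had a little lamb", " "))        -- prints 5
? length(split_any("it cost 1,234.5 dollars", " "))       -- prints 4
? length(split_any("it cost 1,234.5 dollars", " ,.;:'"))  -- prints 6
```

With only the space as a delimiter the number survives as the single word "1,234.5"; once the comma and period are added it falls apart into "1", "234" and "5".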

Kat


6. Re: help with storing user input

On Thu, 15 May 2003 16:13:05 +1000, Derek Parnell
<ddparnell at bigpond.com> wrote:

<snip>

Interesting, two quick points:
>                if find(pText[i], pNonword) > 0
>                then
>                    if lStart != 0
>                    then
>                        -- Avoid empty tokens
>                        if lStart != i then
>                            lTokens = append(lTokens, pText[lStart .. i - 1])
>                        end if
>                        lStart = 0
>                    end if
>
^^^^^ it looks to me an "else" has gone walkabouts here.
>                    lTokens = append(lTokens, {pText[i]})
>                    lStart = 0
>                end if
>            end if
>        end if
>    end for

2) I can't see that they are used; what were pQuotes[4] and [5] supposed to
be for? Just curious.

Pete


7. Re: help with storing user input

----- Original Message -----
From: "Pete Lomax" <petelomax at blueyonder.co.uk>
To: "EUforum" <EUforum at topica.com>
Subject: Re: help with storing user input


>
> On Thu, 15 May 2003 16:13:05 +1000, Derek Parnell
> <ddparnell at bigpond.com> wrote:
>
> <snip>
>
> Interesting, two quick points:
> >                if find(pText[i], pNonword) > 0
> >                then
> >                    if lStart != 0
> >                    then
> >                        -- Avoid empty tokens
> >                        if lStart != i then
> >                            lTokens = append(lTokens, pText[lStart .. i - 1])
> >                        end if
> >                        lStart = 0
> >                    end if
> >
> ^^^^^ it looks to me an "else" has gone walkabouts here.
> >                    lTokens = append(lTokens, {pText[i]})
> >                    lStart = 0
> >                end if
> >            end if
> >        end if
> >    end for

No, the code is correct. No 'else' is missing.

> 2) I can't see they are used, what were pQuotes[4]&[5] supposed to be
> for? Just curious.

I never got around to this, but they were going to be used for nested
tokens; brackets, for example. Must complete that I guess. pQuotes[4] is a
list of leading, or opening, symbols and pQuotes[5] is the list of matching
closing symbols.

----------------
cheers,
Derek Parnell


8. Re: help with storing user input

On Fri, 16 May 2003 00:53:30 +1000, Derek Parnell
<ddparnell at bigpond.com> wrote:
>> ^^^^^ it looks to me an "else" has gone walkabouts here.
>No, the code is correct. No 'else' is missing.
Good. The blank made me suspect, I see a -1 now. You happy, me happy.
>
>> 2) I can't see they are used, what were pQuotes[4]&[5] supposed to be
>> for? Just curious.
>
>I never got around to this, but there were going to be used for nested
>tokens;
Eeek(!) I have some pukka code, just for matching [{( & ]}) tho, if
you are interested. (All it does is stack the openings and recurse on
finding a matching close (?9/0 on mismatch); nothing special but you
have mentioned you is busy, so when/if I can help...)
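
The "stack the openings" idea can be sketched roughly as follows. This is not Pete's code, which was not posted; brackets_balanced is a hypothetical name, it uses an explicit stack rather than recursion, and it simply returns 0 on a mismatch instead of crashing with ?9/0.

```euphoria
constant OPENERS = "[{(",
         CLOSERS = "]})"

-- Hypothetical checker: 1 if every opener is closed in the right order.
function brackets_balanced(sequence text)
    sequence stack      -- closers still expected, innermost last
    integer k
    stack = {}
    for i = 1 to length(text) do
        k = find(text[i], OPENERS)
        if k then
            stack = append(stack, CLOSERS[k])
        elsif find(text[i], CLOSERS) then
            if length(stack) = 0 or stack[length(stack)] != text[i] then
                return 0            -- stray or mismatched closer
            end if
            stack = stack[1..length(stack)-1]
        end if
    end for
    return length(stack) = 0        -- leftovers mean an unclosed opener
end function

? brackets_balanced("a[b{c(d)}]")   -- prints 1
? brackets_balanced("a[b{c)}]")     -- prints 0
```

Pushing the expected closer rather than the opener keeps the comparison on a closing character down to a single test against the top of the stack.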

Pete


9. Re: help with storing user input

On Fri, 16 May 2003 00:29:49 +0000, Jason Dube <dubetyrant at hotmail.com> 
wrote:


Hi Jason,
may I be of assistance (as it is my humble code ...)

>
>
> -------------------------------------
> global function Tokenize(sequence pText, object pWhiteSpace, object
> pNonword,
> object pQuotes) --> sequence
> -------------------------------------
> -- pText is returned as a sequence of 'words'.
> -- Each word is delimited by a set of one or more Delimiters
>
>
> sequence lTokens
> integer lStartQuote, lEndQuote
> integer lTextLength
> integer lStart
> integer lPos
>
> -- Validate whitespace parameter
> if atom(pWhiteSpace) then
> if pWhiteSpace = 0 then
> pWhiteSpace = ' ' & 8 & 9 & 10 & 11 & 12 & 13
> else
> pWhiteSpace = {pWhiteSpace}
> end if
> end if

Okay, let's start with this then.

The parameter definition of 'pWhiteSpace' is 'object', implying that the 
caller can use either an atom or a sequence. I allow both for a good 
reason. But before we look at that, realize that 'pWhiteSpace' is meant to 
represent a set of characters that can ALL be considered as "white space 
characters". Now back to the story...

  If pWhiteSpace was passed as an atom, then:
    If that value is zero, the caller wishes to use the 'default' set of
white space characters. That is the set of characters represented by
"' ' & 8 & 9 & 10 & 11 & 12 & 13", namely SPACE, BACKSPACE, TAB, LINEFEED,
VERTICAL TAB, FORMFEED and CARRIAGE-RETURN.
    If the atom value passed is NOT zero, then I just convert it to a
sequence by enclosing it in braces.

You see, what I want in the program is a sequence, but I allow people to 
call the routine a number of ways...

   -- Just use the SPACE character as delimiter.
   Tokenize("derek parnell Level11", ' ', ...

   -- Use the SPACE and TAB characters as delimiters.
   Tokenize("derek parnell Level11", " \t", ...

   -- Use the default characters as delimiters.
   Tokenize("derek parnell Level11", 0, ...


My validation of the parameter is not perfect because it allows people to 
pass floating point atoms and nested sequences - which I really do not 
want.
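
A stricter check along those lines could look like the sketch below. It is only an illustration, not part of the posted routine: char_set is a hypothetical helper, and falling back to the defaults on bad input is an assumption borrowed from how pQuotes is handled.

```euphoria
-- Hypothetical helper: turn a delimiter parameter into a flat sequence of
-- integer character codes, or fall back to the defaults if it is malformed.
function char_set(object p, sequence defaults)
    if atom(p) then
        if p = 0 or not integer(p) then
            return defaults         -- 0 requests defaults; 3.5 is rejected
        end if
        return {p}
    end if
    for i = 1 to length(p) do
        if not integer(p[i]) then
            return defaults         -- nested sequence or fractional value
        end if
    end for
    return p
end function

constant DEFAULT_WS = ' ' & 8 & 9 & 10 & 11 & 12 & 13

? char_set(' ', DEFAULT_WS)                          -- prints {32}
? char_set(" \t", DEFAULT_WS)                        -- prints {32,9}
? equal(char_set({"\t"}, DEFAULT_WS), DEFAULT_WS)    -- prints 1
```

The last call shows why the flat form matters: {"\t"} is a sequence containing a string, so find() on a single character would never match it.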

> -- Validate non-word parameter
> if atom(pNonword) then
> if pNonword = 0 then
> pNonword = "`~!@#$%^&*()_-+={[}]|\\:;\"'<,>.?/"
> else
> pNonword = {pNonword}
> end if
> end if

Ditto. The pNonword parameter is a set of characters that are definitely 
not found inside words. I allow people to supply their own non-word 
characters or to use the default ones.


-- 

cheers,
Derek Parnell


10. Re: help with storing user input

-Derek, I was wondering if you could give me an overview of what this
particular code is doing.

    -- Validate quote marks parameter
    if sequence(pQuotes) then
        if length(pQuotes) = 0 then
            pQuotes = {{},{},{},{},{}}
        elsif (length(pQuotes) != 5
                or
               atom(pQuotes[1])
                or
               atom(pQuotes[2])
                or
               atom(pQuotes[3])
                or
               length(pQuotes[1]) != length(pQuotes[2])
                or
               atom(pQuotes[4])
                or
               atom(pQuotes[5])
                or
               length(pQuotes[4]) != length(pQuotes[5])
            )
        then
            pQuotes = 0
        end if
    end if

    if atom(pQuotes) then
        if pQuotes = 0 then
            pQuotes = {"\"'`", "\"'`", "\\~","",""}
        else
            pQuotes = {{pQuotes}, {pQuotes},{},{},{}}
        end if
    end if

11. Re: help with storing user input

----- Original Message -----
From: "Jason Dube" <dubetyrant at hotmail.com>
To: "EUforum" <EUforum at topica.com>
Subject: Re: help with storing user input


>
> -Derek, I was wondering if you could give me an overview of what this
> particular code is doing.
>
>     -- Validate quote marks parameter
>     if     sequence(pQuotes) then
>  if length(pQuotes) = 0 then
>      pQuotes = {{},{},{},{},{}}
>  elsif (length(pQuotes) != 5
>   or
>         atom(pQuotes[1])
>   or
>         atom(pQuotes[2])
>   or
>         atom(pQuotes[3])
>   or
>         length(pQuotes[1]) != length(pQuotes[2])
>   or
>         atom(pQuotes[4])
>   or
>         atom(pQuotes[5])
>   or
>         length(pQuotes[4]) != length(pQuotes[5])
>      )
>  then
>      pQuotes = 0
>  end if
>     end if
>
>     if atom(pQuotes) then
>  if pQuotes = 0 then
>      pQuotes = {"\"'`", "\"'`", "\\~","",""}
>  else
>      pQuotes = {{pQuotes}, {pQuotes},{},{},{}}
>  end if
>     end if

This is just validating and initializing the pQuotes group of data.
 pQuotes[1] and [2] are a pair of character sets. [1] holds the start quotes
and [2] holds the matching end quotes. All the characters inside the quotes
are considered to be a word.

 pQuotes[3] is a list of characters that are 'escape' characters to be used
inside quoted strings. For example...

   'abc~'def'

with pQuotes = { "'", "'", "~", ...}

is meant to give the word value abc'def (as posted, the loop slices the
token straight out of pText without removing the escape character, so it
comes back as abc~'def).

pQuotes[4] and [5] are not being used yet.
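
Tracing the posted loop, the token is sliced directly from pText, so escape characters appear to stay in the returned word. If they need to be removed, one option is a small post-processing pass. The sketch below is an assumption about how that could be done; strip_escapes is a hypothetical name and is not part of Tokenize.

```euphoria
-- Hypothetical post-processing step: drop each escape character and keep
-- the character it protects.
function strip_escapes(sequence token, sequence escapes)
    sequence result
    integer i
    result = ""
    i = 1
    while i <= length(token) do
        if find(token[i], escapes) and i < length(token) then
            i += 1                  -- skip the escape, keep what follows
        end if
        result = append(result, token[i])
        i += 1
    end while
    return result
end function

puts(1, strip_escapes("abc~'def", "~") & '\n')   -- prints abc'def
puts(1, strip_escapes("a~~b", "~") & '\n')       -- prints a~b
```

An escape character at the very end of the token has nothing to protect, so this sketch leaves it in place.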


----------------
cheers,
Derek Parnell


12. Re: help with storing user input

Hey,
I know you're not responsible for teaching me to code with Euphoria, but this 
algorithm uses the language in a lot of ways I couldn't imagine. This part of 
the code is kind of confusing for me. In order to understand the big for 
statement that follows, don't I have to find out what's going on with this 
pQuotes variable?

-Basically I'm wondering what kinds of parameters the caller of this 
function would specify for pQuotes.

> >     -- Validate quote marks parameter
> >     if     sequence(pQuotes) then
> >  if length(pQuotes) = 0 then

--Excuse me if I don't know Euphoria syntax that well, but how could the 
length of pQuotes possibly be zero? This function has to take four arguments, 
right? In my mind control would never be passed here, because Euphoria won't 
run this program without getting four arguments. (Thinking out loud) So this 
line is simply testing to see if the caller has specified {}, an empty 
sequence, as a parameter?

> >      pQuotes = {{},{},{},{},{}}

ok, so now it has five empty sequences in it, why?

> >  elsif (length(pQuotes) != 5


> >   or
> >         atom(pQuotes[1])
> >   or
> >         atom(pQuotes[2])
> >   or
> >         atom(pQuotes[3])
> >   or
> >         length(pQuotes[1]) != length(pQuotes[2])
> >   or
> >         atom(pQuotes[4])
> >   or
> >         atom(pQuotes[5])
> >   or
> >         length(pQuotes[4]) != length(pQuotes[5])
> >      )
> >  then
> >      pQuotes = 0
> >  end if
> >     end if

--no idea why to test for all these things
> >
> >     if atom(pQuotes) then
> >  if pQuotes = 0 then
> >      pQuotes = {"\"'`", "\"'`", "\\~","",""}
> >  else
> >      pQuotes = {{pQuotes}, {pQuotes},{},{},{}}
> >  end if
> >     end if

-no clue:)

>
>This is just validating and initializing the pQuotes group of data.
>  pQuotes[1] and [2] is a pair of character sets. [1] is the start quote 
>and
>[2] is the matching end quote. All the characters inside the quotes is
>considered to be a word.

Okay, I understand what you're saying here... I'm just not getting how you 
made that happen with your code.
>
>  pQuotes[3] is a list of characters that are 'escape' characters to be 
>used
>inside quoted strings. For example...


--So an escape character like ~ would be listed as a separate sequence?

>
>    'abc~'def'
>
>with pQuotes = { "'", "'", '~', ...}
>
>gives the word value as abc'def
ok
>

--Overall I think I'll just use the simpler function; I'm really not able yet 
to understand this function. I do understand the simpler one, though!!

--Thanks for taking the time to explain. I'll probably be able to figure it 
out sometime :)

--And I'll make sure to credit you wherever I use it (the SimpleTokenize, 
that is).


13. Re: help with storing user input

----- Original Message -----
From: "Jason Dube" <dubetyrant at hotmail.com>
To: "EUforum" <EUforum at topica.com>
Subject: Re: help with storing user input


>
> Hey,
> I know your not responsible for teaching me to code with euphoria, but
this
> algoritm uses the language in a lot of ways I couldn't imagine.

Hey, I don't mind.

>This part of
> the code is kinda confusing for me, in order to understand the big for
> statement that follows, dont I have to find out whats going on with this
> pquotes variable?

Your choice.

> -Basically I'm wondering what kinds of parameters the caller of this
> function would  specify for pquotes.
>
> > >     -- Validate quote marks parameter
> > >     if     sequence(pQuotes) then
> > >  if length(pQuotes) = 0 then
>
> --excuse me if I dont know euphoria syntax that well, but how could the
> length of pquotes possibly be zero. This function has to take four
arguments
> right?In my mind control would never be passed here because euphoria wont
> run this program without getting four arguments.(Thinking out loud)so this
> line is simply testing to see if the caller has specified {} an empty
> sequence as a parameter?

Yes, that's right.
> > >      pQuotes = {{},{},{},{},{}}

and if so, it is just a shorthand for 5 empty sequences.

> ok, so now it has five empty sequences in it, why?

in case the user doesn't need 'quote' processing.

> > >  elsif (length(pQuotes) != 5
>
>
> > >   or
> > >         atom(pQuotes[1])
> > >   or
> > >         atom(pQuotes[2])
> > >   or
> > >         atom(pQuotes[3])
> > >   or
> > >         length(pQuotes[1]) != length(pQuotes[2])
> > >   or
> > >         atom(pQuotes[4])
> > >   or
> > >         atom(pQuotes[5])
> > >   or
> > >         length(pQuotes[4]) != length(pQuotes[5])
> > >      )
> > >  then
> > >      pQuotes = 0
> > >  end if
> > >     end if

All of this just makes sure that the parameter has 5 sub sequences and that
[1] and [2] are the same length and that [4] and [5] are the same length. If
the parameter fails this test, I force it to use the default values.

> --no idea why to test for all these things
> > >
> > >     if atom(pQuotes) then
> > >  if pQuotes = 0 then
> > >      pQuotes = {"\"'`", "\"'`", "\\~","",""}

this is just a way of requesting the default values. The user calls this
routine with a zero in this parameter.

> > >  else
> > >      pQuotes = {{pQuotes}, {pQuotes},{},{},{}}
this is just a shorthand way of saying that the user is only interested in
simple quote processing. They can call the routine like this...

    Tokenize( string, ws, nw, '|' )

so that all the characters in the string between vertical bars form a 'word'.
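
Pulled out on its own, the pQuotes handling described here can be read as one small function. This is only a restatement under assumptions, not the posted code: normalize_quotes is a hypothetical name, and the checks are written as "is well-formed" rather than "is malformed".

```euphoria
-- Hypothetical stand-alone version of the pQuotes normalisation.
function normalize_quotes(object q)
    if sequence(q) then
        if length(q) = 0 then
            return {{}, {}, {}, {}, {}}          -- {} : no quote handling
        end if
        if length(q) = 5
        and sequence(q[1]) and sequence(q[2]) and sequence(q[3])
        and sequence(q[4]) and sequence(q[5])
        and length(q[1]) = length(q[2])
        and length(q[4]) = length(q[5]) then
            return q                             -- well-formed, use as is
        end if
        q = 0                                    -- malformed: use defaults
    end if
    if q = 0 then
        return {"\"'`", "\"'`", "\\~", "", ""}   -- default quotes and escapes
    end if
    return {{q}, {q}, {}, {}, {}}                -- single quote character
end function

? normalize_quotes('|')                  -- prints {{124},{124},{},{},{}}
? length(normalize_quotes({"(", ")"}))   -- malformed, so defaults: prints 5
```

Whatever the caller passes, the result always has five sub-sequences, which is what lets the main loop index pQuotes[1] to [3] without further checks.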

> > >  end if
> > >     end if
>
> -no clue:)
>
> >
> >This is just validating and initializing the pQuotes group of data.
> >  pQuotes[1] and [2] is a pair of character sets. [1] is the start quote
> >and
> >[2] is the matching end quote. All the characters inside the quotes is
> >considered to be a word.
>
> okay I understand what your saying here...Im just not getting how you made
> that happen with your code.
> >
> >  pQuotes[3] is a list of characters that are 'escape' characters to be
> >used
> >inside quoted strings. For example...
>
>
> --so an escape character like ~ would be listed as a seperate sequence?


Yes.

> >
> >    'abc~'def'
> >
> >with pQuotes = { "'", "'", '~', ...}
> >
> >gives the word value as abc'def
> ok
> >
>
> --Overall I think I'll just use the simpler function, Im really not able
yet
> to understand this function. I do understand the simpler one though!!
>
> --Thanks for taking the time to explain, I'll probably be able to figure
it
> out sometime:)
>
> --And I'll make sure to credit you wherever I use it(the simpletokenize,
> that is)

No problems.

----------------
cheers,
Derek Parnell
