Re: Contest

Kat wrote:
> 
> On 31 Oct 2004, at 1:40, Derek Parnell wrote:
> 
> > 
> > posted by: Derek Parnell <ddparnell at bigpond.com>
> > 
> > Kat wrote:
> > > 
> > > On 30 Oct 2004, at 23:25, Derek Parnell wrote:
> > > 
> > > > 
> > > > posted by: Derek Parnell <ddparnell at bigpond.com>
> > > > 
> > > > cklester wrote:
> > > > > 
> > > > > Pete Lomax wrote:
> > > > > > 
> > > > > 
> > > > > > Can we have an absolute ruling on the '10-4' query?
> > > > > 
> > > > > It's already in the rules. Not a token. :)
> > > 
> > > This is getting so complicated that in a go/no-go contest, it's going to be
> > > plain luck if anyone gets all the correct answers.
> > 
> > No, I'm getting the correct answers. blink
> > 
> > This is NOT complicated, really. You guys are just reading too much into
> > simple rules. My code that does all this tokenizing runs to about 20 lines.
> > It's not so hard, honestly.
> > 
> > > > '10-4' is a token.
> > > 
> > > Ok, because it has the two ' in it. I understand. 
> > 
> > Good. 
> > 
> > > But we are to ignore and strip them, making what was a token into:
> > 
> > Who said anything about stripping off characters? I talked about
> > not counting quotes when determining the length, but never about
> > removing them. Ignoring is not removing.
> 
> You have said :
> it's = its
> if that's not stripping or ignoring them, i don't know the meaning of the
> words.

Kat, if I say anything hurtful in this reply, please excuse me. I'm not
intending to do that. I'm frustrated at myself for not being a great
explainer. Given that, here we go ...

The actual quotation from the rules is ...

For the purposes of comparison and display, quotes are ignored in any
token. You can think of a quote as a zero-length token character. When
determining the effective length of a token string, it is the sum of
the lengths of each token character, and all token characters except
quote have a length of 1. For example it's and its are the same, 
'heaven' and heaven are the same token. A token's length does not
include any quotes in the count. Thus the tokens Ma'am, maam and
'MAAM' are considered to be the same 4-character token.

...
Notice the context. I'm talking about the effective length of a token.
I'm sorry (again) that I'm failing to clearly explain *my* rules. 

Would it help if I said "it's and its are *equivalent* tokens"?
They both have an effective length of 3. They compare as equals.
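
As a rough illustration of "ignoring is not removing", here is a minimal Python sketch (not the contest code). The function names effective_length, comparison_key and equivalent are made up for this example, and folding case is an assumption drawn from the Ma'am / maam / 'MAAM' example in the rules. The original string is never changed; the length and the comparison key are computed alongside it.

```python
QUOTE = "'"

def effective_length(token):
    """Count every character except quotes; the token itself is not modified."""
    return sum(1 for ch in token if ch != QUOTE)

def comparison_key(token):
    """Build a separate key used only for comparing tokens.

    Quotes are skipped and case is folded (assumed from the
    Ma'am / maam / 'MAAM' example); the original token is left as it was.
    """
    return "".join(ch for ch in token if ch != QUOTE).lower()

def equivalent(a, b):
    """Two tokens are equivalent when their comparison keys match."""
    return comparison_key(a) == comparison_key(b)

raw = "'10-4'"
print(len(raw), effective_length(raw))  # byte length vs effective length
print(equivalent("it's", "its"))
```

Here len() still reports the full byte length of the string, while effective_length() reports the length the rules care about.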

Obviously they are not the same strings. But FOR THE PURPOSES OF THIS
CONTEST, AND THIS CONTEST ONLY, they are deemed to be equivalent. 

That's the rule. Just get over it. You might like it to be a different
rule, but it isn't. 

Above you say "if that's not stripping or ignoring them ...". I believe 
that stripping them off and ignoring them are two different things.
I am telling you that the rules say they are to be ignored when
counting the length of a token and when comparing them. Please just take
this as a requirement. Don't question it. It just is. 

I agree that the phrase "purposes of comparison and display" in the rules
is confusing. Hopefully the 'comparison' part isn't confusing. What I
meant about the display is that I don't care whether you display the
token with or without the quotes, but don't go displaying BOTH "it's"
and "its". Pick one of the variants when displaying the token. I don't
really care which variant you choose.
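
One possible way to end up with a single displayed variant, sketched in Python under my own assumptions: keep a dictionary keyed by the quote-free, lower-cased form, and remember whichever spelling was seen first. The name add_token and the "first seen wins" choice are illustrative only; the rules allow either variant.

```python
def add_token(store, token):
    """Record a token under its comparison key, keeping the first spelling seen."""
    key = "".join(ch for ch in token if ch != "'").lower()
    # setdefault leaves an existing entry alone, so later variants
    # such as "its" after "it's" do not create a second entry.
    store.setdefault(key, token)

store = {}
for tok in ["it's", "its", "ITS", "heaven", "'heaven'"]:
    add_token(store, tok)

# Each equivalent group is displayed once, using the stored spelling.
for shown in store.values():
    print(shown)
```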

> 
> > '10-4' is a SIX-BYTE string. Because it has a MIXTURE of quotes and other
> > token characters it is a token. It has an EFFECTIVE length of 4.
> > 
> > > > 10-4 is not a token.
> > 
> > True, but why are you converting '10-4' into 10-4 ? 
> 
> Again, because its = it's.

No, no, no. Because of the mixture of quotes and digits, it's a token.
So you don't have to go and re-examine it after ignoring the quotes.
You have already determined that it is a token.

After finding '10-4' you ask yourself - is this a token? Yes it is, so
move to find the next token. Don't strip off the quotes and then say, well
is it still a token? Instead, ignore the quotes, save it in your token
store, and move on to the next one.

> > The specs do not talk
> > about removing bytes from strings. 
> 
> And that is confusing.

Sorry. But hopefully the above helps de-confuse the rules.

> > If you find these 4 bytes surrounded
> > by spaces rather than quotes then it is a delimiter, 
> 
> or with leading or trailing -

Yes, but that's not the topic. Stay with the context.

> > in fact the spaces
> > would also be a part of the same delimiter, but that's not what we are talking
> > about either. 
> 
> 
> > > But now this "is not a token" item is a 4 char token of length 6?
> > > 
> > > What about 5'8 ?
> > 
> > Again, it's a token because it is a MIXTURE of quotes and token characters.
> > 
> > > Which is why i asked 2 days ago which order the parsing was to happen.
> > 
> > I may have misunderstood this question. Sorry. Can you ask it again for me?
> > --
> 
> Ok........
> 
> if we strip the ' out of it's, then check its length, it's 3 bytes long.
> if we check length, and then remove the ', it's 4 bytes long.

Don't strip off the quotes then. Just ignore them.

PSEUDO CODE ::: 
DO NOT ATTEMPT THIS IN YOUR PROGRAM AS THERE ARE MUCH BETTER WAYS TO DO IT.

  grab bytes until you get to a non-token character.
  for each byte in potential_token
     if byte is not "'" then
        add 1 to effective_length
     end if
  end for
  if effective_length > 0 and effective_length <= 20 then
     if first byte is "-" or last byte is "-" then
        mark this as a delimiter
     otherwise
        if any byte in potential_token is alphabetic or "'" then
            mark this as a real_token
        otherwise
            mark this as a delimiter
        end if
     end if
  otherwise
     mark this as a delimiter.
  end if
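
The pseudo code above can be transcribed fairly directly into Python. This is only a sketch of those same steps, not the actual contest solution, and as the warning says there are better ways to do it. The function name classify, the constant name and the "token" / "delimiter" return strings are assumptions for illustration; the input is assumed to be a run of token characters that has already been grabbed.

```python
MAX_EFFECTIVE_LENGTH = 20  # upper limit taken from the pseudo code

def classify(candidate):
    """Decide once, on the raw bytes, whether a candidate is a token."""
    # Quotes count as zero-length characters; nothing is removed.
    effective_length = sum(1 for ch in candidate if ch != "'")
    if not (0 < effective_length <= MAX_EFFECTIVE_LENGTH):
        return "delimiter"
    # A leading or trailing hyphen makes the whole run a delimiter.
    if candidate[0] == "-" or candidate[-1] == "-":
        return "delimiter"
    # At least one letter or quote is needed for a real token.
    # Note: str.isalpha() also accepts non-ASCII letters.
    if any(ch.isalpha() or ch == "'" for ch in candidate):
        return "token"
    return "delimiter"

for sample in ["'10-4'", "10-4", "5'8", "it's"]:
    print(sample, classify(sample))
```

Because the decision is made on the raw string, '10-4' is classified with its quotes present and is never re-examined as 10-4.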


> something must be done with the ' to make it's equal its, and keeping it 
> means both its and it's will be listed in the list of valid tokens.

That is part of the puzzle you must work out how to implement. It is
possible to do because I've done it, and it looks like others have too.

-- 
Derek Parnell
Melbourne, Australia
