Re: Contest
- Posted by Derek Parnell <ddparnell at bigpond.com> Oct 31, 2004
Kat wrote:
> On 31 Oct 2004, at 1:40, Derek Parnell wrote:
> > Kat wrote:
> > > On 30 Oct 2004, at 23:25, Derek Parnell wrote:
> > > > cklester wrote:
> > > > > Pete Lomax wrote:
> > > > > > Can we have an absolute ruling on the '10-4' query?
> > > > >
> > > > > It's already in the rules. Not a token. :)
> > >
> > > This is getting so complicated that in a go/no-go contest, it's going to be
> > > plain luck if anyone gets all the correct answers.
> >
> > No, I'm getting the correct answers.
> >
> > This is NOT complicated, really. You guys are just reading too much into
> > simple rules. My code that does all this tokenizing runs to about 20 lines.
> > It's not so hard, honestly.
> >
> > > > '10-4' is a token.
> > >
> > > Ok, because it has the two ' in it. I understand.
> >
> > Good.
> >
> > > But we are to ignore and strip them, making what was a token into:
> >
> > Who said anything about stripping off characters? I talked about
> > not counting quotes when determining the length, but never about
> > removing them. Ignoring is not removing.
>
> You have said :
> it's = its
> if that's not stripping or ignoring them, i don't know the meaning of the
> words.

Kat, if I say anything hurtful in this reply, please excuse me. I'm not
intending to do that. I'm frustrated at myself for not being a great explainer.
Given that, here we go ...

The actual quotation from the rules is ...

For the purposes of comparison and display, quotes are ignored in any token.
You can think of a quote as a zero-length token character. When determining
the effective length of a token string, it is the sum of the lengths of each
token character, and all token characters except quote have a length of 1.
For example it's and its are the same, 'heaven' and heaven are the same token.
A token's length does not include any quotes in the count. Thus the tokens
Ma'am, maam and 'MAAM' are considered to be the same 4-character token.

...

Notice the context. I'm talking about the effective length of a token. I'm
sorry (again) that I'm failing to clearly explain *my* rules. Would it help if
I said "it's and its are *equivalent* tokens"? They both have an effective
length of 3. They compare as equals. Obviously they are not the same strings.
But FOR THE PURPOSES OF THIS CONTEST, AND THIS CONTEST ONLY, they are deemed to
be equivalent. That's the rule. Just get over it. You might like it to be a
different rule, but it isn't.

Above you say "if that's not stripping or ignoring them ...". I believe that
stripping them off and ignoring them are two different things. I am telling
you that the rules say they are to be ignored when counting the length of a
token and when comparing tokens. Please just take this as a requirement. Don't
question it. It just is.

I agree that the phrase "purposes of comparison and display" in the rules is
confusing. Hopefully the 'comparison' part isn't confusing, but what I meant
about the display is that I don't care if you display the token with or
without the quotes; just don't go displaying BOTH "it's" and "its". Pick one
of the variants when displaying the token. I don't really care which variant
you choose.

> > '10-4' is a SIX-BYTE string. Because it has a MIXTURE of quotes and other
> > token characters it is a token. It has an EFFECTIVE length of 4.
> >
> > > > 10-4 is not a token.
> >
> > True, but why are you converting '10-4' into 10-4 ?
>
> Again, because its = it's.

No, no, no. Because of the mixture of quotes and digits, it's a token. So you
don't have to go and re-examine it after ignoring the quotes. You have already
determined that it is a token. After finding '10-4' you ask yourself: is this
a token? Yes it is, so move on to find the next token.
Don't strip off the quotes and then say, well is it still a token?
Instead, ignore the quotes, save it in your token store, and move on to the
next one.

> > The specs do not talk
> > about removing bytes from strings.
>
> And that is confusing.

Sorry. But hopefully the above helps de-confuse the rules.

> > If you find these 4 bytes surrounded
> > by spaces rather than quotes then it is a delimiter,
>
> or with leading or trailing -

Yes, but that's not the topic. Stay with the context.

> > in fact the spaces
> > would also be a part of the same delimiter, but that's not what we are
> > talking about either.
> >
> > > But now this "is not a token" item is a 4 char token of length 6?
> > >
> > > What about 5'8 ?
> >
> > Again, it's a token because it is a MIXTURE of quotes and token characters.
> >
> > > Which is why i asked 2 days ago which order the parsing was to happen.
> >
> > I may have misunderstood this question. Sorry. Can you ask it again for me?
>
> Ok........
>
> if we strip the ' out of it's, then check it's length, it's 3 bytes long.
> if we check length, and then remove the ', it's 4 bytes long.

Don't strip off the quotes then. Just ignore them.

PSEUDO CODE ::: DO NOT ATTEMPT THIS IN YOUR PROGRAM AS THERE ARE MUCH BETTER
WAYS TO DO IT.

  grab bytes until you get to a non-token character.
  for each byte in potential_token
      if byte is not "'" then
          add 1 to effective_length
      end if
  end for
  if effective_length > 0 and effective_length <= 20 then
      if first byte is "-" or last byte is "-" then
          mark this as a delimiter
      otherwise
          if any byte in potential_token is alphabetic or "'" then
              mark this as a real_token
          otherwise
              mark this as a delimiter
          end if
      end if
  otherwise
      mark this as a delimiter.
  end if

> something must be done with the ' to make it's equal its, and keeping it
> means both its and it's will be listed in the list of valid tokens.

That is part of the puzzle you must work out how to implement. It is possible
to do because I've done it, and it looks like others have too.

--
Derek Parnell
Melbourne, Australia
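For readers who want to try the rule out, here is a minimal Python sketch of the pseudocode in this post. It is not the contest reference code and not anyone's actual entry. The function names (`effective_length`, `is_token`, `equivalence_key`, `collect`) are made up for illustration. It assumes "alphabetic" means ASCII letters, and it assumes the candidate string has already been cut out of the input at the non-token characters, a step this post does not spell out. The comparison key (quotes dropped, case folded) is one possible reading of the "Ma'am, maam and 'MAAM'" example in the rules, used only for comparing; the stored spelling is left untouched.

```python
import string

QUOTE = "'"
MAX_EFFECTIVE_LENGTH = 20  # upper bound taken from the pseudocode in the post


def effective_length(candidate: str) -> int:
    """Count every character except quotes; a quote counts as zero length."""
    return sum(1 for ch in candidate if ch != QUOTE)


def is_token(candidate: str) -> bool:
    """Return True for a real token, False for a delimiter."""
    length = effective_length(candidate)
    if not 0 < length <= MAX_EFFECTIVE_LENGTH:
        return False  # empty, quotes only, or too long
    if candidate[0] == "-" or candidate[-1] == "-":
        return False  # leading or trailing hyphen makes it a delimiter
    # Assumption: "alphabetic" means ASCII letters only.
    return any(ch in string.ascii_letters or ch == QUOTE for ch in candidate)


def equivalence_key(token: str) -> str:
    """Key used only for comparing tokens (assumed quote- and case-insensitive)."""
    return token.replace(QUOTE, "").lower()


def collect(candidates):
    """Store one spelling per group of equivalent tokens (the first one seen)."""
    store = {}
    for candidate in candidates:
        if is_token(candidate):
            store.setdefault(equivalence_key(candidate), candidate)
    return store
```

With this sketch, `is_token("'10-4'")` is true and `is_token("10-4")` is false, because the decision is made on the six-byte string as found. `collect(["it's", "its"])` keeps only `it's`, so just one variant would be displayed.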