1. Contest Update

Six new submissions on web.

 http://www.users.bigpond.com/ddparnell/contest1/rules.htm

-- 
Derek Parnell
Melbourne, Australia


2. Re: Contest Update

Derek Parnell wrote:
> 
> Six new submissions on web.
> 
>  http://www.users.bigpond.com/ddparnell/contest1/rules.htm
> 
> -- 
> Derek Parnell
> Melbourne, Australia
> 

Hi, Derek.

I'm kind of confused by the results, as I'm not sure how to fix the failures.

My entry says wrong count and failed file6 (the empty file), but my program
just prints a message and then exits.  Should it still try to print some kind
of count for a zero-length file or a file with no discernible words?

I also don't understand how it failed on file8 (War and Peace) since I've been
testing this file (or one similar) all through the development.

Hmmm... Difficult to troubleshoot when I don't know what my program is doing
wrong.

Any troubleshooting hints anyone?

-------------------------------------
Too many freaks, not enough circuses.
j.


3. Re: Contest Update

Also, I realize that coding style is a subjective category, but as I am a hobbyist
and not a professional, do you have any tips for improvement?

------------------------------------
Too many freaks, not enough circuses.
j.


4. Re: Contest Update

Jason Gade wrote:
> 
> Derek Parnell wrote:
> > 
> > Six new submissions on web.
> > 
> >  http://www.users.bigpond.com/ddparnell/contest1/rules.htm
> > 
> > -- 
> > Derek Parnell
> > Melbourne, Australia
> > 
> 
> Hi, Derek.
> 
> I'm kind of confused on the results as I'm not sure how to fix them.
> 
> My entry says wrong count and failed file6 (the empty file), but my program
> just prints a message and then exits.  Should it still try to print some kind
> of count for a zero-length file or a file with no discernible words?

This is my mistake. Currently, the program I wrote to evaluate the results
only looks for 'counts' and formatting. For this file, I need to be a bit
smarter. However, this only lost you one point out of 460, so it's not a
big deal (yet).

> I also don't understand how it failed on file8 (War and Peace) since I've been
> testing this file (or one similar) all through the development.

Don't know either. I'm not with the machine I tested the submission on
at the moment so I can't inspect the detailed results. However, because
you got all the top-used tokens correct, I'm going to guess that it's the
'funny' tokens you are having problems with. My copy of file8 has been
doctored somewhat to include some odd looking tokens. 

Check again for tokens that might contain quotes and/or hyphens,
especially at the start or end of a string. Also, strings just made 
up of quotes gave my program problems at first.


> Hmmm... Difficult to troubleshoot when I don't know what my program is doing
> wrong.
> 
> Any troubleshooting hints anyone?

I'll give a hint that some people may have tripped up on.

A file opened as "text" will appear to prematurely end if it contains
the End-Of-File marker for text files.
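Derek's hint can be illustrated with a minimal sketch. Python stands in for Euphoria here purely for illustration, and the helper name `read_all_bytes` is hypothetical. Opening the file in binary mode means byte 26 (#1A, Ctrl-Z), the traditional DOS end-of-file marker for text files, comes back as an ordinary byte, and everything after it is still read:

```python
import os
import tempfile


def read_all_bytes(path):
    """Read the whole file in binary mode: no newline translation, no EOF-marker handling."""
    with open(path, "rb") as f:
        return f.read()


# Demo: build a small file with a Ctrl-Z (byte 26) in the middle.
sample = b"before\x1aafter"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

data = read_all_bytes(sample_path)
os.remove(sample_path)
print(len(data), data.count(26))  # 12 1
```

The text after the marker ("after") is part of `data`, so any tokens there are still counted.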

-- 
Derek Parnell
Melbourne, Australia


5. Re: Contest Update

Jason Gade wrote:
> 
> Also I realize that coding style is a subjective category, but as I am a
> hobbyist
> and not a professional do you have any tips for improvement?
> 

From memory, your coding style was pretty good. The contest results
page shows a reduced style score because it failed some of the non-timed
files.

If I may suggest a book and web page ...

  http://www.cc2e.com/

This is the home page for Steve McConnell's "Code Complete" book.

-- 
Derek Parnell
Melbourne, Australia


6. Re: Contest Update

Derek Parnell wrote:
> 
> I'll give a hint that some people may have tripped up on.
> 
> A file opened as "text" will appear to prematurely end if it contains
> the End-Of-File marker for text files.
> 

I suspected that might be the case.  But penalizing the programmer for treating
a file that is supposed to be text as text seems wrong.  If the input file was
not to be treated as text, then the rules should say that tokens contain the
*bytes*:

{65,66,67, etc.}

instead of:

"ABC ... etc."

Making an assumption that one should continue after an EOF marker could be wrong
if this was a "real-world" application.  Sticking to this esoterica would seem to
be making the contest about "who can best interpret logical loopholes in the
rules" rather than best program a well-defined task.

Alternatively, simply put in a rule that says, "Input files should be opened in
binary mode".

-- Andy


7. Re: Contest Update

Derek Parnell wrote:
> 
> Jason Gade wrote:

> > I also don't understand how it failed on file8 (War and Peace) since I've been
> > testing this file (or one similar) all through the development.
> 
> Don't know either. I'm not with the machine I tested the submission on
> at the moment so I can't inspect the detailed results. However, because
> you got all the top-used tokens correct, I'm going to guess that it's the
> 'funny' tokens you are having problems with. My copy of file8 has been
> doctored somewhat to include some odd looking tokens. 
> 
> Check again for tokens that might contain quotes and/or hyphens,
> especially at the start or end of a string. Also, strings just made 
> up of quotes gave my program problems at first.
> 
> 
> > Hmmm... Difficult to troubleshoot when I don't know what my program is doing
> > wrong.
> > 
> > Any troubleshooting hints anyone?
> 
> I'll give a hint that some people may have tripped up on.
> 
> A file opened as "text" will appear to prematurely end if it contains
> the End-Of-File marker for text files.
> 
> -- 
> Derek Parnell
> Melbourne, Australia
> 
Okay.  So in my testing I made a file that contained edge cases identified in
the rules, and they were counted correctly.

Also I do open the file in binary mode, so... hmm.

I may need to think of some new edge cases to test for.

Currently, the program follows these rules:

--    words consist of upper and lower case letters, digits 0-9, single quote and dash;
--    for the purposes of comparison, case does not matter and quotes are not counted;
--    words consisting of only digits, or digits and dashes, are not counted as words
--    unless they are quoted;
--    words of zero length after quotes are removed are not counted.
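As a cross-check, the rules as Jason summarizes them can be sketched as a small tokenizer. This is a Python sketch, not Jason's Euphoria code or Derek's reference program; it assumes a token counts as "quoted" if it contains at least one single quote, and the names `TOKEN_RUN`, `DIGITS_AND_DASHES` and `tokens` are hypothetical:

```python
import re
from collections import Counter

# A candidate token is a run of ASCII letters, digits, single quotes and dashes;
# every other byte (spaces, punctuation, byte 26, ...) acts as a separator.
TOKEN_RUN = re.compile(rb"[A-Za-z0-9'-]+")
DIGITS_AND_DASHES = b"0123456789-"


def tokens(data):
    """Yield normalized tokens from raw bytes, per the summarized rules above."""
    for match in TOKEN_RUN.finditer(data):
        raw = match.group()
        word = raw.replace(b"'", b"").lower()  # quotes ignored, case folded
        if not word:
            continue  # zero length once quotes are removed
        quoted = b"'" in raw  # assumption: any single quote makes it "quoted"
        if not quoted and all(byte in DIGITS_AND_DASHES for byte in word):
            continue  # bare digits / digits-and-dashes are not words
        yield word.decode("ascii")


counts = Counter(tokens(b"Don't stop 'til 1984 '1984' -- it's DON'T ''"))
print(counts.most_common(1))  # [('dont', 2)]
```

Working on raw bytes rather than decoded text keeps the tokenizer independent of how the file was opened.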

If I am interpreting the rules correctly, I will try to come up with a new
(short) test file to validate with.
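A short edge-case file along those lines might be generated as follows. The cases are illustrative guesses based on the hints earlier in this thread (quotes and hyphens at the ends of tokens, strings made only of quotes, the text-file EOF marker); they are not Derek's actual test data, and `EDGE_CASES`, `build_edge_case_bytes` and `write_edge_case_file` are hypothetical names. Writing in binary mode keeps byte 26 exactly where it was put:

```python
# Illustrative edge cases only; not the contest's real test files.
EDGE_CASES = [
    b"'quoted' -dashed- 'both-ends-'",  # quotes/hyphens at the start or end
    b"'' ''' ''''",                     # strings made up only of quotes
    b"2004 1-2-3 '42'",                 # digits, digits and dashes, quoted digits
    b"Case CASE case",                  # same word in different cases
    b"before\x1aafter",                 # byte 26 (Ctrl-Z) with text after it
]


def build_edge_case_bytes():
    """Join the cases one per line, ending with a newline."""
    return b"\n".join(EDGE_CASES) + b"\n"


def write_edge_case_file(path):
    """Write the cases in binary mode so byte 26 and line endings are stored as-is."""
    with open(path, "wb") as f:
        f.write(build_edge_case_bytes())


data = build_edge_case_bytes()
print(len(data.splitlines()), data.count(26))  # 5 1
```

Keeping the expected counts for such a file written down by hand makes it a quick regression check after each change.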

I wish now that I had saved the version of your web page that had your unique
counts and total counts for each file posted -- at least then it would be easier
to compare with.  It kind of sucks that the calibration file works perfectly but
the others do not!! ;^)

------------------------------------
Too many freaks, not enough circuses.
j.


8. Re: Contest Update

On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com>
wrote:
> Making an assumption that one should continue after an EOF marker could be
> wrong if this was a "real-world" application.  Sticking to this esoterica would
> seem to be making the contest about "who can best interpret logical loopholes in
> the rules" rather than best program a well-defined task.
> 
> Alternatively, simply put in a rule that says, "Input files should be opened
> in binary mode".

I believe that this is covered in the programming style criteria:
"Defensive coding that is tolerant of bad data." That's why files 6-11
are there - they contain a lot of border cases that may trip up
programs less tolerant. You're not penalised anywhere near as much for
making mistakes with these files as you are with the first 5.

-- 
MrTrick


9. Re: Contest Update

Derek Parnell wrote:
> 
> Six new submissions on web.

LOL! I take consolation that each attempt has at least been faster! DOH!!

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/


10. Re: Contest Update

Patrick Barnes wrote:
> 
> On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com>
> wrote:
> > Making an assumption that one should continue after an EOF marker could be
> > wrong if this was a "real-world" application.  Sticking to this esoterica would
> > seem to be making the contest about "who can best interpret logical loopholes in
> > the rules" rather than best program a well-defined task.
> > 
> > Alternatively, simply put in a rule that says, "Input files should be opened
> > in binary mode".
> 
> I believe that this is covered in the programming style criteria:
> "Defensive coding that is tolerant of bad data." That's why files 6-11
> are there - they contain a lot of border cases that may trip up
> programs less tolerant. You're not penalised anywhere near as much for
> making mistakes with these files as you are with the first 5.
> 
> 
I understand bad data, but ignoring an EOF marker in a text file is making an
assumption that I wouldn't necessarily consider correct.  Some data is so "bad"
that you can't expect the program to know what to do with it (unless that case is
explicitly covered in the rules).  Should I make guesses at what other "bad"
bytes in the file are "supposed to be" and adjust my token counts accordingly?

I just don't think it is reasonable, just as if there was a rule that said,
"Program must continue to perform while computer is set on fire."


11. Re: Contest Update

On Sun, 07 Nov 2004 15:13:18 -0800, Andy Serpa <guest at rapideuphoria.com>
wrote:

Hey, at least you're told what sort of file it is. Knowing that a file
is random binary values will give you a bit of a clue...

-- 
MrTrick


12. Re: Contest Update

Andy Serpa wrote:
> 
> Patrick Barnes wrote:
> > 
> > On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com>
> > wrote:
> > > Making an assumption that one should continue after an EOF marker could be
> > > wrong if this was a "real-world" application.  Sticking to this esoterica
> > > would seem to be making the contest about "who can best interpret logical
> > > loopholes in the rules" rather than best program a well-defined task.
> > > 
> > > Alternatively, simply put in a rule that says, "Input files should be
> > > opened in binary mode".
> > 
> > I believe that this is covered in the programming style criteria:
> > "Defensive coding that is tolerant of bad data." That's why files 6-11
> > are there - they contain a lot of border cases that may trip up
> > programs less tolerant. You're not penalised anywhere near as much for
> > making mistakes with these files as you are with the first 5.
> > 
> > 
> I understand bad data, but ignoring an EOF marker in a text file is making an
> assumption that I wouldn't necessarily consider correct.  Some data is so "bad"
> that you can't expect the program to know what to do with it (unless that case
> is explicitly covered in the rules).  Should I make guesses at what other "bad"
> bytes in the file are "supposed to be" and adjust my token counts accordingly?
>
> I just don't think it is reasonable, just as if there was a rule that said,
> "Program must continue to perform while computer is set on fire."

If I may weigh in here, the EOF marker is not bad data. It is perfectly
allowable in text files and *any* program that reads a file as a text file
ought to take it into consideration, as you have said. However, who said
they were going to be text files?

The confusion might come about because it was assumed that the test files
would be 'text' files. There is no statement in the rules that this would
be the case. All it says, and I quote rule 13, ...

"The file should only contain bytes in the range #00 - #7F, and you can
consider those to be ASCII characters."

A better program would most likely validate the input rather than assuming
it to be correct. 
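A minimal validation pass for rule 13 might look like this (Python, with the hypothetical name `validate_input`). It only reports bytes above #7F; byte 26 is inside the permitted range and is deliberately not flagged:

```python
def validate_input(data):
    """Check rule 13: every byte must be in #00-#7F.

    Returns (ok, offsets), where offsets lists the position of each byte above #7F.
    """
    offsets = [i for i, byte in enumerate(data) if byte > 0x7F]
    return (not offsets, offsets)


ok, bad = validate_input(b"ok\x1a\x80!")
print(ok, bad)  # False [3]
```

What a program should then do with out-of-range bytes (skip, report, or reject) is not something this thread settles.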

I'm sorry I didn't make it too easy for people, but this specification is
still a whole lot better than most specifications received from a client.
So yes, I could have told you how to open the file, how to hash the tokens,
how to store the lists of tokens, how to sort or order the tokens,
how to ...  etc... but I didn't. I wanted you to work out some of the
'traps' that might be there. 

It's a game more than a contest, so one needs a bit of an interesting
challenge.

If anyone wants to withdraw, just let me know.

-- 
Derek Parnell
Melbourne, Australia


13. Re: Contest Update

Jason Gade wrote:

[snip]

> Okay.  So in my testing I made a file that contained edge cases identified in
> the rules, and they were counted correctly.
> 
> Also I do open the file in binary mode, so... hmm.
> 
> I may need to think of some new edge cases to test for.
> 
> Currently, the program follows these rules:
> 
> --    words consist of upper and lower case letters, digits 0-9, single quote and dash;
> --    for the purposes of comparison, case does not matter and quotes are not counted;
> --    words consisting of only digits, or digits and dashes, are not counted as words
> --    unless they are quoted;
> --    words of zero length after quotes are removed are not counted.
> 
> If I am interpreting the rules correctly, I will try to come up with a new
> (short) test file to validate with.

That summary is pretty good. It seems you understand the 'token' idea.
 
> I wish now that I had saved the version of your web page that had your unique
> counts and total counts for each file posted -- at least then it would be easier
> to compare with.  It kind of sucks that the calibration file works perfectly but
> the others do not!! ;^)

Agreed. My first attempt worked perfectly well, and fast, with the
calibration file. In fact, I thought I had everything under control
so that became my 'frozen' code. Then people started submitting
their efforts and they were consistently getting different counts 
to my program. After a few, I realized that these people were
agreeing with each other but not my results. Then I got it! My
program had a bug (or two). My program failed all the other test
files, even though it breezed through the first file.

So how stupid do I look! LOL!

Tonight when I get home, I'll have a closer look at your results
to see if I can find any clues for you.

-- 
Derek Parnell
Melbourne, Australia


14. Re: Contest Update

Derek Parnell wrote:
> 
> 
> If I may weigh in here, the EOF marker is not bad data. It is perfectly
> allowable in text files and *any* program that reads a file as a text file
> ought to take it into consideration, as you have said. However, who said
> they were going to be text files?
> 
> The confusion might come about because it was assumed that the test files
> would be 'text' files. There is no statement in the rules that this would
> be the case. All it says, and I quote rule 13, ...
> 
> "The file should only contain bytes in the range #00 - #7F, and you can
> consider those to be ASCII characters."
> 
> A better program would most likely validate the input rather than assuming
> it to be correct. 
> 
> I'm sorry I didn't make it too easy for people, but this specification is
> still a whole lot better than most specifications received from a client.
> So yes, I could have told you how to open the file, how to hash the tokens,
> how to store the lists of tokens, how to sort or order the tokens,
> how to ...  etc... but I didn't. I wanted you to work out some of the
> 'traps' that might be there. 
> 
> It's a game more than a contest, so one needs a bit of an interesting
> challenge.
> 
> If anyone wants to withdraw, just let me know.
> 

I just figured you wanted this to be a programming challenge rather than a
"rule-reading" challenge.  I would submit that your referring to the tokens as
"characters" rather than "bytes" (at least in some places) would allow one to
assume that the inputs are *supposed* to be text files.  "Characters" only exist
in text, at least in a Euphoria context (maybe if this were C, where a character
might be assumed to be any single byte).  I also think making assumptions
about the intentions of the creator of the input file (maybe the EOF is supposed
to be there, and you are NOT intended to read past it) is questionable.  If the
scope of the contest is to include the possibility of an ambiguous program
specification where the true wants of the (hypothetical) "client" are therefore
unknown, then my interpretation is just as valid as yours, and the programmer
should only be penalized if his program handles the same (ambiguous) situation
inconsistently with different input files.

Or, alternatively, as I said before, if you (as the "client") remove such
ambiguousness from the rules/specification.


15. Re: Contest Update

Andy Serpa wrote:
> 
> Derek Parnell wrote:
> > 
> > 
> > If I may weigh in here, the EOF marker is not bad data. It is perfectly
> > allowable in text files and *any* program that reads a file as a text file
> > ought to take it into consideration, as you have said. However, who said
> > they were going to be text files?
> > 
> > The confusion might come about because it was assumed that the test files
> > would be 'text' files. There is no statement in the rules that this would
> > be the case. All it says, and I quote rule 13, ...
> > 
> > "The file should only contain bytes in the range #00 - #7F, and you can
> > consider those to be ASCII characters."
> > 
> > A better program would most likely validate the input rather than assuming
> > it to be correct. 
> > 
> > I'm sorry I didn't make it too easy for people, but this specification is
> > still a whole lot better than most specifications received from a client.
> > So yes, I could have told you how to open the file, how to hash the tokens,
> > how to store the lists of tokens, how to sort or order the tokens,
> > how to ...  etc... but I didn't. I wanted you to work out some of the
> > 'traps' that might be there. 
> > 
> > It's a game more than a contest, so one needs a bit of an interesting
> > challenge.
> > 
> > If anyone wants to withdraw, just let me know.
> > 
> 
> I just figured you wanted this to be a programming challenge rather than a
> "rule-reading" challenge.

Yes I did. But I submit that 'programming' is more than 'coding'. It also
includes understanding the specs amongst other things.

> I would submit that your referring to the tokens as "characters" rather
> than "bytes" (at least in some places) would allow one to assume that the
> inputs are *supposed* to be text files.

Yes again. That was one of the 'traps' I included. I refer to 'token text'
and characters. The idea behind this was so that one might make the
assumption that the file itself was a true ASCII text file, rather than
a file that contained ASCII text tokens. However, it is also quite possible
that some might not make this assumption, and many people did not.

>  "Characters" only exist in text, at least in a Euphoria context (maybe if
> this were C, where a character might be assumed to be any single byte).

A 'binary' can contain text though, no? Have a look at your typical .EXE
file and while mostly it is not text, you can see text embedded in it.

>  I also think making assumptions about the intentions of the creator of the
> input file (maybe the EOF is supposed to be there, and you are NOT intended to
> read past it) is questionable.

Yes, you are right again. So did anyone question this before the contest
started? Time was made available for clarifying the specification. This
point was not brought up, so I didn't clarify it. Didn't want to make it
a perfect spec, did I?

>  If the scope of the contest is to include the possibility of an ambiguous
> program specification where the true wants of the (hypothetical) "client" are
> therefore unknown, then my interpretation is just as valid as yours, and the
> programmer should only be penalized if his program handles the same
> (ambiguous) situation inconsistently with different input files.

So, you want to withdraw then? Want your money back? No problems. 

Look, a number of people tripped on this one, and a few even fixed it
up themselves. But as I could see that some others were having an issue
with it, I tried to help with a 'hint'. It's a game. If I was a real 
client I would have mentioned this much, much earlier. That's what
prototypes are good at doing - defining the real specification.

> Or, alternatively, as I said before, if you (as the "client") remove such
> ambiguousness from the rules/specification.

Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?

-- 
Derek Parnell
Melbourne, Australia


16. Re: Contest Update

Derek Parnell wrote:

[snip]
 
> Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?

Done. The rules page is now clearer (I hope) on this point.

By the way, on another similar issue. The specs say that the output
should list all the token-length frequencies up to the largest one
found in the file. So what should be output if you get a zero-count
for a length less than the largest one? Some people have displayed
a zero count and others have omitted the line altogether. I've been
lenient and allowed both interpretations.
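Both interpretations are easy to support from the same data. A Python sketch (the names `length_frequencies` and `include_zero` are hypothetical, and the contest's exact output format is not reproduced) that lists a count for every length from 1 up to the longest token, optionally omitting zero counts:

```python
from collections import Counter


def length_frequencies(tokens, include_zero=True):
    """Return (length, count) pairs for every length from 1 to the longest token.

    With include_zero=False, lengths that never occur are left out.
    """
    counts = Counter(len(token) for token in tokens)
    if not counts:
        return []  # e.g. an empty file: no tokens, so no lengths to report
    longest = max(counts)
    return [(n, counts[n]) for n in range(1, longest + 1)
            if include_zero or counts[n]]


words = ["a", "abc", "abc"]
print(length_frequencies(words))                      # [(1, 1), (2, 0), (3, 2)]
print(length_frequencies(words, include_zero=False))  # [(1, 1), (3, 2)]
```

Indexing a `Counter` with a missing length returns 0 without adding a key, which is what lets the zero rows appear without special-casing.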

  http://www.users.bigpond.com/ddparnell/contest1/rules.htm

-- 
Derek Parnell
Melbourne, Australia


17. Re: Contest Update

Derek Parnell wrote:
> 
> Look, a number of people tripped on this one, and a few even fixed it
> up themselves. But as I could see that some others were having an issue
> with it, I tried to help with a 'hint'. It's a game. If I was a real 
> client I would have mentioned this much, much earlier. That's what
> prototypes are good at doing - defining the real specification.
> 
I was actually arguing that I was *not* tripped up -- that I in fact did right
according to the rules but that the rules were wrong for what you apparently
wanted.  Arrogant, I know.

> > Or, alternatively, as I said before, if you (as the "client") remove such
> > ambiguousness from the rules/specification.
> 
> Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?
> 
No need to get upset.  If you were a real "client", of course I'd be asking you
questions.  You'd even be allowed to answer them...


18. Re: Contest Update

Andy Serpa wrote:
> 
> Patrick Barnes wrote:
> > 
> > On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com>
> > wrote:
> > > Making an assumption that one should continue after an EOF marker could be
> > > wrong if this was a "real-world" application.  Sticking to this esoterica
> > > would seem to be making the contest about "who can best interpret logical
> > > loopholes in the rules" rather than best program a well-defined task.
> > > 
> > > Alternatively, simply put in a rule that says, "Input files should be
> > > opened in binary mode".
> > 
> > I believe that this is covered in the programming style criteria:
> > "Defensive coding that is tolerant of bad data." That's why files 6-11
> > are there - they contain a lot of border cases that may trip up
> > programs less tolerant. You're not penalised anywhere near as much for
> > making mistakes with these files as you are with the first 5.
> > 
> > 
> I understand bad data, but ignoring an EOF marker in a text file is making an
> assumption that I wouldn't necessarily consider correct.

That is one reason I always open files in binary mode.

> Some data is so "bad" that you can't expect the program to know what to do
> with it (unless that case is explicitly covered in the rules).  Should I make
> guesses at what other "bad" bytes in the file are "supposed to be" and adjust
> my token counts accordingly?
>
> I just don't think it is reasonable, just as if there was a rule that said,
> "Program must continue to perform while computer is set on fire."

If anyone finds a program that can run on a burnt computer,
I'd like to have a copy to use on my old laptop... smile


19. Re: Contest Update

Derek Parnell wrote:

> Derek Parnell wrote:
>
> [snip]
>
>> Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?
>
> Done. The rules page is now clearer (I hope) on this point.
>
> By the way, on another similar issue. The specs say that the output
> should list all the token-length frequencies up to the largest one
> found in the file. So what should be output if you get a zero-count
> for a length less than the largest one?

Strictly speaking: Zero.
Zero is not nothing. Zero is a count as valid as any other count.

> Some people have displayed
> a zero count and others have omitted the line altogether. I've been
> lenient and allowed both interpretations.
>
>   http://www.users.bigpond.com/ddparnell/contest1/rules.htm

Less strictly speaking: I think this is appropriate. smile

Regards,
   Juergen


20. Re: Contest Update

Andy Serpa wrote:

> Derek Parnell wrote:
>>
>> I'll give a hint that some people may have tripped up on.
>>
>> A file opened as "text" will appear to prematurely end if it contains
>> the End-Of-File marker for text files.
>
> I suspected that might be the case.  But penalizing the programmer for
> treating a file that is supposed to be text as text seems wrong.  If
> the input file was not to be treated as text, then the rules should say
> that tokens contain the *bytes*:
>
> {65,66,67, etc.}
>
> instead of:
>
> "ABC ... etc."
>
> Making an assumption that one should continue after an EOF marker could
> be wrong if this was a "real-world" application.

Maybe in a certain context. However, AFAIK most modern programs use the
size of a file in order to detect its end, rather than the occurrence of
the ASCII 26 character.

In another context, continuing after an ASCII 26 character might be
absolutely reasonable. Users of one of my programs have explicitly asked
me to change the program so that it *does* continue after an ASCII 26
character. This is because text files sometimes get corrupted, so that
an EOF marker ends up somewhere inside them, where it was never intended
to be. I changed the program so that it reads the text files in binary
mode and replaces every ASCII 26 character with the string "<EOF>".
It's very robust now.
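The approach Juergen describes (read in binary mode, then make each ASCII 26 visible) could be sketched like this in Python; his program's actual language and code are not shown in the thread, and `EOF_BYTE`, `EOF_MARKER`, `mark_eof_bytes` and `read_marking_eof` are hypothetical names:

```python
EOF_BYTE = b"\x1a"     # ASCII 26, Ctrl-Z
EOF_MARKER = b"<EOF>"  # visible replacement text


def mark_eof_bytes(data):
    """Replace every ASCII 26 byte with the marker instead of stopping at it."""
    return data.replace(EOF_BYTE, EOF_MARKER)


def read_marking_eof(path):
    """Read the whole file in binary mode and mark any embedded EOF bytes."""
    with open(path, "rb") as f:
        return mark_eof_bytes(f.read())


print(mark_eof_bytes(b"line one\x1aline two"))  # b'line one<EOF>line two'
```

Because the whole file is read first, a stray marker in the middle no longer hides the text that follows it.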

> Sticking to this esoterica would seem to be making the contest about
> "who can best interpret logical loopholes in the rules" rather than
> best program a well-defined task.

Derek's rules stated clearly: "The file should only contain bytes in the
range #00 - #7F". This includes 26.

<snip>

Regards,
   Juergen

