1. Contest Update
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 07, 2004
- 549 views
- Last edited Nov 08, 2004
Six new submissions on web.

http://www.users.bigpond.com/ddparnell/contest1/rules.htm

--
Derek Parnell
Melbourne, Australia
2. Re: Contest Update
- Posted by Jason Gade <jaygade at yahoo.com> Nov 07, 2004
- 514 views
- Last edited Nov 08, 2004
Derek Parnell wrote:
>
> Six new submissions on web.
>
> http://www.users.bigpond.com/ddparnell/contest1/rules.htm
>
> --
> Derek Parnell
> Melbourne, Australia

Hi, Derek.

I'm kind of confused on the results as I'm not sure how to fix them.

My entry says wrong count and failed file6 (the empty file) but my program just prints a message and then exits. Should it still try to print some kind of count for a zero-length file or a file with no discernible words?

I also don't understand how it failed on file8 (War and Peace) since I've been testing this file (or one similar) all through the development.

Hmmm... Difficult to troubleshoot when I don't know what my program is doing wrong.

Any troubleshooting hints anyone?

-------------------------------------
Too many freaks, not enough circuses.
j.
3. Re: Contest Update
- Posted by Jason Gade <jaygade at yahoo.com> Nov 07, 2004
- 529 views
- Last edited Nov 08, 2004
Also I realize that coding style is a subjective category, but as I am a hobbyist and not a professional do you have any tips for improvement?

------------------------------------
Too many freaks, not enough circuses.
j.
4. Re: Contest Update
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 07, 2004
- 551 views
- Last edited Nov 08, 2004
Jason Gade wrote:
>
> Derek Parnell wrote:
> >
> > Six new submissions on web.
> >
> > http://www.users.bigpond.com/ddparnell/contest1/rules.htm
> >
> > --
> > Derek Parnell
> > Melbourne, Australia
>
> Hi, Derek.
>
> I'm kind of confused on the results as I'm not sure how to fix them.
>
> My entry says wrong count and failed file6 (the empty file) but my program
> just prints a message and then exits. Should it still try to print some kind
> of count for a zero-length file or a file with no discernible words?

This is my mistake. Currently, the program I wrote to evaluate the results only looks for 'counts' and formatting. For this file, I need to be a bit smarter. However, this only lost you one point out of 460, so it's not a big deal (yet).

> I also don't understand how it failed on file8 (War and Peace) since I've been
> testing this file (or one similar) all through the development.

Don't know either. I'm not with the machine I tested the submission on at the moment, so I can't inspect the detailed results. However, because you got all the top-used tokens correct, I'm going to guess that it's the 'funny' tokens you are having problems with. My copy of file8 has been doctored somewhat to include some odd-looking tokens.

Check again for tokens that might contain quotes and/or hyphens, especially at the start or end of a string. Also, strings just made up of quotes gave my program problems at first.

> Hmmm... Difficult to troubleshoot when I don't know what my program is doing
> wrong.
>
> Any troubleshooting hints anyone?

I'll give a hint that some people may have tripped up on.

A file opened as "text" will appear to prematurely end if it contains the End-Of-File marker for text files.

--
Derek Parnell
Melbourne, Australia
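The hint above can be illustrated with a short sketch. It is written in Python rather than the contest's Euphoria, and the helper names `read_all_bytes` and `legacy_text_view` are hypothetical. Whether a text-mode read really stops at byte 26 (Ctrl-Z) depends on the platform and runtime, so the second helper only emulates that behaviour as an assumption; it is not a description of how any particular runtime works.

```python
DOS_EOF = 0x1A  # Ctrl-Z, the legacy end-of-file marker for text files


def read_all_bytes(path):
    """Read the whole file in binary mode, so no byte is treated specially."""
    with open(path, "rb") as handle:
        return handle.read()


def legacy_text_view(data):
    """Emulate (as an assumption) a text-mode reader that stops at Ctrl-Z.

    Returns only the bytes that come before the first 0x1A, or all of
    the data when no such byte is present.
    """
    cut = data.find(bytes([DOS_EOF]))
    return data if cut == -1 else data[:cut]
```

Comparing `len(read_all_bytes(path))` with `len(legacy_text_view(...))` on a test file is one quick way to see whether an embedded marker could be hiding part of the input from a counter.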
5. Re: Contest Update
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 07, 2004
- 524 views
- Last edited Nov 08, 2004
Jason Gade wrote:
>
> Also I realize that coding style is a subjective category, but as I am a
> hobbyist and not a professional do you have any tips for improvement?

From memory, your coding style was pretty good. The contest results page shows a reduced style score because it failed some of the non-timed files.

If I may suggest a book and web page ...

http://www.cc2e.com/

This is the home page for Steve McConnell's "Code Complete" book.

--
Derek Parnell
Melbourne, Australia
6. Re: Contest Update
- Posted by Andy Serpa <ac at onehorseshy.com> Nov 07, 2004
- 525 views
- Last edited Nov 08, 2004
Derek Parnell wrote:
>
> I'll give a hint that some people may have tripped up on.
>
> A file opened as "text" will appear to prematurely end if it contains
> the End-Of-File marker for text files.

I suspected that might be the case. But penalizing the programmer for treating a file that is supposed to be text as text seems wrong. If the input file was not to be treated as text, then the rules should say that tokens contain the *bytes*:

{65,66,67, etc.}

instead of:

"ABC ... etc."

Making an assumption that one should continue after an EOF marker could be wrong if this was a "real-world" application. Sticking to this esoterica would seem to be making the contest about "who can best interpret logical loopholes in the rules" rather than best program a well-defined task.

Alternatively, simply put in a rule that says, "Input files should be opened in binary mode".

-- Andy
7. Re: Contest Update
- Posted by Jason Gade <jaygade at yahoo.com> Nov 07, 2004
- 516 views
- Last edited Nov 08, 2004
Derek Parnell wrote:
>
> Jason Gade wrote:
> > I also don't understand how it failed on file8 (War and Peace) since I've
> > been testing this file (or one similar) all through the development.
>
> Don't know either. I'm not with the machine I tested the submission on
> at the moment so I can't inspect the detailed results. However, because
> you got all the top-used tokens correct, I'm going to guess that it's the
> 'funny' tokens you are having problems with. My copy of file8 has been
> doctored somewhat to include some odd looking tokens.
>
> Check again for tokens that might contain quotes and/or hyphens,
> especially at the start or end of a string. Also, strings just made
> up of quotes gave my program problems at first.
>
> > Hmmm... Difficult to troubleshoot when I don't know what my program is doing
> > wrong.
> >
> > Any troubleshooting hints anyone?
>
> I'll give a hint that some people may have tripped up on.
>
> A file opened as "text" will appear to prematurely end if it contains
> the End-Of-File marker for text files.
>
> --
> Derek Parnell
> Melbourne, Australia

Okay. So in my testing I made a file that contained edge cases identified in the rules and they were counted correctly.

Also I do open the file in binary mode, so... hmm.

I may need to think of some new edge cases to test for.

Currently, the program follows these rules:

-- words consist of upper and lower case letters, digits 0-9, single quote and dash;
-- for the purposes of comparison, case does not matter and quotes are not counted;
-- words consisting of only digits, or digits and dashes, are not counted as words
-- unless they are quoted;
-- words of zero length after quotes are removed are not counted.

If I am interpreting the rules correctly, I will try to come up with a new (short) test file to validate with.

I wish now that I had saved the version of your web page that had your unique counts and total counts for each file posted -- at least then it would be easier to compare with. It kind of sucks that the calibration file works perfectly but the others do not!! ;^)

------------------------------------
Too many freaks, not enough circuses.
j.
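The four rules listed in the post above can be sketched as a small counter. This is Python, not the Euphoria used for the contest, and it follows only the summary given in the post; the official rules page may differ. The name `count_tokens` is hypothetical, and two choices are assumptions: any byte outside the token alphabet acts as a separator, and a token made only of dashes is treated like a digits-and-dashes token.

```python
import re
from collections import Counter

# Token alphabet from the post: letters, digits 0-9, single quote, dash.
TOKEN_RE = re.compile(rb"[A-Za-z0-9'-]+")
# Assumption: "only digits, or digits and dashes" also covers dash-only tokens.
NUMERIC_RE = re.compile(rb"[0-9-]+")


def count_tokens(data):
    """Count tokens in raw bytes according to the rules summarised above."""
    counts = Counter()
    for raw in TOKEN_RE.findall(data):
        # Case does not matter and quotes are not part of the compared text.
        key = raw.replace(b"'", b"").lower()
        if not key:
            continue  # nothing left once the quotes are removed
        if NUMERIC_RE.fullmatch(key) and b"'" not in raw:
            continue  # unquoted number-like token is not a word
        counts[key.decode("ascii")] += 1
    return counts
```

A short hand-made input such as `b"Don't dont '123' 123 ''"` exercises each rule once, which is the kind of small validation file the post describes.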
8. Re: Contest Update
- Posted by Patrick Barnes <mrtrick at gmail.com> Nov 07, 2004
- 519 views
- Last edited Nov 08, 2004
On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com> wrote:
> Making an assumption that one should continue after an EOF marker could be
> wrong if this was a "real-world" application. Sticking to this esoterica would
> seem to be making the contest about "who can best interpret logical loopholes in
> the rules" rather than best program a well-defined task.
>
> Alternatively, simply put in a rule that says, "Input files should be opened
> in binary mode".

I believe that this is covered in the programming style criteria: "Defensive coding that is tolerant of bad data." That's why files 6-11 are there - they contain a lot of border cases that may trip up programs less tolerant. You're not penalised anywhere near as much for making mistakes with these files as you are with the first 5.

--
MrTrick
9. Re: Contest Update
- Posted by cklester <cklester at yahoo.com> Nov 07, 2004
- 519 views
- Last edited Nov 08, 2004
Derek Parnell wrote:
>
> Six new submissions on web.

LOL! I take reconciliation that each attempt has at least been faster! DOH!!

-=ck
"Programming in a state of EUPHORIA."
http://www.cklester.com/euphoria/
10. Re: Contest Update
- Posted by Andy Serpa <ac at onehorseshy.com> Nov 07, 2004
- 506 views
- Last edited Nov 08, 2004
Patrick Barnes wrote:
>
> On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com> wrote:
> > Making an assumption that one should continue after an EOF marker could be
> > wrong if this was a "real-world" application. Sticking to this esoterica
> > would seem to be making the contest about "who can best interpret logical
> > loopholes in the rules" rather than best program a well-defined task.
> >
> > Alternatively, simply put in a rule that says, "Input files should be opened
> > in binary mode".
>
> I believe that this is covered in the programming style criteria:
> "Defensive coding that is tolerant of bad data." That's why files 6-11
> are there - they contain a lot of border cases that may trip up
> programs less tolerant. You're not penalised anywhere near as much for
> making mistakes with these files as you are with the first 5.

I understand bad data, but ignoring an EOF marker in a text file is making an assumption that I wouldn't necessarily consider correct. Some data is so "bad" that you can't expect the program to know what to do with it (unless that case is explicitly covered in the rules). Should I make guesses at what other "bad" bytes in the file are "supposed to be" and adjust my token counts accordingly?

I just don't think it is reasonable, just as if there was a rule that said, "Program must continue to perform while computer is set on fire."
11. Re: Contest Update
- Posted by Patrick Barnes <mrtrick at gmail.com> Nov 07, 2004
- 513 views
- Last edited Nov 08, 2004
On Sun, 07 Nov 2004 15:13:18 -0800, Andy Serpa <guest at rapideuphoria.com> wrote:

Hey, at least you're told what sort of file it is. Knowing that a file is random binary values will give you a bit of a clue...

--
MrTrick
12. Re: Contest Update
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 07, 2004
- 521 views
- Last edited Nov 08, 2004
Andy Serpa wrote:
>
> Patrick Barnes wrote:
> >
> > On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com> wrote:
> > > Making an assumption that one should continue after an EOF marker could be
> > > wrong if this was a "real-world" application. Sticking to this esoterica
> > > would seem to be making the contest about "who can best interpret logical
> > > loopholes in the rules" rather than best program a well-defined task.
> > >
> > > Alternatively, simply put in a rule that says, "Input files should be
> > > opened in binary mode".
> >
> > I believe that this is covered in the programming style criteria:
> > "Defensive coding that is tolerant of bad data." That's why files 6-11
> > are there - they contain a lot of border cases that may trip up
> > programs less tolerant. You're not penalised anywhere near as much for
> > making mistakes with these files as you are with the first 5.
>
> I understand bad data, but ignoring an EOF marker in a text file is making an
> assumption that I wouldn't necessarily consider correct. Some data is so "bad"
> that you can't expect the program to know what to do with it (unless that case
> is explicitly covered in the rules). Should I make guesses at what other "bad"
> bytes in the file are "supposed to be" and adjust my token counts accordingly?
>
> I just don't think it is reasonable, just as if there was a rule that said,
> "Program must continue to perform while computer is set on fire."

If I may weigh in here, the EOF marker is not bad data. It is perfectly allowable in text files and *any* program that reads a file as a text file ought to take it into consideration, as you have said. However, who said they were going to be text files?

The confusion might come about because it was assumed that the test files would be 'text' files. There is no statement in the rules that this would be the case. All it says, and I quote rule 13, ...

"The file should only contain bytes in the range #00 - #7F, and you can consider those to be ASCII characters."

A better program would most likely validate the input rather than assuming it to be correct.

I'm sorry I didn't make it too easy for people, but this specification is still a whole lot better than most specifications received from a client. So yes, I could have told you how to open the file, how to hash the tokens, how to store the lists of tokens, how to sort or order the tokens, how to ... etc... but I didn't. I wanted you to work out some of the 'traps' that might be there.

It's a game more than a contest, so one needs a bit of an interesting challenge.

If anyone wants to withdraw, just let me know.

--
Derek Parnell
Melbourne, Australia
13. Re: Contest Update
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 07, 2004
- 531 views
- Last edited Nov 08, 2004
Jason Gade wrote:

[snip]

> Okay. So in my testing I made a file that contained edge cases identified in
> the rules and they were counted correctly.
>
> Also I do open the file in binary mode, so... hmm.
>
> I may need to think of some new edge cases to test for.
>
> Currently, the program follows these rules:
>
> -- words consist of upper and lower case letters, digits 0-9, single quote
> and dash;
> -- for the purposes of comparison, case does not matter and quotes are not
> counted;
> -- words consisting of only digits, or digits and dashes, are not counted
> as words
> -- unless they are quoted;
> -- words of zero length after quotes are removed are not counted.
>
> If I am interpreting the rules correctly, I will try to come up with a new
> (short) test file to validate with.

That summary is pretty good. It seems you understand the 'token' idea.

> I wish now that I had saved the version of your web page that had your unique
> counts and total counts for each file posted -- at least then it would be
> easier to compare with. It kind of sucks that the calibration file works
> perfectly but the others do not!! ;^)

Agreed. My first attempt worked perfectly well, and fast, with the calibration file. In fact, I thought I had everything under control so that became my 'frozen' code. Then people started submitting their efforts and they were consistently getting different counts to my program. After a few, I realized that these people were agreeing with each other but not my results. Then I got it! My program had a bug (or two). My program failed all the other test files, even though it breezed through the first file. So how stupid do I look! LOL!

Tonight when I get home, I'll have a closer look at your results to see if I can find any clues for you.

--
Derek Parnell
Melbourne, Australia
14. Re: Contest Update
- Posted by Andy Serpa <ac at onehorseshy.com> Nov 08, 2004
- 514 views
Derek Parnell wrote:
>
> If I may weigh in here, the EOF marker is not bad data. It is perfectly
> allowable in text files and *any* program that reads a file as a text file
> ought to take it into consideration, as you have said. However, who said
> they were going to be text files?
>
> The confusion might come about because it was assumed that the test files
> would be 'text' files. There is no statement in the rules that this would
> be the case. All it says, and I quote rule 13, ...
>
> "The file should only contain bytes in the range #00 - #7F, and you can
> consider those to be ASCII characters."
>
> A better program would most likely validate the input rather than assuming
> it to be correct.
>
> I'm sorry I didn't make it too easy for people, but this specification is
> still a whole lot better than most specifications received from a client.
> So yes, I could have told you how to open the file, how to hash the tokens,
> how to store the lists of tokens, how to sort or order the tokens,
> how to ... etc... but I didn't. I wanted you to work out some of the
> 'traps' that might be there.
>
> It's a game more than a contest, so one needs a bit of an interesting
> challenge.
>
> If anyone wants to withdraw, just let me know.

I just figured you wanted this to be a programming challenge rather than a "rule-reading" challenge. I would submit that your referring to the tokens as "characters" rather than "bytes" (at least in some places) would allow one to assume that the inputs are *supposed* to be text files. "Characters" only exist in text, at least in a Euphoria context (maybe if this were C, where a character might be assumed to be any single byte). I also think making assumptions about the intentions of the creator of the input file (maybe the EOF is supposed to be there, and you are NOT intended to read past it) is questionable.

If the scope of the contest is to include the possibility of an ambiguous program specification where the true wants of the (hypothetical) "client" are therefore unknown, then my interpretation is just as valid as yours, and the programmer should only be penalized if his program handles the same (ambiguous) situation inconsistently with different input files. Or, alternatively, as I said before, if you (as the "client") remove such ambiguousness from the rules/specification.
15. Re: Contest Update
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 08, 2004
- 509 views
Andy Serpa wrote:
>
> Derek Parnell wrote:
> >
> > If I may weigh in here, the EOF marker is not bad data. It is perfectly
> > allowable in text files and *any* program that reads a file as a text file
> > ought to take it into consideration, as you have said. However, who said
> > they were going to be text files?
> >
> > The confusion might come about because it was assumed that the test files
> > would be 'text' files. There is no statement in the rules that this would
> > be the case. All it says, and I quote rule 13, ...
> >
> > "The file should only contain bytes in the range #00 - #7F, and you can
> > consider those to be ASCII characters."
> >
> > A better program would most likely validate the input rather than assuming
> > it to be correct.
> >
> > I'm sorry I didn't make it too easy for people, but this specification is
> > still a whole lot better than most specifications received from a client.
> > So yes, I could have told you how to open the file, how to hash the tokens,
> > how to store the lists of tokens, how to sort or order the tokens,
> > how to ... etc... but I didn't. I wanted you to work out some of the
> > 'traps' that might be there.
> >
> > It's a game more than a contest, so one needs a bit of an interesting
> > challenge.
> >
> > If anyone wants to withdraw, just let me know.
>
> I just figured you wanted this to be a programming challenge rather than a
> "rule-reading" challenge.

Yes I did. But I submit that 'programming' is more than 'coding'. It also includes understanding the specs amongst other things.

> I would submit that your referring to the tokens as "characters" rather
> than "bytes" (at least in some places) would allow one to assume that the
> inputs are *supposed* to be text files.

Yes again. That was one of the 'traps' I included. I refer to 'token text' and characters. The idea behind this was so that one might make the assumption that the file itself was a true ASCII text file, rather than a file that contained ASCII text tokens. However, it is also quite possible that some might not make this assumption, and many people did not.

> "Characters" only exist in text, at least in a Euphoria
> context (maybe if this were C, where a character might be assumed to be any
> single byte).

A 'binary' can contain text though, no? Have a look at your typical .EXE file and while mostly it is not text, you can see text embedded in it.

> I also think making assumptions about the intentions of the creator of the
> input file (maybe the EOF is supposed to be there, and you are NOT intended to
> read past it) is questionable.

Yes, you are right again. So did anyone question this before the contest started? Time was made available for clarifying the specification. This point was not brought up, so I didn't clarify it. Didn't want to make it a perfect spec, did I?

> If the scope of the contest is to include the possibility
> of an ambiguous program specification where the true wants of the
> (hypothetical) "client" are therefore unknown, then my interpretation is just
> as valid as yours, and the programmer should only be penalized if his program
> handles the same (ambiguous) situation inconsistently with different input
> files.

So, you want to withdraw then? Want your money back? No problems.

Look, a number of people tripped on this one, and a few even fixed it up themselves. But as I could see that some others were having an issue with it, I tried to help with a 'hint'. It's a game. If I was a real client I would have mentioned this much, much earlier. That's what prototypes are good at doing - defining the real specification.

> Or, alternatively, as I said before, if you (as the "client") remove such
> ambiguousness from the rules/specification.

Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?

--
Derek Parnell
Melbourne, Australia
16. Re: Contest Update
- Posted by Derek Parnell <ddparnell at bigpond.com> Nov 08, 2004
- 509 views
Derek Parnell wrote:

[snip]

> Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?

Done. The rules page is now clearer (I hope) on this point.

By the way, on another similar issue. The specs say that the output should list all the token-length frequencies up to the largest one found in the file. So what should be output if you get a zero-count for a length less than the largest one? Some people have displayed a zero count and others have omitted the line altogether. I've been lenient and allowed both interpretations.

http://www.users.bigpond.com/ddparnell/contest1/rules.htm

--
Derek Parnell
Melbourne, Australia
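The two accepted interpretations of the token-length listing can be sketched side by side. This is a Python illustration, not contest code; `length_frequencies` and its `include_zero` flag are hypothetical names, and the sketch simply counts whatever sequence of tokens it is given, since the post does not say whether the frequencies are over unique tokens or all occurrences.

```python
from collections import Counter


def length_frequencies(tokens, include_zero=True):
    """Return (length, count) rows for lengths 1 up to the longest token.

    With include_zero=True, lengths that never occur get an explicit 0 row;
    with include_zero=False those rows are omitted altogether.
    """
    lengths = Counter(len(token) for token in tokens)
    if not lengths:
        return []  # no tokens at all, so there is no largest length
    rows = []
    for length in range(1, max(lengths) + 1):
        count = lengths.get(length, 0)
        if count or include_zero:
            rows.append((length, count))
    return rows
```

For tokens of lengths 1, 2, 2 and 4, the first interpretation emits a row for length 3 with a count of 0 and the second skips that row, which is exactly the difference described in the post.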
17. Re: Contest Update
- Posted by Andy Serpa <ac at onehorseshy.com> Nov 08, 2004
- 522 views
Derek Parnell wrote:
>
> Look, a number of people tripped on this one, and few even fixed it
> up themselves. But as I could see that some others were having an issue
> with it, I tried to help with a 'hint'. It's a game. If I was a real
> client I would have mentioned this much, much earlier. That's what
> prototypes are good at doing - defining the real specification.

I was actually arguing that I was *not* tripped up -- that I in fact did right according to the rules but that the rules were wrong for what you apparently wanted. Arrogant, I know.

> > Or, alternatively, as I said before, if you (as the "client") remove such
> > ambiguousness from the rules/specification.
>
> Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?

No need to get upset. If you were a real "client", of course I'd be asking you questions. You'd even be allowed to answer them...
18. Re: Contest Update
- Posted by CoJaBo <cojabo at suscom.net> Nov 08, 2004
- 542 views
Andy Serpa wrote:
>
> Patrick Barnes wrote:
> >
> > On Sun, 07 Nov 2004 14:50:20 -0800, Andy Serpa <guest at rapideuphoria.com> wrote:
> > > Making an assumption that one should continue after an EOF marker could be
> > > wrong if this was a "real-world" application. Sticking to this esoterica
> > > would seem to be making the contest about "who can best interpret logical
> > > loopholes in the rules" rather than best program a well-defined task.
> > >
> > > Alternatively, simply put in a rule that says, "Input files should be
> > > opened in binary mode".
> >
> > I believe that this is covered in the programming style criteria:
> > "Defensive coding that is tolerant of bad data." That's why files 6-11
> > are there - they contain a lot of border cases that may trip up
> > programs less tolerant. You're not penalised anywhere near as much for
> > making mistakes with these files as you are with the first 5.
>
> I understand bad data, but ignoring an EOF marker in a text file is making an
> assumption

That is one reason I always open files in binary mode.

> that I wouldn't necessarily consider correct. Some data is so "bad" that you
> can't expect the program to know what to do with it (unless that case is
> explicitly covered in the rules). Should I make guesses at what other "bad"
> bytes in the file are "supposed to be" and adjust my token counts accordingly?
>
> I just don't think it is reasonable, just as if there was a rule that said,
> "Program must continue to perform while computer is set on fire."

If anyone finds a program that can run on a burnt computer, I'd like to have a copy to use on my old laptop...
19. Re: Contest Update
- Posted by "Juergen Luethje" <j.lue at gmx.de> Nov 08, 2004
- 527 views
Derek Parnell wrote:

> Derek Parnell wrote:
>
> [snip]
>
>> Okay, seeing the 'trap' has been sprung, I'll clarify the rules. Happy?
>
> Done. The rules page is now clearer (I hope) on this point.
>
> By the way, on another similar issue. The specs say that the output
> should list all the token-length frequencies up to the largest one
> found in the file. So what should be output if you get a zero-count
> for a length less than the largest one?

Strictly speaking: Zero. Zero is not nothing. Zero is a count as valid as any other count.

> Some people have displayed
> a zero count and others have omitted the line altogether. I've been
> lenient and allowed both interpretations.
>
> http://www.users.bigpond.com/ddparnell/contest1/rules.htm

Less strictly speaking: I think this is appropriate.

Regards,
Juergen
20. Re: Contest Update
- Posted by "Juergen Luethje" <j.lue at gmx.de> Nov 08, 2004
- 546 views
Andy Serpa wrote:

> Derek Parnell wrote:
>>
>> I'll give a hint that some people may have tripped up on.
>>
>> A file opened as "text" will appear to prematurely end if it contains
>> the End-Of-File marker for text files.
>
> I suspected that might be the case. But penalizing the programmer for
> treating a file that is supposed to be text as text seems wrong. If
> the input file was not to be treated as text, then the rules should say
> that tokens contain the *bytes*:
>
> {65,66,67, etc.}
>
> instead of:
>
> "ABC ... etc."
>
> Making an assumption that one should continue after an EOF marker could
> be wrong if this was a "real-world" application.

Maybe in a certain context. However, AFAIK most modern programs use the size of a file in order to detect its end, rather than the occurrence of the ASCII 26 character.

In another context, continuing after an ASCII 26 character might be absolutely reasonable. Users of one of my programs have explicitly asked me to change the program so that it *does* continue after an ASCII 26 character. This is because text files sometimes get corrupted, so that an EOF marker ends up somewhere inside the file, where it was never intended to be. I changed the program so that it reads the text files in binary mode and replaces every ASCII 26 character with the string "<EOF>". It's very robust now.

> Sticking to this esoterica would seem to be making the contest about
> "who can best interpret logical loopholes in the rules" rather than
> best program a well-defined task.

Derek's rules stated clearly: "The file should only contain bytes in the range #00 - #7F". This includes 26.

<snip>

Regards,
Juergen
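Two points from the post above can be sketched together: replacing an embedded ASCII 26 with a visible marker after a binary-mode read, and checking input against the quoted #00 - #7F range, which accepts byte 26. This is a Python illustration only; the function names are hypothetical and the code is not taken from the program described in the post.

```python
EOF_BYTE = b"\x1a"  # ASCII 26, the legacy text-file end-of-file marker


def mark_embedded_eof(data, marker=b"<EOF>"):
    """Replace every ASCII 26 byte with a visible marker string."""
    return data.replace(EOF_BYTE, marker)


def find_out_of_range(data):
    """Return (offset, value) pairs for bytes outside the range #00 - #7F."""
    return [(offset, value) for offset, value in enumerate(data) if value > 0x7F]
```

For a token counter, substituting "<EOF>" would add a token of its own, so treating byte 26 as an ordinary separator may be the better fit there; the marker approach suits a program whose output is meant to be read by a person.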