error detection : was Re: Robert...EDS - questons/comments
- Posted by Kat <gertie at PELL.NET> Feb 13, 2001
Ok, consider the following sentence, which combines a few errors i have seen on IRC in the last few days:

"i dent no there car was int he shop, i guest the engine tailed?"

Every word is spelled correctly and is in the dictionary ("int" being a legal abbreviation), but 7 of the 14 words are incorrect! Some parsers would also fault "i" because it isn't capitalised, which flags 9 of the 14 words, an error rate of 64% for that one sentence. Granted, it is a compilation of errors that actually occurred across several sentences, but i wanted to make the point: you cannot count on anything being correct. I have seen sentence fragments that even *i* could not understand, with every word misspelled. Toss in the possibility of a person who doesn't use English as a native language mixing in words from another language, some syntax errors in both languages, and possible attempts at humor by playing on words, and anything goes!

Even at only one error per sentence, in some rooms the text flows so fast that it is difficult for a human to read. The program must be able to read at least as fast as a human in order to be usable in the channel; twice as fast would be better. At the moment, i will have to consider simply ignoring some text because the program cannot understand it, but processing power will still be wasted discovering that the text is too riddled with errors to be understood.

I have pestered a couple of people about linking computers online to distribute the parsing work, but no one is interested. AI listservs are folding faster than they are popping up, and practical natural language parser listservs are folding even faster.

Oh well <sigh>....

Kat
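
To make the arithmetic concrete, here is a minimal Python sketch that counts the errors in the example sentence and stops scanning as soon as a line is clearly past an error budget. Everything in it is an assumption for illustration: the hand-written `WRONG_IN_CONTEXT` set stands in for a real context-aware checker (which is the hard part, since every word passes a dictionary lookup), and the names `tokenize`, `error_rate`, `worth_parsing` and the 50% threshold are hypothetical, not part of any existing parser.

```python
import re

SENTENCE = "i dent no there car was int he shop, i guest the engine tailed?"

# Hypothetical stand-in for a context-aware checker: the words that are
# wrong *in this sentence*, even though each one is a valid dictionary word.
WRONG_IN_CONTEXT = {"dent", "no", "there", "int", "he", "guest", "tailed"}


def tokenize(text):
    """Split text into words, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", text)


def is_error(word, flag_lowercase_i=False):
    """Return True if the word counts as an error under the chosen rules."""
    if flag_lowercase_i and word == "i":
        return True
    return word in WRONG_IN_CONTEXT


def error_rate(words, flag_lowercase_i=False):
    """Fraction of words flagged as errors."""
    errors = sum(1 for w in words if is_error(w, flag_lowercase_i))
    return errors / len(words)


def worth_parsing(words, max_rate=0.5, flag_lowercase_i=False):
    """Scan left to right and give up as soon as the error count alone
    guarantees the final rate will exceed max_rate.

    Returns (keep, words_checked) so the saved work is visible.
    """
    limit = max_rate * len(words)
    errors = 0
    for checked, word in enumerate(words, start=1):
        if is_error(word, flag_lowercase_i):
            errors += 1
            if errors > limit:
                return False, checked
    return True, len(words)


words = tokenize(SENTENCE)
print(len(words))                                   # 14 words
print(error_rate(words))                            # 7 of 14
print(round(100 * error_rate(words, True)))         # 9 of 14, as a percentage
print(worth_parsing(words, flag_lowercase_i=True))  # gives up before the end
```

The early exit only saves the tail of a hopeless line; the per-word check itself would still dominate the cost in a real parser, so this is a sketch of the triage idea rather than a fix for the speed problem.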