error detection : was Re: Robert...EDS - questions/comments


Ok, consider the following sentence, which combines a few errors i have seen on
IRC in the last few days:

"i dent no there car was int he shop, i guest the engine tailed?"

Every word is spelled right and is in the dictionary ("int" being a legal
abbreviation), but 7 of the 14 words are incorrect! And some parsers would fault
"i" because it isn't capitalised, which flags both occurrences and brings the
error rate to 9 of 14, or 64%, for that one sentence. Granted, it is a
compilation of several errors which actually took place over several sentences,
but i wanted to make the point: you cannot count on anything being correct. I
have seen sentences *i* could not understand, with every word in a sentence
fragment being misspelled. Toss in the possibility that you have a person who
doesn't use English as a native language and has mixed in words from another
language, some syntax errors in both languages, and possible attempts at humor
by playing on words, and anything goes! Even if you have only one error per
sentence, in some rooms the text flows so fast that it is difficult for a human
to read. The program must be able to read at least as fast as a human in order
to be usable to the channel; twice as fast would be better.
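
The counts for the sample sentence can be checked with a few lines of Python.
This is only a sketch: the `intended` table is a hand-labelled guess at what the
sentence was meant to say ("i didn't know their car was in the shop, i guess the
engine failed?"), not the output of any parser, and all names in it are made up
for illustration.

```python
import re

sentence = "i dent no there car was int he shop, i guest the engine tailed?"

# Split into words, dropping punctuation.
words = re.findall(r"[A-Za-z']+", sentence)

# Hand-labelled assumption: each key is a correctly spelled but wrong word,
# each value is the word the writer probably meant.
intended = {
    "dent": "didn't",
    "no": "know",
    "there": "their",
    "int": "in",
    "he": "the",
    "guest": "guess",
    "tailed": "failed",
}

wrong_words = [w for w in words if w in intended]
lowercase_i = words.count("i")  # what a strict parser would also fault

error_rate = (len(wrong_words) + lowercase_i) / len(words)

print(len(words), len(wrong_words), lowercase_i)  # 14 7 2
print(round(error_rate * 100))                    # 64
```

Note that a plain dictionary lookup finds none of the seven wrong words, since
every one of them is a real word; only the table of intended readings does.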

At the moment, i will have to consider simply ignoring some text because the
program cannot understand it, but processing power will still be wasted
discovering that it is too riddled with errors before reaching that conclusion.
I have pestered a couple of people about linking computers online to distribute
the parsing work, but no one is interested. AI listservs are folding faster than
they are popping up, and practical natural language parser listservs are folding
even faster. Oh well <sigh>....
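
One way to limit the processing wasted on hopeless lines is to give up as soon
as a line has used up an error budget, rather than examining every word first.
The sketch below is an assumption about how that could look, not a description
of the program discussed here: `too_garbled`, `is_suspect`, `max_suspect` and
the toy `KNOWN` vocabulary are all hypothetical. A vocabulary test like this one
would also miss real-word errors such as "dent" for "didn't", so a real
`is_suspect` would need to be smarter.

```python
from typing import Callable, Iterable, List


def too_garbled(words: Iterable[str],
                is_suspect: Callable[[str], bool],
                max_suspect: int) -> bool:
    """Return True as soon as more than max_suspect words look wrong."""
    suspect = 0
    for word in words:
        if is_suspect(word):
            suspect += 1
            if suspect > max_suspect:
                return True  # stop early; the remaining words are never examined
    return False


# Toy vocabulary, purely illustrative.
KNOWN = {"the", "car", "was", "in", "shop", "and", "engine", "failed"}


def not_in_vocab(word: str) -> bool:
    return word.lower() not in KNOWN


# Wrap the check so we can see how many words were actually examined.
checked: List[str] = []


def counting(word: str) -> bool:
    checked.append(word)
    return not_in_vocab(word)


line = "teh carr wuz inn teh shoppe and the engine failed".split()
print(too_garbled(line, counting, max_suspect=2))  # True
print(len(checked))                                # 3
```

Here the line is rejected after three of its ten words, so the cost of
discarding bad text grows with the error budget rather than with line length.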

Kat
