1. sequences and data manipulation

Is there any book on data manipulation? I mean something deeper. I would like to get my hands on something like that.


2. Re: sequences and data manipulation

mexzony said...

Is there any book on data manipulation? I mean something deeper. I would like to get my hands on something like that.

Deeper than WHAT? You will likely have to be a little more specific about what you want, particularly with regard to what you want to DO.


3. Re: sequences and data manipulation

DanM said...
mexzony said...

Is there any book on data manipulation? I mean something deeper. I would like to get my hands on something like that.

Deeper than WHAT? You will likely have to be a little more specific about what you want, particularly with regard to what you want to DO.


If you only want to toss tokens around, look at the strtok lib I wrote years ago (oops, no, you need strtok v4, which remains uselessly unreleased). If you want to store and sort tokens, look at Jiri Babor's "associated lists" code, especially his later works. If you want to do basic data extraction, spend a lifetime studying all the different regex versions. If you want to do real data extraction, google "data mining". If you want your program to have a clue what the data you just mined means, see if you understand what the content of the following URLs means.

http://en.wikipedia.org/wiki/Type-token_distinction
http://en.wikipedia.org/wiki/Semantic_spectrum
http://en.wikipedia.org/wiki/Knowledge_representation
http://en.wikipedia.org/wiki/Strong_AI

You may also want to study http://en.wikipedia.org/wiki/Punctuation as it relates to being better understood online.
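For anyone wondering what "tossing tokens around" and "storing and sorting tokens" look like in practice, here is a minimal sketch. It is written in Python rather than Euphoria, and it is not the strtok API or the "associated lists" code mentioned above: the names `tokenize` and `count_tokens` are made up for illustration, and the regex is just one simple assumption about what counts as a token.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    # Assumption: a token is any run of a-z, 0-9 or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def count_tokens(text):
    """Store the tokens in a Counter, which maps each token to its frequency."""
    return Counter(tokenize(text))


if __name__ == "__main__":
    sample = "Deeper than WHAT? Deeper than the manual."
    # most_common() returns (token, count) pairs sorted by descending count.
    for token, count in count_tokens(sample).most_common():
        print(token, count)
```

A dictionary keyed by token is roughly the idea behind an associative list: look up by token, then sort by whatever value you stored.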

PS (added after first post)
David Cuny wrote a basic difference engine include at my request some 10 years ago that I find nearly indispensable for understanding natural language. His code runs as submitted, but there are a lot of features one can add to it. To get around the four bytes (plus some) per ASCII char, see the C-strings in mixedlib by Bernie Ryan, written 10 years ago (others also wrote some, and it might be in win32lib). If you are deluded by "markov generation", someone wrote one in Eu years ago. If you want to consider NLP as a series of executable code lines, try pestering Matt to put his exec()/eval() (from OOEU) into Eu v4 (go ahead, try pestering him, I have been asking for this function in Eu since the 1990s). I also find my revisions of Soundex (almost a joke), Metaphone (much better), and the unreleased Tiggrphone (very versatile) to be essential for figuring out wtf some human just typoed. For scouring the internet to find data, look at the basics of news.ex (not the new version, the old code I wrote that was thrown out, even though it worked faster than anyone else's), or use http://www.gnu.org/software/wget/ .
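To show how a phonetic code helps with typos, here is a minimal sketch of the standard American Soundex rules. It is in Python rather than Euphoria, and it is the textbook algorithm, not the revised Soundex, Metaphone, or Tiggrphone code mentioned above; the function name `soundex` and the handling of non-ASCII input are assumptions made for the example.

```python
# Letter groups for digits 1..6 in standard American Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code of word, or "" if it has no letters."""
    # Assumption: anything that is not an ASCII letter is simply ignored.
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel sets prev to "", so a repeat across a vowel is kept
    return (result + "000")[:4]  # pad with zeros, then cut to four characters


if __name__ == "__main__":
    for name in ["Robert", "Rupert", "Ashcraft", "Tymczak"]:
        print(name, soundex(name))
```

Two spellings that produce the same code are candidates for being the same word, which is the whole trick. It is also why plain Soundex is weak: it only knows English consonant groups and throws away everything after the fourth character.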

As always now,
by popular demand,
i remain,

useless
