1. Re: [OT]Where's everyone from? Contest
- Posted by "Kat" <gertie at visionsix.com> Aug 08, 2004
- 410 views
On 8 Aug 2004, at 12:00, EUforum at topica.com wrote:

> On 7 Aug 2004, at 7:32, irv mullins wrote:
>
> > posted by: irv mullins <irvm at ellijay.com>
> >
> > irv mullins wrote:
> >
> > > Perhaps the program would be more of a challenge, and more useful,
> > > if it could access an existing database on the web. That way, it
> > > could immediately be used by others. For example, a business could
> > > visualize their customer base, a web-ring could chart their members,
> > > etc. A well-done program like that would certainly get some attention
> > > on SourceForge or Slashdot, etc. Perhaps even a writeup in a magazine,
> > > which would be good for Euphoria (and RDS).
> >
> > Replying to my own message, http://worldatlas.com/aatlas/imageg.htm
> > has the needed info for almost everywhere in the world. Writing code
> > to access that site and extract the needed info would be an interesting
> > task.
>
> It would be as trivial as mining APOD (because of the pic links) and
> pantheon.org (for the data), which i have done.

And as trivial as what i have been doing for a month now. On one topic of interest, started a month ago, grabbing several hundred megabytes of web pages (64megs from one domain, and i just copied over 4 full zip disks for someone on another topic), using 3 different remote proxies (for assorted reasons), and running at 100% cpu 24-7 the last 2 weeks doing the data extraction. I am at about 65 megabytes of *extracted/munged* data from *one* of 5 domains now, and am only on 'E'. One list of urls alone is 32 megabytes (300,000+ urls), another is 25 megabytes (245,000+ urls). Two other url lists look like they will grow past those sizes.

Probably be just another case of "oh you have all that data, but i need only one line of it, and since you have it already, i will go look for it myself.". Data mining is trivial, even using an existing online source in real time.
(Tiggr used Babblefish about 1998-99, but the results were often too weird to understand, and lag was horrible.) The catch is that people think your bot is broken because of internet lag (the Mars roving bots get better bandwidth from Mars to Earth than i can get), and they disparage it frequently and deeply, and then they begin abusing it.

Kat