1. Web scraping
- Posted by Craig Welch <craig at singmail.com> Oct 17, 2005
Has anyone written a web scraper in any form?
2. Re: Web scraping
- Posted by Michael J. Sabal <m_sabal at yahoo.com> Oct 17, 2005
Craig Welch wrote:
> Has anyone written a web scraper in any form?

I'm not sure what you mean by a "web scraper", but I have a primitive version of a library I started for use with my natural language project.

http://cvs.sourceforge.net/viewcvs.py/teknik/components-english/webtools.eu

The project is GPL, but the routines in this library are so basic I don't think the license really applies. I'm sure others in the community have done things a little closer to what you're looking for, but hopefully this will give you a push in the right direction. Oh, did I mention it was written for Linux?

Michael J. Sabal
Project page: https://sourceforge.net/projects/teknik
3. Re: Web scraping
- Posted by Craig Welch <craig at singmail.com> Oct 18, 2005
Michael J. Sabal wrote:
> I'm not sure what you mean by a "web scraper", but I have a primitive version
> of a library I started for use with my natural language project.
>
> http://cvs.sourceforge.net/viewcvs.py/teknik/components-english/webtools.eu

Thanks, those routines are most useful for me.

From Wikipedia: "Screen scraping is the act of capturing data from a system or program by capturing and interpreting the contents of some display that is not actually intended for data transport or inspection by programs".

From the website of a commercial web-scraping product:

You might use our technology and services to:
* Extract product information from an e-commerce web site, then download it to a spreadsheet
* Build a meta-search engine that queries multiple search engines simultaneously in real-time
* Generate an RSS feed from a company intranet announcements page
* Integrate multiple web-based applications into a single interface
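To make the definition concrete, here is a minimal sketch of the first use case in that list: pulling product data out of a page meant for human readers and writing it to a spreadsheet-friendly CSV. It is written in Python with only the standard library, not in the language of the library linked earlier, and it parses a hard-coded sample page instead of fetching a live one. The markup (`<span class="name">` / `<span class="price">`) and the names `ProductScraper`, `scrape_to_csv` and `SAMPLE_PAGE` are hypothetical; a real site's HTML would have to be inspected first.

```python
import csv
import io
from html.parser import HTMLParser


class ProductScraper(HTMLParser):
    """Collect (name, price) pairs from <span class="name"> / <span class="price">.

    The class names are a hypothetical page layout, not a real site's markup.
    """

    def __init__(self):
        super().__init__()
        self.products = []   # finished (name, price) rows
        self._field = None   # "name" or "price" while inside a matching <span>
        self._row = {}       # fields gathered so far for the current product

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Ignore text that is not inside one of the spans we care about.
        if self._field is None:
            return
        self._row[self._field] = data.strip()
        # Emit a row once both fields for a product have been seen.
        if "name" in self._row and "price" in self._row:
            self.products.append((self._row["name"], self._row["price"]))
            self._row = {}


def scrape_to_csv(html):
    """Parse the HTML and return the extracted rows as CSV text."""
    parser = ProductScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["name", "price"])
    writer.writerows(parser.products)
    return out.getvalue()


# Hypothetical sample page standing in for a downloaded e-commerce listing.
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">24.50</span></li>
</ul>
"""

if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_PAGE), end="")
```

Fetching the page (for example with `urllib.request.urlopen`) is deliberately left out so the sketch runs offline; real pages usually need more defensive parsing than this.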