1. Web scraping

Has anyone written a web scraper in any form?


2. Re: Web scraping

Craig Welch wrote:
> Has anyone written a web scraper in any form?

I'm not sure what you mean by a "web scraper", but I have a primitive version
of a library I started for use with my natural language project.  

http://cvs.sourceforge.net/viewcvs.py/teknik/components-english/webtools.eu

The project is GPL, but the routines in this library are so basic I don't 
think it applies.  I'm sure others in the community have done things that
are a little closer to what you're looking for, but hopefully this will
give you a push in the right direction.  

Oh, did I mention it was written for Linux?

Michael J. Sabal

Project page:
https://sourceforge.net/projects/teknik


3. Re: Web scraping

Michael J. Sabal wrote:

>I'm not sure what you mean by a "web scraper", but I have a primitive version
>of a library I started for use with my natural language project.  
>
>http://cvs.sourceforge.net/viewcvs.py/teknik/components-english/webtools.eu

Thanks, those routines are most useful for me.

From Wikipedia: "Screen scraping is the act of capturing data from a
system or program by capturing and interpreting the contents of some
display that is not actually intended for data transport or inspection
by programs".

From the website of a commercial web-scraping product: You might use
our technology and services to:

    * Extract product information from an e-commerce web site, then
      download it to a spreadsheet
    * Build a meta-search engine that queries multiple search engines
      simultaneously in real-time
    * Generate an RSS feed from a company intranet announcements page
    * Integrate multiple web-based applications into a single interface

-- 
Craig
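
For anyone wondering what the first use case in the list above (extract product information, then save it to a spreadsheet) might look like in practice, here is a minimal sketch. It is in Python rather than Euphoria and is not based on the webtools.eu routines, whose contents are not shown in this thread. Everything specific in it is an assumption made for illustration: the sample markup, the "product", "name" and "price" class names, and the ProductParser and scrape_to_csv names are all hypothetical. A real scraper would fetch the page over HTTP instead of using an inline string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this instead.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect the text of <span class="name"> and <span class="price">
    elements found inside <li class="product"> items (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished [name, price] rows
        self._field = None   # field currently being read, if any
        self._current = {}   # fields gathered for the current product

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self._current = {}
        elif tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that appears inside a span we care about.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.rows.append([self._current.get("name", ""),
                              self._current.get("price", "")])
            self._current = {}


def scrape_to_csv(html):
    """Parse the page and return the products as CSV text."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "price"])
    writer.writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    print(scrape_to_csv(SAMPLE_HTML), end="")
```

The CSV text can be written to a file and opened in any spreadsheet program. The parser depends entirely on the page's markup, which is the usual weakness of scraping: if the site changes its layout, the class names checked here would need to change with it.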
