1. Python vs Euphoria for web scraping
- Posted by euphoric (admin) Jul 30, 2019
- 1685 views
This article says Python is the "ideal language for this job." I suspect Euphoria or Phix could be better.
Anyway, it would be cool to see a Euphoria or Phix implementation.
2. Re: Python vs Euphoria for web scraping
- Posted by ghaberek (admin) Jul 31, 2019
- 1683 views
I always tout Euphoria as a great language for text parsing! If someone could port Python's built-in HTMLParser to Euphoria, it might be easier to lure developers with that.
And if we could port Beautiful Soup, that'd be even more impressive. (It uses a lot of Python-isms that may not translate well to Euphoria.)
-Greg
3. Re: Python vs Euphoria for web scraping
- Posted by euphoric (admin) Jul 31, 2019
- 1677 views
There seem to be a ton of HTML parser libraries written in C. Would it be cheating to use one of those?
HTMLTidy seems very capable!
4. Re: Python vs Euphoria for web scraping
- Posted by ChrisB (moderator) Jul 31, 2019
- 1608 views
Hi
Some people call it cheating, some people call it using the expired patent on the wheel.
Cheers
Chris
5. Re: Python vs Euphoria for web scraping
- Posted by ghaberek (admin) Jul 31, 2019
- 1610 views
> Would it be cheating to use one of those?
I would say that if you're designing an application which needs the functionality, using an external library is perfectly fine.
But, if you're trying to show off Euphoria's strengths over other languages, it's probably best to write it natively in Euphoria.
-Greg
6. Re: Python vs Euphoria for web scraping
- Posted by petelomax Jul 31, 2019
- 1619 views
Been there, done that:
https://rosettacode.org/wiki/Web_scraping#Phix
https://rosettacode.org/wiki/Rosetta_Code/Rank_languages_by_popularity#Phix
https://rosettacode.org/wiki/Rosetta_Code/Tasks_without_examples#Phix
7. Re: Python vs Euphoria for web scraping
- Posted by euphoric (admin) Jul 31, 2019
- 1619 views
Nice!
But you have got to make grabbing URL output easier. Please!
I should be able to do something like this:
string page_html = get_url("http://phix.x10.mx/docs/html/phix.htm")
without having to go through all the "easy" cURL set-up steps, especially for just web scraping.
Or are all those steps unavoidable?
8. Re: Python vs Euphoria for web scraping
- Posted by petelomax Aug 01, 2019
- 1562 views
> But you have got to make grabbing URL output easier. Please!
> I should be able to do something like this:
> string page_html = get_url("http://phix.x10.mx/docs/html/phix.htm")
> without having to go through all the "easy" cURL set-up steps, especially for just web scraping.
> Or are all those steps unavoidable?
No problem. In the next release curl_easy_perform_ex() will accept either a curl handle (as now) or a plain string URL.
Should you want it sooner, just replace this in libcurl.e:
global function curl_easy_perform_ex(object curl)
-- see also curl_multi_perform_ex, if you modify this.
    enter_cs(ceb_cs)
    integer slot_no = 0
    for i=1 to length(curl_easy_buffers) do
        if integer(curl_easy_buffers[i]) then
            curl_easy_buffers[i] = ""
            slot_no = i
            exit
        end if
    end for
    if slot_no=0 then
        curl_easy_buffers = append(curl_easy_buffers,"")
--      curl_multi_rids = append(curl_multi_rids,0)
        slot_no = length(curl_easy_buffers)
    end if
    leave_cs(ceb_cs)

    bool free_curl = false,
         was_global_init = global_init
    if string(curl) then
        string url = curl
        if not was_global_init then curl_global_init() end if
        curl = curl_easy_init()
        curl_easy_setopt(curl, CURLOPT_URL, url)
        free_curl = true
    end if

    -- set callback function to receive data
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb)
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, slot_no)

    -- get file
    integer ret = curl_easy_perform(curl)

    if free_curl then
        curl_easy_cleanup(curl)
        if not was_global_init then curl_global_cleanup() end if
    end if

    enter_cs(ceb_cs)
    string res = curl_easy_buffers[slot_no]
    curl_easy_buffers[slot_no] = 0  -- (can now be reused)
    leave_cs(ceb_cs)
    if ret!=CURLE_OK then
        return ret
    end if
    return res
end function
I have also simplified https://rosettacode.org/wiki/Web_scraping#Phix
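For anyone wanting to try it, the one-liner asked for earlier in the thread might then look roughly like the following. This is only a minimal sketch: it assumes the replacement curl_easy_perform_ex() above is in place, and the include path is an assumption that may differ on your install. Since the function returns the integer curl error code on failure rather than a string, the result is held in an object and checked before use.

```phix
include builtins\libcurl.e  -- assumed include path; adjust to your install

-- A string result is the page body; an integer is the curl error code.
object res = curl_easy_perform_ex("http://phix.x10.mx/docs/html/phix.htm")
if string(res) then
    puts(1, res)                        -- show the fetched HTML
else
    printf(1, "curl error %d\n", {res}) -- fetch failed
end if
```

Passing a plain string means the handle is created and cleaned up inside the call, so no separate cURL set-up is needed for a one-off fetch.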