Re: Python vs Euphoria for web scraping
- Posted by petelomax Aug 01, 2019
euphoric said...
But you have got to make grabbing URL output easier. Please!
I should be able to do something like this:
string page_html = get_url("http://phix.x10.mx/docs/html/phix.htm")
without having to go through all the "easy" cURL set-up steps, especially for just web scraping.
Or are all those steps unavoidable?
No problem. In the next release curl_easy_perform_ex() will accept either a curl handle (as now), or a plain string url.
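In other words, once that lands, the one-liner you asked for should just work, along these lines (a quick sketch, assuming libcurl.e is already on your include path):

include libcurl.e

object page_html = curl_easy_perform_ex("http://phix.x10.mx/docs/html/phix.htm")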
Should you want it sooner, just replace this in libcurl.e:
global function curl_easy_perform_ex(object curl)
-- see also curl_multi_perform_ex, if you modify this.
    -- grab (or create) a free buffer slot, under the critical section lock
    enter_cs(ceb_cs)
    integer slot_no = 0
    for i=1 to length(curl_easy_buffers) do
        if integer(curl_easy_buffers[i]) then
            curl_easy_buffers[i] = ""
            slot_no = i
            exit
        end if
    end for
    if slot_no=0 then
        curl_easy_buffers = append(curl_easy_buffers,"")
--      curl_multi_rids = append(curl_multi_rids,0)
        slot_no = length(curl_easy_buffers)
    end if
    leave_cs(ceb_cs)
    bool free_curl = false, was_global_init = global_init
    if string(curl) then
        -- a plain url was passed: create a temporary easy handle for it
        string url = curl
        if not was_global_init then curl_global_init() end if
        curl = curl_easy_init()
        curl_easy_setopt(curl, CURLOPT_URL, url)
        free_curl = true
    end if
    -- set callback function to receive data
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb)
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, slot_no)
    -- get file
    integer ret = curl_easy_perform(curl)
    if free_curl then
        -- tidy up the temporary handle (and global state, if we initialised it)
        curl_easy_cleanup(curl)
        if not was_global_init then curl_global_cleanup() end if
    end if
    enter_cs(ceb_cs)
    string res = curl_easy_buffers[slot_no]
    curl_easy_buffers[slot_no] = 0 -- (can now be reused)
    leave_cs(ceb_cs)
    if ret!=CURLE_OK then return ret end if
    return res
end function
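One caveat worth spelling out: on success the function returns the page body as a string, but on failure it returns the integer result of curl_easy_perform() (a CURLE_xxx code), so a careful caller would check the type of the result first, something like this (sketch only):

object res = curl_easy_perform_ex("http://phix.x10.mx/docs/html/phix.htm")
if string(res) then
    ?length(res)    -- got the page html
else
    ?res            -- an integer CURLE_xxx error code
end if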
I have also simplified the Phix example at https://rosettacode.org/wiki/Web_scraping#Phix