Re: http_get does not retrieve page content
- Posted by useless_ Jan 09, 2013
My recent post on this topic in the ticket section was deleted. At issue is how Eu parses an HTTP header when retrieving webpages.
Not all HTTP servers adhere exactly to the "standard". By following the "standard" strictly, Eu "breaks" when it attempts to fetch a page whose headers deviate in any way from what Eu will accept. This procedure is now broken.
It's not Eu's place to enforce the "standards", and it's unacceptable that Eu is deliberately broken, intolerant of any slight departure from the "standard". Eu now refuses to get those webpages when it is quite possible to get them gracefully.
> This sounds very reasonable to me.
> What exactly is it in the webpage data that's causing the Eu library routines to reject those webpages?
The sticking point, as I understand it, is which characters count as line terminators in the header, and in what order and quantity they appear. There is an RFC and a "standard", but it isn't always followed to the letter: RFC 2616 specifies CRLF as the header line terminator while recommending that parsers also accept a bare LF. This is a common situation online (a frequent cause of the browser wars).
It's my contention that trim() handles all the "standard" situations as well as the non-standard ones.
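For illustration, here is a minimal sketch of that idea, assuming Euphoria 4.x with trim() from std/text.e and split() from std/sequence.e; the function name parse_header_lines is my own for the example, not part of the library. It splits the raw header block on '\n' and trims each line, so CRLF endings, bare LF endings, and stray trailing whitespace are all tolerated:

    include std/text.e      -- trim()
    include std/sequence.e  -- split()

    -- Hypothetical helper: split a raw header block on '\n' and trim each
    -- line, so CRLF, bare LF, and trailing whitespace are all accepted.
    function parse_header_lines(sequence raw)
        sequence lines, line, out
        lines = split(raw, '\n')
        out = {}
        for i = 1 to length(lines) do
            line = trim(lines[i])  -- default trim set includes '\r', tabs and spaces
            if length(line) > 0 then
                out = append(out, line)
            end if
        end for
        return out
    end function

Because trim()'s default character set already covers '\r', the same code path handles servers that send strict CRLF and servers that send only LF, with no special-casing of the terminator.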
useless