1. http_get does not retrieve page content
- Posted by useless_ Jan 09, 2013
- 1508 views
My recent post on this topic in the ticket section was deleted. At issue is how Eu parses an HTTP header when retrieving webpages.
Not all HTTP servers adhere exactly to the "standard". By following the "standard" strictly, Eu "breaks" when attempting to fetch a page that deviates in any way from what Eu will accept. This procedure is now broken.
It's not Eu's place to enforce the "standards", and it's unacceptable that Eu is voluntarily broken and intolerant of even slight deviations from the "standard". Eu will now refuse to get those webpages when it could quite possibly get them gracefully.
useless
2. Re: http_get does not retrieve page content
- Posted by useless_ Jan 09, 2013
- 1422 views
> My recent post on this topic in the ticket section was deleted. At issue is how Eu parses an HTTP header when retrieving webpages.
> Not all HTTP servers adhere exactly to the "standard". By following the "standard" strictly, Eu "breaks" when attempting to fetch a page that deviates in any way from what Eu will accept. This procedure is now broken.
> It's not Eu's place to enforce the "standards", and it's unacceptable that Eu is voluntarily broken and intolerant of even slight deviations from the "standard". Eu will now refuse to get those webpages when it could quite possibly get them gracefully.
> useless
My first post on this subject now shows an EDIT flag. I didn't edit that post. (I did edit this one.)
useless
Someone also edited this post, again changing what I said, and then deleted a follow-up post. There is a distinct lack of editorial integrity on this forum.
3. Re: http_get does not retrieve page content
- Posted by DerekParnell (admin) Jan 09, 2013
- 1389 views
> My recent post on this topic in the ticket section was deleted. At issue is how Eu parses an HTTP header when retrieving webpages.
> Not all HTTP servers adhere exactly to the "standard". By following the "standard" strictly, Eu "breaks" when attempting to fetch a page that deviates in any way from what Eu will accept. This procedure is now broken.
> It's not Eu's place to enforce the "standards", and it's unacceptable that Eu is voluntarily broken and intolerant of even slight deviations from the "standard". Eu will now refuse to get those webpages when it could quite possibly get them gracefully.
This sounds very reasonable to me.
What exactly is it in the webpage data that's causing the Eu library routines to reject those webpages?
4. Re: http_get does not retrieve page content
- Posted by useless_ Jan 09, 2013
- 1398 views
> > My recent post on this topic in the ticket section was deleted. At issue is how Eu parses an HTTP header when retrieving webpages.
> > Not all HTTP servers adhere exactly to the "standard". By following the "standard" strictly, Eu "breaks" when attempting to fetch a page that deviates in any way from what Eu will accept. This procedure is now broken.
> > It's not Eu's place to enforce the "standards", and it's unacceptable that Eu is voluntarily broken and intolerant of even slight deviations from the "standard". Eu will now refuse to get those webpages when it could quite possibly get them gracefully.
> This sounds very reasonable to me.
> What exactly is it in the webpage data that's causing the Eu library routines to reject those webpages?
The issue, as I understand it, is which characters count as line terminators in the header, and in what order and quantity they may appear. There is an RFC and a "standard", which isn't always followed to the letter; this is a common situation online (and a frequent cause of browser wars).
It's my contention that trim() will handle all "standard" situations as well as non-standard ones.
useless
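The tolerant header parsing proposed above can be sketched as follows. This is a Python illustration, not Eu's actual http_get code; the function name and the choice of `splitlines()` plus per-line stripping are the editor's assumptions about how such tolerance might look:

```python
def parse_status_and_headers(raw_header_block: bytes):
    """Parse an HTTP status line and header fields, tolerating
    non-standard line endings (bare LF or bare CR as well as CRLF)."""
    text = raw_header_block.decode("iso-8859-1")
    # splitlines() accepts \r\n, \n and \r alike, so slightly
    # off-spec servers are handled the same as compliant ones;
    # strip() then plays the role of Eu's trim() on each line.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers
```

The same input with CRLF, bare LF, or bare CR terminators yields identical results, which is the graceful behaviour being argued for.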
5. Re: http_get does not retrieve page content
- Posted by jimcbrown (admin) Jan 09, 2013
- 1392 views
> > Not all HTTP servers adhere exactly to the "standard". By following the "standard" strictly, Eu "breaks" when attempting to fetch a page that deviates in any way from what Eu will accept. This procedure is now broken.
> > It's not Eu's place to enforce the "standards", and it's unacceptable that Eu is voluntarily broken and intolerant of even slight deviations from the "standard". Eu will now refuse to get those webpages when it could quite possibly get them gracefully.
> This sounds very reasonable to me.
Me too, although I'd like to see some hard data (specific websites that demonstrate the symptoms of this issue, and if possible statistics on how widespread it is and how other HTTP library implementations deal with it) before I'd feel comfortable implementing this kind of change myself.
Still, I'll defer to the group decision, as I usually do.
6. Re: http_get does not retrieve page content
- Posted by useless_ Jan 09, 2013
- 1385 views
> > > Not all HTTP servers adhere exactly to the "standard". By following the "standard" strictly, Eu "breaks" when attempting to fetch a page that deviates in any way from what Eu will accept. This procedure is now broken.
> > > It's not Eu's place to enforce the "standards", and it's unacceptable that Eu is voluntarily broken and intolerant of even slight deviations from the "standard". Eu will now refuse to get those webpages when it could quite possibly get them gracefully.
> > This sounds very reasonable to me.
> Me too, although I'd like to see some hard data (specific websites that demonstrate the symptoms of this issue, and if possible statistics on how widespread it is and how other HTTP library implementations deal with it) before I'd feel comfortable implementing this kind of change myself.
> Still, I'll defer to the group decision, as I usually do.
I don't log the different header syntaxes; I just get the webpages. And I no longer research other computer languages, because I found this one. I am telling you the situation has occurred.
useless
7. Re: http_get does not retrieve page content
- Posted by CoJaBo2 Jan 09, 2013
- 1365 views
> I don't log the different header syntaxes; I just get the webpages. And I no longer research other computer languages, because I found this one. I am telling you the situation has occurred.
OK, so log it next time. In the meantime, there is nothing that can be done about it, since there is no way to tell what the problem was. Indeed, it could well have simply been bug #831, which is now fixed.
8. Re: http_get does not retrieve page content
- Posted by DerekParnell (admin) Jan 09, 2013
- 1373 views
> OK, so log it next time. In the meantime, there is nothing that can be done about it, since there is no way to tell what the problem was. Indeed, it could well have simply been bug #831, which is now fixed.
I think that trim() is a better way to go, as that would be more tolerant of the variations that might exist in web servers out in the wild.
9. Re: http_get does not retrieve page content
- Posted by CoJaBo2 Jan 09, 2013
- 1376 views
> I think that trim() is a better way to go, as that would be more tolerant of the variations that might exist in web servers out in the wild.
Is there any evidence to suggest that such servers do exist? I would be interested in seeing one.
10. Re: http_get does not retrieve page content
- Posted by useless_ Jan 09, 2013
- 1363 views
> > I think that trim() is a better way to go, as that would be more tolerant of the variations that might exist in web servers out in the wild.
> Is there any evidence to suggest that such servers do exist? I would be interested in seeing one.
There are my words.
useless
11. Re: http_get does not retrieve page content
- Posted by DerekParnell (admin) Jan 09, 2013
- 1338 views
> > I think that trim() is a better way to go, as that would be more tolerant of the variations that might exist in web servers out in the wild.
> Is there any evidence to suggest that such servers do exist? I would be interested in seeing one.
I have none.
But let's assume that they don't exist; in other words, that every web server that currently exists, or ever will, always delivers correct webpage headers. In that case, trim() does no harm and runs very quickly (it has almost nothing to do).
Now let's continue hypothesising ... what if a web server actually does exist that delivers headers with non-standard whitespace? We need to ask ourselves: do we care? Does the user of our application care? Probably not. So if the library uses trim(), it will be as if standard headers were used, and the application doesn't trip over or do something unintended.
So on the balance of probability (there is some chance that bad web servers exist), and in the interest of defensive programming, why not use trim() regardless?
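Derek's "no harm" argument can be demonstrated directly. Here Python's `strip()` stands in for Eu's trim() (an editor's analogy, not eunet code): on a compliant header line it is a no-op, while a line with stray trailing whitespace or a lone carriage return is normalised to the same canonical form.

```python
compliant = "Content-Length: 42"
sloppy = "Content-Length: 42 \r"   # trailing space and bare CR

# No-op on the compliant line, a fix on the sloppy one.
assert compliant.strip() == compliant
assert sloppy.strip() == compliant
```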
12. Re: http_get does not retrieve page content
- Posted by m_sabal Jan 10, 2013
- 1401 views
Step 1: Please test the offending web sites using the old eunet. I suspect you will find the same behavior.
Step 2: Identify what characters are being used as line terminators in place of the standard.
Trim will not work in this case, because the entire document, headers plus body, is sent as one large block of data (or in some cases, multiple n-byte blocks). The program then needs to parse each of the header lines out of that block, and return what's left as the body of the document. Trim will only remove null characters and whitespace from the beginning and end of the entire transmission; it won't help with the parsing.
I made the decision to make eunet strictly standards-compliant until enough use cases violating the standard could be identified to program for the exception. It's all open source; you are welcome to add the additional code you need to solve your problem.
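m_sabal's point, that locating the header/body boundary is a parsing step trim() alone cannot perform, can be sketched like this. Again this is a Python illustration under the editor's assumptions, not eunet's implementation; it accepts the standard CRLF CRLF boundary as well as the non-standard LF LF and CR CR variants:

```python
import re

def split_response(raw: bytes):
    """Split a raw HTTP response into (header_block, body).
    The boundary is the first blank line; CRLFCRLF is standard,
    but LFLF and CRCR from off-spec servers are also accepted."""
    m = re.search(rb"\r\n\r\n|\n\n|\r\r", raw)
    if m is None:
        return raw, b""   # no blank line: treat everything as headers
    return raw[:m.start()], raw[m.end():]
```

Only after this split can the header lines be parsed individually (where per-line trimming helps) and the remainder returned as the document body.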