Euphoria
Ticket #831:
http_get does not retrieve page content
Reported by
jmduro
Jan 04, 2013
Hello,
object r= http_get("http://docs.openstack.org/essex/openstack-compute/admin/content/creating-a-windows-image.html")
returns the following header, which indicates non-empty content, but does not retrieve the corresponding content:
HTTP/1.1 200 OK
server Apache/2.2
content-type text/html; charset=UTF-8
date Fri, 04 Jan 2013 06:43:58 GMT
accept-ranges bytes
connection close
set-cookie X-Mapping-jebomepa=59D88FA58BBDA5FE9C10CF182D6499F6; path=/
set-cookie X-Mapping-jebomepa=59D88FA58BBDA5FE9C10CF182D6499F6; path=/
last-modified Sat, 15 Dec 2012 04:08:48 GMT
content-length 59109
r[1][1][1] = 'HTTP/1.1'
r[1][1][2] = '200'
r[1][1][3] = 'OK'
r[1][2][1] = 'server'
r[1][2][2] = 'Apache/2.2'
r[1][3][1] = 'content-type'
r[1][3][2] = 'text/html; charset=UTF-8'
r[1][4][1] = 'date'
r[1][4][2] = 'Fri, 04 Jan 2013 06:43:58 GMT'
r[1][5][1] = 'accept-ranges'
r[1][5][2] = 'bytes'
r[1][6][1] = 'connection'
r[1][6][2] = 'close'
r[1][7][1] = 'set-cookie'
r[1][7][2] = 'X-Mapping-jebomepa=59D88FA58BBDA5FE9C10CF182D6499F6; path=/'
r[1][8][1] = 'set-cookie'
r[1][8][2] = 'X-Mapping-jebomepa=59D88FA58BBDA5FE9C10CF182D6499F6; path=/'
r[1][9][1] = 'last-modified'
r[1][9][2] = 'Sat, 15 Dec 2012 04:08:48 GMT'
r[1][10][1] = 'content-length'
r[1][10][2] = '59109\r'
r[2] = {}
Regards
Jean-Marc
Details
1. Comment by jimcbrown
Jan 04, 2013
See: hg:euphoria/rev/fd5e5231784c
changeset: 5899:fd5e5231784c
branch: 4.0
tag: tip
user: Jim C. Brown
date: Fri Jan 04 07:13:42 2013 -0500
files: include/std/net/http.e
description:
- Trim out whitespace in content-length header
- Fixes ticket:831
2. Comment by jimcbrown
Jan 04, 2013
See: hg:euphoria/rev/7728398269c3
changeset: 5900:7728398269c3
parent: 5702:f269927c332b
user: Jim C. Brown
date: Fri Jan 04 07:10:05 2013 -0500
files: include/std/net/http.e
description:
- Trim out whitespace in content-length header
- Fixes ticket:831
3. Comment by jmduro
Jan 04, 2013
OK Jim,
I replaced line 287
content_length = to_number(this_header[2])
with this
content_length = to_number(trim(this_header[2]))
It does the job. Thanks,
Jean-Marc
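The failure mode behind this one-line fix can be sketched in Python (this is not the Euphoria source). The sketch assumes, as the ticket suggests but does not show, that a numeric conversion which rejects the trailing '\r' yields 0, so the client believes there is no body to read. `strict_to_number` is a hypothetical stand-in for that conversion, not Euphoria's `to_number`.

```python
def strict_to_number(text: str) -> int:
    """Hypothetical strict parser: ASCII digits only, 0 for anything else."""
    return int(text) if text.isascii() and text.isdigit() else 0

raw_value = "59109\r"  # header value as reported in r[1][10][2]

untrimmed = strict_to_number(raw_value)        # the '\r' makes the parse fail
trimmed = strict_to_number(raw_value.strip())  # whitespace removed first
```

Under that assumption, `untrimmed` is 0 and `trimmed` is 59109, which is consistent with the empty `r[2]` in the report.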
4. Comment by useless_
Jan 04, 2013
Best to trim(line,"\n\r ") all the header lines. Long ago and far away (I have been fetching webpages for the past 16 years), "\n\r" ended all lines, and "\n\r\n\r" (or vice versa) ended the header. But I have seen every mix of '\n' and '\r', however "illegal"; the one fairly constant feature (though not universal) is the two '\r' and two '\n' at the end of the header. I took a shortcut using strtok: I parsed the entire return on "\r\n\r" (not the same as {10,13,10}) and then parsed the header on {10,13}. That gave me the parsed header, and the body started at the first '<', although many page bodies are sent starting with ' ' or '\t' or some combination of more {10,13}'s. The point is, if you follow the spec exactly, then any deviation by the server makes Eu "broken".
useless
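A tolerant parse along these lines might look like the following Python sketch. It is neither the strtok approach described above nor the `std/net/http.e` code; `parse_response` and the set of accepted blank-line forms are assumptions chosen for illustration.

```python
import re

# Blank-line forms accepted as the end of the header
# (an assumed, non-exhaustive set).
HEADER_END = re.compile(rb"\r\n\r\n|\n\r\n\r|\n\n|\r\r")

def parse_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    m = HEADER_END.search(raw)
    if m is None:
        head, body = raw, b""
    else:
        head, body = raw[:m.start()], raw[m.end():]
    # bytes.splitlines() breaks on "\n", "\r" and "\r\n";
    # empty leftovers (e.g. from "\n\r") are dropped.
    lines = [ln for ln in head.splitlines() if ln.strip()]
    status_line = lines[0].decode("latin-1").strip() if lines else ""
    headers = []
    for ln in lines[1:]:
        name, _, value = ln.decode("latin-1").partition(":")
        # Trim both sides so stray '\r', ' ' or '\t' never reach the value.
        headers.append((name.strip().lower(), value.strip()))
    return status_line, headers, body
```

For example, `parse_response(b"HTTP/1.1 200 OK\ncontent-length: 5 \n\nhello")` yields the header pair `("content-length", "5")` and the body `b"hello"`, even though the response uses bare '\n' line endings.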
5. Comment by CoJaBo2
Jan 09, 2013
The fix checked in is completely wrong.
All it does is mask the actual bug a few lines up:
sequence raw_header = content[1..header_end_pos]
should be:
sequence raw_header = content[1..header_end_pos-1]
There is no need to trim extra whitespace in header values; servers that add any would not function in modern browsers. Indeed, doing so is a bad idea per the above.
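The off-by-one can be reproduced in a short Python sketch. It assumes, since the ticket does not show the surrounding code, that `header_end_pos` is the 1-based position of the first character of the "\r\n\r\n" terminator and that the header is then split on "\r\n". `split_raw_header` is a hypothetical helper, and the sample response is shortened.

```python
def split_raw_header(content: str, fixed: bool) -> list:
    # Euphoria sequences are 1-based and slices include both ends,
    # so content[1..n] corresponds to content[:n] in Python.
    header_end_pos = content.find("\r\n\r\n") + 1  # assumed 1-based match position
    end = header_end_pos - 1 if fixed else header_end_pos
    raw_header = content[:end]
    return raw_header.split("\r\n")

response = "HTTP/1.1 200 OK\r\ncontent-length: 59109\r\n\r\n<html>"

buggy = split_raw_header(response, fixed=False)
patched = split_raw_header(response, fixed=True)
```

With `fixed=False` the last header line keeps the stray '\r', matching the '59109\r' in the report; with `fixed=True` it does not.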
6. Comment by jimcbrown
Jan 09, 2013
See: hg:euphoria/rev/10f7a86e5fd1
changeset: 5905:10f7a86e5fd1
tag: tip
parent: 5903:e146b072d53e
user: Jim C Brown
date: Wed Jan 09 04:44:25 2013 -0500
files: include/std/net/http.e
description:
- Fixes ticket:831
- Use better method suggested by CoJaBo
7. Comment by jimcbrown
Jan 09, 2013
See: hg:euphoria/rev/570575b971cc
changeset: 5906:570575b971cc
branch: 4.0
tag: tip
parent: 5899:fd5e5231784c
user: Jim C Brown
date: Wed Jan 09 04:46:14 2013 -0500
files: include/std/net/http.e
description:
- Fixes ticket:831
- Use better method suggested by CoJaBo
8. Comment by SDPringle
May 04, 2018
See: hg:euphoria/rev/5b8912a4a2cf
changeset: 6468:5b8912a4a2cf
branch: 4.0
user: Shawn David Pringle B.Sc. <shawn.pringle@gmail.com>
date: Fri May 04 10:00:59 2018 -0300
files: docs/release/4.0.6.txt
description:
- updated the release notes. ticket 831, ticket 907, ticket 803, ticket 853, ticket 928, ticket 938, ticket 752, ticket 915, ticket 948, ticket 921