1. RE: Getting web pages via Linux [and etc]
- Posted by irv at take.maxleft.com
Jun 02, 2002
jbrown105 at speedymail.org wrote:
> I know that a lot of Eu programs can access the net, and that
> I can use Linux progs (like lynx) to emulate access to the net,
> but is there any way to access stuff such as web pages via
> an Eu lib? If not, any way to write one or wrap a C lib?
Sure, it just takes a little bit of sockets code.
You could get a start by downloading my EuMail code,
and modifying it to use the correct protocol and port.
I wrote a demo which could download and save web pages,
but I think I must have deleted it sometime back.
See downloads page at http://take.maxleft.com
Regards,
Irv
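[Editor's note: EuMail itself isn't shown in this thread, so as a rough illustration of the sockets approach Irv describes, here is a minimal Python sketch (the names and structure are mine, not from EuMail): open a TCP connection to port 80, send a GET request, and read until the server closes the connection.]

```python
import socket

def fetch(host, path="/", port=80):
    """Fetch a page over a raw TCP socket with a minimal HTTP/1.0 request."""
    with socket.create_connection((host, port), timeout=10) as s:
        request = (
            f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        s.sendall(request.encode("ascii"))
        chunks = []
        while True:
            # recv() returns b"" once the server closes the connection
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

The returned bytes include the status line and headers followed by the body; a real client would split them at the first blank line.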
2. RE: Getting web pages via Linux [and etc]
- Posted by irv at take.maxleft.com
Jun 02, 2002
> jbrown105 at speedymail.org wrote:
> > I know that a lot of Eu programs can access the net, and that
> > I can use Linux progs (like lynx) to emulate access to the net,
> > but is there any way to access stuff such as web pages via
> > an Eu lib? If not, any way to write one or wrap a C lib?
A little more searching turned up the sockets code to
download a web page. I've forwarded it to jbrown.
Anyone else who wants it please let me know.
BTW: it can download most webpages, except for RDS.
Instead, I get the main addr.com page.
It may have something to do with RDS being on a
virtual server.
Irv
3. RE: Getting web pages via Linux [and etc]
irv at take.maxleft.com wrote:
> BTW: it can download most webpages, except for RDS.
> Instead, I get the main addr.com page.
> It may have something to do with RDS being on a
> virtual server.
Hi Irv,
I was having the same problem with my euTCP4u library.
I haven't looked into it much yet, but I suspect it's because euTCP4u
only handles the HTTP 1.0 protocol and these sites use the HTTP 1.1
protocol.
I'm probably completely wrong :( ... but that's what I was going to
look at.
Regards,
Ray Smith
http://rays-web.com
4. RE: Getting web pages via Linux [and etc]
On 3 Jun 2002, at 4:54, Ray Smith wrote:
>
>
> irv at take.maxleft.com wrote:
> > BTW: it can download most webpages, except for RDS.
> > Instead, I get the main addr.com page.
> > It may have something to do with RDS being on a
> > virtual server.
>
> Hi Irv,
>
> I was having the same problem with my euTCP4u library.
> I haven't looked into it much yet, but I suspect it's because euTCP4u
> only handles the HTTP 1.0 protocol and these sites use the HTTP 1.1
> protocol.
I suspect if you use myHttpGetFileEx and spec the rest of the header, you
can get the site. I used the attached file to get the RDS site. Prolly some
errors in it, Ray can fix them.
Also, in tcp4u.ew, there is an error in line 505, because connect_socket
should be an atom.
Kat
[Attached file: get_file2.exw]
--
-- Downloads a file from the web
-- Ray Smith
-- 22/8/2000
--
include tcp4u.ew

with trace

integer ret, server_port, sock
sequence proxy, TheWebPage
sequence remote_file
sequence local_file
sequence sock_receive, server_ip
atom connected, writefile

-- set up some defaults
TheWebPage = ""
server_port = 80
-- Resolved addr.com to 209.249.147.252
-- Resolved www.rapideuphoria.com to 209.249.147.13
server_ip = "209.249.147.13"
proxy = ""
remote_file = "http://www.rapideuphoria.com/"
local_file = "rds.txt"

-----------------------------------------------------------------------
global function ServerNeedsAttention()
    object ret
    ret = tcp4u_is_data_avail(sock)
    return ret
end function -- ServerNeedsAttention()

-----------------------------------------------------------------------
global function ReadServer()
    sequence databuffer
    databuffer = ""
    sock_receive = ""
    if tcp4u_is_data_avail(sock) then
        sock_receive = tcp4u_receive(sock, 1000, 0)
        if sock_receive[1] > 0 then
            databuffer = databuffer & sock_receive[2][1..sock_receive[1]]
        end if
    end if
    return databuffer
end function -- ReadServer()

-----------------------------------------------------------------------
global procedure SendToServer(sequence data)
    atom ret
    ret = tcp4u_send(sock, data, length(data))
end procedure -- SendToServer()

-----------------------------------------------------------------------
global procedure make_connection()
    sequence sock_connect
    if tcp4u_init() != TCP4U_SUCCESS then
        wait_abort("tcp4u_init error")
        connected = 0
    end if
    sock_connect = tcp4u_connect(server_ip, NULL, server_port)
    if sock_connect[tcp4u_ret] != TCP4U_SUCCESS then
        printf(1, "tcp4u_connect error '%s'\n",
               {tcp4u_error_string(sock_connect[tcp4u_ret])})
        puts(1, "\naborting on any keypress")
        connected = 0
    else
        connected = 1
        sock = sock_connect[2]
    end if
end procedure -- make_connection()

-----------------------------------------------------------------------
-- show a little intro
puts(1, "Download Web File Demo\n")
http4u_set_timeout(60)

-- get the file
printf(1, "...downloading file \nfrom '%s', \nto '%s' \nusing proxy '%s'\n",
       {remote_file, local_file, proxy})
--ret = http4u_get_file(remote_file, proxy, local_file)

-- knock on the door
make_connection()

-- tell it what we want
SendToServer("GET / HTTP/1.0\r\n" &
             "Referer: http://www.rapideuphoria.com/\r\n" &
             "Accept: */*\r\n" &
             "Accept-Language: en-us\r\n" &
--           "Accept-Encoding: gzip, deflate\r\n" &
             "User-Agent: <sigh>\r\n" &
             "Host: rapideuphoria.com\r\n" &
             "Forwarded: rapideuphoria.com\r\n" &
             "\r\n")

-- give the server time to think about it..
while not ServerNeedsAttention() do sleep(2) end while

-- ok, data there, get it!
while ServerNeedsAttention() do
    TheWebPage &= ReadServer()
    sleep(1)
end while

-- close tcp4u
ret = tcp4u_cleanup()
if ret != TCP4U_SUCCESS then
    printf(1, "Error on tcp4u_cleanup '%s'\n", {tcp4u_error_string(ret)})
end if

writefile = open(local_file, "w")
puts(writefile, TheWebPage)
close(writefile)

-- finished
puts(1, "\n\npress any key to abort.")
ret = wait_key()
5. RE: Getting web pages via Linux [and etc]
- Posted by irv at take.maxleft.com
Jun 03, 2002
Ray Smith wrote:
>
> irv at take.maxleft.com wrote:
> > BTW: it can download most webpages, except for RDS.
> > Instead, I get the main addr.com page.
> > It may have something to do with RDS being on a
> > virtual server.
>
> Hi Irv,
>
> I was having the same problem with my euTCP4u library.
> I haven't looked into it much yet, but I suspect it's because euTCP4u
> only handles the HTTP 1.0 protocol and these sites use the HTTP 1.1
> protocol.
> I'm probably completely wrong :( ... but that's what I was going to
> look at.
Yep, it's due to the 'virtual hosting'. See
http://www.webcom.com/glossary/http1.1.shtml for a clear
explanation.
Irv
6. RE: Getting web pages via Linux [and etc]
- Posted by irv at take.maxleft.com
Jun 03, 2002
The solution is simple:
I changed the 'GET' request to:
write(SOCKET, sprintf("GET /%s HTTP/1.1\nHost: %s\n\n",
{filename,hostname}))
It now works fine with virtual domains.
Irv
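[Editor's note: the essence of Irv's fix is that HTTP/1.1 makes the Host header mandatory, which is what lets a server hosting several virtual domains on one IP address pick the right site. A rough Python sketch of building the two kinds of request (build_get is my name, not from the thread). The HTTP spec actually requires CRLF ("\r\n") line endings; many servers, evidently including this one, also tolerate the bare "\n" Irv used.]

```python
def build_get(path, host, http11=True):
    """Build a minimal GET request string.

    HTTP/1.1 requires a Host header, which is how a server with
    name-based virtual hosts tells its domains apart; HTTP/1.0
    requests without one fall through to the server's default site.
    """
    version = "HTTP/1.1" if http11 else "HTTP/1.0"
    lines = [f"GET {path} {version}"]
    if http11:
        lines.append(f"Host: {host}")
        lines.append("Connection: close")  # don't hold the connection open
    # The spec mandates CRLF line endings and a blank line to end the headers.
    return "\r\n".join(lines) + "\r\n\r\n"
```

For example, `build_get("/", "www.rapideuphoria.com")` produces a request the virtual host can route correctly, while the `http11=False` form reproduces the bare HTTP/1.0 request that only reached the default addr.com page.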
7. RE: Getting web pages via Linux [and etc]
On 3 Jun 2002, at 13:43, irv at take.maxleft.com wrote:
>
> The solution is simple:
>
> I changed the 'GET' request to:
>
> write(SOCKET, sprintf("GET /%s HTTP/1.1\nHost: %s\n\n",
> {filename,hostname}))
>
> It now works fine with virtual domains.
Yes, but that doesn't work in the http functions in tcp4u.
Kat