1. needed: webpage getter!

Greets again all, webpages bugging me again. The url in question is 

wwws.sheetmusicplus.com

and almost every page after that. The only page getter that i have tried 
(written in Eu) is Webshepard, and it pesters me for cookies 5 times for each 
page. In all the other webgetters, no cookie is bothered with, but the urls on 
the pages are munged to fit their search engine, and are essentially useless. 
The urls are ok in IE5.0, and much much shorter too.  I am looking witless, 
can anyone tell me why Eu apps cannot get this page properly? Internet 
Explorer gets it fine, with and without the http proxy. Tcp4u won't get it with 
the tcp4u_ calls, nor the http4u_ calls.

Kat,
puzzled.

2. Re: needed: webpage getter!

Kat wrote:
> 
> Greets again all, webpages bugging me again. The url in question is 
> 
> wwws.sheetmusicplus.com
> 
> and almost every page after that. The only page getter that i have tried 
> (written in Eu) is Webshepard, and it pesters me for cookies 5 times for each 
> page. In all the other webgetters, no cookie is bothered with, but the urls on
> the pages are munged to fit their search engine, and are essentially useless. 
> The urls are ok in IE5.0, and much much shorter too.  I am looking witless, 
> can anyone tell me why Eu apps cannot get this page properly? Internet 
> Explorer gets it fine, with and without the http proxy. Tcp4u won't get it with
> the tcp4u_ calls, nor the http4u_ calls.

My socks library can't read the page either; it hangs waiting to read a 
byte from the server. If you just need to get something done, wget works 
fine. If you must use Eu, then perhaps from the following trace you can 
find what is missing (my guess is that it has something to do with the 
"Moved Temporarily" line):

[irv@localhost irv]$ wget -S wwws.sheetmusicplus.com/index.html
--09:53:43--  http://wwws.sheetmusicplus.com/index.html
           => `index.html'
Resolving wwws.sheetmusicplus.com... 63.90.205.227
Connecting to wwws.sheetmusicplus.com[63.90.205.227]:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.0 302 Moved Temporarily
 2 Location: http://www.sheetmusicplus.com/?r=emdb2
 3 Server: WebSTAR/4.5(SSL) ID/70713
 4 Content-Length: 0
Location: http://www.sheetmusicplus.com/?r=emdb2 [following]
--09:53:44--  http://www.sheetmusicplus.com/?r=emdb2
           => `index.html?r=emdb2.5'
Resolving www.sheetmusicplus.com... 63.90.205.101, 63.90.205.110, 63.90.205.220, ...
Connecting to www.sheetmusicplus.com[63.90.205.101]:80... connected.
HTTP request sent, awaiting response...
 1 HTTP/1.0 200 OK
 2 Content-type: text/html
 3 Set-Cookie: CustNum=62509557827; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com
 4 Set-Cookie: from=; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com
 5 Set-Cookie: ad=/default.tmpl; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com
 6 Set-Cookie: Visits=1; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com
 7 Server: WebSTAR/4.5(SSL) ID/70713
 8 Content-Length: 37798

100%[====================================>] 37,798         9.62K/s    ETA 00:00
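
The trace suggests two things a getter has to do that a bare socket read
does not: follow the Location header of the 302, and send back the cookies
set by the second response. The 302 also carries Content-Length: 0, so a
reader that blocks waiting for a body byte has nothing to read, which may
be why a socket library hangs at that point. Below is a minimal Python
sketch of that bookkeeping. It only parses the header lines shown in the
trace and opens no connection; the function names are made up for
illustration.

```python
# Header lines copied from the two responses in the wget trace.
RESPONSE_302 = [
    "HTTP/1.0 302 Moved Temporarily",
    "Location: http://www.sheetmusicplus.com/?r=emdb2",
    "Server: WebSTAR/4.5(SSL) ID/70713",
    "Content-Length: 0",
]
RESPONSE_200 = [
    "HTTP/1.0 200 OK",
    "Content-type: text/html",
    "Set-Cookie: CustNum=62509557827; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com",
    "Set-Cookie: from=; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com",
    "Set-Cookie: ad=/default.tmpl; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com",
    "Set-Cookie: Visits=1; expires=Wednesday, 31-Dec-2005 23:12:40 GMT; path=/; domain=.sheetmusicplus.com",
    "Server: WebSTAR/4.5(SSL) ID/70713",
    "Content-Length: 37798",
]


def parse_response(lines):
    """Return (status, headers); headers is a list so repeated names survive."""
    status = int(lines[0].split()[1])
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return status, headers


def next_request(status, headers, jar):
    """Store any cookies in jar and return the redirect target, or None."""
    location = None
    for name, value in headers:
        if name == "set-cookie":
            # Only the leading name=value pair is sent back to the server.
            pair = value.split(";", 1)[0]
            key, _, val = pair.partition("=")
            jar[key.strip()] = val.strip()
        elif name == "location" and status in (301, 302, 303, 307):
            location = value
    return location


def expected_body_length(headers):
    """Return the Content-Length as an int, or None if the header is absent."""
    for name, value in headers:
        if name == "content-length":
            return int(value)
    return None


def cookie_header(jar):
    """Build the value of the Cookie: header for the next request."""
    return "; ".join(f"{k}={v}" for k, v in jar.items())


if __name__ == "__main__":
    jar = {}
    for response in (RESPONSE_302, RESPONSE_200):
        status, headers = parse_response(response)
        target = next_request(status, headers, jar)
        print(status, "body bytes:", expected_body_length(headers), "redirect:", target)
    print("Cookie:", cookie_header(jar))
```

A getter that never sends the Cookie: line back on later requests looks
like a cookie-refusing browser to the server.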

3. Re: needed: webpage getter!

Kat wrote:

> Greets again all, webpages bugging me again. The url in question is 
> 
> wwws.sheetmusicplus.com
> 
> and almost every page after that. The only page getter that i have tried 
> (written in Eu) is Webshepard, and it pesters me for cookies 5 times for each 
> page.

I'd like to have a look at 'Webshepard'. Where can I get it?

> In all the other webgetters, no cookie is bothered with, but the urls on 
> the pages are munged to fit their search engine, and are essentially useless. 
> The urls are ok in IE5.0, and much much shorter too.  I am looking witless, 
> can anyone tell me why Eu apps cannot get this page properly? Internet 
> Explorer gets it fine, with and without the http proxy. Tcp4u won't get it with
> the tcp4u_ calls, nor the http4u_ calls.

I just tried to get that page, using the demo program that ships with
'http.zip' by PatRat --> No reaction at all. :-(

WinHTTrack <http://www.httrack.com> says:
"File has moved from wwws.sheetmusicplus.com/
 to http://www.sheetmusicplus.com/?r=emdb2"

This is consistent with the behaviour of IE 5.5.

Regards,
   Juergen

4. Re: needed: webpage getter!

On 25 Jun 2004, at 13:37, irv mullins wrote:

> 
> 
> posted by: irv mullins <irvm at ellijay.com>
> 
> Kat wrote:
> > 
> > Greets again all, webpages bugging me again. The url in question is 
> > 
> > wwws.sheetmusicplus.com
> > 
> > and almost every page after that. The only page getter that i have tried
> > (written in Eu) is Webshepard, and it pesters me for cookies 5 times for
> > each page. In all the other webgetters, no cookie is bothered with, but the
> > urls on the pages are munged to fit their search engine, and are essentially
> > useless. The urls are ok in IE5.0, and much much shorter too.  I am looking
> > witless, can anyone tell me why Eu apps cannot get this page properly?
> > Internet Explorer gets it fine, with and without the http proxy. Tcp4u won't
> > get it with the tcp4u_ calls, nor the http4u_ calls.
> 
> My socks library can't read the page either, it hangs waiting to read a 
> byte from the server. 

Hanging is a definite drawback. Using tcp4u, i poll 
tcp4u_is_data_avail(sock) for a while to see if anything is happening, then 
deal with the fallout depending on the info available.

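    readcount = 0
    -- Poll about once a second rather than blocking on a read.
    while not ServerNeedsAttention() do
      sleep(1)
      readcount += 1
      -- Give up after roughly 100 seconds without data.
      if (readcount > 100) then exit end if
    end while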
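
The same poll-with-a-limit idea, sketched in Python with select() standing
in for tcp4u_is_data_avail(). The name wait_for_data and its parameters are
assumptions made for this example and are not part of tcp4u.

```python
import select
import socket
import time


def wait_for_data(sock, timeout_s=10.0, poll_s=0.1):
    """Return True once sock has data to read, False if timeout_s passes first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # select() waits at most poll_s seconds, so the loop never blocks for long.
        readable, _, _ = select.select([sock], [], [], poll_s)
        if readable:
            return True
    return False


if __name__ == "__main__":
    # A connected socket pair stands in for the client and the web server.
    client, server = socket.socketpair()
    print(wait_for_data(client, timeout_s=0.3))   # server has sent nothing yet
    server.sendall(b"HTTP/1.0 302 Moved Temporarily\r\n")
    print(wait_for_data(client, timeout_s=0.3))   # data is now waiting
    client.close()
    server.close()
```

On a False result the caller can close the socket and report a timeout
instead of hanging.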

> If you just need to get something done, wget works 
> fine.

wget? No results in User Contribs page, and i don't have it on my puter 
anywhere.

> If you must use Eu, then perhaps from the following trace you can 
> find what is missing (my guess is that it has something to do with the
> "Moved Temporarily" line):
> 
> [irv@localhost irv]$ wget -S wwws.sheetmusicplus.com/index.html
> --09:53:43--  http://wwws.sheetmusicplus.com/index.html
>            => `index.html'
> Resolving wwws.sheetmusicplus.com... 63.90.205.227
> Connecting to wwws.sheetmusicplus.com[63.90.205.227]:80... connected.
> HTTP request sent, awaiting response...
>  1 HTTP/1.0 302 Moved Temporarily

Yes, i have code to handle the 302 reply. Problem is, with IE there is no 302 
(i'd know, the proxy can block or merely report the 302 to me), and on the 
new location:

>  2 Location: http://www.sheetmusicplus.com/?r=emdb2

(which isn't the same new location i get), the urls are munged so badly that 
they are all essentially the same page, with zero album info. The 302 
locations i get look like:

wwws.sheetmusicplus.com/store/smp_artbrowseresults.html?cart=32970725841381355&style=artist&artist=ABBA&searchtitle=sheet%20music&max=20&counter=0

which isn't the dedicated Abba page at all!

Kat

5. Re: needed: webpage getter!

Kat wrote:
> 
> On 25 Jun 2004, at 13:37, irv mullins wrote:

> > If you just need to get something done, wget works 
> > fine.
> 
> wget? No results in User Contribs page, and i don't have it on my puter 
> anywhere.

It's not a Euphoria program. There are versions for Unix and Windows:
http://www.interlog.com/~tcharron/wgetwin.html

Irv

6. Re: needed: webpage getter!

On 25 Jun 2004, at 13:32, Juergen Luethje wrote:

> 
> 
> posted by: Juergen Luethje <j.lue at gmx.de>
> 
> Kat wrote:
> 
> > Greets again all, webpages bugging me again. The url in question is 
> > 
> > wwws.sheetmusicplus.com
> > 
> > and almost every page after that. The only page getter that i have tried
> > (written in Eu) is Webshepard, and it pesters me for cookies 5 times for
> > each page.

Actually, that should have said "The only page getter that gets the page is..."

> I'd like to have a look at 'Webshepard'. Where can I get it?

It's in the user archives (http://www.rapideuphoria.com/webshep.zip).  
Beware of 3 things with it:

1) it apparently saves the file on C: before moving it to the file you specify,
so there's a lot of file swapping going on after the file is retrieved, which is 
significant with 50 MB files.

2) the Web Shepherd window cannot be moved or minimized after you click 
on "Download"

3) If you have previously allowed cookies in IE (or netscrape, etc.), but now 
use a proxy to stop or fake cookies, Web Shepherd will still accept cookies

> > In all the other webgetters, no cookie is bothered with, but the urls on the
> > pages are munged to fit their search engine, and are essentially useless.
> > The urls are ok in IE5.0, and much much shorter too.  I am looking witless,
> > can anyone tell me why Eu apps cannot get this page properly? Internet
> > Explorer gets it fine, with and without the http proxy. Tcp4u won't get it
> > with the tcp4u_ calls, nor the http4u_ calls.
> 
> I just tried to get that page, using the demo program that ships with
> 'http.zip' by PatRat --> No reaction at all. :-(
> 
> WinHTTrack <http://www.httrack.com> says:
> "File has moved from wwws.sheetmusicplus.com/
>  to http://www.sheetmusicplus.com/?r=emdb2"
> 
> This is consistent with the behaviour of IE 5.5.

Yes, if cookies are refused, the domain makes up new urls. I can't find a way 
yet that will work on all webpages. And this dialup is so slow, fetching an 
email can take 5 minutes, and i have had webpages take 10 minutes to 
open. Prefetching with Eu to cache locally is almost a necessity.

Kat

7. Re: needed: webpage getter!

irv mullins wrote:

> It's not a Euphoria program. There are versions for Unix and Windows:
> http://www.interlog.com/~tcharron/wgetwin.html

It's a VB program!!! ARGG!!!! I removed VB runtime files from my puters, and 
am not going to reinstall them. I don't use VB, and so many viruses and 
trojans use VB, it's not worth the risk. I still want to know why a Eu app 
cannot download that webpage properly. What's it doing that other sites don't 
do?

Btw, Topica is 4 hours delayed from the RDS webpage now. Either that, or 
it's broken again and not sending me any email.

Kat

8. Re: needed: webpage getter!

Kat wrote:

>  irv mullins wrote:
>
>> It's not a Euphoria program. There are versions for Unix and Windows:
>> http://www.interlog.com/~tcharron/wgetwin.html
>
> It's a VB program!!! ARGG!!!! I removed VB runtime files from my puters, and
> am not going to reinstall them. I don't use VB, and so many viruses and
> trojans use VB, it's not worth the risk.

That's why I like those GNU programs, there are almost always at least
27 different versions ... getlost
I got 'wget' from here yesterday: http://xoomer.virgilio.it/hherold/
I think *that* is not a VB program.

The website copier WinHTTrack that I mentioned yesterday, also contains
a command-line version: http://www.httrack.com

Maybe you want to try it, too. Since I also want to have a *reliable*
way to get web pages with Euphoria, I'll test and compare both programs
during the next weeks. If one of them does what I want, writing a "shell"
in Euphoria should be easy.
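
As a rough idea of what such a shell could look like, here is a Python
sketch that builds a wget command line and runs it; the same argument list
could be assembled and launched from Euphoria. The function names and the
cookies.txt file name are assumptions. The options are those of current GNU
wget and may be missing from older builds.

```python
import shutil
import subprocess


def build_wget_command(url, out_file, cookie_file="cookies.txt"):
    """Build the argument list for one wget fetch that keeps cookies between runs."""
    return [
        "wget",
        "-S",                            # print the server response headers
        "--load-cookies", cookie_file,   # send cookies saved by an earlier run
        "--save-cookies", cookie_file,   # store cookies set by this response
        "--keep-session-cookies",        # also save cookies without an expiry date
        "-O", out_file,                  # write the page to this file
        url,
    ]


def fetch(url, out_file, cookie_file="cookies.txt"):
    """Run wget if it is on the PATH; return its exit code, or None if it is missing."""
    if shutil.which("wget") is None:
        return None
    result = subprocess.run(build_wget_command(url, out_file, cookie_file))
    return result.returncode


if __name__ == "__main__":
    # Show the command without touching the network.
    print(" ".join(build_wget_command(
        "http://wwws.sheetmusicplus.com/index.html", "index.html")))
```

Keeping the command builder separate from the call that runs it makes it
easy to check the arguments without a network connection.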

> I still want to know why a Eu app cannot download that webpage properly.
> What's it doing that other sites don't do?

I've been using WinHTTrack as a stand-alone program for a long time, and
there were many versions, containing many bugs (I'm using version 3.32
now, which I can actually recommend). So it seems to me that writing
such a program is not too easy. Since there are several very smart
programmers in this community, I'm pretty sure that someone will be able
to write a good website copier in Euphoria, but I think it will take a
lot of time.
BTW: Didn't you write that 'Web Shepherd' (written in Eu) is the only
     page getter that gets the page that you want?

As you wrote, 'EuTcp4u' does not work as desired, and I tried
'AsyncHTTP.ew' without success.

What about 'tcp.ew' by Jason Mirwald? On the User Contributions Page, it
reads: "A full library for asynchronous TCP socket communication. It has
been used to write an IRC client, IRC bot, an IM (instant messaging)
client and server, and a web page server".
That sounds powerful, doesn't it? Unfortunately, 'tcp.zip' doesn't
contain a single demo program. There are separate demo programs
(tcpdemos.zip), but they don't show how to use 'tcp.ew' for downloading
web pages. :-(

> Btw, Topica is 4 hours delayed from the RDS webpage now. Either that, or
> it's broken again and not sending me any email.

Referring to your other post:
Count me in favor of email listservs, but against Topica, too.

Regards,
   Juergen

-- 
 /"\  ASCII ribbon campaign |    |\      _,,,---,,_
 \ /  against HTML in       |    /,`.-'`'    -.  ;-;;,_
  X   e-mail and news,      |   |,4-  ) )-,_..;\ (  `'-'
 / \  and unneeded MIME     |  '---''(_/--'  `-'\_)

9. Re: needed: webpage getter!

Kat wrote:

> On 25 Jun 2004, at 13:32, Juergen Luethje wrote:
>
>> Kat wrote:
>>
>>> Greets again all, webpages bugging me again. The url in question is
>>>
>>> wwws.sheetmusicplus.com
>>>
>>> and almost every page after that. The only page getter that i have tried
>>> (written in Eu) is Webshepard, and it pesters me for cookies 5 times for
>>> each page.
>
> Actually, that should have said "The only page getter that gets the page
> is..."
>
>> I'd like to have a look at 'Webshepard'. Where can I get it?
>
> It's in the user archives (http://www.rapideuphoria.com/webshep.zip).

Thanks! I didn't find it when searching for 'Webshepard' on the User
Contributions page. Now I know that I should have looked for
'Web Shepherd'. :-)

> Beware of 3 things with it:
>
> 1) it apparently saves the file on C: before moving it to the file you
> specify, so there's a lot of file swapping going on after the file is
> retrieved, which is significant with 50 MB files.
>
> 2) the Web Shepherd window cannot be moved or minimized after you click
> on "Download"
>
> 3) If you have previously allowed cookies in IE (or netscrape, etc.), but
> now use a proxy to stop or fake cookies, Web Shepherd will still accept
> cookies

Thanks for the hints!
But when I tell Web Shepherd to get the page
"http://www.rapideuphoria.com/index.html",
it just says "A parameter in the canonicalize URL function is bad."

"spider" by Daniel Kluss and "eulibcURL" by Ray Smith look promising in
general, but unfortunately can't get the page that you mentioned, either.

>>> In all the other webgetters, no cookie is bothered with, but the urls on the
>>> pages are munged to fit their search engine, and are essentially useless.
>>> The urls are ok in IE5.0, and much much shorter too.  I am looking witless,
>>> can anyone tell me why Eu apps cannot get this page properly? Internet
>>> Explorer gets it fine, with and without the http proxy. Tcp4u won't get it
>>> with the tcp4u_ calls, nor the http4u_ calls.
>>
>> I just tried to get that page, using the demo program that ships with
>> 'http.zip' by PatRat --> No reaction at all. :-(
>>
>> WinHTTrack <http://www.httrack.com> says:
>> "File has moved from wwws.sheetmusicplus.com/
>>  to http://www.sheetmusicplus.com/?r=emdb2"
>>
>> This is consistent with the behaviour of IE 5.5.
>
> Yes, if cookies are refused, the domain makes up new urls. I can't find a way
> yet that will work on all webpages. And this dialup is so slow, fetching an
> email can take 5 minutes, and i have had webpages take 10 minutes to
> open. Prefetching with Eu to cache locally is almost a necessity.

If you want, I can get the page for you, using WinHTTrack (at least try
to do so), and then send it to you as ZIP file by mail, or offer the ZIP
file for download on my website.

Regards,
   Juergen

10. Re: needed: webpage getter!

Kat wrote:
> 
>  irv mullins wrote:
> 
> > It's not a Euphoria program. There are versions for Unix and Windows:
> > http://www.interlog.com/~tcharron/wgetwin.html
> 
> It's a VB program!!! ARGG!!!!

Odd, since I have the Windows source code, and all the files end in .c
The package requires either Visual C++ or Watcom to compile.

Irv

11. Re: needed: webpage getter!

On 26 Jun 2004, at 5:28, irv mullins wrote:

> 
> 
> posted by: irv mullins <irvm at ellijay.com>
> 
> Kat wrote:
> > 
> >  irv mullins wrote:
> > 
> > > It's not a Euphoria program. There are versions for Unix and Windows:
> > > http://www.interlog.com/~tcharron/wgetwin.html
> > 
> > It's a VB program!!! ARGG!!!!
> 
> Odd, since I have the Windows source code, and all the files end in .c
> The package requires either Visual C++ or Watcom to compile.

And it doesn't call any VB .dlls? Only the gui at 
http://www.jensroesner.de/wgetgui/ is VB then?

Kat

12. Re: needed: webpage getter!

Kat wrote:

> And it doesn't call any VB .dlls? Only the gui at 
> http://www.jensroesner.de/wgetgui/ is VB then?

That's just a GUI interface for the text-mode program. 
Most likely the text mode would be more appropriate for interfacing 
with Eu.

Regards,
Irv
