Re: needed: webpage getter!


On 25 Jun 2004, at 13:37, irv mullins wrote:

> 
> 
> posted by: irv mullins <irvm at ellijay.com>
> 
> Kat wrote:
> > 
> > Greets again all, webpages are bugging me again. The URL in question is
> > 
> > wwws.sheetmusicplus.com
> > 
> > and almost every page after that. The only page getter I have tried
> > (written in Eu) is Webshepard, and it pesters me for cookies five times for
> > each page. In all the other webgetters, no cookie is bothered with, but the
> > URLs on the pages are munged to fit their search engine, and are essentially
> > useless. The URLs are OK in IE 5.0, and much, much shorter too. I am looking
> > witless; can anyone tell me why Eu apps cannot get this page properly?
> > Internet Explorer gets it fine, with and without the HTTP proxy. Tcp4u won't
> > get it with the tcp4u_ calls, nor with the http4u_ calls.
> 
> My socks library can't read the page either, it hangs waiting to read a 
> byte from the server. 

Hanging is a definite drawback. Using tcp4u, I poll tcp4u_is_data_avail(sock)
for a while to see whether anything is happening, then deal with the fallout
depending on what information is available.

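    integer readcount
    readcount = 0
    -- ServerNeedsAttention() polls tcp4u_is_data_avail(sock)
    while not ServerNeedsAttention() do
      sleep(1)
      readcount += 1
      -- give up after roughly 100 seconds instead of hanging forever
      if readcount > 100 then exit end if
    end while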
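For comparison, the same poll-with-a-deadline idea can be sketched in Python with the standard `select` module, assuming a plain connected TCP socket rather than tcp4u. The name `wait_for_data` and its `interval`/`max_polls` parameters are hypothetical, chosen to mirror the sleep(1) / 100-try loop above; this is not part of tcp4u.

```python
import select
import socket


def wait_for_data(sock: socket.socket, interval: float = 1.0, max_polls: int = 100) -> bool:
    """Poll `sock` until it is readable or `max_polls` intervals have passed.

    Returns True if data (or EOF from the peer) is waiting, False on timeout,
    so the caller never blocks forever inside recv().
    """
    for _ in range(max_polls):
        # select() waits at most `interval` seconds for the socket to become readable
        readable, _, _ = select.select([sock], [], [], interval)
        if readable:
            return True
    return False


if __name__ == "__main__":
    # A local socket pair stands in for the web server connection.
    a, b = socket.socketpair()
    print(wait_for_data(a, interval=0.01, max_polls=5))  # False: nothing sent yet
    b.sendall(b"HTTP/1.0 302 Moved Temporarily\r\n")
    print(wait_for_data(a, interval=0.01, max_polls=5))  # True: data is waiting
    a.close()
    b.close()
```

Using select() with a short timeout keeps each wait bounded without a separate sleep call, and a readable socket that then returns b"" from recv() means the server closed the connection.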

> If you just need to get something done, wget works 
> fine.

wget? There are no results for it on the User Contribs page, and I don't have
it on my puter anywhere.

> If you must use Eu, then perhaps from the following trace you can
> find what is missing (my guess is that it has something to do with the
> "Moved Temporarily" line):
> 
> [irv@localhost irv]$ wget -S wwws.sheetmusicplus.com/index.html
> --09:53:43--  http://wwws.sheetmusicplus.com/index.html
>            => `index.html'
> Resolving wwws.sheetmusicplus.com... 63.90.205.227
> Connecting to wwws.sheetmusicplus.com[63.90.205.227]:80... connected.
> HTTP request sent, awaiting response...
>  1 HTTP/1.0 302 Moved Temporarily

Yes, I have code to handle the 302 reply. The problem is that with IE there is
no 302 (I'd know; the proxy can block or merely report the 302 to me), and on
the new location:

>  2 Location: http://www.sheetmusicplus.com/?r=emdb2

(which isn't the same new location I get), the URLs are munged so badly that
they all essentially lead to the same page, with zero album info. The 302
locations I get look like:

wwws.sheetmusicplus.com/store/smp_artbrowseresults.html?cart=32970725841381355&style=artist&artist=ABBA&searchtitle=sheet%20music&max=20&counter=0

which isn't the dedicated Abba page at all!
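To compare redirect targets field by field rather than by eye, here is a minimal Python sketch that pulls the Location header out of a raw 302 reply and splits a URL's query string. The helper names `redirect_target` and `query_fields` are hypothetical, and it assumes the raw response text has already been read from the socket with CRLF line endings; relative Location values are not resolved.

```python
from urllib.parse import parse_qs, urlsplit


def redirect_target(raw_response: str):
    """Return (status_code, location) from the header block of a raw HTTP response."""
    head = raw_response.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    # e.g. "HTTP/1.0 302 Moved Temporarily" -> 302
    status = int(lines[0].split()[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # header names are case-insensitive
        if name.strip().lower() == "location":
            location = value.strip()
    return status, location


def query_fields(url: str) -> dict:
    """Split a URL's query string into {name: first value}, decoding %20 etc."""
    if "://" not in url:
        url = "http://" + url  # add a scheme so urlsplit treats the host as a host
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    raw = ("HTTP/1.0 302 Moved Temporarily\r\n"
           "Location: http://www.sheetmusicplus.com/?r=emdb2\r\n\r\n")
    status, loc = redirect_target(raw)
    print(status, loc)
    print(query_fields(loc))
    print(query_fields("wwws.sheetmusicplus.com/store/smp_artbrowseresults.html"
                       "?style=artist&artist=ABBA&searchtitle=sheet%20music&max=20&counter=0"))
```

Laid out this way, the wget target carries only r=emdb2, while the other target is a search-results query (style, artist, max, counter), which fits the observation that it is not the dedicated ABBA page.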

Kat
