1. parsing HTML
Can I parse HTML pages with Thomas Parslow's XML library? Will it work,
given that HTML syntax is not as strict as XML? For example, a <P> tag
doesn't need a closing </P> tag ...
That XML library would be ideal because it lets you pass it XML data
in parts, and I am not reading the whole HTML page at once but in pieces.
Or is there some other library for parsing HTML pages?
For now I need to get the title of a page and extract all the links on it;
something else might come up later.
Tone Škoda
2. Re: parsing HTML
On 2 Aug 2002, at 17:51, 10963508 at europeonline.com wrote:
>
> Can I parse HTML pages with Thomas Parslow's XML library? Will it work,
> given that HTML syntax is not as strict as XML? For example, a <P> tag
> doesn't need a closing </P> tag ... That XML library would be ideal because
> it lets you pass it XML data in parts, and I am not reading the whole HTML
> page at once but in pieces.
>
> Or is there some other library for parsing HTML pages?
There is this in strtok (any version):
global function getxml(sequence record, sequence starttag, sequence endtag, integer tagnum)
You don't have to fill in all the parameters; see the comments in the .ew file.
I didn't document it in the readme. I have used this, pretty much unchanged,
since I first wrote it in 1992 or so in Turbo Pascal.
Kat
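To illustrate the general idea of pulling out the text between a start tag and an end tag, here is a minimal sketch. It is NOT the strtok getxml() routine, whose exact behaviour is described only in the comments of the .ew file. The routine names text_between and lower_ascii are made up for this example, and the meaning given to the fourth parameter (take the n-th occurrence of the start tag, n >= 1) is an assumption.

```euphoria
-- Hypothetical stand-in, NOT strtok's getxml(): returns the text between the
-- n-th occurrence of starttag and the next endtag, or "" if either is missing.
-- Tag matching ignores case, because HTML tags may be written <TITLE> or <title>.

function lower_ascii(sequence s)
    for i = 1 to length(s) do
        if s[i] >= 'A' and s[i] <= 'Z' then
            s[i] = s[i] + 32
        end if
    end for
    return s
end function

function text_between(sequence record, sequence starttag, sequence endtag, integer n)
    sequence low
    integer pos, a, b
    low = lower_ascii(record)
    starttag = lower_ascii(starttag)
    endtag = lower_ascii(endtag)
    pos = 1
    for i = 1 to n do
        -- look for the next start tag at or after pos
        if pos > length(low) then return "" end if
        a = match(starttag, low[pos..length(low)])
        if a = 0 then return "" end if
        pos = a + pos - 1 + length(starttag)
    end for
    -- pos now points just past the n-th start tag
    if pos > length(low) then return "" end if
    b = match(endtag, low[pos..length(low)])
    if b = 0 then return "" end if
    return record[pos..pos + b - 2]
end function

-- prints: My page
puts(1, text_between("<html><TITLE>My page</TITLE><p>one<p>two</html>",
                     "<title>", "</title>", 1) & "\n")
```

Because the sketch only looks for two literal strings, it does not care whether other tags such as <P> are ever closed, which is why this style of extraction tolerates loosely written HTML.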
3. Re: parsing HTML
Thanks, I'm going to take a look at it now.
----- Original Message -----
From: "Kat" <gertie at PELL.NET>
To: "EUforum" <EUforum at topica.com>
Subject: Re: parsing HTML
>
> On 2 Aug 2002, at 17:51, 10963508 at europeonline.com wrote:
>
> >
> > Can I parse HTML pages with Thomas Parslow's XML library? Will it work,
> > given that HTML syntax is not as strict as XML? For example, a <P> tag
> > doesn't need a closing </P> tag ... That XML library would be ideal
> > because it lets you pass it XML data in parts, and I am not reading the
> > whole HTML page at once but in pieces.
> >
> > Or is there some other library for parsing HTML pages?
>
> There is this in strtok (any version):
>
> global function getxml(sequence record, sequence starttag, sequence endtag, integer tagnum)
>
> You don't have to fill in all the parameters; see the comments in the .ew file.
> I didn't document it in the readme. I have used this, pretty much unchanged,
> since I first wrote it in 1992 or so in Turbo Pascal.
>
> Kat
4. Re: parsing HTML
> Can I parse HTML pages with Thomas Parslow's XML library? Will it work,
> given that HTML syntax is not as strict as XML? For example, a <P> tag
> doesn't need a closing </P> tag ...
> That XML library would be ideal because it lets you pass it XML data
> in parts, and I am not reading the whole HTML page at once but in pieces.
> Or is there some other library for parsing HTML pages?
> For now I need to get the title of a page and extract all the links on it;
> something else might come up later.
> Tone Škoda
Hi,
My library would not really be very useful for that: it expects
conformant XML and returns an error if the input is not.
Thomas Parslow (PatRat)
E-Mail/Jabber: tom at almostobsolete.net
ICQ: 26359483
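Since a strict XML parser rejects typical HTML, one possible alternative for the original goal (collecting links while the page arrives in pieces) is a tolerant scan for anchor tags. The following is only a sketch under stated assumptions, not a real HTML parser: all routine names are invented for this example, a '>' inside a quoted attribute value will confuse it, and the attribute lookup is a plain substring search for "href".

```euphoria
-- Sketch only: tolerant scan for <a ... href=...> tags in chunked input.
-- Every routine name below is hypothetical.

function lower_ascii(sequence s)
    for i = 1 to length(s) do
        if s[i] >= 'A' and s[i] <= 'Z' then
            s[i] = s[i] + 32
        end if
    end for
    return s
end function

function scan_for(sequence needle, sequence hay, integer start)
-- like match(), but begins at index start; returns 0 if not found
    integer p
    if start > length(hay) then return 0 end if
    p = match(needle, hay[start..length(hay)])
    if p = 0 then return 0 end if
    return p + start - 1
end function

function skip_blanks(sequence s, integer i)
    while i <= length(s) and find(s[i], " \t\r\n") do
        i += 1
    end while
    return i
end function

function attr_value(sequence tag, sequence name)
-- value of attribute name (given in lower case) in one tag, or "" if absent
    integer i, j, q
    i = match(name, lower_ascii(tag))
    if i = 0 then return "" end if
    i = skip_blanks(tag, i + length(name))
    if i > length(tag) or tag[i] != '=' then return "" end if
    i = skip_blanks(tag, i + 1)
    if i > length(tag) then return "" end if
    q = tag[i]
    if q = '\"' or q = '\'' then
        -- quoted value: read up to the matching quote
        j = i + 1
        while j <= length(tag) and tag[j] != q do
            j += 1
        end while
        return tag[i+1..j-1]
    end if
    -- unquoted value: read up to whitespace or the end of the tag
    j = i
    while j <= length(tag) and not find(tag[j], " \t\r\n>") do
        j += 1
    end while
    return tag[i..j-1]
end function

function scan_links(sequence buf)
-- returns {links, rest}; rest is an unfinished tag to prepend to the next chunk
    sequence low, links, url
    integer pos, a, gt
    low = lower_ascii(buf)
    links = {}
    pos = 1
    while 1 do
        a = scan_for("<a", low, pos)
        if a = 0 then
            -- keep a trailing '<' in case "<a" is split across two chunks
            if length(buf) > 0 and buf[length(buf)] = '<' then
                return {links, "<"}
            end if
            return {links, ""}
        end if
        gt = scan_for(">", low, a)
        if gt = 0 then
            return {links, buf[a..length(buf)]}
        end if
        if find(low[a+2], " \t\r\n") then  -- skips <abbr>, <address>, ...
            url = attr_value(buf[a..gt], "href")
            if length(url) > 0 then
                links = append(links, url)
            end if
        end if
        pos = gt + 1
    end while
end function

-- Usage: feed the page in pieces, carrying the unfinished tail forward.
sequence chunks, buf, result
chunks = {"<p>See <a href=\"one.html\">one</a> and <A HRE",
          "F='two.html'>two</A> <a href=three.html>three</a>"}
buf = ""
for i = 1 to length(chunks) do
    result = scan_links(buf & chunks[i])
    for j = 1 to length(result[1]) do
        puts(1, result[1][j] & "\n")
    end for
    buf = result[2]
end for
```

The demo splits the second anchor tag across the two chunks on purpose; the first call hands back the unfinished "<A HRE" as the leftover, and the second call completes it, so the three links are printed in order.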