1. parsing HTML
- Posted by 10963508 at europeonline.com Aug 02, 2002
- 470 views
Can I parse HTML pages with Thomas Parslow's XML library? Will it work, given that HTML syntax is not as strict as XML? For example, a <P> tag doesn't need a closing </P> tag. That XML library would be ideal because it lets you pass it XML data in parts, and I am not reading the whole HTML page at once but in pieces.

Or is there some other library for parsing HTML pages? I need to get the title of a page and extract all the links on it. That's all for now; something else might come up later.

Tone Škoda
2. Re: parsing HTML
- Posted by Kat <gertie at PELL.NET> Aug 02, 2002
- 461 views
On 2 Aug 2002, at 17:51, 10963508 at europeonline.com wrote:

> Can I parse HTML pages with Thomas Parslow's XML library? Will it work
> cause HTML pages don't have so strict syntax like XML, for example: <P> tag
> doesn't need closing </P> tag ... That XML library would be ideal because it
> allows that you pass it XML data by parts and I am not reading whole HTML site
> but reading it by pieces.
>
> Or is there some other library for parsing HTML pages?

There is this in strtok (any version):

    global function getxml(sequence record, sequence starttag, sequence endtag, integer tagnum)

You don't have to fill in all the parameters; see the comments in the .ew file. I didn't document it in the readme. I have used this, pretty much unchanged, since I first wrote it in 1992 or so in Turbo Pascal.

Kat
3. Re: parsing HTML
- Posted by 10963508 at europeonline.com Aug 02, 2002
- 461 views
Thanks, I'm going to take a look at it now.
4. Re: parsing HTML
- Posted by "Thomas Parslow (PatRat)" <tom at almostobsolete.net> Aug 03, 2002
- 482 views
> Can I parse HTML pages with Thomas Parslow's XML library? Will it work
> cause HTML pages don't have so strict syntax like XML, for example: <P> tag
> doesn't need closing </P> tag ...
> That XML library would be ideal because it allows that you pass it XML data
> by parts and I am not reading whole HTML site but reading it by pieces.
> Or is there some other library for parsing HTML pages?
> I need to get title of page and extract all links on page, that's for now,
> something else might come up later.
> Tone Škoda

Hi,

My library would not really be very useful for that: it expects conformant XML and returns an error if the input is not.

Thomas Parslow (PatRat)
E-Mail/Jabber: tom at almostobsolete.net
ICQ: 26359483
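For illustration only, here is a minimal sketch of the approach the question asks for: a tolerant parser that is fed the page in pieces and collects the title and the links. It is written in Python with the standard-library `html.parser` module, not in Euphoria, and it does not use or imitate either library mentioned in this thread. The names `TitleAndLinks` and `extract` are hypothetical, chosen just for this example.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collects the page title and every <a href=...> target.

    HTMLParser is lenient: unclosed tags such as <P> do not raise errors,
    and tag and attribute names are reported in lower case.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several calls when a chunk boundary
        # falls inside it, so accumulate rather than assign.
        if self._in_title:
            self.title += data


def extract(chunks):
    """Feed the page piece by piece and return (title, links)."""
    parser = TitleAndLinks()
    for chunk in chunks:
        parser.feed(chunk)  # incomplete tags are buffered until the next chunk
    parser.close()          # flush whatever is still buffered
    return parser.title.strip(), parser.links


if __name__ == "__main__":
    # Chunk boundaries deliberately fall inside the title and inside a tag.
    pieces = [
        "<html><head><title>Exam",
        "ple Page</title></head><body><P>First <a hr",
        "ef=\"http://a.example/\">A</a><P>Second <A HREF='b.html'>B</a></body>",
    ]
    print(extract(pieces))
```

The design point is that the caller never needs the whole page in memory: each piece is handed to `feed` as it arrives, which is the property the original question valued in the XML library, while missing closing tags are simply tolerated.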