Re: Need Code for This [Cklester and/or Pete]
- Posted by "Kat" <kat12 at coosahs.net> Aug 27, 2006
> > > posted by: don cole <doncole at pacbell.net>
> > Kat wrote:
> > >
> > > <snip>
> >
> > That is a task getxml() was written for around 1999 (I wrote it in
> > Turbo Pascal way before that, early 1990's), in the first strtok.e.
> > You can parse the page by <table> and the data lines on the webpage
> > by <tr>, then each <td> inside is a data item for that <tr>. You can
> > ask for the <td> #1, or #2, etc in each <tr>.
> >
> > I still can't help CK tho.
> >
> > Kat
> >
>
>
> I don't do it that way. But I'll look into parsing with strtok.e. I
> have that in my include files so I must be using it in something.
>
> Does the WebMaster or Mistress always use the same <table>, <tr>,
> <td> scheme? And do all WebMasters and Mistresses use the same
> scheme?

No, you can count on webpages being different between each domain. There is no convention for using <TR> or <Tr> or <tr>. Even "<font face" can be "<FONT ecaf". No table on any webpage is like any other table on another domain's pages.

Some domains also like to add new "features" and delete others occasionally. Comments will be changed, and so will "class" names. Some sites change the web address of the pictures on a page every 5 minutes (so people cannot link to them). Advertising is inserted or deleted in the html, and the domain of the adserver will be changed. Javascript or style sheets will be edited as authors find that whatever they write won't work the same on all browsers. You may notice that on some pages html tags include other html tags, while another site uses separate tags for the same thing.

It's a good idea to check periodically that the page format you wrote code for hasn't changed. Put in some code so that if the page's format has changed, or the data item in a table is not what you expect, it alerts you. Perhaps you search for "<!-- begin data -->", and someone changes that to "<!-- Data Goes Below -->".
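
The checks described above can be sketched roughly like this. This is only an illustration in Python, not getxml() or anything from strtok.e. The names `TableRows` and `extract_rows` and the `expected_cells` parameter are made up for the example, and the marker string is just the one mentioned above. It reads cells regardless of tag case and complains loudly if the marker is gone or a row doesn't have the number of cells you expect. It assumes one simple table with no nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def _close_cell(self):
        # Store the current cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        # Store the current row (if any), closing an unclosed cell first.
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TR>, <Tr> and <tr>
        # all arrive here as "tr".
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag == "td" and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self._close_cell()
        elif tag == "tr":
            self._close_row()


def extract_rows(html, marker="<!-- begin data -->", expected_cells=2):
    """Return the table rows, or raise ValueError if the layout changed."""
    if marker not in html:
        raise ValueError("page format changed: marker %r not found" % marker)
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        raise ValueError("page format changed: no table rows found")
    for row in parser.rows:
        if len(row) != expected_cells:
            raise ValueError("page format changed: unexpected row %r" % (row,))
    return parser.rows


# Hypothetical page fragment with mixed-case tags.
sample = """<!-- begin data -->
<TABLE>
<TR><TD>apples</TD><td> 3 </td></TR>
<tr><Td>pears</tD><TD>5</TD></tr>
</TABLE>"""

print(extract_rows(sample))  # [['apples', '3'], ['pears', '5']]
```

If someone renames the comment to "<!-- Data Goes Below -->", `extract_rows` raises a ValueError instead of quietly returning nothing, which is the alert you want.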
It's also a good idea to make sure your automatic "browser" doesn't register as a spammer, or as a feeble denial-of-service attack, by accessing the site too often.

Kat
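
One way to keep an automatic "browser" from hitting a site too often is to enforce a minimum gap between requests. Below is a minimal sketch in Python, for illustration only. The `Throttle` class and the 60-second interval are assumptions made for the example, not anything a particular site requires. The clock and sleep functions are passed in so the logic can be tried out without actually waiting.

```python
import time


class Throttle:
    """Make sure successive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._last = None                  # time of the previous request

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: pause for the rest
                # of the interval before letting the caller continue.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical setting: at most one page fetch per minute.
throttle = Throttle(60)
```

Call `throttle.wait()` right before each page fetch. The first call returns immediately and later calls pause only as long as needed.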