Re: Parsing

I lost the original message and a couple of other messages too.

Someone asked about parsing.
They even mentioned <HTML>.

Here is a thrown-together HTML parser.
It strips out all <tags>,
replaces linefeeds with spaces, and
replaces <BR>, <HR>, </TITLE>, and any closing tag that starts with </H
(</H1> to </H6>, but also </HEAD> and </HTML>) with a linefeed.


----------Parses and displays file.htm---------
-----Few comments involved
include wildcard.e -- used for changing some text to upper case

sequence buffer
object line
integer handle
integer l, g
--l is the position of '<' (less than)
--g is the position of '>' (greater than)
integer lf
--lf is linefeed
lf = 10

handle = open("file.htm", "r")
if handle = -1 then
  puts(2, "Couldn't open file.htm\n")
  abort(1)
end if

-- Read the whole file into one sequence,
-- turning each line's trailing linefeed into a space.
buffer = {}
while 1 do
  line = gets(handle)
  if atom(line) then
    exit
  end if
  if line[length(line)] = lf then
    line[length(line)] = ' '
  end if
  buffer = buffer & line
end while
close(handle)

l = find('<', buffer)
while l do
  -- look for the '>' that closes this tag, starting at the '<'
  g = find('>', buffer[l..length(buffer)])
  if g = 0 then
    exit -- no closing '>': leave the rest of the text alone
  end if
  g = g + l - 1
  buffer[l..g] = upper(buffer[l..g])
  if compare(buffer[l..g], "<BR>") = 0
  or compare(buffer[l..g], "<HR>") = 0
  or compare(buffer[l..g], "</TITLE>") = 0
  or match("</H", buffer[l..g]) = 1 then
    -- line-breaking tags become a linefeed
    buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
  else
    -- every other tag is simply removed
    buffer = buffer[1..l - 1] & buffer[g + 1..length(buffer)]
  end if
  l = find('<', buffer)
end while
puts(1, buffer)
------------------End file------------
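
The same tag loop can also be wrapped in a function that works on text already in memory, which makes it easy to try on a short string. This is a minimal sketch, not part of the original program: the name strip_html and the constant LF are made up for illustration, and it assumes the upper() routine from wildcard.e, as the program above does. It looks for the closing '>' starting from the '<', so a stray '>' in ordinary text is not mistaken for the end of a tag, and it stops at a '<' that is never closed.

```euphoria
include wildcard.e -- for upper()

constant LF = 10 -- linefeed (hypothetical name)

-- strip_html is a hypothetical helper: removes every <tag> from s,
-- turning <BR>, <HR>, </TITLE> and closing tags starting with </H
-- into a linefeed.
function strip_html(sequence s)
  integer l, g
  sequence tag
  l = find('<', s)
  while l do
    -- search for '>' only from the '<' onward
    g = find('>', s[l..length(s)])
    if g = 0 then
      exit -- unterminated tag: leave the rest alone
    end if
    g = g + l - 1
    tag = upper(s[l..g])
    if compare(tag, "<BR>") = 0
    or compare(tag, "<HR>") = 0
    or compare(tag, "</TITLE>") = 0
    or match("</H", tag) = 1 then
      s = s[1..l - 1] & LF & s[g + 1..length(s)]
    else
      s = s[1..l - 1] & s[g + 1..length(s)]
    end if
    l = find('<', s)
  end while
  return s
end function
```

For example, strip_html("<b>Hi</b><br>there") returns "Hi", a linefeed, then "there", and puts(1, strip_html(...)) displays the result just as the program above does.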

--Lucius Lamar Hilley III
--  E-mail at luciuslhilleyiii at juno.com
--  I support transferring files smaller than 60K.
--  I can decode both UU and Base64 formats.
