Re: Parsing
I lost the original message and a couple other messages too.
Someone asked about parsing.
They even mentioned <HTML>.
Here is a thrown-together HTML parser.
It strips out MOST <tags>,
replaces linefeeds with spaces, and
replaces <BR>, <HR>, </H????>, and </TITLE> with a linefeed.
----------Parses and displays file.htm---------
-----Few comments involved
include wildcard.e -- provides upper(), used to upper-case each tag before comparing
sequence buffer
object line
integer handle
integer l, g
-- l is the position of '<' (less than)
-- g is the position of '>' (greater than)
integer lf
-- lf is the linefeed character
lf = 10
handle = open("file.htm", "r")
if handle = -1 then
puts(1, "Cannot open file.htm\n")
abort(1)
end if
buffer = {}
while 1 do
line = gets(handle)
if atom(line) then
exit
end if
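-- turn the trailing linefeed into a space; a last line without one is kept whole
if line[length(line)] = lf then
line[length(line)] = 32
end if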
buffer = buffer & line
end while
close(handle)
l = find('<', buffer)
while l do
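-- look for '>' only to the right of '<', so a stray '>' in the text is ignored
g = find('>', buffer[l..length(buffer)])
if g = 0 then
exit -- unterminated tag: leave the rest as it is
end if
g = g + l - 1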
buffer[l..g] = upper(buffer[l..g])
if compare(buffer[l..g], "<BR>") = 0 then
buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
elsif compare(buffer[l..g], "<HR>") = 0 then
buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
elsif compare(buffer[l..g], "</TITLE>") = 0 then
buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
elsif match("</H", buffer[l..g]) = 1 then
-- any closing tag starting with </H, so </HEAD> and </HTML> match too
buffer = buffer[1..l - 1] & lf & buffer[g + 1..length(buffer)]
else
buffer = buffer[1..l - 1] & buffer[g + 1..length(buffer)]
end if
l = find('<', buffer)
end while
puts(1, buffer)
------------------End file------------
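
If you want to try the tag handling without a file.htm on disk, the same idea can be wrapped in a function that works on a string. This is only a rough sketch: strip_tags() is a name made up for this example, not a library routine, and it assumes the old wildcard.e include that supplies upper().

```euphoria
-- Sketch only: strip_tags() is a hypothetical helper for illustration.
include wildcard.e  -- for upper()

function strip_tags(sequence text)
    integer l, g
    sequence tag
    l = find('<', text)
    while l do
        -- look for '>' only to the right of '<'
        g = find('>', text[l..length(text)])
        if g = 0 then
            exit  -- unterminated tag: leave the rest alone
        end if
        g = g + l - 1
        tag = upper(text[l..g])
        if find(tag, {"<BR>", "<HR>", "</TITLE>"}) or match("</H", tag) = 1 then
            -- these tags become a linefeed
            text = text[1..l - 1] & 10 & text[g + 1..length(text)]
        else
            -- every other tag is simply dropped
            text = text[1..l - 1] & text[g + 1..length(text)]
        end if
        l = find('<', text)
    end while
    return text
end function

puts(1, strip_tags("<title>Demo</title><b>Hello</b><br>World"))
```

With that sample string, Demo, Hello, and World should come out on three separate lines.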
--Lucius Lamar Hilley III
-- E-mail at luciuslhilleyiii at juno.com
-- I support transferring of files less than 60K.
-- I can Decode both UU and Base64 format.