Re: Stripping HTML Tags from a Text File
- Posted by jeremy (admin) May 28, 2009
I second the use of html2txt; it's a great product. If you just want a simple function to remove HTML tags, then something like this should do:
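function strip_html(sequence html)
    integer tag_start

    -- "with entry" jumps to the entry label first, so tag_start is
    -- assigned before the loop condition is tested.
    while tag_start != 0 with entry do
        integer tag_end = find_from('>', html, tag_start)
        if tag_end = 0 then
            puts(1, "Malformed HTML, aborting\n")
            abort(1)
        end if

        -- Cut everything from '<' through the matching '>'.
        html = remove(html, tag_start, tag_end)
    entry
        tag_start = find('<', html)
    end while

    return html
end function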
However, that simply strips the HTML tags; it does no conversion of HTML to text, which is what you seem to want. It also does not truly detect malformed HTML. The only error it catches is a '<' with no '>' anywhere after it. For instance, this will pass: "<html Hello <b>World!</b>", because it simply strips "<html Hello <b>" as one tag and returns "World!".
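If you want the function to catch that particular case, here is a rough sketch of a stricter variant. The name strip_html_strict and the -1 return value are my own choices for illustration, not anything from the standard library. It rejects any tag that contains a second '<' before its closing '>'. Be aware that this is still only a heuristic: it will wrongly reject a legal '<' inside a quoted attribute value, and it knows nothing about comments or script blocks.

```euphoria
-- Hypothetical stricter variant of strip_html (sketch only).
-- Returns the stripped text, or -1 if a tag is unterminated or
-- contains a stray '<' before its closing '>'.
function strip_html_strict(sequence html)
    integer tag_start

    while tag_start != 0 with entry do
        integer tag_end = find_from('>', html, tag_start)
        if tag_end = 0 then
            return -1   -- '<' with no closing '>'
        end if

        -- Look for another '<' between the opening '<' and the '>'.
        if find('<', html[tag_start + 1 .. tag_end]) then
            return -1   -- e.g. "<html Hello <b>"
        end if

        html = remove(html, tag_start, tag_end)
    entry
        tag_start = find('<', html)
    end while

    return html
end function

? strip_html_strict("<html Hello <b>World!</b>")                 -- prints -1
puts(1, strip_html_strict("<p>Hello <b>World!</b></p>") & "\n")  -- prints Hello World!
```

Returning -1 instead of calling abort() leaves it to the caller to decide what to do with bad input.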
Jeremy