Re: Stripping HTML Tags from a Text File


I second the use of html2txt; it's a great product. If you just want a simple function to remove HTML tags, something like this should do:

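function strip_html(sequence html)
	integer tag_start

	-- "with entry" starts the loop at the entry label, so tag_start is
	-- assigned before the condition is tested for the first time
	while tag_start != 0 with entry do
		-- the tag is taken to end at the first '>' at or after tag_start
		integer tag_end = find_from('>', html, tag_start)
		if tag_end = 0 then
			puts(1, "Malformed HTML, aborting\n")
			abort(1)
		end if

		-- cut out the whole tag, delimiters included
		html = remove(html, tag_start, tag_end)
	entry
		tag_start = find('<', html)
	end while

	return html
end function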

However, that simply strips the HTML tags; it does no conversion of HTML to text, which is what you seem to want. It also does not truly detect malformed HTML. For instance, this will pass: "<html Hello <b>World!</b>", because everything from <html up to the first > (that is, <html Hello <b>) is stripped as one tag, leaving just World!.
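
If catching that case matters, a stricter variant might treat a second '<' inside an unclosed tag as an error. The following is only a rough sketch, not part of the function above: the name strip_html_strict and the -1 error return are made up for illustration, and it will also reject legitimate input that contains a '<' inside a tag (in an attribute value, for example).

```euphoria
-- Hypothetical stricter variant (name and -1 return value are assumptions).
-- Returns the text with tags removed, or -1 if a '<' appears inside an
-- unclosed tag or the input ends in the middle of a tag.
function strip_html_strict(sequence html)
	sequence plain = ""
	integer in_tag = 0

	for i = 1 to length(html) do
		if html[i] = '<' then
			if in_tag then
				return -1 -- second '<' before the closing '>'
			end if
			in_tag = 1
		elsif html[i] = '>' and in_tag then
			in_tag = 0 -- end of the current tag
		elsif not in_tag then
			plain &= html[i] -- ordinary text (a stray '>' is kept as text)
		end if
	end for

	if in_tag then
		return -1 -- input ended inside a tag
	end if
	return plain
end function

? strip_html_strict("<html Hello <b>World!</b>")          -- prints -1
puts(1, strip_html_strict("Hello <b>World!</b>") & "\n") -- prints Hello World!
```

Returning -1 instead of calling abort() leaves it to the caller to decide what to do with bad input; that is a design choice for this sketch, not something the original function does.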

Jeremy
