OpenEuphoria: Forum: Re: Stripping HTML Tags from a Text File


Posted by jeremy (admin) May 28, 2009

I second the use of html2txt; it's a great product. If you just want a simple function to remove HTML tags, then something like this should do:

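-- Remove everything from each '<' through the next '>' and return what is left.
function strip_html(sequence html)
	integer tag_start

	-- "with entry" jumps to the entry label first, so tag_start is set before the test
	while tag_start != 0 with entry do
		-- the tag ends at the first '>' at or after tag_start
		integer tag_end = find_from('>', html, tag_start)
		if tag_end = 0 then
			-- a '<' with no closing '>' anywhere after it
			puts(1, "Malformed HTML, aborting\n")
			abort(1)
		end if

		-- cut the whole tag, delimiters included
		html = remove(html, tag_start, tag_end)
	entry
		tag_start = find('<', html)
	end while

	return html
end function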

However, that simply strips the HTML tags; it does no conversion of HTML to text, which is what you seem to want. It also does not truly detect malformed HTML. For instance, this will pass: "<html Hello <b>World!</b>", because everything from <html through the > of <b> is stripped as one tag, leaving just "World!".
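
If you wanted to catch that particular case, one possible tweak is to reject any '<' that shows up before the tag's closing '>'. This is only a rough sketch, not a real HTML validator: strip_html_strict is a made-up name, returning -1 on failure is my own convention, and it uses the same find_from and remove calls as the function above.

```euphoria
-- Hypothetical stricter variant of strip_html (name and -1 convention are assumptions).
-- Returns the stripped text, or -1 if a tag is unclosed or contains another '<'.
function strip_html_strict(sequence html)
	integer tag_start, tag_end, nested

	tag_start = find('<', html)
	while tag_start != 0 do
		tag_end = find_from('>', html, tag_start)
		if tag_end = 0 then
			return -1 -- '<' with no closing '>'
		end if

		-- is there another '<' between this tag's '<' and its '>'?
		nested = find_from('<', html, tag_start + 1)
		if nested != 0 and nested < tag_end then
			return -1 -- e.g. "<html Hello <b>"
		end if

		html = remove(html, tag_start, tag_end)
		tag_start = find('<', html)
	end while

	return html
end function
```

With this version, strip_html_strict("<html Hello <b>World!</b>") returns -1 instead of silently passing, so the caller has to check whether the result is an atom before using it as text. It will also reject a literal '<' inside an attribute value or comment, so treat it as a sanity check only.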

Jeremy
