OpenEuphoria.org
Ticket #822:
Bad xml
Reported by
mattlewis
Nov 26, 2012
The RSS feed sometimes produces invalid XML, apparently due to encoding issues. In particular, on http://openeuphoria.org/forum/m/119918.wc:
opções
This is saved as: 70 E7 F5 65 73. Byte F5 is not valid UTF-8; the õ should be encoded as C3 B5. When I wget the feed and open it with my editor, it auto-detects the encoding as ISO-8859-15. I'm not sure where the encoding problem occurs (DB collation, euweb?).
This breaks RSS readers that only display feeds with well-formed XML.
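The byte-level claim can be checked with a short Python sketch. This is only an illustration, not part of euweb: the helper name `is_valid_utf8` is hypothetical, and the choice of ISO-8859-15 as the source encoding is an assumption taken from what the editor auto-detected.

```python
# Bytes quoted in the ticket for the tail of the word.
raw = bytes.fromhex("70 E7 F5 65 73")

def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Assumption: the bytes are really ISO-8859-15, as the editor guessed.
text = raw.decode("iso-8859-15")

# Re-encoding the decoded text gives the bytes the feed should contain.
reencoded = text.encode("utf-8")

print(is_valid_utf8(raw))    # the raw bytes are rejected as UTF-8
print(reencoded.hex(" "))    # the õ becomes C3 B5 here
```

If the assumption about the source encoding holds, the same decode and re-encode step could be applied wherever the data enters the feed.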
Details
1. Comment by mattlewis
Nov 27, 2012
Actually, that post, while failing validation, seems ok in my reader. It was actually the original post that was problematic. It had some weird character data in there (I edited it to get rid of the bad data):
<p> &lt;eucode&gt; &lt;/eucode&gt;</p>
After the first >, there are the bytes 1A AD, which cause validation to fail.
2. Comment by mattlewis
Nov 29, 2012
I tried adding a wrapper around iconv to sanitize the UTF-8, but that leaves in the 1A character, which seems to be bad for RSS. Simply removing the 1A bytes seems to fix the feed as far as that goes, though the sanitization might be good to have anyway.
(See code in the test directory tree on openeuphoria.org.)
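A rough Python sketch of the two-step cleanup described in comment 2 follows. It is not the euweb code: Python's `errors="ignore"` decoding stands in for the iconv wrapper, the function name `sanitize_for_feed` is hypothetical, and the control-character filter is widened from just 1A to every character that XML 1.0 disallows. The exact position of the stray bytes in the sample is also an assumption.

```python
import re

# Characters XML 1.0 does not allow in a document: C0 controls other than
# tab, LF and CR (this includes 0x1A), plus U+FFFE and U+FFFF.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

def sanitize_for_feed(data: bytes) -> str:
    """Hypothetical helper: drop invalid UTF-8, then XML-illegal characters."""
    # Step 1: discard byte sequences that are not valid UTF-8
    # (a stand-in for the iconv wrapper mentioned in the ticket).
    text = data.decode("utf-8", errors="ignore")
    # Step 2: 0x1A is valid UTF-8, so it survives step 1 and has to be
    # stripped separately.
    return _XML_ILLEGAL.sub("", text)

# Assumed placement of the stray 1A AD bytes, for illustration only.
sample = b"<p> &lt;eucode&gt;\x1a\xad &lt;/eucode&gt;</p>"
cleaned = sanitize_for_feed(sample)
print(cleaned)
```

Step 1 alone removes the AD byte but keeps 1A, which matches the behaviour reported for the iconv wrapper.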