OpenEuphoria.org Ticket #822: Bad xml

The RSS feed sometimes produces invalid XML, apparently because of encoding issues. In particular, on http://openeuphoria.org/forum/m/119918.wc:

opções 

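This is saved as the bytes 70 E7 F5 65 73 ("pções" in ISO-8859-1). Byte F5 is not valid UTF-8; it should be C3 B5. E7 has the same problem: it is the ISO-8859-1 byte for ç, whose UTF-8 form is C3 A7. When I wget the feed and open it in my editor, the editor auto-detects the encoding as ISO-8859-15. I'm not sure where the encoding problem is introduced (database collation, euweb?).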
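To make the mismatch concrete, here is a minimal Python sketch (Python is used only for illustration; it is not the site's code). It shows that the stored bytes match the ISO-8859-15 encoding of the word rather than UTF-8, and that a strict UTF-8 decoder rejects them.

```python
# The word from the post, as text.
word = "opções"

# ISO-8859-1 and ISO-8859-15 agree on these letters: one byte each.
latin = word.encode("iso-8859-15")
assert latin == b"\x6f\x70\xe7\xf5\x65\x73"  # 'o' 'p' 'ç' 'õ' 'e' 's'

# UTF-8 needs two bytes for each accented letter.
utf8 = word.encode("utf-8")
assert utf8 == b"\x6f\x70\xc3\xa7\xc3\xb5\x65\x73"  # ç -> C3 A7, õ -> C3 B5

# Decoding the single-byte form as UTF-8 fails at the first non-ASCII byte.
try:
    latin.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.start, hex(latin[err.start]))  # prints: 2 0xe7
```

The decoder stops at E7 rather than F5 because E7 announces a three-byte sequence and the next byte (F5) is not a continuation byte.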

This breaks RSS readers that only display feeds containing well-formed XML.
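A strict parser's view can be sketched with Python's standard `xml.etree.ElementTree`, assuming a reader that rejects anything the parser rejects. The feed fragments and the `is_well_formed` helper below are hypothetical, built only to contrast correct UTF-8 with the stored bytes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed: bytes) -> bool:
    """Return True if a strict XML parser accepts the feed bytes (hypothetical helper)."""
    try:
        ET.fromstring(feed)
    except ET.ParseError:
        return False
    return True


# Hypothetical feed fragments; both declarations promise UTF-8.
good = '<?xml version="1.0" encoding="UTF-8"?><title>opções</title>'.encode("utf-8")
bad = b'<?xml version="1.0" encoding="UTF-8"?><title>op\xe7\xf5es</title>'

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False: the E7 F5 bytes are not valid UTF-8
```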

Details

Type: Bug Report
Severity: Normal
Category: General
Assigned To: unknown
Status: New
Reported Release:
Fixed in SVN #:
View VCS: none
Milestone:

1. Comment by mattlewis Nov 27, 2012

Actually, that post, while failing validation, displays fine in my reader. It was the original post that was problematic: it contained some stray character data (I edited it to remove the bad data):

<p> ­<eucode> </eucode></p> 
After the first >, the raw bytes are 1A AD, which cause validation to fail: 1A is a control character that XML 1.0 does not allow, and AD on its own is not valid UTF-8.
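Bytes like these can be located by scanning the raw feed. The following is a minimal Python sketch, assuming the feed is meant to be UTF-8 XML 1.0; `find_bad_bytes` and `xml10_allowed` are hypothetical names. It decodes with Python's `surrogateescape` error handler, which turns each undecodable byte into a lone surrogate, and checks every decoded character against the XML 1.0 `Char` production.

```python
def xml10_allowed(cp: int) -> bool:
    """The Char production from XML 1.0 (Fifth Edition), section 2.2."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def find_bad_bytes(data: bytes) -> list[tuple[int, str]]:
    """Report (byte offset, reason) for content that breaks UTF-8 XML 1.0."""
    problems = []
    offset = 0
    for ch in data.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # surrogateescape maps each undecodable byte b to U+DC00 + b.
            problems.append((offset, f"byte {cp - 0xDC00:02X} is not valid UTF-8"))
            offset += 1
            continue
        if not xml10_allowed(cp):
            problems.append((offset, f"U+{cp:04X} is not allowed in XML 1.0"))
        offset += len(ch.encode("utf-8"))
    return problems


print(find_bad_bytes(b"<p>\x1a\xad<eucode>"))
# [(3, 'U+001A is not allowed in XML 1.0'), (4, 'byte AD is not valid UTF-8')]
```

Note that 1A decodes cleanly as UTF-8; it is only the XML character check that flags it.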

2. Comment by mattlewis Nov 29, 2012

I tried adding a wrapper around iconv to sanitize the UTF-8, but that leaves the 1A character in place (1A is valid UTF-8, so a UTF-8 sanitizer has no reason to remove it), and it seems to break RSS. Simply removing the 1A bytes seems to fix the feed as far as that goes, though the sanitization might be worth keeping anyway.
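The two steps described above (sanitize the UTF-8, then drop characters XML cannot carry) can be sketched in Python as follows. This is not the code in the openeuphoria.org test tree and does not call iconv; `sanitize_for_xml` is a hypothetical name, and dropping undecodable bytes is an assumption about what the sanitizer should do. The second step generalizes "remove 1A" to every character outside the XML 1.0 `Char` production.

```python
def xml10_allowed(ch: str) -> bool:
    """The Char production from XML 1.0 (Fifth Edition), section 2.2."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def sanitize_for_xml(data: bytes) -> str:
    """Hypothetical sanitizer: fix the UTF-8, then strip XML-illegal characters."""
    # Step 1: drop byte sequences that are not valid UTF-8 (AD in the example).
    text = data.decode("utf-8", errors="ignore")
    # Step 2: 1A survives step 1 because it is valid UTF-8, so filter it here,
    # together with every other character XML 1.0 does not allow.
    return "".join(ch for ch in text if xml10_allowed(ch))


print(repr(sanitize_for_xml(b"<p>\x1a\xad<eucode>")))  # '<p><eucode>'
print(sanitize_for_xml(b"op\xe7\xf5es"))                # opes
```

The second call shows the cost of this approach for the first case in the ticket: dropping the ISO-8859-1 bytes loses ç and õ. Transcoding from ISO-8859-1 would preserve them, but only if the source encoding is known for certain, which is why fixing the encoding where the post is stored is the more robust repair.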

(See code in the test directory tree on openeuphoria.org.)
