Re: euphoria text processing
- Posted by petelomax Jun 03, 2013
seany said...
written with unicode
I assume you mean files saved in various encodings, with the appropriate identifying BOM (Byte Order Mark)...
As a first step I think we would need something that can read the following five test files, each containing "Hello" and ranging from 5 to 12 bytes. The parentheses just mark the BOM, nothing else.
Hello.txt: 48 65 6C 6C 6F
Hello.Unicode.little.endian.txt: (FF FE) 48 00 65 00 6C 00 6C 00 6F 00
Hello.Unicode.big.endian.txt: (FE FF) 00 48 00 65 00 6C 00 6C 00 6F
Hello.UTF8.txt: (EF BB BF) 48 65 6C 6C 6F
Hello.UTF7.txt: (2B 2F 76 38 2D) 48 65 6C 6C 6F
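To make the requirement concrete, here is a rough sketch of the BOM check, written in Python rather than Euphoria only because it is quick to try out. The names `BOMS` and `decode_with_bom` are made up for illustration, and the codec names are Python's, not anything Euphoria provides. It assumes the whole file has already been read as raw bytes and that a file without a BOM is plain ASCII.

```python
# BOM signatures for the five test files, paired with Python codec names.
# BOMS and decode_with_bom are hypothetical names used only in this sketch.
BOMS = [
    (b"\xEF\xBB\xBF", "utf-8"),
    (b"\x2B\x2F\x76\x38\x2D", "utf-7"),
    (b"\xFF\xFE", "utf-16-le"),
    (b"\xFE\xFF", "utf-16-be"),
]

def decode_with_bom(raw, default="ascii"):
    """Strip a recognised BOM and decode the rest; fall back to `default`."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            return raw[len(bom):].decode(codec)
    # No BOM found: assume plain single-byte text (an assumption of this sketch).
    return raw.decode(default)

# The five test files, given as hex dumps instead of being read from disk.
samples = {
    "Hello.txt": "48 65 6C 6C 6F",
    "Hello.Unicode.little.endian.txt": "FF FE 48 00 65 00 6C 00 6C 00 6F 00",
    "Hello.Unicode.big.endian.txt": "FE FF 00 48 00 65 00 6C 00 6C 00 6F",
    "Hello.UTF8.txt": "EF BB BF 48 65 6C 6C 6F",
    "Hello.UTF7.txt": "2B 2F 76 38 2D 48 65 6C 6C 6F",
}

for name, dump in samples.items():
    print(name, "->", decode_with_bom(bytes.fromhex(dump)))
```

This only covers the signatures listed above. A UTF-32 little-endian file also starts with FF FE, so a fuller version would need to test for that longer signature before the UTF-16 one.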
I might be able to cobble something together for Windows, but would have no idea for Linux.
Or is there already something that can do this?
Pete