Re: EuGTK and UTF-8
- Posted by jimcbrown (admin) Dec 16, 2012
EuGTK will work, but only by using the one byte font.
AS IT STANDS WITH EUPHORIA, EUGTK WILL NOT WORK USING THE FULL TWO BYTE STORAGE.
This is flat out wrong. I tested the EuGTK demos test7.ex (single-line text entry) and test48.ex (a simple text editor) with Hangul syllabic block characters and with Hanzi (both of which require at least two bytes per character in any character set), and it worked fine. Of course, I used UTF-8 (which is backwards compatible with ASCII at the binary level, unlike UTF-16).
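To illustrate the ASCII compatibility point, here is a minimal sketch in Python (not Euphoria, and not part of my EuGTK tests; the sample strings are just ones I picked for illustration). It only shows that pure ASCII text has the same bytes in UTF-8, while UTF-16 inserts zero bytes.

```python
# Pure ASCII text: the UTF-8 bytes are identical to the ASCII bytes.
ascii_text = "EuGTK"
as_ascii = ascii_text.encode("ascii")
as_utf8 = ascii_text.encode("utf-8")
as_utf16 = ascii_text.encode("utf-16-le")  # little-endian, no byte order mark

print(as_utf8 == as_ascii)  # same bytes, so ASCII-only code keeps working
print(list(as_utf16))       # every other byte is 0, which breaks C-style strings

# One Hangul syllable (U+D55C), chosen arbitrarily as an example.
hangul = "\ud55c"
print(len(hangul.encode("utf-8")), "bytes in UTF-8")
print(len(hangul.encode("utf-16-le")), "bytes in UTF-16")
```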
1. Kindly look at and provide the Peeks of one of these strings.
Here you go: {228,189,160,229,165,189,233,169,172,239,188,159}
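For anyone who wants to check those peeked values, here is a small sketch in Python (used only because it is easy to run; it is not how EuGTK or Euphoria handles the data). It decodes the twelve bytes as UTF-8 and reports how many bytes each character occupies.

```python
# The byte values exactly as peeked above.
peeked = [228, 189, 160, 229, 165, 189, 233, 169, 172, 239, 188, 159]

# Decode the raw bytes as UTF-8.
text = bytes(peeked).decode("utf-8")

# Code point and encoded length of each character.
code_points = ["U+%04X" % ord(ch) for ch in text]
bytes_per_char = [len(ch.encode("utf-8")) for ch in text]

print(len(peeked), "bytes,", len(text), "characters")
print(code_points)
print(bytes_per_char)
```

So the twelve bytes hold four characters of three bytes each, none of which would fit in a one byte font.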
2. Try extracting the third and fourth syllables of a string, like extracting "mc" from "jimcbrown", in your EuGTK-created Chinese text, using Euphoria, and then extracting the sixth and seventh, similar to extracting "ro" from "jimcbrown".
Hmm. I know how to do that (using http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html#g-utf8-find-next-char ) but the ancient version of EuGTK that I just happened to have lying around when I decided to do these tests doesn't seem to have that wrapped.
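In the meantime, here is a rough sketch of the idea in Python. It is not the glib call itself: next_char_offset and utf8_substring are hypothetical helper names of my own, and the code assumes well-formed UTF-8. It steps from one character to the next by skipping continuation bytes (those of the form 10xxxxxx), which is roughly what a find-next-char routine has to do.

```python
def next_char_offset(buf, pos):
    """Return the byte offset of the character following the one at pos.

    Hypothetical helper: skips UTF-8 continuation bytes (10xxxxxx).
    """
    pos += 1
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def utf8_substring(buf, first, last):
    """Extract characters first..last (1-based, inclusive) as raw bytes."""
    start = 0
    for _ in range(first - 1):
        start = next_char_offset(buf, start)
    end = start
    for _ in range(last - first + 1):
        end = next_char_offset(buf, end)
    return buf[start:end]


# The peeked bytes from the test string in this thread.
sample = bytes([228, 189, 160, 229, 165, 189, 233, 169, 172, 239, 188, 159])

middle = utf8_substring(sample, 3, 4)
print(list(middle))                        # third and fourth characters
print(utf8_substring(b"jimcbrown", 3, 4))  # the ASCII analogue: b'mc'
```

The same routine handles the ASCII string and the Chinese one, because it counts characters rather than bytes.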
Still, even if I have to wrap glib by hand myself, glib makes it a lot easier. Heck, they even make it easy to compare UTF-8 strings! http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html#g-utf8-collate
On the other hand, I could simply wrap http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html#g-utf8-to-ucs4-fast and http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html#g-ucs4-to-utf8 and then do any character comparison or substring extraction via UCS-4.
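A sketch of that second approach, again in Python rather than Euphoria: utf8_to_ucs4 and ucs4_to_utf8 below are stand-ins of my own that lean on Python's built-in codec, not the glib functions. Once the text is a sequence of code points, one integer per character, extraction is ordinary slicing, much like slicing a Euphoria sequence.

```python
def utf8_to_ucs4(buf):
    """Stand-in converter: UTF-8 bytes to a list of code points."""
    return [ord(ch) for ch in buf.decode("utf-8")]


def ucs4_to_utf8(code_points):
    """Stand-in converter: a list of code points back to UTF-8 bytes."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


sample = bytes([228, 189, 160, 229, 165, 189, 233, 169, 172, 239, 188, 159])
cps = utf8_to_ucs4(sample)

# Third and fourth characters: plain slicing (0-based, end exclusive).
third_and_fourth = ucs4_to_utf8(cps[2:4])
print(list(third_and_fourth))

# The ASCII analogue from the question: sixth and seventh characters.
name = utf8_to_ucs4(b"jimcbrown")
print(ucs4_to_utf8(name[5:7]))

# Characters compare as plain integers.
print(cps[0] < cps[1])
```

Note that comparing code points this way gives code point order only, not the locale-aware ordering that a collation function is meant to provide.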
Not only does glib make it easy, but it gives you choices!
Not that I have anything against wxWidgets, mind you. I haven't looked as closely into wxWidgets, but I'm sure that the Unicode version makes things just as easy as glib does. Heck, you can build wxWidgets on top of GTK and have the best of both worlds!
There are other aspects of your posts reminiscent of the years 2000-2003, and I will address them later as I get time.
Perhaps you could refresh my memory and point out a few specific examples from that time? As you address these other aspects of my more recent posts, of course.
Whenever I come across Unicode and UTF-8, theorizing seems to be the order of the day. Outside of simple nice text under UTF-8 on the net like this:
"जैसा ये लिखा है",
a major part of it is vapourware.
Again, I shall point you to http://www.gtk.org/api/2.6/glib/glib-Unicode-Manipulation.html