Re: internal storage
- Posted by Derek Parnell <ddparnell at bigpond.com> Sep 20, 2004
CoJaBo wrote:
>
> Derek Parnell wrote:
[snip]
> > All 'characters' are stored as 4-byte integers and not stored as single
> > bytes.
>
> This DEFINATLY should be improved in a new version of Euphoria.
> There are many times where I use allocated memory to get around
> this problem.
> Euphoria 2.5(or 2.6 if it would take too long) should use 1-byte
> instead of 4-byte whenever possible.

On the other hand, Euphoria's choice of 30-bit characters makes Unicode very, very easy to implement. Encoding in UTF-32 is a one-to-one mapping for most characters and only a small number would need to be stored in atoms.

At the risk of complicating Euphoria, there may be a case to argue for a native UTF-8 character string. This would mean that English text would use 8-bit characters, and most European languages would average around 8-10 bits per character, though the East Asian languages would more than likely average 16-20 bits per character.

Microsoft have decided to store Unicode strings in UTF-16, which means that most languages in the world use about 2 bytes per character.

Of course, you could roll your own 'packed' string type for Euphoria sequences, at the cost of slower execution speed. But there can't be many applications where keeping all the text in RAM simultaneously is actually a performance boost. Most applications would only be dealing with a subset of the text at any one time. I don't think Google keeps all its cached pages in RAM.

<anecdote>
I once wrote a tiny text editor (4KB of assembler) in which the text was never stored in RAM, just the disk address of each line. It ran so fast on an Intel-8088 that people didn't notice it was continually going out to disk to read text in.
</anecdote>

--
Derek Parnell
Melbourne, Australia
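
The per-character storage estimates in the post above are easy to check for any given piece of text. Below is a minimal sketch in Python (not Euphoria, chosen only because its codecs are built in). The sample strings and the helper names `bits_per_char` and `report` are made up for illustration, and the averages depend heavily on the text you feed in, so treat the numbers as indicative rather than definitive.

```python
# Illustrative sketch: average storage cost per character for a text sample
# under UTF-8, UTF-16 and UTF-32. Sample strings are arbitrary assumptions.

SAMPLES = {
    "English": "internal storage",
    "German": "Größe der Zeichenkette",
    "Japanese": "内部ストレージ",
}

# The "-le" variants are used so that no byte-order mark is added,
# which would otherwise inflate the count for short strings.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def bits_per_char(text, encoding):
    """Average number of bits per code point when `text` is encoded."""
    return 8 * len(text.encode(encoding)) / len(text)


def report(samples=SAMPLES):
    """Map each sample name to its average bits per character per encoding."""
    return {
        name: {enc: bits_per_char(text, enc) for enc in ENCODINGS}
        for name, text in samples.items()
    }


if __name__ == "__main__":
    for name, costs in report().items():
        print(name, costs)
```

With these particular samples, the English text costs 8 bits per character in UTF-8 and the German text falls between 8 and 10. The Japanese sample, which contains no ASCII at all, costs 24 bits per character in UTF-8 and 16 in UTF-16; real-world text mixed with ASCII spaces, digits, and markup would average lower in UTF-8.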