Re: Kat's 8bit sequences
- Posted by Chris Bensler <eu at creat?veportal.c?> Jun 04, 2008
(Hi everybody :) Sorry if this has been said already or if I'm completely off-base.

Afaik, there are two main reasons why we don't have 8-bit handling for sequences.

1) It would introduce a new type. Even if it is only used internally, it still must be designated somehow so that Euphoria can determine how to handle the data. Currently, data types in Euphoria are stored as part of the actual memory pointer to the data. Eu cleverly uses 3 bits of the 32-bit pointer to represent the various data types. It can do this because eu/C ensures that all pointers are aligned (DWORD alignment by itself frees only 2 bits; freeing 3 takes 8-byte alignment). This means those 3 bits of a real pointer will always be 0 and can be used for Eu's purposes, as long as it nulls them before attempting to access the pointer. For example, the first bit is true if the type is integer; if so, the next bit indicates the sign and the remaining 30 bits are the value. For other data types, the 2nd and 3rd bits are used to designate atom_int (large integer values), atom, sequence and object. For the sake of example (I don't know the exact flags off-hand), these flags might be represented as:

integer:  1xx
atom_int: 010
atom:     001
sequence: 011
object:   000

2) Adding additional types, even if they are only represented internally, means the parser must handle additional cases for any type of data manipulation, such as math operations. This seems trivial, but the complexity of the typechecking in each of these operations grows rapidly with every type that must be handled, since a binary operation has to cope with every pairing of operand types.

This is in part how Eu can be as fast as it is. By utilizing the unused bits in the data pointers, Eu avoids having to look up the data type in a separate table. And by having a limited set of data types, Eu avoids a lot of costly type handling.

On the other hand, if string handling can be optimized, it may outweigh or balance the tradeoff, given how common string manipulation is.
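To make the pointer-tagging idea concrete, here is a minimal sketch in C. The tag values, the `object` typedef and the helper names are all made up for illustration and are not Euphoria's real encoding. It also assumes every tagged block is 8-byte aligned, so the low 3 bits of its address are free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag values stored in the low 3 bits of a pointer.
   These are illustrative only, NOT Euphoria's actual flags. */
enum { TAG_MASK = 0x7, TAG_ATOM = 0x1, TAG_SEQUENCE = 0x3 };

/* An "object" is a pointer-sized word carrying both address and type. */
typedef uintptr_t object;

/* Combine an 8-byte-aligned pointer with a 3-bit tag. */
static object make_tagged(void *p, unsigned tag)
{
    assert(((uintptr_t)p & (uintptr_t)TAG_MASK) == 0); /* low bits must be free */
    assert(tag <= (unsigned)TAG_MASK);
    return (uintptr_t)p | (uintptr_t)tag;
}

/* Read the type straight out of the word: one mask, no table lookup. */
static unsigned tag_of(object o)
{
    return (unsigned)(o & (uintptr_t)TAG_MASK);
}

/* Null the tag bits before the value is used as a real pointer. */
static void *pointer_of(object o)
{
    return (void *)(o & ~(uintptr_t)TAG_MASK);
}

/* Every operation needs a branch like this per type it supports,
   which is why each extra internal type adds cost everywhere. */
static const char *type_name(object o)
{
    switch (tag_of(o)) {
    case TAG_ATOM:     return "atom";
    case TAG_SEQUENCE: return "sequence";
    default:           return "unknown";
    }
}
```

The point is that `tag_of` needs only a mask on the value already in hand, while `pointer_of` shows the "null those bits first" step. An 8-bit sequence type would need its own tag value and its own `case` in every switch of this kind.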
There are some ways we can get around this, such as adding a flag to the sequence header to indicate whether it's a string or a homogeneous array. However, this still adds a fair amount of complexity, as special cases would have to be implemented for any time a byte string must be manipulated, rather than simply introducing a new 8-bit integer type that can be handled universally.

Of course it can be done, but both of these reasons would likely have a fairly significant impact on performance. I think this issue mostly boils down to a compromise between execution speed and storage efficiency. In general, string storage in Euphoria is not a problem, and I don't believe it would be worth the compromise.

Chris Bensler
Code is Alchemy