Re: Kat's 8bit sequences


(Hi everybody :)

Sorry if this has been said already or if I'm completely off-base.

Afaik, there are two main reasons why we don't have 8-bit handling for sequences.

1) It would introduce a new type. Even if it is only used internally, the type
still must be designated somehow so that Euphoria can determine how to handle the
data. Currently, data types in Euphoria are stored as part of the actual memory
pointer to the data. Eu cleverly uses 3 bits of the 32-bit pointer to represent
the various data types. It can use 3 bits because eu/C ensures that all pointers
are aligned to 8 bytes (DWORD alignment alone would free only 2 bits). This means
that the low 3 bits of a pointer will always be 0 and can be used for Eu's
purpose, as long as it clears those bits before attempting to access the pointer.
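
To make the trick concrete, here is a minimal sketch of packing a tag into the unused low bits of an aligned address. It simulates pointers with plain Python integers and assumes 8-byte alignment, which is what leaves three low bits free. The names `tag_pointer` and `untag_pointer` are made up for illustration and are not Euphoria's actual internals.

```python
TAG_BITS = 3                      # number of low bits borrowed for the type tag
TAG_MASK = (1 << TAG_BITS) - 1    # 0b111
ALIGNMENT = 1 << TAG_BITS         # 8-byte alignment keeps the low 3 bits zero


def tag_pointer(address, tag):
    """Pack a 3-bit type tag into the unused low bits of an aligned address."""
    if address % ALIGNMENT:
        raise ValueError("address is not 8-byte aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 3 bits")
    return address | tag


def untag_pointer(word):
    """Split a tagged word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK


# The address and tag value here are arbitrary examples.
word = tag_pointer(0x00401000, 0b011)
address, tag = untag_pointer(word)
print(hex(address), bin(tag))
```

The important part is the mask in `untag_pointer`: the tag has to be stripped before the word is used as an address again.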

For example, the first bit is set if the type is integer; if so, the next bit
indicates the sign, and the remaining 30 bits are the value. For other data
types, the 2nd and 3rd bits are used to designate atom_int (large integer
values), atom, sequence and object. For the sake of example (I don't know the
exact flags offhand), these flags might be represented as:

integer: 1xx
atom_int: 010
atom: 001
sequence: 011
object: 000
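
Decoding such flags could look roughly like the sketch below. It uses the made-up flag values from the list above, not Euphoria's real ones, and assumes the "first" bit is the leftmost of the three as written. `classify` is a hypothetical helper name.

```python
# Hypothetical 3-bit flags copied from the example list (not Euphoria's real values).
FLAGS = {
    0b010: "atom_int",
    0b001: "atom",
    0b011: "sequence",
    0b000: "object",
}


def classify(flags):
    """Return the type name for a 3-bit flag value, first bit written leftmost."""
    if not 0 <= flags <= 0b111:
        raise ValueError("expected a 3-bit value")
    if flags & 0b100:
        # integer: 1xx, so the other two bits carry data rather than type flags
        return "integer"
    return FLAGS[flags]


for value in (0b101, 0b010, 0b001, 0b011, 0b000):
    print(format(value, "03b"), classify(value))
```

One bit test settles the common integer case, and everything else is a small lookup with no separate type table to consult.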

2) Adding additional types, even if they are only represented internally, means
the parser must handle additional cases for any kind of data manipulation, such
as math operations. This seems trivial, but the complexity of the typechecking in
each of these operations grows multiplicatively with every type that must be
handled, since a binary operation needs a case for each pair of operand types.
On the other hand, if string handling can be optimized, it may outweigh or
balance the tradeoff due to how common string manipulation is.
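
For a rough feel of how the case count grows, the sketch below enumerates the operand-type pairs a single binary operation would have to consider. It assumes, pessimistically, that every pair needs its own case; in practice many pairs would share a code path. The type names come from the example list earlier, and `byte_sequence` is a hypothetical name for the new 8-bit type.

```python
from itertools import product


def binary_op_cases(types):
    """List every (left, right) operand-type pair a binary operation must handle."""
    return list(product(types, repeat=2))


current = ["integer", "atom_int", "atom", "sequence", "object"]
with_bytes = current + ["byte_sequence"]   # hypothetical new 8-bit type

print(len(binary_op_cases(current)), len(binary_op_cases(with_bytes)))
```

Under that assumption, one extra type takes a single binary operation from 25 pairs to 36, and the same increase repeats for every operation that inspects its operands.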

This is in part how Eu can be as fast as it is. By utilizing the unused bits in
the data pointers, Eu avoids having to look up the data type in a separate table.
And by having a limited set of data types, Eu avoids a lot of costly type
handling.

There are some ways we can get around this, such as adding a flag to the
sequence header to indicate if it's a string or a homogeneous array. However,
this still adds a fair amount of complexity, as special cases would have to be
implemented for any time a byte string must be manipulated, rather than simply
introducing a new 8-bit integer type that can be handled universally.
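
Here is a toy sketch of the header-flag idea, to show where the special cases creep in. The `Sequence` class, its `is_bytes` flag and the widening behaviour are all assumptions made for illustration; this is not how Euphoria lays out its sequences.

```python
class Sequence:
    """Toy sequence with a header flag marking byte-only storage (hypothetical layout)."""

    def __init__(self, items=()):
        self.is_bytes = True          # header flag: every element fits in 8 bits
        self.data = bytearray()       # compact storage while the flag holds
        for item in items:
            self.append(item)

    def append(self, value):
        if self.is_bytes and isinstance(value, int) and 0 <= value <= 255:
            self.data.append(value)   # fast path: stay in byte storage
            return
        if self.is_bytes:
            # Special case: the new element does not fit, so widen the whole
            # sequence to general storage and clear the flag.
            self.data = list(self.data)
            self.is_bytes = False
        self.data.append(value)


s = Sequence(b"abc")
print(s.is_bytes, len(s.data))
s.append(1000)                        # forces the widening path
print(s.is_bytes, s.data)
```

Even this one operation needs a flag check and a conversion path, and every other operation on sequences (slicing, concatenation, arithmetic) would need its own version of the same check.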

Of course it can be done, but both of these factors would likely have a fairly
significant impact on performance. I think this issue mostly boils down to a
compromise between execution speed and storage efficiency.
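
The storage side of that compromise is simple arithmetic. The sketch below assumes each sequence element occupies one 32-bit slot and ignores any per-sequence header; `storage_bytes` is a hypothetical helper.

```python
ELEMENT_BYTES = 4        # assumption: each element occupies one 32-bit slot
BYTE_ELEMENT_BYTES = 1   # what a dedicated 8-bit representation would need


def storage_bytes(text, bytes_per_element):
    """Element storage for a string, ignoring any per-sequence header."""
    return len(text) * bytes_per_element


text = "Hello, World!"
print(storage_bytes(text, ELEMENT_BYTES), storage_bytes(text, BYTE_ELEMENT_BYTES))
```

So the saving is a factor of four on element storage, which only starts to matter for large amounts of text.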

In general, string storage in Euphoria is not a problem and I don't believe it
would be worth the compromise.

Chris Bensler
Code is Alchemy
