1. Kat's 8bit sequences

** The Idea **

Some sequences store each value in 64 bits, others in 32 bits, yet the
difference is transparent to the user.  See the performance note under strings.
What was done for 32-bit values could also be done for 8-bit byte values.  Call
them Kat sequences.  The EUPHORIA syntax wouldn't change, yet under the hood
some sequences would use only 1 byte per value plus sequence overhead.


** The Need? **

When EUPHORIA was released 15 years ago, computers were lucky to have a
500 MB HARD DRIVE, and this wasn't a problem.  It would seem less important
these days, with so much RAM.  Yet Robert Craig didn't think it was an issue
then.  Is it an issue today?

War and Peace ( 3.13 MB )
The Bible ( 4.57 MB )
Don Quixote ( 2.24 MB )
[ From Project Gutenberg ]

You could multiply these sizes by 10 without significantly taxing RAM.  If
you are working with ASCII records, it would seem to me you only need two
in RAM at the same time, if you are doing a comparison operation of some kind.

Perhaps Kat is working for Big Brother and has the DNA fingerprint of everybody
in Alabama on her computer.

Even when you move to HTML, you are still shy of this 500 MB limit for the
longest books of all time.

Although it seems ridiculous to most of us, it may become reasonable as RAM
gets cheaper to load an entire video file into memory for editing, for example,
and it might be more convenient to edit it as a stream of bytes rather than quads.


Shawn Pringle


2. Re: Kat's 8bit sequences

I sure could use 8-bit (or 16-bit Unicode) strings! I'm coming up on my next
release of wxEditor (boy, does it sure look nice! /gloat) and one stumbling block
I've found is that I'm loading files into sequences of strings, one line being
its own sequence (much easier to manipulate this way, IMHO), which quickly
inflates the memory usage, because I've got 4x (or 2x) the RAM being used per
character. I've tried writing my own 8-bit string routines, but nothing comes
close to the ease of manipulation I get with sequences.
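
To put rough numbers on that overhead, here is a back-of-the-envelope sketch in C. The function name `estimate_text_bytes` and the 16-byte per-line header are assumptions made up for illustration; they are not measurements of the Euphoria runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-sequence bookkeeping cost (length, reference count, ...).
   The real Euphoria header size is not given in this thread. */
#define ASSUMED_HEADER_BYTES 16u

/* Estimate the memory needed to hold a text file as one sequence per line.
   elem_bytes is the storage used per character: 4 for ordinary sequences,
   2 for 16-bit Unicode, 1 for byte strings. */
size_t estimate_text_bytes(size_t total_chars, size_t n_lines, size_t elem_bytes)
{
    assert(elem_bytes == 1 || elem_bytes == 2 || elem_bytes == 4);
    return n_lines * ASSUMED_HEADER_BYTES + total_chars * elem_bytes;
}
```

Under those assumptions, a 1,000,000-character file split into 20,000 lines comes to 4,320,000 bytes at 4 bytes per character and 1,320,000 bytes at 1 byte per character.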

I really think we'd benefit from storing strings as strings and sequences as
sequences, and still manipulate them equally.

-Greg


3. Re: Kat's 8bit sequences

Shawn Pringle wrote:
> 
> ** The Idea **
> 
> Some sequences store their values using 64-bits each value others 32 bits. 
> Yet
> it is transparent to the user.  See the performance note under strings.  What
> was done for 32 bit values could also be done for 8 bit byte values.  Call
> them
> Kat sequences.  The EUPHORIA syntax wouldn't change, yet under the hood the
> amount of memory used for some sequences is 1 byte per value plus sequence
> overhead.
> 
> 
> ** The Need? **
> 
> When EUPHORIA was released 15 years ago when computers were lucky to have a
> 500 MB HARD DRIVE this wasn't a problem.  It would seem less important these
> days with so much RAM.  Yet, Robert Craig didn't think it was an issue then. 
> Is
> it an issue today?

I learned a long time ago two things about programming:

1. I could manipulate many megs of data using only a few hundred bytes of
memory.
2. I would be an idiot to try to do it that way.

Why take weeks to create a slow, complex, probably bug-ridden program when 
you can throw cheap hardware at the problem and get the results much quicker,
with less chance for errors, using a simple script?

So is there a need? Not for most of us, in fact probably only one here.
The others who do this kind of thing probably investigated Eu and decided 
it was handicapped compared to other languages. Therefore, you won't see 
them here. That doesn't mean they don't exist.


4. Re: Kat's 8bit sequences

Greg Haberek wrote:
> 
> 
> I sure could use 8-bit (or 16-bit Unicode) strings! I'm coming up on my next
> release of wxEditor (boy, does it sure look nice! /gloat) and one stumbling
> block I've found is that I'm loading files into sequences of strings, one line
> being its own sequence (much easier to manipulate this way, IMHO), which
> quickly
> inflates the memory usage, because I've got 4x (or 2x) the RAM being used per
> character. I've tried writing my own 8-bit string routines, but nothing comes
> close to the ease of manipulation I get with sequences.
> 
> I really think we'd benefit from storing strings as strings and sequences as
> sequences, and still manipulate them equally.

You make sense Greg.

If the goal is to make Euphoria more 'full featured' then adding other types
like strings would seem to be part of that goal.  If the goal is, "Hey, I'm
doing this volunteer work.  I'll code whatever cool thing I think of" that
'goal' would likely produce different results.  If your goal is to fill in
gaps or holes without much changing the language... well, that's a different
thing as well.

My concern about adding string types is the potential that Euphoria will then
have to do the type of garbage collection that is normally associated with
string usage.  This can create unacceptable performance lags.  If strings
can be implemented without the danger of such hiccups, I'd be much less
concerned about their addition.

Is there a formal process for adding features?  I ask because I'm not aware
of any.


5. Re: Kat's 8bit sequences

ken mortenson wrote:
> 
> Greg Haberek wrote:
> > 
> > 
> > I sure could use 8-bit (or 16-bit Unicode) strings! I'm coming up on my next
> > release of wxEditor (boy, does it sure look nice! /gloat) and one stumbling
> > block I've found is that I'm loading files into sequences of strings, one
> > line
> > being its own sequence (much easier to manipulate this way, IMHO), which
> > quickly
> > inflates the memory usage, because I've got 4x (or 2x) the RAM being used
> > per
> > character. I've tried writing my own 8-bit string routines, but nothing
> > comes
> > close to the ease of manipulation I get with sequences.
> > 
> > I really think we'd benefit from storing strings as strings and sequences as
> > sequences, and still manipulate them equally.
> 
> You make sense Greg.
> 
> If the goal is to make Euphoria more 'full featured' then adding other types
> like strings would seem to be part of that goal.  If the goal is, "Hey, I'm
> doing this volunteer work.  I'll code whatever cool thing I think of" that
> 'goal' would likely produce different results.  If your goal is to fill in
> gaps or holes without much changing the language... well, that's a different
> thing as well.

Not necessarily. More types are a bad thing, IMO, except for user-defined types.

I like the genericity of the language as it is.

> 
> My concern about adding string types is the potential that Euphoria will then
> have to do the type of garbage collection that is normally associated with
> string usage.  This can create unacceptable performance lags.  If strings
> can be implemented without the danger of such hiccups, I'd be much less
> concerned about their addition.

Euphoria already does garbage collection by reference counting. There's no
reason this would change with the addition of strings.
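
For readers unfamiliar with the term, here is a minimal sketch of reference counting, written in C because that is the language of the Euphoria backend. The struct and function names are invented for illustration and are not the interpreter's real definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted byte buffer. */
typedef struct {
    size_t refs;          /* how many variables currently point at this buffer */
    size_t length;        /* number of bytes in data */
    unsigned char *data;
} RcBytes;

/* Create a buffer holding a copy of src, owned by one reference.
   Returns NULL if memory cannot be allocated. */
RcBytes *rc_new(const unsigned char *src, size_t length)
{
    RcBytes *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc(length > 0 ? length : 1);
    if (b->data == NULL) { free(b); return NULL; }
    if (length > 0) memcpy(b->data, src, length);
    b->refs = 1;
    b->length = length;
    return b;
}

/* Another variable now shares the buffer: no copy is made, only a count. */
RcBytes *rc_share(RcBytes *b)
{
    assert(b != NULL);
    b->refs++;
    return b;
}

/* A variable stops using the buffer. The storage is freed at the moment the
   last reference goes away. Returns 1 if it was freed, 0 otherwise. */
int rc_release(RcBytes *b)
{
    assert(b != NULL && b->refs > 0);
    if (--b->refs > 0) return 0;
    free(b->data);
    free(b);
    return 1;
}
```

In a scheme like this, memory is reclaimed as each last reference is dropped rather than in a separate collection pass, and nothing in it depends on how wide the stored elements are.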

> 
> Is there a formal process for adding features?  I ask because I'm not aware
> of any.

http://sourceforge.net/tracker/?group_id=182827&atid=902785

--
A complex system that works is invariably found to have evolved from a simple
system that works.
--John Gall's 15th law of Systemantics.

"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare

j.


6. Re: Kat's 8bit sequences

ken mortenson wrote:
> 
> If the goal is to make Euphoria more 'full featured' then adding other types
> like strings would seem to be part of that goal.  If the goal is, "Hey, I'm
> doing this volunteer work.  I'll code whatever cool thing I think of" that
> 'goal' would likely produce different results.  If your goal is to fill in
> gaps or holes without much changing the language... well, that's a different
> thing as well.

I don't think there's a whole lot of "whatever cool thing" going on.  They
often get proposed, but there's usually a fair amount of discussion before
these things happen.  Still, since it's all volunteer work, those who are
interested are the ones who tend to write the code.

> My concern about adding string types is the potential that Euphoria will then
> have to do the type of garbage collection that is normally associated with
> string usage.  This can create unacceptable performance lags.  If strings
> can be implemented without the danger of such hiccups, I'd be much less
> concerned about their addition.

As Jason said, we've already got garbage collection.  There are other 
implementation details that have so far prevented the addition.

> Is there a formal process for adding features?  I ask because I'm not aware
> of any.

Not really.  It gets talked about, and if the general consensus is thumbs
up, then someone has to do it.  You could always code something up and 
present it, and try to get it accepted by the community (again, no 
formal process currently exists).  The project is open to pretty much 
anyone who is interested.

Matt


7. Re: Kat's 8bit sequences

Jason Gade wrote:
> 
> 
> Not necessarily. More types are a bad thing, IMO, except for user-defined
> types.
> 
> I like the genericity of the language as it is.
> 

Kat-sequences wouldn't compromise genericity (is that a word?).  They would
simply be a memory optimization, perhaps with a flag in the sequence structure
telling us it is 8-bit or 16-bit.  Whoever thinks that Kat-sequences
would make the language less general misunderstands the idea.

The idea is that nothing would change in either the API or the core language
definition; it is simply an implementation change.  Two bits in the sequence
structure could say that all members are stored as 8-bit or 16-bit integers,
and the interpreter would then have to convert when combining sequences of
different internal types.  That would rarely happen, but the interpreter would
have to check for it.
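
A rough sketch in C of what such a flag and the conversion on combining might look like. Everything here (`PackedSeq`, `ElemWidth`, the function names) is hypothetical and only covers flat sequences of non-negative integers; it is not how the Euphoria backend is actually written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element widths; each value is the number of bytes per element. */
typedef enum { WIDTH_8 = 1, WIDTH_16 = 2, WIDTH_32 = 4 } ElemWidth;

/* Hypothetical flat sequence whose header records the element width. */
typedef struct {
    ElemWidth width;
    size_t length;
    void *data;
} PackedSeq;

/* Read element i, whatever the internal width. */
uint32_t seq_get(const PackedSeq *s, size_t i)
{
    assert(i < s->length);
    switch (s->width) {
    case WIDTH_8:  return ((const uint8_t  *)s->data)[i];
    case WIDTH_16: return ((const uint16_t *)s->data)[i];
    default:       return ((const uint32_t *)s->data)[i];
    }
}

/* Write element i; the caller guarantees that v fits the width. */
static void seq_set(PackedSeq *s, size_t i, uint32_t v)
{
    switch (s->width) {
    case WIDTH_8:  ((uint8_t  *)s->data)[i] = (uint8_t)v;  break;
    case WIDTH_16: ((uint16_t *)s->data)[i] = (uint16_t)v; break;
    default:       ((uint32_t *)s->data)[i] = v;           break;
    }
}

/* Allocate an uninitialised sequence; returns NULL on failure. */
static PackedSeq *seq_alloc(ElemWidth width, size_t length)
{
    PackedSeq *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->data = malloc((length > 0 ? length : 1) * (size_t)width);
    if (s->data == NULL) { free(s); return NULL; }
    s->width = width;
    s->length = length;
    return s;
}

/* Build a sequence of the given width from 32-bit values. */
PackedSeq *seq_new(ElemWidth width, const uint32_t *values, size_t length)
{
    PackedSeq *s = seq_alloc(width, length);
    if (s == NULL) return NULL;
    for (size_t i = 0; i < length; i++) seq_set(s, i, values[i]);
    return s;
}

/* Concatenate a and b. The result takes the wider of the two widths, so the
   narrower operand is converted element by element as it is copied. */
PackedSeq *seq_concat(const PackedSeq *a, const PackedSeq *b)
{
    ElemWidth w = a->width > b->width ? a->width : b->width;
    PackedSeq *r = seq_alloc(w, a->length + b->length);
    if (r == NULL) return NULL;
    for (size_t i = 0; i < a->length; i++) seq_set(r, i, seq_get(a, i));
    for (size_t i = 0; i < b->length; i++) seq_set(r, a->length + i, seq_get(b, i));
    return r;
}

void seq_free(PackedSeq *s)
{
    if (s != NULL) { free(s->data); free(s); }
}
```

The user-visible result of the concatenation is the same whichever widths the operands had; only the storage differs, which is the sense in which the language definition would not change.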



Shawn Pringle B.Sc.


> > 
> > My concern about adding string types is the potential that Euphoria will
> > then
> > have to do the type of garbage collection that is normally associated with
> > string usage.  This can create unacceptable performance lags.  If strings
> > can be implemented without the danger of such hiccups, I'd be much less
> > concerned about their addition.
> 
> Euphoria already does garbage collection by reference counting. There's no
> reason
> this would change with the addition of strings.
> 
> > 
> > Is there a formal process for adding features?  I ask because I'm not aware
> > of any.
> 
> http://sourceforge.net/tracker/?group_id=182827&atid=902785
> 
> --
> A complex system that works is invariably found to have evolved from a simple
> system that works.
> --John Gall's 15th law of Systemantics.
> 
> "Premature optimization is the root of all evil in programming."
> --C.A.R. Hoare
> 
> j.


8. Re: Kat's 8bit sequences

There are many ways to do 8-bit data manipulation.

1. Use my MIXEDLIB, which contains "C" string routines written in Euphoria
and assembler and should work on any platform.

2. Use the "C" string calls in the Windows C-runtime.

3. Use the "C" string calls in Linux C-runtime.

The only thing it takes is time to write the code.

Bernie

My files in archive:
WMOTOR, XMOTOR, W32ENGIN, MIXEDLIB, EU_ENGIN, WIN32ERU, WIN32API 

Can be downloaded here:
http://www.rapideuphoria.com/cgi-bin/asearch.exu?dos=on&win=on&lnx=on&gen=on&keywords=bernie+ryan


9. Re: Kat's 8bit sequences

Jason Gade wrote:
> More types are a bad thing, IMO, except for user-defined types.

I'm in general agreement with this statement; however, different opinions
exist.  Core types are a big part of the Rebol language for instance.

There are a lot of things that VB gives me that I am unwilling to give up.
So perhaps I should write a sequence type for use in VB?  The problem with
that is that every operation on a VB sequence class would be of the nature...

sequence_class.ADD(sequence_type_var,10)

Instead of the preferred (IMHO) syntax...

sequence_type_var += 10

With respect to adding any type to Euphoria this is among the issues.

It's all tradeoffs, my friends.


10. Re: Kat's 8bit sequences

Bernie Ryan wrote:
> 
> 
> There are many ways to use 8-bit data manipulation .
> 
> 1. Use my MIXEDLIB which contains "C" string routines written in Euphoria
> and assembler which should work on any Platform.
> 
> 2. Use the "C" string calls in the Windows C-runtime.
> 
> 3. Use the "C" string calls in Linux C-runtime.
> 
> The only thing it takes is time to write the code.
> 
> Bernie
> 
> My files in archive:
> WMOTOR, XMOTOR, W32ENGIN, MIXEDLIB, EU_ENGIN, WIN32ERU, WIN32API 
> 
> Can be downloaded here:
> http://www.rapideuphoria.com/cgi-bin/asearch.exu?dos=on&win=on&lnx=on&gen=on&keywords=bernie+ryan

But Bernie, mixedlib is NOT listed on that page!

What is the latest version number for mixedlib? There's no equivalent to Eu's
match() for the C strings (i do see several char search functions tho)?

Kat


11. Re: Kat's 8bit sequences

Kat wrote:
> There's no equivalent to Eu's
> match() for the C strings (i do see several char search functions tho)?
> 
> Kat

strstr() is C's version of match() for C strings. Bernie's mixedlib.e probably
implements this as well.

(For the C version, if you want an index instead of a pointer, do
"strstr(haystack,needle)-haystack;".)
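
One caveat with that one-liner: `strstr()` returns NULL when the needle is not found, and the subtraction is then meaningless. A small C sketch that guards against it follows; the helper name `find_offset` and the choice of -1 for "not found" are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 0-based offset of needle within haystack,
   or -1 if needle does not occur. */
ptrdiff_t find_offset(const char *haystack, const char *needle)
{
    const char *hit;
    assert(haystack != NULL && needle != NULL);
    hit = strstr(haystack, needle);
    if (hit == NULL) return -1;   /* no match: do not subtract from NULL */
    return hit - haystack;
}
```

Note that the offset is 0-based, whereas Euphoria's match() returns a 1-based index and 0 for no match, so a wrapper meant to mimic match() would need to add 1 to a successful result and map -1 to 0.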


12. Re: Kat's 8bit sequences

Jim Brown wrote:
> 
> Kat wrote:
> > There's no equivalent to Eu's
> > match() for the C strings (i do see several char search functions tho)?
> > 
> > Kat
> 
> strstr() is C's version of match() for C strings. Bernie's mixedlib.e probably
> implements this as well.

No, it's not there. I did find it first in Igor's version of w32engine, which is
Bernie's with extra 400K of stuff, so i looked in Bernie's w32engine, and found
the stuff below....

> (For the C version, if you want an index instead of a pointer, do
> "strstr(haystack,needle)-haystack;".)

w32engine has this stuff, among tons of other stuff...
Def &= {{0,"StrStrA",11,{pointer_,pointer_},pointer_}}
global function StrStrA(object p) return f_(1816,p) end function
Def &= {{0,"StrStrIA",11,{pointer_,pointer_},pointer_}}
global function StrStrIA(object p) return f_(1817,p) end function
Def &= {{0,"StrStrIW",11,{pointer_,pointer_},pointer_}}
global function StrStrIW(object p) return f_(1818,p) end function
Def &= {{0,"StrStrW",11,{pointer_,pointer_},pointer_}}
global function StrStrW(object p) return f_(1819,p) end function

So, you are going to tell me to google it all, right?

Kat


13. Re: Kat's 8bit sequences

irv mullins wrote:
> 
> Shawn Pringle wrote:
> > 
> > ** The Idea **
> > 
> > Some sequences store their values using 64-bits each value others 32 bits. 
> > Yet
> > it is transparent to the user.  See the performance note under strings. 
> > What
> > was done for 32 bit values could also be done for 8 bit byte values.  Call
> > them
> > Kat sequences.  The EUPHORIA syntax wouldn't change, yet under the hood the
> > amount of memory used for some sequences is 1 byte per value plus sequence
> > overhead.
> > 
> > 
> > ** The Need? **
> > 
> > When EUPHORIA was released 15 years ago when computers were lucky to have a
> > 500 MB HARD DRIVE this wasn't a problem.  It would seem less important these
> > days with so much RAM.  Yet, Robert Craig didn't think it was an issue then.
> >  Is
> > it an issue today?
> 
> I learned a long time ago two things about programming:
> 
> 1. I could manipulate many megs of data using only a few hundred bytes of
> memory.
> 2. I would be an idiot to try to do it that way.
> 
> Why take weeks to create a slow, complex, probably bug-ridden program when 
> you can throw cheap hardware at the problem and get the results much quicker,
> with less chance for errors, using a simple script?

Because what "cheap" hardware means to one person may mean unaffordably 
expensive to another, and their TIME may be much more easily spent?

> 
> So is there a need? Not for most of us, in fact probably only one here.
> The others who do this kind of thing probably investigated Eu and decided 
> it was handicapped compared to other languages. Therefore, you won't see 
> them here. That doesn't mean they don't exist.

But *if there's no performance hit* for adding 8 bit byte value sequences,
then those "others" might be encouraged to use Euphoria, which would generally
be recognized as a (mostly) good thing for Euphoria?

Dan


14. Re: Kat's 8bit sequences

Dan Moyer wrote:

> But *if there's no performance hit* for adding 8 bit byte value sequences,
> then those "others" might be encouraged to use Euphoria, which would generally
> be recognized as a (mostly) good thing for Euphoria? 

But that's the point. The current thinking is that Euphoria will actually run
slower than now if byte sequences were added to the mix of datatypes. And that
means slower for every Euphoria program not just those using byte sequences.

The reason goes along these lines: currently, most operations in the backend
that use datatypes need to decide first what sort of thing they are dealing
with; is it an integer, an atom, a sequence or an object? There are two bits set
aside in the 32-bit datatype entity to tell the backend what sort of thing the
entity is. Two bits means four different possibilities. Now if we add a fifth
datatype we need to do something clever to identify it. A byte sequence is
obviously a type of sequence, so we could add to the sequence 'struct' a code
that differentiates between a 32-bit sequence and an 8-bit sequence, but then
every backend operation that deals with sequences will have to do this extra
test before it can perform the operation, because the operation implementation
will be different for the different sequence types.
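
As an illustration of that extra test, here is a small C sketch. The `Seq` struct, its `byte_packed` field and `seq_sum` are invented for this example and do not reflect the real backend structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sequence header. With all four tag values already taken,
   the element width has to be recorded here instead. */
typedef struct {
    int byte_packed;        /* 0: 32-bit elements, 1: 8-bit elements */
    size_t length;
    const void *elements;
} Seq;

/* Sum the elements. The branch on byte_packed is the extra test that every
   sequence operation would carry, even when given an ordinary 32-bit
   sequence, and each branch needs its own loop. */
int64_t seq_sum(const Seq *s)
{
    int64_t total = 0;
    assert(s != NULL);
    if (s->byte_packed) {
        const uint8_t *p = s->elements;
        for (size_t i = 0; i < s->length; i++) total += p[i];
    } else {
        const int32_t *p = s->elements;
        for (size_t i = 0; i < s->length; i++) total += p[i];
    }
    return total;
}
```

Whether one well-predicted branch per operation is measurable in practice is exactly the open question; the sketch only shows where the test and the duplicated loop would sit.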

And how many Euphoria programs are there that don't use sequences?

Now, how much of an impact this might be is pure guess work at this time.

Then there is the new code needed to actually implement efficient 8-bit sequence
operations. And there is quite a bit of that needed.

All this is going to take a lot of time to do. So, don't expect 8-bit sequences
to make it in v4.0, but the door is definitely not closed on them. I expect that
there will be a few experimental editions to test out competing implementations
and to gain insights into how to make it all efficient. As Euphoria is open
source that means those with expectations for byte sequences can be a part of
their development.

I'm very sure that Euphoria will get byte sequences one day, but not right away.

-- 
Derek Parnell
Melbourne, Australia
Skype name: derek.j.parnell


15. Re: Kat's 8bit sequences

Derek Parnell wrote:
> 
> Dan Moyer wrote:
> 
> > But *if there's no performance hit* for adding 8 bit byte value sequences,
> > then those "others" might be encouraged to use Euphoria, which would
> > generally
> > be recognized as a (mostly) good thing for Euphoria? 
> 
> But that's the point. The current thinking is that Euphoria will actually run
> slower than now if byte sequences were added to the mix of datatypes. And that
> means slower for every Euphoria program not just those using byte sequences.
> 

Ok.  

> The reason goes along the lines that currently, most operations in the backend
> that use datatypes, need to decide what sort of thing it is first; is it an
> integer, an atom, a sequence or an object? There are two bits set aside in the
> 32-bit datatype entity reserved to tell the backend what sort of thing the
> entity
> is. 2-bits means four different possibilities. Now if we add a fifth datatype
> we need to do something clever to identify it. A byte sequence is obviously
> of type of sequence, so we could add to the sequence 'struct' a code that
> differentiates
> between a 32-bit sequence and an 8-bit sequence, but then every backend
> operation
> that deals with sequences will have to do this extra test before it can
> perform
> the operation, because the operation implementation will be different for the
> different sequence types.

Thanks for the explanation Derek. 

> 
> And how many Euphoria programs are there that don't use sequences?
> 
> Now, how much of an impact this might be is pure guess work at this time.
> 
> Then there is the new code needed to actually implement efficient 8-bit
> sequence
> operations. And there is a quite a bit of that needed.
> 
> All this is going to take a lot of time to do. So, don't expect 8-bit
> sequences
> to make it in v4.0, but the door is definitely not closed on them. I expect
> that there will be a few experimental editions to test out completing
> implementations
> and to gain insights into how to make it all efficient. As Euphoria is open
> source that means those with expectations for byte sequences can be a part of
> their development.
> 
> I'm very sure that Euphoria will get byte sequences one day, but not right
> away.

Sounds good.

Dan

> 
> -- 
> Derek Parnell
> Melbourne, Australia
> Skype name: derek.j.parnell


16. Re: Kat's 8bit sequences

Kat wrote:
> w32engine has this stuff, among tons of other stuff...
> Def &= {{0,"StrStrA",11,{pointer_,pointer_},pointer_}}
> global function StrStrA(object p) return f_(1816,p) end function
<other redundant stuff snipped>

Ugh. That's ugly.

Ok, here is your match function for C strings, using StrStrA:

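include dll.e  -- open_dll(), define_c_func(), C_POINTER, NULL

constant LIB = open_dll("shlwapi.dll")
--constant LIB = open_dll("shell32.dll")
constant StrStrA = define_c_func(LIB, "StrStrA", {C_POINTER, C_POINTER}, C_POINTER)

-- a and b are addresses of zero-terminated C strings.
-- Returns the 0-based offset of b within a, or -1 if b is not found.
-- Note: this overrides Euphoria's built-in match().
global function match(atom a, atom b)
    atom ret
    ret = c_func(StrStrA, {a, b})
    if ret != NULL then
        ret = ret - a
    else
        ret = -1
    end if
    return ret
end function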

> 
> So,, you are going to tell me to google it all, right?

Yes, because then you'll see your message on Google. :D

Here is a (significantly slower) implementation in pure Eu:

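-- Length of the zero-terminated C string at address p.
function strlen(atom p)
    integer n
    n = 0
    while peek(p + n) != 0 do
        n += 1
    end while
    return n
end function

-- 0-based offset of C string b within C string a, or -1 if not found.
global function match(atom a, atom b)
    integer la, lb, j
    la = strlen(a)
    lb = strlen(b)
    for i = 0 to la - lb do
        j = 0
        while j < lb and peek(a + i + j) = peek(b + j) do
            j += 1
        end while
        if j = lb then
            return i
        end if
    end for
    return -1
end function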

> 
> Kat


17. Re: Kat's 8bit sequences

Dan Moyer wrote:
> 
> irv mullins wrote:

> > I learned a long time ago two things about programming:
> > 
> > 1. I could manipulate many megs of data using only a few hundred bytes of
> > memory.
> > 2. I would be an idiot to try to do it that way.
> > 
> > Why take weeks to create a slow, complex, probably bug-ridden program when 
> > you can throw cheap hardware at the problem and get the results much
> > quicker,
> > with less chance for errors, using a simple script?
> 
> Because what "cheap" hardware means to one person may mean unaffordably 
> expensive to another, and their TIME may be much more easily spent?
> 
> Dan

I think any reasonable person would understand that if they are asking you
to massage gigabytes of data, there should be at least a few hundred 
dollars provided to buy appropriate hardware.
Somewhat easier to come up with that than a few thousand dollars for 
programmers' time.

If you are doing this without funding on discarded old hardware, 
and your time is worthless, and there's no profit to be made, then it's 
a hobby, so whatever works...


18. Re: Kat's 8bit sequences

irv mullins wrote:
> 
> Dan Moyer wrote:
> > 
> > irv mullins wrote:
> 
> > > I learned a long time ago two things about programming:
> > > 
> > > 1. I could manipulate many megs of data using only a few hundred bytes of
> > > memory.
> > > 2. I would be an idiot to try to do it that way.
> > > 
> > > Why take weeks to create a slow, complex, probably bug-ridden program when
> > >
> > > you can throw cheap hardware at the problem and get the results much
> > > quicker,
> > > with less chance for errors, using a simple script?
> > 
> > Because what "cheap" hardware means to one person may mean unaffordably 
> > expensive to another, and their TIME may be much more easily spent?
> > 
> > Dan
> 
> I think any reasonable person would understand that if they are asking you
> to massage gigabytes of data, there should be at least a few hundred 
> dollars provided to buy appropriate hardware.
> Somewhat easier to come up with that than a few thousand dollars for 
> programmers' time.
> 
> If you are doing this without funding on discarded old hardware, 
> and your time is worthless, and there's no profit to be made, then it's 
> a hobby, so whatever works...

Irv,

Given that this is a thread about KAT's sequences, I was trying, without
knowing exactly what she's doing, to relate to what some have suggested
might be useful to her.  That said, I don't think we can presume what her
margin of profit is and what she can and cannot spend it on.   

And I never said anything about anyone's time being worthless, just that some
may find spending time easier to do than spending money.  Nor did I say that
there was no profit to be made. 

I don't think scrounging to eke out a living can truly be considered
a "hobby". 

Dan


19. Re: Kat's 8bit sequences

Kat wrote:
> But Bernie, mixedlib is NOT listed on that page!
> 
> What is the latest version number for mixedlib? There's no equivalent to Eu's
> match() for the C strings (i do see several char search functions tho)?
> 

Kat

Rob called it "Utility Library C-functions"
Bernie

My files in archive:
WMOTOR, XMOTOR, W32ENGIN, MIXEDLIB, EU_ENGIN, WIN32ERU, WIN32API 

Can be downloaded here:
http://www.rapideuphoria.com/cgi-bin/asearch.exu?dos=on&win=on&lnx=on&gen=on&keywords=bernie+ryan


20. Re: Kat's 8bit sequences

(Hi everybody :)

Sorry if this has been said already or if I'm completely off-base.

Afaik, there are two main reasons why we don't have 8-bit handling for sequences.

1) It would introduce a new type. Even if only an internally used type, it still
must be designated somehow so that euphoria can determine how to handle the data.
Currently, data types in euphoria are stored as part of the actual memory pointer
to the data. Eu cleverly uses 3 bits of the 32 bit pointer to represent the
various data types. It can use 3 bits because eu/C ensures that all pointers are
DWORD aligned. This means that the first 3 bits will always be 0 and can be
used for eu's purpose as long as it nulls those bits before attempting to access
the pointer.

For example, the first bit is true if the type is integer, if so, then the next
bit indicates the sign, and the remaining 30 bits are the value. For other data
types, the 2nd and 3rd bits are used to designate atom_int (large integer
values), atom, sequence and object. For the sake of example (I don't know the
exact flags off-hand), these flags might be represented as:

integer: 1xx
atom_int: 010
atom: 001
sequence: 011
object: 000
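
The general trick of hiding a type tag in the spare bits of an aligned pointer can be sketched in C as follows. This is a generic illustration, not Euphoria's actual encoding: the names are made up, the tag values are simply borrowed from the example table above, and the sketch assumes 8-byte alignment, since three spare low bits require it (4-byte alignment only frees two).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TAG_MASK     ((uintptr_t)0x7)  /* the three low bits */
#define TAG_ATOM     ((uintptr_t)0x1)  /* 001, borrowed from the example table */
#define TAG_SEQUENCE ((uintptr_t)0x3)  /* 011, borrowed from the example table */

/* A pointer and its type tag packed into one machine word. */
typedef uintptr_t Tagged;

/* Pack an 8-byte-aligned pointer and a tag into one word. */
Tagged tag_pointer(void *p, uintptr_t tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);  /* alignment leaves the low bits free */
    assert(tag <= TAG_MASK);
    return bits | tag;
}

/* Recover the type tag without touching memory. */
uintptr_t tag_of(Tagged t)
{
    return t & TAG_MASK;
}

/* Clear the tag bits to get back a usable pointer. */
void *pointer_of(Tagged t)
{
    return (void *)(t & ~TAG_MASK);
}
```

The type check is a mask and a compare on a value already in a register, with no lookup in a separate table. The price is a fixed, small number of tag values, which is why a new internal type has nowhere obvious to go.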

2) Adding additional types, even if they are internally represented types, means
the parser must handle additional cases for any type of data manipulations, such
as math operations. This seems trivial but the complexity of the typechecking in
each of these operations increases exponentially for every type that must be
handled. On the other hand, if string handling can be optimized, it may outweigh
or balance the tradeoff due to how common string manipulation is.

This is in part how Eu can be as fast as it is. By utilizing the unused bits in
the data pointers, Eu avoids having to lookup the data type in a separate table.
And by having a limited set of data types, Eu avoids a lot of costly type
handling.

There are some ways we can get around this, such as adding a flag to the
sequence header to indicate if it's a string or a homogenous array, however this
still adds a fair amount of complexity as special cases would have to be
implemented for anytime a byte string must be manipulated, rather than simply
introducing a new 8-bit integer type that can be handled universally.

Of course it can be done, but both of these reasons would likely make a fairly
significant impact on performance. I think this issue mostly boils down to a
compromise between execution speed and storage efficiency.

In general, string storage in Euphoria is not a problem and I don't believe it
would be worth the compromise.

Chris Bensler
Code is Alchemy
