1. gets() and "string" variable type

I have followed the CR / LF / CRLF discussion with interest.
Personally I am with Christian in agreeing that gets() should
look for any reasonable end-of-line indicator, which can be
any of those arrangements, with VT or FF thrown in too.
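
As a rough illustration of the behaviour argued for above, here is a minimal C sketch of a line reader that accepts CR, LF, CRLF, VT or FF as the end of a line. The name read_line_any_eol and its return convention are made up for this example; it is not how Euphoria's gets() is implemented, and it drops the terminator rather than storing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not Euphoria's gets(): reads one line from fp into
   buf (at most cap-1 bytes plus a terminating NUL). A line ends at LF, CR,
   CRLF, VT or FF; the terminator is consumed but not stored.
   Returns the number of bytes stored, or -1 at end of file with no data. */
long read_line_any_eol(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && cap > 0);

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\r') {
            /* CR may be followed by LF; treat CRLF as one terminator. */
            int next = fgetc(fp);
            if (next != '\n' && next != EOF) {
                ungetc(next, fp);
            }
            buf[n] = '\0';
            return (long)n;
        }
        if (c == '\n' || c == '\v' || c == '\f') {
            buf[n] = '\0';
            return (long)n;
        }
        if (n + 1 < cap) {
            buf[n++] = (char)c;   /* bytes that do not fit are dropped */
        }
    }
    buf[n] = '\0';
    return (n > 0) ? (long)n : -1;
}
```

The stream should be opened in binary mode, otherwise some C libraries translate CRLF before this code ever sees the CR.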

What I would like to see addressed too is the possibility
of a type "string", which is a string of bytes rather than
a sequence of 32-bit words. The memory saving is significant,
and the possibility of strings containing non-characters is
completely removed. Granted one could force the latter but
you still have this wasteful 32 bits per character. A poke()/
peek() could reduce the data size but is a very messy way to go.
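
To put the saving in concrete terms, here is a small C sketch, assuming 32-bit elements as described above. pack_to_bytes is a hypothetical helper invented for this example: it copies 32-bit elements into one byte each and refuses any element that is not a valid 8-bit character. Sequence headers and allocator overhead are not counted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs n 32-bit elements into n bytes.
   Returns 0 on success, or -1 as soon as an element falls outside 0..255,
   i.e. is not a valid 8-bit character. */
int pack_to_bytes(const int32_t *src, size_t n, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i] < 0 || src[i] > 255) {
            return -1;
        }
        dst[i] = (uint8_t)src[i];
    }
    return 0;
}

/* Storage for n characters held as 32-bit words. */
size_t bytes_as_words(size_t n) { return n * sizeof(int32_t); }

/* Storage for n characters held as single bytes. */
size_t bytes_as_bytes(size_t n) { return n * sizeof(uint8_t); }
```

For any length n, bytes_as_words(n) is four times bytes_as_bytes(n), which is the saving in question.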

I suppose the Unicode argument will wreck all I have said,
but a string type would simplify a lot of text processing.
So - comments?

2. Re: gets() and "string" variable type

Andy Drummond wrote:
> 
> I have followed the CR / LF / CRLF discussion with interest.
> Personally I am with Christian in agreeing that gets() is to
> look for the reasonable end-of-line indicator, which can be
> any of those arrangements, with VT or FF thrown in too.
> 
> What I would like to see addressed too is the possibility
> of a type "string", which is a string of bytes rather than
> a sequence of 32-bit words. The memory saving is significant,
> and the possibility of strings containing non-characters is
> completely removed. Granted one could force the latter but
> you still have this wasteful 32-bits per character. A poke()/
> peek() could reduce the data size but is a very messy way to go.
> 
> I suppose the Unicode argument will wreck all I have said,
> but a string type would simplify a lot of text processing.
> So - comments?

This and other extremely useful additional Eu types will require more native
types to be included. This will be very difficult to do without a thorough
reexamination of bit and test patterns, as discussed before. Adding them
straight into the interpreter is probably impossible.

This idea drew only knee-jerk "don't do anything" reactions, when what was
probably needed was a collective, careful study of the backend, coupled with
extensive experimentation and testing. So do as I do: use other languages for
efficient string handling. You'll usually lose the benefits of the & operator in
the process, but you can't have it all.

CChris

3. Re: gets() and "string" variable type

CChris wrote:

> This and other extremely useful additional Eu types will require more native
> types included. This will be very difficult to do without a thorough
> reexamination
> of bit and test patterns, as discussed before. Adding them  straight into the
> interpreter is probably impossible.

Is it possible to get unicode functionality via an include file?

4. Re: gets() and "string" variable type

c.k.lester wrote:
> 
> CChris wrote:
> 
> > This and other extremely useful additional Eu types will require more
> > native
> > types included. This will be very difficult to do without a thorough
> > reexamination
> > of bit and test patterns, as discussed before. Adding them  straight into
> > the
> > interpreter is probably impossible.
> 
> Is it possible to get unicode functionality via an include file?

It depends on how much functionality you need.

If you only need to process files, and probably most devices, which are not to
be parsed by the interpreter, then yes. You keep reading bytes; it's how you
interpret them that changes, and you can do that in an include file. Likewise,
for output you convert your Unicode chars into bytes and then simply write those bytes.
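
To make "it's how you interpret them that changes" concrete, here is a minimal C sketch that turns raw bytes into a code point. It assumes the data is UTF-8, which the thread does not actually specify, and utf8_decode is a name invented for this example rather than a function from any of the libraries mentioned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence starting at s[0], with len bytes available.
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the input is truncated or not well-formed (bad lead byte, bad
   continuation byte, overlong form, surrogate, or above U+10FFFF). */
size_t utf8_decode(const uint8_t *s, size_t len, uint32_t *cp)
{
    size_t need;
    uint32_t c, min;

    assert(s != NULL && cp != NULL);
    if (len == 0) {
        return 0;
    }

    if (s[0] < 0x80) {
        *cp = s[0];                 /* plain ASCII: one byte, used as-is */
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        need = 2; c = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        need = 3; c = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        need = 4; c = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;                   /* stray continuation or invalid lead byte */
    }

    if (len < need) {
        return 0;                   /* sequence is cut short */
    }
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0;               /* not a continuation byte */
        }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        return 0;
    }
    *cp = c;
    return need;
}
```

Calling this in a loop over a buffer read from a file yields one integer per character, which is the shape an ordinary sequence of integers already has.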

If you want the interpreter to handle identifiers with Unicode chars in them, then
it's a different matter. I haven't thought about what it would take to achieve that.

If you want to process stdin input which arrives as Unicode, and not one byte
at a time, then an include file may not be enough. I don't have OS versions that
support Unicode installed, so I cannot even experiment with this. When the OS
reads from devices in a uniform way, like Windows does, the point may be moot.
Otherwise, I don't know.

I haven't used Tommy's Unicode Library, nor Aku's read/write CSV files library,
which relies on the former. They are the only general tools currently listed in
the archive (there's also a lib by Greg Haberek, but it is Windows only), and may
give you a clearer idea of what can and cannot be done.

CChris

5. Re: gets() and "string" variable type

Andy Drummond wrote:
> 
> What I would like to see addressed too is the possibility
> of a type "string", which is a string of bytes rather than
> a sequence of 32-bit words. The memory saving is significant,
> and the possibility of strings containing non-characters is
> completely removed.
>
> So - comments?

The s[i] op is of critical importance, as I now know from practical
experience, and supporting both 8-bit strings and 4-byte sequences would
inevitably lead to an overhead of about 20%. Still want it?
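
To show where such an overhead would come from, here is a minimal C sketch of a subscript that has to cope with both element widths. The dual_seq struct and dual_subscript are hypothetical and are not Euphoria's actual sequence layout; the sketch only shows the extra test every s[i] would need and does not measure the 20% estimate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sequence header that can hold either element width.
   This is NOT Euphoria's s1 struct; every name here is made up. */
typedef struct {
    int     is_bytes;  /* nonzero: elements are 8-bit, zero: 32-bit */
    size_t  length;    /* number of elements */
    void   *data;      /* points at the first element */
} dual_seq;

/* 1-based subscript, as in Euphoria. The branch on is_bytes is the extra
   work that a single-width design never has to do. */
int32_t dual_subscript(const dual_seq *s, size_t i)
{
    assert(s != NULL && i >= 1 && i <= s->length);
    if (s->is_bytes) {
        return ((const uint8_t *)s->data)[i - 1];
    }
    return ((const int32_t *)s->data)[i - 1];
}
```

Every operation that reads or writes elements (slices, concatenation, assignment of a value that no longer fits in a byte) would need a similar test, which is why the cost is not limited to one opcode.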

Regards,
Pete
PS [OT] Can someone explain the last line of this snippet from be_execute.c:

    case L_RHS_SUBS_CHECK:
	if (!IS_SEQUENCE(*(object_ptr)pc[1])) {
	    goto subsfail;
	}
	/* FALL THROUGH */
    case L_RHS_SUBS: /* rhs subscript of a sequence */
	top = *(object_ptr)pc[2];  /* the subscript */
	obj_ptr = (object_ptr)SEQ_PTR(*(object_ptr)pc[1]);/* the sequence */
	if ((unsigned long)(top-1) >= ((s1_ptr)obj_ptr)->length) {
	    tpc = pc;
	    top = recover_rhs_subscript(top, (s1_ptr)obj_ptr);
	}
	top = (object)*(top + ((s1_ptr)obj_ptr)->base);

My grasp of C is limited and I do not actually understand how it knows to get
"top" from "base+top*4" rather than just "base+top". Not that it does not seem
logical, more "how does it know?"  Specifically, which parts of

   typedef long object;
   typedef object *object_ptr;

are applied/applicable when and where? Is it the earlier *(object_ptr) or the
latter (object)* or both or what?

6. Re: gets() and "string" variable type

Pete Lomax wrote:
> 
> Andy Drummond wrote:
> > 
> > What I would like to see addressed too is the possibility
> > of a type "string", which is a string of bytes rather than
> > a sequence of 32-bit words. The memory saving is significant,
> > and the possibility of strings containing non-characters is
> > completely removed.
> >
> > So - comments?
> The s[i] op is of critical importance, which I now know from practical
> experience,
> and supporting both 8-bit strings and 4-byte sequences would inevitably lead
> to an overhead of about 20%. Still want?
> 
> Regards,
> Pete
> PS [OT] Can someone explain the last line of this snippet from be_execute.c:
> 
>     case L_RHS_SUBS_CHECK:
> 	if (!IS_SEQUENCE(*(object_ptr)pc[1])) {
> 	    goto subsfail;
> 	}
> 	/* FALL THROUGH */
>     case L_RHS_SUBS: /* rhs subscript of a sequence */
> 	top = *(object_ptr)pc[2];  /* the subscript */
> 	obj_ptr = (object_ptr)SEQ_PTR(*(object_ptr)pc[1]);/* the sequence */
> 	if ((unsigned long)(top-1) >= ((s1_ptr)obj_ptr)->length) {
> 	    tpc = pc;
> 	    top = recover_rhs_subscript(top, (s1_ptr)obj_ptr);
> 	}
> 	top = (object)*(top + ((s1_ptr)obj_ptr)->base);
> 
> My grasp of C is limited and I do not actually understand how it knows to get
> "top" from "base+top*4" rather than just "base+top". Not that it does not seem
> logical, more "how does it know?"  Specifically, which parts of
> 
>    typedef long object;
>    typedef object *object_ptr;
> 
> are applied/applicable when and where? Is it the earlier *(object_ptr) or the
> latter (object)* or both or what?

Your question confuses me, but then again, so does the code.

To me, the last line of code is saying this:
top = -- self explanatory
(object) -- cast the next value as if it were an object type (the value of the
expression)
* -- the value pointed to by the expression of
(top +
((s1_ptr) -- cast the next value as an s1_ptr type
obj_ptr) -- the name of the structure pointer variable actually being used
->base); -- the member of the structure

Does that make more sense? I don't think there is any multiplication going on
(unless I misunderstand your question), just pointer dereferencing and casting.

I'm no C expert either, at least where it comes to very complicated expressions.
There are a lot of gotchas there. And I haven't yet studied the C source enough
to really understand what's going on in detail.

As for your typedef question, again I'm not sure what you are asking. The
typedef statement says that object_ptr is a pointer to an object (which is really
a long). The part where the code says (object)* is a cast to an object, not an
object pointer. The asterisk binds to the right, not the left, I believe.

Clear as mud? Or is that why we use Euphoria in the first place?

--
A complex system that works is invariably found to have evolved from a simple
system that works.
--John Gall's 15th law of Systemantics.

"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare

j.

7. Re: gets() and "string" variable type

I think I understand your "*4" question a little bit better...

Euphoria assumes that longs and pointers are the same size, namely 4 bytes. The
C compiler takes care of the "*4": there is no byte addressing in that expression,
because adding an integer to a pointer to long moves it in whole longs.

Does that help any? The pointer arithmetic takes the type's sizeof into account
automagically.
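
A small C sketch of that scaling, for anyone who wants to see it in bytes. The function names are made up for this example. Note that sizeof(long) is 4 on the 32-bit targets discussed here but is 8 on many 64-bit systems, so the code uses sizeof rather than a literal 4.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes between p and p + n for a pointer to long.
   The compiler scales the addition, so this is n * sizeof(long).
   p + n must stay within the array p points into (or one past its end). */
size_t byte_distance_long(const long *p, size_t n)
{
    assert(p != NULL);
    return (size_t)((const char *)(p + n) - (const char *)p);
}

/* Same measurement for a pointer to char: here the step is one byte. */
size_t byte_distance_char(const char *p, size_t n)
{
    assert(p != NULL);
    return (size_t)((p + n) - p);
}
```

The source text "p + n" is identical in both functions; only the declared type of p decides how far the address moves.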

j.

8. Re: gets() and "string" variable type

Jason Gade wrote:
> 
> I think I understand your "*4" question a little bit better...
> 
> Euphoria assumes that longs and pointers are the same size, namely 4 bytes.
> The compiler takes care of the "*4". There is no byte addressing done, it's
> all done by 4.
> 
> Does that help any? The pointer arithmetic takes the type's sizeof into
> account
> automagically.
> 

I would like to say that this demonstrates the excellence of Euphoria
sequences - and the language as a whole. No more do I have to wonder
if my pointer should step by 1, 2 or 4. No more type-casting ulongs to
HWND, no more adding code to check pointers are pointing to valid data,
no more shuffling things about to add a string where there was only a
word before...
And maybe the idea of byte-sized items in string sequences isn't quite
so good; certainly it is so easy to interface with DLLs that I can
easily write myself a DLL as a text buffer or whatever...
So thanks, Rob, your language has made programming umpteen times easier!
Andy

9. Re: gets() and "string" variable type

Pete Lomax wrote:
> 
> PS [OT] Can someone explain the last line of this snippet from be_execute.c:
> 

> 	top = (object)*(top + ((s1_ptr)obj_ptr)->base);
> 
> My grasp of C is limited and I do not actually understand how it knows to get
> "top" from "base+top*4" rather than just "base+top". Not that it does not seem
> logical, more "how does it know?"  Specifically, which parts of
> 
>    typedef long object;
>    typedef object *object_ptr;
> 
> are applied/applicable when and where? Is it the earlier *(object_ptr) or the
> latter (object)* or both or what?

The type defs just tell the compiler, "When I say 'object' please read it
as 'long', and when I say 'object_ptr' please read it as 'long*'."

The expression you quoted from be_execute.c is what's often referred to as
pointer arithmetic.  When you add 1 to a pointer, it points to the next
place in memory, *after the current value*.  This applies for structs, too,
BTW.  Another way to have written this would be:

   top = ((s1_ptr)obj_ptr)->base[top]

Which, to me, is usually clearer (and how I tend to code that sort of thing,
so when you see it in the code, it's probably something I wrote).

Matt

10. Re: gets() and "string" variable type

Matt Lewis wrote:
> 
> Pete Lomax wrote:
> > 
> > PS [OT] Can someone explain
> > 	top = (object)*(top + ((s1_ptr)obj_ptr)->base);
> > 
> pointer arithmetic.  When you add 1 to a pointer, it points to the next
> place in memory, *after the current value*.  applies for structs, too, BTW.

Ah! I see, it is the definition of base as long* which does it.

> Another way to have written this would be:
> 
>    top = ((s1_ptr)obj_ptr)->base[top]
> 
> Which, to me, is usually clearer
> 
Yes, much clearer. It was the "top+" part which confused me, also not realising
that C compilers recognise pointer arithmetic like that, and therefore tweak the
meaning of the "top+" just parsed with "* sizeof".

Still no fan of C though :-)

Regards,
Pete
