1. gets() and "string" variable type
- Posted by Andy Drummond <andy at kestre?t?le.com> Sep 14, 2007
- 607 views
I have followed the CR / LF / CRLF discussion with interest. Personally I am with Christian in agreeing that gets() is to look for the reasonable end-of-line indicator, which can be any of those arrangements, with VT or FF thrown in too. What I would like to see addressed too is the possibility of a type "string", which is a string of bytes rather than a sequence of 32-bit words. The memory saving is significant, and the possibility of strings containing non-characters is completely removed. Granted one could force the latter but you still have this wasteful 32-bits per character. A poke()/ peek() could reduce the data size but is a very messy way to go. I suppose the Unicode argument will wreck all I have said, but a string type would simplify a lot of text processing. So - comments?
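To make the "any reasonable end-of-line indicator" idea concrete, here is a minimal C sketch, not Euphoria's actual gets(). The names `is_eol` and `scan_line` are hypothetical. It assumes the input is already in memory and that the terminator is dropped from the reported line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: is c one of the terminators named in the post? */
static int is_eol(char c)
{
    return c == '\n' || c == '\r' || c == '\v' || c == '\f';
}

/* Hypothetical helper (not Euphoria's gets()): scan the line that starts
   at data[pos]. Stores the number of characters before the terminator in
   *line_len and returns the index where the next line starts. CRLF counts
   as a single terminator; a lone CR, LF, VT or FF also ends the line.
   Returns len once the input is exhausted. */
size_t scan_line(const char *data, size_t len, size_t pos, size_t *line_len)
{
    assert(data != NULL && line_len != NULL && pos <= len);
    size_t i = pos;
    while (i < len && !is_eol(data[i]))
        i++;
    *line_len = i - pos;
    if (i < len) {
        if (data[i] == '\r' && i + 1 < len && data[i + 1] == '\n')
            i += 2;   /* CRLF pair */
        else
            i += 1;   /* single-character terminator */
    }
    return i;
}
```

Calling `scan_line` repeatedly, feeding each returned index back in as `pos`, walks a buffer that mixes all of these conventions without the caller needing to know which one a given file uses.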
2. Re: gets() and "string" variable type
- Posted by CChris <christian.cuvier at ag?icu?ture.gouv.fr> Sep 14, 2007
- 560 views
Andy Drummond wrote:
>
> I have followed the CR / LF / CRLF discussion with interest.
> Personally I am with Christian in agreeing that gets() is to
> look for the reasonable end-of-line indicator, which can be
> any of those arrangements, with VT or FF thrown in too.
>
> What I would like to see addressed too is the possibility
> of a type "string", which is a string of bytes rather than
> a sequence of 32-bit words. The memory saving is significant,
> and the possibility of strings containing non-characters is
> completely removed. Granted one could force the latter but
> you still have this wasteful 32-bits per character. A poke()/
> peek() could reduce the data size but is a very messy way to go.
>
> I suppose the Unicode argument will wreck all I have said,
> but a string type would simplify a lot of text processing.
> So - comments?

This and other extremely useful additional Eu types will require more native types to be included. This will be very difficult to do without a thorough reexamination of bit and test patterns, as discussed before. Adding them straight into the interpreter is probably impossible.

This idea raised only "don't do anything" knee-jerk reactions, when what was probably needed was a careful, collective study of the backend, coupled with extensive experimentation and testing. So do as I do: use other languages for efficient string handling. You'll usually lose the benefits of the & operator in the process - can't have it all.

CChris
3. Re: gets() and "string" variable type
- Posted by c.k.lester <euphoric at ckles?er.?om> Sep 14, 2007
- 534 views
CChris wrote:
> This and other extremely useful additional Eu types will require more native
> types included. This will be very difficult to do without a thorough
> reexamination of bit and test patterns, as discussed before. Adding them
> straight into the interpreter is probably impossible.

Is it possible to get Unicode functionality via an include file?
4. Re: gets() and "string" variable type
- Posted by CChris <christian.cuvier at agriculture.??uv.fr> Sep 14, 2007
- 533 views
c.k.lester wrote:
>
> CChris wrote:
>
> > This and other extremely useful additional Eu types will require more
> > native types included. This will be very difficult to do without a thorough
> > reexamination of bit and test patterns, as discussed before. Adding them
> > straight into the interpreter is probably impossible.
>
> Is it possible to get unicode functionality via an include file?

It depends on how much functionality you need.

If you only need to process files, and probably most devices, which are not to be parsed by the interpreter, then yes. You keep reading bytes; it's how you interpret them that changes, and you can do that in an include file. Likewise, on output you convert your Unicode chars into bytes and then simply write those bytes.

If you want the interpreter to handle identifiers with Unicode chars in them, then it's a different matter. I didn't think about what it would take to achieve that.

If you want to process stdin input which is delivered as Unicode, and not one byte at a time, then an include file may not be enough. I don't have OS versions that support Unicode installed, so I cannot even experiment with this. When the OS reads from devices in a uniform way, like Windows does, the point may be moot. Otherwise, I don't know.

I haven't used Tommy's Unicode Library, nor Aku's read/write CSV files library, which relies on the former. They are the only general tools currently listed in the archive (there's also a lib by Greg Haberek, but it is Windows only), and may give you a clearer idea of what can and cannot be done.

CChris
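The "keep reading bytes, change how you interpret them" approach can be sketched in C as below. This assumes the bytes are UTF-8, which the post does not specify, and `utf8_decode` is a hypothetical name rather than a function from any of the libraries mentioned. Each decoded code point fits in a single 32-bit element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s[0..len).
   On success, stores the code point in *cp and returns the number of
   bytes consumed (1 to 4). Returns 0 for truncated or malformed input,
   including overlong forms, surrogates and values above U+10FFFF. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (len == 0)
        return 0;

    unsigned char b0 = s[0];
    size_t need;      /* number of continuation bytes expected */
    uint32_t v, min;  /* accumulated value, smallest legal value */

    if (b0 < 0x80) { *cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { need = 1; v = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { need = 2; v = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { need = 3; v = b0 & 0x07; min = 0x10000; }
    else return 0;    /* stray continuation byte or invalid lead byte */

    if (len < need + 1)
        return 0;     /* sequence is cut short */

    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0; /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;

    *cp = v;
    return need + 1;
}
```

Output is the same step in reverse: code points are encoded back into bytes just before they are written, so the file I/O layer itself only ever handles bytes.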
5. Re: gets() and "string" variable type
- Posted by Pete Lomax <petelomax at blue?onder.co?uk> Sep 15, 2007
- 594 views
Andy Drummond wrote:
>
> What I would like to see addressed too is the possibility
> of a type "string", which is a string of bytes rather than
> a sequence of 32-bit words. The memory saving is significant,
> and the possibility of strings containing non-characters is
> completely removed.
>
> So - comments?

The s[i] op is of critical importance, which I now know from practical experience, and supporting both 8-bit strings and 4-byte sequences would inevitably lead to an overhead of about 20%. Still want?

Regards,
Pete

PS [OT] Can someone explain the last line of this snippet from be_execute.c:

    case L_RHS_SUBS_CHECK:
        if (!IS_SEQUENCE(*(object_ptr)pc[1])) {
            goto subsfail;
        }
        /* FALL THROUGH */
    case L_RHS_SUBS: /* rhs subscript of a sequence */
        top = *(object_ptr)pc[2]; /* the subscript */
        obj_ptr = (object_ptr)SEQ_PTR(*(object_ptr)pc[1]); /* the sequence */
        if ((unsigned long)(top-1) >= ((s1_ptr)obj_ptr)->length) {
            tpc = pc;
            top = recover_rhs_subscript(top, (s1_ptr)obj_ptr);
        }
        top = (object)*(top + ((s1_ptr)obj_ptr)->base);

My grasp of C is limited and I do not actually understand how it knows to get "top" from "base+top*4" rather than just "base+top". Not that it does not seem logical, more "how does it know?" Specifically, which parts of

    typedef long object;
    typedef object *object_ptr;

are applied/applicable when and where? Is it the earlier *(object_ptr) or the latter (object)* or both or what?
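Pete's point about the cost on s[i] can be pictured with a small C sketch. It is not taken from the Euphoria backend: `dual_seq`, `rep_kind`, `dual_subscript` and `dual_storage_bytes` are hypothetical names, and the sketch makes no attempt to reproduce the 20% figure. It only shows that once two element widths exist, every subscript has to test which one it is dealing with before it can fetch the element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag: which representation a sequence uses. */
typedef enum { REP_BYTES, REP_WORDS } rep_kind;

/* Hypothetical sequence header supporting both element widths. */
typedef struct {
    rep_kind kind;
    size_t length;
    union {
        const uint8_t *bytes;   /* 1 byte per element  */
        const int32_t *words;   /* 4 bytes per element */
    } data;
} dual_seq;

/* s[i] with a 1-based subscript, as in Euphoria. The test on s->kind is
   the extra work that a single-representation design does not need. */
int32_t dual_subscript(const dual_seq *s, size_t i)
{
    assert(s != NULL && i >= 1 && i <= s->length);
    if (s->kind == REP_BYTES)
        return s->data.bytes[i - 1];
    return s->data.words[i - 1];
}

/* Element storage in bytes: the saving Andy is after. */
size_t dual_storage_bytes(const dual_seq *s)
{
    assert(s != NULL);
    return s->length * (s->kind == REP_BYTES ? sizeof(uint8_t)
                                             : sizeof(int32_t));
}
```

The byte form stores the same text in a quarter of the space, and the price is the branch inside `dual_subscript`, which sits on one of the most frequently executed paths in any program.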
6. Re: gets() and "string" variable type
- Posted by Jason Gade <jaygade at yahoo.c??> Sep 15, 2007
- 561 views
Pete Lomax wrote:
>
> Andy Drummond wrote:
> >
> > What I would like to see addressed too is the possibility
> > of a type "string", which is a string of bytes rather than
> > a sequence of 32-bit words. The memory saving is significant,
> > and the possibility of strings containing non-characters is
> > completely removed.
> >
> > So - comments?
> The s[i] op is of critical importance, which I now know from practical
> experience, and supporting both 8-bit strings and 4-byte sequences would
> inevitably lead to an overhead of about 20%. Still want?
>
> Regards,
> Pete
> PS [OT] Can someone explain the last line of this snippet from be_execute.c:
>
>     case L_RHS_SUBS_CHECK:
>         if (!IS_SEQUENCE(*(object_ptr)pc[1])) {
>             goto subsfail;
>         }
>         /* FALL THROUGH */
>     case L_RHS_SUBS: /* rhs subscript of a sequence */
>         top = *(object_ptr)pc[2]; /* the subscript */
>         obj_ptr = (object_ptr)SEQ_PTR(*(object_ptr)pc[1]); /* the sequence */
>         if ((unsigned long)(top-1) >= ((s1_ptr)obj_ptr)->length) {
>             tpc = pc;
>             top = recover_rhs_subscript(top, (s1_ptr)obj_ptr);
>         }
>         top = (object)*(top + ((s1_ptr)obj_ptr)->base);
>
> My grasp of C is limited and I do not actually understand how it knows to get
> "top" from "base+top*4" rather than just "base+top". Not that it does not seem
> logical, more "how does it know?" Specifically, which parts of
>
>     typedef long object;
>     typedef object *object_ptr;
>
> are applied/applicable when and where? Is it the earlier *(object_ptr) or the
> latter (object)* or both or what?

Your question confuses me, but then again, so does the code. To me, the last line of code is saying this:

    top =               -- self explanatory
    (object)            -- cast the next value as if it were an object type (the value of the expression)
    *                   -- the value pointed to by the expression of
    (top + ((s1_ptr)    -- cast the next value as an s1_ptr type
    obj_ptr)            -- the name of the structure pointer variable actually being used
    ->base);            -- the member of the structure

Does that make more sense?

I don't think there is any multiplication going on (unless I misunderstand your question), just pointer dereferencing and casting.

I'm no C expert either, at least when it comes to very complicated expressions. There are a lot of gotchas there. And I haven't yet studied the C source enough to really understand what's going on in detail.

As for your typedef question, again I'm not sure what you are asking. The typedef statement says that object_ptr is a pointer to an object (which is really a long). The part where the code says (object)* is a cast to an object, not an object pointer. The asterisk binds to the right, not the left, I believe.

Clear as mud? Or is that why we use Euphoria in the first place?

--
A complex system that works is invariably found to have evolved from a simple system that works.
--John Gall's 15th law of Systemantics.

"Premature optimization is the root of all evil in programming."
--C.A.R. Hoare

j.
7. Re: gets() and "string" variable type
- Posted by Jason Gade <jaygade at ya?oo.co?> Sep 15, 2007
- 573 views
I think I understand your "*4" question a little bit better... Euphoria assumes that longs and pointers are the same size, namely 4 bytes. The compiler takes care of the "*4". There is no byte addressing done, it's all done by 4. Does that help any? The pointer arithmetic takes the type's sizeof into account automagically. -- A complex system that works is invariably found to have evolved from a simple system that works. --John Gall's 15th law of Systemantics. "Premature optimization is the root of all evil in programming." --C.A.R. Hoare j.
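A short C sketch of the "*4" the compiler takes care of. The two typedefs are the ones quoted in the thread, and `byte_step` is a hypothetical helper added only for illustration. The step equals `sizeof(object)`, which is 4 where `long` is 32 bits, as in the builds discussed here, and 8 on many 64-bit platforms.

```c
#include <assert.h>
#include <stddef.h>

typedef long object;          /* as quoted in the thread */
typedef object *object_ptr;

/* Hypothetical helper: how many bytes apart are p and p + n?
   The caller must ensure p + n stays within (or one past the end of)
   the array p points into. The compiler scales n by sizeof(object)
   when it evaluates p + n; nothing in the source spells that out. */
size_t byte_step(object_ptr p, size_t n)
{
    assert(p != NULL);
    return (size_t)((char *)(p + n) - (char *)p);
}
```

For an array `object a[4]`, `byte_step(a, 1)` equals `sizeof(object)` and `byte_step(a, 3)` equals `3 * sizeof(object)`. The scaling follows from the declared type of the pointer, not from any multiplication written in the expression.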
8. Re: gets() and "string" variable type
- Posted by Andy Drummond <andy at kes??eltele.com> Sep 15, 2007
- 628 views
Jason Gade wrote:
>
> I think I understand your "*4" question a little bit better...
>
> Euphoria assumes that longs and pointers are the same size, namely 4 bytes.
> The compiler takes care of the "*4". There is no byte addressing done, it's
> all done by 4.
>
> Does that help any? The pointer arithmetic takes the type's sizeof into
> account automagically.
>

I would like to say that this demonstrates the excellence of Euphoria sequences - and the language as a whole. No more do I have to wonder if my pointer should step by 1, 2 or 4. No more type-casting ulongs to HWND, no more adding code to check pointers are pointing to valid data, no more shuffling things about to add a string where there was only a word before...

And maybe the idea of byte-sized items in string sequences isn't quite so good; certainly it is so easy to interface with DLLs that I can easily write myself a DLL as a text buffer or whatever...

So thanks, Rob, your language has made programming umpteen times easier!

Andy
9. Re: gets() and "string" variable type
- Posted by Matt Lewis <matthewwalkerlewis at g?ail.c?m> Sep 15, 2007
- 580 views
Pete Lomax wrote:
>
> PS [OT] Can someone explain the last line of this snippet from be_execute.c:
>
>     top = (object)*(top + ((s1_ptr)obj_ptr)->base);
>
> My grasp of C is limited and I do not actually understand how it knows to get
> "top" from "base+top*4" rather than just "base+top". Not that it does not seem
> logical, more "how does it know?" Specifically, which parts of
>
>     typedef long object;
>     typedef object *object_ptr;
>
> are applied/applicable when and where? Is it the earlier *(object_ptr) or the
> latter (object)* or both or what?

The typedefs just tell the compiler, "When I say 'object' please read it as 'long', and when I say 'object_ptr' please read it as 'long*'."

The expression you quoted from be_execute.c is what's often referred to as pointer arithmetic. When you add 1 to a pointer, it points to the next place in memory, *after the current value*. This applies for structs, too, BTW.

Another way to have written this would be:

    top = ((s1_ptr)obj_ptr)->base[top]

Which, to me, is usually clearer (and how I tend to code that sort of thing, so when you see it in the code, it's probably something I wrote).

Matt
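The equivalence Matt describes can be checked with a small C sketch. The `struct s1` here is a simplified stand-in with only the two fields used in the quoted snippet, which is an assumption since the real header is not shown in the thread. The two function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long object;          /* as quoted in the thread */
typedef object *object_ptr;

/* Simplified stand-in for the interpreter's sequence header. */
struct s1 {
    object_ptr base;   /* points at slot 0, so base[1] is element 1 */
    long length;
};
typedef struct s1 *s1_ptr;

/* The form used in the quoted snippet. */
object subscript_pointer_form(s1_ptr seq, long top)
{
    assert(seq != NULL && top >= 1 && top <= seq->length);
    return (object)*(top + seq->base);
}

/* The form Matt prefers: the same address, written as an index. */
object subscript_index_form(s1_ptr seq, long top)
{
    assert(seq != NULL && top >= 1 && top <= seq->length);
    return seq->base[top];
}
```

In C, `p[i]` is defined as `*(p + i)`, so both functions read the same element. Because `base` is declared as a pointer to `object`, the addition is scaled by `sizeof(object)` in either spelling.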
10. Re: gets() and "string" variable type
- Posted by Pete Lomax <petelomax at ??ueyonder.co.uk> Sep 15, 2007
- 542 views
Matt Lewis wrote:
>
> Pete Lomax wrote:
> >
> > PS [OT] Can someone explain
> > top = (object)*(top + ((s1_ptr)obj_ptr)->base);
> >
> pointer arithmetic. When you add 1 to a pointer, it points to the next
> place in memory, *after the current value*. applies for structs, too, BTW.

Ah! I see, it is the definition of base as long* which does it.

> Another way to have written this would be:
>
> top = ((s1_ptr)obj_ptr)->base[top]
>
> Which, to me, is usually clearer
>

Yes, much clearer. It was the "top+" part which confused me, along with not realising that C compilers recognise pointer arithmetic like that, and therefore scale the "top+" just parsed by "* sizeof".

Still no fan of C though.

Regards,
Pete