1. C-like Structures
- Posted by mattlewis (admin) Jul 08, 2011
- 3115 views
I've been working lately on developing support for C-like structures (i.e., structures in memory). I've got some code that is functional, though not fully developed and certainly missing some features. I've posted some binaries. The code is all in the struct branch of the repo. There is a simple unit test (t_memstruct.e) in the tests directory of the repo that actually uses the structures of the euphoria backend symbol table. I've also started work on the documentation.
There are two new keywords: memstruct and memunion, which basically correspond to their C analogs, struct and union.
I went with a very C-like syntax for declarations. A lot more so than my original posts. I've gotten some criticism for this, but I think it makes sense. These structures are useful for communicating with outside libraries, and when you run into structures, they're likely to be documented and implemented as C structs. So keeping close to C should make it easier for euphoria programmers to communicate with external libraries.
The data types are: char, short, int, long, long long, float, double, long double, eudouble and object. The only two that aren't actual C types are eudouble and object. These are meant to make it easier to write portable code, in that eudouble uses the double type of whichever version of euphoria is running (32-bit euphoria still uses 64-bit doubles, but 64-bit euphoria uses 80-bit long doubles). And objects are just integers that are the same size as pointers (which is how euphoria objects are actually implemented).
Basically, to use memstructs, you need a declaration and a pointer (i.e., an atom pointing to some memory). To read or assign from/to the memstruct, a dot notation is used, where the pointer variable comes first, then the name of the memstruct, then the member or members used to access the memory. This removes the requirement to peek/poke and remember offsets. The translator translates these into actual C structs, and so translated memstruct use should be faster than interpreted.
A simple example:
    memstruct point
        int x
        int y
    end memstruct
    atom ptr = allocate( sizeof( point ) )
    ptr.point.x = 5
    ptr.point.y = 10
    ptr.point.x += ptr.point.y
    ? ptr.point.x -- 15
    ? ptr.point.* -- { 15, 10 }
The last bit, using the asterisk, is something I've been thinking about as serialization of a memstruct. That's an aspect that is definitely not complete, and probably somewhat buggy. I'd also like to do the reverse (i.e., assign all the members of a memstruct at one go), but just haven't gotten there yet.
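For readers who think in C, the point example above maps roughly onto the sketch below. It is only an illustration, not code from the struct branch: `struct point` mirrors the memstruct, and `peek_int` and `poke_int` are hypothetical helpers standing in for euphoria's peek/poke so that the manual-offset style can be compared with member access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed C counterpart of the memstruct point. */
struct point {
    int x;
    int y;
};

/* Hypothetical peek/poke stand-ins: read or write one int at base + offset. */
static int peek_int(const unsigned char *base, size_t offset)
{
    int value;
    memcpy(&value, base + offset, sizeof value);
    return value;
}

static void poke_int(unsigned char *base, size_t offset, int value)
{
    memcpy(base + offset, &value, sizeof value);
}

/* Member access: the compiler works out the offsets. */
int sum_with_members(void)
{
    struct point *ptr = malloc(sizeof *ptr);
    assert(ptr != NULL);
    ptr->x = 5;
    ptr->y = 10;
    ptr->x += ptr->y;
    int result = ptr->x;
    free(ptr);
    return result;
}

/* Manual offsets: every access spells out base address plus offset. */
int sum_with_offsets(void)
{
    unsigned char *base = malloc(sizeof(struct point));
    assert(base != NULL);
    const size_t x_off = offsetof(struct point, x);
    const size_t y_off = offsetof(struct point, y);
    poke_int(base, x_off, 5);
    poke_int(base, y_off, 10);
    poke_int(base, x_off, peek_int(base, x_off) + peek_int(base, y_off));
    int result = peek_int(base, x_off);
    free(base);
    return result;
}
```

Both functions leave 15 in x; the second is the bookkeeping that the dot notation is meant to remove.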
You can also specify data members to be unsigned or pointers to something. I've changed the C notation a little bit, making pointers a little more obvious:
    memstruct foo
        pointer int bar
    end memstruct
    atom ptr = allocate( sizeof( foo ) )
    ptr.foo.bar = allocate( 4 )
    ptr.foo.bar = 100
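As a point of comparison, here is a rough C sketch of a structure with a pointer member. It assumes that `pointer int bar` corresponds to `int *bar`; that mapping is a guess on the reader's side, not something confirmed by the struct branch. Note that C distinguishes between assigning to the pointer itself and storing through it.

```c
#include <assert.h>
#include <stdlib.h>

struct foo {
    int *bar;   /* assumed counterpart of "pointer int bar" */
};

/* Allocates a foo, points bar at fresh storage, stores 100 through it,
   and returns the value read back through the pointer. */
int foo_demo(void)
{
    struct foo *ptr = malloc(sizeof *ptr);
    assert(ptr != NULL);

    ptr->bar = malloc(sizeof *ptr->bar);   /* assign the pointer itself */
    assert(ptr->bar != NULL);

    *ptr->bar = 100;                       /* store through the pointer */
    int value = *ptr->bar;

    free(ptr->bar);
    free(ptr);
    return value;
}
```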
Matt
2. Re: C-like Structures
- Posted by raseu Jul 09, 2011
- 2994 views
nice!
couple of questions
1) is the memstruct implementation dot preprocessor based or part of the
euphoria language. ie : is memstruct a new data type (along with sequence, atom etc ...)
2) i have taken a look at the memstruct test case
http://scm.openeuphoria.org/hg/euphoria/file/78edc3d4f6d8/tests/t_memstruct.e
given the allocation
    atom symtab = allocate( 5 * sizeof( symtab_entry ) )
    poke( symtab, repeat( 0, 5 * sizeof( symtab_entry ) ) )
what do the following lines assign to (first item in array ?)
    symtab.symtab_entry.obj = 9
    symtab.symtab_entry.obj += 5
    symtab.symtab_entry.obj -= 2
    symtab.symtab_entry.obj *= 6
    symtab.symtab_entry.obj /= 3
i ask because further down you have the following
    symtab.symtab_entry[1].obj = 1
tia
3. Re: C-like Structures
- Posted by mattlewis (admin) Jul 10, 2011
- 2892 views
couple of questions
1) is the memstruct implementation dot preprocessor based or part of the
euphoria language. ie : is memstruct a new data type (along with sequence, atom etc ...)
A memstruct isn't really a data type like a sequence or an atom. It's just a way to access data stored in memory, instead of using a pointer, offsets and peeks and pokes.
2) i have taken a look at the memstruct test case
http://scm.openeuphoria.org/hg/euphoria/file/78edc3d4f6d8/tests/t_memstruct.e
given the allocation
    atom symtab = allocate( 5 * sizeof( symtab_entry ) )
    poke( symtab, repeat( 0, 5 * sizeof( symtab_entry ) ) )
what do the following lines assign to (first item in array ?)
    -- peek/poke equivalent (assuming 32-bit eu):
    symtab.symtab_entry.obj = 9  -- poke4( symtab + OBJ_OFFSET, 9 )
    symtab.symtab_entry.obj += 5 -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) + 5 )
    symtab.symtab_entry.obj -= 2 -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) - 2 )
    symtab.symtab_entry.obj *= 6 -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) * 6 )
    symtab.symtab_entry.obj /= 3 -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) / 3 )
i ask because further down you have the following
    symtab.symtab_entry[1].obj = 1
Good eyes. This is actually something else that comes from C. Basically, you can access pointers in an array-like fashion. In this case, symtab holds the pointer, and the index counts whole symtab_entry structures from it, starting at zero. So the above statement is equivalent to:
poke4( symtab + OBJ_OFFSET + sizeof( symtab_entry ), 1 )
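The same address arithmetic can be checked in C. This is a sketch under assumptions: `struct entry` is a hypothetical two-member stand-in, not the real symtab_entry layout, and `index_matches_offset` is a made-up helper. It confirms that element `index` of an array of structures sits at base + index * sizeof(entry) + offset of the member, which for index 1 is the poke4 expression above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in; the real symtab_entry has different members. */
struct entry {
    struct entry *next;
    intptr_t obj;
};

/* Returns 1 if &table[index].obj equals
   base + index * sizeof(struct entry) + offsetof(struct entry, obj). */
int index_matches_offset(size_t index)
{
    struct entry *table = calloc(5, sizeof *table);
    assert(table != NULL);
    assert(index < 5);

    unsigned char *base = (unsigned char *)table;
    unsigned char *by_index = (unsigned char *)&table[index].obj;
    unsigned char *by_offset = base
        + index * sizeof(struct entry)
        + offsetof(struct entry, obj);

    int same = (by_index == by_offset);
    free(table);
    return same;
}
```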
Matt
4. Re: C-like Structures
- Posted by DerekParnell (admin) Jul 13, 2011
- 2811 views
I went with a very C-like syntax for declarations. A lot more so than my original posts. I've gotten some criticism for this, but I think it makes sense. These structures are useful for communicating with outside libraries, and when you run into structures, they're likely to be documented and implemented as C structs. So keeping close to C should make it easier for euphoria programmers to communicate with external libraries.
The data types are: char, short, int, long, long long, float, double, long double, eudouble and object.
I agree that having familiar 'words' will help using this feature but I'm still of the mind that the 'datatypes' named above should have been defined in terms of memstruct constructs rather than hard coding them in the parser. My suggestion would only require two low level words, which would be used to define the number of physical bytes of RAM to assign to a memstruct label, and to optionally modify the type of access to fetch/store data into those bytes (eg. signed data).
For example, int could have been defined as ...
    define memstruct int
        ifdef ARCH32 then
            rambytes(4) x
        elsedef
            rambytes(8) x
        end ifdef
    end memstruct
then int could be used exactly as in your examples.
    define memstruct point
        int X
        int Y
    end memstruct
I'm sure that this would simplify the parser and allow future 'predefined' data types to be much more easily built instead of updating the parser to cater for them.
5. Re: C-like Structures
- Posted by mattlewis (admin) Jul 13, 2011
- 2854 views
I agree that having familiar 'words' will help using this feature but I'm still of the mind that the 'datatypes' named above should have been defined in terms of memstruct constructs rather than hard coding them in the parser. My suggestion would only require two low level words, which would be used to define the number of physical bytes of RAM to assign to a memstruct label, and to optionally modify the type of access to fetch/store data into those bytes (eg. signed data).
...snip...
I'm sure that this would simplify the parser and allow future 'predefined' data types to be much more easily built instead of updating the parser to cater for them.
I'd still rather avoid the explicit sizing in euphoria code. I'm interested in hearing what others think.
I'm actually skeptical of being able/needing to have more or less arbitrary byte size elements. Is there any place/language/compiler where this is really used for structures? Also, this would complicate the translator (and, really, the interpreter). You certainly wouldn't be able to natively access odd sized integers.
The currently implemented data types are actually fairly simply implemented in the parser and in the backend and translator. I suspect using arbitrary sizes would be at least as complex, and probably more so, especially taking into account integer sizes that C compilers don't support.
Currently, the sizes of the various integer types are handled for us automatically by the compiler used to build the interpreter or the translated program. So any weirdness that you get from platform to platform (like different "long int" sizes on Win64 and 64-bit Linux) is handled automatically, without having to figure out sizes. Likewise for pointers.
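A small C sketch makes the platform difference visible. Nothing here comes from the euphoria source; `print_type_sizes` and `long_size` are illustrative names. Built with a 64-bit Windows compiler, `long` is typically 4 bytes, while on 64-bit Linux it is typically 8, and the program itself never has to state either number.

```c
#include <stddef.h>
#include <stdio.h>

/* Prints the sizes chosen by whichever compiler builds this file. */
void print_type_sizes(void)
{
    printf("int:       %zu bytes\n", sizeof(int));
    printf("long:      %zu bytes\n", sizeof(long));
    printf("long long: %zu bytes\n", sizeof(long long));
    printf("pointer:   %zu bytes\n", sizeof(void *));
}

/* Size of long on the current platform, in bytes. */
size_t long_size(void)
{
    return sizeof(long);
}
```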
The one thing that neither scheme would support would be bit fields, which are nonstandard anyways, so I don't think either methodology really has anything to say about those.
Matt
6. Re: C-like Structures
- Posted by mattlewis (admin) Jul 13, 2011
- 2820 views
For example, int could have been defined as ...
    define memstruct int
        ifdef ARCH32 then
            rambytes(4) x
        elsedef
            rambytes(8) x
        end ifdef
    end memstruct
then int could be used exactly as in your examples.
    define memstruct point
        int X
        int Y
    end memstruct
Some more thoughts about this...
With my current implementation, you'd access a point like:
    ptr.point.X.x = 1
    ? ptr.point.Y.x
The way you've defined an int is as its own structure. To be able to use it and access without the extra level, you'd need something like a C typedef, so that whenever you put int the parser would really see rambytes(4).
Now, I'm not saying that we maybe don't want to include some sort of typedef mechanism. This would be handy, for instance, in Windows programming, so you could use LRESULT or HWND, or whatever, to make life easier.
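In C terms, the difference between the two approaches looks roughly like the sketch below. All names (`wrapped_int`, `coord`, and so on) are hypothetical and chosen only for the comparison: wrapping a scalar in its own structure adds a level to every access, while a typedef is just another name for the underlying type.

```c
/* Scalar defined as its own one-member structure:
   every access needs the extra ".x" level. */
struct wrapped_int {
    int x;
};

struct point_wrapped {
    struct wrapped_int X;
    struct wrapped_int Y;
};

/* typedef: coord is simply another name for int. */
typedef int coord;

struct point_alias {
    coord X;
    coord Y;
};

int wrapped_sum(void)
{
    struct point_wrapped p;
    p.X.x = 1;          /* extra level, like ptr.point.X.x */
    p.Y.x = 2;
    return p.X.x + p.Y.x;
}

int alias_sum(void)
{
    struct point_alias p;
    p.X = 1;            /* direct access, like ptr.point.X */
    p.Y = 2;
    return p.X + p.Y;
}
```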
Matt
7. Re: C-like Structures
- Posted by SDPringle Jul 14, 2011
- 2797 views
Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.
    type point(atom x) with memstruct
        int x
        int y
    end memstruct
        -- validate x and y
        return (x.x > 0) and (x.y > 0)
    end type
    point p1
    p1.x = 50
    p1.y = 40
    p1.x = -1 -- type check error!
With this new system it is easier to access C structures than EUPHORIA sequences. It makes the use of constants to access members of a sequence look clunky.
8. Re: C-like Structures
- Posted by mattlewis (admin) Jul 14, 2011
- 2706 views
Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.
    type point(atom x) with memstruct
        int x
        int y
    end memstruct
        -- validate x and y
        return (x.x > 0) and (x.y > 0)
    end type
    point p1
    p1.x = 50
    p1.y = 40
    p1.x = -1 -- type check error!
With this new system it is easier to access C structures than EUPHORIA sequences. It makes the use of constants to access members of a sequence look clunky.
The access method here appears to rely on knowing the type, which is often not the case (such as when one gets stored in a sequence).
I guess the type checking could flag when you try to assign something that doesn't fit into the field, or maybe a signed vs unsigned problem. Though I don't think we'd need to stuff that into a type, since it's really different than a UDT, and has plenty of type information automatically.
And really, there's no point in validating in the way that you've set up here, because by definition, the bytes in RAM can only store what they can store. What would be an invalid value for an int?
Matt
9. Re: C-like Structures
- Posted by SDPringle Jul 14, 2011
- 2751 views
Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.
    type point(atom x) with memstruct
        int x
        int y
    end memstruct
        -- validate x and y
        return (x.x > 0) and (x.y > 0)
    end type
    point p1
    p1.x = 50
    p1.y = 40
    p1.x = -1 -- type check error!
With this new system it is easier to access C structures than EUPHORIA sequences. It makes the use of constants to access members of a sequence look clunky.
The access method here appears to rely on knowing the type, which is often not the case (such as when one gets stored in a sequence).
There are times where we only want the aliases for the members. I agree with that.
I guess the type checking could flag when you try to assign something that doesn't fit into the field, or maybe a signed vs unsigned problem. Though I don't think we'd need to stuff that into a type, since it's really different than a UDT, and has plenty of type information automatically.
Yes, it is different but it seems to go against the spirit of the work to abandon it if we cannot have both at the same time.
And really, there's no point in validating in the way that you've set up here, because by definition, the bytes in RAM can only store what they can store. What would be an invalid value for an int?
Matt
I think if type-checking is a good idea for EUPHORIA types, it is a good idea for C types as well. An object can contain most numbers you will think of but not all of these numbers can be correct in all contexts. For example, a point in a display must have positive numbers and must be limited to the size of the display it belongs to.
Once we get all of these ideas worked out this would be nice to do in sequences as well. Instead of always defining constants for members of a sequence.
Shawn Pringle
10. Re: C-like Structures
- Posted by mattlewis (admin) Jul 14, 2011
- 2809 views
The access method here appears to rely on knowing the type, which is often not the case (such as when one gets stored in a sequence).
There are times where we only want the aliases for the members. I agree with that.
I don't understand what you're saying here. I was pointing out the difference between:
    my_point.point.x -- works with any atom that holds a valid pointer
    -- vs
    my_point.x -- ONLY works when we absolutely KNOW that my_point is of euphoria type point
I guess the type checking could flag when you try to assign something that doesn't fit into the field, or maybe a signed vs unsigned problem. Though I don't think we'd need to stuff that into a type, since it's really different than a UDT, and has plenty of type information automatically.
Yes, it is different but it seems to go against the spirit of the work to abandon it if we cannot have both at the same time.
I disagree. Remember, this is a replacement for peek/poke, ultimately. We allow overflow in pokes, so I think it would make sense to do it here.
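To illustrate what allowing overflow means in practice, here is a C sketch of storing a value into a field that is too narrow for it. It is an analogy under assumptions, not the memstruct or poke implementation, and the function names are made up: the conversions below keep only the low-order bits, much as a poke keeps only the bytes that fit.

```c
#include <limits.h>

/* Storing into a one-byte unsigned field keeps only the low-order bits. */
unsigned char store_in_byte(unsigned int value)
{
    return (unsigned char)value;
}

/* Storing a negative number into an unsigned field wraps around. */
unsigned int store_in_unsigned(int value)
{
    return (unsigned int)value;
}

/* Returns 1: one past the largest byte value wraps back to zero. */
int byte_wraps(void)
{
    return store_in_byte(UCHAR_MAX + 1u) == 0;
}
```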
And really, there's no point in validating in the way that you've set up here, because by definition, the bytes in RAM can only store what they can store. What would be an invalid value for an int?
I think if type-checking is a good idea for EUPHORIA types, it is a good idea for C types as well. An object can contain most numbers you will think of but not all of these numbers can be correct in all contexts. For example, a point in a display must have positive numbers and must be limited to the size of the display it belongs to.
That's completely false. It's perfectly permissible to have negative points. But anyways, if you wanted only non-negative numbers, use an unsigned type. I suppose it's true, however, that some libraries could apply their own bounds for certain members.
I'll concede that some automatic type checking for memstruct assignments might be worthwhile.
Once we get all of these ideas worked out this would be nice to do in sequences as well. Instead of always defining constants for members of a sequence.
Yes, that's been on most people's wish lists, I think, for some time.
Matt
11. Re: C-like Structures
- Posted by mattlewis (admin) Jul 15, 2011
- 2738 views
Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.
I think a better approach might be to keep them separate, but allow memstructs to leverage the type system:
    type nonnegative_int( object o )
        if atom( o ) and o >= 0 then
            return 1
        end if
        return 0
    end type
    memstruct nonnegative_point
        -- and then apply the type either like this:
        unsigned int x as nonnegative_int
        -- or like this
        unsigned int as nonnegative_int y
        -- or even
        unsigned int(nonnegative_int) z
    end memstruct
Matt
12. Re: C-like Structures
- Posted by mattlewis (admin) Jul 20, 2011
- 2556 views
Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.
I think a better approach might be to keep them separate, but allow memstructs to leverage the type system:
I think I'm leaning towards:
    type nonnegative_int( object o )
        if atom( o ) and o >= 0 then
            return 1
        end if
        return 0
    end type
    memstruct nonnegative_point
        -- assignments will be type checked as nonnegative_int
        unsigned int as nonnegative_int y
    end memstruct
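One way to picture a type-checked assignment is a store that first runs a user-supplied predicate. The C sketch below is purely illustrative: `check_fn` and `checked_store` are hypothetical names, nothing here reflects how the struct branch would implement it, and a failed check simply returns 0 here, whereas a euphoria type check failure stops the program.

```c
/* A predicate playing the role of a euphoria type: nonzero means "valid". */
typedef int (*check_fn)(long long value);

int nonnegative_int(long long value)
{
    return value >= 0;
}

/* Stores value into *field only if the predicate accepts it.
   Returns 1 on success, 0 if the check rejected the value. */
int checked_store(unsigned int *field, long long value, check_fn check)
{
    if (!check(value)) {
        return 0;               /* field is left untouched */
    }
    *field = (unsigned int)value;
    return 1;
}
```

Calling `checked_store(&y, -1, nonnegative_int)` leaves `y` unchanged and reports failure, which is the behaviour the `as nonnegative_int` clause is meant to give.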
Matt
13. Re: C-like Structures
- Posted by petelomax Jul 23, 2011
- 2514 views
I think I'm leaning towards:
    -- assignments will be type checked as nonnegative_int
    unsigned int as nonnegative_int y
That would be my preference from the three options you listed. Would it be practical/sensible to also permit:
    -- assignments will be type checked as nonnegative_int
    nonnegative_int as unsigned int y
Pete
14. Re: C-like Structures
- Posted by mattlewis (admin) Jul 23, 2011
- 2479 views
I think I'm leaning towards:
    -- assignments will be type checked as nonnegative_int
    unsigned int as nonnegative_int y
That would be my preference from the three options you listed. Would it be practical/sensible to also permit:
    -- assignments will be type checked as nonnegative_int
    nonnegative_int as unsigned int y
I think that might be even better. I think we should pick one way and stick to it.
Matt