1. C-like Structures

I've been working lately on developing support for C-like structures (i.e., structures in memory). I've got some code that is functional, though not fully developed, and certainly missing some features. I've posted some binaries. The code is all in the struct branch of the repo. There is a simple unit test (t_memstruct.e) in the tests directory of the repo that actually uses the structures of the euphoria backend symbol table. I've also started work on the documentation.

There are two new keywords: memstruct and memunion, which basically correspond to their C analogs, struct and union.

I went with a very C-like syntax for declarations. A lot more so than my original posts. I've gotten some criticism for this, but I think it makes sense. These structures are useful for communicating with outside libraries, and when you run into structures, they're likely to be documented and implemented as C structs. So keeping close to C should make it easier for euphoria programmers to communicate with external libraries.

The data types are: char, short, int, long, long long, float, double, long double, eudouble and object. The only two that aren't actual C types are eudouble and object. These are meant to make it easier to write portable code, in that eudouble uses the double type of whichever version of euphoria is running (32-bit euphoria still uses 64-bit doubles, but 64-bit euphoria uses 80-bit long doubles). And objects are just integers that are the same size as pointers (which is how euphoria objects are actually implemented).
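
As a rough C-side illustration of those last two types, a pointer-sized integer and a build-dependent floating-point type could be sketched as below. The names `eu_object`, `eu_double` and `object_holds_pointer` are made up for this sketch and are not taken from the euphoria source; the 32-bit/64-bit switch simply mirrors the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef intptr_t eu_object;      /* integer wide enough to hold a pointer */

#if UINTPTR_MAX > 0xFFFFFFFFu
typedef long double eu_double;   /* 64-bit build: long double */
#else
typedef double eu_double;        /* 32-bit build: plain double */
#endif

/* Returns 1 if an eu_object can round-trip a pointer value. */
int object_holds_pointer(void *p)
{
    eu_object o = (eu_object)p;
    return (void *)o == p;
}
```

Note that the actual width of long double depends on the C compiler, which is the sort of detail the eudouble type is meant to hide.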

Basically, to use memstructs, you need a declaration and a pointer (i.e., an atom pointing to some memory). To read or assign from/to the memstruct, a dot notation is used, where the pointer variable comes first, then the name of the memstruct, then the member or members used to access the memory. This removes the requirement to peek/poke and remember offsets. The translator translates these into actual C structs, and so translated memstruct use should be faster than interpreted.

A simple example:

memstruct point 
    int x 
    int y 
end memstruct 
atom ptr = allocate( sizeof( point ) ) 
ptr.point.x = 5 
ptr.point.y = 10 
 
ptr.point.x += ptr.point.y 
 
? ptr.point.x  -- 15 
? ptr.point.*  -- { 15, 10 } 

The last bit, using the asterisk, is something I've been thinking about as serialization of a memstruct. That's an aspect that is definitely not complete, and probably somewhat buggy. I'd also like to do the reverse (i.e., assign all the members of a memstruct at one go), but just haven't gotten there yet.
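
For comparison, here is roughly what the point example looks like in plain C, including a sketch of both directions of the serialization idea. The function names are invented for illustration; this is not what the translator emits.

```c
#include <assert.h>
#include <stdlib.h>

struct point {
    int x;
    int y;
};

/* Copy every member into out[], in declaration order
   (roughly what ptr.point.* returns). */
void point_serialize(const struct point *p, int out[2])
{
    out[0] = p->x;
    out[1] = p->y;
}

/* The reverse: assign all members in one go. */
void point_assign_all(struct point *p, const int in[2])
{
    p->x = in[0];
    p->y = in[1];
}

/* Mirrors the euphoria example; returns the final x, or -1 on failure. */
int point_demo(void)
{
    struct point *ptr = malloc(sizeof *ptr);
    int values[2];
    if (ptr == NULL) return -1;
    ptr->x = 5;
    ptr->y = 10;
    ptr->x += ptr->y;
    point_serialize(ptr, values);   /* values = { 15, 10 } */
    assert(values[0] == 15 && values[1] == 10);
    free(ptr);
    return values[0];
}
```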

You can also specify data members to be unsigned or pointers to something. I've changed the C notation a little bit, making pointers a little more obvious:

memstruct foo 
    pointer int bar 
end memstruct 
atom ptr = allocate( sizeof( foo ) ) 
ptr.foo.bar = allocate( 4 ) 
ptr.foo.bar = 100 
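
In C terms, a "pointer int" member is an `int *`, and C separates "change the pointer" from "store through the pointer" with an explicit dereference. The sketch below shows that C side only; how the struct branch tells the two assignments apart is not spelled out in this post, and `foo_new` / `foo_free` are names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct foo {
    int *bar;   /* "pointer int bar" */
};

/* Allocates a foo plus the int it points to, stores `value` through the
   pointer, and returns the struct (NULL on allocation failure). */
struct foo *foo_new(int value)
{
    struct foo *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) return NULL;
    ptr->bar = malloc(sizeof *ptr->bar);   /* set the pointer itself */
    if (ptr->bar == NULL) { free(ptr); return NULL; }
    *ptr->bar = value;                     /* store through the pointer */
    return ptr;
}

/* Releases the pointed-to int and then the struct. */
void foo_free(struct foo *ptr)
{
    if (ptr != NULL) {
        free(ptr->bar);
        free(ptr);
    }
}
```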

Matt


2. Re: C-like Structures

nice!

couple of questions

1) is the memstruct implementation dot preprocessor based or part of the
euphoria language. ie : is memstruct a new data type (along with sequence, atom etc ...)

2) i have taken a look at the memstruct test case

http://scm.openeuphoria.org/hg/euphoria/file/78edc3d4f6d8/tests/t_memstruct.e

given the allocation

atom symtab = allocate( 5 * sizeof( symtab_entry ) )
poke( symtab, repeat( 0, 5 * sizeof( symtab_entry ) ) )

what do the following lines assign to (first item in array ?)

symtab.symtab_entry.obj = 9
symtab.symtab_entry.obj += 5
symtab.symtab_entry.obj -= 2
symtab.symtab_entry.obj *= 6
symtab.symtab_entry.obj /= 3

i ask because further down you have the following

symtab.symtab_entry[1].obj = 1

tia


3. Re: C-like Structures

raseu said...

couple of questions

1) is the memstruct implementation dot preprocessor based or part of the
euphoria language. ie : is memstruct a new data type (along with sequence, atom etc ...)

A memstruct isn't really a data type like a sequence or an atom. It's just a way to access data stored in memory, instead of using a pointer, offsets and peeks and pokes.

raseu said...

2) i have taken a look at the memstruct test case

http://scm.openeuphoria.org/hg/euphoria/file/78edc3d4f6d8/tests/t_memstruct.e

given the allocation

atom symtab = allocate( 5 * sizeof( symtab_entry ) )
poke( symtab, repeat( 0, 5 * sizeof( symtab_entry ) ) )

what do the following lines assign to (first item in array ?)

 
                              -- peek/poke equivalent (assuming 32-bit eu):
symtab.symtab_entry.obj = 9   -- poke4( symtab + OBJ_OFFSET, 9 )
symtab.symtab_entry.obj += 5  -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) + 5 )
symtab.symtab_entry.obj -= 2  -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) - 2 )
symtab.symtab_entry.obj *= 6  -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) * 6 )
symtab.symtab_entry.obj /= 3  -- poke4( symtab + OBJ_OFFSET, peek4s( symtab + OBJ_OFFSET ) / 3 )

raseu said...

i ask because further down you have the following

symtab.symtab_entry[1].obj = 1

Good eyes. This is actually something else that comes from C. Basically, you can index a pointer in an array-like fashion. In this case, symtab holds the pointer, and indexing is zero-based as in C, so [1] refers to the second symtab_entry. So the above statement is equivalent to:

poke4( symtab + OBJ_OFFSET + sizeof( symtab_entry ), 1 ) 
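
The same address arithmetic can be checked in C, where `base[index].obj` and "base + index * sizeof + offset" must land on the same bytes. The `struct entry` below is a made-up two-member stand-in (the real symtab_entry has more members), and `poke_obj` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in; the real symtab_entry has more members. */
struct entry {
    int obj;
    int next;
};

/* Writes `value` to element `index` using only byte arithmetic,
   the way the poke4 line computes its address. */
void poke_obj(void *base, size_t index, int value)
{
    unsigned char *addr = (unsigned char *)base
                        + index * sizeof(struct entry)
                        + offsetof(struct entry, obj);
    memcpy(addr, &value, sizeof value);
}

/* Returns 1 if table[1].obj and the byte arithmetic agree. */
int indexing_agrees(void)
{
    struct entry table[5];
    memset(table, 0, sizeof table);
    poke_obj(table, 1, 1);   /* like symtab.symtab_entry[1].obj = 1 */
    return table[1].obj == 1 && table[0].obj == 0;
}
```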

Matt


4. Re: C-like Structures

mattlewis said...

I went with a very C-like syntax for declarations. A lot more so than my original posts. I've gotten some criticism for this, but I think it makes sense. These structures are useful for communicating with outside libraries, and when you run into structures, they're likely to be documented and implemented as C structs. So keeping close to C should make it easier for euphoria programmers to communicate with external libraries.

The data types are: char, short, int, long, long long, float, double, long double, eudouble and object.

I agree that having familiar 'words' will help using this feature but I'm still of the mind that the 'datatypes' named above should have been defined in terms of memstruct constructs rather than hard coding them in the parser. My suggestion would only require two low level words, which would be used to define the number of physical bytes of RAM to assign to a memstruct label, and to optionally modify the type of access to fetch/store data into those bytes (eg. signed data).

For example, int could have been defined as ...

define memstruct int 
ifdef ARCH32 then 
    rambytes(4) x 
elsedef 
    rambytes(8) x 
end ifdef 
end memstruct 

then int could be used exactly as in your examples.

define memstruct point 
   int X 
   int Y 
end memstruct 

I'm sure that this would simplify the parser and allow future 'predefined' data types to be much more easily built instead of updating the parser to cater for them.
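
To make the "raw bytes plus an access modifier" idea concrete, here is one way it could look in C: a block of uninterpreted bytes, with the signed or unsigned reading chosen at access time. `struct raw4` and the two reader functions are hypothetical names for this sketch, not part of any proposal in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "rambytes(4) x": four raw bytes with no inherent interpretation. */
struct raw4 {
    unsigned char bytes[4];
};

/* The access modifier then decides how the bytes are read back. */
int32_t raw4_read_signed(const struct raw4 *r)
{
    int32_t v;
    memcpy(&v, r->bytes, sizeof v);
    return v;
}

uint32_t raw4_read_unsigned(const struct raw4 *r)
{
    uint32_t v;
    memcpy(&v, r->bytes, sizeof v);
    return v;
}
```

With all four bytes set to 0xFF, the signed reader yields -1 and the unsigned reader yields 4294967295.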


5. Re: C-like Structures

DerekParnell said...

I agree that having familiar 'words' will help using this feature but I'm still of the mind that the 'datatypes' named above should have been defined in terms of memstruct constructs rather than hard coding them in the parser. My suggestion would only require two low level words, which would be used to define the number of physical bytes of RAM to assign to a memstruct label, and to optionally modify the type of access to fetch/store data into those bytes (eg. signed data).

...snip...

I'm sure that this would simplify the parser and allow future 'predefined' data types to be much more easily built instead of updating the parser to cater for them.

I'd still rather avoid the explicit sizing in euphoria code. I'm interested in hearing what others think.

I'm actually skeptical of being able/needing to have more or less arbitrary byte size elements. Is there any place/language/compiler where this is really used for structures? Also, this would complicate the translator (and, really, the interpreter). You certainly wouldn't be able to natively access odd sized integers.

The currently implemented data types are actually fairly simply implemented in the parser and in the backend and translator. I suspect using arbitrary sizes would be at least as complex, and probably more so, especially taking into account integer sizes that C compilers don't support.

Currently, the sizes of the various integer types are handled for us automatically by the compiler used to build the interpreter or the translated program. So any weirdness that you get from platform to platform (like different "long int" sizes on Win64 and 64-bit Linux) is handled automatically, without having to figure out sizes. Likewise for pointers.
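
A small C sketch of what "handled by the compiler" means in practice: the sizes are simply whatever `sizeof` reports for the compiler that built the code, so nothing has to be spelled out by hand. The `struct sizes` and `compiler_sizes` names are invented for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

struct sizes {
    size_t of_int;
    size_t of_long;
    size_t of_long_long;
    size_t of_pointer;
};

/* Reports the sizes chosen by whichever compiler built this code. */
struct sizes compiler_sizes(void)
{
    struct sizes s;
    s.of_int = sizeof(int);
    s.of_long = sizeof(long);          /* 4 on Win64, 8 on 64-bit Linux */
    s.of_long_long = sizeof(long long);
    s.of_pointer = sizeof(void *);
    return s;
}
```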

The one thing that neither scheme would support would be bit fields, whose layout is implementation-defined anyways, so I don't think either methodology really has anything to say about those.

Matt


6. Re: C-like Structures

DerekParnell said...

For example, int could have been defined as ...

define memstruct int 
ifdef ARCH32 then 
    rambytes(4) x 
elsedef 
    rambytes(8) x 
end ifdef 
end memstruct 

then int could be used exactly as in your examples.

define memstruct point 
   int X 
   int Y 
end memstruct 

Some more thoughts about this...

With my current implementation, you'd access a point like:

ptr.point.X.x = 1 
? ptr.point.Y.x 

The way you've defined an int is as its own structure. To be able to use it and access without the extra level, you'd need something like a C typedef, so that whenever you put int the parser would really see rambytes(4).

Now, I'm not saying that we wouldn't want to include some sort of typedef mechanism. This would be handy, for instance, in Windows programming, so you could use LRESULT or HWND, or whatever, to make life easier.
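
The difference between the two approaches shows up the same way in C. Wrapping int in its own struct forces the extra member name on every access, while a typedef is just an alias. The names below (`wrapped_int`, `coord`, and so on) are hypothetical and only serve the comparison.

```c
#include <assert.h>

/* An "int" defined as its own structure: access needs the extra member. */
struct wrapped_int { int x; };
struct wrapped_point {
    struct wrapped_int X;
    struct wrapped_int Y;
};

/* With a typedef the alias simply stands for the underlying type. */
typedef int coord;          /* hypothetical alias, like LRESULT or HWND */
struct aliased_point {
    coord X;
    coord Y;
};

int wrapped_sum(const struct wrapped_point *p)
{
    return p->X.x + p->Y.x;   /* ptr.point.X.x style */
}

int aliased_sum(const struct aliased_point *p)
{
    return p->X + p->Y;       /* no extra level */
}
```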

Matt


7. Re: C-like Structures

Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.

type point(atom x) with memstruct  
int x 
int y 
end memstruct 
    -- validate x and y 
    return (x.x > 0) and (x.y > 0) 
end type 
 
point p1 
p1.x = 50 
p1.y = 40 
p1.x =-1 -- type check error! 

With this new system it is easier to access C structures than EUPHORIA sequences. It makes the use of constants to access members of a sequence look clunky.


8. Re: C-like Structures

SDPringle said...

Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.

type point(atom x) with memstruct  
int x 
int y 
end memstruct 
    -- validate x and y 
    return (x.x > 0) and (x.y > 0) 
end type 
 
point p1 
p1.x = 50 
p1.y = 40 
p1.x =-1 -- type check error! 

With this new system it is easier to access C structures than EUPHORIA sequences. It makes the use of constants to access members of a sequence look clunky.

The access method here appears to rely on knowing the type, which is often not the case (such as when one gets stored in a sequence).

I guess the type checking could flag when you try to assign something that doesn't fit into the field, or maybe a signed vs unsigned problem. Though I don't think we'd need to stuff that into a type, since it's really different than a UDT, and has plenty of type information automatically.

And really, there's no point in validating in the way that you've set up here, because by definition, the bytes in RAM can only store what they can store. What would be an invalid value for an int?

Matt


9. Re: C-like Structures

mattlewis said...
SDPringle said...

Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.

type point(atom x) with memstruct  
int x 
int y 
end memstruct 
    -- validate x and y 
    return (x.x > 0) and (x.y > 0) 
end type 
 
point p1 
p1.x = 50 
p1.y = 40 
p1.x =-1 -- type check error! 

With this new system it is easier to access C structures than EUPHORIA sequences. It makes the use of constants to access members of a sequence look clunky.

The access method here appears to rely on knowing the type, which is often not the case (such as when one gets stored in a sequence).

There are times where we only want the aliases for the members. I agree with that.

mattlewis said...

I guess the type checking could flag when you try to assign something that doesn't fit into the field, or maybe a signed vs unsigned problem. Though I don't think we'd need to stuff that into a type, since it's really different than a UDT, and has plenty of type information automatically.

Yes, it is different but it seems to go against the spirit of the work to abandon it if we cannot have both at the same time.

mattlewis said...

And really, there's no point in validating in the way that you've set up here, because by definition, the bytes in RAM can only store what they can store. What would be an invalid value for an int?

Matt

I think if type-checking is a good idea for EUPHORIA types, it is a good idea for C types as well. An object can contain most numbers you will think of but not all of these numbers can be correct in all contexts. For example, a point in a display must have positive numbers and must be limited to the size of the display it belongs to.

Once we get all of these ideas worked out this would be nice to do in sequences as well. Instead of always defining constants for members of a sequence.

Shawn Pringle


10. Re: C-like Structures

SDPringle said...
mattlewis said...

The access method here appears to rely on knowing the type, which is often not the case (such as when one gets stored in a sequence).

There are times where we only want the aliases for the members. I agree with that.

I don't understand what you're saying here. I was pointing out the difference between:

my_point.point.x  -- works with any atom that holds a valid pointer 
  -- vs 
my_point.x        -- ONLY  works when we absolutely KNOW that my_point is of euphoria type point 

SDPringle said...
mattlewis said...

I guess the type checking could flag when you try to assign something that doesn't fit into the field, or maybe a signed vs unsigned problem. Though I don't think we'd need to stuff that into a type, since it's really different than a UDT, and has plenty of type information automatically.

Yes, it is different but it seems to go against the spirit of the work to abandon it if we cannot have both at the same time.

I disagree. Remember, this is a replacement for peek/poke, ultimately. We allow overflow in pokes, so I think it would make sense to do it here.
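
As a sketch of what "allowing overflow" amounts to at the byte level, the C conversions below keep only the low bits of a value that is too large for the field, the same way a byte poke keeps only the low 8 bits. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stores only the low 8 bits of `value`, as a byte poke would. */
uint8_t store_byte(long value)
{
    return (uint8_t)value;    /* conversion to unsigned wraps modulo 256 */
}

/* Same idea for a 32-bit unsigned field. */
uint32_t store_u32(long long value)
{
    return (uint32_t)value;   /* wraps modulo 2^32 */
}
```

So `store_byte(257)` gives 1 and `store_byte(-1)` gives 255, with no error raised in either case.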

SDPringle said...
mattlewis said...

And really, there's no point in validating in the way that you've set up here, because by definition, the bytes in RAM can only store what they can store. What would be an invalid value for an int?

I think if type-checking is a good idea for EUPHORIA types, it is a good idea for C types as well. An object can contain most numbers you will think of but not all of these numbers can be correct in all contexts. For example, a point in a display must have positive numbers and must be limited to the size of the display it belongs to.

That's completely false. It's perfectly permissible to have negative points. But anyways, if you wanted only non-negative numbers, use an unsigned type. I suppose it's true, however, that some libraries could apply their own bounds for certain members.

I'll concede that some automatic type checking for memstruct assignments might be worthwhile.

SDPringle said...

Once we get all of these ideas worked out this would be nice to do in sequences as well. Instead of always defining constants for members of a sequence.

Yes, that's been on most people's wish lists, I think, for some time.

Matt


11. Re: C-like Structures

SDPringle said...

Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.

I think a better approach might be to keep them separate, but allow memstructs to leverage the type system:

 
type nonnegative_int( object o ) 
    if atom( o ) and o >= 0 then 
        return 1 
    end if 
    return 0 
end type 
 
memstruct nonnegative_point 
    -- and then apply the type either like this: 
    unsigned int x as nonnegative_int 
 
    -- or like this 
    unsigned int as nonnegative_int y 
 
    -- or even 
    unsigned int(nonnegative_int) z 
end memstruct 

Matt


12. Re: C-like Structures

mattlewis said...
SDPringle said...

Rather than abandoning EUPHORIA types this could be an extension to the typing system we already have.

I think a better approach might be to keep them separate, but allow memstructs to leverage the type system:

I think I'm leaning towards:

 
type nonnegative_int( object o ) 
    if atom( o ) and o >= 0 then 
        return 1 
    end if 
    return 0 
end type 
 
memstruct nonnegative_point 
 
    -- assignments will be type checked as nonnegative_int 
    unsigned int as nonnegative_int y 
 
end memstruct 
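
One way to picture the checked assignment is a store that runs the type routine first and refuses the write when the check fails. The C sketch below is only an analogy under that assumption; `assign_checked` is a hypothetical helper, and returning 0 stands in for euphoria's type check error.

```c
#include <assert.h>

/* Counterpart of the euphoria type: 1 if acceptable, 0 otherwise. */
int nonnegative_int(long long value)
{
    return value >= 0;
}

struct nonnegative_point {
    unsigned int y;
};

/* Runs the check before storing; returns 1 on success and leaves
   the field untouched (returning 0) on a failed check. */
int assign_checked(unsigned int *field, long long value,
                   int (*check)(long long))
{
    if (!check(value)) {
        return 0;            /* euphoria would raise a type check error */
    }
    *field = (unsigned int)value;
    return 1;
}
```

Without the check, assigning -1 to the unsigned field would silently wrap to a large positive number, which is the case the type is meant to catch.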

Matt


13. Re: C-like Structures

mattlewis said...

I think I'm leaning towards:

    -- assignments will be type checked as nonnegative_int 
    unsigned int as nonnegative_int y 

That would be my preference from the three options you listed. Would it be practical/sensible to also permit:

    -- assignments will be type checked as nonnegative_int 
    nonnegative_int as unsigned int y 

Pete


14. Re: C-like Structures

petelomax said...
mattlewis said...

I think I'm leaning towards:

    -- assignments will be type checked as nonnegative_int 
    unsigned int as nonnegative_int y 

That would be my preference from the three options you listed. Would it be practical/sensible to also permit:

    -- assignments will be type checked as nonnegative_int 
    nonnegative_int as unsigned int y 

I think that might be even better. I think we should pick one way and stick to it.

Matt
