SquareBracketDereferencing

Introduction

The goal of this feature is to make it easier to use C-like structures in raw memory, which is often required when interfacing Euphoria code with external libraries or with the operating system itself. The proposed changes to Euphoria are designed to make this interface easier and less error-prone.

Since we're dealing with C-based structures, it makes a certain amount of sense to keep the usage similar to C. C uses the asterisk (*) to dereference pointers and access the pointed-to value. This is often confusing and can be difficult to use, especially when accessing deeply nested structures. This proposal suggests that a better way to dereference a pointer is to enclose the entire reference, including all structure members, in the dereferencing syntax. This gives a clear indication of what is happening.
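For comparison, here is what per-level dereferencing looks like in practice. This sketch uses Python's ctypes module as a stand-in for C (it is not Euphoria code); each `.contents` is one explicit dereference, like C's `*`. The `Point` and `Shape` types are made up for illustration.

```python
import ctypes

class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

class Shape(ctypes.Structure):
    # Holds a *pointer* to a Point, not an embedded Point.
    _fields_ = [("origin", ctypes.POINTER(Point))]

pt = Point(3, 4)
shape = Shape(ctypes.pointer(pt))
shape_ptr = ctypes.pointer(shape)

# C: (*(*shape_ptr).origin).x, or shape_ptr->origin->x.
# Every level of indirection needs its own explicit dereference.
x = shape_ptr.contents.origin.contents.x

# Writes follow the same chain and land in pt's memory.
shape_ptr.contents.origin.contents.x = 10
```

Under the proposal, the whole path would instead sit inside a single pair of brackets.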

Using C style structures

Atoms will be used to hold pointers to memory locations, just as they are now, but enclosing them in square brackets will dereference the pointer, serializing Euphoria objects into raw memory or deserializing them from it. One option is to use raw atoms, which would require qualifying each reference with the structure type:

    atom my_struct 
    .... 
    [my_struct.rect.x] = 5 
    .... 
    atom x = [my_struct.rect.x] 

Alternatively, we could declare structure variables in the same way as user-defined types:

    rect my_struct 
    .... 
    [my_struct.x] = 5 
    .... 
    atom x = [my_struct.x] 

These two styles could coexist.
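Python's ctypes module happens to support both styles side by side, which suggests they can indeed coexist. This is a comparison sketch, not the proposed Euphoria implementation: `from_address` names the structure type at the point of access (like `[my_struct.rect.x]`), while binding the result once gives a typed handle (like `rect my_struct`). The `Rect` class mirrors the `rect` structure declared further down this page.

```python
import ctypes

class Rect(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32),
                ("width", ctypes.c_double), ("height", ctypes.c_double)]

# A zero-filled block standing in for memory obtained from a C library.
# buf must stay alive for as long as the address is used.
buf = ctypes.create_string_buffer(ctypes.sizeof(Rect))
my_struct = ctypes.addressof(buf)  # a plain integer address, like an atom

# Style 1: raw address, structure type named at every access.
Rect.from_address(my_struct).x = 5        # [my_struct.rect.x] = 5
x1 = Rect.from_address(my_struct).x       # atom x = [my_struct.rect.x]

# Style 2: bind the type once, then use plain member names.
typed = Rect.from_address(my_struct)      # rect my_struct
typed.y = 7                               # [my_struct.y] = 7
x2 = typed.x                              # atom x = [my_struct.x]
```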

Initialization could be done using a new function or operator; the function could optionally be a machine function. This implies that a reference to a structure is a 'first class' object. It's possible that this would need to be converted to a 'structure id', similar to how routine ids are used. The default should probably be to free the memory automatically when the reference count drops to zero. It is important to also allow the memory to remain allocated, since ownership of the memory sometimes passes to a separate library.

The bracket syntax is really just syntactic sugar for the peeks and pokes currently used in Euphoria. In fact, the implementation might simply emit peek and poke IL.
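To make the lowering concrete, here is a sketch of what the bracket forms could turn into. It uses Python, with `ctypes.memmove`/`ctypes.string_at` standing in for poke/peek and the struct module doing the encoding. The field table `RECT_FIELDS` and the helpers `poke_field`/`peek_field` are hypothetical; a compiler would compute the offsets from the structure declaration and emit the equivalent peek and poke IL directly.

```python
import ctypes
import struct

# Hypothetical field table a compiler could derive from "structure rect":
# member name -> (byte offset, struct-module format with native byte order).
RECT_FIELDS = {
    "x": (0, "=i"),
    "y": (4, "=i"),
    "width": (8, "=d"),
    "height": (16, "=d"),
}
RECT_SIZE = 24

def poke_field(base, fields, name, value):
    """Lowering of [base.name] = value: encode the value, then copy the raw bytes."""
    offset, fmt = fields[name]
    data = struct.pack(fmt, value)
    ctypes.memmove(base + offset, data, len(data))

def peek_field(base, fields, name):
    """Lowering of x = [base.name]: copy the raw bytes, then decode them."""
    offset, fmt = fields[name]
    raw = ctypes.string_at(base + offset, struct.calcsize(fmt))
    return struct.unpack(fmt, raw)[0]

buf = ctypes.create_string_buffer(RECT_SIZE)  # keep alive while my_struct is used
my_struct = ctypes.addressof(buf)

poke_field(my_struct, RECT_FIELDS, "x", 5)        # [my_struct.x] = 5
poke_field(my_struct, RECT_FIELDS, "width", 2.5)  # [my_struct.width] = 2.5
x = peek_field(my_struct, RECT_FIELDS, "x")       # atom x = [my_struct.x]
```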


Derek said...

I can foresee an ambiguity issue with the bracket syntax.

    foo = bar
    [mystruct.x] = qwerty

Because line endings are not significant in Euphoria, this is parsed as:

    foo = bar[mystruct.x] = qwerty

So now the brackets look like a sequence indexing operation.

Matt said...

Yes, thinking about this further, the entire bracket notation may not be necessary.


Declaring structures

A new keyword, 'structure', could be introduced to declare and define memory structures. There would be two primitive types of members, integer and atom, representing integers and floating-point values respectively, each with an additional size specifier. Possibly these specifiers could be optional, with sensible defaults:

    structure rect
        integer(32) x
        integer(32) y
        atom(64) width
        atom(64) height
    end structure

This structure represents a rectangle where the x and y coordinates are represented as 32-bit integers, and the width and height as 64-bit floating point values.
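As a rough model of how such a declaration maps onto C types, this sketch builds the same layout with Python's ctypes. The `PRIMITIVES` table (mapping `integer(n)` and `atom(n)` to fixed-width C types) and `make_structure` are hypothetical. The resulting offsets (0, 4, 8 and 16) and 24-byte size follow from natural C alignment, which needs no padding for this member order.

```python
import ctypes

# Hypothetical mapping from the proposed (kind, bits) specifiers to C types.
PRIMITIVES = {
    ("integer", 8): ctypes.c_int8,
    ("integer", 16): ctypes.c_int16,
    ("integer", 32): ctypes.c_int32,
    ("integer", 64): ctypes.c_int64,
    ("atom", 32): ctypes.c_float,
    ("atom", 64): ctypes.c_double,
}

def make_structure(name, members):
    """Build a ctypes Structure class from (kind, bits, member_name) triples."""
    fields = [(member, PRIMITIVES[(kind, bits)]) for kind, bits, member in members]
    return type(name, (ctypes.Structure,), {"_fields_": fields})

# Equivalent of: structure rect ... end structure
Rect = make_structure("rect", [
    ("integer", 32, "x"),
    ("integer", 32, "y"),
    ("atom", 64, "width"),
    ("atom", 64, "height"),
])

r = Rect(10, 20, 1.5, 2.5)
```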


Derek said...

I suspect another primitive might be useful: the pointer. Currently pointers are all 32 bits, but 64-bit pointers will soon be common. So, to avoid reworking structures when running on different architectures, maybe the bare term 'pointer' could map to the native pointer size for the current architecture, with a size modifier used where that is not appropriate.

    structure qwerty
        pointer foo -- native pointer
        pointer(32) bar -- 32-bit pointer
    end structure
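In ctypes terms (used here only for comparison), a native `pointer` corresponds to `c_void_p`, whose size follows the platform, while a `pointer(32)` can only be stored as a fixed-width 32-bit unsigned integer. The `Qwerty` class is a hypothetical rendering of the example above.

```python
import ctypes

class Qwerty(ctypes.Structure):
    _fields_ = [
        ("foo", ctypes.c_void_p),  # pointer: 4 or 8 bytes depending on the platform
        ("bar", ctypes.c_uint32),  # pointer(32): always 4 bytes
    ]

native_size = ctypes.sizeof(ctypes.c_void_p)
q = Qwerty()
q.bar = 0x1000  # a 32-bit address value
```

Because `foo` changes size between platforms, the offset of every member after it changes too, which is exactly the rework a native `pointer` type would handle automatically.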

At least two modifiers will be needed, unsigned and null:

    structure c_string
        integer(8) with unsigned, null char[]
    end structure

This structure defines the standard null-terminated C string, in which each character is an unsigned 8-bit integer.
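The `null` modifier implies a particular serialization rule. A minimal sketch, with Python's ctypes standing in for raw memory access and `peek_c_string`/`poke_c_string` as hypothetical helper names: deserializing reads unsigned bytes until the first zero, and serializing writes the bytes followed by a terminating zero.

```python
import ctypes

def peek_c_string(addr):
    """Read unsigned 8-bit chars starting at addr until the null terminator."""
    out = bytearray()
    while True:
        byte = ctypes.c_uint8.from_address(addr + len(out)).value
        if byte == 0:  # the terminator marks the end; it is not part of the value
            return bytes(out)
        out.append(byte)

def poke_c_string(addr, data):
    """Write data followed by a terminating zero byte (len(data) + 1 bytes in total)."""
    ctypes.memmove(addr, data + b"\0", len(data) + 1)

buf = ctypes.create_string_buffer(16)
addr = ctypes.addressof(buf)
poke_c_string(addr, b"hello")
```

Note that a shorter string written over a longer one leaves the old tail bytes in memory; only the terminator decides where the value ends.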


Derek said...
  • The above definition seems to be defining a single 8-bit integer and not an array of bytes. In general, structs often contain fixed-size arrays and we need a way to define those too.
  • The "char[]" is inconsistent, I think. Just the "null" should be enough.
Matt said...

The char is the name of the element. The square brackets indicate that it's an array. I'm not sure that the null is redundant. I suppose the null could imply an array, but an array does not imply null termination.

Derek said...

Ok, got it. char is the name of the 8-bit integer. However, it seems what you are trying to describe here is a structure that contains a pointer to a variable-length array of 8-bit integers that is terminated by a zero byte. But the structure definition above does not mention pointers at all. Or is this a structure in which char names the first 8-bit integer and it is implied that immediately following it in RAM is an array of other 8-bit integers?

Also, we still need a way to describe fixed length arrays that are contained in a struct.
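These readings correspond to distinct C layouts. The sketch below shows two of them with Python's ctypes (class names are hypothetical): a struct that stores only a pointer to the characters, and a struct with a fixed-length array embedded in it. The remaining reading, where the characters follow the struct directly in memory, is what C99 calls a flexible array member (`char name[];`), which `sizeof` does not count.

```python
import ctypes

class NamePointer(ctypes.Structure):
    # C: struct { char *name; };  the characters live elsewhere in memory
    _fields_ = [("name", ctypes.c_char_p)]

class NameFixed(ctypes.Structure):
    # C: struct { char name[16]; };  the characters live inside the struct
    _fields_ = [("name", ctypes.c_char * 16)]

p = NamePointer(b"hello")
f = NameFixed()
f.name = b"hello"
```

The pointer version has the same size whatever the string length, while the fixed version always reserves all 16 bytes.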


In addition, we should be able to embed structures within other structures, either as a pointer, or as a part of the main structure.

    structure my_rect
        c_string [name]
        rect rect
    end structure

The above declaration defines a structure containing a pointer to a string and an embedded rect. It would be 28 bytes in size (a 4-byte pointer plus the 24-byte rect), assuming 4-byte pointers and no alignment padding.
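A ctypes rendering of `my_rect` (for comparison only; the Python class names are illustrative) makes the difference between the two members visible: `name` occupies only a pointer's width, while `rect` is stored by value. With the natural alignment used by common 32- and 64-bit Linux ABIs, the total is the pointer size plus 24 bytes, i.e. 28 with 4-byte pointers and 32 with 8-byte pointers; other ABIs may insert padding.

```python
import ctypes

class Rect(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32),
                ("width", ctypes.c_double), ("height", ctypes.c_double)]

class MyRect(ctypes.Structure):
    _fields_ = [
        ("name", ctypes.c_char_p),  # c_string [name]: only a pointer is stored
        ("rect", Rect),             # rect rect: all 24 bytes stored inline
    ]

m = MyRect()
m.name = b"window"
m.rect.x = 5  # writes straight into m's own memory
```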

Another key feature of C structures is the union, whose members share the same memory:

    structure shape
        c_string [name]
        union shape
            rect r
            circle c
        end union
    end structure
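The union can be sketched the same way with ctypes (for comparison only; `Circle` is a hypothetical layout, since this page does not define `circle`). All union members start at offset 0 and the union is as large as its largest member, so a value written through one member is visible through the other. As in C, nothing records which member is currently valid.

```python
import ctypes

class Rect(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32),
                ("width", ctypes.c_double), ("height", ctypes.c_double)]

class Circle(ctypes.Structure):  # hypothetical: center plus radius
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32),
                ("radius", ctypes.c_double)]

class ShapeUnion(ctypes.Union):
    _fields_ = [("r", Rect), ("c", Circle)]

class Shape(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char_p), ("shape", ShapeUnion)]

s = Shape()
s.name = b"disc"
s.shape.c.x = 9              # write through the circle member...
seen_as_rect = s.shape.r.x   # ...and read the same bytes through the rect member
```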


Derek said...

In the two examples above, the "c_string [name]" look like a sequence indexing operation. How do we resolve that?

