Historical SquareBracketDereferencing, Revision 2

Introduction

The goal of this feature is to make it easier to use C-like structures in raw memory. This is often required when interfacing Euphoria code with outside libraries or with the operating system itself. The proposed changes to Euphoria are designed to make this interface easier and less error-prone.

Since we're dealing with C-based structures, it makes a certain amount of sense to keep the usage similar to C. C uses the asterisk (*) to dereference pointers and access the pointed-to value. This is often confusing and can be difficult to use, especially when accessing deeply nested structures. This proposal suggests that a better way to dereference a pointer would be a notation that encloses the entire reference, including all structure elements. This gives a clear indication of what is happening.

Using C style structures

Atoms will be used to hold pointers to memory locations, just like before, but putting them inside square brackets will dereference the pointer: assigning to the bracketed reference serializes a Euphoria object into raw memory, and reading from it deserializes raw memory back into a Euphoria object. We could potentially use raw atoms, which would require qualifying the structure being referenced:

atom my_struct
    ....
    [my_struct.rect.x] = 5
    ....
    atom x = [my_struct.rect.x]

Or alternatively, we could use structures like user defined types:

rect my_struct
    ....
    [my_struct.x] = 5
    ....
    atom x = [my_struct.x]

These two styles could coexist.

Initialization could be done using a new function or operator. The function could optionally be a machine function. Of course, this implies that a reference to a structure is a 'first class' object. It's possible that this would need to be converted to a 'structure id' similar to how routine ids are used. The default should probably be to automatically free the memory when the reference count drops to zero. It would be important to allow the possibility of not freeing the memory, since sometimes ownership of the memory passes to a separate library.
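
The ownership rule described above can be pictured with a small C sketch. Everything here is an assumption made for illustration: the `struct_ref` handle, its `owns_memory` flag, and the two helper functions are hypothetical names, not part of Euphoria or of this proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle a runtime might keep for each structure reference. */
typedef struct {
    void *memory;      /* raw block the structure lives in */
    int   ref_count;   /* number of live references */
    bool  owns_memory; /* cleared when an outside library takes ownership */
} struct_ref;

/* Allocate a zeroed block of 'size' bytes and wrap it in a handle. */
static struct_ref *struct_ref_new(size_t size) {
    struct_ref *ref = malloc(sizeof *ref);
    assert(ref != NULL);
    ref->memory = calloc(1, size);
    assert(ref->memory != NULL);
    ref->ref_count = 1;
    ref->owns_memory = true;
    return ref;
}

/* Drop one reference. Returns true once the handle itself is destroyed.
   The raw block is freed only if the handle still owns it. */
static bool struct_ref_release(struct_ref *ref) {
    if (--ref->ref_count > 0) return false;
    if (ref->owns_memory) free(ref->memory);
    free(ref);
    return true;
}
```

Clearing `owns_memory` before the last release models the case where a separate library has taken over the block and will free it itself.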

The bracket syntax is really just syntactic sugar for the peeks and pokes currently used in Euphoria. In fact, the implementation might simply emit peek and poke IL.
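
As a rough picture of what that sugar would expand to, here is a C sketch of peek- and poke-style access at a byte offset from a base address. The helper names `poke_i32` and `peek_i32` and the constant `RECT_X_OFFSET` are hypothetical; they stand in for whatever the interpreter would actually emit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: byte offset of the x member inside the structure. */
enum { RECT_X_OFFSET = 0 };

/* Store a 32-bit integer at base + offset, like a four-byte poke.
   Roughly what "[my_struct.x] = value" would turn into. */
static void poke_i32(unsigned char *base, size_t offset, int32_t value) {
    assert(base != NULL);
    memcpy(base + offset, &value, sizeof value);
}

/* Load a signed 32-bit integer from base + offset, like a four-byte peek.
   Roughly what "atom x = [my_struct.x]" would turn into. */
static int32_t peek_i32(const unsigned char *base, size_t offset) {
    int32_t value;
    assert(base != NULL);
    memcpy(&value, base + offset, sizeof value);
    return value;
}
```

The offset is fixed by the structure declaration, so the bracket form only has to supply the base address at run time.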

-- [by Derek] I can foresee an ambiguity issue with the bracket syntax.

foo = bar
[mystruct.x] = qwerty

Because line endings are not significant in Euphoria, this is seen as ...

foo = bar[mystruct.x] = qwerty

So now the brackets look like a sequence indexing operation. --

Declaring structures

A new keyword, 'structure', could be introduced to declare and define memory structures. There would be two primitive types of member, integer and atom, representing ints and floats respectively, with additional size specifiers. Possibly the size specifiers could be optional, with sensible defaults:

structure rect
    integer(32) x
    integer(32) y
    atom(64) width
    atom(64) height
end structure

This structure represents a rectangle where the x and y coordinates are represented as 32-bit integers, and the width and height as 64-bit floating point values.
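
For comparison, this is a sketch of the C declaration that the `rect` structure above would presumably have to match, assuming `integer(32)` maps to `int32_t` and `atom(64)` maps to an 8-byte `double`. The helper `rect_size` exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed C counterpart of the proposed 'rect' structure. */
typedef struct {
    int32_t x;      /* integer(32) x */
    int32_t y;      /* integer(32) y */
    double  width;  /* atom(64) width */
    double  height; /* atom(64) height */
} rect;

static_assert(sizeof(double) == 8, "atom(64) assumes an 8-byte double");

/* Total size: 4 + 4 + 8 + 8 = 24 bytes. The two 32-bit members fill
   the first 8 bytes, so no padding is needed before 'width'. */
static size_t rect_size(void) {
    return sizeof(rect);
}
```

The member offsets are 0, 4, 8 and 16, which are the values a bracket dereference would add to the base address.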

-- [by Derek] I suspect another primitive might be useful: the pointer. Currently pointers are all 32 bits wide, but soon 64-bit pointers will be common. To avoid reworking structures when running on different architectures, maybe the bare term 'pointer' could map to the normal pointer size for the current architecture, and where that is not appropriate a size modifier could be used.

structure qwerty
    pointer foo      -- native pointer
    pointer(32) bar  -- 32-bit pointer.
end structure
--
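
The distinction between a native pointer and a fixed-width one can be sketched in C as follows. The mapping of `pointer` to `void *` and of `pointer(32)` to `uint32_t` is an assumption, and `native_pointer_bits` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed C counterpart of the 'qwerty' structure. */
typedef struct {
    void    *foo; /* pointer foo: native width for the current platform */
    uint32_t bar; /* pointer(32) bar: always 32 bits wide */
} qwerty;

static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "pointer(32) is 32 bits");

/* Width in bits that a bare 'pointer' member would have on this platform. */
static unsigned native_pointer_bits(void) {
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Only `foo` changes size between a 32-bit and a 64-bit build, which is the portability benefit the comment describes.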

At least two modifiers will be needed:

structure c_string
    integer(8) with unsigned, null char[]
end structure

This structure defines the standard null-terminated C string, where each character is an unsigned 8-bit integer and the end of the array is marked by a null.

-- [by Derek]
* The above definition seems to be defining a single 8-bit integer and not an array of bytes. In general, structs often contain fixed-size arrays and we need a way to define those too.
* The "char[]" is inconsistent, I think. Just the "null" should be enough.

--
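
To make the 'null' modifier concrete, here is a C sketch of how a reader of such an array could find its length: it walks unsigned 8-bit elements until it meets the terminating zero. The function name `c_string_length` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the unsigned 8-bit elements that come before the terminating
   zero. The terminator itself is not included in the count. */
static size_t c_string_length(const uint8_t *chars) {
    size_t n = 0;
    assert(chars != NULL);
    while (chars[n] != 0) {
        n++;
    }
    return n;
}
```

A fixed-size array, as raised in the comment above, would not need this scan because its length would come from the declaration.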

In addition, we should be able to embed structures within other structures, either as a pointer, or as a part of the main structure.

structure my_rect
    c_string [name]
    rect rect
end structure

The above declaration defines a structure with a pointer to a string, and a rect embedded within it. It would be 28 bytes in size (4 for the pointer plus 24 for the rect), assuming 4-byte pointers and no alignment padding.
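
The size arithmetic can be checked with a C sketch. Two assumptions are made so that the result does not depend on the build platform: a `uint32_t` stands in for the 4-byte pointer, and `#pragma pack` removes alignment padding. The member name `r` replaces the proposal's `rect rect` purely for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1) /* assumption: no alignment padding */
typedef struct {
    int32_t x, y;          /* 4 + 4 bytes */
    double  width, height; /* 8 + 8 bytes */
} rect;

typedef struct {
    uint32_t name; /* stand-in for a 4-byte pointer to a c_string */
    rect     r;    /* the embedded rect, stored inline */
} my_rect;
#pragma pack(pop)

static_assert(sizeof(my_rect) == 28, "4-byte pointer + 24-byte rect");
```

On an ABI that aligns doubles to 8 bytes, the unpacked layout would place 4 bytes of padding after the pointer, giving 32 bytes instead, so a real implementation would need to state which alignment rule it follows.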

One other key feature of structures is the union:

structure shape
    c_string [name]
    union shape
        rect r
        circle c
    end union
end structure

-- [by Derek] In the two examples above, the "c_string [name]" declarations look like sequence indexing operations. How do we resolve that? --
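
For reference, the union in the `shape` structure corresponds to overlapping storage in C. In the sketch below the layout of `circle` is hypothetical, since the proposal never defines it, and `shape_union` is a name introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t x, y; double width, height; } rect; /* 24 bytes */

/* Hypothetical layout: 'circle' is not defined anywhere in the proposal. */
typedef struct { int32_t x, y; double radius; } circle;      /* 16 bytes */

/* Both members start at the same address and share the same bytes. */
typedef union {
    rect   r;
    circle c;
} shape_union;

typedef struct {
    const char *name;  /* c_string [name]: pointer to the string */
    shape_union shape; /* holds either a rect or a circle */
} shape;

static_assert(sizeof(shape_union) == sizeof(rect),
              "a union is as large as its largest member");
```

Because the members overlap, a bracket dereference through `r` and one through `c` would compute offsets from the same starting address.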
