Wiki Diff SquareBracketDereferencing, revision #3 to tip

<<TOC heading=yes>>

==Introduction

The goal of this feature is to make it easier to use C-like structures in raw memory. This is often required when interfacing euphoria code with outside libraries or with the operating system itself. The proposed changes to euphoria are designed to make this interface easier and less error-prone.

Since we're dealing with C-based structures, it makes a certain amount of sense to keep the usage similar to C. C uses the asterisk (*) to dereference pointers and access the pointed-to value. This is often confusing and can be difficult to use, especially when accessing deeply nested structures. This proposal suggests that a better way to dereference a pointer is to enclose the entire reference, including all structure member accesses, in the dereferencing syntax. This gives a clear indication of exactly what is being dereferenced.

==Using C style structures

Atoms will be used to hold pointers to memory locations, just like before. Putting them inside square brackets will dereference the pointer. On assignment, euphoria objects are 'serialized' into raw memory, and on reading, raw memory is 'deserialized' back into euphoria objects. We could potentially use raw atoms, which would require qualifying each reference with the structure type (##rect## in the example below):
<eucode>
atom my_struct
....
[my_struct.rect.x] = 5
....
atom x = [my_struct.rect.x]
</eucode>
Alternatively, we could use structures like user-defined types:
<eucode>
rect my_struct
....
[my_struct.x] = 5
....
atom x = [my_struct.x]
</eucode>
These two styles could coexist.

Initialization could be done using a new function or operator, which could optionally be a machine function. Of course, this implies that a reference to a structure is a 'first class' object. Such a reference might need to be represented as a 'structure id', similar to how routine ids are used. The default should probably be to free the memory automatically when the reference count drops to zero. It would also be important to allow the memory not to be freed, since ownership of the memory sometimes passes to a separate library.
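The following is a minimal sketch in C of the ownership rules described above. It assumes a reference-counted handle around the raw memory. The names ##struct_ref##, ##struct_ref_new##, ##struct_ref_retain##, ##struct_ref_disown## and ##struct_ref_release## are hypothetical and not part of euphoria. The sketch only shows that dropping the last reference frees the memory, unless ownership has been handed to another library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle around a structure allocated in raw memory. */
typedef struct {
    void  *mem;       /* raw memory holding the C structure                 */
    size_t refcount;  /* number of live references to this handle           */
    bool   owned;     /* false once ownership has passed to another library */
} struct_ref;

/* Outcome of dropping one reference. */
typedef enum {
    REF_ALIVE,    /* other references remain; nothing was released          */
    REF_FREED,    /* last reference dropped; the structure memory was freed */
    REF_DETACHED  /* last reference dropped; memory left to its new owner   */
} release_result;

/* Allocate zeroed memory for a structure of 'size' bytes, owned by default. */
struct_ref *struct_ref_new(size_t size) {
    struct_ref *r = malloc(sizeof *r);
    if (r == NULL) return NULL;
    r->mem = calloc(1, size);
    if (r->mem == NULL) {
        free(r);
        return NULL;
    }
    r->refcount = 1;
    r->owned = true;
    return r;
}

void struct_ref_retain(struct_ref *r) {
    r->refcount++;
}

/* Hand the memory over to code that will free it itself. */
void *struct_ref_disown(struct_ref *r) {
    r->owned = false;
    return r->mem;
}

/* Drop one reference; the handle itself is destroyed when the count hits zero. */
release_result struct_ref_release(struct_ref *r) {
    assert(r->refcount > 0);
    if (--r->refcount > 0) return REF_ALIVE;
    release_result result = r->owned ? REF_FREED : REF_DETACHED;
    if (r->owned) free(r->mem);
    free(r);
    return result;
}
```

A library that takes ownership would receive the pointer returned by ##struct_ref_disown##. Releasing the last reference after that destroys only the handle.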

The bracket syntax is really just syntactic sugar for the peeks and pokes currently used in euphoria. In fact, the implementation might simply emit peek and poke IL.
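Here is a C sketch of what ##[my_struct.x] = 5## and ##atom x = [my_struct.x]## might lower to. Each becomes a fixed-size store or load at the structure's base address plus the member's offset, which is what euphoria's ##poke4()## and ##peek4s()## do for a 4-byte integer. The C ##rect## below assumes the layout of the ##structure rect## declaration under "Declaring structures". The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed C layout of the proposed 'structure rect'. */
typedef struct {
    int32_t x;
    int32_t y;
    double  width;
    double  height;
} rect;

/* 4-byte "poke": store a 32-bit integer at base + offset.
   memcpy avoids alignment and strict-aliasing problems with raw memory. */
static void poke_i32(void *base, size_t offset, int32_t value) {
    assert(base != NULL);
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}

/* 4-byte "peek": load a signed 32-bit integer from base + offset. */
static int32_t peek_i32(const void *base, size_t offset) {
    int32_t value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

/* 8-byte "poke"/"peek" for atom(64) members. */
static void poke_f64(void *base, size_t offset, double value) {
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}

static double peek_f64(const void *base, size_t offset) {
    double value;
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}
```

With these helpers, ##[my_struct.x] = 5## would correspond to ##poke_i32(my_struct, offsetof(rect, x), 5)##. The compiler would take each member's offset and width from the structure declaration.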

>
----
[quote Derek] I can foresee an ambiguity issue with the bracket syntax.
<eucode>
foo = bar
[mystruct.x] = qwerty
</eucode>
Because line endings are not significant in Euphoria, this is seen as ...
<eucode>
foo = bar[mystruct.x] = qwerty
</eucode>
So now the brackets look like a sequence indexing operation.
[/quote]
[quote Matt]
Yes, thinking about this further, the entire bracket notation may not be necessary.
[/quote]
----



==Declaring structures

A new keyword, 'structure', could be introduced to declare and define memory structures. There would be two primitive member types, integer and atom, representing integers and floating-point values respectively, each with a size specifier in bits. The size specifiers could possibly be optional, with sensible defaults:
<eucode>
structure rect
integer(32) x
integer(32) y
atom(64) width
atom(64) height
end structure
</eucode>
This structure represents a rectangle whose x and y coordinates are 32-bit integers and whose width and height are 64-bit floating-point values.
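For comparison, this is how the ##rect## declaration above would most likely look as a C struct. It assumes ##integer(32)## maps to ##int32_t## and ##atom(64)## to an 8-byte ##double##. The assertions record the member offsets a euphoria implementation would need to compute. The two 32-bit integers fill the first 8 bytes, so the doubles are already 8-byte aligned and no padding is inserted on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of 'structure rect' (assumed mapping):
   integer(32) -> int32_t, atom(64) -> double. */
typedef struct {
    int32_t x;       /* offset 0  */
    int32_t y;       /* offset 4  */
    double  width;   /* offset 8  */
    double  height;  /* offset 16 */
} rect;

/* Compile-time layout checks (static_assert is the C11 macro from assert.h). */
static_assert(sizeof(double) == 8, "atom(64) assumes an 8-byte double");
static_assert(offsetof(rect, y) == 4, "y follows x directly");
static_assert(offsetof(rect, width) == 8, "first double starts at byte 8");
static_assert(offsetof(rect, height) == 16, "second double starts at byte 16");
static_assert(sizeof(rect) == 24, "no trailing padding");
```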

>
----
[quote Derek] I suspect another primitive might be useful: the pointer. Currently pointers are all 32 bits, but soon 64-bit pointers will be common. To avoid reworking structures when running on different architectures, maybe the bare term 'pointer' could map to the native pointer size for the current architecture, and a size modifier could be used where that is not appropriate.
<eucode>
structure qwerty
pointer foo -- native pointer
pointer(32) bar -- 32-bit pointer.
end structure
</eucode>
[/quote]
----
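In C terms, Derek's distinction corresponds roughly to a native ##void *## member versus a fixed 32-bit integer field. This sketch of his ##qwerty## example assumes that ##pointer(32)## is stored as a ##uint32_t##. The size checks hold on both 32-bit and 64-bit hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed C counterpart of Derek's 'structure qwerty'. */
typedef struct {
    void     *foo;  /* 'pointer': native width, 4 bytes on 32-bit targets, 8 on 64-bit */
    uint32_t  bar;  /* 'pointer(32)': always 4 bytes, whatever the host architecture */
} qwerty;

/* sizeof does not evaluate its operand, so the null pointer is never dereferenced. */
static_assert(sizeof(((qwerty *)0)->foo) == sizeof(void *), "'pointer' follows the host");
static_assert(sizeof(((qwerty *)0)->bar) == 4, "'pointer(32)' is fixed at 32 bits");
```

On a 64-bit host a native pointer does not fit in ##bar## without truncation. This is why the explicit size modifier is only appropriate where an external layout demands it.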


At least two modifiers, ##unsigned## and ##null##, will be needed:
<eucode>
structure c_string
integer(8) with unsigned, null char[]
end structure
</eucode>
This structure defines the standard null-terminated C string, where each character is an unsigned 8-bit integer.
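The following C sketch shows what reading and writing such a ##c_string## would involve, assuming the bytes live directly at the given address. Reading copies unsigned bytes up to (but not including) the zero terminator, and writing appends the terminator. The function names and the ##cap## bound are hypothetical. The bound simply stops the read from running past a buffer if the terminator is missing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy bytes from 'addr' into 'out' until a zero byte or 'cap' bytes.
   The terminator is not copied; returns the number of bytes copied. */
static size_t read_c_string(const unsigned char *addr, unsigned char *out, size_t cap) {
    assert(addr != NULL && out != NULL);
    size_t n = 0;
    while (n < cap && addr[n] != 0) {
        out[n] = addr[n];
        n++;
    }
    return n;
}

/* Copy 'len' bytes from 'src' to 'addr' and append the zero terminator.
   'addr' must have room for len + 1 bytes. */
static void write_c_string(unsigned char *addr, const unsigned char *src, size_t len) {
    assert(addr != NULL && src != NULL);
    memcpy(addr, src, len);
    addr[len] = 0;
}
```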

>
----
[quote Derek]
* The above definition seems to be defining a single 8-bit integer and not an array of bytes. In general, structs often contain fixed-size arrays and we need a way to define those too.
* The "##char[]##" is inconsistent, I think. Just the "##null##" should be enough.
[/quote]

[quote Matt]
The ##char## is the name of the element. The square brackets indicate that it's an array. I'm not sure that the ##null## is redundant. I suppose ##null## could imply an array, but an array does not imply null termination.
[quote Derek]
Ok, got it. ##char## is the name of the 8-bit integer. However, it seems that what you are trying to describe here is a structure that contains a **pointer** to a variable-length array of 8-bit integers terminated by a zero byte. But the structure definition above does not mention pointers at all. Or is this a structure in which ##char## names the first 8-bit integer, and it is implied that an array of further 8-bit integers immediately follows it in RAM?

Also, we still need a way to describe fixed length arrays that are contained in a struct.
[/quote]
[/quote]
----
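Derek's two questions correspond to a distinction C makes explicitly. A struct can hold a **pointer** to bytes stored elsewhere, or it can contain a fixed-length array inline. The small C sketch below shows how differently the two occupy memory. The type names and the 16-byte length are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The struct stores only an address; the characters live elsewhere. */
typedef struct {
    const unsigned char *chars;
} string_by_pointer;

/* The struct stores the characters themselves, in a fixed-size array. */
typedef struct {
    unsigned char chars[16];
} string_inline;

static_assert(sizeof(string_by_pointer) == sizeof(const unsigned char *),
              "only the pointer is stored");
static_assert(sizeof(string_inline) == 16, "all 16 bytes are stored inline");
```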


In addition, we should be able to embed structures within other structures, either as a pointer, or as a part of the main structure.
<eucode>
structure my_rect
c_string [name]
rect rect
end structure
</eucode>
The above declaration defines a structure containing a pointer to a string and an embedded rect. It would be 28 bytes in size (4 + 24), assuming 4-byte pointers and no alignment padding between members.
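The 28-byte figure can be checked against a C equivalent. This sketch models the 4-byte ##c_string## pointer as a ##uint32_t##, so the arithmetic also holds on a 64-bit host. It uses ###pragma pack(push, 1)##, supported by GCC, Clang and MSVC, to rule out padding. The ##rect## layout is the one assumed for the ##structure rect## declaration above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
    double  width;
    double  height;
} rect;  /* 24 bytes */

#pragma pack(push, 1)
typedef struct {
    uint32_t name;  /* stands in for a 4-byte pointer to the c_string */
    rect     rect;  /* embedded by value: all 24 bytes live inside my_rect */
} my_rect;
#pragma pack(pop)

static_assert(offsetof(my_rect, rect) == 4, "rect follows the pointer directly");
static_assert(sizeof(my_rect) == 28, "4-byte pointer + 24-byte rect");
```

Suppose the pragma is removed and the compiler aligns ##double## on 8-byte boundaries. It would then insert 4 bytes of padding after the pointer, and the structure would become 32 bytes.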

One other key feature of structures is the union:
<eucode>
structure shape
c_string [name]
union shape
rect r
circle c
end union
end structure
</eucode>
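A C union gives the semantics this declaration appears to intend: every member starts at offset 0, and the union is as large as its largest member. The page never defines ##circle##, so the one below (a centre plus a 64-bit radius) is purely hypothetical, as are the type names. The ##name## pointer is a native ##const char *## here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
    double  width;
    double  height;
} rect;  /* 24 bytes */

/* Hypothetical: 'circle' is not defined anywhere on this page. */
typedef struct {
    int32_t x;
    int32_t y;
    double  radius;
} circle;  /* 16 bytes */

/* Both views share the same storage. */
typedef union {
    rect   r;
    circle c;
} shape_union;

typedef struct {
    const char *name;   /* pointer to the c_string */
    shape_union shape;  /* r and c overlap in memory */
} shape;

static_assert(offsetof(shape_union, r) == 0 && offsetof(shape_union, c) == 0,
              "union members all start at offset 0");
static_assert(sizeof(shape_union) == sizeof(rect), "sized for the largest member");
```

Neither the C union nor the proposed declaration records which member is currently valid. Code reading a ##shape## has to know whether it holds a ##rect## or a ##circle##.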

>
----
[quote Derek] In the two examples above, the "##c_string [name]##" look like a sequence indexing operation. How do we resolve that? [/quote]
----





