1. Constrained sequences
- Posted by Peter Robinson <indorlaw at yahoo.??m.au> May 30, 2008
- 656 views
I may have some time in a few weeks to look at the source, and I'm thinking about the proposal for sequences of types. I'm thinking it shouldn't be all that difficult. Correct me if I'm wrong. The following syntax would only declare a sequence. It doesn't declare any integers or atoms:- sequence of integer x sequence of sequence of atom x What it says is that I'm declaring a sequence, but when I make assignments to it in the future they must fit the constraint (“of integer” or “of sequence of atom”). Only the far right type is open – everything to the left of an “of” must be “sequence”. So, in the parsing phase, instead of adding a symbol to the table with an attribute like SEQ, it needs a pair attribute like {depth, type}, where depth is the level of the last “of” and “type” is the far-right type. So the attributes of the declarations above (using zero-based indexing) would be something like:- sequence of integer x -- { 1, INTEGER } sequence of sequence of atom y -- { 2, ATOM } Notice that both these pairs represent sequences. Any attribute with depth > 0 represents a sequence. Traditional types could be typed by the same system:- sequence x -- { 0, SEQ } atom y -- { 0, ATOM } So for starters, apart from adjusting the scanner and adding the keyword “of”, it would be necessary to slightly change the structure of symbol attributes, and obviously the functions that test those attributes. The constraints only play a significant role when an assignment occurs TO the constrained sequence. But this is a trigger point anyway. The interpreter has to check the type of assignments. The checking code just needs an extra branch to handle data flagged as a constrainted sequence. It has to check that the RHS matches the pattern of the constraint. When such an assignment occurs, I'm figuring that the types of the assigned values are already stored in a symbol table since the interpreter needs them for certain purposes. (e.g when you assign to an element of an element: x[2]{3] = 7, where x[2] must be a sequence with at least 3 elements; OR function_call( x[1] ), where x[1] must be an integer). So the interpreter already handles typing the elements of a sequence. Occurrences of elements of a constrained sequence in an expression would also have to be considered, but I suspect that they're just variations of the above. Once an assignment to a constrained sequence is validated, other operations on the sequence occur as normal. It occurred to me that the proposed atribute pair would also provide an opportunity to implement strings as a form of constrained sequence. It was a coincidence that Matt happened to raise a post about this recently. The new attrbitue pair would have no natural meaning for negative values of the depth variable. Sequences with no (or implied) depth specifictions could be represented by negative depth values. E.g:- ascii sequence x -- { -1, INTEGER } unicode sequence y -- { -2, INTEGER } By treating these quasi-types as constrained sequences, all the usual sequence treatments can apply without change unless we choose to override them. In relation to strings, we may well want to change the translation of operators like “=”. When the interpeter sees a sequence operation on “=”, it must look at the variable's constraint to check whether it is a string, and apply the natural operation. Other types that could be handled by negative attributes are natural numbers (unsigned) and structures (although there' s more thinking to be done on that). A poll has already approved structures. I realise this analysis is simplistic, not based on the actual source code, which I haven't yet read. But I'm struggling to see why these things would be so difficult. Why am I wrong? This idea doesn't deal with the issue of minimal storage of strings, but like Shawn, I think 8-bit chars are becoming obsolete. Cheers Peter Robinson
2. Re: Constrained sequences
- Posted by CChris <christian.cuvier at agriculture.go?v.?r> May 30, 2008
- 637 views
Peter Robinson wrote: > > I may have some time in a few weeks to look at the source, and I'm thinking > about the proposal for sequences of types. I'm thinking it shouldn't be all > that difficult. Correct me if I'm wrong. > > The following syntax would only declare a sequence. It doesn't declare any > integers > or atoms:- > > sequence of integer x > sequence of sequence of atom x > > What it says is that I'm declaring a sequence, but when I make assignments to > it in the future they must fit the constraint (“of integer” or “of sequence > of atom”). Only the far right type is open – everything to the left of an “of” > must be “sequence”. So, in the parsing phase, instead of adding a symbol to > the table with an attribute like SEQ, it needs a pair attribute like {depth, > type}, where depth is the level of the last “of” and “type” is the far-right > type. So the attributes of the declarations above (using zero-based indexing) > would be something like:- > > sequence of integer x -- { 1, INTEGER } > sequence of sequence of atom y -- { 2, ATOM } > > Notice that both these pairs represent sequences. Any attribute with depth > > 0 represents a sequence. Traditional types could be typed by the same system:- > > sequence x -- { 0, SEQ } I'd think {1, OBJECT} rather. Just nitpicking. > atom y -- { 0, ATOM } > > So for starters, apart from adjusting the scanner and adding the keyword “of”, > it would be necessary to slightly change the structure of symbol attributes, > and obviously the functions that test those attributes. > > The constraints only play a significant role when an assignment occurs TO the > constrained sequence. But this is a trigger point anyway. The interpreter has > to check the type of assignments. The checking code just needs an extra branch > to handle data flagged as a constrainted sequence. It has to check that the > RHS matches the pattern of the constraint. > This means the parser will have more checks to perform. But isn't the aim to help optimised code to be run instead? That's why a separate string type is needed. Possibly in 8/16/32 flavors. > When such an assignment occurs, I'm figuring that the types of the assigned > values are already stored in a symbol table since the interpreter needs them > for certain purposes. (e.g when you assign to an element of an element: > x[2]{3] > = 7, where x[2] must be a sequence with at least 3 elements; OR function_call( > x[1] ), where x[1] must be an integer). So the interpreter already handles > typing > the elements of a sequence. > > Occurrences of elements of a constrained sequence in an expression would also > have to be considered, but I suspect that they're just variations of the > above. > Once an assignment to a constrained sequence is validated, other operations > on the sequence occur as normal. > > It occurred to me that the proposed atribute pair would also provide an > opportunity > to implement strings as a form of constrained sequence. It was a coincidence > that Matt happened to raise a post about this recently. > > The new attrbitue pair would have no natural meaning for negative values of > the depth variable. Sequences with no (or implied) depth specifictions could > be represented by negative depth values. E.g:- > > ascii sequence x -- { -1, INTEGER } > unicode sequence y -- { -2, INTEGER } > How would the following be coded: sequence of unicode sequence of sequence of string ? > By treating these quasi-types as constrained sequences, all the usual sequence > treatments can apply without change unless we choose to override them. In > relation > to strings, we may well want to change the translation of operators like “=”. > When the interpeter sees a sequence operation on “=”, it must look at the > variable's > constraint to check whether it is a string, and apply the natural operation. > > Other types that could be handled by negative attributes are natural numbers > (unsigned) and structures (although there' s more thinking to be done on > that). > A poll has already approved structures. > There is a sequence() type operator. How would I build a "sequence of sequence of atom"() operato? Or an UDT based on a nestde sequence? Well this could be handle by an "is" keyword:
ssa is sequence of sequence of atom type sssma(ssa x) --sequence of sequence of small atoms for i=1 to length(x) do for j=1 to length(x[i]) do if x[i][j]<-3.14 or x[i][j]>9.7 then return 0 end if end for end for return 1 end type
That would work. > I realise this analysis is simplistic, not based on the actual source code, > which I haven't yet read. But I'm struggling to see why these things would be > so difficult. Why am I wrong? > > This idea doesn't deal with the issue of minimal storage of strings, but like > Shawn, I think 8-bit chars are becoming obsolete. > They are, but a lot of apps still doesn't need to deal with many different languages, so any UTF-8 encoding would do, with minimal storage requirements. Further, copying substings is much faster, because chars in a string don't have a reference count, don't point to atoms holding a large value and so on. This is why I don't think an implementation piggybacking on sequences will reap much bnefits. Unfortunately, like for NIL or large integers, adding a new native type in the current backend looks hopeless. I have been saying that to myself for years, but never felt the motivation to start that rewrite, which would be a huge effort for... which benefits? If we do it collectivelt, there is a chane to get a meaner Eu. Not in 4.0. CChris > Cheers > Peter Robinson
3. Re: Constrained sequences
- Posted by Peter Robinson <indorlaw at ?ahoo.?om.au> May 31, 2008
- 646 views
Hi Chris CChris wrote: > > 0 represents a sequence. Traditional types could be typed by the same > > system:- > > > > sequence x -- { 0, SEQ } > > I'd think {1, OBJECT} rather. Just nitpicking. Not sure why you think this is more natural, or better. But it's just a representation anyway - one's as good as another. > > The constraints only play a significant role when an assignment occurs TO > > the > > constrained sequence. But this is a trigger point anyway. The interpreter > > has > > to check the type of assignments. The checking code just needs an extra > > branch > > to handle data flagged as a constrainted sequence. It has to check that the > > RHS matches the pattern of the constraint. > > > > This means the parser will have more checks to perform. But isn't the aim to > help optimised code to be run instead? That's why a separate string type is > needed. Possibly in 8/16/32 flavors. I'm uncertain what you're responding to here. This is the part of my proposal dealing with sequence of types - nothing to do with strings. The sequence of types proposal has been voted on (it wasn't my proposal), and inevitably implementation of extra options means extra branches/checking, n'est-ce pas? This is where I made a passing suggestion about strings. > > It occurred to me that the proposed atribute pair would also provide an > > opportunity > > to implement strings as a form of constrained sequence. It was a coincidence > > that Matt happened to raise a post about this recently. > > > > The new attrbitue pair would have no natural meaning for negative values of > > the depth variable. Sequences with no (or implied) depth specifictions could > > be represented by negative depth values. E.g:- > > > > ascii sequence x -- { -1, INTEGER } > > unicode sequence y -- { -2, INTEGER } > > > > How would the following be coded: > sequence of unicode > sequence of sequence of string > ? And yes, this is a good point, that if you combine both proposals (sequence of types and string sequences), my suggestion of using the redundant negative values doesn't do the job. So you would have to tag the inner dimension in another way - but that's not a problem. > There is a sequence() type operator. How would I build a "sequence of sequence > of atom"() operato? I didn't overlook this, but didn't mention it because when it was touched on in the original discussions, (from memory) it received at best a mixed reception. Since the syntax 'sequence of type' is infinite, I don't see that you can mirror the built-in syntax: sequence( x). But you could have a built-in that took 2 (extra) args, the depth and the type of the inner constraint. I don't see this as affecting my proposal. > Or an UDT based on a nestde sequence? > Well this could be handle by an "is" keyword: > }}} <eucode> > ssa is sequence of sequence of atom > type sssma(ssa x) --sequence of sequence of small atoms > for i=1 to length(x) do > for j=1 to length(x[i]) do > if x[i][j]<-3.14 or x[i][j]>9.7 then > return 0 > end if > end for > end for > return 1 > end type > </eucode> {{{ > That would work. Yes. But if we build-in 'sequence of type', we should also build-in the test function, shouldn't we? Is new syntax required for this narrow concept? Is 'is' better here than an old-fashioned function? (Mind you, I'm keen on introducing declaratory-style syntax, and 'is' would fit that). > They are, but a lot of apps still doesn't need to deal with many different > languages, > so any UTF-8 encoding would do, with minimal storage requirements. Further, > copying substings is much faster, because chars in a string don't have a > reference > count, don't point to atoms holding a large value and so on. This is why I > don't > think an implementation piggybacking on sequences will reap much bnefits. > Unfortunately, like for NIL or large integers, adding a new native type in the > current backend looks hopeless. I have been saying that to myself for years, > but never felt the motivation to start that rewrite, which would be a huge > effort > for... which benefits? If we do it collectivelt, there is a chane to get a > meaner > Eu. Not in 4.0. > > CChris > In terms of speed and allocation, I think it's certain that an implementaion piggy-backing on sequences would NOT reap benefits. But at least some of the discussion about string-handling in the past has been about the failure of Euphoria to treat strings naturally in some operational contexts - applying sequenc operations rather than string operations. If strings were implemented as sequences tagged with special attributes, some of these could be handled quite easily. If intriducing a new primitve string type is nearly impossible, doesn't that support an intermediate solution? In any event Chris, thanks for the reply. But one reason I posted was to get feedback on my general thoughts on implementation of sequence of types. That isn' a 'new primitve type' issue so far as I can see. I don't read you as directly criticising my proposal on that. Have I misunderstood? Cheers Peter Robinson
4. Re: Constrained sequences
- Posted by CChris <christian.cuvier at agricu?ture.gou?.fr> May 31, 2008
- 640 views
Peter Robinson wrote: > > Hi Chris > > CChris wrote: > > > 0 represents a sequence. Traditional types could be typed by the same > > > system:- > > > > > > sequence x -- { 0, SEQ } > > > > I'd think {1, OBJECT} rather. Just nitpicking. > > Not sure why you think this is more natural, or better. But it's just a > representation > anyway - one's as good as another. > > > > The constraints only play a significant role when an assignment occurs TO > > > the > > > constrained sequence. But this is a trigger point anyway. The interpreter > > > has > > > to check the type of assignments. The checking code just needs an extra > > > branch > > > to handle data flagged as a constrainted sequence. It has to check that > > > the > > > RHS matches the pattern of the constraint. > > > > > > > This means the parser will have more checks to perform. But isn't the aim to > > help optimised code to be run instead? That's why a separate string type is > > needed. Possibly in 8/16/32 flavors. > > I'm uncertain what you're responding to here. This is the part of my proposal > dealing with sequence of types - nothing to do with strings. The sequence of > types proposal has been voted on (it wasn't my proposal), and inevitably > implementation > of extra options means extra branches/checking, n'est-ce pas? > > This is where I made a passing suggestion about strings. > > > It occurred to me that the proposed atribute pair would also provide an > > > opportunity > > > to implement strings as a form of constrained sequence. It was a > > > coincidence > > > that Matt happened to raise a post about this recently. > > > > > > The new attrbitue pair would have no natural meaning for negative values > > > of > > > the depth variable. Sequences with no (or implied) depth specifictions > > > could > > > be represented by negative depth values. E.g:- > > > > > > ascii sequence x -- { -1, INTEGER } > > > unicode sequence y -- { -2, INTEGER } > > > > > > > How would the following be coded: > > sequence of unicode > > sequence of sequence of string > > ? > > And yes, this is a good point, that if you combine both proposals (sequence > of types and string sequences), my suggestion of using the redundant negative > values doesn't do the job. So you would have to tag the inner dimension in > another > way - but that's not a problem. > > > There is a sequence() type operator. How would I build a "sequence of > > sequence > > of atom"() operato? > I didn't overlook this, but didn't mention it because when it was touched on > in the original discussions, (from memory) it received at best a mixed > reception. > Since the syntax 'sequence of type' is infinite, I don't see that you can > mirror > the built-in syntax: sequence( x). But you could have a built-in that took > 2 (extra) args, the depth and the type of the inner constraint. I don't see > this as affecting my proposal. > > > Or an UDT based on a nestde sequence? > > Well this could be handle by an "is" keyword: > > }}} <eucode> > > ssa is sequence of sequence of atom > > type sssma(ssa x) --sequence of sequence of small atoms > > for i=1 to length(x) do > > for j=1 to length(x[i]) do > > if x[i][j]<-3.14 or x[i][j]>9.7 then > > return 0 > > end if > > end for > > end for > > return 1 > > end type > > </eucode> {{{ > > That would work. > > Yes. But if we build-in 'sequence of type', we should also build-in the test > function, shouldn't we? Is new syntax required for this narrow concept? Is > 'is' > better here than an old-fashioned function? (Mind you, I'm keen on introducing > declaratory-style syntax, and 'is' would fit that). > > > They are, but a lot of apps still doesn't need to deal with many different > > languages, > > so any UTF-8 encoding would do, with minimal storage requirements. Further, > > copying substings is much faster, because chars in a string don't have a > > reference > > count, don't point to atoms holding a large value and so on. This is why I > > don't > > think an implementation piggybacking on sequences will reap much bnefits. > > Unfortunately, like for NIL or large integers, adding a new native type in > > the > > current backend looks hopeless. I have been saying that to myself for years, > > but never felt the motivation to start that rewrite, which would be a huge > > effort > > for... which benefits? If we do it collectivelt, there is a chane to get a > > meaner > > Eu. Not in 4.0. > > > > CChris > > > > In terms of speed and allocation, I think it's certain that an implementaion > piggy-backing on sequences would NOT reap benefits. But at least some of the > discussion about string-handling in the past has been about the failure of > Euphoria > to treat strings naturally in some operational contexts - applying sequenc > operations > rather than string operations. If strings were implemented as sequences tagged > with special attributes, some of these could be handled quite easily. If > intriducing > a new primitve string type is nearly impossible, doesn't that support an > intermediate > solution? > > In any event Chris, thanks for the reply. But one reason I posted was to get > feedback on my general thoughts on implementation of sequence of types. That > isn' a 'new primitve type' issue so far as I can see. I don't read you as > directly > criticising my proposal on that. Have I misunderstood? > > Cheers > Peter Robinson No a direct criticism. However, * given how sequences are actually implemented, fast handling for optimised sequences of types won't work often; * assuming it does, the backend will have to choose which sort of optimised handling to use. The likelihood of the added, very very frequent testing offsetting the benefit from the partial optimisations is quite high. This is why, while having such types, _prominently among which_ string types, would be a clear bonus to Euphoria, I'm afraid they won't play nice in the current implementation of the language. Hoping to be proved wrong. CChris