1. Constrained sequences

I may have some time in a few weeks to look at the source, and I'm thinking
about the proposal for sequences of types. I'm thinking it shouldn't be all that
difficult. Correct me if I'm wrong.

The following syntax would only declare a sequence. It doesn't declare any
integers or atoms:-

	sequence of integer x
	sequence of sequence of atom x

What it says is that I'm declaring a sequence, but when I make assignments to it
in the future they must fit the constraint (“of integer” or “of sequence of
atom”). Only the far right type is open – everything to the left of an “of” must
be “sequence”. So, in the parsing phase, instead of adding a symbol to the table
with an attribute like SEQ, it needs a pair attribute like {depth, type}, where
depth is the level of the last “of” and “type” is the far-right type. So the
attributes of the declarations above (using zero-based indexing) would be
something like:-

	sequence of integer x           -- { 1, INTEGER }
	sequence of sequence of atom  y -- { 2, ATOM }

Notice that both these pairs represent sequences. Any attribute with depth > 0
represents a sequence. Traditional types could be typed by the same system:-

	sequence x -- { 0, SEQ }
	atom y     -- { 0, ATOM }

So for starters, apart from adjusting the scanner and adding the keyword “of”,
it would be necessary to slightly change the structure of symbol attributes, and
obviously the functions that test those attributes.

The constraints only play a significant role when an assignment occurs TO the
constrained sequence. But this is a trigger point anyway. The interpreter has to
check the type of assignments. The checking code just needs an extra branch to
handle data flagged as a constrainted sequence. It has to check that the RHS
matches the pattern of the constraint.

When such an assignment occurs, I'm figuring that the types of the assigned
values are already  stored in a symbol table since the interpreter needs them for
certain purposes. (e.g when you assign to an element of an element: x[2]{3] = 7,
where x[2] must be a sequence with at least 3 elements; OR function_call( x[1] ),
where x[1] must be an integer). So the interpreter already handles typing the
elements of a sequence.

Occurrences of elements of a constrained sequence in an expression would also
have to be considered, but I suspect that they're just variations of the above.
Once an assignment to a constrained sequence is validated, other operations on
the sequence occur as normal.

It occurred to me that the proposed atribute pair would also provide an
opportunity to implement strings as a form of constrained sequence. It was a
coincidence that Matt happened to raise a post about this recently.

The new attrbitue pair would have no natural meaning for negative values of the
depth variable. Sequences with no (or implied) depth specifictions could be
represented by negative depth values. E.g:-

	ascii sequence x -- { -1, INTEGER }
	unicode sequence y -- { -2, INTEGER }

By treating these quasi-types as constrained sequences, all the usual sequence
treatments can apply without change unless we choose to override them. In
relation to strings, we may well want to change the translation of operators like
“=”. When the interpeter sees a sequence operation on “=”, it must look at the
variable's constraint to check whether it is a string, and apply the natural
operation.

Other types that could be handled by negative attributes are natural numbers
(unsigned) and structures (although there' s more thinking to be done on that). A
poll has already approved structures.

I realise this analysis is simplistic, not based on the actual source code,
which I haven't yet read. But I'm struggling to see why these things would be so
difficult. Why am I wrong?

This idea doesn't deal with the issue of minimal storage of strings, but like
Shawn, I think 8-bit chars are becoming obsolete.

Cheers
Peter Robinson

new topic     » topic index » view message » categorize

2. Re: Constrained sequences

Peter Robinson wrote:
> 
> I may have some time in a few weeks to look at the source, and I'm thinking
> about the proposal for sequences of types. I'm thinking it shouldn't be all
> that difficult. Correct me if I'm wrong.
> 
> The following syntax would only declare a sequence. It doesn't declare any
> integers
> or atoms:-
> 
> 	sequence of integer x
> 	sequence of sequence of atom x
> 
> What it says is that I'm declaring a sequence, but when I make assignments to
> it in the future they must fit the constraint (“of integer” or “of sequence
> of atom”). Only the far right type is open – everything to the left of an “of”
> must be “sequence”. So, in the parsing phase, instead of adding a symbol to
> the table with an attribute like SEQ, it needs a pair attribute like {depth,
> type}, where depth is the level of the last “of” and “type” is the far-right
> type. So the attributes of the declarations above (using zero-based indexing)
> would be something like:-
> 
> 	sequence of integer x           -- { 1, INTEGER }
> 	sequence of sequence of atom  y -- { 2, ATOM }
> 
> Notice that both these pairs represent sequences. Any attribute with depth >
> 0 represents a sequence. Traditional types could be typed by the same system:-
> 
> 	sequence x -- { 0, SEQ }

I'd think {1, OBJECT} rather. Just nitpicking.

> 	atom y     -- { 0, ATOM }
> 
> So for starters, apart from adjusting the scanner and adding the keyword “of”,
> it would be necessary to slightly change the structure of symbol attributes,
> and obviously the functions that test those attributes.
> 
> The constraints only play a significant role when an assignment occurs TO the
> constrained sequence. But this is a trigger point anyway. The interpreter has
> to check the type of assignments. The checking code just needs an extra branch
> to handle data flagged as a constrainted sequence. It has to check that the
> RHS matches the pattern of the constraint.
> 

This means the parser will have more checks to perform. But isn't the aim to
help optimised code to be run instead? That's why a separate string type is
needed. Possibly in 8/16/32 flavors.

> When such an assignment occurs, I'm figuring that the types of the assigned
> values are already  stored in a symbol table since the interpreter needs them
> for certain purposes. (e.g when you assign to an element of an element:
> x[2]{3]
> = 7, where x[2] must be a sequence with at least 3 elements; OR function_call(
> x[1] ), where x[1] must be an integer). So the interpreter already handles
> typing
> the elements of a sequence.
> 
> Occurrences of elements of a constrained sequence in an expression would also
> have to be considered, but I suspect that they're just variations of the
> above.
> Once an assignment to a constrained sequence is validated, other operations
> on the sequence occur as normal.
> 
> It occurred to me that the proposed atribute pair would also provide an
> opportunity
> to implement strings as a form of constrained sequence. It was a coincidence
> that Matt happened to raise a post about this recently.
> 
> The new attrbitue pair would have no natural meaning for negative values of
> the depth variable. Sequences with no (or implied) depth specifictions could
> be represented by negative depth values. E.g:-
> 
> 	ascii sequence x -- { -1, INTEGER }
> 	unicode sequence y -- { -2, INTEGER }
> 

How would the following be coded:
sequence of unicode
sequence of sequence of string
?

> By treating these quasi-types as constrained sequences, all the usual sequence
> treatments can apply without change unless we choose to override them. In
> relation
> to strings, we may well want to change the translation of operators like “=”.
> When the interpeter sees a sequence operation on “=”, it must look at the
> variable's
> constraint to check whether it is a string, and apply the natural operation.
> 
> Other types that could be handled by negative attributes are natural numbers
> (unsigned) and structures (although there' s more thinking to be done on
> that).
> A poll has already approved structures.
> 

There is a sequence() type operator. How would I build a "sequence of sequence
of atom"() operato? Or an UDT based on a nestde sequence?
Well this could be handle by an "is" keyword:
ssa is sequence of sequence of atom
type sssma(ssa x) --sequence of sequence of small atoms
  for i=1 to length(x) do
    for j=1 to length(x[i]) do
      if x[i][j]<-3.14 or x[i][j]>9.7 then
        return 0
      end if
    end for
  end for
  return 1
end type

That would work.

> I realise this analysis is simplistic, not based on the actual source code,
> which I haven't yet read. But I'm struggling to see why these things would be
> so difficult. Why am I wrong?
> 
> This idea doesn't deal with the issue of minimal storage of strings, but like
> Shawn, I think 8-bit chars are becoming obsolete.
> 

They are, but a lot of apps still doesn't need to deal with many different
languages, so any UTF-8 encoding would do, with minimal storage requirements.
Further, copying substings is much faster, because chars in a string don't have a
reference count, don't point to atoms holding a large value and so on. This is
why I don't think an implementation piggybacking on sequences will reap much
bnefits.
Unfortunately, like for NIL or large integers, adding a new native type in the
current backend looks hopeless. I have been saying that to myself for years, but
never felt the motivation to start that rewrite, which would be a huge effort
for... which benefits? If we do it collectivelt, there is a chane to get a meaner
Eu. Not in 4.0.

CChris

> Cheers
> Peter Robinson

new topic     » goto parent     » topic index » view message » categorize

3. Re: Constrained sequences

Hi Chris

CChris wrote:
> > 0 represents a sequence. Traditional types could be typed by the same
> > system:-
> > 
> > 	sequence x -- { 0, SEQ }
> 
> I'd think {1, OBJECT} rather. Just nitpicking.

Not sure why you think this is more natural, or better. But it's just a
representation anyway - one's as good as another.

> > The constraints only play a significant role when an assignment occurs TO
> > the
> > constrained sequence. But this is a trigger point anyway. The interpreter
> > has
> > to check the type of assignments. The checking code just needs an extra
> > branch
> > to handle data flagged as a constrainted sequence. It has to check that the
> > RHS matches the pattern of the constraint.
> > 
> 
> This means the parser will have more checks to perform. But isn't the aim to
> help optimised code to be run instead? That's why a separate string type is
> needed. Possibly in 8/16/32 flavors.

I'm uncertain what you're responding to here. This is the part of my proposal
dealing with sequence of types - nothing to do with strings. The sequence of
types proposal has been voted on (it wasn't my proposal), and inevitably
implementation of extra options means extra branches/checking, n'est-ce pas?

This is where I made a passing suggestion about strings.
> > It occurred to me that the proposed atribute pair would also provide an
> > opportunity
> > to implement strings as a form of constrained sequence. It was a coincidence
> > that Matt happened to raise a post about this recently.
> > 
> > The new attrbitue pair would have no natural meaning for negative values of
> > the depth variable. Sequences with no (or implied) depth specifictions could
> > be represented by negative depth values. E.g:-
> > 
> > 	ascii sequence x -- { -1, INTEGER }
> > 	unicode sequence y -- { -2, INTEGER }
> > 
> 
> How would the following be coded:
> sequence of unicode
> sequence of sequence of string
> ?

And yes, this is a good point, that if you combine both proposals (sequence of
types and string sequences), my suggestion of using the redundant negative values
doesn't do the job. So you would have to tag the inner dimension in another way -
but that's not a problem.

> There is a sequence() type operator. How would I build a "sequence of sequence
> of atom"() operato? 
I didn't overlook this, but didn't mention it because when it was touched on in
the original discussions, (from memory) it received at best a mixed reception.
Since the syntax 'sequence of type' is infinite, I don't see that you can mirror
the built-in syntax:  sequence( x). But you could have a built-in that took 2
(extra) args, the depth and the type of the inner constraint. I don't see this as
affecting my proposal.

> Or an UDT based on a nestde sequence?
> Well this could be handle by an "is" keyword:
> }}}
<eucode>
> ssa is sequence of sequence of atom
> type sssma(ssa x) --sequence of sequence of small atoms
>   for i=1 to length(x) do
>     for j=1 to length(x[i]) do
>       if x[i][j]<-3.14 or x[i][j]>9.7 then
>         return 0
>       end if
>     end for
>   end for
>   return 1
> end type
> </eucode>
{{{

> That would work.

Yes. But if we build-in 'sequence of type', we should also build-in the test
function, shouldn't we? Is new syntax required for this narrow concept? Is 'is'
better here than an old-fashioned function? (Mind you, I'm keen on introducing
declaratory-style syntax, and 'is' would fit that).
 
> They are, but a lot of apps still doesn't need to deal with many different
> languages,
> so any UTF-8 encoding would do, with minimal storage requirements. Further,
> copying substings is much faster, because chars in a string don't have a
> reference
> count, don't point to atoms holding a large value and so on. This is why I
> don't
> think an implementation piggybacking on sequences will reap much bnefits.
> Unfortunately, like for NIL or large integers, adding a new native type in the
> current backend looks hopeless. I have been saying that to myself for years,
> but never felt the motivation to start that rewrite, which would be a huge
> effort
> for... which benefits? If we do it collectivelt, there is a chane to get a
> meaner
> Eu. Not in 4.0.
> 
> CChris
> 

In terms of speed and allocation, I think it's certain that an implementaion
piggy-backing on sequences would NOT reap benefits. But at least some of the
discussion about string-handling in the past has been about the failure of
Euphoria to treat strings naturally in some operational contexts - applying
sequenc operations rather than string operations. If strings were implemented as
sequences tagged with special attributes, some of these could be handled quite
easily. If intriducing a new primitve string type is nearly impossible, doesn't
that support an intermediate solution?

In any event Chris, thanks for the reply. But one reason I posted was to get
feedback on my general thoughts on implementation of sequence of types. That isn'
a 'new primitve type' issue so far as I can see. I don't read you as directly
criticising my proposal on that. Have I misunderstood?

Cheers
Peter Robinson

new topic     » goto parent     » topic index » view message » categorize

4. Re: Constrained sequences

Peter Robinson wrote:
> 
> Hi Chris
> 
> CChris wrote:
> > > 0 represents a sequence. Traditional types could be typed by the same
> > > system:-
> > > 
> > > 	sequence x -- { 0, SEQ }
> > 
> > I'd think {1, OBJECT} rather. Just nitpicking.
> 
> Not sure why you think this is more natural, or better. But it's just a
> representation
> anyway - one's as good as another.
> 
> > > The constraints only play a significant role when an assignment occurs TO
> > > the
> > > constrained sequence. But this is a trigger point anyway. The interpreter
> > > has
> > > to check the type of assignments. The checking code just needs an extra
> > > branch
> > > to handle data flagged as a constrainted sequence. It has to check that
> > > the
> > > RHS matches the pattern of the constraint.
> > > 
> > 
> > This means the parser will have more checks to perform. But isn't the aim to
> > help optimised code to be run instead? That's why a separate string type is
> > needed. Possibly in 8/16/32 flavors.
> 
> I'm uncertain what you're responding to here. This is the part of my proposal
> dealing with sequence of types - nothing to do with strings. The sequence of
> types proposal has been voted on (it wasn't my proposal), and inevitably
> implementation
> of extra options means extra branches/checking, n'est-ce pas?
> 
> This is where I made a passing suggestion about strings.
> > > It occurred to me that the proposed atribute pair would also provide an
> > > opportunity
> > > to implement strings as a form of constrained sequence. It was a
> > > coincidence
> > > that Matt happened to raise a post about this recently.
> > > 
> > > The new attrbitue pair would have no natural meaning for negative values
> > > of
> > > the depth variable. Sequences with no (or implied) depth specifictions
> > > could
> > > be represented by negative depth values. E.g:-
> > > 
> > > 	ascii sequence x -- { -1, INTEGER }
> > > 	unicode sequence y -- { -2, INTEGER }
> > > 
> > 
> > How would the following be coded:
> > sequence of unicode
> > sequence of sequence of string
> > ?
> 
> And yes, this is a good point, that if you combine both proposals (sequence
> of types and string sequences), my suggestion of using the redundant negative
> values doesn't do the job. So you would have to tag the inner dimension in
> another
> way - but that's not a problem.
> 
> > There is a sequence() type operator. How would I build a "sequence of
> > sequence
> > of atom"() operato? 
> I didn't overlook this, but didn't mention it because when it was touched on
> in the original discussions, (from memory) it received at best a mixed
> reception.
> Since the syntax 'sequence of type' is infinite, I don't see that you can
> mirror
> the built-in syntax:  sequence( x). But you could have a built-in that took
> 2 (extra) args, the depth and the type of the inner constraint. I don't see
> this as affecting my proposal. 
> 
> > Or an UDT based on a nestde sequence?
> > Well this could be handle by an "is" keyword:
> > }}}
<eucode>
> > ssa is sequence of sequence of atom
> > type sssma(ssa x) --sequence of sequence of small atoms
> >   for i=1 to length(x) do
> >     for j=1 to length(x[i]) do
> >       if x[i][j]<-3.14 or x[i][j]>9.7 then
> >         return 0
> >       end if
> >     end for
> >   end for
> >   return 1
> > end type
> > </eucode>
{{{

> > That would work.
> 
> Yes. But if we build-in 'sequence of type', we should also build-in the test
> function, shouldn't we? Is new syntax required for this narrow concept? Is
> 'is'
> better here than an old-fashioned function? (Mind you, I'm keen on introducing
> declaratory-style syntax, and 'is' would fit that).
>  
> > They are, but a lot of apps still doesn't need to deal with many different
> > languages,
> > so any UTF-8 encoding would do, with minimal storage requirements. Further,
> > copying substings is much faster, because chars in a string don't have a
> > reference
> > count, don't point to atoms holding a large value and so on. This is why I
> > don't
> > think an implementation piggybacking on sequences will reap much bnefits.
> > Unfortunately, like for NIL or large integers, adding a new native type in
> > the
> > current backend looks hopeless. I have been saying that to myself for years,
> > but never felt the motivation to start that rewrite, which would be a huge
> > effort
> > for... which benefits? If we do it collectivelt, there is a chane to get a
> > meaner
> > Eu. Not in 4.0.
> > 
> > CChris
> > 
> 
> In terms of speed and allocation, I think it's certain that an implementaion
> piggy-backing on sequences would NOT reap benefits. But at least some of the
> discussion about string-handling in the past has been about the failure of
> Euphoria
> to treat strings naturally in some operational contexts - applying sequenc
> operations
> rather than string operations. If strings were implemented as sequences tagged
> with special attributes, some of these could be handled quite easily. If
> intriducing
> a new primitve string type is nearly impossible, doesn't that support an
> intermediate
> solution?
> 
> In any event Chris, thanks for the reply. But one reason I posted was to get
> feedback on my general thoughts on implementation of sequence of types. That
> isn' a 'new primitve type' issue so far as I can see. I don't read you as
> directly
> criticising my proposal on that. Have I misunderstood?
> 
> Cheers
> Peter Robinson

No a direct criticism.
However, 
* given how sequences are actually implemented, fast handling for optimised
sequences of types won't work often;
* assuming it does, the backend will have to choose which sort of optimised
handling to use. The likelihood of the added, very very frequent testing
offsetting the benefit from the partial optimisations is quite high.

This is why, while having such types, _prominently among which_ string types,
would be a clear bonus to Euphoria, I'm afraid they won't play nice in the
current implementation of the language.
Hoping to be proved wrong.

CChris

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu