Re: Problems with structures
- Posted by "Boehme, Gabriel" <gboehme at MUSICLAND.COM> Feb 04, 1999
Robert Pilkington <pilking at BELLATLANTIC.NET> writes:

>>As you can see, it's very easy to understand what's going on here. You have
>>a deck of 52 cards, identified here by suit and value. You don't have to
>>define constants to use as subscripts (i.e. Suit = 1, Value = 2) -- you can
>>give the individual portions field names to uniquely identify them. Plus,
>>the compiler will *always* know if a field reference is valid or not, and
>>can tell you right away if you've made a mistake. If I were to try this...
>>
>>DIM Hand(1 TO 7) AS STRING * 9
>>Hand(1).Suit = "Spade"
>>
>>...the compiler would immediately flag it as an error -- the "Hand" array is
>>not defined to contain "Card"-type data, so the ".Suit" reference has no
>>meaning here. The same goes for all the other languages containing
>>structures -- since every data element's type *must* be defined beforehand,
>>the compiler can know instantly if a field reference is valid or not.
>
>
>Right. Argument #1 for structures: Type checking. Argument #2 hasn't been
>touched on in your message, so I'll point it out here: Speed. Structures, in
>theory, should be faster than the dynamic sequence Euphoria has.

I would agree with argument #1, on some level -- named references should be
available *somehow*. But it doesn't work within the context of Euphoria.
Neither does argument #2, not that I can see. Read on for more details...

>What about:
>
>Card Hand -- Define Hand as Card.
>sequence Card Hand -- Make it an "array"

Does Euphoria have arrays? No. Euphoria has *sequences*. Once again, we are
creating contradictory syntax for sequence definitions, and diluting the
*essential* features of generic sequences...

>Now, before you say the interpreter could see this:
>
>sequence Card
>Hand
>
>Remember: What would happen here:
>
>type Card(atom x)
>    return atom(x)
>end type
>
>sequence Card
>
>We get an error, right? That's because the interpreter *KNOWS* that Card has
>been defined as a type.
>Now, if it *KNOWS* that Card was defined as a
>structure, then it would know what to do with the next "line". The only
>problem is that Euphoria's syntax is usually pretty predictable. This is
>slightly less consistent.

This is true -- "sequence structure variable" just looks confusing.

This method of variable declaration could cause problems within routines, as
well. Please try this yourself -- create a Euphoria executable containing just
this code:

type five(integer x)
    return x = 5
end type

procedure test()
    integer five
    five = 4
    ? five
end procedure

test()

Now run it. Hey, it works! Type-declarations are allowed to be used as
variable names within routines. Would structures be any different?

Sequences are *by definition*, a collection of *any* Euphoria objects. The
only current restriction on this is to declare a sequence as a different type:

my_sequence Card

This way, it's obvious how to type-check Card -- just look at the
type-declaration of my_sequence() to see what's legal. With your
three-identifier declaration, it's unclear just how the type-checking takes
place. Argument #1 for structures should not *create* any type-check
confusion -- it should *solve* it.

>Now, this next argument you provide, has one huge flaw in it:
>
>>Secondly, examples such as these don't deal adequately with generic Euphoria
>>objects. Most of the structure ideas I've seen show how we could define and
>>use the structures, but don't really examine how *normal* Euphoria would
>>work with them. What if I try this:
>>
>>object one_card
>>one_card = Deck[1]
>>? one_card.Suit
>>? one_card.Value
>>
>>It seems simple enough at first glance -- the object "one_card" is assigned
>>the value from "Deck[1]", which is in "Card" format. So naturally the field
>>references would make sense. But how does the Euphoria "compiler" know for
>>sure what's in "one_card"? How does it know that "one_card.Suit" is a valid
>>field reference? After all, "one_card" is a generic object.
>>There's no
>>guarantee that it will *always* contain "Card"-structured data. In theory, I
>>suppose the Euphoria "compiler" could try to get really clever -- it knows
>>that "Deck[1]" contains "Card"-structured data, so it "allows" the field
>>references to be attached to the object "one_card" in this particular case.
>>In practice, however, this would be a *nightmare* to implement, especially
>>if the value is passed back via a function:
>
>Well, how does Euphoria know that an object is a sequence or atom? It would
>also know if it is a structure or not. Think about this:
>
>object a
>a = 1
>? a[1] -- Error!
>
>Why on Earth would it do that? Could it be that Euphoria somehow magically
>knew that a was currently an atom? So why would it be so hard to keep this
>from happening?
>
>object a
>a = 1
>? a.Suit
>
>Same basic principle: Access a part of a that doesn't exist. Both cases, a
>crash.

Yes, but a crash at *runtime*. Structures in every other language can be
verified at "compile" time. Euphoria won't flag "? a[1]" as an error until the
statement is actually executed. Please, try this on your own -- create a
Euphoria executable containing *just* this code:

procedure try_this()
    object a
    a = 1
    ? a[1]
end procedure

Notice that we're *not* calling try_this() from anywhere, so we're not
executing anything. The only way "? a[1]" can be flagged as an error is if the
"compiler" catches it. Now run the executable. Well, what do you know, no
errors!

Again, every other language can verify its structure field references at
compile-time -- please refer back to the QuickBasic example given at the
beginning. Euphoria's "compiler" would be unable to do this. True, it lets
subscript references like "a[1]" go by, but subscripts at least potentially
apply to *any* sequence within the program. Structure field names would *only*
apply to their specifically-defined structures -- yet they, too, must be
allowed *anywhere* in the program.
Programmers from other languages would wonder why Euphoria is unable to do
"compile"-time field reference checks.

>>object one_card
>>one_card = 1
>>? one_card.Suit
>>
>>As anyone can see, this is ridiculous. There is no way that "one_card" can
>>possibly contain "Card" data here. But "one_card" has the *potential* of
>>holding "Card"-structured data, so this *must* be allowed to pass. This will
>>cause an error at runtime, of course, but the point is that this shouldn't
>>be allowed past the "compile" step in the first place.
>
>
>Euphoria wouldn't allow one_card[1] to pass (at runtime). But it has the
>*potential* of holding sequence data, so it *must* be allowed to pass,
>right? It would crash at runtime. I don't see any problem here. Well, this
>problem already exists... So I don't see any NEW problem with the crashing
>with structures.

With "one_card[1]", Euphoria doesn't have to do any major type-check work to
determine if "one_card" is a sequence or atom. For "one_card.Suit", how will
Euphoria know if "one_card" contains "Card"-type data? Why, it will have to do
a type-check, of course -- EVERY TIME IT'S REFERENCED WITH A FIELD NAME.

object one_card
one_card = Deck[1]
? one_card.Suit   -- type-check required here
? one_card.Value  -- type-check required here, too

This would be a MAJOR performance-killer. Kinda goes against argument #2 for
structures, doesn't it?

True, the "without type_check" directive could turn this off. But this would
(again!) create major changes in the way Euphoria does business. Currently,
the only time it needs to do user-defined type-checks is when a value is
assigned to the variable -- now it needs to do them just to *reference* the
variable!

>And this is actually the only really good argument against structures: They
>just don't fit. As was pointed out, we need "something".
>
>Something that fits these three parameters, if possible:
>
>Is faster when not being resized.
>Remember, there is a lot of code that uses
>a sequence to emulate a structure. The 2nd level, usually, never grows or
>shrinks. Could the speed to access such a non-dynamic part of a sequence be
>sped up? Is it already like that?

This might be a possibility. However, Euphoria's current design allows for
changes to *any* of the sequences. Exceptions would require extra logic, which
would kill speed.

>Has type-checking: What if the data of a sequence, which somewhere has an
>atom where it should have a sequence, is sent to a generic save-to-disk
>routine? Then when it is loaded, the corrupted data will crash when sent as
>an argument to say pixel() as the coordinates. This could be a tough bug to
>track down, especially in a large program saved to disk by Edom. (Data isn't
>human-readable on disk) trace() would be the only hope, and it WOULD work,
>it would just take a bit of work.

Type check the data before you access it. I don't understand why pro-structure
people are so allergic to big type-checks. They're only performance killers
when type-checking is *on*. That should be perfectly fine when you're testing
your program. When you're confident your program is stable, you can specify
"without type_check" at the beginning of your source.

It's true that this *could* be a problem if you haven't plugged all the code
leaks in your program. That's why type-checks are so cool to begin with.
Structures are an attempt to "force" Euphoria to rigidly "lock" the types of
data it can use and pass around. But structures would *still* require
type-checks -- they'd just be hidden under the surface of the "structure"
syntax. Plus, they'd introduce all sorts of other exceptional conditions that
sequences don't have to deal with, so I don't see what we're gaining with
structures.

For problems as described above -- an incoming file that has somehow become
corrupted -- just type-check your incoming data *once* before passing it on to
the rest of your program.
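
(Editor's illustration, not part of the original message: a minimal sketch of
such a one-time check. The "point" type and the sample data are hypothetical
and do not come from the thread; the sketch only assumes that a user-defined
type can be called like a function returning true or false.)

```euphoria
-- Hypothetical type: a pair of integer coordinates, e.g. for pixel().
type point(object x)
    if not sequence(x) then
        return 0
    end if
    if length(x) != 2 then
        return 0
    end if
    return integer(x[1]) and integer(x[2])
end type

object incoming
incoming = {10, 20}   -- stand-in for data just loaded from disk

-- Call the type like a function: it returns 1 (true) or 0 (false).
if point(incoming) then
    puts(1, "data looks legal\n")
else
    puts(1, "data is corrupt\n")
end if
```

Checking once at the point of loading keeps bad data out of the rest of the
program; declaring a variable as "point" instead would let the interpreter
perform the same check on every assignment while type-checking is on.
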
You know, call the type like a function to verify that the data is legal. This
way, if there's a problem, you get a very easy-to-understand type-check error
-- it's stopped before it gets too far into your program, and doesn't become
"tough to track down".

You still seem to be thinking in terms of other programming languages --
*they* don't have anything like type-checks or structure verification, so it
doesn't occur to you that Euphoria's type-checking can catch this *before* it
becomes a major problem. (I hope I'm not being too presumptuous here...)

Euphoria type-checks are the best I've found in *any* language. You can
restrict any value to any structure or range you desire. Instead of being tied
to machine-types like other languages, Euphoria allows you to logically define
your types in any way you need to. Yes, this kind of type-checking is slower.
But the flexibility and runtime safety that's gained, IMHO, is worth the
exchange.

>Eliminates name-space conflicts. If two pseudo structures have the same
>field, LIFE, then that field has to be in the *SAME SPOT* in each structure,
>because we can't have two constant declarations of LIFE. This can be quite
>irritating, when you have to reorder a logically based constant order
>because you need another pseudo structure to have access to LIFE. I have run
>into this porting C code to Euphoria. It can be worked around, but it is
>*VERY* annoying.

I agree with this point. But I don't see this as a structure issue -- more of
a NAMESPACE issue instead. Perhaps solutions to namespace problems will
eliminate the need for structures?

Gabriel Boehme