1. RE: Data validation (was Re: Stu--- Just how many times has this changed?)
- Posted by "Patrick Barnes" <mistertrik at hotmail.com> Jun 02, 2004
- 509 views
------------------------------------------------------------------------
magnae clunes mihi placent, nec possum de hac re mentiri.
------------------------------------------------------------------------
MrTrick

>From: Derek Parnell <guest at RapidEuphoria.com>
>Subject: Re: Stupid Newbie-sounding question.
>
> > > > Okay Rob... What sort of structure do these sequences have to have? I
> > > > don't know! If we had the ability to use stronger typing, then function
> > > > myFunc(text_picture r1, string r2, flag_list r3) would make a lot more
> > > > sense. I could go and look at the type declaration, and figure out what
> > > > to pass to the function.
> > > > People don't use the types like this at the moment because it's so damn
> > > > computationally expensive for sequences.
> > >
> > >This is true. And there is the trade off - speed & flexibility against
> > >structure and complexity. Think of it in terms of a bicycle without
> > >training wheels and one with training wheels.
> > >
> > >RDS has chosen speed. Meaning that the coder has more responsibility to
> > >provide quality assurance than the translator does. Yes this does mean
> > >more work for the coder and better discipline.
> >
> > Well, if you're writing a library that other people would use, I would say
> > that it's the library programmer's responsibility to check the data passed
> > to it.
>
>This is a good issue to discuss. I must deal with this problem with the
>Win32lib library. The question for me is, how *much* parameter validation
>should the library perform?
>
>On one extreme, it could be said that I should not do any parameter
>validation because it is the *user's* responsibility to provide
>parameter values as documented. If they do not heed the documentation, why
>is it considered to be my problem? This approach would speed up the
>execution of win32lib applications, and slow down the development of
>quality applications.
>
>On the other extreme, I should do everything in my power to protect
>the user from using incorrect parameter values. As a service to the
>coder (and end-user?) I should try to provide meaningful exception
>messages and/or codes when I detect unusable parameter values being
>passed. This approach would speed up the development of
>quality applications and slow down the execution of win32lib applications.
>
>And there are many shades in between these two extremes. Maybe I should
>provide two versions of the library - one with training wheels and one
>without?
>
>At this stage I don't have the answer. The current library does some
>checking but could do more.
>
> > I think that types could be implemented in a way that would run *faster*
> > than without. Without these types, libraries and functions need to check
> > that the data passed to it is valid. This results in redundant checks.
> > Having types supplemented with the "of" command speeds it up many-fold over
> > the old type system, because data will only be checked that has changed. And
> > because that if a variable is of a certain type, it is *known* already that
> > it is valid, so it won't need to be checked multiple times.
>
>What you are saying is true, but it comes at a cost to RDS. It introduces
>more potential bugs in the interpreter and translator. It also will cause
>Euphoria apps to run slower than if they had no type checking.
>BTW, how would the user defined type routine get to know which parts of
>the variable have been modified? Currently, the entire object is passed
>to the routine, but there is no indication which parts were changed.

This is what forms the basis of this suggestion... Say you have this:

    type positive_int(integer t)
        return (t >= 0)
    end type

    type index( sequence of positive_int x)
        return length(x) < 10
    end type

If an element or group of elements changes, the base type (index) is not
checked, but each element that changed has the "of" type (positive_int)
checked against it.
If the aggregate properties of the index change (ie length), then the base
type is checked, then the elements that changed. The base type should only
check things that affect the entire sequence, like length.

Example:

    index x
    x = {6}            -- checks base, and first element
    x &= 4             -- checks base (because length has changed) and
                       -- new element, but not first
    x[1] = 5           -- checks element 1, but not base (length has not changed)
    x[1..2] = {4,1}    -- checks elements 1 and 2
    x &= {10, 0}       -- checks base, and the new elements, but not existing
    x = {0}            -- x is completely reassigned. If an existing index is
                       -- assigned, no checking is done, otherwise check everything
    x[1] = -1          -- error here (element 1 is checked, and fails)

That sounds a little convoluted, but checking the base type is very simple,
and the interpreter could optimise the base type right out if all it does is
return 1. Also, if you append a slice of an index to an existing index, then
the types are the same - you don't need to recheck all those elements, just
the literals and anything that was not originally an index.

> > It's still a valid argument (slicing and seq errors). It's very easy to
> > misplace a subscript, or make some error if you are breaking up and
> > reassembling a sequence in one line.
>
>Yep. Any ideas how to make this less error prone?

See rest of thread :o)
The checking would not protect against mis-slicing (ie 2..n instead of
2..n-1), but it would protect the structure of your data, and you wouldn't
get mysterious "Attempt to subscript atom" messages 100 lines further on.
And, as shown above, reassembling these types would not cause much
performance penalty. For example, to right(left?) shift a sequence, you just
write

    myvar = myvar[2..length(myvar)] & {myvar[1]}  -- (or append, or something)

Because all of the elements are already known to be a certain type, and are
being assigned to the same level as they were before, they don't need to be
checked. Only the base type does.

> > >Ummmm? Why not code so that it works?
> > >
> > > type myStruct (object s)
> > >    if sequence(s) return 1
> > >    else return 0
> > > end type
> >
> > Because it is non-intuitive.
>
>It is for me. It is saying that if 's' is a sequence then its okay
>otherwise its not okay.

Well, if it's not ok, the type should shortcut and return 0 before
processing the type body, rather than crashing. This is more of an
implementation thing, in that it would affect processing for the "of"
system. It should be apparent from reading the above.

> > Derek, if someone passes a badly-formed sequence into one of the Win32lib
> > functions, the error is stated to be inside that function, when in fact it
> > was the fault of the other programmer. The trace window or ex.err may show
> > the value of the sequence (or maybe only the first part), but it may not be
> > easy to elucidate that a) the error resulted from their own mistake. b)
> > Exactly what was wrong with that sequence they passed, anyway.
>
>That's what documentation is useful for
True, but sometimes it's there, sometimes it's not. :o)
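
For comparison, a minimal sketch of how the index type from earlier in this post would probably have to be written in Euphoria as it stands, without the proposed "of" syntax. It is an illustration only, not code from Win32lib or RDS. Declaring the parameters as object (rather than integer or sequence) is an assumption made here so that a bad value returns 0 instead of crashing inside the type call.

```euphoria
-- Sketch only: there is no "sequence of" syntax today, so the
-- per-element check has to live inside the type body itself.
type positive_int(object t)
    -- Parameter is object so a non-integer gives 0 rather than a crash.
    if integer(t) then
        return t >= 0
    end if
    return 0
end type

type index(object x)
    if not sequence(x) then
        return 0
    end if
    if length(x) >= 10 then
        return 0                -- the "base" check: fewer than 10 elements
    end if
    for i = 1 to length(x) do
        if not positive_int(x[i]) then
            return 0            -- the per-element check
        end if
    end for
    return 1
end type

index x
x = {6}         -- index() walks the whole sequence
x &= 4          -- index() walks the whole sequence again
x[1] = 5        -- as I understand it, walked again with type_check on,
                -- even though only one element changed
```

Under the "of" proposal, the loop would disappear from the type body and only the changed elements would be re-checked, which is where the claimed saving comes from.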