1. RE: A sequence, by any other name
- Posted by Jonas Temple <jktemple at yhti.net> Feb 28, 2001
> Offhand there are two ways I can think of to do this.
> In general, for any arbitrary sequence there is no way to
> distinguish a character from a number that is stored in it.
> For example:
> n={65}  --these are both exactly
> n={'A'} --the same sequence.
>
> There is a second method which involves adding
> the number 256 to any positive integers in order to
> distinguish between characters and integers but it involves
> a lot more testing on the number before you know what it is.
> If you're interested I'll post that method next time.
> One good point about it is that it doesn't require any more
> space to store numbers like the first method does, which
> could be a big benefit for large data.
>

Al,

Thanks for the reply. I would indeed be interested in seeing what you've done. I'm also glad that doing this is not an easy task...I thought I was missing something.

I'll let the cat out of the bag here...what I'm TRYING to do is add support for Matt's EUSQL library to my SQL utility that got posted to the contributions page. The utility only supports ODBC but I would like to also include the EDS. I use the EDS quite a bit and would like a tool to verify that my program is doing file i/o correctly. I also have the EDS viewer by Tone but I like being able to see the entire record.

So that's why I need to be able to determine numeric vs character sequence. I don't know what data types I would be dealing with until run time.

Maybe the solution here is to beg Matt to change the EUSQL to not only store the field name but the type as well. That would make my life easier. Just a thought.

Jonas
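Because {65} and {'A'} are the same sequence, any run-time check for "numeric vs character" can only be a guess. Here is a minimal sketch of such a guess, written in Python rather than Euphoria purely for illustration. The function name `looks_like_string` and the choice of printable ASCII as the "character" range are assumptions, not anything EDS or EUSQL provides.

```python
def looks_like_string(seq):
    """Guess whether a flat list of numbers was meant as text.

    Heuristic only: a list such as [65] cannot be told apart from "A".
    """
    if not isinstance(seq, list) or not seq:
        return False
    for item in seq:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(item, bool) or not isinstance(item, int):
            return False
        # accept printable ASCII plus tab, newline and carriage return
        if not (32 <= item <= 126 or item in (9, 10, 13)):
            return False
    return True


# Three fields read back from a record: "Hello", {1,2,3} and the ambiguous {65}
record = [[72, 101, 108, 108, 111], [1, 2, 3], [65]]
guesses = [looks_like_string(field) for field in record]
```

The last field shows the limit of the approach: it is reported as text even if the program stored the number 65, which is why having the type stored alongside the field name is the more reliable fix.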
2. RE: A sequence, by any other name
- Posted by matthewwalkerlewis at YAHOO.COM Feb 28, 2001
> From: Jonas Temple [mailto:jktemple at yhti.net]

> Maybe the solution here is to beg Matt to change the EUSQL to
> not only
> store the field name but the type as well. That would make my life
> easier. Just a thought.

That's one thing I've been considering doing, and it'll probably happen [fairly] soon. I've been putting in a more detailed API, that sort of gives you a more flexible way to manipulate records (other than simply with SQL statements), that's building up to being able to index and verify records. In the next release, it will be recommended that programs using EuSQL no longer make any calls to database.e. You won't need to worry about record numbers--just keys.

One thing I'll have to determine is what datatypes to support. I suppose:

  integer (standard Eu definition)
  atom (standard Eu definition)
  varchar (string)
  sequence (standard Eu definition)
  binary (single depth sequence of atoms)

I'm not sure what exactly to do about the sequence datatype, since there's no real parallel in other DBMS's. Probably, it would treat those as invalid fields for non-Eu processes (I think someone wanted to access EDS through Delphi), while an Eu-based process could handle them just fine. There'd be several ways to get around this, though, including creation of subfields or compressing a sequence before saving it to the record, and decompressing in whatever way made sense.

I'll probably handle this (API-wise) along the lines of ODBC--maybe even write an ODBC driver for EDS!--where you can get descriptions of the various fields, including names and datatypes.

Also, I've just about got the ODBC code set up to return less than full recordsets. It changes the way you interface with the lib, but not too dramatically.

Matt Lewis
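To make the proposed datatype list concrete, here is a rough sketch of how a value might be checked against a declared field type. It is in Python for illustration only and is not EuSQL's API: the names `CHECKS` and `matches` are hypothetical, and the integer bounds assume the 31-bit integer range of 32-bit Euphoria.

```python
# Assumed Euphoria integer range (31-bit signed)
EU_MIN_INT = -0x40000000
EU_MAX_INT = 0x3FFFFFFF


def is_atom(value):
    """Any single number, integer or floating point."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_eu_integer(value):
    """A whole number that fits the assumed Euphoria integer range."""
    return (isinstance(value, int) and not isinstance(value, bool)
            and EU_MIN_INT <= value <= EU_MAX_INT)


# One checker per datatype named in the post (hypothetical table)
CHECKS = {
    "integer": is_eu_integer,
    "atom": is_atom,
    "varchar": lambda v: isinstance(v, str),
    "sequence": lambda v: isinstance(v, list),
    # binary: a single-depth sequence of atoms, so no nested lists
    "binary": lambda v: isinstance(v, list) and all(is_atom(x) for x in v),
}


def matches(datatype, value):
    """Return True if value is acceptable for the declared datatype."""
    return CHECKS[datatype](value)
```

With something like this, a viewer could ask for a field's declared type and pick a display format, instead of guessing from the data.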
3. RE: A sequence, by any other name
- Posted by Jonas Temple <jktemple at yhti.net> Feb 28, 2001
matthewwalkerlewis at YAHOO.COM wrote:
> I'm not sure what exactly to do about the sequence datatype, since
> there's
> no real parallel in other DBMS's. Probably, it would treat those as
> invalid
> fields for non-Eu processes (I think someone wanted to access EDS
> through
> Delphi), while an Eu-based process could handle them just fine. There'd
> be
> several ways to get around this, though, including creation of subfields
> or
> compressing a sequence before saving it to the record, and decompressing
> in
> whatever way made sense.
>
> I'll probably handle this (API-wise) along the lines of ODBC--maybe even
> write an ODBC driver for EDS!--where you can get descriptions of the
> various
> fields, including names and datatypes.
>
> Also, I've just about got the ODBC code set up to return less than full
> recordsets. It changes the way you interface with the lib, but not too
> dramatically.

Geez, Matt, where do you find the time? I have been working on some Eu stuff for months now and can barely find time.

For the EUSQL and EDS ODBC, how about this:
- Include the data type as part of the TABLEDEF information, as you stated you are considering.
- Since ODBC doesn't handle sequences how about the EUSQL/EDS ODBC driver expand the record down to the lowest sequence? Heck, you could even write the ODBC driver for EDS and not have to continue to support EUSQL.

For example, take the following sequence:

TABLEDEF:
{{"First Name"}, {}},
{{{"First Name"},{"Last Name"}},{{"123"},{"456"},{"7890"}}}
          Full Name                   Phone number

a select * would return:
{"First Name", "Last Name", "123", "456", "7890"}
4. RE: A sequence, by any other name
- Posted by Jonas Temple <jktemple at yhti.net> Feb 28, 2001
Darnit, I hit "Send Now" by accident!

Anyway, to sum up what my previous post said, my thoughts are:
- When returning the field definitions return the "lowest" field definition. In other words, don't return the name of a sequence that contains other sequences.
- When returning the data elements return the "lowest" field value.

Oh by the way, I looked at the documentation on MSDN and ran screaming when I saw the specifics about writing an ODBC driver. I think it would be great to have an EDS ODBC driver but the sequence thing will be a hurdle. Like I rambled in my other post, you could stop supporting EUSQL and go to strictly ODBC. However, some non-Windows folks might still want EUSQL.

ONE more thing...I noticed when using 'select * from table' the field listing returned is '*'. Would it be possible to return ALL field names? Just a thought.

Jonas
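The "lowest field value" rule can be sketched as a recursive flatten. This is a Python illustration under one big assumption: strings are a distinct type here, so they count as leaves. In Euphoria a string is itself a sequence, so a real version would need the stored datatype to know where to stop. The name `flatten_fields` is made up for the example.

```python
def flatten_fields(value):
    """Return the lowest-level values of a nested record, left to right."""
    if isinstance(value, list):
        leaves = []
        for item in value:
            # descend into nested sequences and collect their leaves in order
            leaves.extend(flatten_fields(item))
        return leaves
    # anything that is not a list (a string or a number) is a leaf
    return [value]


# The record from the earlier post: a "Full Name" field and a "Phone number" field
record = [[["First Name"], ["Last Name"]], [["123"], ["456"], ["7890"]]]
flat = flatten_fields(record)
```

Here `flat` comes out as the five leaf values in order, matching the `select *` result suggested in the previous post. The same walk over the field-name structure would give the "lowest" field definitions.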
5. RE: A sequence, by any other name
- Posted by Al Getz <Xaxo at aol.com> Feb 28, 2001
Hi again,

I think it would be preposterous for someone to write a library that returns ambiguous data making it impossible for the user to determine what type it is that is being returned. Normally, the type is determined from the context, such as
  field1 is always a number
  field2 is always a string of characters
  etc.
or else the user has control over what is stored, and therefore determines his own context beforehand. I'm sure if you ask the writer they will provide you with that information.

The second method also assumes you are the one doing the storing of the data. All you really do is add 256 to any positive integers just before storing, but leave your characters alone. When you read the data back, you simply test the number to see if it's an integer; if it is, you test it again to see if it's equal to or over 256. If it is, you know it's an integer, not a character. If it's under 256, you know it's a character and not an integer. In this way you only have to store one number per integer (or character) so you don't use any more storage space than you do when you normally store something.

To detect character strings fast, you simply follow one more simple rule:

RULE #2: you store character strings separately from integer sets like this:

to store the string "ABCDE"
  n="ABCDE"
but to store the set
  n={65,'B','C','D','E'} --the number 65 followed by string "BCDE"
you actually store it as:
  n={{65+256},"BCDE"}

That way you only have to test the first number of each sub sequence to determine whether or not it is a character string or a set of integers. Note also that negative numbers go unchanged, as well as floating point numbers. (see demo below)

Here are some functions to illustrate the idea, but you'll have to expand on this idea to include character strings. (Shouldn't be too hard.) Note that one of these functions is implemented using a sort of pseudo polymorphism. You always pass a sequence; if the sequence is two elements long, it's taken to be a character, but if one element long, a number. This is mainly because you don't have to convert character strings, they will always be stored exactly as they normally appear in a sequence. You do have to convert all numbers though, because if it's a positive integer it has to be augmented with 256 in order to detect that fact when reading back the data from the database or whatever.

If you also follow rule #2 then you only have to test the first element as stated before. If you don't follow rule #2 then you really have to test every single element, which could get really slow.

---------------------------------
with trace
trace(1)

sequence n,a
atom x

constant CHARACTER=0,NUMBER=1

function ConvertForStorage(sequence a)
    atom x
    x=a[1]
    if length(a)<2 then
        --of type NUMBER:
        if integer(x) then
            if x>=0 then
                x=x+256
                if integer(x) then
                    return x
                else
                    printf(1,"%s\n",{"Integer too large"})
                    abort(1) --modify this to suit application
                end if
            else
                return x
            end if
        else
            return x
        end if
    else
        --of type CHARACTER:
        --(don't really have to call this for characters,
        -- they always go unchanged)
        return x
    end if
end function

function ConvertBackToOriginal(atom x)
    if integer(x) then
        if x<0 then
            --it's a negative integer so just return it:
            return {NUMBER,x}
        elsif x>=256 then
            --it's a positive integer so subtract 256 to get the
            --original value:
            return {NUMBER,(x-256)}
        else
            --it's a character so don't subtract:
            return {CHARACTER,x}
        end if
    else
        --it's not an integer so just return it
        return {NUMBER,x}
    end if
end function

function Number(sequence a)
    --quick test to determine read back type
    if a[1]=NUMBER then
        return 1
    else
        return 0
    end if
end function

--this is what the test sequence will look like:
-- n={-65,65,321,65.1}
-- store -65, 'A', +65, and +65.1
-- note: 321=65+256

n=repeat(0,4)
x=ConvertForStorage({-65}) --note: pass one element long for numbers
n[1]=x
x=ConvertForStorage({'A',CHARACTER}) --note: pass two elements for a char
n[2]=x
x=ConvertForStorage({65})
n[3]=x
x=ConvertForStorage({65.1})
n[4]=x

for k=1 to length(n) do
    x=n[k]
    a=ConvertBackToOriginal(x)
    x=a[2]
    if Number(a) then
        ?x --print the number
    else
        printf(1,"%s\n",{x}) --print the character
    end if
end for
---------------------------------

One last note: the high end range of possible integers that can be stored is effectively decreased by exactly 256. This means the top end range decreases from #3FFFFFFF to #3FFFFEFF (not much at all). If you try to store a positive integer greater than #3FFFFEFF you'll see an error print out on the screen just before abort. You can modify that to whatever you wish, but you really have to include that test in the code somewhere in order to ensure you can accurately detect the correct type when reading back the data, because if the integer overflows into an atom it won't be detected as an integer during read back and therefore won't get decreased by 256 back to the original number. This of course compromises the integrity of the stored data.

Of course as mentioned before these methods slow down the code to some degree. Usually you can keep track of what is where without resorting to these types of methods, except maybe in a database program made to store arbitrary types of data. In any case, you are the only one that can decide what method is best for your application.

Good luck with it.

--Al
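The demo above leaves out the character-string part (rule #2). One way that extension might look is sketched below, in Python for illustration only. The names `encode_run` and `decode_run` are made up, and `EU_MAX_INT` simulates the Euphoria integer limit, since Python integers do not overflow. It also assumes runs are never empty and that characters are below 256.

```python
EU_MAX_INT = 0x3FFFFFFF  # assumed top of the Euphoria integer range
OFFSET = 256             # the shift applied to stored non-negative integers


def encode_number(x):
    """Shift non-negative integers by 256; leave negatives and floats alone."""
    if isinstance(x, int) and x >= 0:
        if x + OFFSET > EU_MAX_INT:
            raise OverflowError("Integer too large")
        return x + OFFSET
    return x


def decode_number(x):
    """Undo encode_number for a value known to be a number."""
    if isinstance(x, int) and x >= OFFSET:
        return x - OFFSET
    return x


def encode_run(run):
    """Store a str as plain character codes, a list of numbers as shifted numbers."""
    if isinstance(run, str):
        return [ord(c) for c in run]
    return [encode_number(x) for x in run]


def decode_run(stored):
    """Rule #2: only the first element is tested to classify the whole run."""
    first = stored[0]
    if isinstance(first, int) and 0 <= first < OFFSET:
        return "".join(chr(c) for c in stored)
    return [decode_number(x) for x in stored]


# The set {65,'B','C','D','E'} stored as {{65+256},"BCDE"}
stored = [encode_run([65]), encode_run("BCDE")]
restored = [decode_run(run) for run in stored]
```

Testing only the first element is what keeps read-back cheap. The price is that numbers and characters can never be mixed inside one run.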
6. RE: A sequence, by any other name
- Posted by Kat <gertie at PELL.NET> Feb 28, 2001
On 28 Feb 2001, at 12:48, Al Getz wrote:

> Hi again,
>
> I think it would be preposterous for someone to write a library
> that returns ambiguous data making it impossible for the user
> to determine what type it is that is being returned. Normally,
> the type is determined from the context, such as
> field1 is always a number
> field2 is always a string of characters
> etc.
> or else the user has control over what is stored, and therefore
> determines his own context beforehand.

Or make everything a sequence, with xml tags:

<s>hello?</s>
<n>-3</n>
<n1>12309.876</n1>
<n2>6</n2>
<s1>M1A1</s1>
<s34>string #34</s34>

That could be 5 stored sequences, or one nested sequence, or anything in between.

Kat
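A tagged format along these lines might be sketched as follows, in Python for illustration only. This is a simplification, not Kat's scheme: it uses just two tags, `<s>` for strings and `<n>` for numbers, without the numbered variants, and the names `tag` and `untag` are made up.

```python
import re
from xml.sax.saxutils import escape, unescape


def tag(value):
    """Wrap a value in <s>...</s> (string) or <n>...</n> (number)."""
    if isinstance(value, str):
        # escape &, < and > so the text cannot be mistaken for a tag
        return "<s>" + escape(value) + "</s>"
    return "<n>" + repr(value) + "</n>"


# Matches one tagged item; \1 makes the closing tag agree with the opening one
_TAGGED = re.compile(r"<(s|n)>(.*?)</\1>", re.S)


def untag(text):
    """Recover the list of values from a run of tagged items."""
    values = []
    for kind, body in _TAGGED.findall(text):
        if kind == "s":
            values.append(unescape(body))
        else:
            try:
                values.append(int(body))
            except ValueError:
                values.append(float(body))
    return values


original = ["hello?", -3, 12309.876, 6, "M1A1", "a<b"]
stored = "".join(tag(v) for v in original)
restored = untag(stored)
```

The type travels with every value, so nothing has to be inferred on read-back. The cost is extra storage per item, which is the trade-off the +256 method was trying to avoid.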
7. RE: A sequence, by any other name
- Posted by matthewwalkerlewis at YAHOO.COM Feb 28, 2001
> -----Original Message-----
> From: Jonas Temple [mailto:jktemple at yhti.net]

> - Include the data type as part of the TABLEDEF information, as you
> stated you are considering.

I'll probably add a second field to TABLEDEF: DATATYPES or some such. I'm also automating the table/field creation process somewhat, so the user should never have to touch TABLEDEF directly. Once indices come along, there will be another system table: INDEXDEF, to keep track of indexed fields. This should make queries over large databases MUCH faster.

> - Since ODBC doesn't handle sequences how about the EUSQL/EDS ODBC
> driver expand the record down to the lowest sequence? Heck, you could
> even write the ODBC driver for EDS and not have to continue to support
> EUSQL.

Well, the only problem with that is that ODBC drivers give you access to the DBMS. Unfortunately, there isn't really anything comparable to a DBMS for EDS (EuSQL notwithstanding :). I'll still need to have the code for EuSQL to do the manipulations.

> For example, take the following sequence:
>TABLEDEF:
>{{{"First Name"},{"Last Name"}},{{"123"},{"456"},{"7890"}}}
>          Full Name                   Phone number
>a select * would return:
>{"First Name", "Last Name", "123", "456", "7890"}

I thought about doing this, but opted for the simplicity of returning "*", since there could be many nested sequences. I'd need to flatten out the record, which might be a good idea.

> Anyway, to sum up what my previous post said, my thoughts are:
> - When returning the field definitions return the "lowest" field
> definition. In other words, don't return the name of a sequence that
> contains other sequences.
> - When returning the data elements return the "lowest" field value.

Yep, that's basically what I'm thinking.

> Oh by the way, I looked at the documentation on MSDN and ran
> screaming
> when I saw the specifics about writing an ODBC driver. I
> think it would
> be great to have an EDS ODBC driver but the sequence thing will be a
> hurdle. Like I rambled in my other post, you could stop supporting
> EUSQL and go to strictly ODBC. However, some non-Windows folks might
> still want EUSQL.

Since the code would all be in Eu, it should be portable to Linux as an .so, at least.

I also just fixed a bug regarding the handling of conditions. It wasn't looking at the correct fields if all fields in a table weren't consecutive in the SQL statement (i.e., if there was a field from another table mixed in the order).

Matt Lewis