1. RE: A sequence, by any other name

> Offhand there are two ways I can think of to do this.
> In general, for any arbitrary sequence there is no way to
> distinguish a character from a number that is stored in it.
> For example:
> n={65}   --these are both exactly
> n={'A'}  --the same sequence.
> 
> There is a second method which involves adding
> the number 256 to any positive integers in order to
> distinguish between characters and integers but it involves
> a lot more testing on the number before you know what it is.
> If you're interested I'll post that method next time.
> One good point about it is that it doesn't require any more
> space to store numbers like the first method does, which
> could be a big benefit for large data.
> 
Al,

Thanks for the reply.  I would indeed be interested in seeing what 
you've done.  I'm also glad that doing this is not an easy task...I 
thought I was missing something.  

I'll let the cat out of the bag here...what I'm TRYING to do is add 
support for Matt's EUSQL library to my SQL utility that got posted to 
the contributions page.  The utility only supports ODBC but I would like 
to also include the EDS.  I use the EDS quite a bit and would like a 
tool to verify that my program is doing file i/o correctly.  I also have 
the EDS viewer by Tone but I like being able to see the entire record.

So that's why I need to be able to determine numeric vs character 
sequence.  I don't know what data types I would be dealing with until 
run time.

Maybe the solution here is to beg Matt to change the EUSQL to not only 
store the field name but the type as well.  That would make my life 
easier.  Just a thought.

Jonas

2. RE: A sequence, by any other name

> From: Jonas Temple [mailto:jktemple at yhti.net]

> Maybe the solution here is to beg Matt to change the EUSQL to not only 
> store the field name but the type as well.  That would make my life 
> easier.  Just a thought.

That's one thing I've been considering doing, and it'll probably happen
[fairly] soon.  I've been putting in a more detailed API, that sort of gives
you a more flexible way to manipulate records (other than simply with SQL
statements), that's building up to being able to index and verify records.
In the next release, it will be recommended that programs using EuSQL no
longer make any calls to database.e.  You won't need to worry about record
numbers--just keys.

One thing I'll have to determine is what datatypes to support.  I suppose:

integer (standard Eu definition)
atom (standard Eu definition)
varchar (string)
sequence (standard Eu definition)
binary (single depth sequence of atoms)

I'm not sure what exactly to do about the sequence datatype, since there's
no real parallel in other DBMS's.  Probably, it would treat those as invalid
fields for non-Eu processes (I think someone wanted to access EDS through
Delphi), while an Eu-based process could handle them just fine.  There'd be
several ways to get around this, though, including creation of subfields or
compressing a sequence before saving it to the record, and decompressing in
whatever way made sense.

I'll probably handle this (API-wise) along the lines of ODBC--maybe even
write an ODBC driver for EDS!--where you can get descriptions of the various
fields, including names and datatypes.

Also, I've just about got the ODBC code set up to return less than full
recordsets.  It changes the way you interface with the lib, but not too
dramatically.

Matt Lewis

3. RE: A sequence, by any other name

matthewwalkerlewis at YAHOO.COM wrote:
> I'm not sure what exactly to do about the sequence datatype, since there's
> no real parallel in other DBMS's.  Probably, it would treat those as invalid
> fields for non-Eu processes (I think someone wanted to access EDS through
> Delphi), while an Eu-based process could handle them just fine.  There'd be
> several ways to get around this, though, including creation of subfields or
> compressing a sequence before saving it to the record, and decompressing in
> whatever way made sense.
> 
> I'll probably handle this (API-wise) along the lines of ODBC--maybe even
> write an ODBC driver for EDS!--where you can get descriptions of the various
> fields, including names and datatypes.
> 
> Also, I've just about got the ODBC code set up to return less than full
> recordsets.  It changes the way you interface with the lib, but not too
> dramatically.

Geez, Matt, where do you find the time?  I have been working on some Eu 
stuff for months now and can barely find time.  

For the EUSQL and EDS ODBC, how about this:

- Include the data type as part of the TABLEDEF information, as you 
stated you are considering.
- Since ODBC doesn't handle sequences how about the EUSQL/EDS ODBC 
driver expand the record down to the lowest sequence?  Heck, you could 
even write the ODBC driver for EDS and not have to continue to support 
EUSQL.  For example, take the following sequence:
TABLEDEF:
{{"First Name"}, {}},

{{{"First Name"},{"Last Name"}},{{"123"},{"456"},{"7890"}}} 
   Full Name                       Phone number
a select * would return:
{"First Name", "Last Name", "123", "456", "7890"}

4. RE: A sequence, by any other name

Darnit, I hit "Send Now" by accident!

Anyway, to sum up what my previous post said, my thoughts are:
- When returning the field definitions return the "lowest" field 
definition.  In other words, don't return the name of a sequence that 
contains other sequences.
- When returning the data elements return the "lowest" field value.

Oh by the way, I looked at the documentation on MSDN and ran screaming 
when I saw the specifics about writing an ODBC driver.  I think it would 
be great to have an EDS ODBC driver but the sequence thing will be a 
hurdle.  Like I rambled in my other post, you could stop supporting 
EUSQL and go to strictly ODBC.  However, some non-Windows folks might 
still want EUSQL.

ONE more thing...I noticed when using 'select * from table' the field 
listing returned is '*'.  Would it be possible to return ALL field 
names?  Just a thought.

Jonas

5. RE: A sequence, by any other name

Hi again,

I think it would be preposterous for someone to write a library
that returns ambiguous data making it impossible for the user
to determine what type it is that is being returned.  Normally,
the type is determined from the context, such as 
    field1 is always a number
    field2 is always a string of characters
    etc.
or else the user has control over what is stored, and therefore 
determines his own context beforehand.

I'm sure if you ask the writer they will provide you with that 
information.

The second method also assumes you are the one doing the storing of the 
data.  All you really do is add 256 to any non-negative integer just
before storing, but leave your characters alone.  When you read the
data back, you simply test the number to see if it's an integer; if it
is, you test it again to see if it's equal to or over 256.  If it is,
you know it's an integer, not a character.  If it's an integer from 0
to 255, you know it's a character and not an integer.  In this way you
only have to store one number per integer (or character) so you don't
use any more storage space than you do when you normally store something.
To detect character strings fast, you simply follow one more simple
rule:

RULE #2:
you store character strings separately from integer sets like this:
to store the string "ABCDE"
  n="ABCDE"
but to store the set
  n={65,'B','C','D','E'}  --the number 65 followed by string "BCDE"
you actually store it as:
n={{65+256},"BCDE"}

That way you only have to test the first number of each sub sequence
to determine whether or not it is a character string or a set of
integers.  Note also that negative numbers go unchanged, as well as
floating point numbers. (see demo below)

Here are some functions to illustrate the idea, but you'll have to
expand on this idea to include character strings. (Shouldn't be too
hard).

Note that one of these functions is implemented using a sort of
pseudo polymorphism. You always pass a sequence: if the sequence is
two elements long, it's taken to be a character, but if one element
long, a number. This is mainly because you don't have to convert
character strings; they will always be stored exactly as they
normally appear in a sequence. You do have to convert all numbers
though, because if it's a non-negative integer it has to be augmented
with 256 in order to detect that fact when reading back the data from
the database or whatever.
If you also follow rule #2 then you only have to test the first
element as stated before.  If you don't follow rule #2 then you
really have to test every single element, which could get really
slow.

---------------------------------
with trace

trace(1)

sequence n,a
atom x
constant CHARACTER=0,NUMBER=1

function ConvertForStorage(sequence a)
  atom x

  x=a[1]
  if length(a)<2 then
    --of type NUMBER:
    if integer(x) then
      if x>=0 then
        x=x+256
        if integer(x) then
          return x
        else
          printf(1,"%s\n",{"Integer too large"})
          abort(1)--modify this to suit the application
        end if
      else
        return x
      end if
    else
      return x
    end if
  else
    --of type CHARACTER:
    --(don't really have to call this for characters,
    -- they always go unchanged)
    return x
  end if
end function

function ConvertBackToOriginal(atom x)
  if integer(x) then
    if x<0 then
      --it's a negative integer so just return it:
      return {NUMBER,x}
    elsif x>=256 then
      --it's a non-negative integer so subtract 256 to get the
      --original value:
      return {NUMBER,(x-256)}
    else
      --it's a character (0 to 255) so don't subtract:
      return {CHARACTER,x}
    end if
  else
    --it's not an integer so just return it
    return {NUMBER,x}
  end if 
end function

function Number(sequence a)
  --quick test to determine read back type
  if a[1]=NUMBER then 
    return 1
  else
    return 0
  end if
end function

--this is what the test sequence will look like:
--  n={-65,65,321,65.1}  -- store -65, 'A', +65, and +65.1
--  note: 321=65+256

n=repeat(0,4)

x=ConvertForStorage({-65})  --note: pass one element long for numbers
n[1]=x

x=ConvertForStorage({'A',CHARACTER})--note:
                                    --pass two elements for a char
n[2]=x

x=ConvertForStorage({65})
n[3]=x

x=ConvertForStorage({65.1})
n[4]=x

for k=1 to length(n) do
  x=n[k]
  a=ConvertBackToOriginal(x)
  x=a[2]
  if Number(a) then
    ?x --print the number
  else
    printf(1,"%s\n",{x})  --print the character
  end if
end for

---------------------------------
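The demo above only handles single values. Extending it to rule #2 might look roughly like the sketch below. The names OFFSET, EncodeInts, DecodeInts and IsString are made up for illustration and are not part of the demo, and the overflow check from ConvertForStorage is left out to keep it short.

```euphoria
-- Hypothetical helpers for rule #2 (names are illustrative only).
constant OFFSET = 256

-- Shift every non-negative integer in a flat sequence of numbers.
-- Assumption: s holds atoms only; overflow is NOT checked here.
function EncodeInts(sequence s)
  for i = 1 to length(s) do
    if integer(s[i]) and s[i] >= 0 then
      s[i] = s[i] + OFFSET
    end if
  end for
  return s
end function

-- Undo EncodeInts: only integers of 256 or more were shifted.
function DecodeInts(sequence s)
  for i = 1 to length(s) do
    if integer(s[i]) and s[i] >= OFFSET then
      s[i] = s[i] - OFFSET
    end if
  end for
  return s
end function

-- Rule #2: look at the first element only.
-- Returns 1 for a character string, 0 for a set of numbers.
function IsString(sequence s)
  atom x
  if length(s) = 0 then
    return 1  -- assumption: an empty sequence counts as an empty string
  end if
  x = s[1]
  return integer(x) and x >= 0 and x < OFFSET
end function

-- The number 65 followed by the string "BCDE", stored as two sub sequences
sequence stored
stored = {EncodeInts({65}), "BCDE"}

for i = 1 to length(stored) do
  if IsString(stored[i]) then
    printf(1, "string: %s\n", {stored[i]})
  else
    puts(1, "numbers: ")
    ? DecodeInts(stored[i])
  end if
end for
```

This only works if strings and numbers are never mixed inside one sub sequence, which is exactly what rule #2 asks for.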

One last note:
the high end range of possible integers that can be stored is
effectively decreased by exactly 256.  This means the top end range
decreases from #3FFFFFFF to #3FFFFEFF (not much at all).
If you try to store a positive integer greater than #3FFFFEFF
you'll see an error print out on the screen just before abort.
You can modify that to whatever you wish, but you really have to
include that test in the code somewhere in order to ensure you can
accurately detect the correct type when reading back the data,
because if the integer overflows into an atom it won't be detected as
an integer during read back and therefore won't get decreased by 256
back to the original number.  This of course compromises the
integrity of the stored data.

Of course as mentioned before these methods slow down the code to
some degree.  Usually you can keep track of what is where without
resorting to these types of methods, except maybe in a data base
program made to store arbitrary types of data.  In any case, you are
the only one that can decide what method is best for your application.

Good luck with it.
--Al

6. RE: A sequence, by any other name

On 28 Feb 2001, at 12:48, Al Getz wrote:

> Hi again,
> 
> I think it would be preposterous for someone to write a library
> that returns ambiguous data making it impossible for the user
> to determine what type it is that is being returned.  Normally,
> the type is determined from the context, such as 
>     field1 is always a number
>     field2 is always a string of characters
>     etc.
> or else the user has control over what is stored, and therefore 
> determines his own context beforehand.

Or make everything a sequence, with xml tags:

<s>hello?</s>
<n>-3</n>
<n1>12309.876</n1>
<n2>6</n2>
<s1>M1A1</s1>
<s34>string #34</s34>

That could be 5 stored sequences, or one nested sequence, or anything in
between.
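As a rough sketch of the tagging idea (the function names here are hypothetical, and the "%.10g" format is just one possible choice for turning a number into text), the tag gets attached at store time, while the program still knows what it is holding:

```euphoria
-- Hypothetical helpers: wrap a value in a type tag before storing it.
function TagString(sequence s)
  return "<s>" & s & "</s>"
end function

function TagNumber(atom x)
  -- assumption: 10 significant digits is enough for the data at hand
  return "<n>" & sprintf("%.10g", x) & "</n>"
end function

-- Returns the tag letter, 's' or 'n'.
-- Assumes the value was produced by one of the functions above.
function TagKind(sequence tagged)
  return tagged[2]
end function

sequence rec
rec = {TagString("hello?"), TagNumber(-3)}

for i = 1 to length(rec) do
  if TagKind(rec[i]) = 's' then
    puts(1, "string: " & rec[i] & "\n")
  else
    puts(1, "number: " & rec[i] & "\n")
  end if
end for
```

The cost is that numbers are stored as text, so they take more space and have to be converted back when read.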

Kat

7. RE: A sequence, by any other name

> -----Original Message-----
> From: Jonas Temple [mailto:jktemple at yhti.net]

> - Include the data type as part of the TABLEDEF information, as you 
> stated you are considering.

I'll probably add a second field to TABLEDEF: DATATYPES or some such.  I'm
also automating the table/field creation process somewhat, so the user
should never have to touch TABLEDEF directly.  Once indices come along,
there will be another system table: INDEXDEF, to keep track of indexed
fields.  This should make queries over large databases MUCH faster.

> - Since ODBC doesn't handle sequences how about the EUSQL/EDS ODBC 
> driver expand the record down to the lowest sequence?  Heck, you could 
> even write the ODBC driver for EDS and not have to continue to support 
> EUSQL.

Well, the only problem with that is that ODBC drivers give you access to the
DBMS.  Unfortunately, there isn't really anything comparable to a DBMS for
EDS (EuSQL notwithstanding :).  I'll still need to have the code for EuSQL
to do the manipulations.

>  For example, take the following sequence:
>TABLEDEF:

>{{{"First Name"},{"Last Name"}},{{"123"},{"456"},{"7890"}}} 
>   Full Name                       Phone number
>a select * would return:
>{"First Name", "Last Name", "123", "456", "7890"}

I thought about doing this, but opted for the simplicity of returning "*",
since there could be many nested sequences.  I'd need to flatten out the
record, which might be a good idea.
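For what it's worth, flattening a record down to its "lowest" sequences could be sketched like this. IsFlat and FlattenRecord are made-up names, not EuSQL routines, and the sketch assumes that any sequence with no sub sequences in it (a string, for instance) counts as one field value.

```euphoria
-- Hypothetical sketch, not part of EuSQL.

-- Returns 1 if s contains no sub sequences (e.g. a string such as "123").
function IsFlat(sequence s)
  for i = 1 to length(s) do
    if sequence(s[i]) then
      return 0
    end if
  end for
  return 1
end function

-- Collects the lowest-level sequences of a nested record into one list.
function FlattenRecord(sequence rec)
  sequence result
  if IsFlat(rec) then
    return {rec}  -- already a lowest-level value
  end if
  result = {}
  for i = 1 to length(rec) do
    if sequence(rec[i]) then
      result = result & FlattenRecord(rec[i])
    else
      -- a bare atom sitting next to sub sequences is kept as its own field
      result = append(result, rec[i])
    end if
  end for
  return result
end function

sequence rec
rec = {{{"First Name"},{"Last Name"}},{{"123"},{"456"},{"7890"}}}
? equal(FlattenRecord(rec),
        {"First Name", "Last Name", "123", "456", "7890"})
```

Note that this can't tell a string from a flat list of numbers, so it still needs the datatype information from TABLEDEF to be useful.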

> Anyway, to sum up what my previous post said, my thoughts are:
> - When returning the field definitions return the "lowest" field 
> definition.  In other words, don't return the name of a sequence that 
> contains other sequences.
> - When returning the data elements return the "lowest" field value.

Yep, that's basically what I'm thinking.

> Oh by the way, I looked at the documentation on MSDN and ran screaming 
> when I saw the specifics about writing an ODBC driver.  I think it would 
> be great to have an EDS ODBC driver but the sequence thing will be a 
> hurdle.  Like I rambled in my other post, you could stop supporting 
> EUSQL and go to strictly ODBC.  However, some non-Windows folks might 
> still want EUSQL.

Since the code would all be in Eu, it should be portable to Linux as an .so,
at least.  I also just fixed a bug regarding the handling of conditions.  It
wasn't looking at the correct fields if all fields in a table weren't
consecutive in the SQL statement (i.e., if there was a field from another
table mixed in the order).

Matt Lewis
