RE: A sequence, by any other name


Hi again,

I think it would be preposterous for someone to write a library
that returns ambiguous data making it impossible for the user
to determine what type it is that is being returned.  Normally,
the type is determined from the context, such as 
    field1 is always a number
    field2 is always a string of characters
    etc.
or else the user has control over what is stored, and therefore 
determines his own context beforehand.
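
For example, when the type comes from the field position, reading the data back needs no tests at all. Here is a small sketch of that; the record layout and the names FIELD_ID and FIELD_NAME are made up for illustration and are not taken from any particular library.

```euphoria
-- hypothetical record layout: the position decides the type
constant FIELD_ID   = 1,  -- always a number
         FIELD_NAME = 2   -- always a string of characters

sequence rec
rec = {65, "A"}

-- no guessing needed: the field index says how to treat each value
printf(1, "%d\n", {rec[FIELD_ID]})    -- treated as the number 65
printf(1, "%s\n", {rec[FIELD_NAME]})  -- treated as the string "A"
```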

I'm sure that if you ask the writer, they will provide you with that
information.

The second method also assumes you are the one storing the
data.  All you really do is add 256 to every non-negative integer just
before storing it, but leave your characters alone.  When you read the
data back, you simply test the number to see if it's an integer. If it
is, you test it again to see if it's equal to or over 256.  If it is,
you know it's an integer, not a character.  If it's under 256 (and not
negative), you know it's a character and not an integer.  In this way
you only have to store one number per integer (or character), so you
don't use any more storage space than you normally would.
To detect character strings fast, you simply follow one more simple
rule:

RULE #2:
you store character strings separately from integer sets like this:
to store the string "ABCDE"
  n="ABCDE"
but to store the set
  n={65,'B','C','D','E'}  --the number 65 followed by string "BCDE"
you actually store it as:
n={{65+256},"BCDE"}

That way you only have to test the first number of each sub sequence
to determine whether it is a character string or a set of
integers.  Note also that negative numbers go unchanged, as do
floating point numbers (see demo below).

Here are some functions to illustrate the idea, but you'll have to
expand on it to include character strings. (Shouldn't be too
hard.)

Note that one of these functions is implemented using a sort of
pseudo polymorphism. You always pass a sequence: if the sequence is
two elements long, it's taken to be a character, but if it's one
element long, a number. This is mainly because you don't have to
convert characters or character strings; they are always stored
exactly as they normally appear in a sequence. You do have to convert
all numbers though, because a non-negative integer has to be augmented
with 256 in order to detect that fact when reading the data back from
the database or whatever.
If you also follow rule #2 then you only have to test the first
element, as stated before.  If you don't follow rule #2 then you
really have to test every single element, which could get really
slow.

---------------------------------
with trace

trace(1)

sequence n,a
atom x
constant CHARACTER=0,NUMBER=1

function ConvertForStorage(sequence a)
  atom x

  x=a[1]
  if length(a)<2 then
    --of type NUMBER:
    if integer(x) then
      if x>=0 then
        x=x+256
        if integer(x) then
          return x
        else
          printf(1,"%s\n",{"Integer too large"})
          abort(1)--modify this to suit the application
        end if
      else
        return x
      end if
    else
      return x
    end if
  else
    --of type CHARACTER:
    --(don't really have to call this for characters,
    -- they always go unchanged)
    return x
  end if
end function

function ConvertBackToOriginal(atom x)
  if integer(x) then
    if x<0 then
      --it's a negative integer so just return it:
      return {NUMBER,x}
    elsif x>=256 then
      --it's a non-negative integer so subtract 256 to get the
      --original value:
      return {NUMBER,(x-256)}
    else
      --it's a character so don't subtract:
      return {CHARACTER,x}
    end if
  else
    --it's not an integer so just return it
    return {NUMBER,x}
  end if 
end function

function Number(sequence a)
  --quick test to determine read back type
  if a[1]=NUMBER then 
    return 1
  else
    return 0
  end if
end function

--this is what the test sequence will look like:
--  n={-65,65,321,65.1}  -- store -65, 'A', +65, and +65.1
--  note: 321=65+256

n=repeat(0,4)

x=ConvertForStorage({-65})  --note: pass one element long for numbers
n[1]=x

x=ConvertForStorage({'A',CHARACTER})--note:
                                    --pass two elements for a char
n[2]=x

x=ConvertForStorage({65})
n[3]=x

x=ConvertForStorage({65.1})
n[4]=x

for k=1 to length(n) do
  x=n[k]
  a=ConvertBackToOriginal(x)
  x=a[2]
  if Number(a) then
    ?x --print the number
  else
    printf(1,"%s\n",{x})  --print the character
  end if
end for

---------------------------------
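
Rule #2 from earlier can be sketched in the same style. This is only one possible way to do it, not a finished design: the names EncodeIntegerSet, DecodeIntegerSet and IsString are made up for this example, it assumes no sub sequence is empty, and it leaves out the overflow test that ConvertForStorage does.

```euphoria
-- sketch of rule #2; all function names here are hypothetical

-- add 256 to every non-negative integer in a set of numbers
function EncodeIntegerSet(sequence s)
  object e
  for i = 1 to length(s) do
    e = s[i]
    if integer(e) then
      if e >= 0 then
        s[i] = e + 256  -- no overflow test in this sketch
      end if
    end if
  end for
  return s
end function

-- undo EncodeIntegerSet: subtract 256 from every integer >= 256
function DecodeIntegerSet(sequence s)
  object e
  for i = 1 to length(s) do
    e = s[i]
    if integer(e) then
      if e >= 256 then
        s[i] = e - 256
      end if
    end if
  end for
  return s
end function

-- returns 1 if the sub sequence is a character string, 0 if it is
-- a set of numbers; only the first element is tested
function IsString(sequence s)
  object e
  e = s[1]
  if integer(e) then
    if e >= 0 then
      if e < 256 then
        return 1
      end if
    end if
  end if
  return 0
end function

sequence n
n = {EncodeIntegerSet({65}), "BCDE"}  -- same as {{65+256},"BCDE"}

for k = 1 to length(n) do
  if IsString(n[k]) then
    printf(1, "%s\n", {n[k]})  -- prints BCDE
  else
    ? DecodeIntegerSet(n[k])   -- prints {65}
  end if
end for
```

Because strings are stored unchanged and every non-negative integer in a set is shifted, the first element of a sub sequence is enough to tell the two apart.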

One last note:
the high end range of possible integers that can be stored is
effectively decreased by exactly 256.  This means the top of the range
drops from #3FFFFFFF to #3FFFFEFF (not much at all).
If you try to store a positive integer greater than #3FFFFEFF
you'll see an error printed on the screen just before the abort.
You can modify that to whatever you wish, but you really have to
include that test in the code somewhere to ensure you can
accurately detect the correct type when reading back the data,
because if the integer overflows into an atom it won't be detected as
an integer during read back and therefore won't get decreased by 256
back to the original number.  This of course compromises the
integrity of the stored data.
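
If you would rather test up front than abort, something like the following sketch could do it. The names MAX_INTEGER, MAX_STORABLE and CanStore are made up for this example, and the limit is hard-coded on the assumption of the 31-bit integer range given above.

```euphoria
-- sketch only: constant and function names are hypothetical
constant MAX_INTEGER  = #3FFFFFFF          -- top of the integer range assumed above
constant MAX_STORABLE = MAX_INTEGER - 256  -- = #3FFFFEFF

-- returns 1 if x can be stored and read back correctly, else 0
function CanStore(atom x)
  if integer(x) then
    if x > MAX_STORABLE then
      return 0  -- adding 256 would overflow into an atom
    end if
  end if
  return 1  -- negatives and non-integers are stored unchanged
end function

? CanStore(#3FFFFEFF)  -- 1
? CanStore(#3FFFFF00)  -- 0
? CanStore(-65)        -- 1
```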

Of course, as mentioned before, these methods slow down the code to
some degree.  Usually you can keep track of what is where without
resorting to these types of methods, except maybe in a database
program made to store arbitrary types of data.  In any case, you are
the only one who can decide what method is best for your application.

Good luck with it.
--Al
