OpenEuphoria: Forum: RE: Berkeley DB -- anyone care?

1. RE: Berkeley DB -- anyone care?

Posted by Andy Serpa <ac at onehorseshy.com> Jan 04, 2003
558 views

> Maybe. I could port it to Linux for you, if you want. (Most Linux distros,
> and free *nixes in general, iirc, come with the Berkeley DB library already
> installed, btw. At least mine did ...)
> 

It is not necessarily a trivial conversion -- all of the functions are 
"virtual" (and cdecl) and called by pointer via fptr.e.  I don't know 
how easily it could be done...


2. RE: Berkeley DB -- anyone care?

Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
514 views

> Does that mean the Berkeley DB will store strings in Eu form, with 32 bits
> per ASCII character?
> 
No. It stores everything as 8-bit bytes (including 0) -- it is "8-bit 
clean".  One character = one byte.  (EDS only uses 1 byte per character 
on disk -- it is sequences in memory that have extra overhead, right?) 
So anything other than that needs to be converted to a string of bytes 
for storage.

Berkeley also supports in-memory-only databases, but I haven't tried 
them.  One thing you might do is set a very large cache so that most or 
all of the database fits into memory, in Berkeley's format rather than 
Euphoria's (still about 25% extra for cache overhead, I believe).

(By the way, Kat, I'm working on a pure Euphoria in-memory db system 
that uses direct memory -- it should be able to hold much more in memory 
than you can in a sequence, although it will have to do a lot of peeking 
and poking.)

> > 
> > -- scale much better than EDS, files hundreds of MB large will see a 
> > slight slow down in update/insert speed (but MUCH less than EDS), and 
> > hardly any degradation in retrieval speed.
> > 
> > -- have a couple of not strictly EDS-compatible options that would allow 
> > it to go even faster, namely turning off support for logical record 
> > numbers (just access everything by key) & turning off "Euphorian-style" 
> > keys -- if you always use 1-dimensional character strings as your keys 
> > then the library can use its default sorting function instead of a 
> > Euphorian compare callback function in the wrapper.
> 
> Can it still store nested sequences as data?
>  

Yes.  I actually use the modified routines from EDS for converting a 
Euphoria object to or from a string of bytes -- I posted them to this 
list a while back under a thread called "Useful code stolen from Rob" -- 
I converted the routines to operate on sequences & objects in memory 
instead of reading/writing to disk as EDS does.  They are useful for all 
sorts of things actually (like my in-memory db project).

The usual table structure -- Btree -- stores items sorted by key, so 
keys need to be compared.  Character strings are already strings of 
8-bit bytes, so no conversion is necessary and the built-in lexical sort 
of the library can be used.  If you use nested sequences or atoms as 
keys, then you must use a callback function that converts the stored 
string of bytes (for two keys to be compared) to the proper Euphoria 
objects & compares them.  (Otherwise the keys will all be in the wrong 
order in the db).  Since a number of key comparisons need to be done 
every time you look up or insert an item, using a callback slows down 
these functions by a factor of 2 or 3.  Data items are only compared 
when you enable the function that allows duplicate items for a single 
key, but that functionality won't be in the EDS version anyway, and even 
in that case there are far fewer comparisons.  So there is no 
performance penalty for using Euphorian data items.
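
A small sketch of why the callback is needed, written in Python rather than Euphoria so it can be run standalone.  It assumes the integer encoding from obj_to_bytes.e in this thread (only the 1-byte and 2-byte cases), and the names encode_int, decode_int and compare_decoded are made up for illustration; this is not the Berkeley DB API.  Sorting the stored bytes lexically puts 300 before 240, while comparing the decoded values gives the expected order:

```python
from functools import cmp_to_key

# Tag values and ranges copied from the obj_to_bytes.e listing in this thread.
MIN1B, MAX1B = -9, 239
I2B = 249
MIN2B, MAX2B = -2**15, 2**15 - 1


def encode_int(x):
    """Encode an integer as stored bytes (1-byte and 2-byte cases only)."""
    if MIN1B <= x <= MAX1B:
        return bytes([x - MIN1B])
    if MIN2B <= x <= MAX2B:
        v = x - MIN2B
        return bytes([I2B, v & 0xFF, v >> 8])  # low byte first
    raise ValueError("outside the range handled by this sketch")


def decode_int(b):
    """Inverse of encode_int."""
    if b[0] <= 248:
        return b[0] + MIN1B
    if b[0] == I2B:
        return b[1] + 0x100 * b[2] + MIN2B
    raise ValueError("unknown tag")


def compare_decoded(a, b):
    """Stand-in for the comparison callback: decode both keys, then compare."""
    x, y = decode_int(a), decode_int(b)
    return (x > y) - (x < y)


keys = [300, 240, 5, -3]
stored = [encode_int(k) for k in keys]

# Default lexical (byte-wise) ordering of the stored keys.
bytewise = [decode_int(b) for b in sorted(stored)]
# Ordering when every comparison goes through the decoding callback.
via_callback = [decode_int(b) for b in sorted(stored, key=cmp_to_key(compare_decoded))]

print(bytewise)      # [-3, 5, 300, 240]
print(via_callback)  # [-3, 5, 240, 300]
```

Plain character strings do not have this problem, since their bytes already sort the way the strings do.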

Like I said, it will emulate EDS *exactly* (except for locking) in terms 
of what it can do -- but using only character strings as keys will give 
you a little performance boost.  (In addition to the major boost you're 
already gonna get just using the thing.)


3. RE: Berkeley DB -- anyone care?

Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
523 views

jbrown1050 at hotpop.com wrote:
> On Sat, Jan 04, 2003 at 07:41:43PM +0000, Andy Serpa wrote:
> > 
> > 
> > > Maybe. I could port it to Linux for you, if you want. (Most Linux distros,
> > > and free *nixes in general, iirc, come with the Berkeley DB library already
> > > installed, btw. At least mine did ...)
> > > 
> > 
> > It is not necessarily a trivial conversion -- all of the functions are 
> > "virtual" (and cdecl) and called by pointer via fptr.e.  I don't know 
> > how easily it could be done...
> > 
> 
> Hmm ... I wasn't aware that the Berkeley DB under Linux was cdecl ... but
> still, iirc fptr.e works under Linux ... I'd like to take a look at your code
> and try, at least.
> 
Well, I don't know really -- wouldn't cdecl be standard under Linux 
since stdcall was Microsoft's invention?  I have no idea really.


4. RE: Berkeley DB -- anyone care?

Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
521 views

jordah at btopenworld.com wrote:
> 
> 
> > Yes.  I actually use the modified routines from EDS for converting a
> > Euphoria object to or from a string of bytes -- I posted them to this
> > list a while back under a thread called "Useful code stolen from Rob" --
> > I converted the routines to operate on sequences & objects in memory
> > instead of reading/writing to disk as EDS does.  They are useful for all
> > sorts of things actually (like my in-memory db project).
> 
> Could u please benchmark them with my binary.e and see the outcome. One
> known problem of binary.e is that it handles doubles as float32 so there is
> a little inaccuracy. Also binary.e produces faster and more compressed
> objects than EDS does as far as i can remember.
> 

I guess you missed that post -- I actually suggested to you in there 
that these routines might go faster than the ones you used in your 
peek/poke objects to memory library.  I benchmarked the compress (which 
I renamed object_to_bytes) and found it slightly faster than yours in my 
limited testing.  But I didn't test the decompress/bytes_to_object 
because mine works on a sequence of bytes and yours peeks from memory, 
so it wasn't quite a fair test and I was too lazy to bother, so yours 
may be faster on the other end, I don't know.  The include file I'm 
using is below (remember, this is basically Rob's code -- from the 
"newer, faster version" of EDS):


obj_to_bytes.e
-----------------------------------------

include machine.e

constant
I2B = 249,   -- 2-byte signed integer follows
I3B = 250,   -- 3-byte signed integer follows
I4B = 251,   -- 4-byte signed integer follows
F4B = 252,   -- 4-byte f.p. number follows
F8B = 253,   -- 8-byte f.p. number follows
S1B = 254,   -- sequence with 1-byte length follows
S4B = 255    -- sequence with 4-byte length follows

constant
MIN1B = -9,
MAX1B = 239,
MIN2B = -power(2, 15),
MAX2B =  power(2, 15)-1,
MIN3B = -power(2, 23),
MAX3B =  power(2, 23)-1,
MIN4B = -power(2, 31)

integer spt
sequence cseq


-- turbo int_to_bytes() & bytes_to_int()
constant
ib_addr = allocate(4),
ib_peek = {ib_addr,4}

global function int2bytes(atom i)
    poke4(ib_addr, i)
    return peek(ib_peek)
end function

global function bytes2int(sequence b)
    poke(ib_addr, b)
    return peek4u(ib_addr)
end function


-- local recursive function does the work
function decomp()
    integer c
    sequence s
    atom len
    object result

    c = cseq[spt]
    spt += 1

    if c <= 248 then
        -- return small int
        return c + MIN1B
    end if

    if c = I2B then
        result = cseq[spt] + (#100 * cseq[spt+1]) + MIN2B
        spt += 2
        return result
    elsif c = I3B then
        result = cseq[spt] + (#100 * cseq[spt+1]) +
                 (#10000 * cseq[spt+2]) + MIN3B
        spt += 3
        return result
    elsif c = I4B then
        result = bytes2int(cseq[spt..spt+3]) + MIN4B
        spt += 4
        return result
    elsif c = F4B then
        result = float32_to_atom(cseq[spt..spt+3])
        spt += 4
        return result
    elsif c = F8B then
        result = float64_to_atom(cseq[spt..spt+7])
        spt += 8
        return result
    else
        -- sequence
        if c = S1B then
            len = cseq[spt]
            spt += 1
        else
            len = bytes2int(cseq[spt..spt+3])
            spt += 4
        end if
        s = repeat(0, len)
        for i = 1 to len do
            c = cseq[spt]
            if c <= 248 then
                spt += 1
                s[i] = c + MIN1B
            else
                -- do not advance pointer
                s[i] = decomp()
            end if
        end for
        return s
    end if
end function



-- Global functions

global function bytes_to_object(sequence s)
    object result
    spt = 1
    cseq = s
    result = decomp()
    cseq = {}
    return result
end function


global function object_to_bytes(object x)
-- return the compressed representation of a Euphoria object
-- as a sequence of bytes
    sequence x4, s
    if integer(x) then
        if x >= MIN1B and x <= MAX1B then
            return {x - MIN1B}

        elsif x >= MIN2B and x <= MAX2B then
            x -= MIN2B
            return {I2B, and_bits(x, #FF), floor(x / #100)}

        elsif x >= MIN3B and x <= MAX3B then
            x -= MIN3B
            return {I3B, and_bits(x, #FF),
                    and_bits(floor(x / #100), #FF), floor(x / #10000)}

        else
            return I4B & int2bytes(x-MIN4B)

        end if

    elsif atom(x) then
        -- floating point
        x4 = atom_to_float32(x)
        if x = float32_to_atom(x4) then
            -- can represent as 4-byte float
            return F4B & x4
        else
            return F8B & atom_to_float64(x)
        end if

    else
        -- sequence
        if length(x) <= 255 then
            s = {S1B, length(x)}
        else
            s = S4B & int2bytes(length(x))
        end if
        for i = 1 to length(x) do
            s &= object_to_bytes(x[i])
        end for
        return s
    end if
end function


--------------------------
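
To make the byte layout easier to see, here is a rough Python re-implementation of the same tag scheme (Python so it runs standalone; it is a sketch, not a replacement for the Euphoria file above).  Assumptions: Euphoria sequences are modeled as Python lists, Python ints stand in for Euphoria integers and are assumed to fit the 31-bit integer range, Python floats stand in for non-integer atoms, and bytes are little-endian as on x86.

```python
import struct

# Tag bytes and ranges copied from obj_to_bytes.e above.
I2B, I3B, I4B, F4B, F8B, S1B, S4B = 249, 250, 251, 252, 253, 254, 255
MIN1B, MAX1B = -9, 239
MIN2B, MAX2B = -2**15, 2**15 - 1
MIN3B, MAX3B = -2**23, 2**23 - 1
MIN4B = -2**31


def object_to_bytes(x):
    """Encode an int, float or (nested) list using the tag scheme above."""
    if isinstance(x, int):
        if MIN1B <= x <= MAX1B:
            return bytes([x - MIN1B])                # single byte, no tag
        if MIN2B <= x <= MAX2B:
            return bytes([I2B]) + (x - MIN2B).to_bytes(2, "little")
        if MIN3B <= x <= MAX3B:
            return bytes([I3B]) + (x - MIN3B).to_bytes(3, "little")
        return bytes([I4B]) + (x - MIN4B).to_bytes(4, "little")
    if isinstance(x, float):
        try:
            x4 = struct.pack("<f", x)
        except OverflowError:                        # too large for float32
            x4 = None
        if x4 is not None and struct.unpack("<f", x4)[0] == x:
            return bytes([F4B]) + x4                 # exact as a 4-byte float
        return bytes([F8B]) + struct.pack("<d", x)   # otherwise keep 8 bytes
    n = len(x)
    head = bytes([S1B, n]) if n <= 255 else bytes([S4B]) + n.to_bytes(4, "little")
    return head + b"".join(object_to_bytes(e) for e in x)


def _decode(data, pos):
    """Decode one object starting at pos; return (object, next position)."""
    c = data[pos]
    pos += 1
    if c <= 248:
        return c + MIN1B, pos
    if c == I2B:
        return int.from_bytes(data[pos:pos + 2], "little") + MIN2B, pos + 2
    if c == I3B:
        return int.from_bytes(data[pos:pos + 3], "little") + MIN3B, pos + 3
    if c == I4B:
        return int.from_bytes(data[pos:pos + 4], "little") + MIN4B, pos + 4
    if c == F4B:
        return struct.unpack("<f", data[pos:pos + 4])[0], pos + 4
    if c == F8B:
        return struct.unpack("<d", data[pos:pos + 8])[0], pos + 8
    if c == S1B:
        n = data[pos]
        pos += 1
    else:  # S4B
        n = int.from_bytes(data[pos:pos + 4], "little")
        pos += 4
    items = []
    for _ in range(n):
        item, pos = _decode(data, pos)
        items.append(item)
    return items, pos


def bytes_to_object(data):
    """Decode a byte string produced by object_to_bytes."""
    obj, _ = _decode(data, 0)
    return obj


encoded = object_to_bytes([1, 2])   # bytes 254, 2, 10, 11
sample = [list(b"key"), 1000, -70000, 0.5, 0.1, [1, [2, 3]]]
restored = bytes_to_object(object_to_bytes(sample))
```

Note how a character string costs one byte per character plus a short header, and how 0.5 fits in the 4-byte float form while 0.1 has to fall back to 8 bytes.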


5. RE: Berkeley DB -- anyone care?

Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
538 views

Robert Craig wrote:
> Andy Serpa writes:
> > Well, I don't know really -- wouldn't cdecl be standard under Linux 
> > since stdcall was Microsoft's invention?  I have no idea really. 
> 
> I believe the terms "cdecl" and "stdcall" are
> purely Windows terminology. On Linux there
> seems to be just one calling convention,
> probably different from either cdecl or stdcall.
> I don't know if it even has a name.
> It's insane that Windows has two conventions.
> 

Excellent.  Linux wins again.


6. RE: Berkeley DB -- anyone care?

Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Jan 06, 2003
545 views

> From: Robert Craig [mailto:rds at RapidEuphoria.com]

> I believe the terms "cdecl" and "stdcall" are
> purely Windows terminology. On Linux there
> seems to be just one calling convention,
> probably different from either cdecl or stdcall.
> I don't know if it even has a name.
> It's insane that Windows has two conventions.

Don't forget 'thiscall'.  That's the default calling convention for C++
member functions without variable arguments.  I ran into this trying to wrap
wxWindows.  Mostly got it solved ('this' gets stored in ecx).

Matt Lewis


7. RE: Berkeley DB -- anyone care?

Posted by Andy Serpa <ac at onehorseshy.com> Jan 06, 2003
561 views

jordah at btopenworld.com wrote:
> The code in binary.e might seem slower because of the overhead involved in
> call_proc(). The original routine was clearly tested on #euphoria and
> clearly faster producing more compressed objects.
> 

In my test, the code I'm using was faster but did produce slightly 
bigger objects -- I could try it again.  In practice it would make no 
difference as they were very close.  The compress/decompress stuff is no 
bottleneck except when used for key comparison, where it is called a 
huge number of times, and even then the difference between one routine & 
the other wouldn't matter (2 seconds over 100,000 calls or something) -- 
the time is lost in just calling the routine, period, peeking the 
object, etc.

If your code doesn't preserve accuracy the point is moot, because that 
would make it unsuitable for a database anyway...
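
A quick illustration of the accuracy point, in Python so it can be run standalone (the helper name via_float32 is made up for this sketch).  It pushes a double through a 4-byte float and back; only values that are exactly representable in 32 bits survive unchanged, which is the same check obj_to_bytes.e makes before choosing the 4-byte form:

```python
import struct


def via_float32(x):
    """Round-trip a double through a 4-byte IEEE float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


exact = via_float32(0.5) == 0.5   # True: 0.5 is exactly representable
lossy = via_float32(0.1) == 0.1   # False: 0.1 comes back slightly changed
print(exact, lossy)
```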

-- Andy
