1. Berkeley DB -- anyone care?

Hello,

Some of you may remember that a while back I was working on a wrapper 
for the Berkeley DB library (www.sleepycat.com).  I'm still working on 
it.

When I started working on it, I thought I would first make a version 
that would mirror the API & functionality of EDS (since Berkeley also 
stores things in key/data pairs -- it is a very similar system) and 
submit it to the archive as a super high-speed, more scalable version of 
EDS.  I did that part almost to completion, but while I was tweaking the 
error-reporting system and debugging, I also started adding some of the 
extra features that Berkeley offers that I wanted to use as part of the 
personal project that motivated me to wrap it in the first place.  And 
then I actually started building my own database, adding more features, 
debugging, rethinking the API (I made an API on top of the native 
Berkeley API, which is fairly low-level), etc.

Anyway, I've got quite an accomplished little system now, not quite 
ready for prime time, but I thought maybe it was about time I finished 
up the EDS clone version and at least released that.  That is, if anyone 
is interested.

The lite, EDS version would:

-- only work for Windows, because the Berkeley lib is a Windows .dll 
(although the source is cross-platform -- if someone wants to try a 
Linux version, it may be possible)

-- share the exact same functionality and API of EDS with the following 
exceptions:  locking is currently ignored, and there is no 
"db_compress()" function.  I use "bdb_" instead of "db_" in front of all 
the functions.  The key & data items can be absolutely any string of 
bytes, as in EDS.

-- any existing .edb database could be converted to a .bdb database in 
just a few minutes, and any existing EDS-based program could be 
converted to a Berkeley DB-based program with a simple search and 
replace "db_" -> "bdb_" and a different include file, of course.  
(Unless you depend on locking or db_compress() -- in any case, quick 
conversion.) You also have to add a line to manually set a memory cache 
size for your database environment -- this allows the wrapper to turn 
off system buffering and gives a major performance boost.

-- would go a hell of a lot faster than EDS, except possibly for very 
small db's.

-- scale much better than EDS: files hundreds of MB in size will see a 
slight slowdown in update/insert speed (but MUCH less than EDS), and 
hardly any degradation in retrieval speed.

-- have a couple of not strictly EDS-compatible options that would allow 
it to go even faster, namely turning off support for logical record 
numbers (just access everything by key) & turning off "Euphorian-style" 
keys -- if you always use 1-dimensional character strings as your keys 
then the library can use its default sorting function instead of a 
Euphorian compare callback function in the wrapper.

-- the library is open-source, so anything you make publicly available 
would be bound by the Berkeley license, which means your program source 
must be available.  You can buy a commercial license for making 
commercial programs.
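
A rough illustration of the "Euphorian-style" keys point above (a sketch for this thread, not code from the wrapper). Berkeley DB's default key comparison is lexical, byte by byte, with shorter keys collating first, which agrees with compare() on plain character strings. Once a key is an arbitrary Euphoria object turned into bytes, byte order and value order can disagree, so a compare callback is needed. Here atom_to_float64() from machine.e stands in for the serialization (an assumption made for illustration), and a little-endian x86 machine is assumed.

```euphoria
include machine.e

sequence k1, k2, b1, b2

-- Plain 1-dimensional character strings are already byte strings,
-- so compare() and a byte-wise lexical sort give the same order.
k1 = "apple"
k2 = "banana"
printf(1, "strings: compare(k1, k2) = %d\n", {compare(k1, k2)})

-- Numeric keys turned into bytes do not sort like the numbers themselves.
-- atom_to_float64() is only a stand-in serialization (assumption),
-- and the byte layout shown is the little-endian one.
b1 = atom_to_float64(1)   -- {0,0,0,0,0,0,240,63}
b2 = atom_to_float64(2)   -- {0,0,0,0,0,0,0,64}
printf(1, "values : compare(1, 2)   = %d\n", {compare(1, 2)})
printf(1, "bytes  : compare(b1, b2) = %d\n", {compare(b1, b2)})
```

On such a machine the values compare as -1 but their byte forms compare as 1, which is the mismatch a Euphoria compare callback would correct.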


So, anyone interested?

new topic     » topic index » view message » categorize

2. Re: Berkeley DB -- anyone care?

Yes, I'd like to see it (mostly because I have a very small vested 
interest in it ;) )





3. Re: Berkeley DB -- anyone care?

On Sat, Jan 04, 2003 at 03:20:17PM +0000, Andy Serpa wrote:
> 
> Hello,
> 
> Some of you may remember that a while back I was working on a wrapper 
> for the Berkeley DB library (www.sleepycat.com).  I'm still working on 
> it.
> 
<snip>
> The lite, EDS version would:
> 
> -- only work for Windows, because the Berkeley lib is a Windows .dll 
> (although the source is cross-platform -- if someone wants to try a 
> Linux version, it may be possible)
> 
<snip>
> 
> So, anyone interested?
> 

Maybe. I could port it to Linux for you, if you want. (Most Linux distros,
and free *nixes in general, iirc, come with the Berkeley DB library already
installed, btw. At least mine did ...)

jbrown

> 
> 
> TOPICA - Start your own email discussion group. FREE!

-- 
 /"\  ASCII ribbon
 \ /  campaign against
  X   HTML e-mail and
 / \  news


4. Re: Berkeley DB -- anyone care?

On 4 Jan 2003, at 15:20, Andy Serpa wrote:

> 
> Hello,
> 
> Some of you may remember that a while back I was working on a wrapper 
> for the Berkeley DB library (www.sleepycat.com).  I'm still working on 
> it.

I remember!

> When I started working on it, I thought I would first make a version 
> that would mirror the API & functionality of EDS (since Berkeley also 
> stores things in key/data pairs -- it is a very similar system) and 
> submit it to the archive as a super high-speed, more scalable version of 
> EDS.  I did that part almost to completion, but while I was tweaking the 
> error-reporting system and debugging, I also started adding some of the 
> extra features that Berkeley offers that I wanted to use as part of the 
> personal project that motivated me to wrap it in the first place.  And 
> then I actually started building my own database, adding more features, 
> debugging, rethinking the API (I made an API on top of the native 
> Berkeley API, which is fairly low-level), etc.
> 
> Anyway, I've got quite an accomplished little system now, not quite 
> ready for prime time, but I thought maybe it was about time I finished 
> up the EDS clone version and at least released that.  That is, if anyone 
> is interested.

Color me "interested".
 
> The lite, EDS version would:
> 
> -- only work for Windows, because the Berkeley lib is a Windows .dll 
> (although the source is cross-platform -- if someone wants to try a 
> Linux version, it may be possible)
> 
> -- share the exact same functionality and API of EDS with the following 
> exceptions:  locking is currently ignored, and there is no 
> "db_compress()" function.  

Does that mean the Berkeley DB will store strings in Eu form, with 32 bits per 
ASCII character?

> I use "bdb_" instead of "db_" in front of all 
> the functions.  The key & data items can be absolutely any string of 
> bytes, as in EDS.
> 
> -- any existing .edb database could be converted to a .bdb database in 
> just a few minutes, and any existing EDS-based program could be 
> converted to a Berkeley DB-based program with a simple search and 
> replace "db_" -> "bdb_" and a different include file, of course.  
> (Unless you depend on locking or db_compress() -- in any case, quick 
> conversion.) You also have to add a line to manually set a memory cache 
> size for your database environment -- this allows the wrapper to turn 
> off system buffering and gives a major performance boost.
> 
> -- would go a hell of a lot faster than EDS, except possibly for very 
> small db's.
> 
> -- scale much better than EDS: files hundreds of MB in size will see a 
> slight slowdown in update/insert speed (but MUCH less than EDS), and 
> hardly any degradation in retrieval speed.
> 
> -- have a couple of not strictly EDS-compatible options that would allow 
> it to go even faster, namely turning off support for logical record 
> numbers (just access everything by key) & turning off "Euphorian-style" 
> keys -- if you always use 1-dimensional character strings as your keys 
> then the library can use its default sorting function instead of a 
> Euphorian compare callback function in the wrapper.

Can it still store nested sequences as data?
 
> -- the library is open-source, so anything you make publicly available 
> would be bound by the Berkeley license, which means your program source 
> must be available.  You can buy a commercial license for making 
> commercial programs.
> 
> 
> So, anyone interested?

Me.

Kat


5. Re: Berkeley DB -- anyone care?

On Sat, Jan 04, 2003 at 07:41:43PM +0000, Andy Serpa wrote:
> 
> 
> > Maybe. I could port it to Linux for you, if you want. (Most Linux distros,
> > and free *nixes in general, iirc, come with the Berkeley DB library already
> > installed, btw. At least mine did ...)
> > 
> 
> It is not necessarily a trivial conversion -- all of the functions are 
> "virtual" (and cdecl) and called by pointer via fptr.e.  I don't know 
> how easily it could be done...
> 

Hmm ... I wasn't aware that the Berkeley DB under Linux was cdecl ... but still,
iirc fptr.e works under Linux ... I'd like to take a look at your code and try,
at least.

jbrown


-- 
 /"\  ASCII ribbon
 \ /  campaign against
  X   HTML e-mail and
 / \  news


6. Re: Berkeley DB -- anyone care?

Andy Serpa writes:
> Well, I don't know really -- wouldn't cdecl be standard under Linux 
> since stdcall was Microsoft's invention?  I have no idea really. 

I believe the terms "cdecl" and "stdcall" are
purely Windows terminology. On Linux there
seems to be just one calling convention,
probably different from either cdecl or stdcall.
I don't know if it even has a name.
It's insane that Windows has two conventions.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


7. Re: Berkeley DB -- anyone care?

> Yes.  I actually use the modified routines from EDS for converting a
> Euphoria object to or from a string of bytes -- I posted them to this
> list a while back under a thread called "Useful code stolen from Rob" --
> I converted the routines to operate on sequences & objects in memory
> instead of reading/writing to disk as EDS does.  They are useful for all
> sorts of things actually (like my in-memory db project).

Could you please benchmark them against my binary.e and see the outcome? One
known problem of binary.e is that it handles doubles as float32, so there is
a very small inaccuracy. Also, binary.e runs faster and produces more compressed
objects than EDS does, as far as I can remember.

Jordah


8. Re: Berkeley DB -- anyone care?

On Sun, Jan 05, 2003 at 01:02:33AM -0500, Robert Craig wrote:
> 
> Andy Serpa writes:
> > Well, I don't know really -- wouldn't cdecl be standard under Linux 
> > since stdcall was Microsoft's invention?  I have no idea really. 
> 
> I believe the terms "cdecl" and "stdcall" are
> purely Windows terminology. On Linux there
> seems to be just one calling convention,
> probably different from either cdecl or stdcall.
> I don't know if it even has a name.
> It's insane that Windows has two conventions.
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
> 

Turns out it's a bit more complicated than that, thanks to multiple executable
formats under Linux, but since the interpreter is an ELF binary, there is only
the ELF dynamic library format to worry about - i.e., wrapping the db
in Linux should not present any major difficulties.


-- 
 /"\  ASCII ribbon
 \ /  campaign against
  X   HTML e-mail and
 / \  news


9. Re: Berkeley DB -- anyone care?

On Sun, Jan 05, 2003 at 04:02:24PM +0000, Andy Serpa wrote:
> 
> 
> Robert Craig wrote:
> > Andy Serpa writes:
> > > Well, I don't know really -- wouldn't cdecl be standard under Linux 
> > > since stdcall was Microsoft's invention?  I have no idea really. 
> > 
> > I believe the terms "cdecl" and "stdcall" are
> > purely Windows terminology. On Linux there
> > seems to be just one calling convention,
> > probably different from either cdecl or stdcall.
> > I don't know if it even has a name.
> > It's insane that Windows has two conventions.
> > 
> 
> Excellent.  Linux wins again.

Too bad we can't win 30% of the desktop market so we could convince hardware
makers to provide us with Linux drivers. :[


-- 
 /"\  ASCII ribbon
 \ /  campaign against
  X   HTML e-mail and
 / \  news


10. Re: Berkeley DB -- anyone care?

The code in binary.e might seem slower because of the overhead involved in
call_proc(). The original routine was tested on #euphoria and was
clearly faster, producing more compressed objects.
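
To get a feel for the call overhead mentioned here, a minimal timing sketch follows. It is not taken from binary.e: add_one() is a made-up routine, the iteration count is arbitrary, and call_func() is used instead of call_proc() only because the routine returns a value (both go through routine_id()). The timings depend entirely on the machine and interpreter version, so none are quoted.

```euphoria
-- Hypothetical routine used only for timing.
function add_one(integer x)
    return x + 1
end function

constant ITERATIONS = 100000
constant ADD_ONE_ID = routine_id("add_one")

atom t0, direct_time, indirect_time
integer r

-- Direct calls
r = 0
t0 = time()
for i = 1 to ITERATIONS do
    r = add_one(r)
end for
direct_time = time() - t0

-- The same work routed through call_func()
r = 0
t0 = time()
for i = 1 to ITERATIONS do
    r = call_func(ADD_ONE_ID, {r})
end for
indirect_time = time() - t0

printf(1, "direct: %.2f s   call_func: %.2f s\n", {direct_time, indirect_time})
```

With a routine body this small, the difference between the two loops is mostly the cost of the indirect call itself; time() has a coarse resolution, so a larger ITERATIONS value may be needed to see it.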

     My hard disk got smoked, so I lost everything I had been building. Hopefully
Mario Steele still has the original routines that were benchmarked.

Jordah
----- Original Message -----
From: "Andy Serpa" <ac at onehorseshy.com>
To: "EUforum" <EUforum at topica.com>
Subject: RE: Berkeley DB -- anyone care?


>
>
> jordah at btopenworld.com wrote:
> >
> >
> > > Yes.  I actually use the modified routines from EDS for converting a
> > > Euphoria object to or from a string of bytes -- I posted them to this
> > > list a while back under a thread called "Useful code stolen from Rob" --
> > > I converted the routines to operate on sequences & objects in memory
> > > instead of reading/writing to disk as EDS does.  They are useful for all
> > > sorts of things actually (like my in-memory db project).
> >
> > Could you please benchmark them against my binary.e and see the outcome? One
> > known problem of binary.e is that it handles doubles as float32, so there is
> > a very small inaccuracy. Also, binary.e runs faster and produces more
> > compressed objects than EDS does, as far as I can remember.
> >
>
> I guess you missed that post -- I actually suggested to you in there
> that these routines might go faster than the ones you used in your
> peek/poke objects to memory library.  I benchmarked the compress (which
> I renamed object_to_bytes) and found it slightly faster than yours on my
> limited tests.  But I didn't test the decompress/bytes_to_object because mine
> works on a sequence of bytes and yours peeks from memory, so it wasn't
> quite a fair test and I was too lazy to bother, so yours may be faster on
> the other end, I don't know.  The include file I'm using is below
> (remember, this is basically Rob's code -- from the "newer, faster
> version" of EDS):
>
>
> obj_to_bytes.e
> -----------------------------------------
>
> include machine.e
>
> constant
> I2B = 249,   -- 2-byte signed integer follows
> I3B = 250,   -- 3-byte signed integer follows
> I4B = 251,   -- 4-byte signed integer follows
> F4B = 252,   -- 4-byte f.p. number follows
> F8B = 253,   -- 8-byte f.p. number follows
> S1B = 254,   -- sequence, 1-byte length follows, then elements
> S4B = 255    -- sequence, 4-byte length follows, then elements
>
> constant
> MIN1B = -9,
> MAX1B = 239,
> MIN2B = -power(2, 15),
> MAX2B =  power(2, 15)-1,
> MIN3B = -power(2, 23),
> MAX3B =  power(2, 23)-1,
> MIN4B = -power(2, 31)
>
> integer spt
> sequence cseq
>
>
> -- turbo int_to_bytes() & bytes_to_int()
> constant
> ib_addr = allocate(4),
> ib_peek = {ib_addr,4}
>
> global function int2bytes(atom i)
>     poke4(ib_addr, i)
>     return peek(ib_peek)
> end function
>
> global function bytes2int(sequence b)
>     poke(ib_addr, b)
>     return peek4u(ib_addr)
> end function
>
>
> -- local recursive function does the work
> function decomp()
>     integer c
>     sequence s
>     atom len
>     object result
>
>     c = cseq[spt]
>     spt += 1
>
>     if c <= 248 then
>         -- return small int
>         return c + MIN1B
>     end if
>
>     if c = I2B then
>         result = cseq[spt] + (#100 * cseq[spt+1]) + MIN2B
>         spt += 2
>         return result
>     elsif c = I3B then
>         result = cseq[spt] + (#100 * cseq[spt+1]) + (#10000 * cseq[spt+2]) + MIN3B
>         spt += 3
>         return result
>     elsif c = I4B then
>         result = bytes2int(cseq[spt..spt+3]) + MIN4B
>         spt += 4
>         return result
>     elsif c = F4B then
>         result = float32_to_atom(cseq[spt..spt+3])
>         spt += 4
>         return result
>     elsif c = F8B then
>         result = float64_to_atom(cseq[spt..spt+7])
>         spt += 8
<snip>
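
The quoted listing shows only the decoding side. As a rough sketch of the matching encoder for the integer tags, the following is consistent with the constants and the first three branches of decomp() above, but it is an illustration rather than the snipped original: encode_int() and decode_int() are names made up here, and the 4-byte (I4B) case is left out.

```euphoria
constant
    I2B = 249,   -- 2-byte signed integer follows
    I3B = 250,   -- 3-byte signed integer follows
    MIN1B = -9,
    MAX1B = 239,
    MIN2B = -power(2, 15),
    MAX2B =  power(2, 15) - 1,
    MIN3B = -power(2, 23),
    MAX3B =  power(2, 23) - 1

-- Hypothetical encoder: smallest form that fits, low byte first.
function encode_int(integer x)
    if x >= MIN1B and x <= MAX1B then
        return {x - MIN1B}                 -- single byte, 0..248
    elsif x >= MIN2B and x <= MAX2B then
        x -= MIN2B                         -- shift into 0..65535
        return {I2B, and_bits(x, #FF), floor(x / #100)}
    elsif x >= MIN3B and x <= MAX3B then
        x -= MIN3B                         -- shift into 0..16777215
        return {I3B, and_bits(x, #FF), and_bits(floor(x / #100), #FF),
                floor(x / #10000)}
    end if
    return {}   -- larger values would need the I4B form (omitted here)
end function

-- Hypothetical decoder for the three forms produced above.
function decode_int(sequence b)
    if b[1] <= 248 then
        return b[1] + MIN1B
    elsif b[1] = I2B then
        return b[2] + #100 * b[3] + MIN2B
    else  -- I3B
        return b[2] + #100 * b[3] + #10000 * b[4] + MIN3B
    end if
end function

constant samples = {-9, 0, 239, 240, -10, 32767, 32768, -8388608}
for i = 1 to length(samples) do
    if decode_int(encode_int(samples[i])) != samples[i] then
        printf(1, "round trip failed for %d\n", {samples[i]})
    end if
end for
? encode_int(240)   -- prints {249,240,128}
```

Small integers from -9 to 239 cost a single byte, which is where most of the space saving comes from when keys and data are ordinary character strings.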

>
>


11. Re: Berkeley DB -- anyone care?

Hi Andy,

    It produces the usual floating-point inaccuracy, which is very, very
small. I use atom_to_float32()/float32_to_atom() rather than
atom_to_float64(). Thus every floating-point number takes 4 bytes rather
than 8, producing more compressed objects.
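
A small sketch of the trade-off described here, using the machine.e conversion routines. The value 0.1 is just an example, and pack_atom() is a hypothetical helper (not part of binary.e or EDS as posted in this thread) showing one way to keep the 4-byte saving without giving up accuracy.

```euphoria
include machine.e

-- Hypothetical helper: use 4 bytes only when the value survives the
-- float32 round trip unchanged, otherwise fall back to 8 bytes.
function pack_atom(atom a)
    sequence f32
    f32 = atom_to_float32(a)
    if float32_to_atom(f32) = a then
        return f32
    end if
    return atom_to_float64(a)
end function

atom original, via32, via64

original = 0.1
via32 = float32_to_atom(atom_to_float32(original))
via64 = float64_to_atom(atom_to_float64(original))

printf(1, "error via float32: %g\n", {via32 - original})
printf(1, "error via float64: %g\n", {via64 - original})

printf(1, "bytes used for 0.5: %d\n", {length(pack_atom(0.5))})
printf(1, "bytes used for 0.1: %d\n", {length(pack_atom(0.1))})
```

The float32 round trip changes 0.1 slightly while the float64 round trip returns it unchanged, and pack_atom() spends 8 bytes only on values such as 0.1 that need them.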

Please keep giving us updates on this DB and how it's going. I know I'm
going to need it.

Jordah
----- Original Message -----
From: "Andy Serpa" <ac at onehorseshy.com>
To: "EUforum" <EUforum at topica.com>
Sent: Monday, January 06, 2003 7:08 PM
Subject: RE: Berkeley DB -- anyone care?


>
>
> jordah at btopenworld.com wrote:
> > The code in binary.e might seem slower because of the overhead involved
> > in call_proc(). The original routine was clearly tested on #euphoria and
> > clearly faster producing more compressed objects.
> >
>
> In my test, the code I'm using was faster but did produce slightly
> bigger objects -- I could try it again.  In practice it would make no
> difference as they were very close.  The compress/decompress stuff is no
> bottleneck except when used for key comparison when it is called a huge
> number of times, and even then the difference between one routine & the
> other wouldn't make any difference (2 seconds over 100,000 calls or
> something) -- the time is lost in just calling the routine period,
> peeking the object, etc.
>
> If your code doesn't preserve accuracy the point is moot, because that
> would make it unsuitable for a database anyway...
>
> -- Andy

