RE: Berkeley DB -- anyone care?


> Does that mean the Berkeley db will store strings in Eu form, with 32 bits
> per ASCII character?
>
No. It stores everything as 8-bit bytes (including 0) -- it is "8-bit 
clean".  One character = one byte.  (EDS only uses 1 byte per character 
on disk -- it is sequences in memory that have extra overhead, right?) 
So anything other than a character string needs to be converted to a
string of bytes for storage.
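
To make the size difference concrete, here is a rough sketch in Python (not Euphoria, and not the wrapper's code). It assumes a string held as a list of character codes, and compares one byte per character against a fixed 32-bit slot per character. The names `text`, `as_bytes` and `as_words` are made up for the illustration.

```python
import struct

# A string as a list of character codes, including an embedded zero byte.
text = [ord(c) for c in "key\0one"]

# "8-bit clean" storage: one byte per character, zero bytes are kept as-is.
# bytes() raises ValueError for any code outside 0..255, which is why
# anything that is not a plain character string needs converting first.
as_bytes = bytes(text)

# For comparison: the same characters in fixed 32-bit slots.
as_words = struct.pack("<%di" % len(text), *text)

print(len(as_bytes), len(as_words))
```

The byte form round-trips exactly (`list(as_bytes) == text`), embedded zero included, and is a quarter of the size of the 32-bit form.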

Berkeley also supports in-memory-only databases, but I haven't tried 
them.  One thing you might do is set a very large cache so that most or 
all of the database fits in memory, though in Berkeley's format, not 
Euphoria's (still about 25% extra for cache overhead, I believe).

(By the way, Kat, I'm working on a pure Euphoria in-memory db system 
that uses direct memory -- should be able to hold much more in memory 
than you can in a sequence, although it will have to do a lot of peeking 
and poking.)

> > 
> > -- scale much better than EDS, files hundreds of MB large will see a 
> > slight slow down in update/insert speed (but MUCH less than EDS), and 
> > hardly any degradation in retrieval speed.
> > 
> > -- have a couple of not strictly EDS-compatible options that would allow 
> > it to go even faster, namely turning off support for logical record 
> > numbers (just access everything by key) & turning off "Euphorian-style" 
> > keys -- if you always use 1-dimensional character strings as your keys 
> > then the library can use its default sorting function instead of a 
> > Euphorian compare callback function in the wrapper.
> 
> Can it still store nested sequences as data?
>  

Yes.  I actually use the modified routines from EDS for converting a 
Euphoria object to or from a string of bytes -- I posted them to this 
list a while back under a thread called "Useful code stolen from Rob" -- 
I converted the routines to operate on sequences & objects in memory 
instead of reading/writing to disk as EDS does.  They are useful for all 
sorts of things actually (like my in-memory db project).
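
For anyone who hasn't seen that kind of routine, here is a minimal sketch of the idea in Python. It is not the EDS format and not the code I posted: the tag bytes (`I`, `F`, `S`) and the names `to_bytes` / `from_bytes` are invented for the illustration. A nested list stands in for a Euphoria sequence.

```python
import struct

def to_bytes(obj):
    """Serialize an int, float, or (nested) list into a byte string."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not supported in this sketch")
    if isinstance(obj, int):
        return b"I" + struct.pack(">q", obj)      # 8-byte signed integer
    if isinstance(obj, float):
        return b"F" + struct.pack(">d", obj)      # 8-byte double
    if isinstance(obj, list):
        body = b"".join(to_bytes(item) for item in obj)
        return b"S" + struct.pack(">I", len(obj)) + body  # element count, then elements
    raise TypeError("unsupported type: " + type(obj).__name__)

def _decode(buf, pos):
    """Decode one object starting at pos; return (object, next position)."""
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b"I":
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if tag == b"F":
        return struct.unpack_from(">d", buf, pos)[0], pos + 8
    if tag == b"S":
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(count):
            item, pos = _decode(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError("bad tag %r at offset %d" % (tag, pos - 1))

def from_bytes(buf):
    """Rebuild the object that to_bytes() produced."""
    obj, pos = _decode(buf, 0)
    if pos != len(buf):
        raise ValueError("trailing bytes after object")
    return obj
```

Usage: `from_bytes(to_bytes([1, [2, 3.5], []]))` gives back the same nested list. Because everything happens on byte strings in memory rather than on a file, the result can be handed to any store that takes raw bytes.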

The usual table structure -- Btree -- stores items sorted by key, so 
keys need to be compared.  Character strings are already strings of 
8-bit bytes, so no conversion is necessary and the library's built-in 
lexical sort can be used.  If you use nested sequences or atoms as 
keys, then you must use a callback function that converts the stored 
strings of bytes (for the two keys being compared) back to the proper 
Euphoria objects & compares them.  (Otherwise the keys will all be in 
the wrong order in the db.)  Since a number of key comparisons need to 
be done every time you look up or insert an item, using a callback 
slows down these functions by a factor of 2 or 3.  Data items are only 
compared when you enable the function that allows duplicate items for a 
single key, but that functionality won't be in the EDS version anyway, 
and even in that case there are far fewer comparisons.  So there is no 
performance penalty for using Euphorian data items.
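
Here is a small Python sketch of why the callback is needed. The key encoding (a 4-byte little-endian integer) and the names `pack_key`, `unpack_key` and `compare_decoded` are assumptions made up for the example, not what the wrapper does. Python's `sorted` stands in for the Btree's ordering.

```python
import struct
from functools import cmp_to_key

def pack_key(n):
    # Hypothetical key encoding: 4-byte little-endian signed integer.
    return struct.pack("<i", n)

def unpack_key(raw):
    return struct.unpack("<i", raw)[0]

def compare_decoded(a, b):
    """Comparison callback: decode both stored keys, then compare the values."""
    x, y = unpack_key(a), unpack_key(b)
    return (x > y) - (x < y)

numbers = [256, 1, -1, 2]
stored = [pack_key(n) for n in numbers]

# Plain byte-wise (lexical) ordering of the stored keys.
bytewise = [unpack_key(k) for k in sorted(stored)]

# Ordering with the decoding callback.
with_callback = [unpack_key(k)
                 for k in sorted(stored, key=cmp_to_key(compare_decoded))]

# Character-string keys are already bytes, so lexical order is the right order.
string_keys = sorted([b"pear", b"apple", b"fig"])

print(bytewise)       # [256, 1, 2, -1]
print(with_callback)  # [-1, 1, 2, 256]
print(string_keys)    # [b'apple', b'fig', b'pear']
```

The byte-wise order of the encoded numbers is wrong, while the callback gets it right at the cost of two decodes per comparison. That decode cost is where the slowdown comes from, and it is why plain string keys can skip the callback entirely.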

Like I said, it will emulate EDS *exactly* (except for locking) in terms 
of what it can do -- but using only character strings as keys will give 
you a little performance boost.  (In addition to the major boost you're 
already gonna get just using the thing.)
