RE: Berkeley DB -- anyone care?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
> Does that mean the Berkeley db will store strings in Eu form, with 32 bits
> per ASCII character?

No. It stores everything as 8-bit bytes (including 0) -- it is "8-bit clean". One character = one byte. (EDS only uses 1 byte per character on disk -- it is sequences in memory that have the extra overhead, right?) So anything other than a character string needs to be converted to a string of bytes for storage. Berkeley also supports in-memory-only databases, but I haven't tried them. One thing you might do is set a very large cache so that most or all of the database fits into memory, though in Berkeley's form, not Euphoria's (still about 25% extra for cache overhead, I believe). (By the way, Kat, I'm working on a pure Euphoria in-memory db system that uses direct memory -- it should be able to hold much more in memory than you can in a sequence, although it will have to do a lot of peeking and poking.)

> > -- scale much better than EDS, files hundreds of MB large will see a
> > slight slow down in update/insert speed (but MUCH less than EDS), and
> > hardly any degradation in retrieval speed.
> >
> > -- have a couple of not strictly EDS-compatible options that would allow
> > it to go even faster, namely turning off support for logical record
> > numbers (just access everything by key) & turning off "Euphorian-style"
> > keys -- if you always use 1-dimensional character strings as your keys
> > then the library can use its default sorting function instead of a
> > Euphorian compare callback function in the wrapper.
>
> Can it still store nested sequences as data?

Yes. I actually use modified versions of the EDS routines for converting a Euphoria object to or from a string of bytes -- I posted them to this list a while back under a thread called "Useful code stolen from Rob". I converted the routines to operate on sequences & objects in memory instead of reading/writing to disk as EDS does. They are useful for all sorts of things, actually (like my in-memory db project).
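To make the "object to or from a string of bytes" idea concrete, here is a rough sketch in Python rather than Euphoria. The tag values and layout are made up for illustration and are not the actual EDS format or the routines mentioned above; a Euphoria sequence is stood in for by a Python list, and a character string by a list of character codes.

```python
import struct

# Hypothetical one-byte type tags; NOT the real EDS encoding.
TAG_INT, TAG_FLOAT, TAG_SEQ = 0, 1, 2

def to_bytes(obj):
    """Flatten an int, float, or nested list into a byte string."""
    if isinstance(obj, int):
        return bytes([TAG_INT]) + struct.pack("<i", obj)    # 32-bit signed
    if isinstance(obj, float):
        return bytes([TAG_FLOAT]) + struct.pack("<d", obj)  # 64-bit double
    # Sequence: tag, 4-byte element count, then each element in turn.
    body = b"".join(to_bytes(item) for item in obj)
    return bytes([TAG_SEQ]) + struct.pack("<I", len(obj)) + body

def _decode(buf, pos):
    """Decode one object starting at pos; return (object, next position)."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_INT:
        return struct.unpack_from("<i", buf, pos)[0], pos + 4
    if tag == TAG_FLOAT:
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if tag != TAG_SEQ:
        raise ValueError("unknown tag %d" % tag)
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    items = []
    for _ in range(count):
        item, pos = _decode(buf, pos)
        items.append(item)
    return items, pos

def from_bytes(buf):
    """Rebuild the original object from its byte string."""
    obj, _ = _decode(buf, 0)
    return obj

# A nested record, including the string "key" as a list of character codes.
record = [1, [2.5, [3, 4]], [ord(c) for c in "key"]]
blob = to_bytes(record)
assert from_bytes(blob) == record
```

The point is only that any nested object can be turned into a flat run of 8-bit bytes and back again, which is all the database needs in order to store it as a data item.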
The usual table structure -- Btree -- stores items sorted by key, so keys need to be compared. Character strings are already strings of 8-bit bytes, so no conversion is necessary and the library's built-in lexical sort can be used. If you use nested sequences or atoms as keys, then you must use a callback function that converts the stored strings of bytes (for the two keys being compared) back to the proper Euphoria objects & compares them. (Otherwise the keys will all be in the wrong order in the db.) Since a number of key comparisons need to be done every time you look up or insert an item, using a callback slows down these operations by a factor of 2 or 3. Data items are only compared when you enable the option that allows duplicate items for a single key, but that functionality won't be in the EDS version anyway, and even in that case there are far fewer comparisons. So there is no performance penalty for using Euphorian data items. Like I said, it will emulate EDS *exactly* (except for locking) in terms of what it can do -- but using only character strings as keys will give you a little performance boost. (In addition to the major boost you're already gonna get just using the thing.)
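A small Python sketch of why the callback is needed. The key encoding here (a 32-bit little-endian integer) is an assumption chosen for illustration, not what the wrapper actually stores, and `compare_decoded` is a hypothetical stand-in for the Euphorian compare callback.

```python
import struct
from functools import cmp_to_key

def encode_key(n):
    # Hypothetical storage form: 32-bit little-endian signed integer.
    return struct.pack("<i", n)

def decode_key(raw):
    return struct.unpack("<i", raw)[0]

def compare_decoded(a, b):
    """Callback-style comparison: decode both stored keys, then compare values."""
    x, y = decode_key(a), decode_key(b)
    return (x > y) - (x < y)

keys = [1, 256, 2, -1]
stored = [encode_key(k) for k in keys]

# Plain byte-by-byte (lexical) ordering of the stored form.
bytewise = [decode_key(raw) for raw in sorted(stored)]
print(bytewise)      # [256, 1, 2, -1]  (wrong order for numbers)

# Ordering with the decode-and-compare callback.
by_callback = [decode_key(raw)
               for raw in sorted(stored, key=cmp_to_key(compare_decoded))]
print(by_callback)   # [-1, 1, 2, 256]

# Character-string keys are already bytes, so lexical order is the right order.
string_keys = [b"pear", b"apple", b"fig"]
print(sorted(string_keys))   # [b'apple', b'fig', b'pear']
```

Every comparison through the callback pays for two decodes before it can compare anything, which is where the extra cost on lookups and inserts comes from. With string keys that step disappears.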