1. RE: Berkeley DB -- anyone care?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 04, 2003
- 558 views
> Maybe. I could port it to Linux for you, if you want. (Most Linux distros,
> and free *nixes in general, iirc, come with the Berkeley DB library already
> installed, btw. At least mine did ...)

It is not necessarily a trivial conversion -- all of the functions are "virtual" (and cdecl) and called by pointer via fptr.e. I don't know how easily it could be done...
2. RE: Berkeley DB -- anyone care?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
- 514 views
> Does that mean the Berkeley db will store strings in Eu form, with 32bits
> per ascii character?

No. It stores everything as 8-bit bytes (including 0) -- it is "8-bit clean". One character = one byte. (EDS only uses 1 byte per character on disk -- it is sequences in memory that have extra overhead, right?) So anything other than that needs to be converted to a string of bytes for storage.

Berkeley also supports in-memory only databases, but I haven't tried it. One thing you might do is set a very large cache and then most or all of the database can fit into memory, except in Berkeley form, not Euphoria's (still about 25% extra for cache overhead, I believe).

(By the way, Kat, I'm working on a pure Euphoria in-memory db system that uses direct memory -- should be able to hold much more in memory than you can in a sequence, although it will have to do a lot of peeking and poking.)

> > -- scale much better than EDS, files hundreds of MB large will see a
> > slight slow down in update/insert speed (but MUCH less than EDS), and
> > hardly any degradation in retrieval speed.
> >
> > -- have a couple of not strictly EDS-compatible options that would allow
> > it to go even faster, namely turning off support for logical record
> > numbers (just access everything by key) & turning off "Euphorian-style"
> > keys -- if you always use 1-dimensional character strings as your keys
> > then the library can use its default sorting function instead of a
> > Euphorian compare callback function in the wrapper.
>
> Can it still store nested sequences as data?

Yes. I actually use the modified routines from EDS for converting a Euphoria object to or from a string of bytes -- I posted them to this list a while back under a thread called "Useful code stolen from Rob" -- I converted the routines to operate on sequences & objects in memory instead of reading/writing to disk as EDS does. They are useful for all sorts of things actually (like my in-memory db project).
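On the "8-bit clean" point, here is a minimal sketch of what one byte per character looks like when a string is held in raw memory instead of a sequence. It is only an illustration of the idea, not code from the wrapper or from the in-memory db project; the variable names are made up, and the embedded zero byte is there to show that nothing is treated as a terminator.

```euphoria
include machine.e   -- allocate(), free()

sequence s, back
atom addr

-- A "string" with an embedded zero byte: 7 elements, so 7 bytes once poked
s = "abc" & 0 & "def"

addr = allocate(length(s))       -- raw block, one byte per character
poke(addr, s)                    -- each element is stored as a single byte
back = peek({addr, length(s)})   -- read the same number of bytes back
free(addr)

if equal(s, back) then
    printf(1, "%d bytes round-tripped, zero byte included\n", length(s))
else
    puts(1, "round trip failed\n")
end if
```

Anything that is not already a flat sequence of values in the range 0..255 has to be converted to such a sequence first, which is the job of the conversion routines mentioned above.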
The usual table structure -- Btree -- stores items sorted by key, so keys need to be compared. Character strings are already strings of 8-bit bytes, so no conversion is necessary and the built-in lexical sort of the library can be used. If you use nested sequences or atoms as keys, then you must use a callback function that converts the stored strings of bytes (for the two keys being compared) to the proper Euphoria objects & compares them. (Otherwise the keys will all be in the wrong order in the db.) Since a number of key comparisons need to be done every time you look up or insert an item, using a callback slows down these functions by a factor of 2 or 3.

Data items are only compared when you enable the function that allows duplicate items for a single key, but that functionality won't be in the EDS version anyway, and even in that case there are far fewer comparisons. So there is no performance penalty for using Euphorian data items.

Like I said, it will emulate EDS *exactly* (except for locking) in terms of what it can do -- but using only character strings as keys will give you a little performance boost. (In addition to the major boost you're already gonna get just using the thing.)
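To make the "wrong order" remark concrete, here is a small sketch under stated assumptions. The helpers encode_int16(), decode_int16() and euphorian_compare() are hypothetical and are not the wrapper's actual callback; they only mimic the 2-byte integer layout (tag byte 249, then low byte, then high byte) used by the obj_to_bytes.e listing posted elsewhere in this thread. Euphoria's sort() compares sequences element by element, so it stands in here for a byte-wise lexical sort.

```euphoria
include sort.e   -- sort(), custom_sort()

-- Hypothetical helper: encode an integer in -32768..32767 as
-- {tag byte 249, low byte, high byte}
function encode_int16(integer x)
    x += 32768
    return {249, and_bits(x, #FF), floor(x / #100)}
end function

-- Hypothetical helper: inverse of encode_int16()
function decode_int16(sequence b)
    return b[2] + #100 * b[3] - 32768
end function

-- Stand-in for a compare callback: rebuild both keys, then compare them
-- as Euphoria objects
function euphorian_compare(sequence a, sequence b)
    return compare(decode_int16(a), decode_int16(b))
end function

procedure show(sequence title, sequence keys)
    puts(1, title)
    for i = 1 to length(keys) do
        printf(1, " %d", decode_int16(keys[i]))
    end for
    puts(1, "\n")
end procedure

sequence keys
keys = {encode_int16(255), encode_int16(256), encode_int16(1000)}

-- Element-by-element ordering of the encoded bytes: 256 1000 255
show("byte-wise order:", sort(keys))

-- Ordering after decoding inside the callback: 255 256 1000
show("callback order: ", custom_sort(routine_id("euphorian_compare"), keys))
```

The low byte comes first in the encoding, so a byte-wise comparison looks at the least significant part of the number before the most significant part. Plain character-string keys do not have this problem, which is why they can skip the callback.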
3. RE: Berkeley DB -- anyone care?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
- 523 views
jbrown1050 at hotpop.com wrote:
> On Sat, Jan 04, 2003 at 07:41:43PM +0000, Andy Serpa wrote:
> > > Maybe. I could port it to Linux for you, if you want. (Most Linux
> > > distros, and free *nixes in general, iirc, come with the Berkeley DB
> > > library already installed, btw. At least mine did ...)
> >
> > It is not necessarily a trivial conversion -- all of the functions are
> > "virtual" (and cdecl) and called by pointer via fptr.e. I don't know
> > how easily it could be done...
>
> Hmm ... I wasn't aware that the Berkeley DB under Linux was cdecl ... but
> still, iirc fptr.e works under Linux ... I'd like to take a look at your
> code and try, at least.

Well, I don't know really -- wouldn't cdecl be standard under Linux since stdcall was Microsoft's invention? I have no idea really.
4. RE: Berkeley DB -- anyone care?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
- 521 views
jordah at btopenworld.com wrote:
> > Yes. I actually use the modified routines from EDS for converting a
> > Euphoria object to or from a string of bytes -- I posted them to this
> > list a while back under a thread called "Useful code stolen from Rob" --
> > I converted the routines to operate on sequences & objects in memory
> > instead of reading/writing to disk as EDS does. They are useful for all
> > sorts of things actually (like my in-memory db project).
>
> Could u please benchmark them with my binary.e and see the outcome. One
> known problem of binary.e is that it handles doubles as float32 so there
> is very little inaccuracy. Also binary.e produces faster and more
> compressed objects than EDS does as far as i can remember.

I guess you missed that post -- I actually suggested to you in there that these routines might go faster than the ones you used in your peek/poke objects to memory library. I benchmarked the compress (which I renamed object_to_bytes) and found it slightly faster than yours in my limited testing. But I didn't test the decompress/bytes_to_object because mine works on a sequence of bytes and yours peeks from memory, so it wasn't quite a fair test and I was too lazy to bother, so yours may be faster on the other end, I don't know.

The include file I'm using is below (remember, this is basically Rob's code -- from the "newer, faster version" of EDS):

obj_to_bytes.e
-----------------------------------------
include machine.e

-- tag bytes: values 0..248 encode small integers directly
constant I2B = 249,   -- 2-byte signed integer follows
         I3B = 250,   -- 3-byte signed integer follows
         I4B = 251,   -- 4-byte signed integer follows
         F4B = 252,   -- 4-byte f.p. number follows
         F8B = 253,   -- 8-byte f.p. number follows
         S1B = 254,   -- sequence, 1-byte length follows
         S4B = 255    -- sequence, 4-byte length follows

constant MIN1B = -9,
         MAX1B = 239,
         MIN2B = -power(2, 15),
         MAX2B =  power(2, 15)-1,
         MIN3B = -power(2, 23),
         MAX3B =  power(2, 23)-1,
         MIN4B = -power(2, 31)

integer spt     -- read position in cseq
sequence cseq   -- bytes currently being decoded

-- turbo int_to_bytes() & bytes_to_int()
constant ib_addr = allocate(4),
         ib_peek = {ib_addr,4}

global function int2bytes(atom i)
    poke4(ib_addr, i)
    return peek(ib_peek)
end function

global function bytes2int(sequence b)
    poke(ib_addr, b)
    return peek4u(ib_addr)
end function

-- local recursive function does the work
function decomp()
    integer c
    sequence s
    atom len
    object result

    c = cseq[spt]
    spt += 1
    if c <= 248 then -- return small int
        return c + MIN1B
    end if

    if c = I2B then
        result = cseq[spt] + (#100 * cseq[spt+1]) + MIN2B
        spt += 2
        return result

    elsif c = I3B then
        result = cseq[spt] + (#100 * cseq[spt+1]) +
                 (#10000 * cseq[spt+2]) + MIN3B
        spt += 3
        return result

    elsif c = I4B then
        result = bytes2int(cseq[spt..spt+3]) + MIN4B
        spt += 4
        return result

    elsif c = F4B then
        result = float32_to_atom(cseq[spt..spt+3])
        spt += 4
        return result

    elsif c = F8B then
        result = float64_to_atom(cseq[spt..spt+7])
        spt += 8
        return result

    else -- sequence
        if c = S1B then
            len = cseq[spt]
            spt += 1
        else
            len = bytes2int(cseq[spt..spt+3])
            spt += 4
        end if
        s = repeat(0, len)
        for i = 1 to len do
            c = cseq[spt]
            if c <= 248 then
                spt += 1
                s[i] = c + MIN1B
            else
                -- do not advance pointer
                s[i] = decomp()
            end if
        end for
        return s
    end if
end function

-- Global functions

global function bytes_to_object(sequence s)
    object result

    spt = 1
    cseq = s
    result = decomp()
    cseq = {}
    return result
end function

global function object_to_bytes(object x)
-- return the compressed representation of a Euphoria object
-- as a sequence of bytes
    sequence x4, s

    if integer(x) then
        if x >= MIN1B and x <= MAX1B then
            return {x - MIN1B}

        elsif x >= MIN2B and x <= MAX2B then
            x -= MIN2B
            return {I2B, and_bits(x, #FF), floor(x / #100)}

        elsif x >= MIN3B and x <= MAX3B then
            x -= MIN3B
            return {I3B, and_bits(x, #FF),
                    and_bits(floor(x / #100), #FF), floor(x / #10000)}

        else
            return I4B & int2bytes(x-MIN4B)
        end if

    elsif atom(x) then
        -- floating point
        x4 = atom_to_float32(x)
        if x = float32_to_atom(x4) then
            -- can represent as 4-byte float
            return F4B & x4
        else
            return F8B & atom_to_float64(x)
        end if

    else
        -- sequence
        if length(x) <= 255 then
            s = {S1B, length(x)}
        else
            s = S4B & int2bytes(length(x))
        end if
        for i = 1 to length(x) do
            s &= object_to_bytes(x[i])
        end for
        return s
    end if
end function
--------------------------
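A short usage sketch for the two global routines, assuming the listing above has been saved as obj_to_bytes.e somewhere on the include path. The sample data and variable names are made up for illustration.

```euphoria
include obj_to_bytes.e   -- assumes the listing above is saved under this name

object original, restored
sequence bytes

-- A nested object mixing a string, small and large integers, and a float
original = {"key", 42, -100000, 3.5, {1, {2, 3}}}

bytes = object_to_bytes(original)    -- flat sequence of values 0..255
restored = bytes_to_object(bytes)

if equal(original, restored) then
    printf(1, "round trip OK, %d bytes\n", length(bytes))
else
    puts(1, "round trip FAILED\n")
end if

-- The float branch only picks the 4-byte form when no precision is lost:
? length(object_to_bytes(3.5))   -- 5: tag byte + 4-byte float
? length(object_to_bytes(0.1))   -- 9: tag byte + 8-byte float
```

The result of object_to_bytes() contains only byte values, so it can be handed to anything that stores raw bytes.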
5. RE: Berkeley DB -- anyone care?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 05, 2003
- 538 views
Robert Craig wrote:
> Andy Serpa writes:
> > Well, I don't know really -- wouldn't cdecl be standard under Linux
> > since stdcall was Microsoft's invention? I have no idea really.
>
> I believe the terms "cdecl" and "stdcall" are purely Windows terminology.
> On Linux there seems to be just one calling convention, probably different
> from either cdecl or stdcall. I don't know if it even has a name.
> It's insane that Windows has two conventions.

Excellent. Linux wins again.
6. RE: Berkeley DB -- anyone care?
- Posted by Matthew Lewis <matthewwalkerlewis at YAHOO.COM> Jan 06, 2003
- 545 views
> From: Robert Craig [mailto:rds at RapidEuphoria.com]
>
> I believe the terms "cdecl" and "stdcall" are purely Windows terminology.
> On Linux there seems to be just one calling convention, probably different
> from either cdecl or stdcall. I don't know if it even has a name.
> It's insane that Windows has two conventions.

Don't forget 'thiscall'. That's the default calling convention for C++ member functions without variable arguments. I ran into this trying to wrap wxWindows. Mostly got it solved ('this' gets stored in ecx).

Matt Lewis
7. RE: Berkeley DB -- anyone care?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 06, 2003
- 561 views
jordah at btopenworld.com wrote:
> The code in binary.e might seem slower because of the overhead involved in
> call_proc(). The original routine was clearly tested on #euphoria and
> clearly faster producing more compressed objects.

In my test, the code I'm using was faster but did produce slightly bigger objects -- I could try it again. In practice it would make no difference as they were very close. The compress/decompress stuff is no bottleneck except when used for key comparison, when it is called a huge number of times, and even then the gap between one routine & the other wouldn't matter (2 seconds over 100,000 calls or something) -- the time is lost in just calling the routine period, peeking the object, etc.

If your code doesn't preserve accuracy the point is moot, because that would make it unsuitable for a database anyway...

-- Andy
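For anyone who wants to see the accuracy concern for themselves, here is a minimal sketch using only the standard machine.e conversion routines. It does not use binary.e, whose code is not shown in this thread; it only shows what happens to a value that is squeezed through a 4-byte float and back.

```euphoria
include machine.e   -- atom_to_float32(), float32_to_atom()

atom x, y

-- 0.1 has no exact 4-byte representation, so the value changes
x = 0.1
y = float32_to_atom(atom_to_float32(x))
if x = y then
    puts(1, "0.1 survives float32\n")
else
    printf(1, "0.1 came back as %.10f\n", y)
end if

-- 3.5 is exactly representable, so nothing is lost
x = 3.5
y = float32_to_atom(atom_to_float32(x))
if x = y then
    puts(1, "3.5 survives float32\n")
else
    printf(1, "3.5 came back as %.10f\n", y)
end if
```

A key or data item that comes back slightly different from what was stored would no longer compare equal to the original, which is why this matters for a database.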