1. RE: Berkeley DB with Euphoria

Hi Andy,

I looked at wrapping Berkeley DB a while ago and came up against the
same obstacles as you.

An alternative might be SQLite (strangely enough I happen to have a
wrapper for it already!!!! - see my web site below for euSQLite).
SQLite implements a large subset of SQL and is a very fast embeddable
SQL engine. Users of the SQLite mailing list claim databases many GBs
in size without problems.
The other benefit of SQLite over EDS and Berkeley DB is the fact you get
a SQL engine built in!

So how big is too big for EDS?

Ray Smith
http://rays-web.com

Andy Serpa wrote:
> Hello,
> 
> I would really like to be able to make a Euphoria wrapper for the 
> Berkeley DB, which is open-source and found at:
> 
> http://www.sleepycat.com/
> 
> It looks like it would be sort of a high-end version of the EDS system 
> in that the data is stored in key/value pairs and the key & value can be 
> anything.  A Euphoria wrapper could probably easily be made to use 
> similar functions as the EDS system, making it an easy transition.  (My 
> project is a little too large for EDS, but I would like the same 
> flexibility.)
> 
> Problem is the source (for Win32 anyway -- it is cross-platform) is 
> set-up for Visual C++ 6.0, which I do not have, and I certainly don't 
> have the knowledge/ability to attempt to compile it with something else 
> like Borland.  I was able to find some pre-compiled binaries for some 
> older versions on the net, but the latest version (4.1.x) has changed 
> the file format somewhat so I'd like to use that.
> 
> By any chance have any of you used/compiled this?
> 
> Barring that unlikely event, are there any *extremely* generous souls 
> out there with VC++ 6.0 that would like to build it?  (In which case 
> I'll make a wrapper, and then I'll upload that & the binaries to RDS so 
> anyone can use.)
> 
> -- Andy


2. RE: Berkeley DB with Euphoria

Ray Smith wrote:
> Hi Andy,
> 
> I looked at wrapping Berkeley DB a while ago and came up against the
> same obstacles as you.
> 
> An alternative might be SQLite (strangely enough I happen to have a 
> wrapper for it already!!!! - see my web site below for euSQLite).
> SQLite implements a large subset of SQL and is a very fast embeddable
> SQL engine. Users of the SQLite mailing list claim databases many GBs 
> in size without problems. 

I tried SQLite, and actually I converted your wrapper to use cdecl calls 
with fptr.e so I could translate it and have it still work.  Interesting 
that people have said they use it with many GBs (!) because that is why 
I gave up on it.  (My db will be around 1.2 GB and grow about 100 MB a 
month.)  I kept having problems creating a large database (and things 
got corrupted once in a while).  I even posted my problems to the 
mailing list (this was a few months back) but never got a response.  
Anyway, I liked SQLite and will probably use it for other things.  Its 
author is releasing updates constantly (which is good), but I just didn't 
feel it was "rock-solid" and that's what I need.

> The other benefit of SQLite over EDS and Berkeley DB is the fact you get
> a SQL engine built in!
> 

I've been using MySQL for this project since then, but I've soured on 
the relational/SQL model (for this project).  It just isn't the most 
appropriate (SQLite was actually preferable with its typeless data) to 
what I'm doing.

Whereas Berkeley sounds perfect, and has got a rep of being very stable.

So I think having both Berkeley DB & SQLite easily available as 
third-party database solutions would be a "good thing."  As far as the 
obstacles, there is only one -- I ain't got VC++ 6.0 and I ain't gonna 
buy it.  (I actually found binaries for version 4.0 something, but 4.1 
has a format change so I wanted to go with that.  Plus that's what I 
have the docs for.)  There are no licensing problems or anything since it 
is open-source.  The binaries, once created, can be freely distributed.  
Just getting someone somewhere to build them shouldn't be impossible... 
(anybody?)


> So how big is too big for EDS?
> 

Well, 1.2 GB is too big.  It is not the size so much as the speed as EDS 
gets slower and slower as it grows...


3. RE: Berkeley DB with Euphoria

Robert Craig wrote:
> Andy Serpa writes:
> > Well, 1.2 GB is too big.  It is not the size so much 
> > as the speed as EDS gets slower and slower as it grows...
> 
> I recently speeded up my copy of EDS.
> It reads records twice as fast as before, and it
> inserts and deletes into huge (100,000 record)
> databases 3 times as fast. Even the released
> EDS can insert/delete many times per second
> on huge databases on a slow machine. 
> If you acquire new records via a human entering 
> data into a GUI, you'll never notice the time.
> 

I'd like to get your copy.  Possible?

I speeded up my own copy a little bit (like caching key pointers so when 
it switches tables back & forth it doesn't have to read them in every 
time) but yours sounds like a major improvement.


> But if you are starting at 1.2GB and growing 0.1 GB/month
> I'd be a bit worried about the 4GB limit. Of course
> you could consider creating several separate databases
> of 4Gb or less each.
> 

Well, it holds so many years of data and grows every month, but at the 
end of the year you can dump a year's worth of data off the beginning, so 
it doesn't grow forever.

But my intention was certainly not to trash anything.  I *LIKE* EDS, I 
*LIKE* SQLite, but I'd still like to give Berkeley DB a try!  Even an 
improved EDS certainly couldn't compete with optimized C in terms of 
speed (the web site says something about 1000's of inserts a second) and 
Berkeley also has things like transaction support, etc.

Really would love your version of EDS, though, Rob.

-- Andy


4. RE: Berkeley DB with Euphoria

I found a volunteer to attempt compilation of the sources.  If all goes 
well I'll make a wrapper in the next week or so and submit it to RDS...



Andy Serpa wrote:
> 
> Robert Craig wrote:
> > Andy Serpa writes:
> > > Well, 1.2 GB is too big.  It is not the size so much 
> > > as the speed as EDS gets slower and slower as it grows...
> > 
> > I recently speeded up my copy of EDS.
> > It reads records twice as fast as before, and it
> > inserts and deletes into huge (100,000 record)
> > databases 3 times as fast. Even the released
> > EDS can insert/delete many times per second
> > on huge databases on a slow machine. 
> > If you acquire new records via a human entering 
> > data into a GUI, you'll never notice the time.
> > 
> 
> I'd like to get your copy.  Possible?
> 
> I speeded up my own copy a little bit (like caching key pointers so when 
> it switches tables back & forth it doesn't have to read them in every 
> time) but yours sounds like a major improvement.
> 
> 
> > But if you are starting at 1.2GB and growing 0.1 GB/month
> > I'd be a bit worried about the 4GB limit. Of course
> > you could consider creating several separate databases
> > of 4Gb or less each.
> > 
> 
> Well, it holds so many years of data and grows every month, but at the 
> end of the year you can dump a year's worth of data off the beginning, so 
> it doesn't grow forever.
> 
> But my intention was certainly not to trash anything.  I *LIKE* EDS, I 
> *LIKE* SQLite, but I'd still like to give Berkeley DB a try!  Even an 
> improved EDS certainly couldn't compete with optimized C in terms of 
> speed (the web site says something about 1000's of inserts a second) and 
> Berkeley also has things like transaction support, etc.
> 
> Really would love your version of EDS, though, Rob.
> 
> -- Andy
> 
>


5. RE: Berkeley DB with Euphoria

> From: Robert Craig [mailto:rds at RapidEuphoria.com]

> I recently speeded up my copy of EDS.
> It reads records twice as fast as before, and it
> inserts and deletes into huge (100,000 record)
> databases 3 times as fast. Even the released
> EDS can insert/delete many times per second
> on huge databases on a slow machine. 
> If you acquire new records via a human entering 
> data into a GUI, you'll never notice the time.

Ooh, ooh!  When can we see it?!

Matt Lewis


6. RE: Berkeley DB with Euphoria

Robert Craig wrote:
> Matthew Lewis writes:
> > Ooh, ooh!  When can we see it?!
> 
> I've sent you a copy.
> I'll release it as part of version 2.4.
> I want to test it some more before
> I put a lot of people at risk of losing their data.
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
> 
> 

hi Rob
something i had seen in another program PocoMail. its scripting language 
had a feature that allowed up to four files open at the same time.  ie: 
user could have 2 inputs 1 output or other combo.
data could move between files without having overhead of 'selecting' 
before data transfer.

could something similar be a feature of EDS? having the different open 
databases / selected tables use their own 'buffers' without explicitly 
'selecting' them?
i am currently building 1 database and many tables but would like to 
break it up somewhat. and still move data between them easier.

thanks for the good work.
rudy

lotterywars


7. RE: Berkeley DB with Euphoria

I cached the key_pointers in my version in a very simple way.  Speeded 
it up enormously when flipping between tables.  When it selects a table, 
if there is already a "current table" it saves the pointers for that 
table in a global sequence, then it checks to see if it already has the 
key pointers for the new table in memory, and if so grabs them, or else 
loads them in as normal.

On a db_close or db_open it clears them out, so it only works if you're 
using one db at a time (with an exclusive lock).  I only had to add 
maybe 10 lines.
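
The caching described above could look roughly like the following. This is a stand-alone, untested sketch, not the actual change to database.e: the names cached_names, cached_ptrs, cache_save, cache_fetch and cache_clear are made up for illustration, and the real code would live inside the table-select, db_open and db_close routines.

```euphoria
-- Hypothetical helper names; these are NOT identifiers from database.e.
sequence cached_names, cached_ptrs
cached_names = {}   -- table names whose pointers are held in memory
cached_ptrs = {}    -- cached_ptrs[i] holds the key pointers of cached_names[i]

-- Remember the key pointers of the table that is about to be deselected.
procedure cache_save(sequence name, sequence ptrs)
    integer i
    i = find(name, cached_names)
    if i = 0 then
        cached_names = append(cached_names, name)
        cached_ptrs = append(cached_ptrs, ptrs)
    else
        cached_ptrs[i] = ptrs
    end if
end procedure

-- Return the cached pointers for a table, or -1 if they are not in
-- memory and must be read from disk as usual.
function cache_fetch(sequence name)
    integer i
    i = find(name, cached_names)
    if i = 0 then
        return -1
    end if
    return cached_ptrs[i]
end function

-- Forget everything; would be called whenever a database is opened or closed.
procedure cache_clear()
    cached_names = {}
    cached_ptrs = {}
end procedure
```

Because the cache is keyed by table name only, it is valid for a single database at a time, which matches the restriction mentioned above.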

To keep multiple databases open without selecting between them all the 
time (a fixed, known number of them), I believe (I haven't tried it) the 
"namespace trick" would work, even if it is an ugly solution:

include database.e as dbA
include database.e as dbB

Now using namespaces with all the db functions you could keep them 
separate, right?



Robert Craig wrote:
> Rudy Toews writes:
> > ...could something similar be a feature of EDS? having the 
> > different open databases / selected tables use their own 
> > 'buffers' without explicitly 'selecting' them?
> > i am currently building 1 database and many tables 
> > but would like to break it up somewhat. and still move data 
> > between them easier.
> 
> At the very outset, in designing EDS, I had a choice
> of:
>     1. have the user specify the database & table 
>         on each operation, 
> or:
>     2. have a "context" where the database and
>         table were established beforehand, and all database
>        operations would have shorter parameter lists.
> 
> I went with #2. I don't regret it, but there are situations
> where it is inconvenient. 
> 
> When you select a table, EDS reads in 4 bytes per record
> in that table. This is inefficient when you have a lot of records
> (over 1000 say) and you are trying to rapidly flip back and forth 
> between tables. Obviously some buffering would help,
> i.e. keep the record pointers in memory. Maybe I'll do something
> about this. Note that it's dangerous to modify part of the data base
> structure in memory only, in case your program crashes 
> and the database on disk is left in an inconsistent (corrupted) state.
> Simply keeping a copy of the record pointers in memory 
> should be safe as long as other processes are locked out from
> making changes. There's also the issue of how much memory
> this might require. If you have lots of tables with lots of records
> you might run out of memory. 
> 
> If you look at db_compress() you'll see that it copies records,
> between tables, 20 at a time. That's another approach 
> to the problem.
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    http://www.RapidEuphoria.com
> 
>
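
Rob's remark about copying records between tables in batches can be sketched with the public EDS routines. This is an untested illustration, not the code inside db_compress(): the procedure name copy_table and the constant BATCH are invented here, and both tables are assumed to exist in the current database.

```euphoria
include database.e

constant BATCH = 20   -- batch size taken from the description above

-- Copy all records from table src to table dst, a batch at a time, so
-- that the current table is switched twice per batch, not per record.
procedure copy_table(sequence src, sequence dst)
    integer n, first, last
    sequence keys, data

    if db_select_table(src) != DB_OK then
        return
    end if
    n = db_table_size()
    first = 1
    while first <= n do
        last = first + BATCH - 1
        if last > n then
            last = n
        end if
        -- read one batch from the source table
        keys = {}
        data = {}
        for rn = first to last do
            keys = append(keys, db_record_key(rn))
            data = append(data, db_record_data(rn))
        end for
        -- write the batch to the destination table
        if db_select_table(dst) != DB_OK then
            return
        end if
        for i = 1 to length(keys) do
            if db_insert(keys[i], data[i]) != DB_OK then
                puts(2, "insert failed (duplicate key?)\n")
            end if
        end for
        -- go back to the source table for the next batch
        if db_select_table(src) != DB_OK then
            return
        end if
        first = last + 1
    end while
end procedure
```

A larger BATCH means fewer table switches but more memory held at once, which is the same trade-off Rob describes for keeping record pointers in memory.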


8. RE: Berkeley DB with Euphoria

Andy Serpa wrote:
> I cached the key_pointers in my version in a very simple way.  Speeded 
> it up enormously when flipping between tables.  When it selects a table, 
> if there is already a "current table" it saves the pointers for that 
> table in a global sequence, then it checks to see if it already has the 
> key pointers for the new table in memory, and if so grabs them, or else 
> loads them in as normal.
> 
> On a db_close or db_open it clears them out, so it only works if you're 
> using one db at a time (with an exclusive lock).  I only had to add 
> maybe 10 lines.
> 
> To keep multiple databases open without selecting between them all the 
> time (a fixed, known number of them), I believe (I haven't tried it) the 
> "namespace trick" would work, even if it is an ugly solution:
> 
> include database.e as dbA
> include database.e as dbB
> 
> Now using namespaces with all the db functions you could keep them 
> separate, right?
> 
> 
> Robert Craig wrote:
> > At the very outset, in designing EDS, I had a choice
> > of:
> >     1. have the user specify the database & table 
> >         on each operation, 
> > or:
> >     2. have a "context" where the database and
> >         table were established beforehand, and all database
> >        operations would have shorter parameter lists.
> > 
> > I went with #2. I don't regret it, but there are situations
> > where it is inconvenient. 
> > 
> > When you select a table, EDS reads in 4 bytes per record
> > in that table. This is inefficient when you have a lot of records
> > (over 1000 say) and you are trying to rapidly flip back and forth 
> > between tables. Obviously some buffering would help,
> > i.e. keep the record pointers in memory. Maybe I'll do something
> > about this. Note that it's dangerous to modify part of the data base
> > structure in memory only, in case your program crashes 
> > and the database on disk is left in an inconsistent (corrupted) state.
> > Simply keeping a copy of the record pointers in memory 
> > should be safe as long as other processes are locked out from
> > making changes. There's also the issue of how much memory
> > this might require. If you have lots of tables with lots of records
> > you might run out of memory. 
> > 
> > If you look at db_compress() you'll see that it copies records,
> > between tables, 20 at a time. That's another approach 
> > to the problem.

Thank you, Rob and Andy

hmm, i didn't really think of making a copy of the database functions 
and renaming them so they don't get mixed. i will give it a shot when i 
have progressed more on my windows programming level. almost understand 
how statements start to fit together to get what i want done. (danger 
level).

recently i have seen the colon used in functions that Phil Russel is 
working on in creating EuGrid. i forgot what the notation below means, i 
have seen it before but i don't remember where (win32lib or some database 
routines).

newkey = noah:saveanimal(dataset[i])
does it let noah refer to the current open window or .e file?
allowing multiple uses of the same function name?

thanks again
rudy

the use of the colon 


lotterywars


9. RE: Berkeley DB with Euphoria

> recently i have seen the colon used in functions that Phil Russel is 
> working on in creating EuGrid. i forgot what the notation below means, i 
> have seen it before but i don't remember where (win32lib or some database 
> routines).
> 
> newkey = noah:saveanimal(dataset[i])
> does it let noah refer to the current open window or .e file?
> allowing multiple uses of the same function name?
> 

Yeah, exactly, so in my example of including database.e twice as "dbA" 
and "dbB" you could:

dbA:db_open("database A")
dbB:db_open("database B")

Now BOTH databases will be open and current so instead of using 
db_select() and switching back & forth you just use the appropriate 
"dbA:" or "dbB:" in front of the proper one.  Sort of a quick & dirty 
way to get separate "instances" in an OOP-like fashion...

(Unless there is some reason that won't work?  Like I said, I haven't 
actually tried it.)


10. RE: Berkeley DB with Euphoria

Andy Serpa wrote:

> Yeah, exactly, so in my example of including database.e twice as "dbA" 
> and "dbB" you could:
> 
> dbA:db_open("database A")
> dbB:db_open("database B")
> 
> Now BOTH databases will be open and current so instead of using 
> db_select() and switching back & forth you just use the appropriate 
> "dbA:" or "dbB:" in front of the proper one.  Sort of a quick & dirty 
> way to get separate "instances" in a OOP-like fashion...
> 
> (Unless there is some reason that won't work?  Like I said, I haven't 
> actually tried it.)

It won't work, because Rob's implementation of namespacing doesn't 
create multiple instances of an include, it just makes "aliases" for 
one single instance. I, for one, think this is unfortunate.

Therefore, no matter what you prefix the db_* commands with, they 
still use the single copy of database.e with its variables (including 
the currently selected database file).
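
For comparison, the conventional way to work with two open databases through a single copy of database.e is to switch the current one with db_select(). The following is a minimal, untested sketch; the file names and the helper procedure must() are made up for illustration, and both database files are assumed to exist already.

```euphoria
include database.e

-- Hypothetical file names; both databases are assumed to exist already.
constant FILE_A = "a.edb",
         FILE_B = "b.edb"

-- Hypothetical helper: abort with a message unless an EDS call returned DB_OK.
procedure must(integer status, sequence what)
    if status != DB_OK then
        puts(2, what & " failed\n")
        abort(1)
    end if
end procedure

must(db_open(FILE_A, DB_LOCK_NO), "db_open A")
must(db_open(FILE_B, DB_LOCK_NO), "db_open B")  -- B is now the current database

must(db_select(FILE_A), "db_select A")          -- make A current again
? db_table_list()                               -- lists the tables of A

must(db_select(FILE_B), "db_select B")
? db_table_list()                               -- lists the tables of B

db_close()                                      -- closes the current database (B)
must(db_select(FILE_A), "db_select A")
db_close()                                      -- closes A
```

Both files stay open throughout; only the notion of "current database" changes, which is the shared state that a namespace prefix cannot duplicate.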

Regards,
Irv


11. RE: Berkeley DB with Euphoria

irv at take.maxleft.com wrote:
> 
> Andy Serpa wrote:
> 
> > Yeah, exactly, so in my example of including database.e twice as "dbA" 
> > and "dbB" you could:
> > 
> > dbA:db_open("database A")
> > dbB:db_open("database B")
> > 
> > Now BOTH databases will be open and current so instead of using 
> > db_select() and switching back & forth you just use the appropriate 
> > "dbA:" or "dbB:" in front of the proper one.  Sort of a quick & dirty 
> > way to get separate "instances" in a OOP-like fashion...
> > 
> > (Unless there is some reason that won't work?  Like I said, I haven't 
> > actually tried it.)
> 
> It won't work, because Rob's implementation of namespacing doesn't 
> create multiple instances of an include, it just makes "aliases" for 
> one single instance. I, for one, think this is unfortunate.
> 
> Therefore, no matter what you prefix the db_* commands with, they 
> still use the single copy of database.e with its variables (including 
> the currently selected database file).
> 

Ok, what if we just make two copies of database.e and then:

include databaseA.e as dbA
include databaseB.e as dbB

Now would they be separate?


12. RE: Berkeley DB with Euphoria

Andy Serpa wrote:

> Ok, what if we just make two copies of database.e and then:
> 
> include databaseA.e as dbA
> include databaseB.e as dbB
> 
> Now would they be separate?

It seems so. Here's a test:

include database.e as dbA
include database.f as dbB
object tablesA, tablesB

? dbA:db_create("Test.dba",0)
? dbB:db_create("Test.dbb",0)

? dbA:db_create_table("Table A1")
? dbB:db_create_table("Table B1")
? dbB:db_create_table("Table B2")

tablesA = dbA:db_table_list()
tablesB = dbB:db_table_list()

for i = 1 to length(tablesA) do
   printf(1,"Database A has %s\n",{tablesA[i]})
end for

for i = 1 to length(tablesB) do
   printf(1,"Database B has %s\n",{tablesB[i]})
end for

Results:

Database A has Table A1
Database B has Table B1
Database B has Table B2


Press Enter..

Can you depend on this to always work properly? I don't know, maybe Rob 
can answer.

Regards,
Irv

