1. RE: Future of EDS -- btree?

Kat wrote:
> 
> 
> On 23 Jan 2004, at 2:04, Andy Serpa wrote:
> 
> > 
> > Regarding this database talk, has anyone attempted to write a btree 
> > database implementation in pure Euphoria?  One thing that prevents use 
> > of EDS for really big databases (even if simple in structure -- one 
> > giant table, for instance) is that it doesn't scale well after a point 
> > (especially when adding new records, plus all of the keys are kept in 
> > memory, which can be a problem if there are a gazillion of them).  A 
> > btree structure would potentially remain fairly speedy with millions of 
> > records (the whole point of btree is too remain fast by minimizing disk 
> > access), and it is not terribly difficult to implement (I've played 
> > around with an in-memory version just to get an idea of how to code it 
> > in Euphoria, but I never got around to making a disk-based version).  Am 
> > 
> > I mistaken?
> 
> Jiri and others have written "associated lists" that do that, in various 
> ways. 
> Bach has "hash tables".
> 

Yeah, but that's all in-memory stuff.  (I use a custom version of the 
associated list routines in practically every program I make.  But I 
wouldn't store 2 million records in there.)

The btree algorithm is specifically designed for when what you want to 
store is much larger than available memory -- in other words, a 
database.  Btree is the base algorithm of practically every major 
database.  The speed of EDS is pretty fast for disk access; it just gets 
bogged down when it has too many records.  Not sure if that is because 
it has to move stuff around on disk or what. I don't think so because if 
you create a new database and add a million records immediately, it gets 
pretty slow after a few hundred thousand, but I think it just adds new 
blocks on the end of the file all the time.  The slowdown probably comes 
from the key comparison, and the fact that all the keys are in a giant 
sequence in memory, so it has to make a new copy of that everytime a 
record is inserted.  Btree could probably cut down the number of 
comparisons by a few orders of magnitude, and the keys & records are 
stored together on disk.  Even with millions of records, you usually 
only have to read 3 or 4 pages from the disk to find a key/record pair 
(although if the records are large they are sometimes stored on 
"overflow" pages separately).

SQLite has a btree engine, and that code is free for any use -- maybe a 
Euphoria translation could be done?

new topic     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu