1. RE: Future of EDS -- btree?
- Posted by Andy Serpa <ac at onehorseshy.com> Jan 23, 2004
- 444 views
Kat wrote: > > > On 23 Jan 2004, at 2:04, Andy Serpa wrote: > > > > > Regarding this database talk, has anyone attempted to write a btree > > database implementation in pure Euphoria? One thing that prevents use > > of EDS for really big databases (even if simple in structure -- one > > giant table, for instance) is that it doesn't scale well after a point > > (especially when adding new records, plus all of the keys are kept in > > memory, which can be a problem if there are a gazillion of them). A > > btree structure would potentially remain fairly speedy with millions of > > records (the whole point of btree is too remain fast by minimizing disk > > access), and it is not terribly difficult to implement (I've played > > around with an in-memory version just to get an idea of how to code it > > in Euphoria, but I never got around to making a disk-based version). Am > > > > I mistaken? > > Jiri and others have written "associated lists" that do that, in various > ways. > Bach has "hash tables". > Yeah, but that's all in-memory stuff. (I use a custom version of the associated list routines in practically every program I make. But I wouldn't store 2 million records in there.) The btree algorithm is specifically designed for when what you want to store is much larger than available memory -- in other words, a database. Btree is the base algorithm of practically every major database. The speed of EDS is pretty fast for disk access; it just gets bogged down when it has too many records. Not sure if that is because it has to move stuff around on disk or what. I don't think so because if you create a new database and add a million records immediately, it gets pretty slow after a few hundred thousand, but I think it just adds new blocks on the end of the file all the time. The slowdown probably comes from the key comparison, and the fact that all the keys are in a giant sequence in memory, so it has to make a new copy of that everytime a record is inserted. Btree could probably cut down the number of comparisons by a few orders of magnitude, and the keys & records are stored together on disk. Even with millions of records, you usually only have to read 3 or 4 pages from the disk to find a key/record pair (although if the records are large they are sometimes stored on "overflow" pages separately). SQLite has a btree engine, and that code is free for any use -- maybe a Euphoria translation could be done?