1. database theory
- Posted by Alan Tu <ATU5713 at COMPUSERVE.COM> Nov 20, 1998
- 694 views
In the Euphoria package somewhere, (it might have been in the demo program), it said that to create a large database one would use seek and where. Could someone please explain to me how that would be done to acce= ss a database? Thanks. Alan =
2. Re: database theory
- Posted by Irv Mullins <irv at ELLIJAY.COM> Nov 20, 1998
- 659 views
On Fri, 20 Nov 1998 10:37:13 -0500, Alan Tu <ATU5713 at COMPUSERVE.COM> wrote: >In the Euphoria package somewhere, (it might have been in the demo >program), it said that to create a large database one would use seek and >where. Could someone please explain to me how that would be done to access >a database? Thanks. > Hi, Alan: In a large database, for example, 50,000 names and addresses, you have more data than will fit into memory comfortably. Plus, even if you could load it in all at once, the disk activity would take a long time. Same when saving the updated data. Therefore, it is better to read/write a single record at a time, making disk i/o very quick. If you keep a sorted list of keys, you can find any record in the file with 8 seeks or less - doing a binary search. The difficulty here is that Euphoria stores variable length records (sequences). How do you know which byte in the file is the first byte of Alan Tu's name and address? Two solutions: the easy one, and the one that I don't want to bother writing: Easy one: write all records to the same, fixed length. Let's say a name and address record is always 256 bytes long. You want the 3rd name in the file, so you seek byte (256 * 3), and read 256 bytes. This also makes updates to the file easy, since you're just writing 256 new bytes over 256 old. The other way involves a "linked list" approach, with variable length records, where the record length is stored in the record,or in a separate index file, so you can "leapfrog" that record and be at the beginning of the next. This can get really complex fast. Suppose you have to change Alan Tu's name to Alan W. Tu? Now your record is longer than before, so if you put it back where it was, you will overwrite part of the next record. What to do? There are books written on this. Regards, Irv
3. Re: database theory
- Posted by Irv Mullins <irv at ELLIJAY.COM> Nov 20, 1998
- 619 views
I should have added a couple of things to the previous message: Euphoria sequences, when stored on disk, are horribly wasteful of space. It is *far* better to do your storage in plain old fixed-length ascii text format. How? Using prinf() of course. In one statement, you get: 1.Fixed length fields 2.Ascii conversion 3.End of record marker (C/R) This is also nice because you can use any old text editor to edit or view your data files. The problem is when reading data back in, you must do your own parsing. Using fixed length fields, like so: Irv Mullins irv at ellijay.com Alan Tu ATU5713 at COMPUSERVE.COM ^ ^ 1 13 You see that [NAME] is always input[1..12], address is [13..25] and so on. This works pretty easily. Irv
4. Re: database theory
- Posted by Daniel Berstein <daber at PAIR.COM> Nov 20, 1998
- 630 views
- Last edited Nov 21, 1998
>In the Euphoria package somewhere, (it might have been in the demo >program), it said that to create a large database one would use seek and >where. Could someone please explain to me how that would be done to access >a database? Thanks. If you're going to create and maintain a small database (I would say up to 100 records, but that depends on the size of the record), it much easier to just print() and get() your database as an Euphoria sequence. If the database is bigger you should load into memory only the record(s) that you need (using get() loads the complete file into memory!). Using seek() you can make your program to position the next-to-be-read-character pointer. where() tells you where is this pointer pointing. Regards, Daniel Berstein daber at pair.com
5. Re: database theory
- Posted by C & K L <candk at TICNET.COM> Nov 20, 1998
- 617 views
- Last edited Nov 21, 1998
There are some programs on the EUPHORIA main site that compress sequences written to disk. EDOM, from Ralf, comes immediately to mind, but I know there are others... Perhaps these codes will make the writing and reading of sequences to disk more efficient, and therefore better able to handle the needs of a database. Irv Mullins wrote: > > I should have added a couple of things to the previous message: > Euphoria sequences, when stored on disk, are horribly wasteful > of space. It is *far* better to do your storage in plain old > fixed-length ascii text format. > How? Using prinf() of course. In one statement, you get: > 1.Fixed length fields > 2.Ascii conversion > 3.End of record marker (C/R) > > This is also nice because you can use any old text editor > to edit or view your data files. > The problem is when reading data back in, you must do your > own parsing. Using fixed length fields, > like so: > Irv Mullins irv at ellijay.com > Alan Tu ATU5713 at COMPUSERVE.COM > ^ ^ > 1 13 > You see that [NAME] is always input[1..12], address is [13..25] > and so on. > > This works pretty easily. > > Irv
6. Re: database theory
- Posted by Daniel Berstein <daber at PAIR.COM> Nov 21, 1998
- 632 views
>There are some programs on the EUPHORIA main site that compress >sequences written to disk. EDOM, from Ralf, comes immediately to mind, >but I know there are others... Perhaps these codes will make the writing >and reading of sequences to disk more efficient, and therefore better >able to handle the needs of a database. I wouldn't say that... the sequence IS compressed, but you're still holding the complete database in memory! EDOM is far better than using plain sequence, but it doesn't replace a real database system. Regards, Daniel Berstein daber at pair.com
7. Re: database theory
- Posted by C & K L <candk at TICNET.COM> Nov 21, 1998
- 636 views
The question is begged: Why not use a pre-packaged database? Next question: When is the Linux version of EUPHORIA coming out? Isn't that the next big thing? I would think that the cutting-edge people of this list would be working on Linux programs because that's the market that's going to blossom here in a few years... wouldn't you think? (And if it doesn't, you haven't really lost anything.) Later! ck Daniel Berstein wrote: > > >There are some programs on the EUPHORIA main site that compress > >sequences written to disk. EDOM, from Ralf, comes immediately to mind, > >but I know there are others... Perhaps these codes will make the writing > >and reading of sequences to disk more efficient, and therefore better > >able to handle the needs of a database. > > I wouldn't say that... the sequence IS compressed, but you're still holding > the > complete database in memory! EDOM is far better than using plain > sequence, but it doesn't replace a real database system. > > Regards, > Daniel Berstein > daber at pair.com
8. Re: database theory
- Posted by Alan Tu <ATU5713 at COMPUSERVE.COM> Nov 22, 1998
- 619 views
>>>>> Easy one: write all records to the same, fixed length. Let's say a name and address record is always 256 bytes long. You want the 3rd name in the file, so you seek byte (256 * 3), and read 256 bytes. This also makes updates to the file easy, since you're just writing 256 new bytes over 256 old. <<<<< Yes, but how would you know to seek the third one? Let me think. Well, here. How fast is this? sequence last_name last_name =3D "" while last_name !=3D "Jordan" -- the name I'm trying to retrieve counter =3D 1 seek(database,counter) -- database is an open file line =3D gets(database) -- line is a sequence index =3D find(',',line) last_name =3D line[1..index-1] counter =3D counter+80 -- line length, suppose end while Is this solution viable? Is there a better way if I can use the "easy" w= ay you describe? Thanks. Alan =
9. Re: database theory
- Posted by Daniel Berstein <daber at PAIR.COM> Nov 22, 1998
- 655 views
>The question is begged: Why not use a pre-packaged database? I don't understand your question. The compressed sequence (either in disk or in memory) will hold the COMPLETE database into memory... unless you have an unlimited amout of memory that isn't a good solution for an eficient and scalable database solution. Regards, Daniel Berstein daber at pair.com
10. Re: database theory
- Posted by Daniel Berstein <daber at PAIR.COM> Nov 22, 1998
- 637 views
>Yes, but how would you know to seek the third one? Let me think. Well, >here. How fast is this? >sequence last_name >last_name = "" >while last_name != "Jordan" -- the name I'm trying to retrieve > counter = 1 > seek(database,counter) -- database is an open file > line = gets(database) -- line is a sequence > index = find(',',line) > last_name = line[1..index-1] > counter = counter+80 -- line length, suppose >end while >Is this solution viable? Is there a better way if I can use the "easy" way >you describe? Thanks. No. You're a bit confused. Suppose you records are 100 bytes long, so to read a record you do: sequence record integer i, fn, offset, counter counter = 1 fn = open("database.dat","rb") -- Note that you must open the file as binary while record != something do seek(fn, 100 * (counter-1)) for i = 1 to 100 do -- Read 100 consecutive bytes from file record = record & getc(fn) end for counter = counter+1 end while This kind of searching is called secuential because you start from the first record until you find what you where looking for. Once you've understand this you can start experimenting with more sophisticated searching algorithms (external indexes, binary search, binary trees, hash tables, etc.). Regards, Daniel Berstein daber at pair.com
11. Re: database theory
- Posted by C & K L <candk at TICNET.COM> Nov 22, 1998
- 643 views
Why not simply use Access or FileMaker Pro or any one of many other already-built database management programs...? :) Daniel Berstein wrote: > > >The question is begged: Why not use a pre-packaged database? > > I don't understand your question. The compressed sequence (either in disk > or in memory) will hold the COMPLETE database into memory... unless > you have an unlimited amout of memory that isn't a good solution for an > eficient and scalable database solution. > > Regards, > Daniel Berstein > daber at pair.com
12. Re: database theory
- Posted by Daniel Berstein <daber at PAIR.COM> Nov 22, 1998
- 637 views
I forgot to mention: You can easily get each field from the record using sequence slices. If each record has 2 fields: NAME and LAST_NAME of length 50 each, the earlier example can get each fields with name = record[1..50] lastname = record[51..100]. The use of this kind of database file, as opposed to sequence-based, allows the data to be easily read from other programming languages (like Pascal, C, or Basic). Regards, Daniel Berstein daber at pair.com
13. Re: database theory
- Posted by Ad Rienks <Ad_Rienks at COMPUSERVE.COM> Nov 22, 1998
- 639 views
C & K L asked: >Why not simply use Access or FileMaker Pro or any one of many other >already-built database management programs...? :) Why are you programming?
14. Re: database theory
- Posted by Irv Mullins <irv at ELLIJAY.COM> Nov 22, 1998
- 607 views
On Sun, 22 Nov 1998 05:27:02 -0500, Ad Rienks <Ad_Rienks at COMPUSERVE.COM> wrote: >C & K L asked: > >>Why not simply use Access or FileMaker Pro or any one of many other >>already-built database management programs...? :) > >Why are you programming? Indeed. You may remember, in response to my question last week re: setting up a customer database, I received estimates ranging from 3 hours to 8 weeks to 6 months. I also promised to share some real-life data on this. The task, as described, with 4 users, using Windows, took : 90 seconds to design the database, input screen, and make it multi-user. The data import took 60 seconds to set up, perhaps half an hour to import. The system has been running for about 4 years, in daily use in a mail-order firm, with no loss of data. It is now a small part of a far more complex system. The same job could have easily taken 3 hours, or 8 weeks, or 6 months, or the rest-of-your-life, depending upon what software tools you use. If you want to learn, use what you have, whether BASIC or Euphoria or whatever. If you are paid by the hour, by all means use COBOL ;) If you are paid by the job, use something more suited to the task at hand. (In this case FMPro) Regards, Irv
15. Re: database theory
- Posted by C & K L <candk at TICNET.COM> Nov 22, 1998
- 605 views
That question doesn't make sense. Just because I suggest you take advantage of one of many database programs already available? Ad, why are you using an OS? Why not program it? Are you using an e-mail program YOU coded? Are you using word processing YOU coded? Get my point? Anyway, I am programming several projects... hopefully one of which will one day be complete! Thanks for asking. Ad Rienks wrote: > > C & K L asked: > > >Why not simply use Access or FileMaker Pro or any one of many other > >already-built database management programs...? :) > > Why are you programming?
16. Re: database theory
- Posted by Irv Mullins <irv at ELLIJAY.COM> Nov 22, 1998
- 620 views
On Sun, 22 Nov 1998 00:00:09 -0500, Alan Tu <ATU5713 at COMPUSERVE.COM> wrote: >>>>>> >Easy one: write all records to the same, fixed length. >Let's say a name and address record is always 256 bytes long. >You want the 3rd name in the file, so you seek byte >(256 * 3), and read 256 bytes. This also makes updates >to the file easy, since you're just writing 256 new bytes >over 256 old. ><<<<< > >Yes, but how would you know to seek the third one? Let me think. Well, >here. How fast is this? > <Sequential search snipped> That will work, if the data file doesn't have a huge number of entries. Other ways include building an index file which holds, for example NAME,LOCATION for each record. Load and sort the index file. Then you can do a binary lookup in the index file which will take only a maximimum of 8 tries to find anyone, regardless of how long the list is. Many commercial databases use such a method (multiple index files) When adding a record, you tack it onto the end of the data. You also tack the NAME and LOCATION onto the index file, and re-sort the index. Even better is to use a hash scheme (see Junko's hash code on RDS' website) to hash the indices. Regards, Irv
17. Re: database theory
- Posted by Daniel Berstein <daber at PAIR.COM> Nov 22, 1998
- 613 views
- Last edited Nov 23, 1998
>Why not simply use Access or FileMaker Pro or any one of many other >already-built database management programs...? :) Now I understand your point! The answer: because there's still no way to access external DBMS from Euphoria. Perhaps RDS should add in the future a way to comunicate with ODBC drivers. If you want/need cool enterprise database enabled applications, look for Visual Basic, Delphi and JBuilder (or Visual Cafe). The examples I've gave before don't pretend to become a complete database engine, just a small database system for small bussiness problems ;) Regards, Daniel Berstein daber at pair.com
18. Re: database theory
- Posted by Alan Tu <ATU5713 at COMPUSERVE.COM> Nov 22, 1998
- 620 views
- Last edited Nov 23, 1998
>>>>> Load and sort the index file. Then you can do a binary lookup in the index file which will take only a maximimum of 8 tries to find anyone, regardless of how long the list is. Many commercial databases use such a method (multiple index files) <<<<< I hope I'm not causing too much trouble, but I don't get it. Presumably I've have something like this: { {"name1",location1} {"name2",location2} {"name3",location3} } What to you mean by sorting and binary lookup? Alan =
19. Re: database theory
- Posted by Irv Mullins <irv at ELLIJAY.COM> Nov 23, 1998
- 623 views
On Sun, 22 Nov 1998 23:04:27 -0500, Alan Tu <ATU5713 at COMPUSERVE.COM> wrote: >>>>>> >Load and sort the index file. Then you can do a binary lookup >in the index file which will take only a maximimum of >8 tries to find anyone, regardless of how long the list >is. Many commercial databases use such a method (multiple >index files) ><<<<< > >I hope I'm not causing too much trouble, but I don't get it. Presumably >I've have something like this: > >{ >{"name1",location1} >{"name2",location2} >{"name3",location3} >} > >What to you mean by sorting and binary lookup? > >Alan > Let's say you are keeping a mailing list. You enter new names as you get them, so there is no particular order to them: Mullins, Irv Tu, Alan Smith, John Doh, John Craig, Rob ... Now, to find Craig, Rob on this list, you must either: 1. Start at the top and read each name, checking to see it it is the one you want (sequential access) 2. Sort the whole file by name. That's easy in Euphoria, just use the sort() function. Problem is, what happens when your list gets to be 30 or 40 megs long? Simple no longer works. 3. Make a shorter list that matches the long list and which can be sorted: Mullins, 1 Tu,2 Craig,3 Smith,4 Doh,5 ... Now, sort this alphabetically: Craig,3 Doh,5 Mullins,1 Smith,4 Tu,2 .... To find a name now, you look up the name and find an index a.k.a. record number, for that name. Craig is the third record in your main file. So read it in. But, you say, won't it still take a long time to look thru the index? Not if you do it right. It works this way: you are looking for Smith. You get the first name from the index,(Craig) and the middle name from the index (Mullins) is Smith => Craig ? yes is Smith <= Mullins ? no So you have ruled out the entire first half of your list in one ! call Next you get the second half of the list: Mullins --> Tu and divide it into halfs, checking the first half: is Smith => Mullins ? yes is Smith <= Smith ? yes So the name falls within this half of the file Divide this half again, and look again. Strange as it may sound, it only takes a few (8?) steps to locate the matching entry(s) in any file, regardless of how long it may be. This method is very fast, and can be written recursively, so there's not much code to test. Regards, Irv
20. Re: database theory
- Posted by Lucius Hilley III <lhilley at CDC.NET> Nov 24, 1998
- 615 views
- Last edited Nov 25, 1998
On Mon, 23 Nov 1998 08:53:56 -0500, Irv Mullins <irv at ELLIJAY.COM> wrote: <SNIP> Below is in reference to Binary Searches. > Strange as it may sound, it only takes a few (8?) steps > to locate the matching entry(s) in any file, regardless > of how long it may be. This method is very fast, and > can be written recursively, so there's not much code to > test. > > Regards, > Irv Excuse me but this magic number of (8) is driving me nuts. There is an exact math as to maximum number of tests will be required to find an entry. It isn't called binary searching for no reason. Only 8 tests will search a maximum of 255 entries. Now I no that both sounds rather limiting and also rather amazing. BUT the math follows as such. Just one more test. Say 9 tests will search a maximum of 511 entries. Notice! The number practically doubled. A database of 65,535 entries can be searched with no more than 16 tests. Here is a simple table. TESTS = Max # of tests required ENTRIES = # of Entries that can be searched. TESTS ENTRIES 1 1 2 3 4 15 8 255 16 65,535 32 4,294,967,295 I hope you now grasp the power of binary searches. I doubt you will ever require 32 tests for your sorted database. Of course for binary searches to work the database must be sorted at some level. _________________________ Lucius L. Hilley III lhilley at cdc.net http://www.cdc.net/~lhilley http://www.americanantiques.com http://www.dragonvet.com _________________________