1. database theory

In the Euphoria package somewhere, (it might have been in the demo
program), it said that to create a large database one would use seek and
where.  Could someone please explain to me how that would be done to acce=
ss
a database?  Thanks.

Alan
 =

new topic     » topic index » view message » categorize

2. Re: database theory

On Fri, 20 Nov 1998 10:37:13 -0500, Alan Tu <ATU5713 at COMPUSERVE.COM> wrote:

>In the Euphoria package somewhere, (it might have been in the demo
>program), it said that to create a large database one would use seek and
>where.  Could someone please explain to me how that would be done to access
>a database?  Thanks.
>
Hi, Alan:

In a large database, for example, 50,000 names and addresses,
you have more data than will fit into memory comfortably.
Plus, even if you could load it in all at once, the disk
activity would take a long time. Same when saving the updated
data.

Therefore, it is better to read/write a single record at a
time, making disk i/o very quick.
If you keep a sorted list of keys, you can find any
record in the file with 8 seeks or less - doing a binary
search.

The difficulty here is that Euphoria stores variable
length records (sequences). How do you know which byte in the
file is the first byte of Alan Tu's name and address?
Two solutions: the easy one, and the one that I don't want
to bother writing:

Easy one: write all records to the same, fixed length.
Let's say a name and address record is always 256 bytes long.
You want the 3rd name in the file, so you seek byte
(256 * 3), and read 256 bytes. This also makes updates
to the file easy, since you're just writing 256 new bytes
over 256 old.

The other way involves a "linked list" approach, with
variable length records, where the record length is stored
in the record,or in a separate index file, so you can
"leapfrog" that record and be at the beginning of the  next.
This can get really complex fast.

Suppose you have to change Alan Tu's name to Alan W. Tu?
Now your record is longer than before, so if you put it back
where it was, you will overwrite part of the next record.
What to do? There are books written on this.

Regards,
Irv

new topic     » goto parent     » topic index » view message » categorize

3. Re: database theory

I should have added a couple of things to the previous message:
Euphoria sequences, when stored on disk, are horribly wasteful
of space. It is *far* better to do your storage in plain old
fixed-length ascii text format.
How? Using prinf() of course. In one statement, you get:
1.Fixed length fields
2.Ascii conversion
3.End of record marker (C/R)

This is also nice because you can use any old text editor
to edit or view your data files.
The problem is when reading data back in, you must do your
own parsing. Using fixed length fields,
like so:
Irv Mullins irv at ellijay.com
Alan Tu     ATU5713 at COMPUSERVE.COM
^           ^
1           13
You see that [NAME] is always input[1..12], address is [13..25]
and so on.

This works pretty easily.

Irv

new topic     » goto parent     » topic index » view message » categorize

4. Re: database theory

>In the Euphoria package somewhere, (it might have been in the demo
>program), it said that to create a large database one would use seek and
>where.  Could someone please explain to me how that would be done to access
>a database?  Thanks.


If you're going to create and maintain a small database (I would say up to
100
records, but that depends on the size of the record), it much easier to just
print()
and get() your database as an Euphoria sequence. If the database is bigger
you
should load into memory only the record(s) that you need (using get() loads
the
complete file into memory!). Using seek() you can make your program to
position
the next-to-be-read-character pointer. where() tells you where is this
pointer
pointing.


Regards,
    Daniel   Berstein
    daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

5. Re: database theory

There are some programs on the EUPHORIA main site that compress
sequences written to disk. EDOM, from Ralf, comes immediately to mind,
but I know there are others... Perhaps these codes will make the writing
and reading of sequences to disk more efficient, and therefore better
able to handle the needs of a database.

Irv Mullins wrote:
>
> I should have added a couple of things to the previous message:
> Euphoria sequences, when stored on disk, are horribly wasteful
> of space. It is *far* better to do your storage in plain old
> fixed-length ascii text format.
> How? Using prinf() of course. In one statement, you get:
> 1.Fixed length fields
> 2.Ascii conversion
> 3.End of record marker (C/R)
>
> This is also nice because you can use any old text editor
> to edit or view your data files.
> The problem is when reading data back in, you must do your
> own parsing. Using fixed length fields,
> like so:
> Irv Mullins irv at ellijay.com
> Alan Tu     ATU5713 at COMPUSERVE.COM
> ^           ^
> 1           13
> You see that [NAME] is always input[1..12], address is [13..25]
> and so on.
>
> This works pretty easily.
>
> Irv

new topic     » goto parent     » topic index » view message » categorize

6. Re: database theory

>There are some programs on the EUPHORIA main site that compress
>sequences written to disk. EDOM, from Ralf, comes immediately to mind,
>but I know there are others... Perhaps these codes will make the writing
>and reading of sequences to disk more efficient, and therefore better
>able to handle the needs of a database.


I wouldn't say that... the sequence IS compressed, but you're still holding
the
complete database in memory! EDOM is far better than using plain
sequence, but it doesn't replace a real database system.

Regards,
    Daniel   Berstein
    daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

7. Re: database theory

The question is begged: Why not use a pre-packaged database?

Next question: When is the Linux version of EUPHORIA coming out? Isn't
that the next big thing? I would think that the cutting-edge people of
this list would be working on Linux programs because that's the market
that's going to blossom here in a few years... wouldn't you think?

(And if it doesn't, you haven't really lost anything.)

Later!
ck

Daniel Berstein wrote:
>
> >There are some programs on the EUPHORIA main site that compress
> >sequences written to disk. EDOM, from Ralf, comes immediately to mind,
> >but I know there are others... Perhaps these codes will make the writing
> >and reading of sequences to disk more efficient, and therefore better
> >able to handle the needs of a database.
>
> I wouldn't say that... the sequence IS compressed, but you're still holding
> the
> complete database in memory! EDOM is far better than using plain
> sequence, but it doesn't replace a real database system.
>
> Regards,
>     Daniel   Berstein
>     daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

8. Re: database theory

>>>>>
Easy one: write all records to the same, fixed length.
Let's say a name and address record is always 256 bytes long.
You want the 3rd name in the file, so you seek byte
(256 * 3), and read 256 bytes. This also makes updates
to the file easy, since you're just writing 256 new bytes
over 256 old.
<<<<<

Yes, but how would you know to seek the third one?  Let me think.  Well,
here.  How fast is this?

sequence last_name
last_name =3D ""
while last_name !=3D "Jordan" -- the name I'm trying to retrieve
    counter =3D 1
    seek(database,counter) -- database is an open file
    line =3D gets(database) -- line is a sequence
    index =3D find(',',line)
    last_name =3D line[1..index-1]
    counter =3D counter+80 -- line length, suppose
end while

Is this solution viable?  Is there a better way if I can use the "easy" w=
ay
you describe?  Thanks.

Alan

 =

new topic     » goto parent     » topic index » view message » categorize

9. Re: database theory

>The question is begged: Why not use a pre-packaged database?


I don't understand your question. The compressed sequence (either in disk
or in memory) will hold the COMPLETE database into memory... unless
you have an unlimited amout of memory that isn't a good solution for an
eficient and scalable database solution.

Regards,
    Daniel   Berstein
    daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

10. Re: database theory

>Yes, but how would you know to seek the third one?  Let me think.  Well,
>here.  How fast is this?

>sequence last_name
>last_name = ""
>while last_name != "Jordan" -- the name I'm trying to retrieve
>    counter = 1
>    seek(database,counter) -- database is an open file
>    line = gets(database) -- line is a sequence
>    index = find(',',line)
>    last_name = line[1..index-1]
>    counter = counter+80 -- line length, suppose
>end while

>Is this solution viable?  Is there a better way if I can use the "easy" way
>you describe?  Thanks.

No. You're a bit confused. Suppose you records are 100 bytes long, so to
read a record you do:

sequence record
integer i, fn, offset, counter

counter = 1
fn = open("database.dat","rb") -- Note that you must open the file as binary

while record != something do
    seek(fn, 100 * (counter-1))
    for i = 1 to 100 do    -- Read 100 consecutive bytes from file
        record = record & getc(fn)
    end for

    counter = counter+1
end while

This kind of searching is called secuential because you start from the first
record until you find what you where looking for. Once you've understand
this you can start experimenting with more sophisticated searching
algorithms (external indexes, binary search, binary trees, hash tables,
etc.).

Regards,
    Daniel   Berstein
    daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

11. Re: database theory

Why not simply use Access or FileMaker Pro or any one of many other
already-built database management programs...? :)

Daniel Berstein wrote:
>
> >The question is begged: Why not use a pre-packaged database?
>
> I don't understand your question. The compressed sequence (either in disk
> or in memory) will hold the COMPLETE database into memory... unless
> you have an unlimited amout of memory that isn't a good solution for an
> eficient and scalable database solution.
>
> Regards,
>     Daniel   Berstein
>     daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

12. Re: database theory

I forgot to mention:

You can easily get each field from the record using
sequence slices. If each record has 2 fields: NAME
and LAST_NAME of length 50 each, the earlier
example can get each fields with

name = record[1..50]
lastname = record[51..100].

The use of this kind of database file, as opposed to
sequence-based, allows the data to be easily read
from other programming languages (like Pascal, C,
or Basic).

Regards,
    Daniel   Berstein
    daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

13. Re: database theory

C & K L asked:

>Why not simply use Access or FileMaker Pro or any one of many other
>already-built database management programs...? :)

Why are you programming?

new topic     » goto parent     » topic index » view message » categorize

14. Re: database theory

On Sun, 22 Nov 1998 05:27:02 -0500, Ad Rienks <Ad_Rienks at COMPUSERVE.COM>
wrote:

>C & K L asked:
>
>>Why not simply use Access or FileMaker Pro or any one of many other
>>already-built database management programs...? :)
>
>Why are you programming?

Indeed. You may remember, in response to my question last week
re: setting up a customer database, I received estimates ranging
from 3 hours to 8 weeks to 6 months.

I also promised to share some real-life data on this.
The task, as described, with 4 users, using Windows,
took : 90 seconds to design the database, input screen,
and make it multi-user.
The data import took 60 seconds to set up, perhaps half
an hour to import.

The system has been running for about 4 years, in daily
use in a mail-order firm, with no loss of data. It is now
a small part of a far more complex system.

The same job could have easily taken 3 hours, or 8 weeks,
or 6 months, or the rest-of-your-life, depending upon what
software tools you use.

If you want to learn, use what you have, whether BASIC or
Euphoria or whatever. If you are paid by the hour, by all
means use COBOL ;) If you are paid by the job, use something
more suited to the task at hand. (In this case FMPro)

Regards,
Irv

new topic     » goto parent     » topic index » view message » categorize

15. Re: database theory

That question doesn't make sense. Just because I suggest you take
advantage of one of many database programs already available?

Ad, why are you using an OS? Why not program it? Are you using an e-mail
program YOU coded? Are you using word processing YOU coded? Get my
point?

Anyway, I am programming several projects... hopefully one of which will
one day be complete!

Thanks for asking.

Ad Rienks wrote:
>
> C & K L asked:
>
> >Why not simply use Access or FileMaker Pro or any one of many other
> >already-built database management programs...? :)
>
> Why are you programming?

new topic     » goto parent     » topic index » view message » categorize

16. Re: database theory

On Sun, 22 Nov 1998 00:00:09 -0500, Alan Tu <ATU5713 at COMPUSERVE.COM> wrote:

>>>>>>
>Easy one: write all records to the same, fixed length.
>Let's say a name and address record is always 256 bytes long.
>You want the 3rd name in the file, so you seek byte
>(256 * 3), and read 256 bytes. This also makes updates
>to the file easy, since you're just writing 256 new bytes
>over 256 old.
><<<<<
>
>Yes, but how would you know to seek the third one?  Let me think.  Well,
>here.  How fast is this?
>
<Sequential search snipped>

That will work, if the data file doesn't have a huge number
of entries.

Other ways include building an index file which holds,
for example NAME,LOCATION for each record.

Load and sort the index file. Then you can do a binary lookup
in the index file which will take only a maximimum of
8 tries to find anyone, regardless of how long the list
is. Many commercial databases use such a method (multiple
index files)

When adding a record, you tack it onto the end of the
data. You also tack the NAME and LOCATION onto the index
file, and re-sort the index.

Even better is to use a hash scheme (see Junko's hash
code on RDS' website) to hash the indices.

Regards,
Irv

new topic     » goto parent     » topic index » view message » categorize

17. Re: database theory

>Why not simply use Access or FileMaker Pro or any one of many other
>already-built database management programs...? :)


Now I understand your point!

The answer: because there's still no way to access external DBMS from
Euphoria. Perhaps RDS should add in the future a way to comunicate
with ODBC drivers. If you want/need cool enterprise database enabled
applications, look for Visual Basic, Delphi and JBuilder (or Visual Cafe).

The examples I've gave before don't pretend to become a complete database
engine, just a small database system for small bussiness problems ;)

Regards,
    Daniel   Berstein
    daber at pair.com

new topic     » goto parent     » topic index » view message » categorize

18. Re: database theory

>>>>>
Load and sort the index file. Then you can do a binary lookup
in the index file which will take only a maximimum of
8 tries to find anyone, regardless of how long the list
is. Many commercial databases use such a method (multiple
index files)
<<<<<

I hope I'm not causing too much trouble, but I don't get it.  Presumably
I've have something like this:

{
{"name1",location1}
{"name2",location2}
{"name3",location3}
}

What to you mean by sorting and binary lookup?

Alan
 =

new topic     » goto parent     » topic index » view message » categorize

19. Re: database theory

On Sun, 22 Nov 1998 23:04:27 -0500, Alan Tu <ATU5713 at COMPUSERVE.COM> wrote:

>>>>>>
>Load and sort the index file. Then you can do a binary lookup
>in the index file which will take only a maximimum of
>8 tries to find anyone, regardless of how long the list
>is. Many commercial databases use such a method (multiple
>index files)
><<<<<
>
>I hope I'm not causing too much trouble, but I don't get it.  Presumably
>I've have something like this:
>
>{
>{"name1",location1}
>{"name2",location2}
>{"name3",location3}
>}
>
>What to you mean by sorting and binary lookup?
>
>Alan
>

Let's say you are keeping a mailing list. You enter new names as you
get them, so there is no particular order to them:
Mullins, Irv
Tu, Alan
Smith, John
Doh, John
Craig, Rob
...
Now, to find Craig, Rob on this list, you must either:
1. Start at the top and read each name, checking to see it it
   is the one you want (sequential access)

2. Sort the whole file by name. That's easy in Euphoria, just
   use the sort() function. Problem is, what happens when your
   list gets to be 30 or 40 megs long? Simple no longer works.

3. Make a shorter list that matches the long list and which can
   be sorted:
   Mullins, 1
   Tu,2
   Craig,3
   Smith,4
   Doh,5
   ...
   Now, sort this alphabetically:
   Craig,3
   Doh,5
   Mullins,1
   Smith,4
   Tu,2
   ....
   To find a name now, you look up the name and find an index
   a.k.a. record number, for that name. Craig is the third record
   in your main file. So read it in.

   But, you say, won't it still take a long time to look thru the
   index? Not if you do it right.

   It works this way: you are looking for Smith.
   You get the first name from the index,(Craig)
   and the middle name from the index (Mullins)
   is Smith => Craig ? yes
   is Smith <= Mullins ? no
   So you have ruled out the entire first half of your
   list in one ! call
   Next you get the second half of the list:
   Mullins --> Tu
   and divide it into halfs, checking the first half:
   is Smith => Mullins ? yes
   is Smith <= Smith ? yes
   So the name falls within this half of the file
   Divide this half again, and look again.
   Strange as it may sound, it only takes a few (8?) steps
   to locate the matching entry(s) in any file, regardless
   of how long it may be. This method is very fast, and
   can be written recursively, so there's not much code to
   test.

   Regards,
   Irv

new topic     » goto parent     » topic index » view message » categorize

20. Re: database theory

On Mon, 23 Nov 1998 08:53:56 -0500, Irv Mullins <irv at ELLIJAY.COM> wrote:

  <SNIP>
Below is in reference to Binary Searches.
>   Strange as it may sound, it only takes a few (8?) steps
>   to locate the matching entry(s) in any file, regardless
>   of how long it may be. This method is very fast, and
>   can be written recursively, so there's not much code to
>   test.
>
>   Regards,
>   Irv

    Excuse me but this magic number of (8) is driving me
nuts.  There is an exact math as to maximum number of
tests will be required to find an entry.  It isn't
called binary searching for no reason.

Only 8 tests will search a maximum of 255 entries.
Now I no that both sounds rather limiting and also
rather amazing.  BUT the math follows as such.
Just one more test.  Say 9 tests will search a
maximum of 511 entries.  Notice! The number
practically doubled.

A database of 65,535 entries can be searched with
no more than 16 tests.  Here is a simple table.

TESTS = Max # of tests required
ENTRIES = # of Entries that can be searched.

TESTS  ENTRIES
1        1
2        3
4        15
8        255
16       65,535
32       4,294,967,295

    I hope you now grasp the power of binary searches.
I doubt you will ever require 32 tests for your
sorted database.
    Of course for binary searches to work the database
must be sorted at some level.
_________________________

Lucius L. Hilley III    lhilley at cdc.net
http://www.cdc.net/~lhilley
http://www.americanantiques.com
http://www.dragonvet.com
_________________________

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu