Re: Is Euphoria OK for a large Database?

>> I am interested in writing a large (chess) database program. I am
>> considering using Euphoria as it seems relatively straightforward. I do
>> not have any great programming experience and would like some advice...

    Well, Euphoria is one of those languages with great support, through this
wonderful list-serv.

>> The number of games in a database can be quite high(up to around 1
>> million!!)

    Again, Euphoria is a very nice choice: you'll (almost) never get an
out-of-memory error, and sequences are not limited in size in any way.
    Only when both the normal memory (base memory + extended memory) and the
HD are full do you get an out-of-memory crash. (That is: it will abort your
program and tell the user it's out of memory. I might be wrong, though; I never
tried it, nor was I able to acquire that much memory.)
    And of course Euphoria automatically caches disk operations for speed,
so people won't need to run SmartDrive to gain that extra speed.

>> but usually come in smaller packets of around 500 to 2000 (say). A
>> useful database could easily be constructed with only 50000 games in it.
>> Sorting and/or searching will be required.

    Easy, in Euphoria you can write (and some have written) very generic sort
and/or search routines that work with any type of data and can use any
comparison routine you like. (With version 2.0 you can pass the routine id of
your own custom comparison function.)
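
    As a minimal sketch of that idea, assuming the custom_sort() routine from
the standard sort.e and a made-up record layout of {white, black, year} (the
names and years below are sample data only, not real games):

```euphoria
include sort.e

-- Hypothetical game records: {white player, black player, year}
sequence games, by_year
games = {
    {"Smith", "Jones", 1990},
    {"Brown", "Green", 1972},
    {"White", "Black", 1996}
}

function compare_year(object a, object b)
    -- must return -1, 0 or 1, just like the built-in compare()
    return compare(a[3], b[3])
end function

-- pass the routine id of our own comparison function
by_year = custom_sort(routine_id("compare_year"), games)

for i = 1 to length(by_year) do
    printf(1, "%d  %s - %s\n", {by_year[i][3], by_year[i][1], by_year[i][2]})
end for
```

    To sort on another field you only write another small comparison
function; the sort routine itself stays the same.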

>> Question 1. Is Euphoria up to the task?
>yes.  Euphoria can handle that kind of structure without breaking a sweat.
>If you got the ram, it'll handle the data.

    Or the HD, but of course slower...
    (Why do you think Win95 is so slow? Because it uses more RAM than we have.)

>> Question 2. I read in an associated website that Euphoria stored
>> sequences in             a way that used a lot of diskspace.
>
>There are ways around that.  You can write it out as a string of ascii
>characters (or whatever) with "parsing characters" to save space, and a
>number of people have created compression methods which should work well
>with your data.

    Or get my EDOM: you can save a sequence with any type of data in it,
of any size, to disk in compressed form.
    But it is kind of slow. It uses my old EDO (which converts a sequence with
all its data to an *efficient* binary sequence, thus 1's and 0's) and
Daniel Berstein's compression routines.
    It compresses *almost* as well as zipping, especially if you consider that
it also has to save the complete structure of the sequence and its datatypes
in there.

>>Can the data be stored in a reasonably efficient manner?
>>Would I use sequences for such a large database?
>
>That's the only way to do it.  Sequences are wonderful things.  You could
>have a sequence called Moves such that Moves[1] is everything for game 1,
>Moves[2] be game 2, etc.  Each element in Moves would be a sequence
>containing all the data pertaining to it.  So Moves[45][2] would be the
>black player's name for game 45. length(Moves[i]) is the number of lines
>required for game i. Indexing in and retrieving the info you need is a
>snap.

    And then:
    -- begin..
        include edom.e
        if not edo_save ("my_db.edo", my_sequence) then
            puts(1,"Unable to save!\n")
            abort(1)
        end if
    -- End of program... easy huh ?

    To load you could use: my_sequence = edo_load ("my_db.edo")
    It's very easy to use, and it compresses very well. (Thanks to Daniel's
routines.)
    But it's slow, very slow, for very big amounts of data.
    In that case you have to wait for EDOM2.
    I am currently working on it.
    I finally finished my EupBit library, which will be used to quickly write
out bits and characters at binary positions.
    It now also supports custom routine ids to handle the input and output, so
you could feed it input from a memory buffer, or output to one, or you could
write the data out compressed or encrypted.
    EDOM2 won't compress your data, but it will save it very efficiently, making
a lot of assumptions, handling scopes and exceptions. It's nearly done;
a few algorithms are still needed.
    (It doesn't use compression, but because it uses EupBit you could set
special routine ids to routines that compress or encrypt the data. So you
can influence the way the data is written out.)

>> Question 3. Euphoria seems slightly confusing with regard
>> to strings/sequences(see above?). Any
>> suggestions as to a nice simple             method of reading in the
>> aforementioned textfiles?
>
>There are ways to do it.  If you have the data in an ascii text format,
>you should be able to read the file in directly.
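
    A minimal sketch of reading a text file line by line with the built-in
open(), gets() and close(). The file name games.txt is only an assumption for
the example:

```euphoria
integer fn
object line        -- gets() returns a sequence, or the atom -1 at end of file
sequence lines

lines = {}
fn = open("games.txt", "r")    -- hypothetical file name
if fn = -1 then
    puts(1, "Cannot open games.txt\n")
    abort(1)
end if

while 1 do
    line = gets(fn)
    if atom(line) then
        exit                   -- end of file reached
    end if
    lines = append(lines, line)    -- each line becomes one element
end while
close(fn)

printf(1, "%d lines read\n", length(lines))
```

    Afterwards lines[1] is the first line of the file (including its newline
character), lines[2] the second, and so on.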

Euphoria might be confusing because you have learned *bad* stuff, as most
programming languages teach it. QBasic will tell you there is a difference
between a character, an integer and a floating point number.

You could get a very nice tutorial about Euphoria, by David Gay. (See the link
under "Other sites related to Euphoria" at the Official Euphoria Page.)

Here is the difference, briefly..

    An atom is a value (not just a number). A character on the screen is a
graphical representation of a value in memory.
    So a character is a value, and a floating point number is a value too. They
are all values. And all values can be read or written as a number, or a
character, or whatever you read in.
    When you type 'C' in your program it is equal to the value 67, because
the ASCII table has the C at position 67. (Windows has this too, but Windows
uses a different set for the last 128 characters of the DOS character table.)
    A file, too, is a large amount of values. All those values are from
0-255, just like characters. Whenever a character is expected but you give
a value that is higher than 255, Euphoria will cut off the high bits and keep
only the low 8 bits, so the device will still get the byte it needs. So
writing a 'C' to the screen does the same as writing 67 + 256, or as writing
'C' + 256.
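
    A small sketch of this, relying on the documented behaviour of puts() that
only the low 8 bits of each value are sent to the device:

```euphoria
puts(1, 'C')          -- writes the character C
puts(1, 67)           -- the same value, so the same character
puts(1, 'C' + 256)    -- only the low 8 bits are sent, so again C
puts(1, '\n')

? remainder('C' + 256, 256)    -- prints 67
```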

    A sequence is the way you wish to structure/order all your values.
    As you probably know, an object can be either a sequence or an atom.

    A sequence is a list of objects. Each of those objects can thus be an
atom (value) or a sequence (structure).
    So a sequence can contain other sequences, and those sequences can
contain other sequences, etc.
    Just like a directory structure, or tree (with branches).
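
    A rough sketch of such a tree for one chess game, assuming a made-up layout
of {white, black, moves} (the layout and the names are only for illustration):

```euphoria
sequence game, db

-- hypothetical layout: {white player, black player, list of moves}
game = {"White player", "Black player", {"e4", "e5", "Nf3"}}
db = {game}                    -- a database holding one game

puts(1, db[1][2] & '\n')       -- the black player's name of game 1
puts(1, db[1][3][1] & '\n')    -- the first move of game 1
? length(db[1][3])             -- prints 3, the number of moves stored
```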

    Some routines only work with values, some only with values within a
certain range, some only with sequences, and some with both.
    puts() is an example of the last kind. It can write out a value, or a whole
list of values (thus a sequence).
    But puts() will generate an error if you give it a sequence that contains
other sequences.
    puts() will cut each atom down to a value within 0-255; you can calculate
this value yourself with remainder(value, 256).
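
    A short sketch of how to deal with that: write a nested sequence one flat
element at a time, or use print(), which accepts any object. The variable name
is only for illustration:

```euphoria
sequence names
names = {"White", "Black"}    -- a sequence containing two strings

-- puts(1, names) would fail here, because names contains sub-sequences
for i = 1 to length(names) do
    puts(1, names[i])         -- each element is a flat string
    puts(1, '\n')
end for

print(1, names)               -- print() shows any object with its structure
puts(1, '\n')
```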

    Some routines and all comparison and arithmetic operators are
recursive. They will do the same action to every member of the sequence, and
thus also to every member of every sub-sequence.
    Example with the '+' command:

        sequence s1, s2
        s1 = { "This is a string, eh.. a sequence" , 3, 4, { 3, 4, { 4, 5,
               "", {} } } }
        s1 = s1 + 3         -- adds 3 to every atom, at every nesting level
        print(1, s1)        -- Will print out s1 and its structure

    Now all values and characters are 3 more than they were in the
beginning.
    Also note:
        Euphoria sees this as a sequence also: "Euphoria"
        Because it's just a sequence containing characters (thus values):
 'E', 'u', 'p' ....

    Arithmetic and comparisons also work with 2 sequences (only when
they are of the same length)
    Example:
    s1 = { 1, 2, { 10, 9, 8} }
    s2 = { 0, { 1, 2, 3 } , 3 }
    print( 1, s1 + s2 )

    Now it will make a new sequence where every element (value in the
ordered list/sequence) is added to the element of the other sequence.
    So the sequence printed will be:
    {      1 + 0,    2 + { 1, 2, 3},     {10, 9, 8} + 3     }
    But of course the values will already be calculated, thus:
    { 1, {3,4,5}, {13,12,11} }

The same works with comparison:

    print (1, s1 = s2)

    The sequence will then be evaluated:
    {  1 = 0 , { 2 = 1, 2 = 2, 2 = 3} , {10 = 3, 9 = 3, 8 = 3} }
    And that will make:
    { 0, {0,1,0}, {0,0,0} }

    Where 0 of course means false, and 1 means true.
    And the = means compare, not assign, except for the first = of an
assignment statement:
    s1 = ( s1 = s2 )

    Will assign the { 0, {0,1,0}, {0,0,0} } to s1
    As far as I know the parentheses are optional, but they make the intent
clearer.

    BUT NOTE: this is the syntax of an if statement:
        if true/false then
            -- code
        end if

    So, this *will* work:
        if 1 = 1 then
            -- code
        end if

    But this *won't* work:
        if {1,0,1} = 1 then
            -- code
        end if

    The if-statement may only get a single value (an atom), not a structure
filled with values..!
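
    When you do want to test whole sequences in an if-statement, one way is the
built-in compare(), which always returns a single atom. A minimal sketch (the
variable names are only for illustration):

```euphoria
sequence a, b
a = {1, 0, 1}
b = {1, 0, 1}

-- compare() returns 0 when both objects are identical,
-- and -1 or 1 when the first is smaller or greater
if compare(a, b) = 0 then
    puts(1, "a and b are equal\n")
else
    puts(1, "a and b differ\n")
end if
```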

    Now that you know how file I/O in Euphoria basically works, and you
know what sequences are like, you need to know how to cut, modify and
replace sequences:

    s1 = s2[2]

    Will assign the second element of s2 to s1.
    (Only a pointer is copied, not the whole sequence, until it becomes
necessary.)

    s1 = s2[2][1]

    Will generate an error, because the 1st element of the 2nd element of s2
is a value, and s1 is declared as a sequence.

    s1 = s2[2][1..2]

    Will assign elements 1 through 2 of the 2nd element of s2 to s1.

    The elements are indexed from 1 to their length. (Use length(s2) to get
the length at run-time.)
    A slice [a..b] is valid when a is at least 1, b is at most the length,
and b is not smaller than a - 1. For a sequence of 4 elements:
    This is not allowed: [4..1]    -- the slice would have a negative length
    This is not allowed: [0..4]    -- element zero doesn't exist
    This is not allowed: [-1..4]
    This is allowed: [1..1]    -- Returns a sequence with one element
    This is allowed: [4..3]    -- Returns a zero element sequence like {}
or "" (doesn't matter)
    This is not allowed: [6..3]
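
    A few valid slices on a sequence of 4 elements, as a small sketch (the
variable names are only for illustration):

```euphoria
sequence s, t
s = {10, 20, 30, 40}

t = s[2..3]            -- {20, 30}
t = s[1..1]            -- {10}, a sequence with one element
t = s[4..3]            -- {}, the upper bound is one less than the lower bound
t = s[5..4]            -- {}, the lower bound may be length(s) + 1
t = s[2..length(s)]    -- {20, 30, 40}

? length(t)            -- prints 3
```

    The empty slices are what make loops like s[i+1..length(s)] work without
a special case when i reaches the last element.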

    It may be confusing why some are allowed and some are not, but when
you write a lot of algorithms, you'll find that they work because of this
flexibility, and that they wouldn't have worked right if some of the things
that are forbidden now had been allowed.
    I consider this to be very elegant.

    I hope this mini-tutorial helped...

Ralf
