1. associative arrays

Hi, All (interested),

Associative arrays is the topic, as I promised earlier today. I call
them simply lists, because Rob kindly left the term vacant. For the
uninitiated, they are (labelled) containers for a bunch of 'key -
value' pairs. The key can be just about anything, but it's usually a
descriptive name, a string. The values can again be anything, even
lists, or lists of lists...

Simple enough concept, but very powerful. Whole languages can be built
around it, Brazilian Lua is a good example.

The lists are also very attractive in the Euphorian context,
especially since the promised namespace solution does not seem to be
forthcoming (is it really that tough, Rob?). They can be used to
simulate all sorts of things, for example structures, really simple
databases, or truly useful OO elements - without the extra namespace
pressure all other schemes tend to exert.

To my mind, there are only about three really simple designs possible.
The one used by Carl, basically just a sequence of alternating keys
and values. Obvious, very simple, but slow. Then the scheme I
originally used in my 'widgets' (see Archives): again a single
sequence, but this time all the keys bunched up in first half of the
sequence and all the values in the second half. Slightly faster,
because the searched length is obviously shorter, but it requires a
couple of calculations to determine the size of the list and to index
to the appropriate value. More recently, after a lot of tests, I
reverted to my original scheme: two separate 'parallel' subsequences
of equal length. It seems to be on average 20 to 50 % faster than the
second scheme. The routines in the attached include are an extract
from the version I am currently using for the widgets (unfinished,
unpublished).

Fancier implementation are of course possible. Rob described one
recently. Or you can have (pre)sorted lists, obviously faster for
larger sets when combined with a binary search. And some sort of
hashing is probably preferable for really huge lists. jiri

--  list.e
--  jiri babor
--  jbabor at paradise.net.nz
--  00-05-18
--  version 2.00  (trimmed)

include machine.e   -- crash_message()

global function index(sequence list, object key)
    -- return *key* index, or zero if key not found
    return find(key, list[1])
end function

global function fetch(sequence list, object key)
    -- return list item with indicated key
    integer i
    i = find(key, list[1])
    if i then
        return list[2][i]
    end if
    crash_message("fetch error: key \"" & key & "\" not found...\n")
end function -- fetch

global function Fetch(
    object list,        -- list from which value is to be fetched
    sequence keys       -- sequence of list keys, one for each level
                        -- last one directly associated with fetched
                        -- value
    )
    -- return item from *nested* list : 'deep' fetch
    for i=1 to length(keys) do
        list = fetch(list, keys[i])
    end for
    return list
end function -- Fetch

global function set(sequence list, object key, object new_value)
    -- replace specified list item with a new value
    -- return modified list

    integer i

    i = find(key, list[1])      -- key index
    if i then
        list[2][i] = new_value
        return list
    end if
    crash_message("store error: key \"" & key & "\" not found...\n")
end function -- set

global function Set (sequence list, sequence keys, object new_value)
    -- 'deep', recursive set for *nested* lists
    integer i,len
    len = length(keys)
    i = find(keys[1],list[1])
    if i then
        if len > 1 then
            list[2][i] = Set(list[2][i], keys[2..len], new_value)
        else
            list[2][i] = new_value
        end if
        return list
    end if
    crash_message("Set error: key \"" & keys[1] & "\" not found...\n")
end function -- Set

global function add(sequence list, object key, object val)
    -- append list
    return {append(list[1],key), append(list[2],val)}
end function -- add

global function join(sequence list1, sequence list2)
    -- concatenate list1 and list2
    return {list1[1] & list2[1],list1[2] & list2[2]}
end function

global function delete(sequence list, object key)
    -- delete item with the specified key from the list
    integer i,l
    i = find(key, list[1])      -- index
    if i then
        l = length(list[1])
        return {list[1][1..i-1] & list[1][i+1..l],
            list[2][1..i-1] & list[2][i+1..l]}
    end if
    crash_message("delete error: key \"" & key & "\" not found...\n")
end function -- delete

new topic     » topic index » view message » categorize

2. Re: associative arrays

On Thu, 18 May 2000 16:25:59 +1200, Jiri Babor <J.Babor at GNS.CRI.NZ> wrote:

>Associative arrays is the topic, as I promised earlier today. I call
>them simply lists, because Rob kindly left the term vacant.

I see I'm lagging behind again smile

Minor quibble: The term 'list' may confuse Lisp/Prolog/Haskell programmers,
and those who have implemented linked lists in other languages to get some
level of sequence style arrays.

>To my mind, there are only about three really simple designs possible.
>The one used by Carl, basically just a sequence of alternating keys
>and values. Obvious, very simple, but slow.

Sounds just like me really doesn't it? smile

The reason I implemented it that way was because I've been heavily
influenced by using Perl recently, and basically that's how Perl stores
associative arrays in memory.

>I reverted to my original scheme: two separate 'parallel' subsequences
>of equal length. It seems to be on average 20 to 50 % faster than the
>second scheme.

I considered that, but stupidly thought: "Oh, I'll need two sequences. We
can't have that." For some unknown reason I forgot { {,,,} , {,,,} } !!

[snip good stuff - see previous post]

> global function join(...

Another reason I implemented the aa type in such a simplistic way is that
things like join() aren't necessary. The good old '&' operator suffices.

Nevertheless, Jiri's code has grown on me (because mine isn't that good) so
I'd like to suggest a couple of extras for lists.e:

-- At the top of lists.e:
without type_check -- We're fairly sure that the stuff in the library
-- fits with the type spec, so there's no point checking while here.

[global?] constant -- Readability
    Keys = 1,
    Data = 2

global type list(object assoc)
    if atom(assoc)        then return 0 end if -- atoms are bad
    if not length(assoc)  then return 1 end if -- {} is ok
    if length(assoc) != 2 then return 0 end if -- length must be 2
    return length(assoc[Keys]) = length(assoc[Data])
end type

global function invert(list assoc)
    -- Generate a reverse lookup list
    return {assoc[Data], assoc[Keys]}
end function

-- At the bottom of lists.e
with type_check -- Turn it back on for safer outside code.

Hope that's useful,
Carl -- Wondering how he manages to irritate Jiri so much...

--
The .sig got away again.

new topic     » goto parent     » topic index » view message » categorize

3. Re: associative arrays

Carl wrote:

>Minor quibble: The term 'list' may confuse Lisp/Prolog/Haskell
>programmers, and those who have implemented linked lists in other
>languages to get some level of sequence style arrays.

Yeah, I know, but 'associative arrays' is such a mouthful! Besides,
all Lisp/Prolog/Haskell programmers seem to be confined to a single
lab at one of the more obscure US universities. They are generally
very clever guys, unlikely to be fooled by us...

Thanks for your suggestions, Carl, but you probably know, what I think
about types. There are, of course, no types in Euphoria once you get
into objects and sequences, not in the conventional sense. So, as far
as I am concerned, type checking is just a drag and a bloat,
unnecessary baggage. A debugging tool of dubious value at best.

>[global?] constant -- Readability
>    Keys = 1,
>    Data = 2

How is Keys and Data more readable than 1, 2 ? Especially since there
is no reason to have them global - no one will ever even see them!

>global function invert(list assoc)
>    -- Generate a reverse lookup list
>    return {assoc[Data], assoc[Keys]}
>end function

This one has already cost me one sleepless night. For heaven's sake,
can you suggest a single possible use of it?

>Carl -- Wondering how he manages to irritate Jiri so much...

On the contrary, I have never been irritated by you, so far. After
all, we usually appear to finish on the same side in most arguments,
on a similar wavelength...

jiri

new topic     » goto parent     » topic index » view message » categorize

4. Re: associative arrays

I was going to post this reply on Friday, but my work 'net connection was
terminated before I got chance to send it. Here we go, 3-4days late...

On Fri, 19 May 2000 11:04:56 +1200, jiri babor <jbabor at PARADISE.NET.NZ>
wrote:

>Carl wrote:
>
>> [Types are] unnecessary baggage. A debugging tool of dubious value at
>> best.

Hmm. I always thought they were a good feature of the language, myself.
Even C programmers use typedefs occasionally. Maybe I'm starting to show OO
tendencies :-o

>>[global?] constant -- Readability
>>    Keys = 1,
>>    Data = 2
>
>How is Keys and Data more readable than 1, 2 ? Especially since there
>is no reason to have them global - no one will ever even see them!

More OO features. We could allow the library user access to features like:
    list myDB
    sequence myKeys
    -- Create database
    -- etc. etc.
    mykeys = myDB[Keys]

I realise this gives a hint to the internal structure, and perhaps a
library function to return the sequence of keys is a better idea, but
there's less work for the library writer this way. blink

>>global function invert(list assoc)
>>    -- Generate a reverse lookup list
>>    return {assoc[Data], assoc[Keys]}
>>end function
>
>This one has already cost me one sleepless night. For heaven's sake,
>can you suggest a single possible use of it?

To go back to my animals example, and using your functions;

Assume we have a list with the following data:
    list noises =
   -Keys-+-Data--
    cat  | miaow
    dog  | woof
    pig  | oink
    cow  | moo

...we can easily query the list to find out what noise a pig makes:
    sequence noise
    noise = fetch(noises, "pig") --> "oink"

But there's no function for finding out what animal makes the noise "moo".
By inverting the database, the noises become the Keys and the animals
become the Data. So we can say this:
    list animals
    sequence animal
    animals = invert(noises)
    animal = fetch(animals, "moo") --> "cow"

It also works for other one-to-one relationships. It's not so perfect if a
Key points to a group of Data though. e.g.:
   mary --has_chilren--> {jack, joe, jane}

...becomes:
   {jack, joe, jane} --has_mother--> mary

...when we probably wanted something like this:
   jack --has_mother--> mary
   joe  --has_mother--> mary
   jane --has_mother--> mary

>>Carl -- Wondering how he manages to irritate Jiri so much...
>
>On the contrary, I have never been irritated by you, so far. After
>all, we usually appear to finish on the same side in most arguments,
>on a similar wavelength...

/me points remote tuning device in Jiri's direction and starts turning the
dial... smile

Carl

--
Where *has* that .sigfile got to?!

new topic     » goto parent     » topic index » view message » categorize

5. Re: associative arrays

Sorry about being behind, but stuff happens here to sidetrack me....

----- Original Message -----
From: "Carl R. White" <cyrek at BIGFOOT.COM>
To: <EUPHORIA at LISTSERV.MUOHIO.EDU>
Sent: Monday, May 22, 2000 4:07 AM
Subject: Re: associative arrays


> I was going to post this reply on Friday, but my work 'net connection was
> terminated before I got chance to send it. Here we go, 3-4days late...
>
> On Fri, 19 May 2000 11:04:56 +1200, jiri babor <jbabor at PARADISE.NET.NZ>
> wrote:
>
> >Carl wrote:
> >
> >> [Types are] unnecessary baggage. A debugging tool of dubious value at
> >> best.
>
> Hmm. I always thought they were a good feature of the language, myself.
> Even C programmers use typedefs occasionally. Maybe I'm starting to show
OO
> tendencies :-o

"OO" .. pronounced like "ewwww" (sorta)

>
> >>[global?] constant -- Readability
> >>    Keys = 1,
> >>    Data = 2
> >
> >How is Keys and Data more readable than 1, 2 ? Especially since there
> >is no reason to have them global - no one will ever even see them!
>
> More OO features. We could allow the library user access to features like:
>     list myDB
>     sequence myKeys
>     -- Create database
>     -- etc. etc.
>     mykeys = myDB[Keys]
>
> I realise this gives a hint to the internal structure, and perhaps a
> library function to return the sequence of keys is a better idea, but
> there's less work for the library writer this way. blink
>
> >>global function invert(list assoc)
> >>    -- Generate a reverse lookup list
> >>    return {assoc[Data], assoc[Keys]}
> >>end function
> >
> >This one has already cost me one sleepless night. For heaven's sake,
> >can you suggest a single possible use of it?
>
> To go back to my animals example, and using your functions;
>
> Assume we have a list with the following data:
>     list noises =
>    -Keys-+-Data--
>     cat  | miaow
>     dog  | woof
>     pig  | oink
>     cow  | moo
>
> ...we can easily query the list to find out what noise a pig makes:
>     sequence noise
>     noise = fetch(noises, "pig") --> "oink"
>
> But there's no function for finding out what animal makes the noise "moo".
> By inverting the database, the noises become the Keys and the animals
> become the Data. So we can say this:
>     list animals
>     sequence animal
>     animals = invert(noises)
>     animal = fetch(animals, "moo") --> "cow"
>
> It also works for other one-to-one relationships. It's not so perfect if a
> Key points to a group of Data though. e.g.:
>    mary --has_chilren--> {jack, joe, jane}
>
> ...becomes:
>    {jack, joe, jane} --has_mother--> mary
>
> ...when we probably wanted something like this:
>    jack --has_mother--> mary
>    joe  --has_mother--> mary
>    jane --has_mother--> mary

What about gettok() ? It wasn't set up to do nested seqs, but it could be,
rather easily, i think, just by being recursive. Then toss in a little more
overhead and you could do this:

* UNTESTED CODE *
integer rec
rec = db("animal","cow",0)
if rec then
  ? db(rec,noise,0) -- moo
  ? db(rec,specie,0) -- <shrug>
-- and with more code:
  ? db(rec,"test,attr,ruminant",0) -- TRUE (it's a cow)
-- and with more code:
  inc rec
  ? db(attr,ruminant,rec) -- llama?
  inc rec
  ? db(attr,ruminant,rec) --  goat?
end if

db(object parm1, object parm2, integer where)
If parm1 is an atom integer, then assume the programmer knows what sie is
doing, and deal with that rec number. If parm1 is a word sequence (aka legal
language string) then go find parm2 as an element of parm1. If where > 0
then look only at recs greater than rec# where.

No OO. No invert(). Multiple returns possible. No fixed anything. One list.
The db can be freely added to, anywhere in the db.

Kat

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu