1. associative arrays
- Posted by Jiri Babor <J.Babor at GNS.CRI.NZ> May 18, 2000
- 561 views
Hi, All (interested), Associative arrays is the topic, as I promised earlier today. I call them simply lists, because Rob kindly left the term vacant. For the uninitiated, they are (labelled) containers for a bunch of 'key - value' pairs. The key can be just about anything, but it's usually a descriptive name, a string. The values can again be anything, even lists, or lists of lists... Simple enough concept, but very powerful. Whole languages can be built around it, Brazilian Lua is a good example. The lists are also very attractive in the Euphorian context, especially since the promised namespace solution does not seem to be forthcoming (is it really that tough, Rob?). They can be used to simulate all sorts of things, for example structures, really simple databases, or truly useful OO elements - without the extra namespace pressure all other schemes tend to exert. To my mind, there are only about three really simple designs possible. The one used by Carl, basically just a sequence of alternating keys and values. Obvious, very simple, but slow. Then the scheme I originally used in my 'widgets' (see Archives): again a single sequence, but this time all the keys bunched up in first half of the sequence and all the values in the second half. Slightly faster, because the searched length is obviously shorter, but it requires a couple of calculations to determine the size of the list and to index to the appropriate value. More recently, after a lot of tests, I reverted to my original scheme: two separate 'parallel' subsequences of equal length. It seems to be on average 20 to 50 % faster than the second scheme. The routines in the attached include are an extract from the version I am currently using for the widgets (unfinished, unpublished). Fancier implementation are of course possible. Rob described one recently. Or you can have (pre)sorted lists, obviously faster for larger sets when combined with a binary search. And some sort of hashing is probably preferable for really huge lists. jiri -- list.e -- jiri babor -- jbabor at paradise.net.nz -- 00-05-18 -- version 2.00 (trimmed) include machine.e -- crash_message() global function index(sequence list, object key) -- return *key* index, or zero if key not found return find(key, list[1]) end function global function fetch(sequence list, object key) -- return list item with indicated key integer i i = find(key, list[1]) if i then return list[2][i] end if crash_message("fetch error: key \"" & key & "\" not found...\n") end function -- fetch global function Fetch( object list, -- list from which value is to be fetched sequence keys -- sequence of list keys, one for each level -- last one directly associated with fetched -- value ) -- return item from *nested* list : 'deep' fetch for i=1 to length(keys) do list = fetch(list, keys[i]) end for return list end function -- Fetch global function set(sequence list, object key, object new_value) -- replace specified list item with a new value -- return modified list integer i i = find(key, list[1]) -- key index if i then list[2][i] = new_value return list end if crash_message("store error: key \"" & key & "\" not found...\n") end function -- set global function Set (sequence list, sequence keys, object new_value) -- 'deep', recursive set for *nested* lists integer i,len len = length(keys) i = find(keys[1],list[1]) if i then if len > 1 then list[2][i] = Set(list[2][i], keys[2..len], new_value) else list[2][i] = new_value end if return list end if crash_message("Set error: key \"" & keys[1] & "\" not found...\n") end function -- Set global function add(sequence list, object key, object val) -- append list return {append(list[1],key), append(list[2],val)} end function -- add global function join(sequence list1, sequence list2) -- concatenate list1 and list2 return {list1[1] & list2[1],list1[2] & list2[2]} end function global function delete(sequence list, object key) -- delete item with the specified key from the list integer i,l i = find(key, list[1]) -- index if i then l = length(list[1]) return {list[1][1..i-1] & list[1][i+1..l], list[2][1..i-1] & list[2][i+1..l]} end if crash_message("delete error: key \"" & key & "\" not found...\n") end function -- delete
2. Re: associative arrays
- Posted by "Carl R. White" <cyrek at BIGFOOT.COM> May 18, 2000
- 576 views
On Thu, 18 May 2000 16:25:59 +1200, Jiri Babor <J.Babor at GNS.CRI.NZ> wrote: >Associative arrays is the topic, as I promised earlier today. I call >them simply lists, because Rob kindly left the term vacant. I see I'm lagging behind again Minor quibble: The term 'list' may confuse Lisp/Prolog/Haskell programmers, and those who have implemented linked lists in other languages to get some level of sequence style arrays. >To my mind, there are only about three really simple designs possible. >The one used by Carl, basically just a sequence of alternating keys >and values. Obvious, very simple, but slow. Sounds just like me really doesn't it? The reason I implemented it that way was because I've been heavily influenced by using Perl recently, and basically that's how Perl stores associative arrays in memory. >I reverted to my original scheme: two separate 'parallel' subsequences >of equal length. It seems to be on average 20 to 50 % faster than the >second scheme. I considered that, but stupidly thought: "Oh, I'll need two sequences. We can't have that." For some unknown reason I forgot { {,,,} , {,,,} } !! [snip good stuff - see previous post] > global function join(... Another reason I implemented the aa type in such a simplistic way is that things like join() aren't necessary. The good old '&' operator suffices. Nevertheless, Jiri's code has grown on me (because mine isn't that good) so I'd like to suggest a couple of extras for lists.e: -- At the top of lists.e: without type_check -- We're fairly sure that the stuff in the library -- fits with the type spec, so there's no point checking while here. [global?] constant -- Readability Keys = 1, Data = 2 global type list(object assoc) if atom(assoc) then return 0 end if -- atoms are bad if not length(assoc) then return 1 end if -- {} is ok if length(assoc) != 2 then return 0 end if -- length must be 2 return length(assoc[Keys]) = length(assoc[Data]) end type global function invert(list assoc) -- Generate a reverse lookup list return {assoc[Data], assoc[Keys]} end function -- At the bottom of lists.e with type_check -- Turn it back on for safer outside code. Hope that's useful, Carl -- Wondering how he manages to irritate Jiri so much... -- The .sig got away again.
3. Re: associative arrays
- Posted by jiri babor <jbabor at PARADISE.NET.NZ> May 19, 2000
- 551 views
Carl wrote: >Minor quibble: The term 'list' may confuse Lisp/Prolog/Haskell >programmers, and those who have implemented linked lists in other >languages to get some level of sequence style arrays. Yeah, I know, but 'associative arrays' is such a mouthful! Besides, all Lisp/Prolog/Haskell programmers seem to be confined to a single lab at one of the more obscure US universities. They are generally very clever guys, unlikely to be fooled by us... Thanks for your suggestions, Carl, but you probably know, what I think about types. There are, of course, no types in Euphoria once you get into objects and sequences, not in the conventional sense. So, as far as I am concerned, type checking is just a drag and a bloat, unnecessary baggage. A debugging tool of dubious value at best. >[global?] constant -- Readability > Keys = 1, > Data = 2 How is Keys and Data more readable than 1, 2 ? Especially since there is no reason to have them global - no one will ever even see them! >global function invert(list assoc) > -- Generate a reverse lookup list > return {assoc[Data], assoc[Keys]} >end function This one has already cost me one sleepless night. For heaven's sake, can you suggest a single possible use of it? >Carl -- Wondering how he manages to irritate Jiri so much... On the contrary, I have never been irritated by you, so far. After all, we usually appear to finish on the same side in most arguments, on a similar wavelength... jiri
4. Re: associative arrays
- Posted by "Carl R. White" <cyrek at BIGFOOT.COM> May 22, 2000
- 549 views
I was going to post this reply on Friday, but my work 'net connection was terminated before I got chance to send it. Here we go, 3-4days late... On Fri, 19 May 2000 11:04:56 +1200, jiri babor <jbabor at PARADISE.NET.NZ> wrote: >Carl wrote: > >> [Types are] unnecessary baggage. A debugging tool of dubious value at >> best. Hmm. I always thought they were a good feature of the language, myself. Even C programmers use typedefs occasionally. Maybe I'm starting to show OO tendencies :-o >>[global?] constant -- Readability >> Keys = 1, >> Data = 2 > >How is Keys and Data more readable than 1, 2 ? Especially since there >is no reason to have them global - no one will ever even see them! More OO features. We could allow the library user access to features like: list myDB sequence myKeys -- Create database -- etc. etc. mykeys = myDB[Keys] I realise this gives a hint to the internal structure, and perhaps a library function to return the sequence of keys is a better idea, but there's less work for the library writer this way. >>global function invert(list assoc) >> -- Generate a reverse lookup list >> return {assoc[Data], assoc[Keys]} >>end function > >This one has already cost me one sleepless night. For heaven's sake, >can you suggest a single possible use of it? To go back to my animals example, and using your functions; Assume we have a list with the following data: list noises = -Keys-+-Data-- cat | miaow dog | woof pig | oink cow | moo ...we can easily query the list to find out what noise a pig makes: sequence noise noise = fetch(noises, "pig") --> "oink" But there's no function for finding out what animal makes the noise "moo". By inverting the database, the noises become the Keys and the animals become the Data. So we can say this: list animals sequence animal animals = invert(noises) animal = fetch(animals, "moo") --> "cow" It also works for other one-to-one relationships. It's not so perfect if a Key points to a group of Data though. e.g.: mary --has_chilren--> {jack, joe, jane} ...becomes: {jack, joe, jane} --has_mother--> mary ...when we probably wanted something like this: jack --has_mother--> mary joe --has_mother--> mary jane --has_mother--> mary >>Carl -- Wondering how he manages to irritate Jiri so much... > >On the contrary, I have never been irritated by you, so far. After >all, we usually appear to finish on the same side in most arguments, >on a similar wavelength... /me points remote tuning device in Jiri's direction and starts turning the dial... Carl -- Where *has* that .sigfile got to?!
5. Re: associative arrays
- Posted by Kat <gertie at PELL.NET> May 25, 2000
- 574 views
Sorry about being behind, but stuff happens here to sidetrack me.... ----- Original Message ----- From: "Carl R. White" <cyrek at BIGFOOT.COM> To: <EUPHORIA at LISTSERV.MUOHIO.EDU> Sent: Monday, May 22, 2000 4:07 AM Subject: Re: associative arrays > I was going to post this reply on Friday, but my work 'net connection was > terminated before I got chance to send it. Here we go, 3-4days late... > > On Fri, 19 May 2000 11:04:56 +1200, jiri babor <jbabor at PARADISE.NET.NZ> > wrote: > > >Carl wrote: > > > >> [Types are] unnecessary baggage. A debugging tool of dubious value at > >> best. > > Hmm. I always thought they were a good feature of the language, myself. > Even C programmers use typedefs occasionally. Maybe I'm starting to show OO > tendencies :-o "OO" .. pronounced like "ewwww" (sorta) > > >>[global?] constant -- Readability > >> Keys = 1, > >> Data = 2 > > > >How is Keys and Data more readable than 1, 2 ? Especially since there > >is no reason to have them global - no one will ever even see them! > > More OO features. We could allow the library user access to features like: > list myDB > sequence myKeys > -- Create database > -- etc. etc. > mykeys = myDB[Keys] > > I realise this gives a hint to the internal structure, and perhaps a > library function to return the sequence of keys is a better idea, but > there's less work for the library writer this way. > > >>global function invert(list assoc) > >> -- Generate a reverse lookup list > >> return {assoc[Data], assoc[Keys]} > >>end function > > > >This one has already cost me one sleepless night. For heaven's sake, > >can you suggest a single possible use of it? > > To go back to my animals example, and using your functions; > > Assume we have a list with the following data: > list noises = > -Keys-+-Data-- > cat | miaow > dog | woof > pig | oink > cow | moo > > ...we can easily query the list to find out what noise a pig makes: > sequence noise > noise = fetch(noises, "pig") --> "oink" > > But there's no function for finding out what animal makes the noise "moo". > By inverting the database, the noises become the Keys and the animals > become the Data. So we can say this: > list animals > sequence animal > animals = invert(noises) > animal = fetch(animals, "moo") --> "cow" > > It also works for other one-to-one relationships. It's not so perfect if a > Key points to a group of Data though. e.g.: > mary --has_chilren--> {jack, joe, jane} > > ...becomes: > {jack, joe, jane} --has_mother--> mary > > ...when we probably wanted something like this: > jack --has_mother--> mary > joe --has_mother--> mary > jane --has_mother--> mary What about gettok() ? It wasn't set up to do nested seqs, but it could be, rather easily, i think, just by being recursive. Then toss in a little more overhead and you could do this: * UNTESTED CODE * integer rec rec = db("animal","cow",0) if rec then ? db(rec,noise,0) -- moo ? db(rec,specie,0) -- <shrug> -- and with more code: ? db(rec,"test,attr,ruminant",0) -- TRUE (it's a cow) -- and with more code: inc rec ? db(attr,ruminant,rec) -- llama? inc rec ? db(attr,ruminant,rec) -- goat? end if db(object parm1, object parm2, integer where) If parm1 is an atom integer, then assume the programmer knows what sie is doing, and deal with that rec number. If parm1 is a word sequence (aka legal language string) then go find parm2 as an element of parm1. If where > 0 then look only at recs greater than rec# where. No OO. No invert(). Multiple returns possible. No fixed anything. One list. The db can be freely added to, anywhere in the db. Kat