Re: Screensaver Eup'98
- Posted by Robert Craig <rds at EMAIL.MSN.COM> Jan 03, 1998
- 592 views
Ralf writes: > Why isn't there a hash-table in Euphoria, I haven't quite been > thinking about it, (if it would be cleaner to have such a table or not), > but why did you decide not to... (I'm not in favor of this, not am > against it, I'll give it a bit more thought..) Perl and a few other languages have internal hash table mechanisms for supporting "associative arrays". With an associative array you can say: x["Ralf"] = 99 x["Euphoria"] = 25 y = x["Ralf"] + 100 -- y is 199 etc. i.e. there's a list of value-pairs that are associated with eachother. To make lookups efficient for large arrays, you need to implement it with hashing. Yeah, I thought about it a long time ago, but I felt it would be better for the Euphoria programmer to define his own hash table if his application really needed it. That way he could choose a hash function that works well for his application, rather than relying on a general-purpose hash function. In practice I haven't seen many Euphoria programs where hashing would be useful. Usually the linear search of find() does the job nicely. Junko has been working on a hash table demo that she'll put on the Euphoria Web page soon. What is a hash table? Here's a quick introduction. Suppose you have a sequence s of names and you want to know if "Ralf" is one of the names. In Euphoria you can use find("Ralf", s) which will start with s[1] and compare strings until "Ralf" is found, or the end of the sequence is reached. If your list consists of 20 or fewer strings this is probably the fastest way to search. If your list has 1000 strings it may take a long time. If you have hundreds or thousands of names that you want to look up, you should consider hashing. The idea behind "hashing" is to organize the list of 1000 as many short lists, and then search one of the short lists. Suppose that you took the first character of the name (converted to upper case) and used that as an index into a sequence of 26 sequences. All the names beginning with 'A' would be on the first sequence, all the names beginning with 'B' would be on the second sequence, etc. With a nice distribution of names you might then have 26 sequences with about 1000/26 = 38 names per sequence. After indexing to select the correct sequence, you only have 38 names to search, not 1000. In reality you would never get a perfect distribution. You'd find that many more names begin with A than with X, so the A sequence might have 60 names while the X sequence has only 3. On top of that, many more of the names you have to search for would begin with A, so you would tend to search the longer sequences more often than the shorter ones. But at least you'd be searching far fewer than 1000 names, and all you had to do was one quick subscript operation. If you expected to have 10000 names on your list you would want to have lots of "buckets" to map strings to. You might want 2000 (say) possible buckets. For this you would need a "hash function" that would take any string and compute a bucket number from 1 to 2000. Junko has found a pretty good hash function for large numbers of buckets. See her demo (coming soon). Regards, Rob Craig Rapid Deployment Software