Re: Screensaver Eup'98

new topic     » goto parent     » topic index » view thread      » older message » newer message

Ralf writes:
> Why isn't there a hash-table in Euphoria, I haven't quite been
> thinking about it, (if it would be cleaner to have such a table or not),
> but why did you decide not to... (I'm not in favor of this, not am
> against it, I'll give it a bit more thought..)

Perl and a few other languages have internal hash table
mechanisms for supporting "associative arrays".
With an associative array you can say:

               x["Ralf"] = 99
               x["Euphoria"] = 25
               y = x["Ralf"] + 100    -- y is 199
etc.

i.e. there's a list of value-pairs that are associated with eachother.
To make lookups efficient for large arrays, you need to
implement it with hashing.

Yeah, I thought about it a long time ago, but I felt it
would be better for the Euphoria programmer to
define his own hash table if his application really needed it.
That way he could choose a hash function that works well
for his application, rather than relying on a general-purpose
hash function. In practice I haven't seen many Euphoria
programs where hashing would be useful. Usually the linear
search of find() does the job nicely.

Junko has been working on a hash table demo that she'll
put on the Euphoria Web page soon.

What is a hash table? Here's a quick introduction.

Suppose you have a sequence s of names and you want
to know if "Ralf" is one of the names. In Euphoria you can
use find("Ralf", s) which will start with s[1] and compare strings
until "Ralf" is found, or the end of the sequence is reached.

If your list consists of 20 or fewer strings this is probably the fastest
way to search. If your list has 1000 strings it may
take a long time. If you have hundreds or thousands of names
that you want to look up, you should consider hashing.

The idea behind "hashing" is to organize the list of 1000 as
many short lists, and then search one
of the short lists. Suppose that you took the first character
of the name (converted to upper case) and used that as
an index into a sequence of 26 sequences. All the
names beginning with 'A' would be on the first sequence, all the
names beginning with 'B' would be on the second sequence, etc.
With a nice distribution of names you might then have
26 sequences with about 1000/26 = 38 names per sequence.
After indexing to select the correct sequence, you only
have 38 names to search, not 1000.

In reality you would never get a perfect distribution.
You'd find that many more names begin with A than
with X, so the A sequence might have 60 names while the
X sequence has only 3. On top of that, many more of the
names you have to search for would begin with A, so you
would tend to search the longer sequences more often than
the shorter ones.

But at least you'd be searching far fewer than 1000 names,
and all you had to do was one quick subscript operation.

If you expected to have 10000 names on your list you would want
to have lots of "buckets" to map strings to. You
might want 2000 (say)  possible buckets. For this you
would need a "hash function" that would take any string
and compute a bucket number from 1 to 2000.

Junko has found a pretty good hash function for large numbers
of buckets. See her demo (coming soon).

Regards,
     Rob Craig
     Rapid Deployment Software

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu