1. print.e and Other Questions

Where can I get print.e?

How do I use Euman's hash table? Or, rather, what structure is it? I'll 
be looking at it tonight, but maybe somebody can give me a general idea 
about hash tables.

Thanks!
ck

new topic     » topic index » view message » categorize

2. Re: print.e and Other Questions

print.e
http://www.rapideuphoria.com/print.zip

binary print/get
http://www.rapideuphoria.com/bget.zip
http://www.rapideuphoria.com/gbinary.zip

new topic     » goto parent     » topic index » view message » categorize

3. Re: print.e and Other Questions

5/03/2002 2:51:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote:

>
>Where can I get print.e?
>
>From the RDS Contributions page, I guess. It is a replacement to Euphoria's
>print() routine.

>How do I use Euman's hash table? 

It gives you a fast way on knowing if a given word is in the words.txt file or
not.

You use it by calculating the 'hash' value for the word you are checking on,
then scan through the
sequence referenced by the first letter of the word and its length, looking to
see if the hashvalue
is there. If so, then the word is in the dictionary, otherwise it is not.

   Eg.

	 theWord = upper(theWord)
      hv = EumsHash(theWord)
      l = theWord[1] - 'A' + 1
      s = length(theWord)
      inDict = 0
      for i = 1 to length(hash_table[l][s]) do
          if hash_table[l][s][i] = hv then
               -- Word is in dict.
              inDict = 1
              exit
          end if
	 end for

>Or, rather, what structure is it? 

It is a three-level sequence. The first level represents the letters of the
alphabet. It is used to
group the dictionary words by their initial letter.  The second level, that's
the level within each
initial letter, represents word length. This sorts all words that start with the
same letter into
word size. The third level is just a list of hash values for the dictionary
words.

>I'll be looking at it tonight, 

Have fun.

>but maybe somebody can give me a general idea about hash tables.

The general idea behind hashing is to calculate a single value for an item,
based on attributes of
the item. Then use this value as a sort of index to speed up searches for the
item. It is often used
in compilers and other word-processing programs that have to keep track of
individual words.

A common method is to add up the ASCII values of each letter, divide this by the
number of "bins"
you have and use the remainder to select a bin to put the word into. If you have
a huge list of
words, this is one method of effectively reducing the number of words you have
to scan through to
find the one you're after.

   Example:  Assume I have 5 bins, numbered 0 to 4.

     word          hashvalue        bin
    "CAT"            24              4
    "KANGAROO"       82              2
    "DOG"            26              1
    "ELEPHANT"       80              0
    "LEOPARD"        71              1
thus "DOG" and "LEOAPARD" would both go in bin#1, but the others would only have
one word in them.

The hard part is getting a good enough hashing algorithm that spreads the
indexes evenly over the
bins.

---------
Cheers,
Derek Parnell 
ICQ# 7647806

new topic     » goto parent     » topic index » view message » categorize

4. Re: print.e and Other Questions

Sorry bout that CK,

I only used print.e to look at the output in an external file.
so its safe to comment this line out.
or you can download the include here:
http://www.rapideuphoria.com/print.zip

Derek more than answered the Q? about hash routines.
BIG D's the man!

Euman
euman at bellsouth.net

Q: Are we monetarily insane?
A: YES
----- Original Message ----- 
From: "Derek Parnell" <ddparnell at bigpond.com>
To: "EUforum" <EUforum at topica.com>
Sent: Tuesday, March 05, 2002 12:08 AM
Subject: Re: print.e and Other Questions


> 
> 5/03/2002 2:51:28 PM, "C. K. Lester" <cklester at yahoo.com> wrote:
> 
> >
> >Where can I get print.e?
> >
> >From the RDS Contributions page, I guess. It is a replacement to Euphoria's
> >print() routine.
> 
> >How do I use Euman's hash table? 
> 
> It gives you a fast way on knowing if a given word is in the words.txt file or
> not.
> 
> You use it by calculating the 'hash' value for the word you are checking on,
> then scan through the
> sequence referenced by the first letter of the word and its length, looking to
> see if the hashvalue
> is there. If so, then the word is in the dictionary, otherwise it is not.
> 
>    Eg.
> 
> theWord = upper(theWord)
>       hv = EumsHash(theWord)
>       l = theWord[1] - 'A' + 1
>       s = length(theWord)
>       inDict = 0
>       for i = 1 to length(hash_table[l][s]) do
>           if hash_table[l][s][i] = hv then
>                -- Word is in dict.
>               inDict = 1
>               exit
>           end if
> end for
> 
> >Or, rather, what structure is it? 
> 
> It is a three-level sequence. The first level represents the letters of the
> alphabet. It is used to
> group the dictionary words by their initial letter.  The second level, that's
> the level within each
> initial letter, represents word length. This sorts all words that start with
> the same letter into
> word size. The third level is just a list of hash values for the dictionary
> words.
> 
> >I'll be looking at it tonight, 
> 
> Have fun.
> 
> >but maybe somebody can give me a general idea about hash tables.
> 
> The general idea behind hashing is to calculate a single value for an item,
> based on attributes of
> the item. Then use this value as a sort of index to speed up searches for the
> item. It is often used
> in compilers and other word-processing programs that have to keep track of
> individual words.
> 
> A common method is to add up the ASCII values of each letter, divide this by
> the number of "bins"
> you have and use the remainder to select a bin to put the word into. If you
> have a huge list of
> words, this is one method of effectively reducing the number of words you have
> to scan through to
> find the one you're after.
> 
>    Example:  Assume I have 5 bins, numbered 0 to 4.
> 
>      word          hashvalue        bin
>     "CAT"            24              4
>     "KANGAROO"       82              2
>     "DOG"            26              1
>     "ELEPHANT"       80              0
>     "LEOPARD"        71              1
> thus "DOG" and "LEOAPARD" would both go in bin#1, but the others would only
> have one word in them.
> 
> The hard part is getting a good enough hashing algorithm that spreads the
> indexes evenly over the
> bins.
> 
> ---------
> Cheers,
> Derek Parnell 
> ICQ# 7647806
> 
> 
> 
>

new topic     » goto parent     » topic index » view message » categorize

5. Re: print.e and Other Questions

----- Original Message ----- 
From: "C. K. Lester" <cklester at yahoo.com>
To: "EUforum" <EUforum at topica.com>
> 
> C. K. Lester wrote:
> > Where can I get print.e?
> 
> Nevermind. I found it.
> 
> > How do I use Euman's hash table? Or, rather, what structure is it?
> > I'll be looking at it tonight, but maybe somebody can give me a
> > general idea about hash tables.
> 
> Okay, it looks like it's an index to the dictionary... is that right?
> 
> How is it sorted?!

CK,

I guess you can get away with saying "index to the dictionary" and that be
right.
The "word.txt" file is already sorted so when I read in the file it steps thru
it from
top to bottom (in alphabetical order) the next part is that it also "sorts" by
the
length of the text that its hashed and just appends to existing words of equal
length.

I just optimized my original routine to be almost as fast as Dereks optimized
version
he's still beating me by 0.11 sec. I just might be able to make Dereks version
faster.
I should know by tommorow.

If we keep this up Junko might have to retire her spellchecker. ;) just pickin
Junko

Euman

new topic     » goto parent     » topic index » view message » categorize

6. Re: print.e and Other Questions

Chris wrote:

>   I don't se how EumsHash() could be useful for the Contest problem.

It depends on how you've coded your solution. My word scoring code looks 
something like this:

   if find( word, dictionary ) then
      return <score based on word size>
   else
      return <score based on partial match>
   end if

50,000 is a lot of words to do that initial find on. A hash-based search 
would speed things up quite a bit.

Searching for a partial match would also benefit from a hash table. For 
example, I encode the word "DOLLAR" under the key "ABCCDE". Replacing the 
search for the key with a hash would speed things up (although storing the 
index is a better long-term solution).

-- David Cuny

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu