1. ternary3
- Posted by Arthur Adamson <euclid at ISOC.NET>
Jul 11, 1998
-
Last edited Jul 12, 1998
> The file ternary3.zip posted today to official Euphoria home page can >
provide
> the base for dictionaries, spell checkers, etc. After a file of ascii
> words is loaded, the info is stored in a one-sided ternary tree. Each
> node (an integer in a sequence) contains a right, center, and left
> pointer plus a numeric value 0..255 which encodes ascii chars or
> whatever (extended char sets).
>
> To run the prog, unzip and use ex captree.ex.
>
> After this tree is prepared and saved, it can be reloaded and searched
> for any test word (return 1 if found, 0 otherwise).
>
> The save and reload feature is unfinished as are other needed
> functions. Only the insert (build) and search functions are complete.
>
> A typical output screen follows. It shows the speed and compression
> capabilities using the RDS spell-checker word list as input.
>
> E-mail euclid at isoc.net with questions or discussion.
>
> Art Adamson
>
>
>
>
>
>Input the number of words...I will read them from words.txt
>I will show more detail if you request fewer word/s than: 300
>Num of words is : 60000 (but only 51802 found in input file)
>Input the maxWordLength: maxWordLength is: 25
>Words loaded! I selected from those eligible: 51802
>I will build the tree. Tree built!! Search-time check done!!
>Search-completness check done!! I found all the words!!!
>
>I found, 0 common words in words.txt and a random words file.
>
>
>Store and search data follows :
>
>Store and search times : 36.4664 13.8539
>Average store and search times per word : 0.000704 0.000267
>
>numWords & maxWordLength are : 51802 22
>Num of input-chars stored : 404586
>Average word length : 7.81024
>
>Nodes used for storing the tree, limit : used : 1048575 : 129881
>Nodes used per input char : 0.321022
>Total bytes for tree storage including pointers is: 519572
>Num of storage bytes per input char is : 1.28421
>
>
>RNodes total limit : used : 3 : 0
>Bytes to store RPointers) : 24
>
>Bytes for inputWords Sequence, just for comparison: 2861616
>Compression ratio bytes(sequence) / bytes(nodes) : 5.5079
Arthur P. Adamson, The Engine Man, euclid at isoc.net