1. ternary3
- Posted by Arthur Adamson <euclid at ISOC.NET> Jul 11, 1998
- 454 views
- Last edited Jul 12, 1998
> The file ternary3.zip posted today to official Euphoria home page can > provide > the base for dictionaries, spell checkers, etc. After a file of ascii > words is loaded, the info is stored in a one-sided ternary tree. Each > node (an integer in a sequence) contains a right, center, and left > pointer plus a numeric value 0..255 which encodes ascii chars or > whatever (extended char sets). > > To run the prog, unzip and use ex captree.ex. > > After this tree is prepared and saved, it can be reloaded and searched > for any test word (return 1 if found, 0 otherwise). > > The save and reload feature is unfinished as are other needed > functions. Only the insert (build) and search functions are complete. > > A typical output screen follows. It shows the speed and compression > capabilities using the RDS spell-checker word list as input. > > E-mail euclid at isoc.net with questions or discussion. > > Art Adamson > > > > > >Input the number of words...I will read them from words.txt >I will show more detail if you request fewer word/s than: 300 >Num of words is : 60000 (but only 51802 found in input file) >Input the maxWordLength: maxWordLength is: 25 >Words loaded! I selected from those eligible: 51802 >I will build the tree. Tree built!! Search-time check done!! >Search-completness check done!! I found all the words!!! > >I found, 0 common words in words.txt and a random words file. > > >Store and search data follows : > >Store and search times : 36.4664 13.8539 >Average store and search times per word : 0.000704 0.000267 > >numWords & maxWordLength are : 51802 22 >Num of input-chars stored : 404586 >Average word length : 7.81024 > >Nodes used for storing the tree, limit : used : 1048575 : 129881 >Nodes used per input char : 0.321022 >Total bytes for tree storage including pointers is: 519572 >Num of storage bytes per input char is : 1.28421 > > >RNodes total limit : used : 3 : 0 >Bytes to store RPointers) : 24 > >Bytes for inputWords Sequence, just for comparison: 2861616 >Compression ratio bytes(sequence) / bytes(nodes) : 5.5079 Arthur P. Adamson, The Engine Man, euclid at isoc.net