OpenEuphoria: Forum: Re: Euphoria Programming Contest...

Re: Euphoria Programming Contest...

new topic » topic index » view thread » older message » newer message
Posted by Arthur Adamson <euclid at ISOC.NET> May 29, 1998
813 views
--=====================_896490233==_

At 10:04 PM 4/16/98 -0500, you wrote:
>Andy,
>A little over a month ago we had a little contest going to write
>a program that could generate primes faster than the benchmark
>RDS uses, and Arthur Adamsomson graciously served as host and
>timekeeper for it. There were no prizes, it was just for fun,
>but many people still participated. If you or anyone else would
>like to start another contest, I'm sure you'd find plenty of
>willing participants. I think we could do without a huge list
>of rules and deadlines though... contests for fun should be
>just that, fun.
>
>Christopher D. Hickman
>
>Btw, How are you Arthur? Haven't seen you on the list for a while.

Chris and all:

        Well, it is time for another challenge. In Dr Dobbs Journal for
April, 1998, is an article re trenary trees, ie 3 branches per node rather
than the usual 2. The article is in the attached ternary.htm though you will
need to go to DDJ site for the figures. This sounded interesting re Euphoria
because of our flexible sequences. The article code is based on C.
        I translated 2 of the routines to Euphoria. The result worked but
required a disappointing ~70 bytes per character stored.
        I fussed with it and got down to ~ 4 bytes per char stored and I
believe fast building and searching. I am pretty proud of the 4 bytes since
each node/character requires 3 pointers and a character. (I have limited the
characters to lower case, alphabetic). I Build a tree of ~ 20500 words,
79000 characters in 13 seconds; then search for and find all the words in 3
seconds. I don't know if this is fast or slow. I use DOS, 6.21, pentium 120
mhz. My capacity is limited by the single integer I have allocated to each
node. More space allocation would allow larger trees...and maybe more speed.
        There is plenty of opportunity to reduce the space and increase the
speed. Also, several other routines are needed. Also, I haven't yet
implemented disk save and reload. Also, I don't have any idea how good is
good re space and time. Also, the code can stand a lot of cleanup and
streamlining. My own efforts are now concentrated on reducing space,
primarily by improving the end-of-word flag a ~, but are going slowly.
        So, as we did in primes, how about putting your shoulder to the tree
trunk and see what we can do.
        With luck, the result will give a very fast and compact dictionary
for whatever use you choose.
        My efforts are contained in ternary.zip at the RDS Euphoria Home
Page. Download it and give it a try. If you are interested in helping, send
me a note and I will email direct to you my latest status.

        Yours for better trees.
Art Adamson, euclid at isoc.net


--=====================_896490233==_

<!-- Site-wide default Template -->
<HTML><HEAD>
<title>Dr. Dobb's Journal April 1998: Ternary SearchTrees</title>
</HEAD>
<BODY BGCOLOR="#ffffff">
<A NAME="master_top"></A>
<CENTER><STRONG><FONT SIZE="+2" COLOR="#c03030">Dr. Dobb's Web
Site</FONT></STRONG></CENTER>
<HR><CENTER>[
<A HREF="http://www.ddj.com/">Home</A> |
<A HREF="/ddj/">Articles</A> |
<A HREF="/ftp/">Source Code</A> |
<A HREF="/ercb/">Book Reviews</A> |
<A HREF="/index/">Index</A> |
<A HREF="http://www.ddj.mfi.com/scripts/shop">CD-ROMs</A> |
<A HREF="/careers/">Careers</A> |
<A HREF="/oped/">Op-Ed</A> |
<A HREF="/orders/ddjsub.htm">Subscribe</A>
]</CENTER><HR>
<CENTER>

<a
href="/advert/bin/nph-redirect.cgi?to=http%3a//www.nesbitt.com/&from=advert/advertisers/nesbitt/include.htm"><img
src="/advert/advertisers/nesbitt/banner.gif"     alt="Nesbitt Software"    
width=460 height=55></a>
</CENTER>


<!--Copyright © Dr. Dobb's Journal-->
<h1>Ternary SearchTrees</h1>

<p><i>Dr. Dobb's Journal</i> April 1998
</p>
<h3>By Jon Bentley and Robert Sedgewick</h3>

<I>Jon is a Member of Technical Staff at Bell Labs. Bob is the William O. Baker
Professor of Computer Science at Princeton University. They can be reached at jlb
at research.bell-labs.com and rs at cs.princeton.edu, respectively.</I>

<hr>

<IMG SRC="inset.gif" ALT="[inset art]" ALIGN=LEFT WIDTH="282" HEIGHT="209">When
you have to store a set of strings, what data structure should you use? You could
use hash tables, which sprinkle the strings throughout an array. Access is fast,
but information about relative order is lost. Another option is the use of binary
search trees, which store strings in order, and are fairly fast. Or you could use
digital search tries, which are lightning fast, but use lots of space.

<p>In this article, we'll examine ternary search trees, which combine the time
efficiency of digital tries with the
space efficiency of binary search trees. The resulting structure is faster than
hashing for many typical search problems, and supports a broader range of useful
problems and operations. Ternary searches are faster than hashing and more
powerful, too.</p>

<p>We described the theory of ternary search trees at a symposium in 1997 (see
"Fast Algorithms for Sorting and Searching Strings," by J.L. Bentley and R.
Sedgewick, <i>Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete
Algorithms</i>, January 1997). In this article, we'll concentrate on what the
data structure offers working programmers. <i>Algorithms in C</i>, Third Edition,
by Robert Sedgewick (Addison-Wesley, 1998) provides yet another view of ternary
search trees. For more information (including all the code in this article and
the driver software), refer to http://www.cs.princeton.edu/~rs/strings/ or see
"Resource Center," page 3.</p>

<h3>A Grove of Trees</h3>

<p><A NAME="rf1"><A HREF="leadf1.htm">Figure 1</A> is a binary search tree that
represents 12 common two-letter words. For every node, all nodes down the left
child have smaller values, while all nodes dow
n the right child have greater values. A search starts at the root. To find the
word "on," for instance, we compare it to "in" and take the right branch. We take
the right branch at "of" and the left branch at "or," and then arrive at "on."
Every comparison could access each character of both words.</p>

<p>Digital search tries store strings character by character. <A NAME="rf2"><A
HREF="leadf2.htm">Figure 2</A> is a tree that represents the same set of 12
words; each input word is shown beneath the node that represents it. (Two-letter
words lead to prettier pictures; all structures that we'll see can store
variable-length words.) In a tree representing words of lowercase letters, each
node has 26-way branching (though most branches are empty, and not shown in <A
HREF="leadf2.htm">Figure 2</A>). Searches are very fast: A search for "is" starts
at the root, takes the "i" branch, then the "s" branch, and ends at the desired
node. At every node, we access an array element (one of 26)
, test for null, and take a branch. Unfortunately, search tries have exorbitant
space requirements: Nodes with 26-way branching typically occupy 104 bytes, and
256-way nodes consume a kilobyte. Eight nodes representing the 34,000-character
Unicode Standard would together require more than a megabyte!</p>

<p>Ternary search trees combine attributes of binary search trees and digital
search tries. Like tries, they proceed character by character. Like binary search
trees, they are space efficient, though each node has three children, rather than
two. A search compares the current character in the search string with the
character at the node. If the search character is less, the search goes to the
left child; if the search character is greater, the search goes to the right
child. When the search character is equal, though, the search goes to the middle
child, and proceeds to the next character in the search string.</p>

<p><A NAME="rf3"><A HREF="leadf3.htm">Figure 3</A> is a balanced ternary search
tree for the same set of 12 words. The low
and high pointers are shown as solid lines, while equal pointers are shown as
dashed lines. Each input word is shown beneath its terminal node. A search for
the word "is" starts at the root, proceeds down the equal child to the node with
value "s," and stops there after two comparisons. A search for "ax" makes three
comparisons to the first letter ("a") and two comparisons to the second letter
("x") before reporting that the word is not in the tree. </p>

<p>The idea behind ternary search trees dates back at least as far as 1964. In
"Randomized Binary Searching with Tree Structures" (<i>Communications of the
ACM</i>, March 1964), H.A. Clampett sketched a primitive version of the
structure. Computer scientists have proven many theorems about the trees; for
instance, searching for a string of length <i>k</i> in a ternary search tree with
<i>n</i> strings will require at most <i>O</i>(<i>log</i> <i>n</i>+<i>k</i>)
comparisons. </p>

<h3>The C Structure</h3>

<p>As far as we can tell,
previous authors have viewed ternary search trees as a theoretical structure
 for proving theorems. We were delighted to find that the trees also lead to
 practical computer programs. Although we've chosen to illustrate the data
 structure in C, we could have just as easily chosen C++, Java, or other
 languages. </p>

<p>Each node in a ternary search tree is represented by the following structure:
</p>

<blockquote>
<PRE>
typedef struct tnode *Tptr;
typedef struct tnode {
   char splitchar;
   Tptr lokid, eqkid, hikid;

} Tnode;
</PRE></blockquote>
<p>The value stored at the node is <i>splitchar</i>, and the three pointers
represent the three children. The root of the tree is declared to be <i>Tptr
root</i>. We will represent every character in each string, including the null
character that terminates it. </p>

<h3>Membership Searching</h3>

<p>We begin with a recursive version of the <i>search</i> function. It returns 1
if string <i>s</i> is in the subtree rooted at <i>p,</i> and 0 otherwise; it is
originally called as <i>rsearch(root
, s)</i>:</p>

<blockquote>

<PRE>int rsearch(Tptr p, char *s)
{   if (!p) return 0;
    if (*s &lt; p-&gt;splitchar)
         return rsearch(p-&gt;lokid, s);
    else if (*s &gt; p-&gt;splitchar)
        return rsearch(p-&gt;hikid, s);
    else {
        if (*s == 0) return 1;
        return rsearch(p-&gt;eqkid, ++s);
   }

}
</PRE></blockquote><p>The first <i>if</i> returns 0 if the search has run off
the end of the tree. The next two <i>if</i> statements take the low and high
branches as appropriate. The final <i>else</i> branch returns 1 if the current
character is the end-of-string character 0, and otherwise moves to the next input
character and to the equal branch. </p>

<p>Some programmers might feel more comfortable with the iterative version of
the <i>search</i> function (which has only one argument): </p>

<blockquote>

<PRE>int search(char *s)
{   Tptr p;
    p = root;
    while (p) {
       if (*s &lt; p-&gt;splitchar)
           p = p-&gt;lokid;

  else if (*s == p-&gt;splitchar) {
          if (*s++ == 0)
              return 1;
          p = p-&gt;eqkid;
      } else
          p = p-&gt;hikid;
    }
    return 0;

}
</PRE></blockquote><p>This is substantially faster on some compilers, and we'll
use it in later experiments. On most optimizing compilers, though, the run time
of the recursive version is within a few percent of the iterative version. </p>

<p>Both versions of the search use a pattern that we will see repeatedly: If the
search character is less, go to <i>lokid</i>; if it is greater, go to
<i>hikid</i>; if it is equal, go to the next character and <i>eqkid</i>. The
following code illustrates that shorter is not always cleaner, simpler, faster,
or better:</p>

<blockquote>
<PRE>
int rsearch2(Tptr p, char *s)
{ return (!p ? 0 : (
        *s == p-&gt;splitchar
       ? (*s ? rsearch2(p-&gt;eqkid, ++s) : 1)
       : (rsearch2(*s &lt; p-&gt;splitchar ?
                p-&gt;lokid : p-&gt;hikid, s))
     ));

}
</PRE></blockquote>

<h3>Inserting a New Stri
ng</h3>

<p>The <i>insert</i> function inserts a new string into the tree, and does
nothing if it is already present. We insert the string <i>s</i> with the code
<i>root = insert(root, s);</i>. The first <i>if</i> statement detects running off
the end of new node, initializes it, and falls through to the standard case.
Subsequent code takes the appropriate branch, but branches to <i>eqkid</i> only
if characters remain in the string. </p>

<blockquote>

<PRE>   Tptr insert(Tptr p, char *s)
   { if (p == 0) {
          p = (Tptr) malloc(sizeof(Tnode));
          p-&gt;splitchar = *s;
          p-&gt;lokid = p-&gt;eqkid = p-&gt;hikid = 0;
     }
     if (*s &lt; p-&gt;splitchar)
          p-&gt;lokid = insert(p-&gt;lokid, s);
     else if (*s == p-&gt;splitchar) {
          if (*s != 0)
              p-&gt;eqkid = insert(p-&gt;eqkid, ++s);
     } else
          p-&gt;hikid = insert(p-&gt;hikid, s);
     return p;

  }
</PRE></blockquote>
<p>This code is short but subtle
, and worth careful study. </p>

<p>The resulting tree stores the strings themselves, but no other information.
Real symbol tables store additional data with each string. To illustrate this
problem, we'll store a pointer to every string in the tree; this data will be
used by later search algorithms. We could add a new <i>info</i> field to each
node, but that would be wasteful. Instead, we'll exploit the fact that a node
with a null <i>splitchar</i> cannot have an <i>eqkid</i>, and store the data in
that field. We could make a <i>union</i> that contains the two possible pointers,
but that would be syntactically clumsy (we would have to refer to the union at
each reference). We will instead use the sleazy hack of coercing a string into a
<i>Tptr</i>. The code inside <i>*s == p-&gt;splitchar</i> test, therefore,
becomes the following:</p>

<blockquote>
<PRE>
if (*s == 0)
    p-&gt;eqkid = (Tptr) insertstr;
else

    p-&gt;eqkid = insert(p-&gt;eqkid, ++s);
</PRE></blockquote>

<h3>Faster Insertion Functions</h3>

<p>We can improve insert
ion in two different ways. We'll start by tuning the insertion code, and
consider different orders in which we can insert the nodes into the tree. </p>

<p>A major cost of the <i>insert</i> function is calling <i>malloc</i> to create
each node. The function <i>insert2</i> in the demo program (available
electronically) uses the ancient C technique of allocating a buffer of nodes and
dealing them out as needed. Profiling shows that this eliminates most of the time
spent in storage allocation. We measured the time to insert the 234,936 words in
a dictionary file (one word per line, for a total of 2,486,813 characters). <A
NAME="rt1"><A HREF="leadt1.htm">Table 1</A> lists the run times in seconds on
three 200-MHz machines, with the compilers optimizing code for speed. The
<i>insert3</i> function (also available electronically) uses other common
techniques to reduce the run time even further: transforming recursion to
iteration, keeping a pointer to a pointer, reordering tests, saving
a difference in a comparison, and splitting the single loop into two loops.
These changes are beneficial on all machines, and substantial on the SPARC. </p>

<h3>Better Insertion Orders</h3>

<p>In what order should you insert the nodes into a tree? No matter in what
order you insert the nodes, you end up with the same digital search trie -- the
data structure is totally insensitive to insertion order. Binary search trees are
at the opposite end of the spectrum: If you insert the nodes in a good order
(middle element first), you end up with a balanced tree. If you insert the nodes
in sorted order, the result is a long skinny tree that is very costly to build
and search. Fortunately, if you insert the nodes in random order, a binary search
tree is usually close to balanced. </p>

<p>Ternary search trees fall between these two extremes. You can build a
completely balanced tree by inserting the median element of the input set, then
recursively inserting all lesser elements and greater elements. A simpler
approach first sorts the input set.
The recursive <i>build</i> function inserts the middle string of its subarray,
then recursively builds the left and right subarrays. We use this method in our
experiments; it is fast and produces fairly well-balanced trees. The cost of
inserting all words in a dictionary with function <i>insert3</i> is never more
than about 10 percent greater than searching for all words. D.D. Sleator and R.E.
Tarjan describe theoretical balancing algorithms for ternary search trees in
"Self-Adjusting Binary Search Trees" (<i>Journal of the ACM</i>, July 1985). </p>

<h3>Comparison to Other Data Structures</h3>

<p>Symbol tables are typically represented by hash tables. We conducted a simple
experiment to compare ternary search trees to that classic data structure; the
complete code is available electronically.</p>

<p>To represent <i>n</i> strings, our hash code uses a chained table of size
<i>tabsize=n</i>. The hash function, from Section 6.6 of B. Kernighan and D.
Ritchie's <i>The C Programming
Language</i>, Second Edition (Prentice Hall, 1988), is reasonably efficient and
 produces good spread:</p>

<blockquote>
<PRE>
int hashfunc(char *s)
{   unsigned n = 0;
        for ( ; *s; s++)
                 n = 31 * n + *s;
        return n % tabsize;

}
</PRE></blockquote>

<p>The body of the <i>search</i> function is as follows:</p>

<blockquote>
<PRE>
for (p = tab[hashfunc(s)]; p; p = p-&gt;next)
    if (strcmp(s, p-&gt;str) == 0)
       return 1;

return 0;
</PRE>
</blockquote>

<p>For fair timing, we replaced the string comparison function <i>str</i>cmp
with inline code (so the hash and tree <i>search</i> functions used the same
coding style). We used the same dictionary to compare the performance of hashing
and ternary trees. We first built the tree by inserting each word in turn (middle
element first, then recurring). Finally, we searched for each element word in the
input file. <A NAME="rt2"><A HREF="leadt2.htm">Table 2</A> presents the times in
seconds for the two data structures across three machines. For both bui
lding and (successful) searching, the two structures have comparable search
times. </p>

<p>Ternary search trees are usually noticeably faster than hashing for
unsuccessful searches. Ternary trees can discover mismatches after examining only
a few characters, while hashing always processes the entire key. On some data
sets with very long keys and mismatches in the first few characters, ternary
trees took less than one-fifth the time of hashing. </p>

<p>Functions <i>insert3</i> and <i>search</i> combine to yield a time-efficient
symbol table. On one dictionary described at our web site, ternary search trees
used about three times the space of hashing. An alternative representation of
ternary search trees is more space efficient: When a subtree contains a single
string, we store a pointer to the string itself (and each node stores three bits
telling whether its children point to nodes or strings). This leads to code that
is less efficient, but it reduces the number of tree nodes cl
ose to the space used by hashing. </p>

<p>We analyzed the worst-case performance of several aspects of ternary search
trees in our theoretical paper. In "The Analysis of Hybrid Trie Structures"
(<i>Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms</i>,
January 1998), J. Clement, P. Flajolet, and B. Valle analyze the expected
performance of a broad class of hybrid trees that includes ternary trees, and
describe extensive experiments to support their theory. </p>

<h3>Other Operations on Ternary Trees</h3>

<p>Most standard techniques on binary search trees can be applied immediately to
their ternary cousins. For instance, we can print the strings in the tree in
sorted order with a recursive traversal:</p>

<blockquote>

<PRE>void traverse(Tptr p)
{    if (!p) return;
     traverse(p-&gt;lokid);
     if (p-&gt;splitchar)
         traverse(p-&gt;eqkid);
     else
         printf("%s/n", (char *) p-&gt;eqkid);
     traverse(p-&gt;hikid);

}
</PRE>
</blockquote>

<p>Simple recursive searches can find the predec
essor or successor of a given element or list all items in a given range. If we
add a <i>count</i> field to every node, we can quickly count the elements in a
given range, count how many words begin with a given substring, or select the
<i>m</i>th largest element. Most of these operations require logarithmic time in
a ternary tree, but linear time in a hash table. </p>

<h3>Partial-Match Searching </h3>

<p>We turn next to the more subtle problem of "partial-match" or "crossword
puzzle" searching: A query string may contain both regular letters and the "don't
care" character ".". Searching the dictionary for the pattern ".u.u.u" matches
the single word <i>auhuhu</i>, while the pattern ".a.a.a" matches 94 words,
including <i>banana</i>, <i>casaba</i>, and <i>pajama</i>. (That pattern does not
match <i>abracadabra</i>, though. The entire word must match the pattern; we do
not look for patterns in the middle of longer words.) </p>

<p>This venerable problem has been studied in many p
apers, such as in "The World's Fastest Scrabble Program," by A.W. Appel and G.J.
Jacobson (<i>Communications of the ACM, May 1988</i>). In "Partial-Match
Retrieval Algorithms," (<i>SIAM Journal on Computing, 5</i>, 1976), R.L. Rivest
presents an algorithm for partial-match searching in digital tries: Take the
single given branch if a letter is specified, for a don't-care character,
recursively search all branches. Function <i>pmsearch</i> implements Rivest's
method in ternary search trees. The function puts pointers to matching words in
<i>srcharr[0..srchtop-1]</i>. It is called, for instance, by: </p>

<blockquote>

<PRE>srchtop = 0;
pmsearch(root, ".a.a.a");
</PRE></blockquote>
<p>The following is the complete search code:</p>
<blockquote>
<PRE>void pmsearch(Tptr p, char *s)
{    if  (!p) return;
     nodecnt++;
     if (*s == '.' || *s &lt; p-&gt;splitchar)
        pmsearch(p-&gt;lokid, s);
     if (*s == '.' || *s == p-&gt;splitchar)
        if (p-&gt;splitchar &amp;&amp; *s)
            pmsearch(p-&gt;eqkid, s+1);
     if
(*s == 0 &amp;&amp; p-&gt;splitchar == 0)
            srcharr[srchtop++] =
                (char *) p-&gt;eqkid;
     if (*s == '.' || *s &gt; p-&gt;splitchar)
            pmsearch(p-&gt;hikid, s);
}
</PRE></blockquote>
<p>Function <i>pmsearch</i> has five <i>if</i> statements. The second and fifth
<i>if</i> statements are symmetric; they recursively search the <i>lokid</i> (or
<i>hikid</i>) when the search character is the don't care "." or when the search
string is less (or greater) than the <i>splitchar</i>. The third <i>if</i>
statement recursively searches the <i>eqkid</i> if both the <i>splitchar</i> and
string are nonnull. The fourth <i>if</i> statement detects a match to the query
and adds the pointer to the complete word (stored in <i>eqkid</i> by our sleazy
hack). </p>
<p>Rivest states that partial-match search in a trie requires "time about
<i>O</i>(<i>n<sup>(k-s)/k)</i></sup> to respond to a query word with <i>s</i>
letters specified, given a file of <i>n</i> <i>k
</i>-letter words." Ternary search trees can be viewed as an implementation of
his tries (with binary trees implementing multiway branching), so we expected his
results to apply immediately to our program. Our experiments, however, led to a
surprise: Unspecified positions at the front of the query word are dramatically
more costly than unspecified characters at the end of the word. (Rivest suggested
shuffling the characters in a word to avoid such problems.) <A NAME="rt3"><A
HREF="leadt3.htm">Table 3</A> gives the flavor of our experiments. The first line
says that the search visited 18 nodes to find the single match to the pattern
"banana." The tree contains a total of 1,026,033 nodes; searches with some of the
first letters specified visit just a tiny fraction of those nodes. </p>
<h3>Near-Neighbor Searching </h3>
<p>We finally examine the problem of "near-neighbor searching" in a set of
strings: We are to find all words in the dictionary that are within a given
Hamming distance of a query word. For instance, a search for all words wit
hin distance two of <i>Dobbs</i> finds <i>Debby</i>, <i>hobby</i>, and 14 other
words. </p>
<p>Function <i>nearsearch</i> performs a near-neighbor search in a ternary
search tree. Its three arguments are a tree node, string, and distance. The first
<i>if</i> statement returns if the node is null or the distance is negative. The
second and fourth <i>if</i> statements are symmetric: They search the appropriate
child if the distance is positive or if the query character is on the appropriate
side of <i>splitchar</i>. The third <i>if</i> statement either checks for a match
or recursively searches the middle child.</p>
<blockquote>
<PRE>
void nearsearch(Tptr p, char *s, int d)
{   if (!p || d &lt; 0) return;
    if (d &gt; 0 || *s &lt; p-&gt;splitchar)
        nearsearch(p-&gt;lokid, s, d);
    if (p-&gt;splitchar == 0) {
       if ((int) strlen(s) &lt;= d)
          srcharr[srchtop++] = (char *) p-&gt;eqkid;
    } else
       nearsearch(p-&gt;eqkid, *s ? s+1:s,
          (*s
== p-&gt;splitchar) ? d:d-1);
    if (d &gt; 0 || *s &gt; p-&gt;splitchar)
        nearsearch(p-&gt;hikid, s, d);
}
</PRE></blockquote>
<p>Our web site contains data on the performance of this search algorithm. It is
quite efficient when searching for near neighbors, but searching for distant
neighbors grows more expensive. A simple probabilistic model accurately predicts
its run time on real data. </p>
<h3>Conclusion</h3>
<p>The primary challenge in implementing digital search tries is to avoid using
excessive memory for trie nodes that are nearly empty. Ternary search trees may
be viewed as a trie implementation that gracefully adapts to handle this case, at
the cost of slightly more work for full nodes. Ternary search trees combine the
best of two worlds: the low space overhead of binary search trees and the
character-based time efficiency of digital search tries. </p>
<p>Ternary search trees have been used for several years to represent English
dictionaries in a commercial optical character recognition (OCR) system built at
Bell
Labs. The trees were faster than hashing for the task, and they gracefully
handle the 34,000-character set of the Unicode Standard. The designers have also
experimented with using partial-match for word lookup: Replace letters with low
probability of recognition with the "don't care" character. </p>
<p>Ternary search trees are efficient and easy to implement. They offer
substantial advantages over both binary search trees and digital search tries. We
feel that they are superior to hashing in many applications for the following
reasons:</p>
<ul>
<li>Ternary trees do not incur extra overhead for insertion or successful
  searches.
<li>Ternary trees are usually substantially faster than hashing for
  unsuccessful searches.
<li>Ternary trees gracefully grow and shrink; hash tables need to be rebuilt
  after large size changes.
<li>Ternary trees support advanced searches, such as partial-match and
  near-neighbor search.
  <li>Ternary trees support many other operations, such as tr
aversal to report items in sorted order.
</ul>
<p><b>DDJ</b></p>


<HR><I>Copyright &copy; 1998, Dr. Dobb's Journal</I><BR>
<A HREF="http://www.ddj.com/">Dr. Dobb's Web Site Home Page</A> --
<A HREF="#master_top">Top of This Page</A><BR>
--=====================_896490233==_

Arthur P. Adamson, The Engine Man, euclid at isoc.net

--=====================_896490233==_--
OpenEuphoria

Re: Euphoria Programming Contest...

Search

Include:

Quick Links

User menu

Misc Menu