OpenEuphoria: Forum: ternary search trees question

1. ternary search trees question

Posted by Tone Škoda <tskoda at email.si> Aug 09, 2005
623 views
Last edited Aug 10, 2005

Maybe some expert on this topic will be able to answer this question:

What if we insert key "aba" and then "abandon". Since data for key is stored in
eqkid, "aba" can't have any data because eqkid will be required to store
"abandon" key.

Am I missing something?

I'm writing ternary search tree database because creation (not access) of
database with EDS is too slow for something I'm doing.

http://algorithm.myrice.com/resources/technical_artile/ternary_search_tree/terstree.htm

new topic » topic index » view message » categorize

2. Re: ternary search trees question

Posted by Matt Lewis <matthewwalkerlewis at gmail.com> Aug 09, 2005
615 views
Last edited Aug 10, 2005

Tone Škoda wrote:
> 
> Maybe some expert on this topic will be able to answer this question:
> 
> What if we insert key "aba" and then "abandon". Since data for key is stored
> in eqkid,
> "aba" can't have any data because eqkid will be required to store "abandon"
> key.
> 
> Am I missing something?
> 
> I'm writing ternary search tree database because creation (not access) of
> database
> with EDS is too slow for something I'm doing.

I haven't thought much about ternary trees, but that's what Pete Eberlein 
used in Derek's contest:

http://www.rapideuphoria.com/finalsubmissions.zip

Matt Lewis

new topic » goto parent » topic index » view message » categorize

3. Re: ternary search trees question

Posted by "Kat" <gertie at visionsix.com> Aug 09, 2005
603 views
Last edited Aug 10, 2005

On 9 Aug 2005, at 13:08, Tone =8Akoda wrote:

>
>
> posted by: Tone =8Akoda <tskoda at email.si>
>
>
>
>
> Maybe some expert on this topic will be able to answer this question:
>
> What if we insert key "aba" and then "abandon". Since data for key is sto=
red in
> eqkid, "aba" can't have any data because eqkid will be required to store
> "abandon" key.
>
> Am I missing something?
>
> I'm writing ternary search tree database because creation (not access) of=

> database with EDS is too slow for something I'm doing.
>
> http://algorithm.myrice.com/resources/technical_artile/ternary_search_tre=
e/terst
> ree.htm

The part i hope someone can explain to me is this: how are the branches of=

such trees ordered, on what criteria, and how is such pre-set-up criteria=

going to be optimal for a search in any but for the pre-set-up question?

For instance, you can set up a tree for words who's every other letter is t=
he
same, or every 2nd letter is the same, or have common endings (minus the=

suffixes), or have the same 3rd letter. But, for instance, how is the ranki=
ng for=20
3rd letters arranged for "cow" vs "cot" vs "pow" determined, by what criter=
ia?=20
Why would "pow" be anywhere near "cow"? I understand pattern matching,
and i understand hashing, but to pre-set the database for fastest searching=
 of=20
a pattern, you have to know in advance what that pattern is! This leads to=

many databases optomised for many patterns.

Kat

new topic » goto parent » topic index » view message » categorize

4. Re: ternary search trees question

Posted by Pete Lomax <petelomax at blueyonder.co.uk> Aug 10, 2005
605 views

On Tue, 09 Aug 2005 13:08:20 -0700, Tone =8Akoda
<guest at RapidEuphoria.com> wrote:

>Maybe some expert on this topic will be able to answer this question:
>
>What if we insert key "aba" and then "abandon". Since data for key is stor=
ed in eqkid, "aba" can't have any data because eqkid will be required to st=
ore "abandon" key.
>
>Am I missing something?
>
Pete Eberlein's winning entry solved this problem by holding
completely separate trees for different word lengths. According to the
notes I made at the time (see
http://palacebuilders.pwp.blueyonder.co.uk/euphoria.html )
The initial block of 1911 is in fact 21 blocks of 91 first chars, the
first of which holds frequency counts for the 1-letter words whereas
the others are pointers for the second letter. In the remainder of the
tree, the equal to node holds the frequency count when on the last
letter, otherwise it holds a node pointer.

Regards,
Pete

new topic » goto parent » topic index » view message » categorize

5. Re: ternary search trees question

Posted by Pete Lomax <petelomax at blueyonder.co.uk> Aug 10, 2005
592 views

On Tue, 09 Aug 2005 13:08:20 -0700, Tone =8Akoda
<guest at RapidEuphoria.com> wrote:

>What if we insert key "aba" and then "abandon". Since data for key is stor=
ed in eqkid, "aba" can't have any data because eqkid will be required to st=
ore "abandon" key.
>
>Am I missing something?
>
>http://algorithm.myrice.com/resources/technical_artile/ternary_search_tree=
/terstree.htm

Now I've studied that paper, it looks to me like the C code given
stores the terminating null found on all C strings, so it does not
match on the second a of "aba" but the 4th byte, zero.

Regards,
Pete

new topic » goto parent » topic index » view message » categorize

6. Re: ternary search trees question

Posted by Matt Lewis <matthewwalkerlewis at gmail.com> Aug 10, 2005
617 views

Kat wrote:
> 
> The part i hope someone can explain to me is this: how are the branches of
> such trees ordered, on what criteria, and how is such pre-set-up criteria
> going to be optimal for a search in any but for the pre-set-up question?
> For instance, you can set up a tree for words who's every other letter is 
> the same, or every 2nd letter is the same, or have common endings
> (minus the suffixes), or have the same 3rd letter. But, for instance, how 
> is the ranking for 3rd letters arranged for "cow" vs "cot" vs "pow" 
> determined, by what criteria
>
> Why would "pow" be anywhere near "cow"? I understand pattern matching,
> and i understand hashing, but to pre-set the database for fastest searching
> of a pattern, you have to know in advance what that pattern is! This 
> leads to many databases optomised for many patterns.
> 

I believe it's all done lexigraphically.  I suppose that you'd have to decide
on a starting point to use (though I haven't looked in detail at the algorithms
on the page Tone linked to).  My understanding is that it's not guaranteed to
be the fastest search algorithm, but it takes log(k+n) (IIRC), so it's going
to be pretty fast.  It compromises between fast search and lower memory
requirements.

To search, you basically follow the tree.  I don't think that there's any
optimization as far as suffixes.  To get that, you probably need to use 
a suffix tree, which has a large overhead, but really fast searching.

I think that "pow" and "cow" wouldn't be close, unless there were no words
that started with any letters between "c" and "p".  "Cow" and "cot" should
be very close, though.

Matt Lewis

new topic » goto parent » topic index » view message » categorize

7. Re: ternary search trees question

Posted by "Kat" <gertie at visionsix.com> Aug 10, 2005
599 views

On 9 Aug 2005, at 19:48, Matt Lewis wrote:

> 
> 
> posted by: Matt Lewis <matthewwalkerlewis at gmail.com>
> 
> Kat wrote:
> > 
> > The part i hope someone can explain to me is this: how are the 
branches of
> > such trees ordered, on what criteria, and how is such pre-set-up criteria
> > going to be optimal for a search in any but for the pre-set-up question? 
For
> > instance, you can set up a tree for words who's every other letter is the
> > same, or every 2nd letter is the same, or have common endings (minus 
the
> > suffixes), or have the same 3rd letter. But, for instance, how is the 
ranking
> > for 3rd letters arranged for "cow" vs "cot" vs "pow" determined, by what
> > criteria
> >
> > Why would "pow" be anywhere near "cow"? I understand pattern 
matching,
> > and i understand hashing, but to pre-set the database for fastest 
searching of
> > a pattern, you have to know in advance what that pattern is! This leads to
> > many databases optomised for many patterns.
> > 
> 
> I believe it's all done lexigraphically.  I suppose that you'd have to decide
> on
> a starting point to use (though I haven't looked in detail at the algorithms
> on
> the page Tone linked to).  My understanding is that it's not guaranteed to 
be
> the fastest search algorithm, but it takes log(k+n) (IIRC), so it's going to
> be
> pretty fast.  It compromises between fast search and lower memory 
requirements.
> 
> To search, you basically follow the tree.  I don't think that there's any
> optimization as far as suffixes.  To get that, you probably need to use 
> a suffix tree, which has a large overhead, but really fast searching.
> 
> I think that "pow" and "cow" wouldn't be close, unless there were no words
> that started with any letters between "c" and "p".  "Cow" and "cot" should
> be very close, though.

I can see "cow" and "cot" being close for several reasons. So searching for 
"co*" would find them quickly. They line up with english spelling nicely. But 
searching for "pow" and "cow" by using "*ow" would be very slow, effectively 
meaning all the leaves of 3 letters or greater would need to be searched! Any 
optomisation for that would make finding "cow" and "cot" as slow as "pow" 
and "cow" were. Multiple matches, or more complex patterns would be even 
more narrowly optomised, as far as i can tell. Like "cow??e" to find "coweye" 
vs "cowpie", you can get to "cow" fast enough, but then it's brute force of all 
down-branch 6-letter words after that. And that is assuming you have ordered 
all "cow*" by word length, which wouldn't really help finding "kewpie" and 
"cowpie" by looking for "*wpie". In fact, looking for "*wpie" might trigger a 
question "did you mean "wipe"?

Any insights you can share?

Kat,
interested.

new topic » goto parent » topic index » view message » categorize

8. Re: ternary search trees question

Posted by Tone Škoda <tskoda at email.si> Aug 10, 2005
609 views

Pete Lomax wrote:
> Pete Eberlein's winning entry solved this problem by holding
> completely separate trees for different word lengths. 

This is good idea! Speedup because trees are smaller plus less memory needed.
Downsides: a little more coding and limit to length of words (not necessary
though).

>
>Now I've studied that paper, it looks to me like the C code given
>stores the terminating null found on all C strings, so it does not
>match on the second a of "aba" but the 4th byte, zero.
>

Yes, it looks like it.
I understand it now; first "n" of "abandon" would be stored at the right side of
terminating 0 of "aba".
(This site has clearer code:
http://www.opensource.apple.com/darwinsource/10.4/gccfast-1621.1/libiberty/ternary.c)

But I'll use your first advice. 

A hybrid between ternary search tree and digital search tree would be really
fast. For first two or three letters would be used digital search tree (for real
speedy search), for rest of the letters would be used ternary search tree (to not
consume too much memory). Although I'm not sure if digital search tree wouldn't
be too slow when inserting new keys because there would be more writing to disk
needed.

new topic » goto parent » topic index » view message » categorize

9. Re: ternary search trees question

Posted by Tone Škoda <tskoda at email.si> Aug 10, 2005
619 views

Kat wrote:

> 
> The part i hope someone can explain to me is this: how are the branches of=
> 
> such trees ordered, on what criteria, and how is such pre-set-up criteria=
> 
> going to be optimal for a search in any but for the pre-set-up question?

Yes, they are not so easy to clearly understand. That's why I made to myself
this drawing which helps me to understand them clearer:
http://www10.brinkster.com/tskoda/Ternary_Search_Tree_Explanation.html

Speed of search for a word depends on how deeply is that word nested in tree.
How deeply is word nested in tree depends on how late was word inserted in tree,
i.e. words inserted later generally take more time to search, plus how many
similar words already existed in tree when the word was inserted. What means
"similar" here is yet to be more clearly explained.

new topic » goto parent » topic index » view message » categorize

10. Re: ternary search trees question

Posted by "Kat" <gertie at visionsix.com> Aug 10, 2005
591 views

On 10 Aug 2005, at 1:36, Tone =8Akoda wrote:

>
>
> posted by: Tone =8Akoda <tskoda at email.si>
>
> Kat wrote:
>
> >=20
> > The part i hope someone can explain to me is this: how are the branches=
 of=
> >=20
> > such trees ordered, on what criteria, and how is such pre-set-up criter=
ia=
> >=20
> > going to be optimal for a search in any but for the pre-set-up question=
?
>
>
> Yes, they are not so easy to clearly understand. That's why I made to mys=
elf
> this drawing which helps me to understand them clearer:
> http://www10.brinkster.com/tskoda/Ternary_Search_Tree_Explanation.html
>
> Speed of search for a word depends on how deeply is that word nested in t=
ree.
> How deeply is word nested in tree depends on how late was word inserted i=
n tree,
> i.e. words inserted later generally take more time to search, plus how ma=
ny
> similar words already existed in tree when the word was inserted. What me=
ans
> "similar" here is yet to be more clearly explained.

That looks like a regular dictionary lookup process to me. There's nothing=

there related to threes (ternary). Why are later entries harder to find? It=
 still=20
looks to me like the tree must be built for each search pattern, because as=

your latest url plainly shows, "pot" would be far away from "cow", and stil=
l
take a while to match "?o?" search terms.

Kat

new topic » goto parent » topic index » view message » categorize

OpenEuphoria

1. ternary search trees question

2. Re: ternary search trees question

3. Re: ternary search trees question

4. Re: ternary search trees question

5. Re: ternary search trees question

6. Re: ternary search trees question

7. Re: ternary search trees question

8. Re: ternary search trees question

9. Re: ternary search trees question

10. Re: ternary search trees question

Search

Include:

Quick Links

User menu

Misc Menu