Re: ternary search trees question
- Posted by Matt Lewis <matthewwalkerlewis at gmail.com> Aug 10, 2005
- 465 views
Kat wrote: > > The part i hope someone can explain to me is this: how are the branches of > such trees ordered, on what criteria, and how is such pre-set-up criteria > going to be optimal for a search in any but for the pre-set-up question? > For instance, you can set up a tree for words who's every other letter is > the same, or every 2nd letter is the same, or have common endings > (minus the suffixes), or have the same 3rd letter. But, for instance, how > is the ranking for 3rd letters arranged for "cow" vs "cot" vs "pow" > determined, by what criteria > > Why would "pow" be anywhere near "cow"? I understand pattern matching, > and i understand hashing, but to pre-set the database for fastest searching > of a pattern, you have to know in advance what that pattern is! This > leads to many databases optomised for many patterns. > I believe it's all done lexigraphically. I suppose that you'd have to decide on a starting point to use (though I haven't looked in detail at the algorithms on the page Tone linked to). My understanding is that it's not guaranteed to be the fastest search algorithm, but it takes log(k+n) (IIRC), so it's going to be pretty fast. It compromises between fast search and lower memory requirements. To search, you basically follow the tree. I don't think that there's any optimization as far as suffixes. To get that, you probably need to use a suffix tree, which has a large overhead, but really fast searching. I think that "pow" and "cow" wouldn't be close, unless there were no words that started with any letters between "c" and "p". "Cow" and "cot" should be very close, though. Matt Lewis