1. Suffix Tree (was ramdisk ...)

Matt Lewis wrote:
> 
> Allen V Robnett wrote:
> 
> <snip>
>  
> > I am interested in the possibility of letting the user determine 
> > the length of the search string by pressing <ENTER> or <TAB> 
> > after entering the desired number of characters to be matched. EU 
> > seems to want sequences to be assigned a value which then 
> > determines the length. I am considering using a slice to truncate 
> > the original sequence length at whatever the length the user 
> > indicates. Will this work? Is there a better way?
> 
> I think the 'better way' may involve a new design.  What is the 
> logic of how records are allocated?  Is there any structured layout 
> behind them?  If not, might there be a way to do this?
> 
> If the answer is still no, you could design an indexing system to 
> help you look up the strings.   The fastest way would probably be a 
> suffix tree (although it's going to take up a LOT of disk space for 
> something this big).  I have some code to build a tree, although it 
> wouldn't work for your case, because it's all done in memory, but 
> it could be done in files fairly easily.  I'll see if I can get it 
> modified in the next few days or so.

OK, it seems to be working, and it scales linearly.  The tree itself 
(which is housed in 6 separate files) takes up about 26 times the size 
of the data file.  So for your 200MB file, your suffix tree would take 
nearly 5GB.  On my 2.4GHz machine, I estimate it would take about 32-36 
hours to build a tree for your file (althouth the makeup of the data 
could change that).  But, searching would be really fast.

I've tested on files with up to 2^17 12-byte records, where the records 
are just hex numbers right justified amidst '_'s:

___________1
___________2
...
________1000

etc...

Let me know if you're interested.  It would obviously be useless if your 
data changed often, but it takes almost no time to search for a string, 
and you can look for partial results, too, so you could find all records 
that contain FF, for example.

Matt Lewis

new topic     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu