Re: Fast appending and sorting of a lot of short strings

Right now, after some testing, I think binary search trees would be
the optimal solution in Euphoria.
To find: compare with the current node; if equal, return; if less,
goto leg1; if greater, goto leg2; repeat. Easy.
To append: do the same search, then add the new node at the empty leg
where the search ends. Very, very fast and easy.
The only concern is keeping the tree balanced, and that's pretty easy.
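
Roughly what I mean, as an untested sketch (the flat {key, leg1, leg2}
node layout and all the names are just one way to do it; no balancing
here, so a bad insert order degrades it to a list):

    -- nodes live in one flat sequence; 0 means "empty leg"
    constant KEY = 1, LEG1 = 2, LEG2 = 3
    sequence nodes   nodes = {}
    integer root     root = 0

    function bst_find(object key)
    -- return the index of the node holding key, or 0 if absent
    integer node, c
        node = root
        while node != 0 do
            c = compare(key, nodes[node][KEY])
            if c = 0 then
                return node                 -- equal: return
            elsif c < 0 then
                node = nodes[node][LEG1]    -- less: goto leg1
            else
                node = nodes[node][LEG2]    -- greater: goto leg2
            end if
        end while
        return 0
    end function

    function bst_insert(integer node, object key)
    -- same walk as bst_find, but hang a new node on the empty leg
    integer c
        if node = 0 then
            nodes = append(nodes, {key, 0, 0})
            return length(nodes)
        end if
        c = compare(key, nodes[node][KEY])
        if c < 0 then
            nodes[node][LEG1] = bst_insert(nodes[node][LEG1], key)
        elsif c > 0 then
            nodes[node][LEG2] = bst_insert(nodes[node][LEG2], key)
        end if
        return node
    end function

    -- usage:  root = bst_insert(root, "some string")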

But I've decided to break the one big array of strings up into lots of
special cases, which is faster in my case.


On Sat, 09 Oct 2004 02:35:45 +0100, Pete Lomax
<petelomax at blueyonder.co.uk> wrote:
> 
> On Wed, 6 Oct 2004 07:30:31 -0700, codepilot Gmail Account
> <codepilot at gmail.com> wrote:
> 
> >Can someone give me some good theory to get this to go fast?
> I was going to experiment with this a little, but got sidetracked & it
> does not seem like I will get round to it. Still, I'll comment:
> >there is a sequence and it gets searched a lot, and also appended to.
> >I thought that using a hash_find for search was good, but then it has
> >to be sorted.
> The point of using a hash is that you do not sort the main table.
> Have a play with jiri's hash tables in the archives.
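
(For reference, the general shape of a bucket hash, as an untested
sketch; this is not jiri's actual code, and all the names are made up:)

    -- each string hashes to a bucket; a bucket is a small unsorted
    -- list, so the main table never needs sorting at all
    constant NBUCKETS = 1024
    sequence buckets   buckets = repeat({}, NBUCKETS)

    function hash(sequence s)
    integer h
        h = 0
        for i = 1 to length(s) do
            h = remainder(h * 31 + s[i], NBUCKETS)
        end for
        return h + 1
    end function

    function hash_find(sequence s)
    -- returns {bucket, position in bucket}, position 0 if absent
    integer b
        b = hash(s)
        return {b, find(s, buckets[b])}
    end function

    procedure hash_add(sequence s)
    sequence r
        r = hash_find(s)
        if r[2] = 0 then
            buckets[r[1]] = append(buckets[r[1]], s)
        end if
    end procedure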
> 
> If I assume you try that and don't like it:
> 
> 1) If you have a big table that gets added to a lot, then it is far
> better to use the following scheme:
> 
> sequence table  table=repeat(0,64)
> integer tabused tabused=0
> ...
>        if tabused=length(table) then
>                table&=repeat(0,64)
>        end if
>        tabused+=1
>        table[tabused]=<blah>   -- see next
> 
> 2) In order to keep the table sorted, you are currently using:
> 
>        seq = seq[1..current]&{query}&seq[current+1..len]
> 
> Instead try:
> 
>        table[current+1..tabused]=table[current..tabused-1]
>        table[current]=query
> 
> This should be much faster (I think; as I said, untested) because the
> concatenation version creates two large slices, stitches them together
> into a big new result, and then discards the original table. The
> slice-assignment version creates (and discards) only the second big
> slice; the original table stays in place, and the first half is never
> touched at all.
> A similar argument applies to deleting entries.
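
(Putting 1) and 2) together with a binary search, as an untested
sketch; sorted_insert is a made-up name, and it uses the file-level
table/tabused variables from 1):)

    procedure sorted_insert(object query)
    integer lo, hi, mid, c
        -- binary search for the insertion point
        lo = 1
        hi = tabused
        while lo <= hi do
            mid = floor((lo + hi) / 2)
            c = compare(query, table[mid])
            if c = 0 then
                return                  -- already present
            elsif c < 0 then
                hi = mid - 1
            else
                lo = mid + 1
            end if
        end while
        -- grow in chunks, as in 1)
        if tabused = length(table) then
            table &= repeat(0, 64)
        end if
        tabused += 1
        -- shift up with one slice assignment, as in 2)
        table[lo+1..tabused] = table[lo..tabused-1]
        table[lo] = query
    end procedure

    -- deleting entry pos is the mirror image:
    --   table[pos..tabused-1] = table[pos+1..tabused]
    --   tabused -= 1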
> 
> 3) While the overhead of passing a large table to a function is
> negligible, modifying it, returning it, and replacing the original
> adds a huge overhead.
> If possible, code a separate routine to modify each (large) table
> directly, rather than passing/returning it. You can use a single
> search function, provided it does not modify/return the whole table.
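
(The difference, sketched with made-up names against the file-level
table/tabused from 1):)

    -- slow: the table is passed in, modified, returned and
    -- reassigned, which copies the whole table on every call
    function slow_add(sequence t, object x)
        t = append(t, x)
        return t
    end function
    -- caller:  table = slow_add(table, x)

    -- fast: the routine pokes the file-level table directly;
    -- nothing big is copied or returned
    procedure fast_add(object x)
        if tabused = length(table) then
            table &= repeat(0, 64)
        end if
        tabused += 1
        table[tabused] = x
    end procedure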
> 
> HTH,
> Pete
