Re: Fast appending and sorting of alot of short strings
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Oct 09, 2004
- 533 views
On Wed, 6 Oct 2004 07:30:31 -0700, codepilot Gmail Account <codepilot at gmail.com> wrote: >Can somone give me some good theory to get this to go fast? I was going to experiment with this a little, but got sidetracked & it does not seem like I will get round to it. Still, I'll comment: >there is a sequence and it gets searched alot, and also appended to. >I thought that using a hash_find for search was good, but then it has >to be sorted. The point of using a hash is that you do not sort the main table. Have a play with jiri's hash tables in the archives. If I assume you try that and don't like it: 1) If you have a big table that gets added to alot, then it is far better to use the following scheme: sequence table table=repeat(0,64) integer tabused tabused=0 ... if tabused=length(table) then table&=repeat(0,64) end if tabused+=1 table[tabused]=<blah> -- see next 2) In order to keep the table sorted, you are currently using: seq[1..current]&{query}&seq[current+1..len] Instead try: table[current+1..tabused]=table[current..tabused-1] table[current]=query This is much faster (I think, as I said untested) because instead of creating two large slices of table, stitching them all together into a big result, then discarding the original table, only the second big slice of the table is created (and discarded), the original table remains (and obviously the first half remains, completely untouched). A similar argument applies to deleting entries. 3) While the overhead of passing a large table to a function is negligible, modifying/returning, and replacing it adds a huge overhead. If possible, code a separate routine to modify each (large) table directly, rather than passing/returning it. You can use a single search function, provided it does not modify/return the whole table. HTH, Pete