Re: Optimizing basic operations

new topic     » goto parent     » topic index » view thread      » older message » newer message
James_at_Euphoria said...

Good point. I'll focus on index sorts. Each group of integers currently has about 25 elements, possibly more if I can make an efficient sort. Think of trillions of these data structures coming in sequentially. From that blitz of data, I'm interested in the top 100 scores... So, one option is to make the top 100 list and then sort each new eval that comes in. This is probably not efficient. The other is to wait until all eval scores are in, but this is probably not efficient and there isn't enough memory. Looking at the performance data I posted early, there has to be some kind of "sweet spot" between 1 and 1 trillion. Let's call that number "n". So, I'll wait until my "bucket" has n elements, append the bucket to my current top 100 list, create list with only the eval and the index, sort, and then pull out the new top 100 by using the index. The evals and data that don't make the top 100 are rejected. In fact, in this case, I might test to see if the new evals are at least greater than the lowest eval on my top 100 list before putting it into the bucket. That's probably cheaper than sorting it later. (Now you see why I was asking about optimization at all levels).

James,

If you only need to find the top 100 out of trillions then the problem is not really about fast sorting, its about fast maintenance of an already sorted list, eg:

seq list = first100elements() 
list = sort( list ) 
integer lowest = lowestEval( list ) 
for i = 101 to a trillion less 100 do 
  obj this = next_item() -- get next data structure 
  if this > lowest then 
    shufflelist(list, this) 
    lowest = lowestEval( list ) 
  end if 
end for 

I hope you can follow the logic in this pseudo code. The inner test will most likely be NOT taken. But when it is you can casually update the list, via shufflelist(), which you have to write. Most (like 99.999%) of the time will be spent just looping (ignoring the overhead of next_itme()) so this is about as good as you will get with plain Euphoria.

When the loop exits list will contain the top 100.

shufflelist() simply inserts "this" somewhere in list and drops the last element.

I think this should be good enough for your needs.

regards, Spock

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu