1. Optimizing basic operations
- Posted by James_at_Euphoria Jul 21, 2013
- 6556 views
Please inform if any of you have a trick to optimize the simple comparison and assignment operators. I'm especially interested in optimizing this logic: If X is greater than Y then assign the value of X to Y. Is it possible to use something like pointers at a low machine level to speed up the assignment? All of X and all of Y will be integers. Thanks in advance.
2. Re: Optimizing basic operations
- Posted by jaygade Jul 21, 2013
- 6589 views
Please inform if any of you have a trick to optimize the simple comparison and assignment operators. I'm especially interested in optimizing this logic: If X is greater than Y then assign the value of X to Y.
Optimize how? This is a basic operation and all languages perform it basically the same way. While C and C++ have shortcuts to express this logic in source code, when the code is compiled to machine language it is actually the same.
Is it possible to use something like pointers at a low machine level to speed up the assignment? All of X and all of Y will be integers. Thanks in advance.
Yes and no.
On the one hand, you can allocate blocks of memory and assign values to those blocks and otherwise manipulate them. On the other hand, I don't think there is much speed increase to be seen if you do so.
You can link to C and assembly routines if performance is a real issue, but realize that most speed optimizations outside of critical loops are wasted effort. Clarity of the source code's assumptions and intentions is far more valuable in the long run than a few microseconds or cycles saved.
3. Re: Optimizing basic operations
- Posted by James_at_Euphoria Jul 21, 2013
- 6571 views
Thanks for such a quick response. I will continue with some testing to compare results. The motivation behind this question is that certain routines involving mathematical functions (factorials, for example) can involve billions or trillions of comparisons as they create a geometric explosion of possibilities. If I use heuristics, i.e., trimming the search space / search tree, the probability is that I will find local but not global minimums or maximums. Perhaps after I improve the feature extraction process, assuming that this is possible, the geometric explosion will be smaller (but still very large). In the meantime, every microsecond would help. Again, thanks. P.S. This problem involves pattern recognition.
4. Re: Optimizing basic operations
- Posted by jaygade Jul 21, 2013
- 6528 views
Thanks for such a quick response. I will continue with some testing to compare results. The motivation behind this question is that certain routines involving mathematical functions (factorials, for example) can involve billions or trillions of comparisons as they create a geometric explosion of possibilities. If I use heuristics, i.e., trimming the search space / search tree, the probability is that I will find local but not global minimums or maximums. Perhaps after I improve the feature extraction process, assuming that this is possible, the geometric explosion will be smaller (but still very large). In the meantime, every microsecond would help. Again, thanks. P.S. This problem involves pattern recognition.
One of the cool things about Euphoria is that if you have a list of values on which to perform a single operation, then you can do so in a single statement, and it generally works faster than in other languages. It's not always possible, however; I've found that when trying to translate benchmark tests from other languages to Euphoria, you can't always rely on that feature, but it's still there.
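For instance, a quick sketch (the numbers are made up):

sequence x = {10, 42, 7, 99}
sequence y = {20, 30, 50, 60}

? x + y        -- {30, 72, 57, 159}  element-wise, one statement, no explicit loop
? x * 2        -- {20, 84, 14, 198}
? (x > y)      -- {0, 1, 0, 1}       per-element comparison results

-- one branch-free way to apply James's "if X is greater than Y then assign the
-- value of X to Y" across whole lists at once:
y = x * (x > y) + y * (x <= y)
? y            -- {20, 42, 50, 99}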
Hopefully at some point we will even be able to leverage some processors' SIMD instruction sets in order to make this even more efficient, while at the same time retaining Euphoria's type-neutral character.
As for stuff outside of basic computer math, that's a bit out of my element so my advice is really limited to that.
5. Re: Optimizing basic operations
- Posted by Spock Jul 22, 2013
- 6535 views
Please inform if any of you have a trick to optimize the simple comparison and assignment operators. I'm especially interested in optimizing this logic: If X is greater than Y then assign the value of X to Y. Is it possible to use something like pointers at a low machine level to speed up the assignment? All of X and all of Y will be integers. Thanks in advance.
James, if I were working on a project like yours and needed ultimate speed, I would use assembly code. For example, the following (untested and needing modification) snippet will do this operation without branching:
if eax > ebx then ebx += eax end if
"xor edx, edx" "cmp eax, ebx" "setle dl" "dec edx" "and edx, eax" "add ebx, edx"
Simply converting this snippet to machine code, poking it somewhere, and calling it with X and Y parameters will not help speed-wise as the overhead of actually calling the machine code routine will completely offset the very real speed gains that you would get from it.
Now, there are some (I'm talking about people on other programming forums..) who don't worry much about code speed, probably because they don't get to code much beyond some lame business app, but the plain truth is that the faster you can go the more you can do.
If it were possible to access X and Y in a linear fashion (in the algorithm) it would probably be not too difficult to construct a small assembly code framework to do all the processing. However, if you needed to do some fancy memory allocation or something tricky (eg, recursion) as part of the algorithm, well, things might get too complex to contemplate using asm.
Exactly what is your algorithm? It might be possible to realize it using asm, or maybe not.
Spock
6. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Jul 22, 2013
- 6551 views
Please inform if any of you have a trick to optimize the simple comparison and assignment operators. I'm especially interested in optimizing this logic: If X is greater than Y then assign the value of X to Y. Is it possible to use something like pointers at a low machine level to speed up the assignment? All of X and all of Y will be integers. Thanks in advance.
As good a programming language as Euphoria is, it is not suitable for any application that is computationally intensive. Use C or assembler for such applications. It might be that you can create a hybrid that uses Euphoria for disk I/O and user interface, but calls a DLL to do the serious CPU work.
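For illustration only, the Euphoria side of such a hybrid could look something like this; "mylib" and "top_eval" are made-up names standing in for your own compiled library, not anything that actually exists:

include std/dll.e

atom lib = open_dll("mylib.dll")   -- or "mylib.so" on Linux
constant TOP_EVAL = define_c_func(lib, "top_eval", {C_INT, C_INT}, C_INT)

-- the heavy number crunching happens inside the C routine
atom result = c_func(TOP_EVAL, {42, 17})
? result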
7. Re: Optimizing basic operations
- Posted by James_at_Euphoria Jul 22, 2013
- 6491 views
Good. I was thinking of the same thing: a hybrid with Euphoria calls to the lower level routines. Thanks for the confirmation. That saves me a lot of testing.
8. Re: Optimizing basic operations
- Posted by James_at_Euphoria Jul 22, 2013
- 6487 views
Hey Spock, I need to play around with some benchmarks first. Then I'll try to get back to you.
9. Re: Optimizing basic operations
- Posted by jaygade Jul 22, 2013
- 6481 views
Hey Spock, I need to play around with some benchmarks first. Then I'll try to get back to you.
One of the things to look at too is ease and speed of development. Do your prototype first in Euphoria. If you don't get your required performance then you can either translate it to C using the translator, or translate it to C or assembly manually.
Euphoria should be very fast with integer math, though not as fast as finely-tuned C or SIMD assembly instructions.
10. Re: Optimizing basic operations
- Posted by mattlewis (admin) Jul 22, 2013
- 6497 views
Euphoria should be very fast with integer math, though not as fast as finely-tuned C or SIMD assembly instructions.
It's often a matter of scale. There are also things that you can do that have nothing to do with the math that can dramatically slow things down. This usually has to do with sequence indexing and reference counting / copy-on-write.
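A tiny example of the kind of hidden cost I mean (the size is arbitrary):

sequence a = repeat(0, 1000000)
sequence b = a    -- cheap: both variables share the same storage
b[1] = 7          -- now the whole million-element sequence is copied,
                  -- so that a is left untouched (copy-on-write)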
Matt
11. Re: Optimizing basic operations
- Posted by James_at_Euphoria Jul 30, 2013
- 6166 views
My first goal was to look at the sorting process. I noticed that shell_sort works best in general, but there are still many situations where it does not. For example, if the list to be sorted is already sorted, bubble_sort comes out on top. If the list is completely sorted except for the last entry (the insertion), then, not surprisingly, insertion_sort is faster. Additionally, there appear to be worst-case scenarios which can find each of the sorts' weak points.

If I remember correctly, in the allsorts.ex program, size variables were set to nitems = 500 and length of list = 17 (using the repeat statement). But the results can be deceiving, because just changing these values (and not the type of data) will alter the relative ranking of the performance of each sort; that is, some will do better than others depending on the situation.

I now realize that my original question about optimization had a hidden meaning. There is low-level optimization, such as n += 1 appearing faster than n = n + 1, and high-level optimization (situation based). The bucket_sort works best in certain restricted situations because it avoids the comparison operator. Is there a performance penalty for using ">", compare() or equal()? Is there a performance penalty if I use a loop to extract elements of a list instead of splicing into it? (Yes, I think so.) While there may be many ways to skin a cat, some are faster than others. Is there a resource guide that addresses some of these issues?
12. Re: Optimizing basic operations
- Posted by LarryMiller Jul 30, 2013
- 6154 views
The manual includes some information on optimization: http://openeuphoria.org/docs/perform.html#_715_performancetips
13. Re: Optimizing basic operations
- Posted by jaygade Jul 31, 2013
- 6194 views
My first goal was to look at the sorting process. I noticed that shell_sort works best in general, but there are still many situations where it does not. For example, if the list to be sorted is already sorted, bubble_sort comes out on top. If the list is completely sorted except for the last entry (the insertion), then, not surprisingly, insertion_sort is faster. Additionally, there appear to be worst-case scenarios which can find each of the sorts' weak points.
I hope that I'm not pointing out the obvious, but this is true for all computer languages; it's a property of the given algorithms themselves.
Some sorts work better under given conditions, and some sorts work worse. http://en.wikipedia.org/wiki/Sorting_algorithm
If you have some insight into your data, then you can pick the best representation and sorting method based on those insights. Otherwise you have to be more general.
If I remember correctly, in the allsorts.ex program, size variables were set to nitems = 500 and length of list = 17 (using the repeat statement). But the results can be deceiving, because just changing these values (and not the type of data) will alter the relative ranking of the performance of each sort; that is, some will do better than others depending on the situation.
I haven't actually tried this, and it may be due to the distribution of the random data, but while a couple of algorithms may swap places, some still stay on top for random data and some still stay near the bottom?
I'll have to check.
I now realize that my original question about optimization had a hidden meaning. There is low-level optimization, such as n += 1 appearing faster than n = n + 1
Actually, I think that the Euphoria interpreter is smart enough to handle these two cases the same. The n += 1 is just syntactic sugar.
However, Euphoria does optimize n + 1 vs n + x. http://openeuphoria.org/docs/perform.html#_723_somespecialcaseoptimizations
and high-level optimization (situation based).
Yes, the first and best optimization is choosing the correct algorithm. Again, that's true for every language.
Usually it's best to leave the low-level optimizations to the interpreter or compiler, and leave your code clear and easy to read. That way when you go back and look at it six months later, or when someone else reads your code, they can more easily understand it.
That doesn't always hold true, but it's a good general guideline. However, having a low-level understanding of how the computer works can help you with choice of algorithm or choice of algorithm steps.
The bucket_sort works best in certain restricted situations because it avoids the comparison operator. Is there a performance penalty for using ">", compare() or equal()? Is there a performance penalty if I use a loop to extract elements of a list instead of splicing into it? (Yes, I think so.) While there may be many ways to skin a cat, some are faster than others. Is there a resource guide that addresses some of these issues?
Using '>' or '=' on sequence types works differently in Euphoria (the operators are applied element by element), so they're not really comparable to compare() or equal(). However, using '=' vs equal() on atoms or integers should have the same performance.
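A small sketch of the difference:

sequence a = {1, 2, 3}
sequence b = {1, 5, 3}

? (a = b)        -- {1, 0, 1}  element-by-element result, not a single truth value
? equal(a, b)    -- 0          1 only when the two objects are identical
? compare(a, b)  -- -1         returns -1, 0 or 1, so it can be used for ordering

-- note: "if a = b then" would fail here, because the condition is a sequence
-- rather than a true/false atom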
With regard to sequence operations, you can be pretty sure that any built-in sequence operation is faster than looping over a sequence manually. However, there are many cases where looping manually is still required.
If you are interested in what Euphoria is doing at a low level with a given piece of code, Euphoria does have an IL disassembler so you can examine the low level operations performed at the interpreter level. You can also translate to C and examine that output. Or you could browse the source and get a look at what's going on under the hood.
All three are good ways to compare optimizations.
14. Re: Optimizing basic operations
- Posted by Spock Jul 31, 2013
- 6134 views
My first goal was to look at the sorting process. I noticed that shell_sort works best in general, but there are still many situations where it does not. For example, if the list to be sorted is already sorted, bubble_sort comes out on top. If the list is completely sorted except for the last entry (the insertion), then, not surprisingly, insertion_sort is faster..
..I now realize that my original question about optimization had a hidden meaning. There is low-level optimization, such as n += 1 appearing faster than n = n + 1, and high-level optimization (situation based). The bucket_sort works best in certain restricted situations because it avoids the comparison operator. Is there a performance penalty for using ">", compare() or equal()? Is there a performance penalty if I use a loop to extract elements of a list instead of splicing into it? (Yes, I think so.) While there may be many ways to skin a cat, some are faster than others. Is there a resource guide that addresses some of these issues?
James, I've done my own tests on various sort types and came to the same conclusions as you have. Here are a couple of suggestions:
1) Don't try and optimize the sort now, just get your overall algo functioning as it should and then optimize the sort later.
2) If what you're sorting is not simple, like a flat sequence of ints or strings, you might be better off sorting a sequence of indexes to the data, rather than the data itself. Of course, you'll have to make your own sort routine - just copy an existing one and edit it and change the comparison section inside it.
3) Get some stats from your algorithm (e.g., a histogram of all the data lengths that get sorted). This could be quite insightful in deciding which sort would be best.
4) Being able to sort data is quite nice. But would the algorithm (what you're trying to ultimately achieve) still work if NO sorting were done?
5) If what you're sorting IS simple (like a flat sequence of ints) and you really did need Fast_Sort_Now there's always ASM Shell sort, which somebody can help with..
regards, Spock
15. Re: Optimizing basic operations
- Posted by James_at_Euphoria Jul 31, 2013
- 6063 views
- Last edited Aug 02, 2013
I don't know how to post a graphic file, so here is the output I have. Benchmark results for allsorts.ex. Data to sort has been altered to be lists of integers and not just integers. Bucket_sort was modified to handle this more complex scenario. P.S. If you could see this in graphic form, the performance degradation would be clearer, as well as the relative ranking of each algorithm. Also, the timer mechanism was not always exact, as small differences could occur with no warning or explanation.
Number of completed sorts for 100 items
lists to sort: 2
bubble_sort 847038
simple_sort 908010
insertion_sort 841821
merge_sort 290386
quick_sort 302878
hybrid_sort 698569
great_sort 584590
shell_sort 759986
modified_bucket_sort 93597
lists to sort: 4
bubble_sort 246500
simple_sort 294474
insertion_sort 347594
merge_sort 100568
quick_sort 154605
hybrid_sort 313143
great_sort 265166
shell_sort 378903
modified_bucket_sort 82363
lists to sort: 8
bubble_sort 86490
simple_sort 98159
insertion_sort 152453
merge_sort 40193
quick_sort 70306
hybrid_sort 143642
great_sort 126617
shell_sort 179590
modified_bucket_sort 64797
lists to sort: 16
bubble_sort 19945
simple_sort 24247
insertion_sort 41213
merge_sort 13739
quick_sort 20469
hybrid_sort 43532
great_sort 39388
shell_sort 63232
modified_bucket_sort 40378
lists to sort: 32
bubble_sort 5923
simple_sort 6790
insertion_sort 13430
merge_sort 6682
quick_sort 10453
hybrid_sort 21189
great_sort 18304
shell_sort 23251
modified_bucket_sort 22340
lists to sort: 64
bubble_sort 1412
simple_sort 1721
insertion_sort 3917
merge_sort 3017
quick_sort 4067
hybrid_sort 7455
great_sort 6403
shell_sort 10628
modified_bucket_sort 12287
lists to sort: 128
bubble_sort 362
simple_sort 423
insertion_sort 930
merge_sort 1351
quick_sort 2309
hybrid_sort 3371
great_sort 3127
shell_sort 4058
modified_bucket_sort 5303
lists to sort: 256
bubble_sort 81
simple_sort 101
insertion_sort 218
merge_sort 515
quick_sort 728
hybrid_sort 1177
great_sort 1333
shell_sort 1755
modified_bucket_sort 2475
lists to sort: 512
bubble_sort 22
simple_sort 26
insertion_sort 57
merge_sort 260
quick_sort 403
hybrid_sort 519
great_sort 528
shell_sort 627
modified_bucket_sort 1179
lists to sort: 1024
bubble_sort 6
simple_sort 7
insertion_sort 15
merge_sort 124
quick_sort 198
hybrid_sort 263
great_sort 263
shell_sort 277
modified_bucket_sort 482
lists to sort: 2048
bubble_sort 2
simple_sort 2
insertion_sort 4
merge_sort 55
quick_sort 83
hybrid_sort 105
great_sort 112
shell_sort 121
modified_bucket_sort 214
lists to sort: 4096
bubble_sort 1
simple_sort 1
insertion_sort 1
merge_sort 21
quick_sort 34
hybrid_sort 43
great_sort 37
shell_sort 39
modified_bucket_sort 79
lists to sort: 8192
merge_sort 12
quick_sort 17
hybrid_sort 20
great_sort 23
shell_sort 17
modified_bucket_sort 32
lists to sort: 16384
merge_sort 6
quick_sort 7
hybrid_sort 8
great_sort 10
shell_sort 7
modified_bucket_sort 13
lists to sort: 32768
merge_sort
quick_sort 3
hybrid_sort 3
great_sort 4
shell_sort 4
modified_bucket_sort 5
lists to sort: 65536
merge_sort 2
quick_sort 2
hybrid_sort 2
great_sort 2
shell_sort 2
modified_bucket_sort 2
p.s. I'm still learning the format options. I'm trying to make my benchmarks clear. Any way to post a graph for you guys?
16. Re: Optimizing basic operations
- Posted by James_at_Euphoria Jul 31, 2013
- 6065 views
James, I've done my own tests on various sort types and came to the same conclusions as you have. Here are a couple of suggestions:
1) Don't try and optimize the sort now, just get your overall algo functioning as it should and then optimize the sort later.
2) If what you're sorting is not simple, like a flat sequence of ints or strings, you might be better off sorting a sequence of indexes to the data, rather than the data itself. Of course, you'll have to make your own sort routine - just copy an existing one and edit it and change the comparison section inside it.
3) Get some stats from your algorithm (e.g., a histogram of all the data lengths that get sorted). This could be quite insightful in deciding which sort would be best.
4) Being able to sort data is quite nice. But would the algorithm (what you're trying to ultimately achieve) still work if NO sorting were done?
5) If what you're sorting IS simple (like a flat sequence of ints) and you really did need Fast_Sort_Now there's always ASM Shell sort, which somebody can help with..
regards, Spock
1) I have the logic. Now working on the algorithm. The problem is complex... so, I'm trying to divide and conquer.
2) The data structure is simple enough so as not to need an index as far as I know.
3) Data lengths are all the same.
4) Yes, the algorithm would technically work but the results wouldn't be correct.
5) My data structure basically looks like this: { evaluation, {group 1 of integers}, {group 2 of integers} }. The evaluation starts out as a division / floating point, but I can convert it to an integer as an approximation may be good enough. Group 1 and Group 2 always have the same length, so the list could be made flat for sorting and re-grouped later. Only the evaluation needs to be sorted. The rest just come along for the ride. ASM shell sort sounds nice. How do I find this?
17. Re: Optimizing basic operations
- Posted by jaygade Jul 31, 2013
- 6044 views
5) My data structure basically looks like this: { evaluation, {group 1 of integers}, {group 2 of integers} }. The evaluation starts out as a division / floating point, but I can convert it to an integer as an approximation may be good enough. Group 1 and Group 2 always have the same length, so the list could be made flat for sorting and re-grouped later. Only the evaluation needs to be sorted. The rest just come along for the ride. ASM shell sort sounds nice. How do I find this?
Which element are you sorting on? If you are sorting on one of the integer groups, how do you determine ordering? That is, whether one group is greater than, less than, or equal to another group?
Have you looked at the functions in std/sort.e? http://openeuphoria.org/docs/std_sort.html#_2964_sorting
Edit: Never mind, I see the answer in my quoted section above.
I'll come up with a code snippet later.
18. Re: Optimizing basic operations
- Posted by Spock Aug 01, 2013
- 6049 views
5) My data structure basically looks like this: { evaluation, {group 1 of integers}, {group 2 of integers} }. The evaluation starts out as a division / floating point, but I can convert it to an integer as an approximation may be good enough. Group 1 and Group 2 always have the same length, so the list could be made flat for sorting and re-grouped later. Only the evaluation needs to be sorted. The rest just come along for the ride. ASM shell sort sounds nice. How do I find this?
Which element are you sorting on? If you are sorting on one of the integer groups, how do you determine ordering? That is, whether one group is greater than, less than, or equal to another group?
Have you looked at the functions in std/sort.e? http://openeuphoria.org/docs/std_sort.html#_2964_sorting
Edit: Never mind, I see the answer in my quoted section above.
I'll come up with a code snippet later.
James,
If I read this correctly, you have a list of data structures each with this form:
{ eval, {group 1 of integers}, {group 2 of integers} }
And you want to sort by the "eval" of each structure? If so, this could be a good candidate for an indexed sort, i.e., sorting indexes to the data rather than the data itself. However, how long is the super-list of these structures? This is to narrow down which sort algo might be the best.
What I would do for the sort is create 2 additional sequences: {eval1, eval2, eval3,..} and some indexes {1,2,3,..}. Then feed both of these into your custom sort routine, which will emit a sorted index list with which you can access the original data. You could then rearrange the whole original data, if you wanted. The extra memory for this operation will be like 1% (or less) of the original data. And the sort will be working on a linear list of integers, so it will have good performance.
In fact, there might already be some sort of index sort in a std library for Euphoria. If there is, try that first, then later optimize using a custom asm shell sort - there'll be plenty on the internet.
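In the meantime, here is an untested sketch of the idea using custom_sort() from std/sort.e (the data below is made up; each record is { eval, {group 1}, {group 2} } as you described):

include std/sort.e

sequence data = {
    { 7.5, {3,1,2}, {9,9,9} },
    { 2.0, {5,6,4}, {7,8,8} },
    { 9.1, {0,1,1}, {2,3,3} }
}

sequence evals = repeat(0, length(data))
sequence idx   = repeat(0, length(data))
for i = 1 to length(data) do
    evals[i] = data[i][1]
    idx[i]   = i
end for

-- order the indexes by the eval they point at
function by_eval(integer a, integer b)
    return compare(evals[a], evals[b])
end function

sequence sorted_idx = custom_sort(routine_id("by_eval"), idx)
-- data[sorted_idx[1]] now has the lowest eval, data[sorted_idx[$]] the highest,
-- and the two groups just come along for the ride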
I think JayGade might be able to offer additional suggestions too.
regards, Spock
19. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 01, 2013
- 5988 views
Good point. I'll focus on index sorts. Each group of integers currently has about 25 elements, possibly more if I can make an efficient sort. Think of trillions of these data structures coming in sequentially. From that blitz of data, I'm interested in the top 100 scores... So, one option is to make the top 100 list and then sort each new eval that comes in. This is probably not efficient. The other is to wait until all eval scores are in, but this is probably not efficient either and there isn't enough memory. Looking at the performance data I posted earlier, there has to be some kind of "sweet spot" between 1 and 1 trillion. Let's call that number "n". So, I'll wait until my "bucket" has n elements, append the bucket to my current top 100 list, create a list with only the eval and the index, sort, and then pull out the new top 100 by using the index. The evals and data that don't make the top 100 are rejected. In fact, in this case, I might test to see if the new evals are at least greater than the lowest eval on my top 100 list before putting them into the bucket. That's probably cheaper than sorting them later. (Now you see why I was asking about optimization at all levels.)
20. Re: Optimizing basic operations
- Posted by Spock Aug 01, 2013
- 5993 views
Good point. I'll focus on index sorts. Each group of integers currently has about 25 elements, possibly more if I can make an efficient sort. Think of trillions of these data structures coming in sequentially.. Now you see why I was asking about optimization at all levels.
Trillions? But I thought...
Ok. I'll make a reply as to what I think the best strategy should be, when I get home .. in about 10 hours
Spock
21. Re: Optimizing basic operations
- Posted by jaygade Aug 01, 2013
- 5977 views
Let me see if I understand your problem so far:
- Receive a data structure containing an evaluation and two lists of 25 integers. The evaluation is the result of some calculation.
- Add the data to a list, keeping the list sorted by the value of evaluation as the data are received sequentially.
- The list can grow to a very large size.
This sounds like a good fit for insertion sort from the standard library, especially if scores will be coming in sequentially. I believe it will be more efficient to build and maintain a sorted list than to read in all the values and sort them afterward, especially if you are talking about "trillions of values". You may not even be able to keep the entire list in memory at once. Edit: The only constraint is if you somehow receive more values than you can sort in a certain amount of time.
Is the evaluation dependent on either list of 25 integers? Do the integer lists need to be sorted independently?
Note: I'm not an expert on sorting itself, just my reading of the manual and layman's knowledge of sorting and complexity.
Can you give a better pseudocode explanation of what you are trying to do? I started working on a test/example from what you had said before, but your new information brings with it new assumptions.
22. Re: Optimizing basic operations
- Posted by gimlet Aug 01, 2013
- 5944 views
As you are interested in the top 100.
1 Sort to first 100.
2 then:
23. Re: Optimizing basic operations
- Posted by gimlet Aug 01, 2013
- 5980 views
As you are interested in the top 100.
1 Sort to first 100.
2 then:
24. Re: Optimizing basic operations
- Posted by gimlet Aug 01, 2013
- 5953 views
As you are interested in the top 100.
1 Sort the first 100 into seq A(1..100)
2 then:
    for each record
        if >= A1   add to front and pop last
        if <= A100 ignore
        else       insert into sequence
Sorry about the dud attempts
25. Re: Optimizing basic operations
- Posted by Euman Aug 02, 2013
- 5927 views
My first goal was to look at the sorting process. I noticed that shell_sort works best in general, but there are still many situations where it does not. For example, if the list to be sorted is already sorted, bubble_sort comes out on top.
Have you seen this bucket_sort method.. if we are now in the sorting mode? (scratches head)
On Sunday, Feb. 23rd, 1997, Robert Craig proposed a sort without any comparisons:
function bucket_sort(sequence s, integer min_value, integer max_value)
-- Sort s into ascending order. No elements are compared.
-- The values of s must be integers from min_value to max_value
    sequence count, sorted
    integer value

    count = repeat(0, max_value-min_value+1)
    s = s - (min_value - 1)
    for i = 1 to length(s) do
        value = s[i]
        count[value] = count[value]+1
    end for

    sorted = {}
    for i = 1 to length(count) do
        sorted = sorted & repeat(i, count[i])
    end for
    return sorted + (min_value - 1)
end function

? bucket_sort({25,7,2,5,200,14,7,10}, 1, 256)
26. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 02, 2013
- 5906 views
Let me see if I understand your problem so far:
- Receive a data structure containing an evaluation and two lists of 25 integers. The evaluation is the result of some calculation.
- Add the data to a list, keeping the list sorted by the value of evaluation as the data are received sequentially.
- The list can grow to a very large size.
It would be better to say that the list could grow to a very large size. The minimum size is the top 100 + 1. The maximum would be the top 100 + n, where n is limited by memory. After sorting 100 + 1 or 100 + n, then create the new top 100.
27. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 02, 2013
- 5952 views
Is the evaluation dependent on either list of 25 integers? Do the integer lists need to be sorted independently?
The evaluation is the result of a calculation based on the two groups of 25 integers. The two integer groups (lists) associated with the evaluation are already sorted by this point.
28. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 02, 2013
- 5921 views
Have you seen this bucket_sort method.. if we are now in the sorting mode? (scratches head)
On Sunday, Feb. 23rd, 1997, Robert Craig proposed a sort without any comparisons:
Yes, I've analyzed everything inside of allsorts.ex
29. Re: Optimizing basic operations
- Posted by Spock Aug 02, 2013
- 5913 views
Good point. I'll focus on index sorts. Each group of integers currently has about 25 elements, possibly more if I can make an efficient sort. Think of trillions of these data structures coming in sequentially. From that blitz of data, I'm interested in the top 100 scores... So, one option is to make the top 100 list and then sort each new eval that comes in. This is probably not efficient. The other is to wait until all eval scores are in, but this is probably not efficient either and there isn't enough memory. Looking at the performance data I posted earlier, there has to be some kind of "sweet spot" between 1 and 1 trillion. Let's call that number "n". So, I'll wait until my "bucket" has n elements, append the bucket to my current top 100 list, create a list with only the eval and the index, sort, and then pull out the new top 100 by using the index. The evals and data that don't make the top 100 are rejected. In fact, in this case, I might test to see if the new evals are at least greater than the lowest eval on my top 100 list before putting them into the bucket. That's probably cheaper than sorting them later. (Now you see why I was asking about optimization at all levels.)
James,
If you only need to find the top 100 out of trillions then the problem is not really about fast sorting, its about fast maintenance of an already sorted list, eg:
seq list = first100elements()
list = sort( list )
integer lowest = lowestEval( list )

for i = 101 to a trillion less 100 do
    obj this = next_item()   -- get next data structure
    if this > lowest then
        shufflelist(list, this)
        lowest = lowestEval( list )
    end if
end for
I hope you can follow the logic in this pseudocode. The inner test will most likely be NOT taken. But when it is, you can casually update the list via shufflelist(), which you have to write. Most (like 99.999%) of the time will be spent just looping (ignoring the overhead of next_item()), so this is about as good as you will get with plain Euphoria.
When the loop exits list will contain the top 100.
shufflelist() simply inserts "this" somewhere in list and drops the last element.
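If it helps, a minimal sketch of what shufflelist() could look like - just my guess, assuming list is kept sorted in descending order of eval and each entry is a whole record with the eval as its first element (so compare() orders by eval):

include std/sequence.e   -- for insert()

function shufflelist(sequence list, object this)
    integer pos = length(list)
    -- walk up from the bottom until the new record is no longer bigger
    while pos > 1 and compare(this, list[pos-1]) > 0 do
        pos -= 1
    end while
    list = insert(list, this, pos)   -- list is now one entry too long
    return list[1 .. $-1]            -- drop the old lowest entry
end function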
I think this should be good enough for your needs.
regards, Spock
30. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 02, 2013
- 5866 views
When the loop exits list will contain the top 100. shufflelist() simply inserts "this" somewhere in list and drops the last element.
I see. What you're saying is that the time spent doing a shufflelist() plus finding the lowest eval would be faster than a true sort plus lowest_eval = sorted_eval_list[$][1].
31. Re: Optimizing basic operations
- Posted by Spock Aug 02, 2013
- 5851 views
When the loop exits list will contain the top 100. shufflelist() simply inserts "this" somewhere in list and drops the last element.
I see. What you're saying is that the time spent doing a shufflelist() plus finding the lowest eval would be faster than a true sort plus lowest_eval = sorted_eval_list[$][1].
Yes. shufflelist() should be faster than sort() but the efficiency of that part of the program is a minor issue since it will be called so infrequently. The overall processing time will be dominated by the trillions of comparisons needed inside the main loop.
The difference between choosing sort() or shufflelist() might mean the difference between a processing time of 10 hours versus 10 hours and 2 seconds. A much bigger concern is the derivation of the data structures in the first place. Tell us about that, there could be some sizeable gains to be had.
regards, Spock
32. Re: Optimizing basic operations
- Posted by Lnettnay Aug 02, 2013
- 5793 views
Hi James,
I don't want to rain on your parade but I have to ask this question: What are you working on? ( If it isn't classified.)
I've been watching this thread for a while and the main question I have is where is all this data coming from? Depending on the speed of your storage device a trillion records is going to take a very long time just for the input.
Still, let's say you can input & process 1 million records per second. (A very high estimate)
1,000,000,000,000 records ÷ 1,000,000 records per second = 1,000,000 seconds; 1,000,000 seconds ÷ 86,400 seconds per day = 11+ days.
Unless you can break up the data and have multiple computers working at the same time on parts of the data set and then merge them at the end I don't see how it can be done.
Lonny
33. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 03, 2013
- 5827 views
Hi James,
I don't want to rain on your parade but I have to ask this question: What are you working on? ( If it isn't classified.)
I've been watching this thread for a while and the main question I have is where is all this data coming from? Depending on the speed of your storage device a trillion records is going to take a very long time just for the input.
Still, let's say you can input & process 1 million records per second. (A very high estimate)
1,000,000,000,000 records ÷ 1,000,000 records per second = 1,000,000 seconds; 1,000,000 seconds ÷ 86,400 seconds per day = 11+ days.
Unless you can break up the data and have multiple computers working at the same time on parts of the data set and then merge them at the end I don't see how it can be done.
Lonny
Thanks, Lonny, for the reality check. I'm concerned about this as well. I guess if I'm in a hurry I might have to rent a machine that can do up to 240 teraflops. Then I might be able to turn 11+ days into an hour or so. See:
I can also change the threshold (sensitivity) of the data and reduce the number of records. Additionally, I'm also looking at heuristic approaches to prune the search tree. However, these approaches risk skipping over the best solution(s).
34. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 03, 2013
- 5760 views
Yes. shufflelist() should be faster than sort() but the efficiency of that part of the program is a minor issue since it will be called so infrequently. The overall processing time will be dominated by the trillions of comparisons needed inside the main loop.
In that case, I might consider staying with the original floating point expression of the evaluation calculation instead of rounding it to an integer. This would take up more memory, and I'm assuming comparing atoms will be just as fast as comparing integers.
The difference between choosing sort() or shufflelist() might mean the difference between a processing time of 10 hours versus 10 hours and 2 seconds. A much bigger concern is the derivation of the data structures in the first place. Tell us about that, there could be some sizeable gains to be had.
Give me a few weeks. I need to code more in order to test out all the stuff we've been discussing. You guys have been really great to take the time to make thoughtful comments. In the meantime, please suggest how a small and highly qualified group of us could work together in strictest confidence so that the proprietary nature of this project isn't broadcast all over the internet.
35. Re: Optimizing basic operations
- Posted by Euman Aug 03, 2013
- 5757 views
Well, if it's top secret.. then shhhhhh. If not, just tell us so we can all stop scratching our heads.. just saying
36. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Aug 03, 2013
- 5782 views
I'm also not sure if I understand what you're trying to do, but it sounds like you want to know the top 'X' scores from a potentially huge set of data records. Here is one way of doing that which does not involve much sorting at all. I've tested with a billion records and it takes about two minutes and does around 1500 'sorts' (binary search insertions actually).
constant MAXSCORE   = 100_000_000    -- Largest key possible
constant SAMPLESIZE = 1_000_000_000  -- The number of samples to check
constant TOPCOUNT   = 100            -- Only this amount are retained.

sequence Tops
sequence NewItem

-- The first item in SList is the new item and it is to be placed inside
-- SList such that the sequence remains sorted in ascending order.
-- The requirement is that SList[2..$] is already sorted.
-- It returns the input with the new item in the correct place.
function insert_one(sequence SList)
    sequence NewData
    integer NewScore
    integer min
    integer max
    integer pos

    NewData = SList[1]    -- Copy the new data
    NewScore = NewData[1] -- Get the new key now for performance

    -- Initialize the searching indexes.
    pos = 2
    min = 2
    max = length(SList)

    while min <= max do
        pos = floor((min + max) / 2)
        switch eu:compare(NewScore, SList[pos][1]) do
            case 0 then
                -- The new score already exists in the list
                pos -= 1
                min = max + 1 -- Force loop to end.
            case -1 then
                pos -= 1
                max = pos
            case 1 then
                min = pos + 1
        end switch
    end while

    SList[1 .. pos-1] = SList[2 .. pos] -- Shift existing items down
    SList[pos] = NewData                -- Put new data in right spot.

    return SList
end function

--- Perform simulation --
integer Lowest
Tops = { {rand(MAXSCORE), "some data", 0} } -- Get first score record.
Lowest = Tops[1][1] -- For performance, get the lowest value so far.

for i = 1 to SAMPLESIZE do
    NewItem = {rand(MAXSCORE), "some data", i} -- Get next score record
    if NewItem[1] <= Lowest then
        -- The new score is lower than any I know about so far.
        if length(Tops) = TOPCOUNT then
            -- But since it's lower than the top 'X' scores, I don't need it.
            continue -- Discard new item. This is by far the most common case.
        end if
        -- The new score becomes the list's new lowest score.
        Tops = prepend(Tops, NewItem)
    else
        -- The new score is larger than the lowest so it needs to go into the
        -- top 'X' scores.
        if length(Tops) < TOPCOUNT then
            -- As the list isn't full yet, I can just add it in.
            Tops = prepend(Tops, NewItem)
        else
            -- The list is full so I need to remove the current lowest and
            -- insert the new data.
            Tops[1] = NewItem
        end if
        Tops = insert_one(Tops) -- This rearranges the list so that the
                                -- first item is moved to the correct
                                -- sorted position.
    end if
    Lowest = Tops[1][1] -- For performance, get the new lowest value so far.
end for
37. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Aug 03, 2013
- 5768 views
... please suggest how (to) work together in strictest confidence so that the proprietary nature of this project isn't broadcast all over the internet.
Simple, don't use the internet.
38. Re: Optimizing basic operations
- Posted by useless_ Aug 03, 2013
- 5774 views
Thanks, Lonny, for the reality check. I'm concerned about this as well. I guess if I'm in a hurry I might have to rent a machine that can do up to 240 teraflops. Then I might be able to turn 11+ days into an hour or so. See:
It will prolly take you longer to send the problem to all the nodes than to solve the problem on fewer nodes. You might try a parallel share-all computer instead of a distributed share-none scheme. Even then, you may lose time in setup. I have a small cluster; I seldom need more than 2 or 3 ten-year-old computers on any single problem. Euphoria is bloody fast.
useless
39. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 04, 2013
- 5696 views
As an example of my original question about "optimizing basic operations", you'll find that this code works up to 10% faster than the original bucket_sort, depending on the number of items (nitems).
function bucket_sort(sequence s, integer min_value, integer max_value)
-- Sort s into ascending order. No elements are compared.
-- The values of s must be integers from min_value to max_value.
    sequence count, sorted
    integer k, offset, c

    count = repeat(0, max_value - min_value + 1)
    offset = min_value - 1

    -- count the number of occurrences of each integer value:
    for i = 1 to length(s) do
        -- value = s[i] - offset
        -- count[value] += 1
        count[ s[i] - offset ] += 1
    end for

    -- sorted = repeat(0, length(s))
    sorted = s
    k = 1

    -- make the resulting sorted sequence
    for i = 1 to length(count) do
        c = count[i]
        if c then
            sorted[k .. k + c - 1] = i + offset -- s[k .. k + c - 1] = i + offset
            k += c
        end if
    end for

    return sorted
end function
40. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 04, 2013
- 5654 views
... please suggest how (to) work together in strictest confidence so that the proprietary nature of this project isn't broadcast all over the internet.
Simple, don't use the internet.
Hello Derek. So far, we've been talking about public domain type of stuff. However, some of the viewers of this thread seem curious (or frustrated) as they would like to see more detail regarding my project. I'm somewhat reluctant to do that until we can establish a more private forum. If we can't use the internet, you guys are welcome over to my house. :)
41. Re: Optimizing basic operations
- Posted by gimlet Aug 05, 2013
- 5641 views
Basically you are doing too much work. The only requirements we really need are:
1. That we know the minimum record for the top 100, the others do not concern us.
2. A data structure which will efficiently provide us with that (e.g., a heap). See Wikipedia 'Heap (data structure)', only reverse the condition so that the root is the minimum and items are greater than or equal to their parent node.
create heap H

for i = 1 to 100
    insert R.i into H
end

for i = 101 to N do
    if R.i > H.min then
        drop H.min (root)
        insert R.i into H
    end if
end for

sort H (it's already half-sorted).
This is a good solution as it uses constant space and an (optimal) single read of the data.
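For anyone who wants to play with the idea in Euphoria, here is a rough, untested sketch of such a min-heap (array-based, root at index 1; the names are mine and assume each record carries its key as the first element, so compare() orders by key):

sequence heap = {}

procedure sift_up(integer i)
    integer p
    object t
    while i > 1 do
        p = floor(i / 2)
        if compare(heap[i], heap[p]) >= 0 then
            exit                    -- parent is already <= child
        end if
        t = heap[i]
        heap[i] = heap[p]
        heap[p] = t
        i = p
    end while
end procedure

procedure sift_down(integer i)
    integer n, c
    object t
    n = length(heap)
    while 2 * i <= n do
        c = 2 * i
        if c < n and compare(heap[c+1], heap[c]) < 0 then
            c += 1                  -- pick the smaller child
        end if
        if compare(heap[i], heap[c]) <= 0 then
            exit                    -- heap property holds
        end if
        t = heap[i]
        heap[i] = heap[c]
        heap[c] = t
        i = c
    end while
end procedure

-- keep only the best topcount records; heap[1] is always the current minimum
procedure offer(object rec, integer topcount)
    if length(heap) < topcount then
        heap = append(heap, rec)
        sift_up(length(heap))
    elsif compare(rec, heap[1]) > 0 then
        heap[1] = rec               -- replace the current minimum ...
        sift_down(1)                -- ... and sift it back into place
    end if
end procedure

After every record has been offered, one final sort of the (at most 100-element) heap gives the ranked top list.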
42. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Aug 05, 2013
- 5586 views
Basically you are doing too much work. The only requirements we really need are:
1. That we know the minimum record for the top 100, the others do not concern us.
2. A data structure which will efficiently provide us with that (e.g., a heap). See Wikipedia 'Heap (data structure)', only reverse the condition so that the root is the minimum and items are greater than or equal to their parent node.
This is a good solution as it uses constant space and an (optimal) single read of the data.
Yep, that's pretty much exactly what I suggested too. Only instead of a min-heap structure I just used a sorted sequence with a binary search for insertions - which would be much faster than a heap structure.
43. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Aug 10, 2013
- 5156 views
Yep, that's pretty much exactly what I suggested too. Only instead of a min-heap structure I just used a sorted sequence with a binary search for insertions - which would be much faster than a heap structure.
I stand corrected. Out of curiosity, I wrote a basic heap (priority queue) library and tested your idea with it and it was about three times faster than my first efforts. In fact, I can process a billion records in about 40 seconds.
If anyone is interested, I'll submit the library for public/peer review and thus possible inclusion into the standard library.
44. Re: Optimizing basic operations
- Posted by jimcbrown (admin) Aug 10, 2013
- 5200 views
Yep, that's pretty much exactly what I suggested too. Only instead of a min-heap structure I just used a sorted sequence with a binary search for insertions - which would be much faster than a heap structure.
I stand corrected. Out of curiosity, I wrote a basic heap (priority queue) library and tested your idea with it and it was about three times faster than my first efforts. In fact, I can process a billion records in about 40 seconds.
If anyone is interested, I'll submit the library for public/peer review and thus possible inclusion into the standard library.
WOW! Amazing. I am very interested. Please submit!
45. Re: Optimizing basic operations
- Posted by useless_ Aug 10, 2013
- 5172 views
Yep, that's pretty much exactly what I suggested too. Only instead of a min-heap structure I just used a sorted sequence with a binary search for insertions - which would be much faster than a heap structure.
I stand corrected. Out of curiosity, I wrote a basic heap (priority queue) library and tested your idea with it and it was about three times faster than my first efforts. In fact, I can process a billion records in about 40 seconds.
If anyone is interested, I'll submit the library for public/peer review and thus possible inclusion into the standard library.
WOW! Amazing. I am very interested. Please submit!
I was afraid to ask for it.
useless
46. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 11, 2013
- 5142 views
Yep, that's pretty much exactly what I suggested too. Only instead of a min-heap structure I just used a sorted sequence with a binary search for insertions - which would be much faster than a heap structure.
I stand corrected. Out of curiosity, I wrote a basic heap (priority queue) library and tested your idea with it and it was about three times faster than my first efforts. In fact, I can process a billion records in about 40 seconds.
If anyone is interested, I'll submit the library for public/peer review and thus possible inclusion into the standard library.
WOW! Amazing. I am very interested. Please submit!
I was afraid to ask for it.
useless
I would also be highly interested. Thanks. I was hoping for advancements like this.
47. Re: Optimizing basic operations
- Posted by gimlet Aug 16, 2013
- 4715 views
James,
The talk has mostly been about the speed of processing the data. You need to consider first of all the upload time. Given 10^12 records with an evaluation integer and 50 numbers, you have a very large amount to load, say 200 TB.

According to Tom's Hardware, the fastest off-the-shelf and affordable SSDs have data transfer rates of approximately 0.5 GB per second. That is to say, you would be looking at an upload time of 400,000 seconds or so. That is 111 hours or 4.5 days. The processing of the records (assuming random data) could be expected to be 40,000 seconds (based on Derek's numbers), that is to say 11 hours.

Multiplying the number of machines used is much more effective than using a faster processor. E.g., 10 machines would get the load time down to 11 hours and the processing down to 1.1 hours. Cutting down the amount of data would also help (but, of course, you still need to keep the identifying data together with the evaluation).
48. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Aug 16, 2013
- 4684 views
If anyone is interested, I'll submit the library for public/peer review and thus possible inclusion into the standard library.
Sorry it's taken so long, but documenting it took 3 or 4 times longer than writing the code.
Here is the prototype. Have fun ripping it to shreds.
49. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 17, 2013
- 4581 views
If anyone is interested, I'll submit the library for public/peer review and thus possible inclusion into the standard library.
Sorry it's taken so long, but documenting it took 3 or 4 times longer than writing the code.
Here is the prototype. Have fun ripping it to shreds.
Your work is much appreciated.
50. Re: Optimizing basic operations
- Posted by James_at_Euphoria Aug 17, 2013
- 4544 views
The talk has mostly been about the speed of processing the data.
You need to consider first of all the upload time. Given 10^12 records with an evaluation integer and 50 numbers, you have a very large amount to load, say 200 TB.
According to Tom's Hardware, the fastest off-the-shelf and affordable SSDs have data transfer rates of approximately 0.5 GB per second. That is to say, you would be looking at an upload time of 400,000 seconds or so.
That is 111 hours or 4.5 days.
The records are internally generated and are not loaded from an outside source. Think of a chess game and evaluating n levels deep. The (geometric) combinatorial explosion of possibilities is what creates so many records. My project isn't a chess game but the branching at each level is similar.
51. Re: Optimizing basic operations
- Posted by gimlet Aug 21, 2013
- 4408 views
Derek,
Looking at your code I notice that your implementation of insert_record is faulty. If you insert data on increasing key values into a max-heap the code appears to generate a linked list.
You say that the tree is not balanced, but a heap is required to be a complete tree - no gaps except the bottom row, which is filled from left to right.
Code attached: (untested)
(* This is for a max-heap *)
const
  M = 256;                          (* say *)

type
  rec = array[1..26] of integer;    (* say *)

var
  Heap : array[1..M,0..1] of integer;
  N    : integer := 0;
  Data : array[1..M] of rec;

(* Two procedures siftUp and siftDown which move a node *)
(* into its correct position in the heap.               *)

procedure siftUp;
var i,p,t : integer;
begin
  i := N;
  while i > 1 do
    p := i div 2;
    if Heap[i,0] > Heap[p,0] then (* swap *)
    begin
      t := Heap[i]; Heap[i] := Heap[p]; Heap[p] := t;
      i := p
    end
    else (* stop *)
      exit while;
  end
end; (* siftUp *)

procedure siftDown;
var i,c,t : integer;
begin
  i := 1;
  c := 2;
  while c <= N do
  begin
    if c < N and Heap[c,0] < Heap[c+1,0] then
      c := c+1;
    if Heap[c,0] > Heap[i,0] then
    begin
      t := Heap[c]; Heap[c] := Heap[i]; Heap[i] := t;
      i := c;
      c := 2*c
    end
    else (* stop *)
      exit while;
  end;
end; (* siftDown *)

(* Three procedures to add, remove and replace nodes. Addition is always   *)
(* done at the end - it is the only action that can cause an overflow.     *)
(* Take always takes the root and can cause an underflow. Replace is not   *)
(* required - it was written with respect to James's problem, find the     *)
(* top 100 elements in a list.                                             *)

procedure add(r : rec);
begin
  N := N+1;                 (* add a node *)
  Heap[N,0] := r[1];
  Data[Heap[N,1]] := r;
  siftUp;
end; (* add *)

function take : rec;
var r : integer;
begin
  r := Heap[1,1];
  Heap[1] := Heap[N];       (* copies both node value and pointer *)
  Heap[N,1] := r;           (* N now points to the record to be dropped *)
  N := N-1;
  siftDown;
  take := Data[r]
end; (* take *)

procedure replace(r : rec);
begin
  Heap[1,0] := r[1];
  Data[Heap[1,1]] := r;     (* overwrites with new record *)
  siftDown;
end; (* replace *)
52. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Aug 21, 2013
- 4401 views
Derek,
Looking at your code I notice that your implementation of insert_record is faulty. If you insert data on increasing key values into a max-heap the code appears to generate a linked list.
You say that the tree is not balanced, but a heap is required to be a complete tree - no gaps except the bottom row, which is filled from left to right.
Thank you for looking into this code. I appreciate peer reviews.
The insert_record() function is not faulty in the respect you mention. Now as you say, inserting records in priority order will create a linked list. However, note that a linked list is a variant of a tree, one in which each node has at most one child. Also, a priority queue (heap structure) is not bound to be implemented as a binary tree, although it can be, and arguably should be as that could give better insertion speed. The heap can actually be implemented in any of a number of ways, each with its own strengths and weaknesses. My implementation assumed that records would be added in a random manner.
But in any case, I'll have a go tonight at adding a 'balancing' algorithm to help flatten out the tree, regardless of the insertion order. But remember, the main purpose of a heap is to quickly retrieve the highest priority record - though your requirements might differ.
53. Re: Optimizing basic operations
- Posted by gimlet Aug 21, 2013
- 4293 views
Derek,
The point of using a heap is that it gives O(log N) insert and extract min/max. It is the basis of heapsort.
There are faults in your method:
1 Your method allows a wildly unbalanced tree. This is not the description of a heap. Whether your heap would balance out in the long run - maybe - probably not.
2 Inserting from the top does not build a correct priority queue as it will produce last in first out for any given priority. I don't think we want to model queue-jumping.
3 Testing with random key data will generate a balanced tree - as such it is not a fair test.
4 When you do extract min/max you promote one of its children - this may further unbalance your tree.
The (standard) array approach is clean and simple. Is there some pressing reason that requires you use pointers? Your code is quite complex.
54. Re: Optimizing basic operations
- Posted by DerekParnell (admin) Aug 22, 2013
- 4303 views
Thanks again for doing an analysis of the code. Greatly appreciated. Attempting to justify one's code is a healthy thing to do as it gets the brain thinking and (hopefully) helps improve and document the code itself.
The point of using a heap is that it gives O(log N) insert and extract min/max. It is the basis of heapsort.
Well, that may be your requirement but it wasn't mine. My requirement was to give quick [O(1)] access to the top priority record at any time - nothing more than that. The type of heap you seem to be describing is a binary heap, which is certainly a basis for heapsort. As I'm sure you are aware, there are many types of heaps, each with their own benefits and drawbacks. The thing they all have in common is that the topmost parent node is accessible in O(1) time. All other constraints depend on the application's requirements, and would thus dictate the specific heap implementation.
There are faults in your method:
1 Your method allows a wildly unbalanced tree. This is not the description of a heap. Whether your heap would balance out in the long run - maybe - probably not.
I have a different opinion about whether or not an unbalanced tree can also be a heap. Given my requirements, it certainly can.
2 Inserting from the top does not build a correct priority queue as it will produce last in first out for any given priority. I don't think we want to model queue-jumping.
But in my code it does! You may have missed the fact that the structure I use contains a pointer (index) to the next top priority node, which is updated whenever the priority node is superseded by a newly inserted or removed node.
3 Testing with random key data will generate a balanced tree - as such it is not a fair test.
Why? Fair against which criteria? What if the real-world data being recorded is actually random?
4 When you do extract min/max you promote one of its children - this may further unbalance your tree.
Yes it may, but it doesn't affect my structure's goal of providing O(1) access to the top priority node.
The (standard) array approach is clean and simple. Is there some pressing reason that requires you use pointers? Your code is quite complex.
I'm sorry for the complexity. However, the API is simple to use and that will be the most that nearly anyone will see of the code.
By using pointers (indexes) I achieve at least two things - fast insertion and removal, due to not having to move array items around, and stability, because each record added to the heap retains its index value (record-id) regardless of how many other records are added or removed.
My first attempt used an array and new records were inserted at the correct position and removed records were physically deleted from the array. I found that using the array implementation was about three times slower - updating a few pointers is faster than shuffling array items. Also, by using an array, the API user could never cache a record's position for later reference and instead would have to do a search prior to fetching a record from the heap.