1. rosettacode 500 milestone passed
- Posted by petelomax Mar 15, 2017
- 1909 views
Just tooting my own horn, Phix on rosettacode has recently passed the 500 milestone.
I have also found an interesting possible optimisation for the Knapsack/Bounded problem, and invite anyone to join in the discussion (on that page or here):
In the Phix solution, I noticed the order of goodies had an impact on search time, ranging from 12s (highest value/fewest items first) to 28s (lowest value first), which got me thinking. I also noticed that sometimes the optimal solution was found very early, and came up with a little theorem.
If we sort by weighted value, and on the last item (given that we have been greedy) we can take the full complement, I believe that means we must be on or have passed through the optimal solution.
Since I am not entirely convinced that hypothesis is right (it does seem rather unlikely that no-one has ever thought of it before), I have left the Phix solution such that it completes the full scan and crashes if a disproving dataset is used. One thing I have tried is increasing the available beer: at 8 and above it never displays "terminate" (it is the last item and 8 will not fit) and must perform a full search, which I view as a good sign, even though it means there is no gain in such circumstances. Actually, there is another possible optimisation right there: 7 beer will terminate in 1.49s (vs 15.3s for 8) so cap the last entry, after sorting, at something that will fit[now added]. It is also interesting to note that we can terminate even if the selection being inspected has not improved matters. Pete Lomax (talk) 13:58, 15 March 2017 (UTC)
2. Re: rosettacode 500 milestone passed
- Posted by Spock Mar 15, 2017
- 1853 views
Just tooting my own horn, Phix on rosettacode has recently passed the 500 milestone.
I have also found an interesting possible optimisation for the Knapsack/Bounded problem, and invite anyone to join in the discussion (on that page or here):
At first glance I think I would make it easy on myself by reforming the data a bit:
- Expanded the data set to unit size
- replace pieces field with a weighted score, v/w or something like that
- Sort data by that score column before starting
(You do 2 of these 3 things)
Next I would split the data into 2 sorted lists - a good one and a bad one. The good one would be my best choices and the bad one is all the rest (with Beer right at the very end). The algorithm would essentially be bubble sort with a twist. I would take the top 2 elements of the bad list and see if they can be swapped them with one element in the good list if that would give me a higher total whilst still keeping the good list under 4kg. Algorithm ends if the scan of the good list reaches the end without being able to swap.
Well, that's my best guess. Can't say how [in]elegant it might be - or even correct.
In a quick test I noticed that just sorting the raw (unified) data by a weighted score column almost produced the optimal result on its own.
Spock
3. Re: rosettacode 500 milestone passed
- Posted by petelomax Mar 16, 2017
- 1807 views
Drat and double drat. I found a counterexample. With a limit of 10kg:
-- item weight value pieces {"gold", {4, 42, 2}}, {"silver", {6, 61, 1}}, {"bronze", {1, 10, 1}}
My scheme terminates with 2 gold and 1 bronze, value 94, but 1 gold and 1 silver has value 103...
At least the test I left in triggered correctly.
(Although that example looks very trivial, there is a subtlety to it that ~99% of other examples do not have.)
4. Re: rosettacode 500 milestone passed
- Posted by ChrisB (moderator) Mar 16, 2017
- 1798 views
Dick Dasterdly!
5. Re: rosettacode 500 milestone passed
- Posted by Spock Mar 16, 2017
- 1759 views
.. I would make it easy on myself by reforming the data a bit:
- Expand the data set to unit size
- replace pieces field with a weighted score, v/w or something like that
- Sort data by that score column before starting
Next I would split the data into 2 sorted lists - a good one and a bad one. The good one would be my best choices and the bad one is all the rest (with Beer right at the very end). The algorithm would essentially be bubble sort with a twist. I would take the top 2 elements of the bad list and see if they can be swapped them with one element in the good list if that would give me a higher total whilst still keeping the good list under 4kg. Algorithm ends if the scan of the good list reaches the end without being able to swap.
Well, that's my best guess. Can't say how [in]elegant it might be - or even correct.
This first idea didn't find an optimal solution. But thanks to a hint in Pete's later post a single swap scan was added just before the double swap scan. Also, both lists left unsorted (except for the initialisation sort). Now the algorithm seems to give the correct answer in both examples. Stats:
400 kg example
- Time 0.016s
- Total element compares 4525
- Top level iterations 3 (or 6)
10kg example
- Total element compares 7
Spock
6. Re: rosettacode 500 milestone passed
- Posted by petelomax Mar 16, 2017
- 1750 views
I look forward to giving it a spin. Seems like you've got a bit of time to polish it though, rosettacode is down and the last time it was off for over a week.
Pete
7. Re: rosettacode 500 milestone passed
- Posted by Spock Mar 16, 2017
- 1732 views
I look forward to giving it a spin. Seems like you've got a bit of time to polish it though, rosettacode is down and the last time it was off for over a week.
Pete
Ok. It is in Orac form but the conversion is trivial to Euphoria:
~ means length()
. means []
The output is at the unit level since I couldn't be bothered subtotalling..
Spock
include sorting.e include seq.e seq good, bad int WEIGHT = 400 int compares = 0 int iter = 0 procedure init() compares = 0 iter = 0 -- the raw data seq s = -- {name, weight, value, pieces} { { "map", 9, 150, 1}, { "compass", 13, 35, 1}, { "sandwich", 50, 60, 2}, { "glucose", 15, 60, 2}, { "tin", 68, 45, 3}, { "banana", 27, 60, 3}, { "apple", 39, 40, 3}, { "cheese", 23, 30, 1}, { "beer", 52, 10, 3}, { "suntan cream", 11, 70, 1}, { "water", 153, 200, 2}, { "camera", 32, 30, 1}, { "T-shirt", 24, 15, 2}, { "trousers", 48, 10, 2}, { "umbrella", 73, 40, 1}, { "waterproof trousers", 42, 70, 1}, { "waterproof overclothes", 43, 75, 1}, { "note-case", 22, 80, 1}, { "sunglasses", 7, 20, 1}, { "towel", 18, 12, 2}, { "socks", 4, 50, 1}, { "book", 30, 10, 2} } -- reform seq t = {} for i = 1 to ~s do -- unpack seq q = s.i seq name = q.1 int w = q.2 int v = q.3 int n = q.4 -- set t &= repeat({name, w, v, w/v}, n) end for -- sort t = sort(t, 4) -- split good = {} bad = {} w = 0 for i = 1 to ~ t do w += t.i.2 if w > WEIGHT then good = t[1 to i-1] bad = t[i to $] ?{~good, ~bad} return end if end for -- special case all on good list good = t end procedure function spare() -- scan good list and work out how much spare space there is -- init int w = 0 -- loop for i = 1 to ~good do w += good.i.2 end for -- spare int spare = WEIGHT - w assert spare >= 0 -- exit return spare end function function can_swap(int a, int b, int g) -- init compares += 1 -- bad - 1 or 2 parameters int wbad = bad.a.2 int vbad = bad.a.3 if b then wbad += bad.b.2 vbad += bad.b.3 end if -- good int wgood = good.g.2 int vgood = good.g.3 -- compare return wbad <= wgood+spare() and vbad > vgood end function function swap_single() -- loop over BAD for i = 1 to ~bad do -- loop over GOOD for j = 1 to ~good do -- compare if can_swap(i, 0, j) then seq t = bad.i bad.i = good.j good.j = t return 1 end if end for end for return 0 end function function swap_double() -- loop over BAD for a = 1 to ~bad-1 do -- loop 2 over BAD for b = a+1 to ~bad do -- loop over GOOD for g = 1 to ~good do -- compare if can_swap(a, b, g) then seq t1 = bad.a seq t2 = bad.b bad = bad[1 to a-1] & bad[a+1 to b-1] & bad[b+1 to $] & {good.g} good = good[1 to g-1] & good[g+1 to $] & {t1, t2} return 1 end if end for end for end for return 0 end function procedure run() -- init init() -- perform algorithm atom t1 = time() while swap_single() or swap_double() do iter += 1 end while puts(1, "top-level iterations, element compares :") ? {iter, compares} ? time() - t1 -- show results int weight = 0, value = 0 for i = 1 to ~good do weight += good.i.2 value += good.i.3 seq name = good.i.1 printf(1, "%d %d %s\n", {weight, value,name} ) end for ? 9/0 end procedure run()
8. Re: rosettacode 500 milestone passed
- Posted by petelomax Mar 17, 2017
- 1717 views
That seems pretty good, not that I've analysed it in any detail.
I have to ask, are you quite certain there are no data sets that might ever need a treble or quadruple (etc) swap?
Just for a laugh, and because I can do so fairly trivially, I had a bash at supporting those orac idioms in Phix.
(All under control of a new "global ORAC = 1" flag so that I can disable/locate them easily.)
Admittedly dot subscripting affected more places and raised more questions than first anticipated, but seems done.
Documenting them (and shipping a new release) is a whole different beanbag, of course.
That raised me a few questions:
I assume that q.2.3 would be treated as q[2.3], not q[2][3] (please say yes!).
The dot notation also precludes ever treating ".4" as a number (must be written as 0.4 - no biggie).
You allow "q.i.j" as an alternative to "q[i][j]":
Would "q.i[j]" be equivalent to "q[i[j]]"?
Would "q[i].j" be equivalent to "q[i][j]"?
You allow "q[i to j]" but not "q.i to j" as an alternative to "q[i..j]"? (please say yes!)
You do not allow "q.i+1" as an alternative to "q[i+1]"[?], but I assume "q.(i+1)" would work.
sort(t, 4) seems like something worthwhile, but I skipped that for now, ditto assert.
Lastly, your scope rules are off, you allow eg:
for i=1 to ~s do int w end for w = 0
which in Phix terminates with "w has not been declared" (obvs. fixed by declaring w before the loop)
Pete
9. Re: rosettacode 500 milestone passed
- Posted by Spock Mar 17, 2017
- 1687 views
That seems pretty good, not that I've analysed it in any detail.
I have to ask, are you quite certain there are no data sets that might ever need a treble or quadruple (etc) swap?
I am quietly confident that swapping just 1 or 2 elements is enough to eventually reach the optimal choice. But having a pathological data set (rather than some Rosetta Code toy..) to experiment with would be prudent.
Just for a laugh, and because I can do so fairly trivially, I had a bash at supporting those orac idioms in Phix.
(All under control of a new "global ORAC = 1" flag so that I can disable/locate them easily.)
Admittedly dot subscripting affected more places and raised more questions than first anticipated, but seems done.
Documenting them is a whole different beanbag, of course.
That raised me a few questions:
I assume that q.2.3 would be treated as q[2.3], not q[2][3] (please say yes!).
Sorry to disappoint. q.2.3 means q[2][3] which is as sophisticated as it gets. Most of the other examples you suggest would just fail under Orac.
The dot notation also precludes ever treating ".4" as a number (must be written as 0.4 - no biggie).
The dot notation only kicks in when lead by a label. The syntax is simple enough so that there is never any ambiguity between fp numbers and sequence subscripts.
sort(t, 4) seems like something worthwhile, but I skipped that for now, ditto assert.
Sorting by column is very useful when handling 2d data sets. Particularly for business apps where just about everything is structured by rows/columns. In my work I pair the (heavy) data set with an index. When sorting only the index is actually altered. I use a virtual listview for viewing the data so even massive sets are trivial to explore from different angles without ever changing the actual data source.
Lastly, your scope rules are off, you allow eg:
for i=1 to ~s do int w end for w = 0
which in Phix terminates with "w has not been declared" (obvs. fixed by declaring w before the loop)
Pete
I once analysed the distributions of variables declared inside routines. So few private variables are used on average that it seemed to me scopes are pointless inside routines - particularly when realising that routines should be kept short and single purpose. I made the conscious decision to not include them. However I think it is useful to be able to declare variables at the point in code when they become needed, rather than interrupting the typing flow by inserting them at an earlier point in the text. YMMV but I just don't see the need to have any structural scopes inside routines. Perhaps someone could point out just where i got it wrong.
Spock
Forked into: rant // OE scope
forked out of 500 Rosetta
10. Re: rosettacode 500 milestone passed
- Posted by Spock Mar 18, 2017
- 1637 views
That seems pretty good, not that I've analysed it in any detail.
I have to ask, are you quite certain there are no data sets that might ever need a treble or quadruple (etc) swap?
I am quietly confident that swapping just 1 or 2 elements is enough to eventually reach the optimal choice. But having a pathological data set (rather than some Rosetta Code toy..) to experiment with would be prudent.
I am now not so sure the algorithm is finding the global maximum because dropping the initial sort produces a sub-optimal (though still good) result.
A more rigorous investigation would be to compare the algorithm against a brute force solver to find any pathological cases..
Spock
11. Re: rosettacode 500 milestone passed
- Posted by petelomax Mar 18, 2017
- 1643 views
Sorry to disappoint. q.2.3 means q[2][3]
Not really a problem, it is actually quite reasonable that getToken() should be told when floats can and cannot be valid.
scopes are pointless inside routines - I made the conscious decision to not include them.
Fair enough
I just don't see the need to have any structural scopes inside routines. Perhaps someone could point out just where i got it wrong.
The only two things that spring to mind are (and you did say "kept short and simple") that in an N-00 line routine
if this then -- <you are adding code here> else if that then -- loads more code elsif the_other then integer w else integer W -- <then you add more here>
First, without local mini-scopes, while working up top you forget there is a w 200 lines further down and add another one -> compilation error.
Not exactly what I'd call critical either, and that sort of thing can still happen with scopes.
Second, while working down below, you use w when you meant W, or perhaps that you can't rename/remove w without checking all the rest of the code.
No matter how you look at it though, mini-scopes cannot be justified in routines that fit on a single screen.
Pete
12. Re: rosettacode 500 milestone passed
- Posted by petelomax Mar 19, 2017
- 1629 views
I translated the C dynamic programming version:
sequence items = { {"map", 9, 150, 1}, {"compass", 13, 35, 1}, {"water", 153, 200, 2}, {"sandwich", 50, 60, 2}, {"glucose", 15, 60, 2}, {"tin", 68, 45, 3}, {"banana", 27, 60, 3}, {"apple", 39, 40, 3}, {"cheese", 23, 30, 1}, {"beer", 52, 10, 3}, {"suntan cream", 11, 70, 1}, {"camera", 32, 30, 1}, {"T-shirt", 24, 15, 2}, {"trousers", 48, 10, 2}, {"umbrella", 73, 40, 1}, {"waterproof trousers", 42, 70, 1}, {"waterproof overclothes",43, 75, 1}, {"note-case", 22, 80, 1}, {"sunglasses", 7, 20, 1}, {"towel", 18, 12, 2}, {"socks", 4, 50, 1}, {"book", 30, 10, 2}, }; sequence {names,weights,points,counts} = columnize(items) constant n = length(items) function knapsack(int w) int v -- m is the achievable points matrix: -- Note that Phix uses 1-based indexes, so m[1][1] -- actually holds points for 0 items of weight 0, -- and m[n+1][w+1] is for n items at weight w. seq m = repeat(repeat(0,w+1),n+1) for i=1 to n do for j=1 to w+1 do -- (0 to w really) m[i+1][j] = m[i][j] for k=1 to counts[i] do if k*weights[i]>j-1 then exit end if v = m[i][j-k*weights[i]]+k*points[i] if v>m[i+1][j] then m[i+1][j] = v end if end for end for end for seq s = repeat(0,n) int j = w+1 -- (w -> 0 really) for i=n+1 to 2 by -1 do -- (n to 1 really) v = m[i][j] int k = 0 while v!=m[i-1][j]+k*points[i-1] do s[i-1] += 1 j -= weights[i-1] k += 1 end while end for return s end function int tc = 0, tw = 0, tv = 0 seq s = knapsack(400) for i=1 to n do int si = s[i] if si then printf(1,"%-22s %5d %5d %5d\n", {names[i], si, si*weights[i], si*points[i]}) tc += si tw += si*weights[i] tv += si*points[i] end if end for printf(1,"%-22s %5d %5d %5d\n", {"count, weight, points:", tc, tw, tv})
Simple and elegant, yet quite difficult to get your head around - there are some youtube vids (search for knapsack problem) that might help.
Pete
13. Re: rosettacode 500 milestone passed
- Posted by Spock Mar 19, 2017
- 1597 views
I translated the C dynamic programming version:
...
Simple and elegant, yet quite difficult to get your head around - there are some youtube vids (search for knapsack problem) that might help.
Pete
Nice. There are a constant 14k total internal iterations in the main loops, regardless of input order. Unrolling the data to unit size only slightly increased the loop count. I tried various pre-sorting modes but only got modest improvements on the inner if blocks. So, all up very well crafted.
I note the CPP version is described as "Initially taken from C but than fixed and refactored". Does that mean some subtle bug still lurks in the C version? The CPP version is also 3 times longer than the one in C. I didn't realise that "refactor" actually means to make longer.
Spock
14. Re: rosettacode 500 milestone passed
- Posted by petelomax Mar 20, 2017
- 1646 views
I note the CPP version is described as "Initially taken from C but than fixed and refactored". Does that mean some subtle bug still lurks in the C version? The CPP version is also 3 times longer than the one in C. I didn't realise that "refactor" actually means to make longer.
I checked the history and found this: http://rosettacode.org/mw/index.php?title=Knapsack_problem/Bounded&diff=220921&oldid=219546
It seems the C++ version is based on a very different C version to the current one and hence that comment no longer applies.
I have added an appropriate comment to that page.
Still playing with the full-on recursive version, I've got an idea about caching ranges.
Right, I'm done: http://rosettacode.org/wiki/Knapsack_problem/Bounded#Range_cache_version
It should use less memory than the dp version, plus it can handle fractional weights.
Pete