1. rosettacode 500 milestone passed

Just tooting my own horn, Phix on rosettacode has recently passed the 500 milestone.

I have also found an interesting possible optimisation for the Knapsack/Bounded problem, and invite anyone to join in the discussion (on that page or here):

Knapsack/Bounded/Discussion said...

In the Phix solution, I noticed the order of goodies had an impact on search time, ranging from 12s (highest value/fewest items first) to 28s (lowest value first), which got me thinking. I also noticed that sometimes the optimal solution was found very early, and came up with a little theorem.

If we sort by weighted value, and on the last item (given that we have been greedy) we can take the full complement, I believe that means we must be on or have passed through the optimal solution.

Since I am not entirely convinced that hypothesis is right (it does seem rather unlikely that no-one has ever thought of it before), I have left the Phix solution such that it completes the full scan and crashes if a disproving dataset is used. One thing I have tried is increasing the available beer: at 8 and above it never displays "terminate" (it is the last item and 8 will not fit) and must perform a full search, which I view as a good sign, even though it means there is no gain in such circumstances. Actually, there is another possible optimisation right there: 7 beer will terminate in 1.49s (vs 15.3s for 8) so cap the last entry, after sorting, at something that will fit[now added]. It is also interesting to note that we can terminate even if the selection being inspected has not improved matters. Pete Lomax (talk) 13:58, 15 March 2017 (UTC)

new topic     » topic index » view message » categorize

2. Re: rosettacode 500 milestone passed

petelomax said...

Just tooting my own horn, Phix on rosettacode has recently passed the 500 milestone.

I have also found an interesting possible optimisation for the Knapsack/Bounded problem, and invite anyone to join in the discussion (on that page or here):

At first glance I think I would make it easy on myself by reforming the data a bit:

  • Expanded the data set to unit size
  • replace pieces field with a weighted score, v/w or something like that
  • Sort data by that score column before starting

(You do 2 of these 3 things)

Next I would split the data into 2 sorted lists - a good one and a bad one. The good one would be my best choices and the bad one is all the rest (with Beer right at the very end). The algorithm would essentially be bubble sort with a twist. I would take the top 2 elements of the bad list and see if they can be swapped them with one element in the good list if that would give me a higher total whilst still keeping the good list under 4kg. Algorithm ends if the scan of the good list reaches the end without being able to swap.

Well, that's my best guess. Can't say how [in]elegant it might be - or even correct.

In a quick test I noticed that just sorting the raw (unified) data by a weighted score column almost produced the optimal result on its own.

Spock

new topic     » goto parent     » topic index » view message » categorize

3. Re: rosettacode 500 milestone passed

Drat and double drat. I found a counterexample. With a limit of 10kg:

-- item                     weight value pieces 
{"gold",                    {4,     42,     2}}, 
{"silver",                  {6,     61,     1}}, 
{"bronze",                  {1,     10,     1}} 

My scheme terminates with 2 gold and 1 bronze, value 94, but 1 gold and 1 silver has value 103...

At least the test I left in triggered correctly.

(Although that example looks very trivial, there is a subtlety to it that ~99% of other examples do not have.)

new topic     » goto parent     » topic index » view message » categorize

4. Re: rosettacode 500 milestone passed

Dick Dasterdly!

new topic     » goto parent     » topic index » view message » categorize

5. Re: rosettacode 500 milestone passed

Spock said...

.. I would make it easy on myself by reforming the data a bit:

  • Expand the data set to unit size
  • replace pieces field with a weighted score, v/w or something like that
  • Sort data by that score column before starting

Next I would split the data into 2 sorted lists - a good one and a bad one. The good one would be my best choices and the bad one is all the rest (with Beer right at the very end). The algorithm would essentially be bubble sort with a twist. I would take the top 2 elements of the bad list and see if they can be swapped them with one element in the good list if that would give me a higher total whilst still keeping the good list under 4kg. Algorithm ends if the scan of the good list reaches the end without being able to swap.

Well, that's my best guess. Can't say how [in]elegant it might be - or even correct.

This first idea didn't find an optimal solution. But thanks to a hint in Pete's later post a single swap scan was added just before the double swap scan. Also, both lists left unsorted (except for the initialisation sort). Now the algorithm seems to give the correct answer in both examples. Stats:

400 kg example

  • Time 0.016s
  • Total element compares 4525
  • Top level iterations 3 (or 6)

10kg example

  • Total element compares 7

Spock

new topic     » goto parent     » topic index » view message » categorize

6. Re: rosettacode 500 milestone passed

I look forward to giving it a spin. Seems like you've got a bit of time to polish it though, rosettacode is down and the last time it was off for over a week.

Pete

new topic     » goto parent     » topic index » view message » categorize

7. Re: rosettacode 500 milestone passed

petelomax said...

I look forward to giving it a spin. Seems like you've got a bit of time to polish it though, rosettacode is down and the last time it was off for over a week.

Pete

Ok. It is in Orac form but the conversion is trivial to Euphoria:
~ means length()
. means []

The output is at the unit level since I couldn't be bothered subtotalling..

Spock

include sorting.e 
include seq.e 
 
seq good, bad 
 
int WEIGHT = 400 
 
int compares = 0 
int iter = 0 
 
 
procedure init() 
 
	compares = 0 
	iter = 0 
 
	-- the raw data 
	seq s = -- {name, weight, value, pieces} 
 { 
	{ "map",                     9,     150,    1}, 
	{ "compass",                 13,    35,     1}, 
	{ "sandwich",                50,    60,     2}, 
	{ "glucose",                 15,    60,     2}, 
	{ "tin",                     68,    45,     3}, 
	{ "banana",                  27,    60,     3}, 
	{ "apple",                   39,    40,     3}, 
	{ "cheese",                  23,    30,     1}, 
	{ "beer",                    52,    10,     3}, 
	{ "suntan cream",            11,    70,     1}, 
	{ "water",                   153,   200,    2}, 
	{ "camera",                  32,    30,     1}, 
	{ "T-shirt",                 24,    15,     2}, 
	{ "trousers",                48,    10,     2}, 
	{ "umbrella",                73,    40,     1}, 
	{ "waterproof trousers",     42,    70,     1}, 
	{ "waterproof overclothes",  43,    75,     1}, 
	{ "note-case",               22,    80,     1}, 
	{ "sunglasses",              7,     20,     1}, 
	{ "towel",                   18,    12,     2}, 
	{ "socks",                   4,     50,     1}, 
	{ "book",                    30,    10,     2} 
 
 } 
 
 
	-- reform 
	seq t = {} 
 
	for i = 1 to ~s do 
	 
		-- unpack 
		seq q = s.i 
		seq name = q.1 
		int w = q.2 
		int v = q.3 
		int n = q.4 
 
		-- set 
		t &= repeat({name, w, v, w/v}, n) 
 
	end for 
 
	-- sort 
	t = sort(t, 4) 
 
	-- split 
	good = {} 
	bad = {} 
	w = 0 
	for i = 1 to ~ t do 
		w += t.i.2 
		if w > WEIGHT then 
			good = t[1 to i-1] 
			bad = t[i to $] 
			?{~good, ~bad} 
			return 
		end if 
	end for 
 
	-- special case all on good list 
	good = t 
 
	end procedure 
 
 
 
 
function spare() 
 
	-- scan good list and work out how much spare space there is 
 
	-- init 
	int w = 0 
 
	-- loop 
	for i = 1 to ~good do 
		w += good.i.2 
	end for 
 
	-- spare 
	int spare = WEIGHT - w 
 
	assert spare >= 0 
 
	-- exit 
	return spare 
 
	end function 
 
 
 
function can_swap(int a, int b, int g) 
 
	-- init 
	compares += 1 
 
	-- bad - 1 or 2 parameters 
	int wbad = bad.a.2 
	int vbad = bad.a.3 
	if b then 
		wbad += bad.b.2 
		vbad += bad.b.3 
	end if 
 
	-- good 
	int wgood = good.g.2 
	int vgood = good.g.3 
 
	-- compare 
	return wbad <= wgood+spare() and vbad > vgood 
 
	end function 
 
 
 
function swap_single() 
 
	-- loop over BAD 
	for i = 1 to ~bad do 
 
		-- loop over GOOD 
		for j = 1 to ~good do 
 
			-- compare 
			if can_swap(i, 0, j) then 
				seq t = bad.i 
				bad.i = good.j 
				good.j = t 
				return 1 
			end if 
 
		end for 
 
	end for 
 
	return 0 
 
	end function 
 
 
 
function swap_double() 
 
	-- loop over BAD 
	for a = 1 to ~bad-1 do 
 
		-- loop 2 over BAD 
		for b = a+1 to ~bad do 
 
			-- loop over GOOD 
			for g = 1 to ~good do 
 
				-- compare 
				if can_swap(a, b, g) then 
					seq t1 = bad.a 
					seq t2 = bad.b 
					bad = bad[1 to a-1] & bad[a+1 to b-1] & bad[b+1 to $] & {good.g} 
					good = good[1 to g-1] & good[g+1 to $] & {t1, t2} 
					return 1 
				end if 
 
			end for 
 
		end for 
 
	end for 
 
	return 0 
 
	end function 
 
 
 
procedure run() 
 
	-- init 
	init() 
 
	-- perform algorithm 
	atom t1 = time() 
 
	while swap_single() or swap_double() do 
		iter += 1 
	end while 
 
	puts(1, "top-level iterations, element compares :") ? {iter, compares} 
 
	? time() - t1 
 
	-- show results 
	int weight = 0, value = 0 
	for i = 1 to ~good do 
		weight += good.i.2 
		value += good.i.3 
		seq name = good.i.1 
		printf(1, "%d %d %s\n", {weight, value,name} ) 
	end for	 
 
	? 9/0 
 
	end procedure 
 
 
run() 
new topic     » goto parent     » topic index » view message » categorize

8. Re: rosettacode 500 milestone passed

That seems pretty good, not that I've analysed it in any detail.
I have to ask, are you quite certain there are no data sets that might ever need a treble or quadruple (etc) swap?

Just for a laugh, and because I can do so fairly trivially, I had a bash at supporting those orac idioms in Phix.
(All under control of a new "global ORAC = 1" flag so that I can disable/locate them easily.)
Admittedly dot subscripting affected more places and raised more questions than first anticipated, but seems done.
Documenting them (and shipping a new release) is a whole different beanbag, of course.

That raised me a few questions:
I assume that q.2.3 would be treated as q[2.3], not q[2][3] (please say yes!).
The dot notation also precludes ever treating ".4" as a number (must be written as 0.4 - no biggie).
You allow "q.i.j" as an alternative to "q[i][j]":
Would "q.i[j]" be equivalent to "q[i[j]]"?
Would "q[i].j" be equivalent to "q[i][j]"?
You allow "q[i to j]" but not "q.i to j" as an alternative to "q[i..j]"? (please say yes!)
You do not allow "q.i+1" as an alternative to "q[i+1]"[?], but I assume "q.(i+1)" would work.
sort(t, 4) seems like something worthwhile, but I skipped that for now, ditto assert.
Lastly, your scope rules are off, you allow eg:

for i=1 to ~s do  
   int w 
end for 
w = 0  

which in Phix terminates with "w has not been declared" (obvs. fixed by declaring w before the loop)

Pete

new topic     » goto parent     » topic index » view message » categorize

9. Re: rosettacode 500 milestone passed

petelomax said...

That seems pretty good, not that I've analysed it in any detail.
I have to ask, are you quite certain there are no data sets that might ever need a treble or quadruple (etc) swap?

I am quietly confident that swapping just 1 or 2 elements is enough to eventually reach the optimal choice. But having a pathological data set (rather than some Rosetta Code toy..) to experiment with would be prudent.

said...

Just for a laugh, and because I can do so fairly trivially, I had a bash at supporting those orac idioms in Phix.
(All under control of a new "global ORAC = 1" flag so that I can disable/locate them easily.)
Admittedly dot subscripting affected more places and raised more questions than first anticipated, but seems done.
Documenting them is a whole different beanbag, of course.

That raised me a few questions:
I assume that q.2.3 would be treated as q[2.3], not q[2][3] (please say yes!).

Sorry to disappoint. q.2.3 means q[2][3] which is as sophisticated as it gets. Most of the other examples you suggest would just fail under Orac.

said...

The dot notation also precludes ever treating ".4" as a number (must be written as 0.4 - no biggie).

The dot notation only kicks in when lead by a label. The syntax is simple enough so that there is never any ambiguity between fp numbers and sequence subscripts.

said...

sort(t, 4) seems like something worthwhile, but I skipped that for now, ditto assert.

Sorting by column is very useful when handling 2d data sets. Particularly for business apps where just about everything is structured by rows/columns. In my work I pair the (heavy) data set with an index. When sorting only the index is actually altered. I use a virtual listview for viewing the data so even massive sets are trivial to explore from different angles without ever changing the actual data source.

said...

Lastly, your scope rules are off, you allow eg:

for i=1 to ~s do  
   int w 
end for 
w = 0  

which in Phix terminates with "w has not been declared" (obvs. fixed by declaring w before the loop)

Pete

I once analysed the distributions of variables declared inside routines. So few private variables are used on average that it seemed to me scopes are pointless inside routines - particularly when realising that routines should be kept short and single purpose. I made the conscious decision to not include them. However I think it is useful to be able to declare variables at the point in code when they become needed, rather than interrupting the typing flow by inserting them at an earlier point in the text. YMMV but I just don't see the need to have any structural scopes inside routines. Perhaps someone could point out just where i got it wrong.

Spock


Forked into: rant // OE scope
forked out of 500 Rosetta

new topic     » goto parent     » topic index » view message » categorize

10. Re: rosettacode 500 milestone passed

Spock said...
petelomax said...

That seems pretty good, not that I've analysed it in any detail.
I have to ask, are you quite certain there are no data sets that might ever need a treble or quadruple (etc) swap?

I am quietly confident that swapping just 1 or 2 elements is enough to eventually reach the optimal choice. But having a pathological data set (rather than some Rosetta Code toy..) to experiment with would be prudent.

I am now not so sure the algorithm is finding the global maximum because dropping the initial sort produces a sub-optimal (though still good) result.

A more rigorous investigation would be to compare the algorithm against a brute force solver to find any pathological cases..

Spock

new topic     » goto parent     » topic index » view message » categorize

11. Re: rosettacode 500 milestone passed

Spock said...

Sorry to disappoint. q.2.3 means q[2][3]

Not really a problem, it is actually quite reasonable that getToken() should be told when floats can and cannot be valid.

Spock said...

scopes are pointless inside routines - I made the conscious decision to not include them.

Fair enough

Spock said...

I just don't see the need to have any structural scopes inside routines. Perhaps someone could point out just where i got it wrong.

The only two things that spring to mind are (and you did say "kept short and simple") that in an N-00 line routine

if this then 
   -- <you are adding code here> 
else 
   if that then 
      -- loads more code 
   elsif the_other then 
      integer w 
   else 
      integer W 
      -- <then you add more here> 

First, without local mini-scopes, while working up top you forget there is a w 200 lines further down and add another one -> compilation error.
Not exactly what I'd call critical either, and that sort of thing can still happen with scopes.

Second, while working down below, you use w when you meant W, or perhaps that you can't rename/remove w without checking all the rest of the code.
No matter how you look at it though, mini-scopes cannot be justified in routines that fit on a single screen.

Pete

new topic     » goto parent     » topic index » view message » categorize

12. Re: rosettacode 500 milestone passed

I translated the C dynamic programming version:

sequence items = { 
                  {"map",                    9,   150,   1}, 
                  {"compass",               13,    35,   1}, 
                  {"water",                153,   200,   2}, 
                  {"sandwich",              50,    60,   2}, 
                  {"glucose",               15,    60,   2}, 
                  {"tin",                   68,    45,   3}, 
                  {"banana",                27,    60,   3}, 
                  {"apple",                 39,    40,   3}, 
                  {"cheese",                23,    30,   1}, 
                  {"beer",                  52,    10,   3}, 
                  {"suntan cream",          11,    70,   1}, 
                  {"camera",                32,    30,   1}, 
                  {"T-shirt",               24,    15,   2}, 
                  {"trousers",              48,    10,   2}, 
                  {"umbrella",              73,    40,   1}, 
                  {"waterproof trousers",   42,    70,   1}, 
                  {"waterproof overclothes",43,    75,   1}, 
                  {"note-case",             22,    80,   1}, 
                  {"sunglasses",             7,    20,   1}, 
                  {"towel",                 18,    12,   2}, 
                  {"socks",                  4,    50,   1}, 
                  {"book",                  30,    10,   2}, 
                 }; 
 
sequence {names,weights,points,counts} = columnize(items) 
 
constant n = length(items) 
 
function knapsack(int w) 
int v 
-- m is the achievable points matrix: 
-- Note that Phix uses 1-based indexes, so m[1][1]  
--  actually holds points for 0 items of weight 0, 
--  and m[n+1][w+1] is for n items at weight w. 
seq m = repeat(repeat(0,w+1),n+1) 
    for i=1 to n do 
        for j=1 to w+1 do       -- (0 to w really) 
            m[i+1][j] = m[i][j] 
            for k=1 to counts[i] do 
                if k*weights[i]>j-1 then 
                    exit 
                end if 
                v = m[i][j-k*weights[i]]+k*points[i] 
                if v>m[i+1][j] then 
                    m[i+1][j] = v 
                end if 
            end for 
        end for 
    end for 
    seq s = repeat(0,n) 
    int j = w+1                 -- (w -> 0 really) 
    for i=n+1 to 2 by -1 do     -- (n to 1 really) 
        v = m[i][j] 
        int k = 0 
        while v!=m[i-1][j]+k*points[i-1] do 
            s[i-1] += 1 
            j -= weights[i-1] 
            k += 1 
        end while 
    end for 
    return s 
end function 
 
int tc = 0, tw = 0, tv = 0 
seq s = knapsack(400) 
for i=1 to n do 
    int si = s[i] 
    if si then 
        printf(1,"%-22s %5d %5d %5d\n", {names[i], si, si*weights[i], si*points[i]}) 
        tc += si 
        tw += si*weights[i] 
        tv += si*points[i] 
    end if 
end for 
printf(1,"%-22s %5d %5d %5d\n", {"count, weight, points:", tc, tw, tv}) 

Simple and elegant, yet quite difficult to get your head around - there are some youtube vids (search for knapsack problem) that might help.

Pete

new topic     » goto parent     » topic index » view message » categorize

13. Re: rosettacode 500 milestone passed

petelomax said...

I translated the C dynamic programming version:

... 

Simple and elegant, yet quite difficult to get your head around - there are some youtube vids (search for knapsack problem) that might help.

Pete

Nice. There are a constant 14k total internal iterations in the main loops, regardless of input order. Unrolling the data to unit size only slightly increased the loop count. I tried various pre-sorting modes but only got modest improvements on the inner if blocks. So, all up very well crafted.

I note the CPP version is described as "Initially taken from C but than fixed and refactored". Does that mean some subtle bug still lurks in the C version? The CPP version is also 3 times longer than the one in C. I didn't realise that "refactor" actually means to make longer.

Spock

new topic     » goto parent     » topic index » view message » categorize

14. Re: rosettacode 500 milestone passed

Spock said...

I note the CPP version is described as "Initially taken from C but than fixed and refactored". Does that mean some subtle bug still lurks in the C version? The CPP version is also 3 times longer than the one in C. I didn't realise that "refactor" actually means to make longer.

I checked the history and found this: http://rosettacode.org/mw/index.php?title=Knapsack_problem/Bounded&diff=220921&oldid=219546

It seems the C++ version is based on a very different C version to the current one and hence that comment no longer applies.
I have added an appropriate comment to that page.

Still playing with the full-on recursive version, I've got an idea about caching ranges.
Right, I'm done: http://rosettacode.org/wiki/Knapsack_problem/Bounded#Range_cache_version
It should use less memory than the dp version, plus it can handle fractional weights.

Pete

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu