1. Subsets
- Posted by Juergen Luethje <j.lue at gmx.de> Jun 16, 2007
- 632 views
- Last edited Jun 17, 2007
Hi all, I've written functions for generating (indexes of) all subsets of a given set. They don't use recursion and are easy to use, and they are not too slow. However, since a set with n elements has power(2,n) subsets, I'd like to ask whether maybe someone can give me faster routines. Here are mine:
sequence S integer N, K global function first_subset (integer n) -- in: n: number of elements in the set S = repeat(0, n+1) N = n K = 1 return {} end function global function next_subset() if S[K] < N then K += 1 S[K] = S[K-1] + 1 return S[2..K] elsif K > 2 then K -= 1 S[K] += 1 return S[2..K] else return -1 end if end function -- Demo object subset integer n n = 4 -- generate all subsets in order subset = first_subset(n) while sequence(subset) do ? subset subset = next_subset() end while
Regards, Juergen
2. Re: Subsets
- Posted by CChris <christian.cuvier at agriculture.gouv.fr> Jun 16, 2007
- 601 views
- Last edited Jun 17, 2007
Juergen Luethje wrote: > > Hi all, > > I've written functions for generating (indexes of) all subsets of a given set. > They don't use recursion and are easy to use, and they are not too slow. > However, since a set with n elements has power(2,n) subsets, I'd like to ask > whether maybe someone can give me faster routines. Here are mine: > }}} <eucode> > sequence S > integer N, K > > global function first_subset (integer n) > -- in: n: number of elements in the set > S = repeat(0, n+1) > N = n > K = 1 > return {} > end function > > global function next_subset() > if S[K] < N then > K += 1 > S[K] = S[K-1] + 1 > return S[2..K] > elsif K > 2 then > K -= 1 > S[K] += 1 > return S[2..K] > else > return -1 > end if > end function > > > -- Demo > object subset > integer n > > n = 4 > > -- generate all subsets in order > subset = first_subset(n) > while sequence(subset) do > ? subset > subset = next_subset() > end while > </eucode> {{{ > Regards, > Juergen Did you try the routines in http://oedoc.free.fr/Fichiers/ESL/sets.zip ? I have no idea whether they are faster, but some of them to that precise work. CChris
3. Re: Subsets
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Jun 17, 2007
- 630 views
Juergen Luethje wrote: > > Hi all, > > I've written functions for generating (indexes of) all subsets of a given set. > They don't use recursion and are easy to use, and they are not too slow. > However, since a set with n elements has power(2,n) subsets, I'd like to ask > whether maybe someone can give me faster routines. My first question is Are all elements unique? If not I'd think of setting up a "mask" array of elements to include/avoid in the subsets, and that way decimate (or better) the number of subsets that there are, if that is valid. My second question is Can you return 1 to signal "iterate through S[2] to S[K]" as that return S[2..K] statement, or anywhere you create such a slice, is probably the biggest cost in your code. I would also try a binary approach, ie return {0,0,1}, then {0,1,0}, then {0,1,1}, then {1,0,0} etc. Actually that would be my first choice. Regards, Pete
4. Re: Subsets
- Posted by Juergen Luethje <j.lue at gmx.de> Jun 17, 2007
- 603 views
CChris wrote: > Juergen Luethje wrote: >> >> Hi all, >> >> I've written functions for generating (indexes of) all subsets of a given >> set. >> They don't use recursion and are easy to use, and they are not too slow. >> However, since a set with n elements has power(2,n) subsets, I'd like to ask >> whether maybe someone can give me faster routines. Here are mine: [code snipped] > Did you try the routines in <a > href="http://oedoc.free.fr/Fichiers/ESL/sets.zip">http://oedoc.free.fr/Fichiers/ESL/sets.zip</a> > ? > I have no idea whether they are faster, but some of them to that precise work. Thanks for the suggestion. I supplemented my code, so that it does the same things than yours, i.e. collect all subsets in one sequence. My code is clearly faster. And I encountered a problem with your code. When I used
set = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}
I got the message: "Your program has run out of memory." although my machine has 2 GigaBytes RAM and even more virtual memory. This is the reason why whenever possible I try not to write functions that return a collection of all results in a big sequence, but prefer to use the find_first()/find_next() approach. Regards, Juergen
5. Re: Subsets
- Posted by CChris <christian.cuvier at agriculture.gouv.fr> Jun 17, 2007
- 594 views
Juergen Luethje wrote: > > CChris wrote: > > > Juergen Luethje wrote: > >> > >> Hi all, > >> > >> I've written functions for generating (indexes of) all subsets of a given > >> set. > >> They don't use recursion and are easy to use, and they are not too slow. > >> However, since a set with n elements has power(2,n) subsets, I'd like to > >> ask > >> whether maybe someone can give me faster routines. Here are mine: > > [code snipped] > > > Did you try the routines in <a > > href="http://oedoc.free.fr/Fichiers/ESL/sets.zip">http://oedoc.free.fr/Fichiers/ESL/sets.zip</a> > ?</font></i> > > I have no idea whether they are faster, but some of them to that precise > > work. > > Thanks for the suggestion. > I supplemented my code, so that it does the same things than yours, > i.e. collect all subsets in one sequence. My code is clearly faster. > > And I encountered a problem with your code. When I used > }}} <eucode> > set = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25} > </eucode> {{{ > I got the message: > "Your program has run out of memory." > although my machine has 2 GigaBytes RAM and even more virtual memory. > > This is the reason why whenever possible I try not to write functions that > return a collection of all results in a big sequence, but prefer to use the > find_first()/find_next() approach. > > Regards, > Juergen Ok, thanks for the feedback. I had no idea of what the speed cratio could be, and had not specifically tried to optimise that at the time. I believe that, most of the time, you need only a list of some subsets that meet some criteria, and then collecting the result in a sequence is desirable. For large sets, some sort of enumerator can be devised using multitasking. That functionality was only experimental at the time I wrote these routines. Since concepts like ESL don't seem to attract many people, I do some advertisement from time to time for http://oedoc.free.fr/Fichiers/ESL/, and may release some parts on my own terms some day, since the original project appears to be dead. Note that locate.e would need a thorough rewrite now that find_from() and match_from() are available. CChris
6. Re: Subsets
- Posted by Juergen Luethje <j.lue at gmx.de> Jun 17, 2007
- 639 views
Pete Lomax wrote: > Juergen Luethje wrote: > >> Hi all, >> >> I've written functions for generating (indexes of) all subsets of a given >> set. >> They don't use recursion and are easy to use, and they are not too slow. >> However, since a set with n elements has power(2,n) subsets, I'd like to ask >> whether maybe someone can give me faster routines. > > My first question is Are all elements unique? Yes, they are. I mean "sets" in a mathematical sense. > If not I'd think of setting up > a "mask" array of elements to include/avoid in the subsets, and that way > decimate > (or better) the number of subsets that there are, if that is valid. > > My second question is Can you return 1 to signal "iterate through S[2] to > S[K]" > as that return S[2..K] statement, or anywhere you create such a slice, is > probably > the biggest cost in your code. Good idea, thanks! I couldn't see the wood for the trees. Even better, I can return K, so only S must be global (since I want to put these functions in a library). Normally, I try to avoid global variables, but in this case I think it's much better to use a global sequence rather than returning a slice. This is not only faster, but also consumes less memory. Here is my speed test:
--=============================[ sets.e ]=============================-- integer N, K ------------------------------------------------------------------------ sequence S global function first_subset (integer n) -- in: n: number of elements in the set S = repeat(0, n+1) N = n K = 1 return {} end function global function next_subset() if S[K] < N then K += 1 S[K] = S[K-1] + 1 return S[2..K] elsif K > 2 then K -= 1 S[K] += 1 return S[2..K] else return -1 end if end function ------------------------------------------------------------------------ global sequence Subset_Buffer -- The program that uses these functions should read Subset_Buffer[2..k]. global function first_subset_boundary (integer n) -- in: n: number of elements in the set Subset_Buffer = repeat(0, n+1) N = n K = 1 return K end function global function next_subset_boundary() if Subset_Buffer[K] < N then K += 1 Subset_Buffer[K] = Subset_Buffer[K-1] + 1 return K elsif K > 2 then K -= 1 Subset_Buffer[K] += 1 return K else return -1 end if end function --============================[ main.exw ]============================-- include sets.e object current_subset sequence set atom t integer n, k set = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25} n = length(set) ------------------------------------------------------------------------ t = time() current_subset = first_subset(n) while sequence(current_subset) do -- ? current_subset current_subset = next_subset() end while t = time()-t ? t -- 7.85 sec. puts(1, "------------------------------------\n") ------------------------------------------------------------------------ t = time() k = first_subset_boundary(n) while k > -1 do -- ? Subset_Buffer[2..k] k = next_subset_boundary() end while t = time()-t ? t -- 2.97 sec. ------------------------------------------------------------------------ if getc(0) then end if
> I would also try a binary approach, ie return {0,0,1}, then {0,1,0}, then > {0,1,1}, > then {1,0,0} etc. Actually that would be my first choice. I had already made a binary approach, that is 10 times slower than my first version. Maybe I can improve the binary approach, but I guess it's hard to beat the new fast version here. I'll write about my binary approach(es) in a separate post. Regards, Juergen
7. Re: Subsets
- Posted by Juergen Luethje <j.lue at gmx.de> Jun 17, 2007
- 583 views
Pete, here is the fastest binary solution which I have found so far. It is about 20 times slower than the other function (after it has been improved according to your suggestion).
include machine.e sequence Subset_Buffer function unrank_subset (integer n, integer rank) -- n: size of the set sequence bits integer k bits = int_to_bits(rank, n) k = 0 for i = 1 to n do if bits[i] = 1 then k += 1 Subset_Buffer[k] = i end if end for return k end function atom t integer n, size n = 24 -- generate all subsets t = time() Subset_Buffer = repeat(0, n) for i = 0 to power(2,n)-1 do size = unrank_subset(n, i) -- ? Subset_Buffer[1..size] end for t = time()-t printf(1, "\n%.2f Sek.\n", {t}) if getc(0) then end if
Do you (or someone else) have any more suggestions? Regards, Juergen
8. Re: Subsets
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Jun 18, 2007
- 621 views
Juergen Luethje wrote: > > Pete, > > here is the fastest binary solution which I have found so far. > It is about 20 times slower than the other function (after it has been > improved according to your suggestion). This is what I meant, perhaps I sould have called it a boolean array:
include machine.e sequence Subset_Buffer function unrank_subset (integer n, integer rank) -- n: size of the set sequence bits integer k bits = int_to_bits(rank, n) k = 0 for i = 1 to n do if bits[i] = 1 then k += 1 Subset_Buffer[k] = i end if end for return k end function atom t, t1, t2 integer n, size function getNextSet() for i=n to 1 by -1 do if Subset_Buffer[i]=0 then Subset_Buffer[i]=1 return 1 end if Subset_Buffer[i]=0 end for return 0 end function n = 24 -- generate all subsets t = time() Subset_Buffer = repeat(0, n) for i = 0 to power(2,n)-1 do size = unrank_subset(n, i) -- ? Subset_Buffer[1..size] end for t1 = time()-t t=time() Subset_Buffer = repeat(0, n) while 1 do if not getNextSet() then exit end if end while t2 = time()-t printf(1, "\nt1=%.2f, t2=%.2f\n", {t1,t2}) if getc(0) then end if
Seems about the same 20-fold faster, but uses flags not indexes. Regards, Pete
9. Re: Subsets
- Posted by Juergen Luethje <j.lue at gmx.de> Jun 18, 2007
- 567 views
- Last edited Jun 19, 2007
Pete Lomax wrote: > Juergen Luethje wrote: >> >> Pete, >> >> here is the fastest binary solution which I have found so far. >> It is about 20 times slower than the other function (after it has been >> improved according to your suggestion). > This is what I meant, perhaps I sould have called it a boolean array: [code snipped] > Seems about the same 20-fold faster, but uses flags not indexes. Cool. Thanks! Regards, Juergen