1. Subsets

Hi all,

I've written functions for generating (indexes of) all subsets of a given set.
They don't use recursion and are easy to use, and they are not too slow.
However, since a set with n elements has power(2,n) subsets, I'd like to ask
whether maybe someone can give me faster routines. Here are mine:
sequence S
integer N, K

global function first_subset (integer n)
   -- in: n: number of elements in the set
   S = repeat(0, n+1)
   N = n
   K = 1
   return {}
end function

global function next_subset()
   if S[K] < N then
      K += 1
      S[K] = S[K-1] + 1
      return S[2..K]
   elsif K > 2 then
      K -= 1
      S[K] += 1
      return S[2..K]
   else
      return -1
   end if
end function


-- Demo
object subset
integer n

n = 4

-- generate all subsets in order
subset = first_subset(n)
while sequence(subset) do
   ? subset
   subset = next_subset()
end while

Regards,
   Juergen

new topic     » topic index » view message » categorize

2. Re: Subsets

Juergen Luethje wrote:
> 
> Hi all,
> 
> I've written functions for generating (indexes of) all subsets of a given set.
> They don't use recursion and are easy to use, and they are not too slow.
> However, since a set with n elements has power(2,n) subsets, I'd like to ask
> whether maybe someone can give me faster routines. Here are mine:
> }}}
<eucode>
> sequence S
> integer N, K
> 
> global function first_subset (integer n)
>    -- in: n: number of elements in the set
>    S = repeat(0, n+1)
>    N = n
>    K = 1
>    return {}
> end function
> 
> global function next_subset()
>    if S[K] < N then
>       K += 1
>       S[K] = S[K-1] + 1
>       return S[2..K]
>    elsif K > 2 then
>       K -= 1
>       S[K] += 1
>       return S[2..K]
>    else
>       return -1
>    end if
> end function
> 
> 
> -- Demo
> object subset
> integer n
> 
> n = 4
> 
> -- generate all subsets in order
> subset = first_subset(n)
> while sequence(subset) do
>    ? subset
>    subset = next_subset()
> end while
> </eucode>
{{{

> Regards,
>    Juergen

Did you try the routines in http://oedoc.free.fr/Fichiers/ESL/sets.zip ?
I have no idea whether they are faster, but some of them to that precise work.

CChris

new topic     » goto parent     » topic index » view message » categorize

3. Re: Subsets

Juergen Luethje wrote:
> 
> Hi all,
> 
> I've written functions for generating (indexes of) all subsets of a given set.
> They don't use recursion and are easy to use, and they are not too slow.
> However, since a set with n elements has power(2,n) subsets, I'd like to ask
> whether maybe someone can give me faster routines.

My first question is Are all elements unique? If not I'd think of setting up a
"mask" array of elements to include/avoid in the subsets, and that way decimate
(or better) the number of subsets that there are, if that is valid.

My second question is Can you return 1 to signal "iterate through S[2] to S[K]"
as that return S[2..K] statement, or anywhere you create such a slice, is
probably the biggest cost in your code.

I would also try a binary approach, ie return {0,0,1}, then {0,1,0}, then
{0,1,1}, then {1,0,0} etc. Actually that would be my first choice.

Regards,
Pete

new topic     » goto parent     » topic index » view message » categorize

4. Re: Subsets

CChris wrote:

> Juergen Luethje wrote:
>> 
>> Hi all,
>> 
>> I've written functions for generating (indexes of) all subsets of a given
>> set.
>> They don't use recursion and are easy to use, and they are not too slow.
>> However, since a set with n elements has power(2,n) subsets, I'd like to ask
>> whether maybe someone can give me faster routines. Here are mine:

[code snipped]

> Did you try the routines in <a
> href="http://oedoc.free.fr/Fichiers/ESL/sets.zip">http://oedoc.free.fr/Fichiers/ESL/sets.zip</a>
> ?
> I have no idea whether they are faster, but some of them to that precise work.

Thanks for the suggestion.
I supplemented my code, so that it does the same things than yours,
i.e. collect all subsets in one sequence. My code is clearly faster.

And I encountered a problem with your code. When I used
set = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}

I got the message:
   "Your program has run out of memory."
although my machine has 2 GigaBytes RAM and even more virtual memory. smile

This is the reason why whenever possible I try not to write functions that
return a collection of all results in a big sequence, but prefer to use the
find_first()/find_next() approach.

Regards,
   Juergen

new topic     » goto parent     » topic index » view message » categorize

5. Re: Subsets

Juergen Luethje wrote:
> 
> CChris wrote:
> 
> > Juergen Luethje wrote:
> >> 
> >> Hi all,
> >> 
> >> I've written functions for generating (indexes of) all subsets of a given
> >> set.
> >> They don't use recursion and are easy to use, and they are not too slow.
> >> However, since a set with n elements has power(2,n) subsets, I'd like to
> >> ask
> >> whether maybe someone can give me faster routines. Here are mine:
> 
> [code snipped]
> 
> > Did you try the routines in <a
> > href="http://oedoc.free.fr/Fichiers/ESL/sets.zip">http://oedoc.free.fr/Fichiers/ESL/sets.zip</a>
> ?</font></i>
> > I have no idea whether they are faster, but some of them to that precise
> > work.
> 
> Thanks for the suggestion.
> I supplemented my code, so that it does the same things than yours,
> i.e. collect all subsets in one sequence. My code is clearly faster.
> 
> And I encountered a problem with your code. When I used
> }}}
<eucode>
> set = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}
> </eucode>
{{{

> I got the message:
>    "Your program has run out of memory."
> although my machine has 2 GigaBytes RAM and even more virtual memory. smile
> 
> This is the reason why whenever possible I try not to write functions that
> return a collection of all results in a big sequence, but prefer to use the
> find_first()/find_next() approach.
> 
> Regards,
>    Juergen

Ok, thanks for the feedback. I had no idea of what the speed cratio could be,
and had not specifically tried to optimise that at the time.

I believe that, most of the time, you need only a list of some subsets that meet
some criteria, and then collecting the result in a sequence is desirable.
For large sets, some sort of enumerator can be devised using multitasking. That
functionality was only experimental at the time I wrote these routines.

Since concepts like ESL don't seem to attract many people, I do some
advertisement from time to time for http://oedoc.free.fr/Fichiers/ESL/, and
may release some parts on my own terms some day, since the original project
appears to be dead.
Note that locate.e would need a thorough rewrite now that find_from() and
match_from() are available.

CChris

new topic     » goto parent     » topic index » view message » categorize

6. Re: Subsets

Pete Lomax wrote:

> Juergen Luethje wrote:
> 
>> Hi all,
>> 
>> I've written functions for generating (indexes of) all subsets of a given
>> set.
>> They don't use recursion and are easy to use, and they are not too slow.
>> However, since a set with n elements has power(2,n) subsets, I'd like to ask
>> whether maybe someone can give me faster routines.
> 
> My first question is Are all elements unique?

Yes, they are. I mean "sets" in a mathematical sense.

> If not I'd think of setting up
> a "mask" array of elements to include/avoid in the subsets, and that way
> decimate
> (or better) the number of subsets that there are, if that is valid.
> 
> My second question is Can you return 1 to signal "iterate through S[2] to
> S[K]"
> as that return S[2..K] statement, or anywhere you create such a slice, is
> probably
> the biggest cost in your code.

Good idea, thanks! I couldn't see the wood for the trees. smile
Even better, I can return K, so only S must be global (since I want to put these
functions in a library). Normally, I try to avoid global variables, but in this
case
I think it's much better to use a global sequence rather than returning a slice.
This is not only faster, but also consumes less memory. Here is my speed test:
--=============================[ sets.e ]=============================--
integer N, K

------------------------------------------------------------------------

sequence S

global function first_subset (integer n)
   -- in: n: number of elements in the set
   S = repeat(0, n+1)
   N = n
   K = 1
   return {}
end function

global function next_subset()
   if S[K] < N then
      K += 1
      S[K] = S[K-1] + 1
      return S[2..K]
   elsif K > 2 then
      K -= 1
      S[K] += 1
      return S[2..K]
   else
      return -1
   end if
end function

------------------------------------------------------------------------

global sequence Subset_Buffer
-- The program that uses these functions should read Subset_Buffer[2..k].
 
global function first_subset_boundary (integer n)
   -- in: n: number of elements in the set
   Subset_Buffer = repeat(0, n+1)
   N = n
   K = 1
   return K
end function

global function next_subset_boundary()
   if Subset_Buffer[K] < N then
      K += 1
      Subset_Buffer[K] = Subset_Buffer[K-1] + 1
      return K
   elsif K > 2 then
      K -= 1
      Subset_Buffer[K] += 1
      return K
   else
      return -1
   end if
end function


--============================[ main.exw ]============================--
include sets.e

object current_subset
sequence set
atom t
integer n, k

set = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}

n = length(set)

------------------------------------------------------------------------

t = time()
current_subset = first_subset(n)
while sequence(current_subset) do
-- ? current_subset
   current_subset = next_subset()
end while
t = time()-t
? t                       -- 7.85 sec.

puts(1, "------------------------------------\n")

------------------------------------------------------------------------

t = time()
k = first_subset_boundary(n)
while k > -1 do
-- ? Subset_Buffer[2..k]
   k = next_subset_boundary()
end while
t = time()-t
? t                       -- 2.97 sec.
------------------------------------------------------------------------

if getc(0) then end if


> I would also try a binary approach, ie return {0,0,1}, then {0,1,0}, then
> {0,1,1},
> then {1,0,0} etc. Actually that would be my first choice.

I had already made a binary approach, that is 10 times slower than my first
version.
Maybe I can improve the binary approach, but I guess it's hard to beat the new
fast
version here. I'll write about my binary approach(es) in a separate post.

Regards,
   Juergen

new topic     » goto parent     » topic index » view message » categorize

7. Re: Subsets

Pete,

here is the fastest binary solution which I have found so far.
It is about 20 times slower than the other function (after it has been
improved according to your suggestion).
include machine.e

sequence Subset_Buffer

function unrank_subset (integer n, integer rank)
   -- n: size of the set
   sequence bits
   integer k

   bits = int_to_bits(rank, n)
   k = 0
   for i = 1 to n do
      if bits[i] = 1 then
         k += 1
         Subset_Buffer[k] = i
      end if
   end for
   return k
end function
 

atom t
integer n, size

n = 24

-- generate all subsets
t = time()
Subset_Buffer = repeat(0, n)
for i = 0 to power(2,n)-1 do
   size = unrank_subset(n, i)
--    ? Subset_Buffer[1..size]
end for
t = time()-t
printf(1, "\n%.2f Sek.\n", {t})

if getc(0) then end if

Do you (or someone else) have any more suggestions?

Regards,
   Juergen

new topic     » goto parent     » topic index » view message » categorize

8. Re: Subsets

Juergen Luethje wrote:
> 
> Pete,
> 
> here is the fastest binary solution which I have found so far.
> It is about 20 times slower than the other function (after it has been
> improved according to your suggestion).
This is what I meant, perhaps I sould have called it a boolean array:
include machine.e

sequence Subset_Buffer

function unrank_subset (integer n, integer rank)
   -- n: size of the set
   sequence bits
   integer k

   bits = int_to_bits(rank, n)
   k = 0
   for i = 1 to n do
      if bits[i] = 1 then
	 k += 1
	 Subset_Buffer[k] = i
      end if
   end for
   return k
end function

atom t, t1, t2
integer n, size

function getNextSet()
    for i=n to 1 by -1 do
	if Subset_Buffer[i]=0 then
	    Subset_Buffer[i]=1
	    return 1
	end if
	Subset_Buffer[i]=0
    end for
    return 0
end function
 

n = 24

-- generate all subsets
t = time()
Subset_Buffer = repeat(0, n)
for i = 0 to power(2,n)-1 do
   size = unrank_subset(n, i)
--    ? Subset_Buffer[1..size]
end for
t1 = time()-t

t=time()
Subset_Buffer = repeat(0, n)
while 1 do
    if not getNextSet() then exit end if
end while
t2 = time()-t
printf(1, "\nt1=%.2f, t2=%.2f\n", {t1,t2})

if getc(0) then end if


Seems about the same 20-fold faster, but uses flags not indexes.

Regards,
Pete

new topic     » goto parent     » topic index » view message » categorize

9. Re: Subsets

Pete Lomax wrote:

> Juergen Luethje wrote:
>> 
>> Pete,
>> 
>> here is the fastest binary solution which I have found so far.
>> It is about 20 times slower than the other function (after it has been
>> improved according to your suggestion).
> This is what I meant, perhaps I sould have called it a boolean array:

[code snipped]

> Seems about the same 20-fold faster, but uses flags not indexes.

Cool. Thanks!

Regards,
   Juergen

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu