1. Some Interesting OpenSource Projects

The reason why I created the Phix dockers images is that I'm involved in two OpenSource projects, and I wanted a consistent, build-reproducible docker image for Phix:

The "Sample Programs" repository is a set of 37 different projects implemented in lots of different languages. These problems vary in difficulty:

  • "Hello, World!" and other simple text display
  • File I/O
  • Quine
  • Mathematical operations
  • Searching
  • Sorting
  • Roman Numerals
  • Graph Theory
  • ...

For a comprehensive list, see the Sample Programs Website. I have implemented all of the Euphoria projects, but all but 2 of the Phix projects are up for grabs. Feel free to do a PR. I'm on the core team, so I can review your work. Also, if you see something in my Euphoria solutions that can be improved upon (other than fixing typos), feel free do to a PR for this, but please explain why the change improves the code. Please make sure to review the Contributing Guide before making a PR.

The "Primes" repository is about trying to do the fastest implementation of the Sieve of Eratosthenes in a given language. I've already contributed a solution in Euphoria and Phix. Feel free to do a PR for your own solution, but please review the Contributing Guide first. I'm not an admin for this project, but the admins are friendly and helpful.

new topic     » topic index » view message » categorize

2. Re: Some Interesting OpenSource Projects

I've reworked your primes/Phix solution, made it a wee bit faster, but only by about 30-fold...
I reached the dizzying heights of 4436 here, up from 121.

enum SIEVE_SIZE, NUM_BITS, SIEVE 
constant STDOUT = 1 
 
without type_check 
constant BIT8_TABLE = x"",  -- (or use power(2,xx) as commented out below) 
     NOT_BIT8_TABLE = x""   (erm, see the PS for the actual values needed) 
 
function set_bits(integer num_bits) 
    -- create a binary string of precisely num-bits 1s 
    integer num_bytes = floor((num_bits + 7) / 8), 
        leftover_bits = 1 + and_bits(num_bits - 1, 0x7), 
        leftover_byte = BIT8_TABLE[leftover_bits] * 2 - 1 
--      leftover_byte = power(2,leftover_bits)-1 
    return repeatch('\xFF',num_bytes-1) & leftover_byte 
end function 
 
function clear_bits(sequence this, integer num_bits, integer start, integer step) 
    integer byte_num = 1 + floor((start - 1) / 8), 
             bit_pos = 1 + and_bits(start - 1, 0x7), 
            byte_inc = floor(step / 8), 
             bit_inc = and_bits(step, 0x7) 
    for k = start to num_bits by step do 
        this[byte_num] = and_bits(this[byte_num], NOT_BIT8_TABLE[bit_pos]) 
--      this[byte_num] = and_bits(this[byte_num], not_bits(power(2,bit_pos-1))) 
        byte_num += byte_inc 
         bit_pos +=  bit_inc 
        if bit_pos > 8 then 
            bit_pos -= 8 
            byte_num += 1 
        end if 
    end for 
    return this 
end function 
 
function get_bit(sequence sieve, integer bit) 
    integer byte_pos = 1 + floor((bit - 1) / 8), 
             bit_pos = 1 + and_bits(bit - 1, 0x7) 
    return and_bits(sieve[byte_pos], BIT8_TABLE[bit_pos]) 
--  return and_bits(sieve[byte_pos], power(2,bit_pos-1)) 
end function 
 
function run_sieve(integer sieve_size) 
    integer num_bits = floor((sieve_size - 1) / 2) 
    sequence sieve = set_bits(num_bits) 
    integer q = floor(sqrt(sieve_size) / 2) 
    -- 
    -- we start off with a sieve of all 1s, and (should) end up with 
    --    < byte2>< byte1> 
    --  0b0110010110110111  -- (little-endian, 33 first, 3 last) 
    --     19  3 97 31 753  -- ie 3,5,7,11,13,17,19,23,29,31 prime 
    --    3  75 1  5  9     --      and 9,15,21,25,27,33 not prime 
    --   30>< 20>< 10> 
    -- (as shown the first byte of sieve contains the flags for 3..17) 
    -- 
    -- Of course this is complete madness, no-one would ever want to 
    -- code like this on a day-to-day basis, but performance-critical 
    -- things can be done the once and then forgotten about. 
    -- 
    -- I already knew the algorithm, though not used an odd-only bitmap. 
    -- I figured out which byte/bit needed testing and let my brain stew 
    -- on it for a couple of days before drumming up this little monster. 
    -- 
    integer step -- (needed due to v.high reg pressure, 64-bit uses r9)  
    #ilASM{  
            [32]  
                xor edi,edi     -- for bit=1 to q do (using 0-based idx)  
                mov esi,[sieve]  
                shl esi,2       -- esi := raw_address(sieve)  
              ::bitloop  
                    -- if sieve[bit] then  
                    -- edi (p) ax/dl  test   
                    --  0   3   0/01  true  
                    --  1   5   0/02  true  
                    --  2   7   0/04  true  
                    --  3   9   0/08  false  
                    --  4  11   0/10  true  
                    --  5  13   0/20  true  
                    --  6  15   0/40  false  
                    --  7  17   0/80  true  
                    --  8  19   1/01  true  
                    --  9  21   1/02  false  
                    mov eax,edi  
                    mov dl,1  
                    mov cl,al  
                    shr eax,3  
                    and cl,7    -- edi -> byte, bit  
                    shl dl,cl   -- bit -> mask  
                    test byte[esi+eax],dl  
                    jz :notset  
                        -- clear_bits:  
                        push edi  
                        -- for edx=p*p to num_bits by p do (0/odd-only) 
                        --  sieve[edx] = 0  
                        -- edx step (ea/dx) al/cl (p) bl 
                        --  0   3     9/3    0/3   9  08 
                        --              6    0/6  15  40 
                        --              9    1/1  21  02 
                        --              12   1/4  27  10 
                        --              15   1/7  33  80 
                        --              18   2/2  39  04 
                        --  1   5    25/11   1/3  25  08 
                        --              16   2/0  35  01 
                        --              21   2/5  45  20 
                        --              26   3/2  55  04 
                        --  2   7    49/23   2/7  49  80 
                        --  3   n/a 
                        --  4  11   121/60  ... 
                        lea ecx,[ebx+edi*2+3]   -- step 
                        mov eax,ecx 
                        mul ecx 
                        mov [step],ecx          -- (regs v scarce here!) 
                        shr eax,1 
                        sub eax,1               -- (0-based) 
                        mov edx,eax             -- start 
 
                      ::innerloop 
                        mov cl,al 
                        shr eax,3 
                        and cl,7                -- -> byte,bit 
                        mov bl,1 
                        shl bl,cl 
                        not bl                  -- bit -> not mask 
                        mov ecx,[step] 
                        add edx,ecx 
                        and byte[esi+eax],bl 
                        mov eax,edx 
                        cmp edx,[num_bits] 
                        jle :innerloop 
 
                        pop edi 
                        xor ebx,ebx -- important!! 
              ::notset           
                add edi,1 
                cmp edi,[q] 
                jle :bitloop 
            [64] 
                xor rdi,rdi     -- for bit=1 to q do (using 0-based idx) 
                mov rsi,[sieve] 
                shl rsi,2       -- rsi := raw_address(sieve) 
              ::bitloop 
                    -- if sieve[bit] then (as above) 
                    mov rax,rdi 
                    mov dl,1 
                    mov cl,al 
                    shr rax,3 
                    and cl,7 
                    shl dl,cl 
                    test byte[rsi+rax],dl 
                    jz :notset 
                        -- clear_bits: (as above) 
                        push rdi 
                        lea rcx,[rbx+rdi*2+3]   -- step 
                        mov rax,rcx 
                        mul rcx 
                        mov r9,rcx              -- (save step) 
                        shr rax,1               -- start 
                        sub rax,1               -- (0-based) 
                        mov rdx,rax             -- start/k 
                        mov r10,[num_bits]      -- (may as well) 
 
                      ::innerloop 
                        mov cl,al 
                        shr rax,3 
                        and cl,7                -- -> byte,bit 
                        mov bl,1 
                        shl bl,cl 
                        not bl 
                        add rdx,r9              -- + step 
                        and byte[rsi+rax],bl 
                        mov rax,rdx 
                        cmp rdx,r10 
                        jle :innerloop 
 
                        pop rdi 
                        xor rbx,rbx -- important!! 
              ::notset           
                add rdi,1 
                cmp rdi,[q] 
                jle :bitloop 
            [] 
          } 
    return {sieve_size, num_bits, sieve} 
end function 
 
function count_primes(sequence this, bool show_results=FALSE) 
    integer count = 1, count2 = 1 
    if show_results then 
        printf(STDOUT, "2, ") 
    end if 
 
    for bit=1 to this[NUM_BITS] do 
        if get_bit(this[SIEVE], bit) then 
            count += 1 
            if show_results then 
                printf(STDOUT, "%d, ", 2 * bit + 1) 
            end if 
        end if 
    end for 
    -- or, maybe: 
    for b in this[SIEVE] do count2 += count_bits(b) end for 
    assert(count=count2) 
    -- or even: (just showing off now, ain't I?) 
    assert(count == 1+sum(apply(this[SIEVE],count_bits))) 
 
    if show_results then 
        printf(STDOUT, "\n") 
    end if 
 
    return count 
end function 
 
-- Erm, this might deserve re-testing?? 
-- Cannot support sieve size greater than 1 billion -- 
constant {sizes,expected} = columnize({{10,4}, 
                                       {100,25}, 
                                       {1_000,168}, 
                                       {10_000,1229}, 
                                       {100_000,9592}, 
                                       {1_000_000,78498}, 
                                       {10_000_000,664579}, 
                                       {100_000_000,5761455}, 
                                       {1_000_000_000,50847534}}) 
 
function validate_results(sequence this) 
    integer k = find(this[SIEVE_SIZE],sizes) 
    assert(k!=0,"Invalid sieve size") 
    integer expected_count = expected[k] 
    return count_primes(this) = expected_count 
end function 
 
procedure print_results(sequence this, integer show_results, atom duration, integer passes) 
    printf( 
        STDOUT, 
        "Passes: %d, Time: %.8f, Avg: %.8f, Limit: %d, Count: %d, Valid: %s\n", 
        { 
            passes, 
            duration, 
            duration / passes, 
            this[SIEVE_SIZE], 
            count_primes(this, show_results), 
            iff(validate_results(this), "true", "false") 
        } 
    ) 
    printf( 
        STDOUT, 
        "\nrzuckerm;%d;%.8f;1;algorithm=base,faithful=yes,bits=1\n", 
        {passes, duration} 
    ) 
end procedure 
 
procedure main() 
    -- should these be obtained from command_line()?? 
    integer n = 1_000_000, 
            show_results = false 
    atom start = time(), passes = 0 
    sequence sieve 
    while 1 do 
        passes += 1 
        sieve = run_sieve(n) 
        atom duration = time() - start 
        if duration >= 5 then 
            print_results(sieve, show_results, duration, passes) 
            exit 
        end if 
    end while 
end procedure 
 
main() 

Note I'm not planning on posting this anywhere else, that'd be your job.
There are a couple of queries: I'll leave it up to you to decide whether
to keep BIT8_TABLE or use power(2,xx), and your limit of 1 billion might
deserve to be quickly re-tested (I've not even tried).

PS: BIT8_TABLE should read x"0102040810204080" and NOT_BIT8_TABLE = x"FEFDFBF7EFDFBF7F",
no idea why that does not show up correctly inside eucode tags.

new topic     » goto parent     » topic index » view message » categorize

3. Re: Some Interesting OpenSource Projects

petelomax said...

I've reworked your primes/Phix solution, made it a wee bit faster, but only by about 30-fold...
I reached the dizzying heights of 4436 here, up from 121.

Might be educational to read how you optimized the code, in a side-by-side, of why the original code ran slowly, and why your's runs faster. I mean, if you have a spare moment. On the overall big view, not the nitty gritty, unless you have written a book on it? Is your magic Phix-specific, general good programming practice, etc?

Kat

new topic     » goto parent     » topic index » view message » categorize

4. Re: Some Interesting OpenSource Projects

I already knew the algorithm, though not used an odd-only bitmap.
I figured out which byte/bit needed testing and let my brain stew on it for a couple of days before drumming up that little monster.

The performance of Phix suffers a little for subscripts (and to a slightly lesser extent floating point numbers).
The simple reason is that Phix does not have a proper register allocator yet, though one is planned for 2.0.0.
In particular, "s[i] op= x" potentially suffers an agi stall on both the fetch and store, which means 20-30 clocks
instead of 1 were ([shifted] ref)s already in a register. Some tight inner loops are ~8* slower than say gcc manages.
There is also no "with inline", and that many extra call/return can also prove a bit costly, relatively speaking.

Inline assembly like that is pretty much an almost/quite probably exclusively Phix-only thing.
(One exception I know for certain is that C++ and probably some plain C compilers have "intrinsics".)
For the builtin primitives, such as length() and sqrt(), there is simply no choice but to do it that way.
It is not normally worth the effort, though I might yet adopt some of that code into builtins\primes.e

new topic     » goto parent     » topic index » view message » categorize

5. Re: Some Interesting OpenSource Projects

@petelomax Thanks for the interesting solution, but you can do a PR for a second solution in a PrimePhix/solution_2 directory. However, I'm not sure this is faithful to the original intent since it has assembly code in it.

new topic     » goto parent     » topic index » view message » categorize

6. Re: Some Interesting OpenSource Projects

rzuckerm said...

@petelomax Thanks for the interesting solution, but you can do a PR for a second solution in a PrimePhix/solution_2 directory. However, I'm not sure this is faithful to the original intent since it has assembly code in it.

No.

As I understand it, all I would have to do is find the original repository, fork it to my github account, open that, download it, modify it, go back to the original and find their "create a PR document", read that and follow every detail and run all the tests they require, probably involving installing a whole bunch of other stuff, fire up github desktop, find/add my local copy, upload that, push it, then open my github account online, find the create PR button, click that, add all the details they need, and then I'm done. In contrast, you've already got at least 9/10ths of that set up. It's one file.

As well as all that, there are a couple of choices I expect you to make, in particular the power(2) or BIT8_TABLE thing. The other one is testing the 1 billion limit.

As to inline assembly, it is part of the language, and besides which, do you seriously think no other entries are doing something like that? Some are in fact pure asm. Just be open and honest about it.

new topic     » goto parent     » topic index » view message » categorize

7. Re: Some Interesting OpenSource Projects

@petelomax

I just checked with the maintainers of the Primes project, and they said that there is a "mixed" category for solutions that contain more than one language. When I have time, I'll do a PR for your solution, and I'll give you credit for it.

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu