### 1. Some Interesting OpenSource Projects

- Posted by rzuckerm Apr 23, 2023
- 1187 views

The reason why I created the Phix dockers images is that I'm involved in two OpenSource projects, and I wanted a consistent, build-reproducible docker image for Phix:

The "Sample Programs" repository is a set of 37 different projects implemented in lots of different languages. These problems vary in difficulty:

- "Hello, World!" and other simple text display
- File I/O
- Quine
- Mathematical operations
- Searching
- Sorting
- Roman Numerals
- Graph Theory
- ...

For a comprehensive list, see the Sample Programs Website. I have implemented all of the Euphoria projects, but all but 2 of the Phix projects are up for grabs. Feel free to do a PR. I'm on the core team, so I can review your work. Also, if you see something in my Euphoria solutions that can be improved upon (other than fixing typos), feel free do to a PR for this, but please explain why the change improves the code. Please make sure to review the Contributing Guide before making a PR.

The "Primes" repository is about trying to do the fastest implementation of the Sieve of Eratosthenes in a given language. I've already contributed a solution in Euphoria and Phix. Feel free to do a PR for your own solution, but please review the Contributing Guide first. I'm not an admin for this project, but the admins are friendly and helpful.

### 2. Re: Some Interesting OpenSource Projects

- Posted by petelomax May 01, 2023
- 1080 views
- Last edited May 03, 2023

I've reworked your primes/Phix solution, made it a wee bit faster, but only by about 30-fold...

I reached the dizzying heights of 4436 here, up from 121.

enum SIEVE_SIZE, NUM_BITS, SIEVE constant STDOUT = 1 without type_check constant BIT8_TABLE = x"", -- (or use power(2,xx) as commented out below) NOT_BIT8_TABLE = x"" (erm, see the PS for the actual values needed) function set_bits(integer num_bits) -- create a binary string of precisely num-bits 1s integer num_bytes = floor((num_bits + 7) / 8), leftover_bits = 1 + and_bits(num_bits - 1, 0x7), leftover_byte = BIT8_TABLE[leftover_bits] * 2 - 1 -- leftover_byte = power(2,leftover_bits)-1 return repeatch('\xFF',num_bytes-1) & leftover_byte end function function clear_bits(sequence this, integer num_bits, integer start, integer step) integer byte_num = 1 + floor((start - 1) / 8), bit_pos = 1 + and_bits(start - 1, 0x7), byte_inc = floor(step / 8), bit_inc = and_bits(step, 0x7) for k = start to num_bits by step do this[byte_num] = and_bits(this[byte_num], NOT_BIT8_TABLE[bit_pos]) -- this[byte_num] = and_bits(this[byte_num], not_bits(power(2,bit_pos-1))) byte_num += byte_inc bit_pos += bit_inc if bit_pos > 8 then bit_pos -= 8 byte_num += 1 end if end for return this end function function get_bit(sequence sieve, integer bit) integer byte_pos = 1 + floor((bit - 1) / 8), bit_pos = 1 + and_bits(bit - 1, 0x7) return and_bits(sieve[byte_pos], BIT8_TABLE[bit_pos]) -- return and_bits(sieve[byte_pos], power(2,bit_pos-1)) end function function run_sieve(integer sieve_size) integer num_bits = floor((sieve_size - 1) / 2) sequence sieve = set_bits(num_bits) integer q = floor(sqrt(sieve_size) / 2) -- -- we start off with a sieve of all 1s, and (should) end up with -- < byte2>< byte1> -- 0b0110010110110111 -- (little-endian, 33 first, 3 last) -- 19 3 97 31 753 -- ie 3,5,7,11,13,17,19,23,29,31 prime -- 3 75 1 5 9 -- and 9,15,21,25,27,33 not prime -- 30>< 20>< 10> -- (as shown the first byte of sieve contains the flags for 3..17) -- -- Of course this is complete madness, no-one would ever want to -- code like this on a day-to-day basis, but performance-critical -- things can be done the once and then forgotten about. -- -- I already knew the algorithm, though not used an odd-only bitmap. -- I figured out which byte/bit needed testing and let my brain stew -- on it for a couple of days before drumming up this little monster. -- integer step -- (needed due to v.high reg pressure, 64-bit uses r9) #ilASM{ [32] xor edi,edi -- for bit=1 to q do (using 0-based idx) mov esi,[sieve] shl esi,2 -- esi := raw_address(sieve) ::bitloop -- if sieve[bit] then -- edi (p) ax/dl test -- 0 3 0/01 true -- 1 5 0/02 true -- 2 7 0/04 true -- 3 9 0/08 false -- 4 11 0/10 true -- 5 13 0/20 true -- 6 15 0/40 false -- 7 17 0/80 true -- 8 19 1/01 true -- 9 21 1/02 false mov eax,edi mov dl,1 mov cl,al shr eax,3 and cl,7 -- edi -> byte, bit shl dl,cl -- bit -> mask test byte[esi+eax],dl jz :notset -- clear_bits: push edi -- for edx=p*p to num_bits by p do (0/odd-only) -- sieve[edx] = 0 -- edx step (ea/dx) al/cl (p) bl -- 0 3 9/3 0/3 9 08 -- 6 0/6 15 40 -- 9 1/1 21 02 -- 12 1/4 27 10 -- 15 1/7 33 80 -- 18 2/2 39 04 -- 1 5 25/11 1/3 25 08 -- 16 2/0 35 01 -- 21 2/5 45 20 -- 26 3/2 55 04 -- 2 7 49/23 2/7 49 80 -- 3 n/a -- 4 11 121/60 ... lea ecx,[ebx+edi*2+3] -- step mov eax,ecx mul ecx mov [step],ecx -- (regs v scarce here!) shr eax,1 sub eax,1 -- (0-based) mov edx,eax -- start ::innerloop mov cl,al shr eax,3 and cl,7 -- -> byte,bit mov bl,1 shl bl,cl not bl -- bit -> not mask mov ecx,[step] add edx,ecx and byte[esi+eax],bl mov eax,edx cmp edx,[num_bits] jle :innerloop pop edi xor ebx,ebx -- important!! ::notset add edi,1 cmp edi,[q] jle :bitloop [64] xor rdi,rdi -- for bit=1 to q do (using 0-based idx) mov rsi,[sieve] shl rsi,2 -- rsi := raw_address(sieve) ::bitloop -- if sieve[bit] then (as above) mov rax,rdi mov dl,1 mov cl,al shr rax,3 and cl,7 shl dl,cl test byte[rsi+rax],dl jz :notset -- clear_bits: (as above) push rdi lea rcx,[rbx+rdi*2+3] -- step mov rax,rcx mul rcx mov r9,rcx -- (save step) shr rax,1 -- start sub rax,1 -- (0-based) mov rdx,rax -- start/k mov r10,[num_bits] -- (may as well) ::innerloop mov cl,al shr rax,3 and cl,7 -- -> byte,bit mov bl,1 shl bl,cl not bl add rdx,r9 -- + step and byte[rsi+rax],bl mov rax,rdx cmp rdx,r10 jle :innerloop pop rdi xor rbx,rbx -- important!! ::notset add rdi,1 cmp rdi,[q] jle :bitloop [] } return {sieve_size, num_bits, sieve} end function function count_primes(sequence this, bool show_results=FALSE) integer count = 1, count2 = 1 if show_results then printf(STDOUT, "2, ") end if for bit=1 to this[NUM_BITS] do if get_bit(this[SIEVE], bit) then count += 1 if show_results then printf(STDOUT, "%d, ", 2 * bit + 1) end if end if end for -- or, maybe: for b in this[SIEVE] do count2 += count_bits(b) end for assert(count=count2) -- or even: (just showing off now, ain't I?) assert(count == 1+sum(apply(this[SIEVE],count_bits))) if show_results then printf(STDOUT, "\n") end if return count end function -- Erm, this might deserve re-testing?? -- Cannot support sieve size greater than 1 billion -- constant {sizes,expected} = columnize({{10,4}, {100,25}, {1_000,168}, {10_000,1229}, {100_000,9592}, {1_000_000,78498}, {10_000_000,664579}, {100_000_000,5761455}, {1_000_000_000,50847534}}) function validate_results(sequence this) integer k = find(this[SIEVE_SIZE],sizes) assert(k!=0,"Invalid sieve size") integer expected_count = expected[k] return count_primes(this) = expected_count end function procedure print_results(sequence this, integer show_results, atom duration, integer passes) printf( STDOUT, "Passes: %d, Time: %.8f, Avg: %.8f, Limit: %d, Count: %d, Valid: %s\n", { passes, duration, duration / passes, this[SIEVE_SIZE], count_primes(this, show_results), iff(validate_results(this), "true", "false") } ) printf( STDOUT, "\nrzuckerm;%d;%.8f;1;algorithm=base,faithful=yes,bits=1\n", {passes, duration} ) end procedure procedure main() -- should these be obtained from command_line()?? integer n = 1_000_000, show_results = false atom start = time(), passes = 0 sequence sieve while 1 do passes += 1 sieve = run_sieve(n) atom duration = time() - start if duration >= 5 then print_results(sieve, show_results, duration, passes) exit end if end while end procedure main()

Note I'm not planning on posting this anywhere else, that'd be your job.

There are a couple of queries: I'll leave it up to you to decide whether

to keep BIT8_TABLE or use power(2,xx), and your limit of 1 billion might

deserve to be quickly re-tested (I've not even tried).

PS: BIT8_TABLE should read x"0102040810204080" and NOT_BIT8_TABLE = x"FEFDFBF7EFDFBF7F",

no idea why that does not show up correctly inside eucode tags.

### 3. Re: Some Interesting OpenSource Projects

- Posted by katsmeow May 02, 2023
- 1038 views

I've reworked your primes/Phix solution, made it a wee bit faster, but only by about 30-fold...

I reached the dizzying heights of 4436 here, up from 121.

Might be educational to read how you optimized the code, in a side-by-side, of why the original code ran slowly, and why your's runs faster. I mean, if you have a spare moment. On the overall big view, not the nitty gritty, unless you have written a book on it? Is your magic Phix-specific, general good programming practice, etc?

Kat

### 4. Re: Some Interesting OpenSource Projects

- Posted by petelomax May 03, 2023
- 1005 views
- Last edited May 08, 2023

I already knew the algorithm, though not used an odd-only bitmap.

I figured out which byte/bit needed testing and let my brain stew on it for a couple of days before drumming up that little monster.

The performance of Phix suffers a little for subscripts (and to a slightly lesser extent floating point numbers).

The simple reason is that Phix does not have a proper register allocator yet, though one is planned for 2.0.0.

In particular, "s[i] op= x" potentially suffers an agi stall on *both* the fetch and store, which means 20-30 clocks

instead of 1 were ([shifted] ref)s already in a register. Some tight inner loops are ~8* slower than say gcc manages.

There is also no "with inline", and that many extra call/return can also prove a bit costly, relatively speaking.

Inline assembly like that is pretty much an almost/quite probably exclusively Phix-only thing.

(One exception I know for certain is that C++ and probably some plain C compilers have "intrinsics".)

For the builtin primitives, such as length() and sqrt(), there is simply no choice but to do it that way.

It is not normally worth the effort, though I might yet adopt some of that code into builtins\primes.e

### 5. Re: Some Interesting OpenSource Projects

- Posted by rzuckerm May 05, 2023
- 949 views

@petelomax Thanks for the interesting solution, but you can do a PR for a second solution in a PrimePhix/solution_2 directory. However, I'm not sure this is faithful to the original intent since it has assembly code in it.

### 6. Re: Some Interesting OpenSource Projects

- Posted by petelomax May 07, 2023
- 919 views
- Last edited May 08, 2023

@petelomax Thanks for the interesting solution, but you can do a PR for a second solution in a PrimePhix/solution_2 directory. However, I'm not sure this is faithful to the original intent since it has assembly code in it.

No.

As I understand it, all I would have to do is find the original repository, fork it to my github account, open that, download it, modify it, go back to the original and find their "create a PR document", read that and follow every detail and run all the tests they require, probably involving installing a whole bunch of other stuff, fire up github desktop, find/add my local copy, upload that, push it, then open my github account online, find the create PR button, click that, add all the details they need, and then I'm done. In contrast, you've already got at least 9/10ths of that set up. It's one file.

As well as all that, there are a couple of choices I expect you to make, in particular the power(2) or BIT8_TABLE thing. The other one is testing the 1 billion limit.

As to inline assembly, it is part of the language, and besides which, do you seriously think no other entries are doing something like that? Some are in fact pure asm. Just be open and honest about it.

### 7. Re: Some Interesting OpenSource Projects

- Posted by rzuckerm May 09, 2023
- 888 views

@petelomax

I just checked with the maintainers of the Primes project, and they said that there is a "mixed" category for solutions that contain more than one language. When I have time, I'll do a PR for your solution, and I'll give you credit for it.