RE: Inner loop compiler
- Posted by Pete E <xseal at harborside.com> Sep 05, 2002
- 651 views
Hello Barbarella, Here are some more ideas I had on what to implement first in a simple compiler: Functions and Procedures. Parameters are passed on the stack, using push instructions. Space for local variables is created by moving the stack pointer. Parameters and local variables are accessed by reading memory at the stack pointer + an offset. Recursion should be allowed - that's why we have a stack. Function results should be returned in the eax register. No registers need to be saved before calling subroutines, but a subroutine may modify any register except the stack pointer. (Ideas for later implementation: allow complex calling conventions like passing parameters in registers, store local variables in registers, etc) Global Variables. Global variables are simply a pointer to someplace allocated on the heap. (Variables can be optimized to be stored as registers later.) Data Types. Let's stick with 32-bit integers for now. (We can add floating point and arrays later.) (We could also probably implement arrays just using peek/poke.) Simple Assignment and Arithmetic. Assignments will be initially limited to "x = y" where y is another variable or a literal value. Arithmetic can only be done on existing variables, in the form "x += y". (We can add more arithmetic operations later.) If and While statements. Each statement may have only one comparision. Comparisons can have the form "x < y" where x or y may be a variable or a literal value. (For loops can be implemented using while loops. From what I've heard, the first versions of Euphoria didn't have for loops for that reason.) This seems like a good place to start. Not too complicated, but powerful enough to do some simple tasks. Now, for the intermediate representation, a tree structure would probably be appropriate. The parser would take the text representation of the program (the source code) and convert it to an intermediate representation. This intermediate representation is where the optimization would take place, but we'll leave that out for now. Then the code generation would use the optimized intermediate representation to create the actual machine code. Each statement is a sequence. A block of code will be a sequence of statements. Statement sequences may contain expression sequences as well. (David Cuny loves stuff like this.) Here's an example: --> { x = 1 --> {"assign", "x", {"literal", 1}}, y = x --> {"assign", "y", {"var", "x"}}, if y = 2 then --> {"if", {"equal", {"var", "y"}, {"literal", 2}},{ y += 1 --> {"addto", "y", {"literal", 1}} else --> },{ y = 0 --> {"assign", "y", {"literal", 0}} end if --> }} --> } The code example is stupid, but I hope it illustrates how the translation process works. Each statement sequence has what kind of statement as its first element ("assign", "if", "addto", etc). Expression statements work the same way ("literal", "var", "equal"). We could have taken a shortcut for expressions and left off the literal/var part, but this way we'll be able to produce more complex expressions later, like: {"+", {"*", {"literal", 5}, {"var "y"}}, {"var", "x"}} which is "5 * y + x" Here's the assembly code representations for the statements I think we'll need: (local) (global) x = y mov eax, [esp-@y] mov eax, [@y] mov [esp-@x], eax mov [@x], eax x = c mov [esp-@x], c mov [@x], c poke4(x, y) mov eax, [esp-@y] mov eax, [@y] mov ebx, [esp-@x] mov ebx, [@x] mov [ebx], eax mov [ebx], eax poke4(x, c) mov ebx, [esp-@x] mov ebx, [@x] mov [ebx], c mov [ebx], c poke4(c, y) mov eax, [esp-@y] mov eax, [@y] mov [c], eax mov [c], eax poke4(c1, c2) mov [c1], c2 x = peek4(c) mov eax, [c] mov eax, [c] mov [esp-@x], eax mov [@x], eax x = peek4(y) mov ebx, [esp-@y] mov ebx, [@y] mov eax, [ebx] mov eax, [ebx] mov [esp-@x], eax mov [@x], eax x += c add [esp-@x], c add [@x], c x += y mov eax, [esp-@y] mov eax, [@y] add [esp-@x], eax add [@x], eax proc(...) push ... call proc add esp, #args x = func(...) push ... call func add esp, #args mov [esp-@x], eax mov [@x], eax return x mov eax, [esp-@x] mov eax, [@x] ret return c mov eax, c ret while x OP c do cmp [esp-@x], c cmp [@x], c jnOP end top: ... cmp [esp-@x], c cmp [@x], c jOP top end: while x OP y do mov eax, [esp-@y] mov eax, [@y] cmp [esp-@x], eax cmp [@x], eax [as above] if x OP y then ... else ... end if mov eax, [esp-@y] mov eax, [@y] cmp [esp-@x], eax cmp [@x], eax jnOP else ... jmp end else: ... end: Comments? Suggestions? Clarifications? Or "shut up man and start coding"? Pete