RE: Inner loop compiler

new topic     » goto parent     » topic index » view thread      » older message » newer message

Hello Barbarella,

Here are some more ideas I had on what to implement first in a simple 
compiler:

  Functions and Procedures.  Parameters are passed on the stack, using 
push instructions.  Space for local variables is created by moving the 
stack pointer.  Parameters and local variables are accessed by reading 
memory at the stack pointer + an offset.  Recursion should be allowed - 
that's why we have a stack.  Function results should be returned in the 
eax register.  No registers need to be saved before calling subroutines, 
but a subroutine may modify any register except the stack pointer.  
(Ideas for later implementation: allow complex calling conventions like 
passing parameters in registers, store local variables in registers, 
etc)

  Global Variables.  Global variables are simply a pointer to someplace 
allocated on the heap.  (Variables can be optimized to be stored as 
registers later.)

  Data Types.  Let's stick with 32-bit integers for now.  (We can add 
floating point and arrays later.)  (We could also probably implement 
arrays just using peek/poke.)

  Simple Assignment and Arithmetic.  Assignments will be initially 
limited to "x = y" where y is another variable or a literal value.  
Arithmetic can only be done on existing variables, in the form "x += y". 
 (We can add more arithmetic operations later.)

  If and While statements.  Each statement may have only one 
comparision.  Comparisons can have the form "x < y" where x or y may be 
a variable or a literal value.  (For loops can be implemented using 
while loops.  From what I've heard, the first versions of Euphoria 
didn't have for loops for that reason.)

This seems like a good place to start.  Not too complicated, but 
powerful enough to do some simple tasks.

Now, for the intermediate representation, a tree structure would 
probably be appropriate.  The parser would take the text representation 
of the program (the source code) and convert it to an intermediate 
representation.  This intermediate representation is where the 
optimization would take place, but we'll leave that out for now.  Then 
the code generation would use the optimized intermediate representation 
to create the actual machine code.

Each statement is a sequence.  A block of code will be a sequence of 
statements.  Statement sequences may contain expression sequences as 
well.  (David Cuny loves stuff like this.)

Here's an example:

        -->  {
x = 1   -->   {"assign", "x", {"literal", 1}},
y = x   -->   {"assign", "y", {"var", "x"}},
if y = 2 then --> {"if", {"equal", {"var", "y"}, {"literal", 2}},{
 y += 1 -->           {"addto", "y", {"literal", 1}}
else    -->       },{
 y = 0  -->           {"assign", "y", {"literal", 0}}
end if  -->   }}
        -->  }

The code example is stupid, but I hope it illustrates how the 
translation process works.  Each statement sequence has what kind of 
statement as its first element ("assign", "if", "addto", etc).  
Expression statements work the same way ("literal", "var", "equal").  We 
could have taken a shortcut for expressions and left off the literal/var 
part, but this way we'll be able to produce more complex expressions 
later, like: {"+", {"*", {"literal", 5}, {"var "y"}}, {"var", "x"}} 
which is "5 * y + x"


Here's the assembly code representations for the statements I think 
we'll need:

        (local)                 (global)
x = y
        mov eax, [esp-@y]       mov eax, [@y]
        mov [esp-@x], eax       mov [@x], eax

x = c
        mov [esp-@x], c         mov [@x], c

poke4(x, y)
        mov eax, [esp-@y]       mov eax, [@y]
        mov ebx, [esp-@x]       mov ebx, [@x]
        mov [ebx], eax          mov [ebx], eax

poke4(x, c)
        mov ebx, [esp-@x]       mov ebx, [@x]
        mov [ebx], c            mov [ebx], c

poke4(c, y)
        mov eax, [esp-@y]       mov eax, [@y]
        mov [c], eax            mov [c], eax

poke4(c1, c2)
        mov [c1], c2

x = peek4(c)
        mov eax, [c]            mov eax, [c]
        mov [esp-@x], eax       mov [@x], eax

x = peek4(y)
        mov ebx, [esp-@y]       mov ebx, [@y]
        mov eax, [ebx]          mov eax, [ebx]
        mov [esp-@x], eax       mov [@x], eax

x += c
        add [esp-@x], c         add [@x], c

x += y
        mov eax, [esp-@y]       mov eax, [@y]
        add [esp-@x], eax       add [@x], eax

proc(...)
        push ...
        call proc
        add esp, #args

x = func(...)
        push ...
        call func
        add esp, #args
        mov [esp-@x], eax       mov [@x], eax

return x
        mov eax, [esp-@x]       mov eax, [@x]
        ret

return c
        mov eax, c
        ret

while x OP c do
        cmp [esp-@x], c         cmp [@x], c
        jnOP end
        top:
        ...
        cmp [esp-@x], c         cmp [@x], c
        jOP top
        end:

while x OP y do
        mov eax, [esp-@y]       mov eax, [@y]
        cmp [esp-@x], eax       cmp [@x], eax
        [as above]

if x OP y then ... else ... end if
        mov eax, [esp-@y]       mov eax, [@y]
        cmp [esp-@x], eax       cmp [@x], eax
        jnOP else
        ...
        jmp end
        else:
        ...
        end:


Comments?  Suggestions?  Clarifications?

Or "shut up man and start coding"?  grin

Pete

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu