Re: Writing an assembler

new topic     » goto parent     » topic index » view thread      » older message » newer message

Pete Lomax wrote:
> Well, I got a mad idea about writing an assembler (one to create exe files)
> in Eu. However, tricky stuff like mod r/m aside, there is one point I really
> cannot wrap my head around. Suppose you have:
> 
> jz l9
> ...
> jz l8
> ....
> jz l1
> ...
> l1:
> ....
> l8:
> ...
> l9:
> ...
> jmp l5
> 
> Now l9,8,7... are forward references so you'd be wise to assume they will need
> a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can
> be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can
> too.
> 
> My question is how do you keep track of where all these bytes are going to be?
> When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5?
> 
> I think/hope this is a generic problem-solving question:
> How do you 'move' multiple things like this at once?
> Or, just being told exactly what info I really need to keep might make 1p
> drop.
> 
> I guess a similar question is that if I have say:
> "some that some that some that"
> and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want
> those
> indexes changed to {6,13,20}. [I may replace instances in any order and
> consider
> that those numbers may be dispersed a bit more, and there may well be
> literally
> many thousands of such inter/independent effects].
> 
> Hoping for a hint,
> Pete

Wow. What memories!
Back in 1982, I worked at a company (now defunct)
called AES. Along with another guy (Al Matsuoka are you out there?),
I wrote an assembler for the Zilog Z8000.
(Why didn't they just buy an assembler? It's a long story.
They also built their own C compiler.)

It was a 2-pass assembler. It gathered info on
pass 1 (scanning/parsing), and emitted machine code on pass 2.
Between passes, among other things, it optimized branches, 
to try to get as many short branches as possible. 

One of my tasks was to write the branch optimizer. I still have the
C source code for the assembler (dusty paper listing), 
and I just dug it out from a pile of memorabilia
that I haven't looked at in years. I won't give you the source,
but I can see from my old comments roughly how it worked. 
Apparently I was inspired by an article in CACM Vol 21 No 4 April 1978.
I don't think my code does a provably optimal job,
but it's fairly simple. Reading my ancient code ...

  * assume initially that all branches can be short

  * Set up 2 stacks (Euphoria sequences will do). 
    One for assumed legal branch instructions,
    and one for known illegals.

  * Start by going through and finding instructions with
    illegal branches (distance to the label is too far).
    This will populate your two stacks.

  * iterate until done (no more illegals):

    for each illegal in the illegal stack, convert it to a long 
    (now legal) branch. Propagate it's new long length
    through the stack of assumed legals, updating branch distances 
    and possibly creating new illegal short branches that you push
    onto the illegal stack.

    Eventually the illegal stack will be empty, and you'll be left with
    only legals (i.e. ones that can actually be made short branches,
    or ones that had to be converted to long branches). 

Now you can start pass 2, knowing which way (long or short) 
to emit each branch instruction.

I think I just used "stacks", because I was programming in C
and I could neatly allocate memory and run the stacks in
the opposite direction towards each other, since the total size
of the two stacks was constant. Euphoria sequences should be fine. 
I don't think last-in/first-out really mattered.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view thread      » older message » newer message

Search



Quick Links

User menu

Not signed in.

Misc Menu