Re: Writing an assembler


Pete Lomax wrote:
> 
> Well, I got a mad idea about writing an assembler (one to create exe files)
> in Eu. However, tricky stuff like mod r/m aside, there is one point I really
> cannot wrap my head around. Suppose you have:
> 
> jz l9
> ...
> jz l8
> ....
> jz l1
> ...
> l1:
> ....
> l8:
> ...
> l9:
> ...
> jmp l5
> 
> Now l9,8,7... are forward references so you'd be wise to assume they will need
> a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can
> be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can
> too.
> 
> My question is how do you keep track of where all these bytes are going to be?
> When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5?
> 
> I think/hope this is a generic problem-solving question:
> How do you 'move' multiple things like this at once?
> Or, just being told exactly what info I really need to keep might make 1p
> drop.
> 
> I guess a similar question is that if I have say:
> "some that some that some that"
> and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want those
> indexes changed to {6,13,20}. [I may replace instances in any order and consider
> that those numbers may be dispersed a bit more, and there may well be literally
> many thousands of such inter/independent effects].
> 
> Hoping for a hint,
> Pete
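
For the string example in the quoted message, one way to avoid moving anything until the end is to keep every position in original coordinates and apply the accumulated length changes in a single step. This is only a minimal Python sketch of that idea: `remap_positions` and the `(start, old_len, new_len)` edit format are made up for illustration, and the edits are assumed not to overlap.

```python
from bisect import bisect_left

def remap_positions(positions, edits):
    """Map positions in the original text to positions after the edits.

    positions: 1-based indexes into the ORIGINAL string.
    edits: (start, old_len, new_len) tuples, also in original coordinates,
           given in any order and assumed not to overlap.
    """
    ordered = sorted(edits)
    starts = [start for start, _, _ in ordered]
    # prefix[i] = total length change caused by the first i edits
    prefix = [0]
    for _, old_len, new_len in ordered:
        prefix.append(prefix[-1] + new_len - old_len)
    # A position is shifted by every edit that starts strictly before it.
    return [p + prefix[bisect_left(starts, p)] for p in positions]

marks = [6, 16, 26]                           # 1-based starts of each "that"
edits = [(26, 4, 1), (6, 4, 1), (16, 4, 1)]   # "that" -> "x", in any order
print(remap_positions(marks, edits))          # prints [6, 13, 20]
```

Because the edits are sorted once and looked up by binary search, the order in which replacements were recorded does not matter, and thousands of marks stay cheap to remap.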

I had that same mad idea a few years ago, and dropped it because of the 300 
statement limit at that time. I still have some code on my older computer.

The problem was more general than jumps, since there could be unresolved 
labels in expressions as well. After all, a forward reference is yet 
another unresolved symbol.
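
A minimal Python sketch of that point, purely illustrative: an expression is evaluated against the current symbol table and reports which symbols still block it. The function name `try_eval`, the `(sign, operand)` term format and the symbol names are assumptions made for this example.

```python
def try_eval(terms, symbols):
    """terms: list of (sign, operand); operand is an int or a symbol name.

    Returns (value, missing): value is None while any symbol is unknown,
    and missing lists the symbols that still obstruct the expression.
    """
    total, missing = 0, []
    for sign, operand in terms:
        if isinstance(operand, int):
            total += sign * operand
        elif operand in symbols:
            total += sign * symbols[operand]
        else:
            missing.append(operand)
    return (None if missing else total), missing

symbols = {"table_start": 0x100}
expr = [(+1, "table_end"), (-1, "table_start")]   # table_end - table_start
print(try_eval(expr, symbols))    # table_end is still a forward reference
symbols["table_end"] = 0x140
print(try_eval(expr, symbols))    # both symbols known now
```

A jump target is then just the simplest case: an expression made of a single symbol.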

I had devised the following database-like system:
* In pass 1, decode all text and create a record per instruction. Among 
other things, that record has a minimum address, a minimum and a maximum code 
size, and a maximum address. Any jump is treated as unresolved.

* At the same time, build another table of unresolved symbols, with a list
of references and an obstruction code, i.e. an indication of why each one is
unresolved. A symbol that becomes resolved during pass 1 (a forward label, say)
would be marked resolved and removed from the table.

If a jump length can already be classified as either <128 or >=128, the symbol 
stays unresolved, but with a different obstruction code (value unknown, size 
known). The same approach covers other choices, such as using opcode #83 or #81 
for ADD, or encoding a 1-byte or a 4-byte offset.

* In pass 2, loop through the obstruction table, look for records that
obstruct others but are not obstructed themselves, and resolve them. This way, 
barring any cyclical obstruction, the list would eventually become empty. I 
think I just errored out on any cyclical obstruction.

* In pass 3, all instruction fields have their actual code complete, just 
poke 'em.
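
The three passes above can be sketched roughly as follows. This is an illustrative Python sketch, not the original Eu code: the record layout (`min`/`max` sizes), the helper names and the 2-byte/6-byte jump sizes (x86 `jz rel8` versus 32-bit `jz rel32`) are assumptions, and the per-symbol obstruction table is collapsed into a plain repeat-until-stable loop.

```python
SHORT, LONG = 2, 6  # assumed sizes: jz rel8 (2 bytes) vs jz rel32 (6 bytes)

def jump(target):
    return {"target": target, "min": SHORT, "max": LONG}

def code(size):
    return {"min": size, "max": size}

def label(name):
    return {"label": name, "min": 0, "max": 0}

def resolve_sizes(records):
    """Pass 2: fix the size of every jump whose reach is already certain."""
    index_of = {r["label"]: i for i, r in enumerate(records) if "label" in r}
    changed = True
    while changed:
        changed = False
        for i, r in enumerate(records):
            if r["min"] == r["max"]:
                continue  # size already known, nothing to resolve
            t = index_of[r["target"]]
            # Records lying between the end of the jump and its target.
            span = records[i + 1:t] if t > i else records[t:i + 1]
            lo = sum(x["min"] for x in span)
            hi = sum(x["max"] for x in span)
            limit = 127 if t > i else 128  # rel8 covers -128..127
            if hi <= limit:        # fits even in the worst case: short form
                r["max"] = r["min"]
                changed = True
            elif lo > limit:       # too far even in the best case: long form
                r["min"] = r["max"]
                changed = True
    stuck = [r["target"] for r in records if r["min"] != r["max"]]
    if stuck:
        raise ValueError("cyclical obstruction on: " + ", ".join(stuck))

def layout(records):
    """Pass 3: every size is final, so addresses are a running sum."""
    addresses, addr = [], 0
    for r in records:
        addresses.append(addr)
        addr += r["min"]
    return addresses

program = [
    jump("end"),   # reach depends on the size of the next jump
    jump("mid"),
    code(10),
    label("mid"),
    code(112),
    label("end"),
]
resolve_sizes(program)
sizes = [r["min"] for r in program]
addresses = layout(program)
print(sizes, addresses)
```

In this example the first jump cannot be sized on the first sweep, since its reach is somewhere between 124 and 128 bytes. Once the inner jump is fixed as short, the second sweep settles the outer one as short too. Picking the long form for anything left over would be a safe alternative to raising an error.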

The word "database" doesn't imply I was using database.e. It's the abstract 
database object that I was using.

None of this was tested, because that was when I started coding in C++ in the 
OpenEu framework.

The stack scheme Rob laid out doesn't seem too far from this.

CChris
