Re: Writing an assembler
- Posted by CChris <christian.cuvier at agriculture.gouv.fr> Feb 21, 2007
Pete Lomax wrote:
>
> Well, I got a mad idea about writing an assembler (one to create exe files)
> in Eu. However, tricky stuff like mod r/m aside, there is one point I really
> cannot wrap my head around. Suppose you have:
>
> jz l9
> ...
> jz l8
> ....
> jz l1
> ...
> l1:
> ....
> l8:
> ...
> l9:
> ...
> jmp l5
>
> Now l9,8,7... are forward references so you'd be wise to assume they will need
> a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can
> be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can
> too.
>
> My question is how do you keep track of where all these bytes are going to be?
> When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5?
>
> I think/hope this is a generic problem-solving question:
> How do you 'move' multiple things like this at once?
> Or, just being told exactly what info I really need to keep might make 1p
> drop.
>
> I guess a similar question is that if I have say:
> "some that some that some that"
> and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want
> those indexes changed to {6,13,20}. [I may replace instances in any order and
> consider that those numbers may be dispersed a bit more, and there may well be
> literally many thousands of such inter/independent effects].
>
> Hoping for a hint,
> Pete

I had that same mad idea a few years ago, and dropped it because of the 300 statement limit at that time. I still have some code on my older computer.

The problem was more general than jumps, since there could be unresolved labels in expressions as well. After all, a forward reference is yet another unresolved symbol. I had devised the following database like system:

* in pass 1, decode all text, and create a record per instruction. Among other things, that record has a minimal address, a min code size and max code size, a max address. Any jump is treated as unresolved.
* At the same time, build another table of unresolved symbols, with a list of references and an obstruction code, i.e. an indication of why it's unresolved. A symbol resolved at pass 1 (forward label) would be removed from the table and resolved. If a jump length could be assessed as either <128 or >=128, the symbol is still unresolved, but with a different code (value unknown, size known). Same approach for other issues, like using #83 or #81 for ADD, encoding a 1 or 4 byte offset etc.
* In pass 2, loop through the obstruction table and look for records that obstruct others but are themselves no longer obstructed, and resolve them. This way, bar any cyclical obstruction, the list would get empty. I think I just errored out on any cyclical obstruction.
* In pass 3, all instruction fields have their actual code complete, just poke 'em.

The word "database" doesn't imply I was using database.e. It's the abstract database object that I was using.

All this is not tested, because that was the time I started coding in C++ in the OpenEu framework. The stack scheme Rob described doesn't seem too far from this.

CChris
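
For illustration, here is a minimal sketch of the pass 1 / pass 2 bookkeeping above, written in Python rather than Eu. It is not the original code (which was never tested or posted): the names `Instr`, `jump`, `fixed` and `resolve_sizes` are made up for this example, only jump sizes are tracked (not general expressions), and it assumes a conditional jump takes 2 bytes in short form and 6 bytes in long form, as jz does in 32-bit code. Each record keeps a min and max size; a jump is resolved once the min and max distance to its label agree on short or long, and anything left over is reported as a cyclical obstruction.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Optional

SHORT, LONG = 2, 6  # assumed encodings: jz rel8 (2 bytes), jz rel32 (6 bytes)


@dataclass
class Instr:
    """One record per instruction, as built in pass 1 (hypothetical layout)."""
    min_size: int
    max_size: int
    label: Optional[str] = None   # label defined at this instruction, if any
    target: Optional[str] = None  # label this jump refers to, if any


def jump(target, label=None):
    # Every jump starts unresolved: size somewhere between SHORT and LONG.
    return Instr(SHORT, LONG, label, target)


def fixed(size, label=None):
    # Anything whose size is already known (plain instructions, data).
    return Instr(size, size, label)


def resolve_sizes(instrs):
    """Pass 2: settle jump sizes in place and return the final addresses."""
    where = {ins.label: i for i, ins in enumerate(instrs) if ins.label}
    while True:
        # Min and max address of every instruction (plus end of program).
        lo = [0, *accumulate(ins.min_size for ins in instrs)]
        hi = [0, *accumulate(ins.max_size for ins in instrs)]
        pending = [i for i, ins in enumerate(instrs)
                   if ins.min_size != ins.max_size]
        if not pending:
            return lo[:-1]  # min == max everywhere, so these are exact
        progress = False
        for i in pending:
            ins = instrs[i]
            j = where[ins.target]
            if j > i:
                # Forward: bytes between the end of the jump and the label.
                near, far, limit = lo[j] - lo[i + 1], hi[j] - hi[i + 1], 127
            else:
                # Backward: bytes from the label through this jump,
                # assuming the jump itself is short.
                near = lo[i] - lo[j] + ins.min_size
                far = hi[i] - hi[j] + ins.min_size
                limit = 128
            if far <= limit:        # short form fits in the worst case
                ins.max_size = ins.min_size
                progress = True
            elif near > limit:      # short form cannot fit even at best
                ins.min_size = ins.max_size
                progress = True
        if not progress:
            raise ValueError("cyclical obstruction between unresolved jumps")


program = [
    jump("l9"),       # cannot be sized until the inner jump is settled
    fixed(104),
    jump("l1"),
    fixed(10),
    fixed(10, "l1"),
    fixed(1, "l9"),
]
print(resolve_sizes(program))  # [0, 2, 106, 108, 118, 128]
```

In this run the inner `jump("l1")` is settled on the first sweep; that shrinks the maximum distance for `jump("l9")` from 130 to 126 bytes, so it becomes short on the second sweep. Because addresses are recomputed from the sizes on every sweep, nothing has to be "moved" by hand when a jump shrinks. Raising an error on a cycle mirrors what the post describes; falling back to the long form for the remaining jumps would be another option, always safe but not always minimal.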