1. Writing an assembler
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Feb 21, 2007
- 538 views
Well, I got a mad idea about writing an assembler (one to create exe files) in Eu. However, tricky stuff like mod r/m aside, there is one point I really cannot wrap my head around. Suppose you have: jz l9 ... jz l8 .... jz l1 ... l1: .... l8: ... l9: ... jmp l5 Now l9,8,7... are forward references so you'd be wise to assume they will need a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can too. My question is how do you keep track of where all these bytes are going to be? When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5? I think/hope this is a generic problem-solving question: How do you 'move' multiple things like this at once? Or, just being told exactly what info I really need to keep might make 1p drop. I guess a similar question is that if I have say: "some that some that some that" and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want those indexes changed to {6,13,20}. [I may replace instances in any order and consider that those numbers may be dispersed a bit more, and there may well be literally many thousands of such inter/independent effects]. Hoping for a hint, Pete
2. Re: Writing an assembler
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Feb 21, 2007
- 525 views
I wrote: Minor correction needed there, replace: > (or 6), how/when do you modify that final jmp l5? with: how/when do you modify eg the first jmp l9? Pete Should have trusted my first instincts there, not modifying stuff on the fly
3. Re: Writing an assembler
- Posted by Robert Craig <rds at rapideuphoria.com> Feb 21, 2007
- 529 views
Pete Lomax wrote: > Well, I got a mad idea about writing an assembler (one to create exe files) > in Eu. However, tricky stuff like mod r/m aside, there is one point I really > cannot wrap my head around. Suppose you have: > > jz l9 > ... > jz l8 > .... > jz l1 > ... > l1: > .... > l8: > ... > l9: > ... > jmp l5 > > Now l9,8,7... are forward references so you'd be wise to assume they will need > a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can > be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can > too. > > My question is how do you keep track of where all these bytes are going to be? > When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5? > > I think/hope this is a generic problem-solving question: > How do you 'move' multiple things like this at once? > Or, just being told exactly what info I really need to keep might make 1p > drop. > > I guess a similar question is that if I have say: > "some that some that some that" > and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want > those > indexes changed to {6,13,20}. [I may replace instances in any order and > consider > that those numbers may be dispersed a bit more, and there may well be > literally > many thousands of such inter/independent effects]. > > Hoping for a hint, > Pete Wow. What memories! Back in 1982, I worked at a company (now defunct) called AES. Along with another guy (Al Matsuoka are you out there?), I wrote an assembler for the Zilog Z8000. (Why didn't they just buy an assembler? It's a long story. They also built their own C compiler.) It was a 2-pass assembler. It gathered info on pass 1 (scanning/parsing), and emitted machine code on pass 2. Between passes, among other things, it optimized branches, to try to get as many short branches as possible. One of my tasks was to write the branch optimizer. I still have the C source code for the assembler (dusty paper listing), and I just dug it out from a pile of memorabilia that I haven't looked at in years. I won't give you the source, but I can see from my old comments roughly how it worked. Apparently I was inspired by an article in CACM Vol 21 No 4 April 1978. I don't think my code does a provably optimal job, but it's fairly simple. Reading my ancient code ... * assume initially that all branches can be short * Set up 2 stacks (Euphoria sequences will do). One for assumed legal branch instructions, and one for known illegals. * Start by going through and finding instructions with illegal branches (distance to the label is too far). This will populate your two stacks. * iterate until done (no more illegals): for each illegal in the illegal stack, convert it to a long (now legal) branch. Propagate it's new long length through the stack of assumed legals, updating branch distances and possibly creating new illegal short branches that you push onto the illegal stack. Eventually the illegal stack will be empty, and you'll be left with only legals (i.e. ones that can actually be made short branches, or ones that had to be converted to long branches). Now you can start pass 2, knowing which way (long or short) to emit each branch instruction. I think I just used "stacks", because I was programming in C and I could neatly allocate memory and run the stacks in the opposite direction towards each other, since the total size of the two stacks was constant. Euphoria sequences should be fine. I don't think last-in/first-out really mattered. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
4. Re: Writing an assembler
- Posted by Al Getz <Xaxo at aol.com> Feb 21, 2007
- 525 views
Robert Craig wrote: > > Pete Lomax wrote: > > Well, I got a mad idea about writing an assembler (one to create exe files) > > in Eu. However, tricky stuff like mod r/m aside, there is one point I really > > cannot wrap my head around. Suppose you have: > > > > jz l9 > > ... > > jz l8 > > .... > > jz l1 > > ... > > l1: > > .... > > l8: > > ... > > l9: > > ... > > jmp l5 > > > > Now l9,8,7... are forward references so you'd be wise to assume they will > > need > > a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can > > be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can > > too. > > > > My question is how do you keep track of where all these bytes are going to > > be? > > When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5? > > > > I think/hope this is a generic problem-solving question: > > How do you 'move' multiple things like this at once? > > Or, just being told exactly what info I really need to keep might make 1p > > drop. > > > > I guess a similar question is that if I have say: > > "some that some that some that" > > and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want > > those > > indexes changed to {6,13,20}. [I may replace instances in any order and > > consider > > that those numbers may be dispersed a bit more, and there may well be > > literally > > many thousands of such inter/independent effects]. > > > > Hoping for a hint, > > Pete > > Wow. What memories! > Back in 1982, I worked at a company (now defunct) > called AES. Along with another guy (Al Matsuoka are you out there?), > I wrote an assembler for the Zilog Z8000. > (Why didn't they just buy an assembler? It's a long story. > They also built their own C compiler.) > > It was a 2-pass assembler. It gathered info on > pass 1 (scanning/parsing), and emitted machine code on pass 2. > Between passes, among other things, it optimized branches, > to try to get as many short branches as possible. > > One of my tasks was to write the branch optimizer. I still have the > C source code for the assembler (dusty paper listing), > and I just dug it out from a pile of memorabilia > that I haven't looked at in years. I won't give you the source, > but I can see from my old comments roughly how it worked. > Apparently I was inspired by an article in CACM Vol 21 No 4 April 1978. > I don't think my code does a provably optimal job, > but it's fairly simple. Reading my ancient code ... > > * assume initially that all branches can be short > > * Set up 2 stacks (Euphoria sequences will do). > One for assumed legal branch instructions, > and one for known illegals. > > * Start by going through and finding instructions with > illegal branches (distance to the label is too far). > This will populate your two stacks. > > * iterate until done (no more illegals): > > for each illegal in the illegal stack, convert it to a long > (now legal) branch. Propagate it's new long length > through the stack of assumed legals, updating branch distances > and possibly creating new illegal short branches that you push > onto the illegal stack. > > Eventually the illegal stack will be empty, and you'll be left with > only legals (i.e. ones that can actually be made short branches, > or ones that had to be converted to long branches). > > Now you can start pass 2, knowing which way (long or short) > to emit each branch instruction. > > I think I just used "stacks", because I was programming in C > and I could neatly allocate memory and run the stacks in > the opposite direction towards each other, since the total size > of the two stacks was constant. Euphoria sequences should be fine. > I don't think last-in/first-out really mattered. > > Regards, > Rob Craig > Rapid Deployment Software > <a href="http://www.RapidEuphoria.com">http://www.RapidEuphoria.com</a> Wow, that brings back memories for me too. I did an assembler for the Z80 chip which at the time was my favorite processor even though there were faster ones, mainly because it had so many different instructions and i had a few of those chips on hand, as well as one in a machine i used regularly ... ha ha. I eventually designed a controller board around the Z80 too which had mucho i/o pins, but yeah took up a whole board about 4 by 3 inches and about 3/8 inch thick, with connectors for boot loading and i/o. Nice thing about the Z80 was all else you really needed was an oscillator chip maybe with a crystal, a cheap static ram chip and a cheap EPROM and you had a small computer system! Used the high address pin for bank switching ram/EPROM. Those were the days Well, on the other hand i wouldnt trade my system now for one of those however. Take care, Al E boa sorte com sua programacao Euphoria! My bumper sticker: "I brake for LED's" From "Black Knight": "I can live with losing the good fight, but i can not live without fighting it". "Well on second thought, maybe not."
5. Re: Writing an assembler
- Posted by CChris <christian.cuvier at agriculture.gouv.fr> Feb 21, 2007
- 519 views
Pete Lomax wrote: > > Well, I got a mad idea about writing an assembler (one to create exe files) > in Eu. However, tricky stuff like mod r/m aside, there is one point I really > cannot wrap my head around. Suppose you have: > > jz l9 > ... > jz l8 > .... > jz l1 > ... > l1: > .... > l8: > ... > l9: > ... > jmp l5 > > Now l9,8,7... are forward references so you'd be wise to assume they will need > a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can > be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can > too. > > My question is how do you keep track of where all these bytes are going to be? > When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5? > > I think/hope this is a generic problem-solving question: > How do you 'move' multiple things like this at once? > Or, just being told exactly what info I really need to keep might make 1p > drop. > > I guess a similar question is that if I have say: > "some that some that some that" > and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want > those > indexes changed to {6,13,20}. [I may replace instances in any order and > consider > that those numbers may be dispersed a bit more, and there may well be > literally > many thousands of such inter/independent effects]. > > Hoping for a hint, > Pete I had that same mad idea a few years ago, and dropped it because of the 300 statement limit at that time. I still have some code on my older computer. The problem was more general than jumps, since there could be unresolved labels in expressions as well. After all, a forward reference is yet another unresolved symbol. I had devised the following database like system: * in pass 1, decode all text, and create a record per instruction. Among other things, that record has a minimal address, a min code size and max code size, a max address. Any jump is treated as unresolved. * At the same time, build another table of unresolved symbols, with a list of references and an obstruction code, ie indication of why it's unresolved. A symbol resolved at pass 1 (forward label) would be removed from the table and resolved. If a jump length could be assessed as either <128 or >=128, the symbol is still unresolved, but with a different code (value unknown, size known). Same approach for other issues, like using #83 or #81 for ADD, encode 1 or 4 byte offset etc. * In pass 2, loop through the obstruction tableand look for records that are obstructions but have no obstructions, resolve them. This way, bar any cyclical obstruction, the list would get empty. I think I just errorred out on any cyclical obstruction. * In pass 3, all instruction fields have their actual code complete, just poke 'em. The woed "database" doesn't imply I was using database.e. It's the abstract database object that I was using. All this not tested, because that was the time I started coding in C++ in the OpenEu framework. The stack scheme Rob exposed doesn't seem too far from this. CChris
6. Re: Writing an assembler
- Posted by duke normandin <dnormandin at bsdrocksperlrolls.com> Feb 21, 2007
- 555 views
Robert Craig wrote: > [snip] > > Wow. What memories! > Back in 1982, I worked at a company (now defunct) > called AES. Along with another guy (Al Matsuoka are you out there?), > I wrote an assembler for the Zilog Z8000. > (Why didn't they just buy an assembler? It's a long story. > They also built their own C compiler.) Do you mean AES -- as in the dedicated word-processing machine c/w 8" floppies my secretary used and loved (more than an IBM Selectric) for years? WoW! You're older than me! ;)) Just kidding! I never wrote an assembler -- just fiddled with Z80 ML on my Apple 2E Enhanced clone from Taiwan which sported a Z80 CPU as well as Apple's 25?? was it, giving this box CP/M capability. It was all fun! [snip] > > Regards, > Rob Craig -- duke SW of Calgary - near the Rockies http://www.rootshell.be/~perlster/euphoria/
7. Re: Writing an assembler
- Posted by Robert Craig <rds at RapidEuphoria.com> Feb 21, 2007
- 512 views
duke normandin wrote: > Robert Craig wrote: > [snip] > > > > Wow. What memories! > > Back in 1982, I worked at a company (now defunct) > > called AES. Along with another guy (Al Matsuoka are you out there?), > > I wrote an assembler for the Zilog Z8000. > > (Why didn't they just buy an assembler? It's a long story. > > They also built their own C compiler.) > > Do you mean AES -- as in the dedicated word-processing machine c/w 8" floppies > my secretary used and loved (more than an IBM Selectric) for years? Yes. That's right. That machine was a big hit for a few years. It was designed and built in Canada. It sold extremely well, and at the time I was hired, they had grandiose plans for a second generation machine based on a version of Unix, that would do more than word processing. Eventually though, after a few years, the industry changed, and offices started buying general purpose IBM PC's instead of buying special purpose machines that could only do word processing. In a short time AES went from being extremely successful to extremely unsuccessful. The project dragged on too long and was dropped. Scores of developers were laid off, but a month later a dozen of us were hired by IBM to develop a set of new compilers for the top-secret (at the time) IBM RISC system. That's how I ended up at the IBM lab in Toronto. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
8. Re: Writing an assembler
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Feb 22, 2007
- 535 views
Robert Craig wrote: > * assume initially that all branches can be short Thanks for replying. It is interesting that you and Tomasz Grysztar (author of fasm) went that way whereas I instinctively thought "assume long". Now I've thought about it a bit more, (thanks to everyone who responded) I think I might try three-address-code-plus-length-sib-modrm-etc, but use simple subscripts rather than offsets or addresses in the TAC pointing to a secondary sequence with actual addresses. Mind you, now that I've thought about it a bit more, I've drastically scaled down what it is I actually want to achieve here ) CChris wrote: >create a record per instruction. Hmm, perhaps my attempts to avoid just that befuddled me ;) mic _ wrote: > have fun writing the linker. ... iirc it got pretty ugly ... I know rva's are tricky, but quite doable. I'll have a look at nqa, ta. Rob wrote: >I later learned that "Abandon Hope..." was the inscription >at the gates of Hell, according to Dante's Divine Comedy. I first came across that in "adventure" on a pdp-11 unix box, I think just after getting through the "maze of twisty little passages, all alike" ... ... and I definately dropped a few grade points that term. Regards, Pete
9. Re: Writing an assembler
- Posted by Alan Oxley <fizzpop at axemail.co.za> Feb 22, 2007
- 529 views
"adventure"... brings back memories for me too. It was on my employers os/390 mainframe years ago! Something to do while awaiting the next tape mount or printer jamb. I even have the "adventure" source somewhere. Ahhh... nostalgia isn't what it used to be... ;))