1. Writing an assembler

Well, I got a mad idea about writing an assembler (one to create exe files) in
Eu. However, tricky stuff like mod r/m aside, there is one point I really cannot
wrap my head around. Suppose you have:

jz l9
...
jz l8
....
jz l1
...
l1:
....
l8:
...
l9:
...
jmp l5

Now l9,8,7... are forward references so you'd be wise to assume they will need a
4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can be a
one-byte offset, which may mean (3rd pass would be OK) that jz l2 can too.

My question is how do you keep track of where all these bytes are going to be?
When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5?

I think/hope this is a generic problem-solving question:
How do you 'move' multiple things like this at once?
Or, just being told exactly what info I really need to keep might make 1p drop.

I guess a similar question is that if I have say:
"some that some that some that"
and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want those
indexes changed to {6,13,20}. [I may replace instances in any order and consider
that those numbers may be dispersed a bit more, and there may well be literally
many thousands of such inter/independent effects].

Hoping for a hint,
Pete

new topic     » topic index » view message » categorize

2. Re: Writing an assembler

I wrote:
Minor correction needed there, replace:
> (or 6), how/when do you modify that final jmp l5?
with:
how/when do you modify eg the first jmp l9?

Pete
Should have trusted my first instincts there, not modifying stuff on the fly

new topic     » goto parent     » topic index » view message » categorize

3. Re: Writing an assembler

Pete Lomax wrote:
> Well, I got a mad idea about writing an assembler (one to create exe files)
> in Eu. However, tricky stuff like mod r/m aside, there is one point I really
> cannot wrap my head around. Suppose you have:
> 
> jz l9
> ...
> jz l8
> ....
> jz l1
> ...
> l1:
> ....
> l8:
> ...
> l9:
> ...
> jmp l5
> 
> Now l9,8,7... are forward references so you'd be wise to assume they will need
> a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can
> be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can
> too.
> 
> My question is how do you keep track of where all these bytes are going to be?
> When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5?
> 
> I think/hope this is a generic problem-solving question:
> How do you 'move' multiple things like this at once?
> Or, just being told exactly what info I really need to keep might make 1p
> drop.
> 
> I guess a similar question is that if I have say:
> "some that some that some that"
> and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want
> those
> indexes changed to {6,13,20}. [I may replace instances in any order and
> consider
> that those numbers may be dispersed a bit more, and there may well be
> literally
> many thousands of such inter/independent effects].
> 
> Hoping for a hint,
> Pete

Wow. What memories!
Back in 1982, I worked at a company (now defunct)
called AES. Along with another guy (Al Matsuoka are you out there?),
I wrote an assembler for the Zilog Z8000.
(Why didn't they just buy an assembler? It's a long story.
They also built their own C compiler.)

It was a 2-pass assembler. It gathered info on
pass 1 (scanning/parsing), and emitted machine code on pass 2.
Between passes, among other things, it optimized branches, 
to try to get as many short branches as possible. 

One of my tasks was to write the branch optimizer. I still have the
C source code for the assembler (dusty paper listing), 
and I just dug it out from a pile of memorabilia
that I haven't looked at in years. I won't give you the source,
but I can see from my old comments roughly how it worked. 
Apparently I was inspired by an article in CACM Vol 21 No 4 April 1978.
I don't think my code does a provably optimal job,
but it's fairly simple. Reading my ancient code ...

  * assume initially that all branches can be short

  * Set up 2 stacks (Euphoria sequences will do). 
    One for assumed legal branch instructions,
    and one for known illegals.

  * Start by going through and finding instructions with
    illegal branches (distance to the label is too far).
    This will populate your two stacks.

  * iterate until done (no more illegals):

    for each illegal in the illegal stack, convert it to a long 
    (now legal) branch. Propagate it's new long length
    through the stack of assumed legals, updating branch distances 
    and possibly creating new illegal short branches that you push
    onto the illegal stack.

    Eventually the illegal stack will be empty, and you'll be left with
    only legals (i.e. ones that can actually be made short branches,
    or ones that had to be converted to long branches). 

Now you can start pass 2, knowing which way (long or short) 
to emit each branch instruction.

I think I just used "stacks", because I was programming in C
and I could neatly allocate memory and run the stacks in
the opposite direction towards each other, since the total size
of the two stacks was constant. Euphoria sequences should be fine. 
I don't think last-in/first-out really mattered.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

4. Re: Writing an assembler

Robert Craig wrote:
> 
> Pete Lomax wrote:
> > Well, I got a mad idea about writing an assembler (one to create exe files)
> > in Eu. However, tricky stuff like mod r/m aside, there is one point I really
> > cannot wrap my head around. Suppose you have:
> > 
> > jz l9
> > ...
> > jz l8
> > ....
> > jz l1
> > ...
> > l1:
> > ....
> > l8:
> > ...
> > l9:
> > ...
> > jmp l5
> > 
> > Now l9,8,7... are forward references so you'd be wise to assume they will
> > need
> > a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can
> > be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can
> > too.
> > 
> > My question is how do you keep track of where all these bytes are going to
> > be?
> > When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5?
> > 
> > I think/hope this is a generic problem-solving question:
> > How do you 'move' multiple things like this at once?
> > Or, just being told exactly what info I really need to keep might make 1p
> > drop.
> > 
> > I guess a similar question is that if I have say:
> > "some that some that some that"
> > and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want
> > those
> > indexes changed to {6,13,20}. [I may replace instances in any order and
> > consider
> > that those numbers may be dispersed a bit more, and there may well be
> > literally
> > many thousands of such inter/independent effects].
> > 
> > Hoping for a hint,
> > Pete
> 
> Wow. What memories!
> Back in 1982, I worked at a company (now defunct)
> called AES. Along with another guy (Al Matsuoka are you out there?),
> I wrote an assembler for the Zilog Z8000.
> (Why didn't they just buy an assembler? It's a long story.
> They also built their own C compiler.)
> 
> It was a 2-pass assembler. It gathered info on
> pass 1 (scanning/parsing), and emitted machine code on pass 2.
> Between passes, among other things, it optimized branches, 
> to try to get as many short branches as possible. 
> 
> One of my tasks was to write the branch optimizer. I still have the
> C source code for the assembler (dusty paper listing), 
> and I just dug it out from a pile of memorabilia
> that I haven't looked at in years. I won't give you the source,
> but I can see from my old comments roughly how it worked. 
> Apparently I was inspired by an article in CACM Vol 21 No 4 April 1978.
> I don't think my code does a provably optimal job,
> but it's fairly simple. Reading my ancient code ...
> 
>   * assume initially that all branches can be short
> 
>   * Set up 2 stacks (Euphoria sequences will do). 
>     One for assumed legal branch instructions,
>     and one for known illegals.
> 
>   * Start by going through and finding instructions with
>     illegal branches (distance to the label is too far).
>     This will populate your two stacks.
> 
>   * iterate until done (no more illegals):
> 
>     for each illegal in the illegal stack, convert it to a long 
>     (now legal) branch. Propagate it's new long length
>     through the stack of assumed legals, updating branch distances 
>     and possibly creating new illegal short branches that you push
>     onto the illegal stack.
> 
>     Eventually the illegal stack will be empty, and you'll be left with
>     only legals (i.e. ones that can actually be made short branches,
>     or ones that had to be converted to long branches). 
> 
> Now you can start pass 2, knowing which way (long or short) 
> to emit each branch instruction.
> 
> I think I just used "stacks", because I was programming in C
> and I could neatly allocate memory and run the stacks in
> the opposite direction towards each other, since the total size
> of the two stacks was constant. Euphoria sequences should be fine. 
> I don't think last-in/first-out really mattered.
> 
> Regards,
>    Rob Craig
>    Rapid Deployment Software
>    <a href="http://www.RapidEuphoria.com">http://www.RapidEuphoria.com</a>


Wow, that brings back memories for me too.  I did an assembler for the
Z80 chip which at the time was my favorite processor even though there
were faster ones, mainly because it had so many different instructions
and i had a few of those chips on hand, as well as one in a machine
i used regularly ... ha ha.
I eventually designed a controller board around the Z80 too which had mucho
i/o pins, but yeah took up a whole board about 4 by 3 inches and about
3/8 inch thick, with connectors for boot loading and i/o.
Nice thing about the Z80 was all else you really needed was an oscillator chip
maybe with a crystal, a cheap static ram chip and a cheap EPROM and you
had a small computer system!  Used the high address pin for bank switching
ram/EPROM.  Those were the days smile  Well, on the other hand i wouldnt trade
my system now for one of those however.



Take care,
Al

E boa sorte com sua programacao Euphoria!


My bumper sticker: "I brake for LED's"

 From "Black Knight":
"I can live with losing the good fight,
 but i can not live without fighting it".
"Well on second thought, maybe not."

new topic     » goto parent     » topic index » view message » categorize

5. Re: Writing an assembler

Pete Lomax wrote:
> 
> Well, I got a mad idea about writing an assembler (one to create exe files)
> in Eu. However, tricky stuff like mod r/m aside, there is one point I really
> cannot wrap my head around. Suppose you have:
> 
> jz l9
> ...
> jz l8
> ....
> jz l1
> ...
> l1:
> ....
> l8:
> ...
> l9:
> ...
> jmp l5
> 
> Now l9,8,7... are forward references so you'd be wise to assume they will need
> a 4-byte offset. Either at l1 or on a second pass, you figure the jz l1 can
> be a one-byte offset, which may mean (3rd pass would be OK) that jz l2 can
> too.
> 
> My question is how do you keep track of where all these bytes are going to be?
> When you "shrink" jz l1 (or 6), how/when do you modify that final jmp l5?
> 
> I think/hope this is a generic problem-solving question:
> How do you 'move' multiple things like this at once?
> Or, just being told exactly what info I really need to keep might make 1p
> drop.
> 
> I guess a similar question is that if I have say:
> "some that some that some that"
> and I have {6,16,26}, if I replace "that" with "x", somewhere I shall want
> those
> indexes changed to {6,13,20}. [I may replace instances in any order and
> consider
> that those numbers may be dispersed a bit more, and there may well be
> literally
> many thousands of such inter/independent effects].
> 
> Hoping for a hint,
> Pete

I had that same mad idea a few years ago, and dropped it because of the 300 
statement limit at that time. I still have some code on my older computer.

The problem was more general than jumps, since there could be unresolved 
labels in expressions as well. After all, a forward reference is yet 
another unresolved symbol.

I had devised the following database like system:
* in pass 1, decode all text, and create a record per instruction. Among 
other things, that record has a minimal address, a min code size and max code 
size, a max address. Any jump is treated as unresolved.

* At the same time, build another table of unresolved symbols, with a list
of references and an obstruction code, ie indication of why it's unresolved.
A symbol resolved at pass 1 (forward label) would be removed from the table 
and resolved.

If a jump length could be assessed as either <128 or >=128, the symbol is still 
unresolved, but with a different code (value unknown, size known). Same approach
for other issues, like using #83 or #81 for ADD, encode 1 or 4 byte offset etc.

* In pass 2, loop through the obstruction tableand look for records that
are obstructions but have no obstructions, resolve them. This way, bar any 
cyclical obstruction, the list would get empty. I think I just errorred out
 on any cyclical obstruction.

* In pass 3, all instruction fields have their actual code complete, just 
poke 'em.

The woed "database" doesn't imply I was using database.e. It's the abstract 
database object that I was using.

All this not tested, because that was the time I started coding in C++ in the 
OpenEu framework.

The stack scheme Rob exposed doesn't seem too far from this. 

CChris

new topic     » goto parent     » topic index » view message » categorize

6. Re: Writing an assembler

Robert Craig wrote:
> 
[snip]
> 
> Wow. What memories!
> Back in 1982, I worked at a company (now defunct)
> called AES. Along with another guy (Al Matsuoka are you out there?),
> I wrote an assembler for the Zilog Z8000.
> (Why didn't they just buy an assembler? It's a long story.
> They also built their own C compiler.)

Do you mean AES -- as in the dedicated word-processing machine c/w 8" floppies
my secretary used and loved (more than an IBM Selectric) for years?

WoW! You're older than me! ;)) Just kidding! I never wrote an assembler -- just
fiddled with Z80 ML on my Apple 2E Enhanced clone from Taiwan which sported a Z80
CPU as well as Apple's 25?? was it, giving this box CP/M capability. It was all
fun!

[snip]
> 
> Regards,
>    Rob Craig

--
duke
SW of Calgary - near the Rockies
http://www.rootshell.be/~perlster/euphoria/

new topic     » goto parent     » topic index » view message » categorize

7. Re: Writing an assembler

duke normandin wrote:
> Robert Craig wrote:
> [snip]
> > 
> > Wow. What memories!
> > Back in 1982, I worked at a company (now defunct)
> > called AES. Along with another guy (Al Matsuoka are you out there?),
> > I wrote an assembler for the Zilog Z8000.
> > (Why didn't they just buy an assembler? It's a long story.
> > They also built their own C compiler.)
> 
> Do you mean AES -- as in the dedicated word-processing machine c/w 8" floppies
> my secretary used and loved (more than an IBM Selectric) for years?

Yes. That's right.
That machine was a big hit for a few years.
It was designed and built in Canada.
It sold extremely well, and at the time I was hired,
they had grandiose plans for a second generation machine
based on a version of Unix, that would do more than word
processing. Eventually though, after a few years, the 
industry changed, and offices started buying general purpose 
IBM PC's instead of buying special purpose machines that 
could only do word processing. In a short time AES went from
being extremely successful to extremely unsuccessful. The project
dragged on too long and was dropped. Scores of developers were 
laid off, but a month later a dozen of us were hired by IBM to
develop a set of new compilers for the top-secret (at the time)
IBM RISC system. That's how I ended up at the IBM lab in Toronto.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com

new topic     » goto parent     » topic index » view message » categorize

8. Re: Writing an assembler

Robert Craig wrote:
>   * assume initially that all branches can be short
Thanks for replying. It is interesting that you and Tomasz Grysztar (author of
fasm) went that way whereas I instinctively thought "assume long".

Now I've thought about it a bit more, (thanks to everyone who responded) I think
I might try three-address-code-plus-length-sib-modrm-etc, but use simple
subscripts rather than offsets or addresses in the TAC pointing to a secondary
sequence with actual addresses.

Mind you, now that I've thought about it a bit more, I've drastically scaled
down what it is I actually want to achieve here blink)

CChris wrote:
>create a record per instruction. 
Hmm, perhaps my attempts to avoid just that befuddled me ;)

mic _ wrote:
> have fun writing the linker. ... iirc it got pretty ugly ...
I know rva's are tricky, but quite doable. I'll have a look at nqa, ta.

Rob wrote:
>I later learned that "Abandon Hope..." was the inscription
>at the gates of Hell, according to Dante's Divine Comedy.
I first came across that in "adventure" on a pdp-11 unix box, I think just after
getting through the "maze of twisty little passages, all alike" ...
... and I definately dropped a few grade points that term.

Regards,
Pete

new topic     » goto parent     » topic index » view message » categorize

9. Re: Writing an assembler

"adventure"... brings back memories for me too.
It was on my employers os/390 mainframe years ago!

Something to do while awaiting the next tape mount or printer jamb.
I even have the "adventure" source somewhere.

Ahhh... nostalgia isn't what it used to be...   ;))

new topic     » goto parent     » topic index » view message » categorize

Search



Quick Links

User menu

Not signed in.

Misc Menu