1. Brain-scratcher: New example of the delay
- Posted by CladStone Jun 15, 2009
- 1010 views
Okay, I'm hoping I can get someone else to reproduce this and offer some help with it. I've reproduced it on a second computer, but the parameters I had to use were slightly different, so I'm not sure exactly what all affects it.
Again, this seems to be an issue only with exw.exe. EX.EXE runs it a little slower, but it seems consistent, and not as slow as when exw.exe runs it slowly.
For this test, I've used the following. I think if you play around with it, you can reproduce it under whatever environment you normally use, but just so you know mine:
Filename: C:\b.exw
Euphoria version: exw.exe from euphoria_40a3.exe
Method of running:
- Copy exw.exe in root folder (C:\exw.exe)
- Open command prompt and navigate to the root directory (cd\)
- Enter this command: exw.exw b.exw
File format: ANSI text, CR-LF line endings, no extra lines or spaces at the end of lines.
Here's the code:
atom x = machine_func(16, 1000) poke(x, 195) for i = 1 to 8000000 do poke(x + 311, 0) call(x) end for ? {}
Code explanation:
- I start by allocating a 1000 byte buffer.
- I poke in the simplest machine-code program (RET).
- I run a couple lines of code a bunch of times to make the time evident.
- I use the print directive "?" to call up a console window, so you know when the code is finished.
- The program immediately ends, so you'll only see it flash on screen briefly.
Outcome:
- If it behaves like my computer, this program will run for about 2-3 seconds. You'll see the console flash on screen as soon as it's done.
- If the console appears immediately, replace "311" with a smaller number and try again. I'm sure you'll get it to delay before that number reaches zero.
- Now increase "331" to something higher. Before you get to 999, your program should finish immediately rather than delaying.
Check for strange behavior:
- By making half-way changes and running it repeatedly, you'll probably find some point at which it switches from slow to fast. The first fast number is always even. So, for example, you might find that 631 is slow and 632 is fast. Now comes the fun part:
- Convert your first fast number to hexadecimal, with Windows calculator or something. (e.g. 632 = #278)
- In your code, replace 632 with #278, and run it again.
- Do the same with the last slow number. (e.g. 631 = #277)
- You'll now find that instead of one number running fast and the other slow, they now either both run slow or both run fast, even though you're using the same numbers, just written differently. The cutoff point has now changed. If both fast, lower your hex values till you find the slow mode. If both fast, raise your hex value till you find the fast mode.
- Keep playing around with it. Change the numbers in loop, e.g. "1" to "0" or even "#1". Change the value of the number poked during the loop. Any little change like that should change the cutoff point of the offset where slow becomes fast.
Please tell me if you can reproduce this!
2. Re: Brain-scratcher: New example of the delay
- Posted by ne1uno Jun 15, 2009
- 1009 views
is there some way to automate the testing? script or batch file maybe with a pause between tests.
have you tried translation and binding?
3. Re: Brain-scratcher: New example of the delay
- Posted by mattlewis (admin) Jun 15, 2009
- 1021 views
Okay, I'm hoping I can get someone else to reproduce this and offer some help with it. I've reproduced it on a second computer, but the parameters I had to use were slightly different, so I'm not sure exactly what all affects it.
Again, this seems to be an issue only with exw.exe. EX.EXE runs it a little slower, but it seems consistent, and not as slow as when exw.exe runs it slowly.
I'm able to reproduce this on Linux, using r2165. The magic number for me is 32. Anything less than that, and I get the delay. I believe this has something to do with the processor cache and how the OS deals with the memory. I suspect that there is a threshold at which it flushes the changes or something (I'm really pushing the limit of my knowledge here).
The bottom line is that I think there's nothing special about this code from a euphoria standpoint.
Matt
4. Re: Brain-scratcher: New example of the delay
- Posted by CladStone Jun 15, 2009
- 1033 views
Sure, you could come up with a way to automate testing. I thought about it, but didn't get that far.
Regarding translation and/or binding: A capital suggestion; thank you. It took me a little while since I'd never done that before and didn't have a C compiler installed, but I've now tried both.
Binding:
I reproduced the problem using "bindw" to create a few .exe files. One runs fast with the value "x + 230" on line 4. Another runs slow with the value "x + 150". (I didn't take the time to track down the exact "cutoff" point I mentioned earlier - I just wanted to proof that something that ought not make a difference does.) The slow .exe is 2 bytes smaller, and has 29 other different bytes (at least 9 of these are merely shifted), whereas another fast one (using "x + 311") is the same size as the "x + 230" and only different by 4 bytes.
Translation:
The only difference I could produce in the generated C code was between "x + 2" (or higher) and "x + 1", but it only renamed 2 variables and changed a couple lines to make use of the unique number that "1" is, e.g. it used:
vs.
and
vs.
So it seems the problem was reproduced in binding, but not in translating. If you like, I can try to post those .exe files somewhere. I'm not really sure how to reverse engineer them and tell exactly what's going on, though I might try anyway.
5. Re: Brain-scratcher: New example of the delay
- Posted by CladStone Jun 15, 2009
- 1019 views
Thanks for reproducing this for me. Yes, a processor caching issue was my first thought when I came across the delay, but now that I've seen that they bind differently, that seems to defy the idea that that's ALL it is.
6. Re: Brain-scratcher: New example of the delay
- Posted by CladStone Jun 15, 2009
- 995 views
I tried compiling my C code to see what it did, but got a compile error on line 41 of main-.c, so I'm not sure the generated .exe even does what it is supposed to.
7. Re: Brain-scratcher: New example of the delay
- Posted by jimcbrown (admin) Jun 15, 2009
- 1003 views
Thanks for reproducing this for me. Yes, a processor caching issue was my first thought when I came across the delay, but now that I've seen that they bind differently, that seems to defy the idea that that's ALL it is.
I am not so sure about this. The slightly different IL in a bound executable may easily account for this.
8. Re: Brain-scratcher: New example of the delay
- Posted by bug Jun 15, 2009
- 1069 views
See http://en.wikipedia.org/wiki/Self-modifying_code in particular the last sentence of this:
Interaction of cache and self-modifying code
In some cases short sections of self-modifying code executes more slowly on modern processors. This is because a modern processor will usually try to keep blocks of code in its cache memory. Each time the program rewrites a part of itself, the rewritten part must be loaded into the cache again, which results in a slight delay, if the modified codelet shares the same cache line with the modifying code, as is the case when the modified memory address is located within a few bytes to the one of the modifying code.
The cache invalidation issue on modern processors usually means that self-modifying code would still be faster only when the modification will occur rarely, such as in the case of a state switching inside an inner loop.
Most modern processors load the machine code before they execute it, which means that if an instruction that is too near the instruction pointer is modified, the processor will not notice, but instead execute the code as it was before it was modified. See Prefetch Input Queue (PIQ). PC processors have to handle self-modifying code correctly for backwards compatibility reasons but they are far from efficient at doing so[citation needed].
9. Re: Brain-scratcher: New example of the delay
- Posted by mattlewis (admin) Jun 15, 2009
- 985 views
Thanks for reproducing this for me. Yes, a processor caching issue was my first thought when I came across the delay, but now that I've seen that they bind differently, that seems to defy the idea that that's ALL it is.
I suspect it may have to do with where the memory is allocated, and how close to a page boundary it is. The magic values seem to change, but the effect seems consistent, whether translated or bound or interpreted. It really does look like something going on with OS/processor memory management.
Is there some real situation where this issue is causing problems? I don't think there's anything that euphoria can do about this.
Matt
10. Re: Brain-scratcher: New example of the delay
- Posted by CladStone Jun 15, 2009
- 992 views
I am not so sure about this. The slightly different IL in a bound executable may easily account for this.
What's an IL? I don't understand why the code binds differently.
But yes, after a few more tests, I'm willing to concede that this IS a processor caching issue, and that the state is not captured in the bound .exe file. (I found one more like 150 that ran fast, and then I got my 150 to run fast.)
My big question now is about how to avoid the issue. If you'll notice from my original example, there were 2 allocations. The full code was not intended to modify the machine code routine at all. Must I pad any memory block I use for machine code routines with a certain amount of unused memory? If so, do I pad on both ends, or just after? And by how much do I need to pad to make sure I don't run into the issue?
11. Re: Brain-scratcher: New example of the delay
- Posted by LarryMiller Jun 15, 2009
- 994 views
This is an interesting situation and may be worth further investigation. But I don't think the results would be very useful. Windows, along with any other system that currently runs Euphoria, is not a real-time operating system. There are no guarantees regarding the performance of a particular block of code. There are many factors involved, relating to both the CPU and OS. The results may appear to be consistent with a particular system in the short term but this cannot be relied upon. I don't think this is a Euphoria issue, except that an interpreter adds an extra layer of complexity to an already complex system. I think it would be a mistake to attempt to optimize code based on these experimental results.
12. Re: Brain-scratcher: New example of the delay
- Posted by ne1uno Jun 16, 2009
- 1027 views
Sure, you could come up with a way to automate testing.
re the #hex anomaly, I was thinking a slight delay may be caused while interpreted due to the # char being used to signal the start of the new multiline string in eu4. but that wouldn't apply if this was also seen in eu3. I would expect the problem to disappear when translated.
for testing, look at eutest in eu4. you can generate a program or series of programs programatically, write them to disk and run eutest. I believe it can time individual tests and produce a report. it will run any tests in a directory starting with the name t_*. or any named on the commandline.
13. Re: Brain-scratcher: New example of the delay
- Posted by bug Jun 17, 2009
- 984 views
My big question now is about how to avoid the issue. If you'll notice from my original example, there were 2 allocations. The full code was not intended to modify the machine code routine at all. Must I pad any memory block I use for machine code routines with a certain amount of unused memory? If so, do I pad on both ends, or just after? And by how much do I need to pad to make sure I don't run into the issue?
I suspect the best you could do is allocate an nK block up front for all asm code, where by nK I mean 1K, 4K, 32K, or whatever you end up needing. I doubt there is any way for the underlying OS to allocate 'executable' and 'non-executable' requests in separate non-cache-clashing areas...?