Re: Faster lower() command
- Posted by Robert Craig <rds at rapideuphoria.com> Jan 21, 2007
Pete Lomax wrote:
> jacques deschênes wrote:
> > Why should the euphoria interpreter create 5 or 6 * 1000000 long sequences?
> >
> > return o + (o>='A' and o<='Z')*32 -- works as well with atom or sequence
>
> }}} <eucode>
> tmp1 = (o>='A')      -- a 1000000 sequence of 0 and 1
> tmp2 = (o<='Z')      -- a 1000000 sequence of 0 and 1
> tmp1 = tmp1 and tmp2 -- ""     ""
> tmp1 *= 32           -- "" of 0 and 32
> tmp1 = o+tmp1        -- "" (the result)
> return tmp1
> </eucode> {{{

I took a quick look at the IL. (First bind the program, then run "ex \euphoria\source\showil lower.exe" and see icode.lst.) It only uses two temps, so it isn't wasteful in that regard.

> I get far more consistent results when called in a for loop, and I have noticed
> tests like this tend to strongly favour whatever comes second; best to run them
> in separate programs in the name of equality.
> There might be a margin of say 5% for the 1-liner, but it is nothing like 4
> or 5 times faster as you might at first expect. The more you play with the above
> two code snippets, the more you realise they are essentially the same.

It shouldn't make much difference whether you have one complicated statement or 5 simple ones. The IL will be about the same.

> > I don't know how the loop is implemented inside the euphoria interpreter, but
> > there is room for optimization!
>
> I could enter a long rant about why sequence ops are such a bad idea, but I
> think I'll spare you today.

I think a major hidden factor here is the on-chip CPU cache. The sequence-ops version can't make good use of the cache. It reads and writes 1000000-long sequences many times, and the data is far too big to be stored in the on-chip data cache. The sequence data must always be stored to, and retrieved from, RAM, since even a secondary cache would not be big enough in this case. The for-loop version does not make good use of the cache either, but it only goes through the sequence data once.
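For readers joining mid-thread, the "for-loop version" being compared against the sequence-ops 1-liner would look something like this (my sketch of the usual idiom, not Pete's exact code):

}}} <eucode>
function lower(sequence s)
-- convert uppercase letters to lowercase, in place
-- each element is read and written at most once, so the data
-- streams through the cache a single time
    for i = 1 to length(s) do
        if s[i] >= 'A' and s[i] <= 'Z' then
            s[i] += 32
        end if
    end for
    return s
end function
</eucode> {{{

Note that it makes one pass and allocates no temporary sequences, whereas the sequence-ops version sweeps over the full 1000000 elements once per operation.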
I suspect the sequence-ops version would look much better if the text sequences were the length of typical text files, such as 80 characters or less. However, if you have very small sequences, the overhead of allocating and deallocating space in the sequence-ops version will become significant. And when you have really huge sequences, the operating system might have to struggle a bit to find that much contiguous RAM, at least on the first iteration. So there is probably a happy medium somewhere.

If you studied this carefully, with a range of sizes, you might find that the sequence-ops version degrades sharply at the point where the data can't fit in the on-chip (32K-byte?) cache anymore, and degrades again at the point where the secondary cache (1Mb?) becomes too small for all the data.

I personally use the for-loop version for scanning typical text files, which average maybe 25-40 characters per line. The Translator can also speed up the for-loop version more than it can the sequence-ops version.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com