1. Constructing sequences
- Posted by Gordon Webster <gwalias-bb at yahoo.com> Jun 27, 2005
- 628 views
Dear Euphorians Overall, I'm very happy with the performance of Euphoria (both interpreted and compiled from translated C), but I've noticed that constructing sequences seems to be pretty slow by comparison with accessing them once they are constructed. When I add 100,000 smaller sequences (my data "records") to a storage sequence, it takes me nearly 40 seconds either interpreted or compiled, to complete. Once this 100,000 record sequence is built however, I can access the various "records" and/or their "fields" at blazing speeds. I get no speed up on the compiled version for the construction of the sequence, but I do get some speedup for the already very rapid access to the built sequence. I have even tried making the sequence to be constructed global, and using a procedure to append to it, rather than a function which has to pass a copy of the sequence. It made no discernible difference. I have read the performance tips about using the optimized form 'append' etc. etc., but if any of you out there have any experience of improving the speed of creation of large sequences, I'd be very grateful for any tips or pointers (no pun intended). Best Gordon
2. Re: Constructing sequences
- Posted by Bernie Ryan <xotron at bluefrog.com> Jun 27, 2005
- 591 views
Gordon Webster wrote: > > Dear Euphorians > > Overall, I'm very happy with the performance of Euphoria (both > interpreted and compiled from translated C), but I've noticed > that constructing sequences seems to be pretty slow by comparison > with accessing them once they are constructed. > > When I add 100,000 smaller sequences (my data "records") to a storage > sequence, it takes me nearly 40 seconds either interpreted or > compiled, to complete. Once this 100,000 record sequence is built > however, I can access the various "records" and/or their "fields" > at blazing speeds. I get no speed up on the compiled version for > the construction of the sequence, but I do get some speedup for the > already very rapid access to the built sequence. > > I have even tried making the sequence to be constructed global, and > using a procedure to append to it, rather than a function which has > to pass a copy of the sequence. It made no discernible difference. > > I have read the performance tips about using the optimized form > 'append' etc. etc., but if any of you out there have any experience > of improving the speed of creation of large sequences, I'd be very > grateful for any tips or pointers (no pun intended). > > Best Gordon: I have found that thing slow way down when using many appends is version 2.5 Eu. If you can define the complete large sequence all at once then that will be faster without appends. Have you thought of using a database to store the information your keeping in the small sequences instead of adding them to a large sequence. Bernie My files in archive: w32engin.ew mixedlib.e eu_engin.e win32eru.exw Can be downloaded here: http://www.rapideuphoria.com/cgi-bin/asearch.exu?dos=on&win=on&lnx=on&gen=on&keywords=bernie+ryan
3. Re: Constructing sequences
- Posted by Gordon Webster <gwalias-bb at yahoo.com> Jun 27, 2005
- 581 views
Bernie Ryan wrote: > > Gordon Webster wrote: > > > > Dear Euphorians > > > > Overall, I'm very happy with the performance of Euphoria (both > > interpreted and compiled from translated C), but I've noticed > > that constructing sequences seems to be pretty slow by comparison > > with accessing them once they are constructed. > > > > When I add 100,000 smaller sequences (my data "records") to a storage > > sequence, it takes me nearly 40 seconds either interpreted or > > compiled, to complete. Once this 100,000 record sequence is built > > however, I can access the various "records" and/or their "fields" > > at blazing speeds. I get no speed up on the compiled version for > > the construction of the sequence, but I do get some speedup for the > > already very rapid access to the built sequence. > > > > I have even tried making the sequence to be constructed global, and > > using a procedure to append to it, rather than a function which has > > to pass a copy of the sequence. It made no discernible difference. > > > > I have read the performance tips about using the optimized form > > 'append' etc. etc., but if any of you out there have any experience > > of improving the speed of creation of large sequences, I'd be very > > grateful for any tips or pointers (no pun intended). > > > > Best > > Gordon: > > I have found that thing slow way down when using many appends > is version 2.5 Eu. If you can define the complete large sequence > all at once then that will be faster without appends. > > Have you thought of using a database to store the information > your keeping in the small sequences instead of adding them to > a large sequence. > > Bernie > > My files in archive: > w32engin.ew mixedlib.e eu_engin.e win32eru.exw > > Can be downloaded here: > <a > href="http://www.rapideuphoria.com/cgi-bin/asearch.exu?dos=on&win=on&lnx=on&gen=on&keywords=bernie+ryan">http://www.rapideuphoria.com/cgi-bin/asearch.exu?dos=on&win=on&lnx=on&gen=on&keywords=bernie+ryan</a> > Thanks for the reply Bernie I am reading the data records in from a file so I don't see an obvious way I could build the whole sequence at once. Wouldn't I have to temporariliy store the records in another sequence, incurring the same 'append' overhead anyway? The other aspect of this is that parts of the data records are going to be used for heavy numerical computation and I need to be able to construct numerical vectors and matrices really fast, using certain fields of my data records. I didn't consider using a database for this, since my data records are very simple and well defined and I was worried that accessing fields from a db would be a lot slower than accessing atoms from a sequence. I'm really shooting for "lean and fast" here since I may have as many as a million records to process and I want to keep the memory overhead down as well. Perhaps I should be thinking of using a db. Would it bloat my memory overhead much, or slow things down compared to using raw sequences? It seems that once I've built my raw sequence, I can access it very very rapidly indeed, but if I don't use a db, is it possible to speed up the construction of sequences in memory? Best Gordon
4. Re: Constructing sequences
- Posted by Matt Lewis <matthewwalkerlewis at gmail.com> Jun 27, 2005
- 628 views
Gordon Webster wrote: > > Dear Euphorians > > Overall, I'm very happy with the performance of Euphoria (both > interpreted and compiled from translated C), but I've noticed > that constructing sequences seems to be pretty slow by comparison > with accessing them once they are constructed. > > When I add 100,000 smaller sequences (my data "records") to a storage > sequence, it takes me nearly 40 seconds either interpreted or > compiled, to complete. Once this 100,000 record sequence is built > however, I can access the various "records" and/or their "fields" > at blazing speeds. I get no speed up on the compiled version for > the construction of the sequence, but I do get some speedup for the > already very rapid access to the built sequence. How are you adding the smaller sequences to the larger sequence? If you continue to simply append sequences to another, you will be forcing Euphoria to reallocate memory many times (it does allocate more than it needs, but you'll still be wasting a lot of time doing this). If you know how many (or even about how many) you can get a big speedup. Try, for instance (pseudocode):
storage_sequence = repeat( 0, 100000 ) while reading_data do -- read data... storage_sequence[ix] = new_data ix += 1 end while
You should see a big increase in speed. You can also increase the size by some fixed (or changing) size. Either way, I think you'll find a nice speed increase. Matt Lewis
5. Re: Constructing sequences
- Posted by Gordon Webster <gwalias-bb at yahoo.com> Jun 27, 2005
- 589 views
Matt Lewis wrote: > > Gordon Webster wrote: > > > > Dear Euphorians > > > > Overall, I'm very happy with the performance of Euphoria (both > > interpreted and compiled from translated C), but I've noticed > > that constructing sequences seems to be pretty slow by comparison > > with accessing them once they are constructed. > > > > When I add 100,000 smaller sequences (my data "records") to a storage > > sequence, it takes me nearly 40 seconds either interpreted or > > compiled, to complete. Once this 100,000 record sequence is built > > however, I can access the various "records" and/or their "fields" > > at blazing speeds. I get no speed up on the compiled version for > > the construction of the sequence, but I do get some speedup for the > > already very rapid access to the built sequence. > > How are you adding the smaller sequences to the larger sequence? If you > continue to simply append sequences to another, you will be forcing Euphoria > to reallocate memory many times (it does allocate more than it needs, but > you'll still be wasting a lot of time doing this). If you know how many > (or even about how many) you can get a big speedup. Try, for instance > (pseudocode): > }}} <eucode> > storage_sequence = repeat( 0, 100000 ) > while reading_data do > -- read data... > storage_sequence[ix] = new_data > ix += 1 > end while > <font color="#330033"></eucode> {{{ </font> > You should see a big increase in speed. You can also increase the size by > some fixed (or changing) size. Either way, I think you'll find a nice > speed increase. > > Matt Lewis > Aaaaaaaaaaah! Thanks Matt It's all the individual memory allocations that take the time on the 'append' operation. Using your method of pre-allocating memory in blocks, my 40 second sequence build comes down to 0.05s - an 800x speedup! BTW: just for fun, when I compiled this, the time was too small to be displayed by the time() function for 100,000 records added. I will allocate and expand my sequences in blocks from now on. Thanks for the help - I really appreciate it! Best Gordon
6. Re: Constructing sequences
- Posted by "Igor Kachan" <kinz at peterlink.ru> Jun 27, 2005
- 597 views
Gordon Webster wrote: [snip] > I am reading the data records in from a file so I don't see an > obvious way I could build the whole sequence at once. Wouldn't > I have to temporariliy store the records in another sequence, > incurring the same 'append' overhead anyway? [snip] Gordon, try please:
-- file aa.ex include file.e function Get_Big_File(object name) object file file = dir(name) file = file[1][3] file = repeat(0, file) name = open(name, "rb") for i=1 to length(file) do file[i] = getc(name) end for close(name) return file end function puts(1, Get_Big_File("aa.ex")) -- end of file aa.ex
Just copy/paste this code into NotePad, save it as aa.ex, and then run it from ed.ex. Good Luck! Regards, Igor Kachan kinz at peterlink.ru
7. Re: Constructing sequences
- Posted by Robert Craig <rds at RapidEuphoria.com> Jun 27, 2005
- 599 views
Gordon Webster wrote: > Aaaaaaaaaaah! Thanks Matt > > It's all the individual memory allocations that take the time on the > 'append' operation. Using your method of pre-allocating memory in blocks, > my 40 second sequence build comes down to 0.05s - an 800x speedup! > > BTW: just for fun, when I compiled this, the time was too small to be > displayed by the time() function for 100,000 records added. > > I will allocate and expand my sequences in blocks from now on. That sounds like a good solution, however I did a test on my Pentium-4 1.8GHz machine...
integer fn object line sequence buffer -- create file fn = open("bigfile.txt", "wb") for i = 1 to 100000 do puts(fn, 'A' + rand(repeat(100, 19 + rand(11)))) puts(fn, '\n') end for close(fn) -- file is about 2.5Mb -- read file fn = open("bigfile.txt", "rb") -- 100,000 lines of 20 to 30 random chars atom t t = time() buffer = {} while 1 do line = gets(fn) if atom(line) then exit end if buffer = append(buffer, line) end while close(fn) ? time() - t -- 4.06 seconds, 1.8GHz Pentium-4 ? length(buffer) -- 100,000
With 2.5, ex.exe took 4.06 seconds. With 2.4, ex.exe took 4.34 seconds. Maybe your machine is much slower, or maybe you were not appending to a simple variable, like "buffer", but rather to a subscripted variable. The latter case is not optimized as well. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
8. Re: Constructing sequences
- Posted by Gordon Webster <gwalias-bb at yahoo.com> Jun 27, 2005
- 621 views
Robert Craig wrote: > > Gordon Webster wrote: > > Aaaaaaaaaaah! Thanks Matt > > > > It's all the individual memory allocations that take the time on the > > 'append' operation. Using your method of pre-allocating memory in blocks, > > my 40 second sequence build comes down to 0.05s - an 800x speedup! > > > > BTW: just for fun, when I compiled this, the time was too small to be > > displayed by the time() function for 100,000 records added. > > > > I will allocate and expand my sequences in blocks from now on. > > That sounds like a good solution, however > I did a test on my Pentium-4 1.8GHz machine... > > }}} <eucode> > integer fn > object line > sequence buffer > > -- create file > fn = open("bigfile.txt", "wb") > for i = 1 to 100000 do > puts(fn, 'A' + rand(repeat(100, 19 + rand(11)))) > puts(fn, '\n') > end for > close(fn) -- file is about 2.5Mb > > -- read file > fn = open("bigfile.txt", "rb") -- 100,000 lines of 20 to 30 random chars > > atom t > t = time() > > buffer = {} > while 1 do > line = gets(fn) > if atom(line) then > exit > end if > buffer = append(buffer, line) > end while > > close(fn) > > ? time() - t -- 4.06 seconds, 1.8GHz Pentium-4 > > ? length(buffer) -- 100,000 > <font color="#330033"></eucode> {{{ </font> > > With 2.5, ex.exe took 4.06 seconds. > With 2.4, ex.exe took 4.34 seconds. > > Maybe your machine is much slower, or maybe you > were not appending to a simple variable, like "buffer", > but rather to a subscripted variable. The latter case > is not optimized as well. > > Regards, > Rob Craig > Rapid Deployment Software > <a href="http://www.RapidEuphoria.com">http://www.RapidEuphoria.com</a> > Dear Rob, You are absolutely right, I was writing to a subscripted variable. I had created a sequence whose first few variables are the sequence 'config', and whose last variable is another sequence destined to hold the actual data records. In effect was 'appending' to mysequence[$]. I didn't realise that there was such a difference between subscripted and unsubscripted variables for this purpose (and my machine is pretty fast - a 3.2GHz Pentium). Thanks for the tip. This is something I might never have discovered for myself. This list is a terrific resource and I have had some great help here, for which I'm very grateful. Coding in Euphoria is certainly a pleasure. I love how lean, clean and readable my code is and (an important feature for me) how fast it runs ... ... when you do things the "right" way, as I'm learning today I know that you have a "Performance Tips" section in the online docs, but it might be worth compiling some more of this kind of advice into it. Trawling through 9 years worth of the EuForum is not so easy - the search is lighning quick, but wading through the output can be a slog. It would really be a shame if a newcomer to Euphoria like myself, were to try out the language without being aware of these things, erroneously conclude that Perl/Python etc. (pick your favorite interpreter) was faster, and go looking elsewhere for speedy code. Best Gordon
9. Re: Constructing sequences
- Posted by Derek Parnell <ddparnell at bigpond.com> Jun 27, 2005
- 606 views
- Last edited Jun 28, 2005
Robert Craig wrote: > With 2.5, ex.exe took 4.06 seconds. Another 'instant' speed up is use exwc.exe when running on a Windows platform. I used the same code as Robert here and on my Pentium 3, 550MHz machine I got 0.45 seconds using exwc.exe -- Derek Parnell Melbourne, Australia irc://irc.sorcery.net:9000/euphoria
10. Re: Constructing sequences
- Posted by Rubens <rml at rubis.trix.net> Jun 27, 2005
- 619 views
- Last edited Jun 28, 2005
Hi Rob... There is something wrong... 0.13s exw test.exw 0.11s exwc test.exw 2.54s ex test.ex (!!!!) Pentium 2.4Ghz win XP. Rubens At 17:20 27/6/2005, you wrote: > > >posted by: Derek Parnell <ddparnell at bigpond.com> > >Robert Craig wrote: > > With 2.5, ex.exe took 4.06 seconds. > >Another 'instant' speed up is use exwc.exe when running on a Windows >platform. I used the same code as Robert here and on my Pentium 3, >550MHz machine I got 0.45 seconds using exwc.exe > >-- >Derek Parnell >Melbourne, Australia >irc://irc.sorcery.net:9000/euphoria > > >
11. Re: Constructing sequences
- Posted by Robert Craig <rds at RapidEuphoria.com> Jun 27, 2005
- 600 views
- Last edited Jun 28, 2005
Rubens wrote: > Hi Rob... > There is something wrong... > > 0.13s exw test.exw > 0.11s exwc test.exw > 2.54s ex test.ex (!!!!) > > Pentium 2.4Ghz win XP. > > Rubens > > > At 17:20 27/6/2005, you wrote: > > > > > >posted by: Derek Parnell <ddparnell at bigpond.com> > > > >Robert Craig wrote: > > > With 2.5, ex.exe took 4.06 seconds. > > > >Another 'instant' speed up is use exwc.exe when running on a Windows > >platform. I used the same code as Robert here and on my Pentium 3, > >550MHz machine I got 0.45 seconds using exwc.exe I get: ex.exe 4.06 seconds exwc.exe 0.16 seconds yet ex and exwc are roughly equal on the sieve8k.exw benchmark. I think the reason for this is that I'm using WATCOM's storage allocator for ex.exe, but I'm using Microsoft's allocator for exw.exe and exwc.exe. I think WATCOM's allocator scans through all the (up to 100000) allocated blocks, looking for one that has been marked free, before it gives up and allocates a new block. Most other allocators maintain a free list, so they would know instantly that there are no free blocks, and they would allocate a new block. WATCOM is faster at freeing blocks, and generally works ok when there's a mix of allocates and frees happening. It's just the case of thousands of allocates in a row with no frees where it falls down. All of this shows that storage allocation algorithms can have a big, sometimes huge, impact on performance. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
12. Re: Constructing sequences
- Posted by Greg Haberek <ghaberek at gmail.com> Jun 27, 2005
- 598 views
- Last edited Jun 28, 2005
> 0.13s exw test.exw > 0.11s exwc test.exw > 2.54s ex test.ex (!!!!) LOL :) There's nothing wrong - that's just Windows XP DOS emulation rearing its ugly face again. Since it's fully emulating DOS, there's *a lot* of overhead with memory and such. Try to use exwc for any command-line programs, unless you need to specifically use DOS for some reason. ~Greg
13. Re: Constructing sequences
- Posted by Robert Craig <rds at RapidEuphoria.com> Jun 27, 2005
- 609 views
- Last edited Jun 28, 2005
Greg Haberek wrote: > > 0.13s exw test.exw > > 0.11s exwc test.exw > > 2.54s ex test.ex (!!!!) > > LOL :) There's nothing wrong - that's just Windows XP DOS emulation > rearing its ugly face again. Since it's fully emulating DOS, there's > *a lot* of overhead with memory and such. Try to use exwc for any > command-line programs, unless you need to specifically use DOS for > some reason. If I comment out the "append" line in my example, then I get 0.11 (instead of 4.06) for ex, and 0.05 for exwc, so the I/O is a bit faster when using exwc, but it's the append that is dominating the time for ex. Regards, Rob Craig Rapid Deployment Software http://www.RapidEuphoria.com
14. Re: Constructing sequences
- Posted by "Kat" <gertie at visionsix.com> Jun 28, 2005
- 618 views
On 27 Jun 2005, at 7:43, Gordon Webster wrote: > > > posted by: Gordon Webster <gwalias-bb at yahoo.com> > > Dear Euphorians > > Overall, I'm very happy with the performance of Euphoria (both > interpreted and compiled from translated C), but I've noticed > that constructing sequences seems to be pretty slow by comparison > with accessing them once they are constructed. > > When I add 100,000 smaller sequences (my data "records") to a storage > sequence, it takes me nearly 40 seconds either interpreted or > compiled, to complete. Once this 100,000 record sequence is built > however, I can access the various "records" and/or their "fields" > at blazing speeds. I get no speed up on the compiled version for > the construction of the sequence, but I do get some speedup for the > already very rapid access to the built sequence. > > I have even tried making the sequence to be constructed global, and > using a procedure to append to it, rather than a function which has > to pass a copy of the sequence. It made no discernible difference. > > I have read the performance tips about using the optimized form > 'append' etc. etc., but if any of you out there have any experience > of improving the speed of creation of large sequences, I'd be very > grateful for any tips or pointers (no pun intended). See the last email thread about getf(). It's now extremely fast at building a sequence in memory from a harddrive file. HTH. Kat
15. Re: Constructing sequences
- Posted by Mario Steele <eumario at trilake.net> Jun 28, 2005
- 626 views
Allright, Considering someone posted something about file i/o (Including Kat), about this, and the overwhelming evidence that pre-allocation is best way to go, here should be an extreamly fast version of loading a file into Memory (AKA Sequence):
function fast_read(sequence fname) sequence file_block integer fh, size fh = open(fname,"rb") if seek(fh,-1) then end if size = where(fh) if seek(fh,0) then end if file_block = repeat(0,size) for i = 1 to size do file_block[i] = getc(fh) end for close(fh) return file_block end function
This code is untested, as I formulated it out in my head, but it should prove the point that it will be faster. I use seek() instead of dir(), cause sometimes, dir() Rely's on the Operating System to get the size, and that can prove to be slow. seek() is faster, cause you open the file allready, or have it open, seek() is easier to reset the file pointers, rewind, and such, and I have yet to see an error occur from seek() where it goes to the end of the file, then back to the first byte in the file. And that's from countless usages that I have used this method for. Mario Steele http://enchantedblade.trilake.net Attaining World Dominiation, one byte at a time...
16. Re: Constructing sequences
- Posted by "Igor Kachan" <kinz at peterlink.ru> Jun 28, 2005
- 589 views
Mario Steele wrote: > posted by: Mario Steele <eumario at trilake.net> > > Allright, > > Considering someone posted something about file i/o (Including Kat), > about this, and the overwhelming evidence that pre-allocation is best > way to go, here should be an extreamly fast version of loading a file > into Memory (AKA Sequence): > > }}} <eucode> > function fast_read(sequence fname) > sequence file_block > integer fh, size > fh = open(fname,"rb") > if seek(fh,-1) then end if > size = where(fh) > if seek(fh,0) then end if > file_block = repeat(0,size) > for i = 1 to size do > file_block[i] = getc(fh) > end for > close(fh) > return file_block > end function > </eucode> {{{ > > This code is untested, as I formulated it out in my head, but it should > prove the point that it will be faster. I use seek() instead of dir(), > cause sometimes, dir() Rely's on the Operating System to get the size, > and that can prove to be slow. seek() is faster, cause you open the file > allready, or have it open, seek() is easier to reset the file pointers, > rewind, and such, and I have yet to see an error occur from seek() where > it goes to the end of the file, then back to the first byte in the file. > And that's from countless usages that I have used this method for. You are right, I think, about pre-allocation. So the question is - what method is fastest for determining of the file size to read file into already prepared sequence. I have tested the following code:
--- file cc.e sequence F integer fn procedure a(sequence name) F=machine_func(22, name) -- dir F=repeat(0, F[1][3]) fn = open(name, "rb") close(fn) end procedure procedure b(sequence name) fn = open(name,"rb") if machine_func(19,{fn,-1}) then end if --seek F=repeat(0, machine_func(20,fn)) --where if machine_func(19,{fn,0}) then end if --seek close(fn) end procedure atom T T=time() for i=1 to 5000 do a("cc.e") end for ? time() - T T=time() for i=1 to 5000 do b("cc.e") end for ? time() - T -- end of file cc.e
My results on P166MMX/Win95 64M RAM. ------------ DOS Window, ex.exe: a() 7.8 7.74 7.75 7.69 7.8 b() 4.61 4.62 4.61 4.62 4.67 Plane DOS32 session: a() 2.75 2.8 2.86 2.75 2.75 b() 2.14 2.14 2.14 2.14 2.2 Windows console, exw.exe: a() 4.4 4.51 4.68 4.39 4.49 b() 3.67 3.68 3.67 3.68 3.73 ----------- So, the fastest way is seek/where/seek with direct calls of needed machine_func(). But dir() is not that bad too, I think. Adds just a small fraction of one millisecond to the process, but gives very clear code. Regards, Igor Kachan kinz at peterlink.ru
17. Re: Constructing sequences
- Posted by codepilot Gmail Account <codepilot at gmail.com> Jun 28, 2005
- 591 views
In WIN32 you could just use GetFileSizeEx or GetFileSize, exported from kernel32.dll. But I don't know how fast it is. Dan On 6/28/05, Igor Kachan <kinz at peterlink.ru> wrote: > > > Mario Steele wrote: > > > posted by: Mario Steele <eumario at trilake.net> > > > > Allright, > > > > Considering someone posted something about file i/o (Including Kat), > > about this, and the overwhelming evidence that pre-allocation is best > > way to go, here should be an extreamly fast version of loading a file > > into Memory (AKA Sequence): > > > > }}} <eucode> > > function fast_read(sequence fname) > > sequence file_block > > integer fh, size > > fh = open(fname,"rb") > > if seek(fh,-1) then end if > > size = where(fh) > > if seek(fh,0) then end if > > file_block = repeat(0,size) > > for i = 1 to size do > > file_block[i] = getc(fh) > > end for > > close(fh) > > return file_block > > end function > > </eucode> {{{ > > > > This code is untested, as I formulated it out in my head, but it should > > prove the point that it will be faster. I use seek() instead of dir(), > > cause sometimes, dir() Rely's on the Operating System to get the size, > > and that can prove to be slow. seek() is faster, cause you open the fi= le > > allready, or have it open, seek() is easier to reset the file pointers, > > rewind, and such, and I have yet to see an error occur from seek() wher= e > > it goes to the end of the file, then back to the first byte in the file= . > > And that's from countless usages that I have used this method for. > > You are right, I think, about pre-allocation. > So the question is - what method is fastest for > determining of the file size to read file into > already prepared sequence. > > I have tested the following code: > > }}} <eucode> > --- file cc.e > sequence F > integer fn >=20 > procedure a(sequence name) > F=machine_func(22, name) -- dir > F=repeat(0, F[1][3]) > fn = open(name, "rb") > close(fn) > end procedure >=20 > procedure b(sequence name) > fn = open(name,"rb") > if machine_func(19,{fn,-1}) then end if --seek > F=repeat(0, machine_func(20,fn)) --where > if machine_func(19,{fn,0}) then end if --seek > close(fn) > end procedure >=20 > atom T > T=time() > for i=1 to 5000 do > a("cc.e") > end for > ? time() - T >=20 > T=time() > for i=1 to 5000 do > b("cc.e") > end for > ? time() - T > -- end of file cc.e >=20 > </eucode> {{{ > > My results on P166MMX/Win95 64M RAM. > ------------ > DOS Window, ex.exe: > > a() 7.8 7.74 7.75 7.69 7.8 > b() 4.61 4.62 4.61 4.62 4.67 > > Plane DOS32 session: > > a() 2.75 2.8 2.86 2.75 2.75 > b() 2.14 2.14 2.14 2.14 2.2 > > Windows console, exw.exe: > > a() 4.4 4.51 4.68 4.39 4.49 > b() 3.67 3.68 3.67 3.68 3.73 > ----------- > > So, the fastest way is seek/where/seek > with direct calls of needed machine_func(). > > But dir() is not that bad too, I think. > Adds just a small fraction of one millisecond > to the process, but gives very clear code. > > Regards, > Igor Kachan > kinz at peterlink.ru > > > > > > >
18. Re: Constructing sequences
- Posted by Derek Parnell <ddparnell at bigpond.com> Jun 28, 2005
- 602 views
- Last edited Jun 29, 2005
Mario Steele wrote: > > Allright, > > Considering someone posted something about file i/o (Including Kat), about > this, and > the overwhelming evidence that pre-allocation is best way to go, here should > be an > extreamly fast version of loading a file into Memory (AKA Sequence): It seems that a lot of people have missed the original poster's point. Or maybe its just me But the way I read it is that they were trying to load each LINE of a text file into its own sub-sequence, rather than load the entire file into one single sequence. Thus a file with twenty lines would create twenty sub-sequences. Rob Craig's code is about the best so far for doing that, though I'm sure it could still be tweaked a bit more. -- Derek Parnell Melbourne, Australia irc://irc.sorcery.net:9000/euphoria
19. Re: Constructing sequences
- Posted by Pete Lomax <petelomax at blueyonder.co.uk> Jun 29, 2005
- 588 views
On Tue, 28 Jun 2005 20:05:31 +0400, Igor Kachan <kinz at peterlink.ru> wrote: Please don't do this:
F=machine_func(22, name) -- dir ... if machine_func(19,{fn,-1}) then end if --seek F=repeat(0, machine_func(20,fn)) --where if machine_func(19,{fn,0}) then end if --seek
It costs absolutely nothing to define
constant M_SEEK = 19, M_WHERE = 20, M_DIR = 22,
And hence (at the very least) code:
F=machine_func(M_DIR, name) ... if machine_func(M_SEEK,{fn,-1}) then end if F=repeat(0, machine_func(M_WHERE,fn)) if machine_func(M_SEEK,{fn,0}) then end if
If you can show a (run-time) difference, I'll eat my hat, and even the saving of a routine call is highly questionable, as of course it goes without saying that
F=dir(name) ... void = seek(fn,-1) F=repeat(0,where(fn)) void = seek(fn,0)
is far, far, far more readable (given that I have only copied relevant lines so the above does not entirely cohere), and, again, in a practical real-world example if you can show any measurable gain, I guess I'll eat my other hat. >But dir() is not that bad too, I think. >Adds just a small fraction of one millisecond >to the process, but gives very clear code. I wholeheartedly agree. Your example does show an overhead of calling dir() 5000 times, but very few real-world programs will do that. Use code which is readable, even when you need it to be fast. Regards, Pete
20. Re: Constructing sequences
- Posted by "Igor Kachan" <kinz at peterlink.ru> Jun 29, 2005
- 587 views
Pete Lomax wrote: > On Tue, 28 Jun 2005 20:05:31 +0400, Igor Kachan <kinz at peterlink.ru> > wrote: Please don't do this: > }}} <eucode> > F=machine_func(22, name) -- dir > ... > if machine_func(19,{fn,-1}) then end if --seek > F=repeat(0, machine_func(20,fn)) --where > if machine_func(19,{fn,0}) then end if --seek > </eucode> {{{ Pete, my goal was to post a short, but yet understandable, solution. But I can not suggest this form for the real programming in a big complicated program too. So, I'm fully agreed with your point, but in more hard form: almost never do that above, boys & girls! > It costs absolutely nothing to define > }}} <eucode> > constant M_SEEK = 19, > M_WHERE = 20, > M_DIR = 22, > </eucode> {{{ > And hence (at the very least) code: > }}} <eucode> > F=machine_func(M_DIR, name) > ... > if machine_func(M_SEEK,{fn,-1}) then end if > F=repeat(0, machine_func(M_WHERE,fn)) > if machine_func(M_SEEK,{fn,0}) then end if > </eucode> {{{ Yes, Pete, you are right, of course. But it costs some extra space and some extra efforts. These constants are not global in file.e library to just include them. And Russian laziness, indolens and sloth are world-wide known, as Oblomov, I'm just lazy too, sorry, sorry, sorry. > If you can show a (run-time) difference, I'll eat my hat, and even the > saving of a routine call is highly questionable, as of course it goes > without saying that > }}} <eucode> > F=dir(name) > ... > void = seek(fn,-1) > F=repeat(0,where(fn)) > void = seek(fn,0) > </eucode> {{{ > is far, far, far more readable (given that I have only copied relevant > lines so the above does not entirely cohere), and, again, in a > practical real-world example if you can show any measurable gain, > I guess I'll eat my other hat. I'm ready to eat yours hats, do not worry, you are absolutelly right here, I think. > >But dir() is not that bad too, I think. > >Adds just a small fraction of one millisecond > >to the process, but gives very clear code. > I wholeheartedly agree. Your example does show an overhead of calling > dir() 5000 times, but very few real-world programs will do that. Yes, of course. > Use code which is readable, even when you need it to be fast. Yes, but sometimes you may need some extremely fast and extremely compact Euphoria code, Pete. > Regards, > Pete Regards, Igor Kachan kinz at peterlink.ru
21. Re: Constructing sequences
- Posted by "Kat" <gertie at visionsix.com> Jun 29, 2005
- 606 views
On 29 Jun 2005, at 0:55, Pete Lomax wrote: > > > On Tue, 28 Jun 2005 20:05:31 +0400, Igor Kachan <kinz at peterlink.ru> > wrote: <snip> > >But dir() is not that bad too, I think. > >Adds just a small fraction of one millisecond > >to the process, but gives very clear code. > I wholeheartedly agree. Your example does show an overhead of calling > dir() 5000 times, but very few real-world programs will do that. Wow, i seem to be still writing lots of rare code then! Can i get a goto? Kat
22. Re: Constructing sequences
- Posted by "Igor Kachan" <kinz at peterlink.ru> Jun 29, 2005
- 604 views
Derek Parnell wrote: > > Mario Steele wrote: > > > > Allright, > > > > Considering someone posted something about file i/o (Including Kat), about this, and > > the overwhelming evidence that pre-allocation is best way to go, here should be an > > extreamly fast version of loading a file into Memory (AKA Sequence): > > It seems that a lot of people have missed the original poster's point. > Or maybe its just me But the way I read it is that they were trying > to load each LINE of a text file into its own sub-sequence, rather than > load the entire file into one single sequence. Thus a file with twenty > lines would create twenty sub-sequences. > > Rob Craig's code is about the best so far for doing that, though I'm sure > it could still be tweaked a bit more. What to me, I try to answer the following Gordon's question: GW>> I am reading the data records in from a file so I don't see an GW>> obvious way I could build the whole sequence at once. Wouldn't GW>> I have to temporariliy store the records in another sequence, GW>> incurring the same 'append' overhead anyway? Really, the way with the *file size* is absent in the original RDS documents, as far as I can see, so it is not *obvious way*, I think. But these ways exist. So, any one can place a big file into single sequence at once and do all he/she wants with that sequence - scan, find CR/LF, make lines, records, parse them etc etc. Mario takes this point in his attention too, I think. Matt's approach is the simplest one - just create a sequence with some extra elements to place a big file without risk to go away from the sequence space while reading a file. Regards, Igor Kachan kinz at peterlink.ru
23. Re: Constructing sequences
- Posted by "Juergen Luethje" <j.lue at gmx.de> Jun 29, 2005
- 597 views
Kat wrote: > On 29 Jun 2005, at 0:55, Pete Lomax wrote: > >> On Tue, 28 Jun 2005 20:05:31 +0400, Igor Kachan wrote: > > <snip> > >>> But dir() is not that bad too, I think. >>> Adds just a small fraction of one millisecond >>> to the process, but gives very clear code. >> I wholeheartedly agree. Your example does show an overhead of calling >> dir() 5000 times, but very few real-world programs will do that. > > Wow, i seem to be still writing lots of rare code then! Can i get a goto? I couldn't imagine it previously, but obviously any and every post (regardless of its topic) *can* be used to write a reply that deals with "goto". Interesting. Regards, Juergen
24. Re: Constructing sequences
- Posted by Haflidi Asgrimsson <haflidi at prokaria.com> Jun 29, 2005
- 623 views
This is similar to I problem I had. It seems best to avoid using some kinds of very large data structures in memory and use clever file manipulation or the fast Euphoria Database System. Try running the code below, incrementing MEMORY constant: 1000, 10000, 100000. You can also have Task Manager open and watch Mem Usage. Here is what I got, notice that the parsing functions do not return the same (By the way has anyone a simple trick to get rid of the return at the end og fthe string and the last comma): mySplit (my own) 0.06 1000 0.87 10000 87.93 100000 "abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","","abc", Kparse (kparse.e) 0.06 1000 3.46 10000 384.81 100000 "abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","","abc ", parse (Strtok-v2-1.e) 0.05 1000 3.51 10000 385.47 100000 "abc","abc","abc","abc","abc","abc","abc","abc","abc","abc","abc ", And here is the code. Notice my way of naming variables, o for objects, s for strings, n for integers, l for lists, t for tables or trees (lists containing lists). Any comment? Also it is the append function that evaluates the parsing function so: l_line = mySplit(o_line, TAB) t_buffer = append(t_buffer, l_line) is same as: t_buffer = append(t_buffer, mySplit(o_line, TAB)) You can uncomment all functions and it is the last one that runs.
include kparse.e include Strtok-v2-1.e include file.e include types.e without warning sequence s_line, l_line, t_buffer integer n_file object o_line atom t constant TRUE = 1 constant FALSE = 0 constant TAB = "\t" constant MEMORY = 1000 function mySplit(string s_input, object o_limiter) sequence l_return, s_return, s_limiter, s_tail integer n_start, n_stop if atom(o_limiter) then s_limiter = {o_limiter} else s_limiter = o_limiter end if l_return = {} n_start = 1 s_tail = s_input while TRUE do n_stop = match(s_limiter, s_tail) if n_stop then s_return = s_input[n_start..n_stop-1] l_return = append(l_return, s_return) s_tail = s_tail[n_stop + length(s_limiter)..$] else if length(s_tail) > 1 then l_return = append(l_return, s_tail[n_start..$-1]) end if exit end if end while return l_return end function -- create file n_file = open("big.txt", "wb") s_line = "abc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\tabc\t\tabc" for i = 1 to MEMORY do puts(n_file, s_line) puts(n_file, "\n") end for close(n_file) t = time() n_file = open("big.txt", "rb") t_buffer = {} while 1 do o_line = gets(n_file) if atom(o_line) then exit end if --l_line = Kparse(o_line, TAB) --l_line = parse(o_line, TAB) l_line = mySplit(o_line, TAB) t_buffer = append(t_buffer, l_line) end while close(n_file) ? time() - t ? length(t_buffer) ? length(t_buffer[$-1]) for i = 1 to length(t_buffer[$-1]) do puts(1,"\"" & t_buffer[$-1][i] & "\",") end for
25. Re: Constructing sequences
- Posted by Haflidi Asgrimsson <haflidi at prokaria.com> Jun 29, 2005
- 600 views
Sorry, an old version of my function got there. Here it is hopefully OK: function mySplit(string s_input, object o_limiter) sequence l_return, s_return, s_limiter, s_tail integer n_start, n_stop if atom(o_limiter) then s_limiter = {o_limiter} else s_limiter = o_limiter end if l_return = {} n_start = 1 s_tail = s_input while TRUE do n_stop = match(s_limiter, s_tail) if n_stop then
s_return = s_tail[n_start..n_stop-1]
l_return = append(l_return, s_return) s_tail = s_tail[n_stop + length(s_limiter)..$] else if length(s_tail) > 1 then l_return = append(l_return, s_tail[n_start..$-1]) end if exit end if end while return l_return end function
26. Re: Constructing sequences
- Posted by Michael Raley <thinkways at yahoo.com> Jun 30, 2005
- 595 views
- Last edited Jul 01, 2005
Haflidi Asgrimsson wrote: > > This is similar to I problem I had. It seems best to avoid using some kinds > of very large data structures in memory and use clever file manipulation or > the fast Euphoria Database System. > Try running the code below, incrementing MEMORY constant: 1000, 10000, 100000. > You can also have Task Manager open and watch Mem Usage. > Here is what I got, notice that the parsing functions do not return the same > (By the way has anyone a simple trick to get rid of the return at the > end og fthe string and the last comma): > I remember that Kat added a keep parameter into string-tok , which needs to be specified in the call. that's why the output is different. since Kparse builds a flat bookmark list of delimiter positions before it generates the actual list, it would be easy to read the length of the bookmark list and preallocate the parsed output list. there's still the overhead of appending to the bookmark list itself. --"ask about our layaway plan". --
27. Re: Constructing sequences
- Posted by Bob Thompson <rthompson at rthompson.karoo.co.uk> Jun 30, 2005
- 599 views
- Last edited Jul 01, 2005
Maybe I’m digressing from the point of the original post but, for me, it raises the question, what are the general rules for fast loading and saving of very complex or huge structures and what level of sub-sequences can be addressed efficiently? Here’s an example with pre allocation (structure only) and appending to a subscripted variable which is only twice as slow at loading from file as Robert’s optimised routine loading to a simple buffer sequence (it’s part of a simple spell checker I use). For speed, it appears that pre-defining the structure before reading and breaking the structure down into simple units before saving is important. include file.e include misc.e atom handle -- create pseudo dictionary file - legible version of Robert's code handle = open("bigfile.txt", "w") for i = 1 to 100000 do puts(handle, 'a' + rand(repeat(25, 19 + rand(11)))) puts(handle, '\n') end for close(handle) --key for pseudo dictionary hash - code intended for proper lower case words function dic_hash_key(sequence key_word) integer i, j, k i = key_word[1] - 96 if length(key_word) > 2 then--most tested at this stage j = key_word[2] - 96 k = key_word[3] - 96 return {i, j, k} elsif length(key_word) > 1 then--very few tested at or after this stage j = key_word[2] - 96 return {i, j, 26} else return {i, 26, 26} end if end function --read function read_list_to_hash(sequence file) object filed_line sequence hash, key handle = open(file, "r") hash = repeat(repeat(repeat({}, 26), 26), 26) while 1 do filed_line = gets(handle) if sequence(filed_line) then filed_line = filed_line[1..length(filed_line) -1] key = dic_hash_key(filed_line) hash[key[1]][key[2]][key[3]]=append(hash[key[1]][key[2]][key[3]],filed_line) else exit end if end while close(handle) return hash end function --save procedure save_hash_to_list(sequence file, sequence hash) handle = open(current_dir() & file, "w") for h = 1 to length(hash) do for i = 1 to length(hash[h]) do for j = 1 to length(hash[h][i]) do for k = 1 to length(hash[h][i][j]) do puts(handle, hash[h][i][j][k] & 10) end for end for end for end for close(handle) end procedure sequence dictionary_hash atom t --load dictionary t = time() dictionary_hash = read_list_to_hash("bigfile.txt") ?time()-t --save dictionary t = time() save_hash_to_list("bigfile.txt", dictionary_hash) ?time()-t sleep(3)