1. Speed question
- Posted by Allen V Robnett <alrobnett at alumni.princeton.edu> Jan 20, 2004
- 476 views
After opening a Euphoria text file "r", I am reading in one million 8-character words (the entire file).

    clear_screen()
    fp = seek(fn, 0)
    s = get(fn)
    close(fn)
    word_array = s[2]

word_array[4][6] is then the 6th letter of the 4th word in the array.

It works fine, but it takes fifteen minutes to read in the array. Is there a better way?

Allen
2. Re: Speed question
- Posted by "Lucius L. Hilley III" <L3Euphoria at bellsouth.net> Jan 20, 2004
- 464 views
Yeah, spend your 15 minutes reading the array once. After that, write it out in a better format and always read it using the better format. You could use EDB. It seems like overkill for what you describe, but maybe you should look at it.

You state that every word is exactly 8 characters in length. This tells me to use a fixed-width file. All of the code below assumes that every word in the array is exactly 8 characters long.

    -- save the array in a simple fixed-width format
    integer out
    out = open("new.txt", "wb")
    for A = 1 to length(word_array) do
        puts(out, word_array[A])
    end for
    close(out)

    -- read the new format back in
    include get.e
    integer in
    sequence s
    in = open("new.txt", "rb")
    word_array = {}            -- start from an empty array
    s = get_bytes(in, 8)
    while length(s) do         -- get_bytes returns {} at end of file
        word_array &= {s}
        s = get_bytes(in, 8)
    end while
    close(in)

Lucius L. Hilley III - Unkmar

PS: Cheers, and I hope this helps. I whipped this up in a hurry before going to bed.
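A variation on the fixed-width idea above, offered as a sketch rather than a tested benchmark: read the whole file with a single get_bytes() call and slice it into words, filling a preallocated sequence instead of growing one with &=. The names load_words and WORD_LEN are hypothetical, the include names assume Euphoria 2.4/3.x (get.e and file.e), and the file is assumed to hold nothing but 8-character words with no separators.

```euphoria
include get.e   -- get_bytes()
include file.e  -- seek(), where()

constant WORD_LEN = 8   -- assumed fixed word width, as stated in the thread

-- Hypothetical helper: returns a sequence of WORD_LEN-character words,
-- or -1 if the file cannot be opened or measured.
function load_words(sequence file_name)
    integer fn, count, ok
    atom size
    sequence raw, words

    fn = open(file_name, "rb")
    if fn = -1 then
        return -1
    end if

    -- Measure the file: seek to the end, note the position, then rewind.
    ok = seek(fn, -1)
    if ok != 0 then
        close(fn)
        return -1
    end if
    size = where(fn)
    ok = seek(fn, 0)

    -- One read for the whole file.
    raw = get_bytes(fn, size)
    close(fn)

    -- Slice the flat byte sequence into words; trailing partial bytes are ignored.
    count = floor(length(raw) / WORD_LEN)
    words = repeat(0, count)
    for i = 1 to count do
        words[i] = raw[(i - 1) * WORD_LEN + 1 .. i * WORD_LEN]
    end for
    return words
end function
```

With the file written by the first loop above, word_array = load_words("new.txt") would give the same word_array[4][6] style of access as before.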
3. Re: Speed question
- Posted by "Kat" <gertie at visionsix.com> Jan 20, 2004
- 505 views
On 19 Jan 2004, at 20:08, Allen Robnett wrote:

> After opening a Euphoria text file "r", I am reading in one million
> 8-character words, (the entire file).
>
> clear_screen()
> fp=seek(fn,0) -- why do you seek()?
> s = get(fn)

Since the file is not \n delimited, I'd use gets().

> close(fn)
> word_array = s[2] -- what?
> word_array[4][6] is then the 6th letter of the 4th word in the array.

Using gets, word_array[wordlen x wordnum][6] is the same.

> It works fine, but it takes fifteen minutes to read in the array. Is
> there a better way?

There must be, I can get a megabyte off the internet in 15 minutes! Take a peek at function getf() in file.e.

Kat
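One way to read the index arithmetic hinted at above, as a sketch only: if the whole file arrives as one flat string (for example from a single gets() on a file with no newlines), the 6th letter of the 4th word sits at a computed offset in that string. The helper names letter_at and word_at, and the WORD_LEN constant, are made up for illustration and are not part of any Euphoria library.

```euphoria
constant WORD_LEN = 8   -- assumed fixed word width, as stated in the thread

-- Hypothetical helper: the letter_num-th letter of the word_num-th word
-- in a flat string of back-to-back fixed-width words (both 1-based).
function letter_at(sequence flat, integer word_num, integer letter_num)
    return flat[(word_num - 1) * WORD_LEN + letter_num]
end function

-- Hypothetical helper: the word_num-th word as its own sequence.
function word_at(sequence flat, integer word_num)
    integer first
    first = (word_num - 1) * WORD_LEN + 1
    return flat[first .. first + WORD_LEN - 1]
end function
```

For example, with flat = "aaaaaaaabbbbbbbbccccccccdddddXdd", letter_at(flat, 4, 6) is 'X' and word_at(flat, 2) is "bbbbbbbb". Keeping the data flat avoids building a million small sequences, at the cost of doing the offset arithmetic on every access.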
4. Re: Speed question
- Posted by "Hayden McKay" <hmck1 at dodo.com.au> Jan 21, 2004
- 472 views
Here is a fast way of reading a file. I extracted it from the forum not long ago; you may be able to use it to your advantage. One way would be to read the file, then sort your data. Another way would be to read the file and sort the data at the same time.

Example:

    object x
    x = read(fileName, 32)

read takes 2 arguments: the filename and a buffer size in kilobytes.

* read only reads a whole file. I do not understand what you are trying to seek in the line fp=seek(fn,0).

--====================================================================--
-- This section is used by the global function below.

include win32lib.ew

constant
    kernel32 = open_dll("kernel32.dll"),
    xCreateFile = define_c_func(kernel32, "CreateFileA", {C_POINTER, C_LONG,
                      C_LONG, C_POINTER, C_LONG, C_LONG, C_INT}, C_LONG),
    xReadFile = define_c_func(kernel32, "ReadFile", {C_INT, C_POINTER, C_UINT,
                      C_POINTER, C_POINTER}, C_LONG),
    xCloseHandle = define_c_func(kernel32, "CloseHandle", {C_LONG}, C_LONG)

constant
    GENERIC_READ = #80000000,
    FILE_ATTRIBUTE_NORMAL = #80,
    FILE_FLAG_SEQUENTIAL_SCAN = #8000000,
    OPEN_EXISTING = 3

-- Open an existing file for sequential reading; returns -1 on failure.
function OpenFile_rb(sequence fname)
    atom handle, FileName

    FileName = allocate_string(fname)
    handle = c_func(xCreateFile, {FileName,
                                  GENERIC_READ,
                                  0,
                                  NULL,
                                  OPEN_EXISTING,
                                  FILE_ATTRIBUTE_NORMAL + FILE_FLAG_SEQUENTIAL_SCAN,
                                  NULL})
    free(FileName)  -- the name string is no longer needed once the call returns
    return handle
end function

atom lpNumberOfBytesRead -- actual No. of bytes read by routine

function ReadFile(atom hFile, atom lpBuffer, atom nNumberOfBytesToRead)
    return c_func(xReadFile, {hFile, lpBuffer, nNumberOfBytesToRead,
                              lpNumberOfBytesRead, 0})
end function

--====================================================================--
-- This is a very fast way of reading a file

global function read(sequence fileName, integer KbChunks)
    sequence buffer, data
    atom lpBuffer, remaining, fileSize, fn  -- fn is an atom: a handle may not fit an integer
    integer buffSize, void
    object temp

    temp = dir(fileName)
    if atom(temp) then
        return -1 -- error
    end if
    fileSize = temp[1][D_SIZE]

    fn = OpenFile_rb(fileName)
    if fn = -1 then
        return -1 -- error
    end if

    data = {}

    buffSize = KbChunks * 1024
    lpBuffer = allocate(buffSize)
    lpNumberOfBytesRead = allocate(4)
    remaining = fileSize
    while remaining > 0 do
        if remaining < buffSize then
            buffSize = remaining
        end if
        void = ReadFile(fn, lpBuffer, buffSize)
        buffer = peek({lpBuffer, buffSize})
        -- you can process the read data here before appending it to 'data'
        data &= buffer
        remaining -= buffSize
    end while

    free(lpBuffer)
    free(lpNumberOfBytesRead)
    void = c_func(xCloseHandle, {fn})
    -- Remove a trailing newline, guarding against an empty file
    if length(data) > 0 and data[length(data)] = '\n' then
        data = data[1..length(data) - 1]
    end if
    -- or you can process 'data' here before returning it.
    return data -- success
end function
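For readers not on Windows, the same chunked loop as read() can be sketched with the standard library alone. This is an illustration under stated assumptions, not a drop-in replacement: read_chunked is a hypothetical name, get_bytes() is taken from get.e as in Euphoria 2.4/3.x, and whether it matches the speed of the Win32 version has not been measured here.

```euphoria
include get.e   -- get_bytes()

-- Hypothetical portable counterpart to read(): returns the file contents
-- as one flat sequence of bytes, or -1 if the file cannot be opened.
function read_chunked(sequence file_name, integer kb_chunks)
    integer fn, chunk_size
    sequence chunk, data

    fn = open(file_name, "rb")
    if fn = -1 then
        return -1
    end if

    chunk_size = kb_chunks * 1024
    data = {}
    chunk = get_bytes(fn, chunk_size)
    while length(chunk) > 0 do
        -- each chunk could be processed here before it is appended
        data &= chunk
        if length(chunk) < chunk_size then
            exit    -- a short chunk means end of file was reached
        end if
        chunk = get_bytes(fn, chunk_size)
    end while
    close(fn)
    return data
end function
```

It is called the same way as the example above, for instance x = read_chunked(fileName, 32). Unlike read(), this sketch does not strip a trailing newline.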