Re: Speed question
- Posted by "Hayden McKay" <hmck1 at dodo.com.au> Jan 21, 2004
- 470 views
Here is a fast way of reading a file. I extracted it from the forum not long ago.
You may be able to use it to your advantage.
One way would be to read the file, then sort your data.
Another way would be to read the file and sort the data at the same time.

Example:
    object x
    x = read(fileName, 32)

read takes 2 arguments: the filename and a buffer size in kilobytes.

* read only reads a whole file. I do not understand what it is you're trying to seek in the line
fp=seek(fn,0)

--====================================================================--
-- This section is used by the global function below.

include win32lib.ew

constant
    kernel32 = open_dll("kernel32.dll"),
    xCreateFile = define_c_func(kernel32,"CreateFileA",{C_POINTER,C_LONG,
                                C_LONG,C_POINTER,C_LONG,C_LONG,C_INT},
                                C_LONG),
    xReadFile = define_c_func(kernel32,"ReadFile",{C_INT,C_POINTER,C_UINT,
                              C_POINTER,C_POINTER},C_LONG),
    xCloseHandle = define_c_func(kernel32,"CloseHandle",{C_LONG},C_LONG)

constant
    GENERIC_READ = #80000000,
    FILE_ATTRIBUTE_NORMAL = #80,
    FILE_FLAG_SEQUENTIAL_SCAN = #8000000,
    OPEN_EXISTING = 3

-- Open an existing file for sequential reading; returns -1 on failure.
function OpenFile_rb(sequence fname)
    atom handle, FileName

    FileName = allocate_string(fname)
    handle = c_func(xCreateFile,{FileName,
                                 GENERIC_READ,
                                 0,
                                 NULL,
                                 OPEN_EXISTING,
                                 FILE_ATTRIBUTE_NORMAL+FILE_FLAG_SEQUENTIAL_SCAN,
                                 NULL})
    free(FileName)  -- the name string is no longer needed once the call returns
    return handle
end function

atom lpNumberOfBytesRead -- address of the actual No. of bytes read by routine

function ReadFile(atom hFile, atom lpBuffer, atom nNumberOfBytesToRead)
    return c_func(xReadFile,{hFile,lpBuffer,nNumberOfBytesToRead,lpNumberOfBytesRead,0})
end function
--====================================================================--
-- This is a very fast way of reading a file

global function read(sequence fileName, integer KbChunks)
    sequence buffer, data
    atom lpBuffer, remaining, fileSize, fn, bytesRead
    integer buffSize, void
    object temp

    temp = dir(fileName)
    if atom(temp) then
        return -1 -- error
    end if
    fileSize = temp[1][D_SIZE]

    fn = OpenFile_rb(fileName)
    if fn = -1 then
        return -1 -- error
    end if

    data = {}

    buffSize = KbChunks * 1024
    lpBuffer = allocate(buffSize)
    lpNumberOfBytesRead = allocate(4)
    remaining = fileSize
    while remaining > 0 do
        if remaining < buffSize then
            buffSize = remaining
        end if
        void = ReadFile(fn, lpBuffer, buffSize)
        bytesRead = peek4u(lpNumberOfBytesRead)
        if void = 0 or bytesRead = 0 then
            exit -- read failed or the file ended early
        end if
        buffer = peek({lpBuffer, bytesRead})
        -- you can process the read data here before appending it to 'data'
        data &= buffer
        remaining -= bytesRead
    end while

    free(lpBuffer)
    free(lpNumberOfBytesRead)
    void = c_func(xCloseHandle, {fn})
    if length(data) > 0 then
        if data[length(data)] = '\n' then -- strip a trailing newline
            data = data[1..length(data) - 1]
        end if
    end if
    -- or you can process 'data' here before returning it.
    return data -- success
end function

----- Original Message -----
From: "Kat" <gertie at visionsix.com>
To: <EUforum at topica.com>
Sent: Wednesday, January 21, 2004 3:33 AM
Subject: Re: Speed question

> On 19 Jan 2004, at 20:08, Allen Robnett wrote:
>
> > After opening a Euphoria text file "r", I am reading in one million
> > 8-character words, (the entire file).
> >
> > clear_screen()
> > fp=seek(fn,0) -- why do you seek()?
> > s = get(fn)
>
> Since the file is not \n delimited, i'd use gets()
>
> > close(fn)
> > word_array = s[2] -- what?
> > word_array[4][6] is then the 6th letter of the 4th word in the array.
>
> using gets, word_array[wordlen x wordnum][6] is the same.
>
> > It works fine, but it takes fifteen minutes to read in the array. Is
> > there a better way?
>
> There must be, i can get a megabyte off the internet in 15 minutes! Take a
> peek at function getf() in file.e.
>
> Kat
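Since read() hands back the whole file as one flat sequence of bytes, something still has to turn it into an array where word_array[4][6] is the 6th letter of the 4th word. Below is a rough sketch of one way to do that. It assumes the file is nothing but fixed-length words packed back to back with no delimiters, which may not match the actual file (the original code used get(), so the file could be in Euphoria object format instead). split_words and the file name "words.txt" are made up for illustration; they are not from the forum code.

```euphoria
-- Hypothetical helper: cut a flat sequence into words of wordLen characters.
-- Any leftover characters at the end (fewer than wordLen) are ignored.
function split_words(sequence data, integer wordLen)
    sequence words
    integer count

    count = floor(length(data) / wordLen)
    words = repeat(0, count)   -- allocate once instead of appending a million times
    for i = 1 to count do
        words[i] = data[(i - 1) * wordLen + 1 .. i * wordLen]
    end for
    return words
end function

object x
sequence word_array

x = read("words.txt", 32)      -- placeholder file name, 32 KB chunks
if atom(x) then
    puts(1, "could not read the file\n")
else
    word_array = split_words(x, 8)
    if length(word_array) >= 4 then
        -- 6th letter of the 4th word
        puts(1, {word_array[4][6], '\n'})
    end if
end if
```

Allocating the result with repeat() up front and filling it by index should avoid growing the sequence one word at a time, which is probably worth doing for a million entries.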