1. Changing data types, Continued
- Posted by Allen V Robnett <alrobnett at alumni.princeton.edu> Mar 21, 2004
- 410 views
I have found that peek4u() works to get 4 bytes of the RAM data into atom form, and that bytes_to_int() can be used to change the user input into an atom. Since I appear to be restricted to using a multiple of 4 bytes, I have elected to use a for loop and wait_key() to get the 4-character input. Is there a better way to accomplish all of this?

Allen
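A minimal sketch of the comparison described above, assuming a small stand-in buffer rather than the real file (the buffer contents and the key "TREE" are made up for illustration). It also shows one possible way around the multiple-of-4 restriction: peeking the bytes as a sequence and comparing with equal().

```euphoria
include machine.e   -- allocate(), free(), bytes_to_int()

atom buf, key

buf = allocate(8)
poke(buf, "TREEROOT")          -- stand-in for 8 bytes of the file in RAM

key = bytes_to_int("TREE")     -- 4 typed characters packed into one atom

if peek4u(buf) = key then
    puts(1, "bytes 0..3 match the key\n")
end if
if peek4u(buf + 4) != key then
    puts(1, "bytes 4..7 do not\n")
end if

-- any length works if the bytes are peeked as a sequence instead
if equal(peek({buf, 3}), "TRE") then
    puts(1, "3-byte prefix matches\n")
end if

free(buf)
```

The atom comparison is a single numeric test per position, while the sequence form builds a small sequence for every position it checks, so it is likely to be slower over a large buffer.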
2. Changing data types, Continued
- Posted by Allen V Robnett <alrobnett at alumni.princeton.edu> Mar 22, 2004
- 406 views
Brian Broker <bkb at cnw.com> wrote:

> Allen V Robnett wrote:
>
> <snipped response>
>
>> Thanks for the response. I believe match() is restricted to sequences.
>> Given the 200MB size of my data file, even without Euphoria's sequence
>> delimiters, and given the relative speed of reading the undelimited file
>> into RAM at the beginning of a session, and then back out at the
>> conclusion, it seemed preferable to me to use peek and poke on an
>> undelimited, allocated file buffer. The editor program will search the
>> entire file in 7 seconds.
>
> After reading this, I'm not sure if you still have a question. It might
> help to know at least what platform (DOS, *nix, Win) you are working
> with to determine the best (or fastest?) solution.
>
> -- Brian

Originally I had a problem. I figured out AN answer before getting any response, but it may not be THE answer. I now have several questions rather than a problem.

I am using Windows XP HE on a Dell Pentium 4, 2.53 GHz; 533 MHz FSB with 80 GB HD. The windowing capabilities of my OS make it possible for me to use the EU program in a window-like way, even though it is written in vanilla DOS mode. As a first crack, I was looking for simplicity and speed, and I know little about programming and nothing about programming for Windows.

The data file that I am building is a binary tree with fixed-length nodes. Each node (at location n) has 2 child nodes (at locations 2n and 2n+1). Reading through and displaying all of the parent nodes in the branch for any given node is not difficult. There is no need for any system of record keys.

When I tried to implement the system using EU's sequences, the size of the data file more than doubled, and the time taken to read the file into memory was on the order of ten minutes. When I stripped out the sequence delimiters and treated the whole file as one big number, I was able to allocate a buffer and read the file into it in 6 seconds.

Rob Craig says all EU objects are either atoms or sequences. Of course this file is not an EU object, but that was the point of my question. I have discovered that there is no problem in pointing to a 4-letter part of a word and telling EU to treat it as an integer. It is then compared with a similarly constructed search integer. The method can search the entire file in 7 seconds.

I know that there are many participants in this forum whose forte is speed in searches. My current question is, is there a better way to accomplish my goal within EU, either speedwise or stylewise? I am actually content with the 7-second result, if it cannot be improved.

Thanks to the many who have taken time to be of help.

Allen
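The node layout described above (children of node n at 2n and 2n+1, so the parent is at floor(n/2)) can be sketched as follows. This is only an illustration: the 8-byte node size is an assumption borrowed from Euman's test file later in the thread, and the names NODE_SIZE, NODE_COUNT and node_addr() are hypothetical.

```euphoria
include machine.e   -- allocate(), free()

constant NODE_SIZE = 8    -- assumed node length in bytes
constant NODE_COUNT = 7   -- tiny 3-level tree: nodes 1..7

atom base
integer n

base = allocate(NODE_COUNT * NODE_SIZE)

-- address of node number 'node' (nodes are numbered from 1)
function node_addr(integer node)
    return base + (node - 1) * NODE_SIZE
end function

-- fill each node with a recognisable 8-character label
for i = 1 to NODE_COUNT do
    poke(node_addr(i), sprintf("node%04d", i))
end for

-- walk from node 6 up to the root, showing every node in the branch
n = 6
while n >= 1 do
    puts(1, peek({node_addr(n), NODE_SIZE}) & '\n')
    n = floor(n / 2)      -- parent of node n
end while

free(base)
```

Starting from node 6, the loop visits nodes 6, 3 and 1 in that order. No record keys are needed because a node's position alone determines its parent and children.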
3. Re: Changing data types, Continued
- Posted by "Kat" <gertie at visionsix.com> Mar 22, 2004
- 399 views
On 22 Mar 2004, at 7:47, Allen Robnett wrote:

> <snip>
>
> I know that there are many participants in this forum whose forte is
> speed in searches. My current question is, is there a better way to
> accomplish my goal within EU, either speedwise or stylewise? I am
> actually content with the 7-second result, if it cannot be improved.

No doubt this will be the slowest solution, but what if you stored the 200meg file in RAM as a series of smaller data chunks, small enough to fit into a reasonable sequence size, and then looped through the smaller chunks, reading each chunk, one at a time, into a sequence, so you could use Euphoria code to play with them? Then you could use such things as wildmatch(), wildtok(), and DavidC's regexp lib on them.

I am very interested in seeing code for how you are loading and accessing the 200meg file in RAM now.

Kat,
applying flame retardant clothing again.
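A rough sketch of the chunked approach suggested above, assuming the data already sits in an allocated buffer. The chunk size, the sample data and the search text are all made up for the demo, and plain match() stands in for whichever sequence-based search routine is preferred. Each chunk is read with a small overlap so that a match straddling a chunk boundary is not missed.

```euphoria
include machine.e   -- allocate(), free()

constant CHUNK = 16   -- assumed chunk size; tiny so the demo crosses a boundary

sequence data, needle, chunk
atom buf
integer total, overlap, offset, take, hit, found

data = repeat('a', 14) & "NEEDLE" & repeat('b', 12)
needle = "NEEDLE"
total = length(data)
buf = allocate(total)
poke(buf, data)

overlap = length(needle) - 1   -- extra bytes so a match spanning two chunks is seen
found = 0
offset = 0
while offset < total do
    take = CHUNK + overlap
    if offset + take > total then
        take = total - offset
    end if
    chunk = peek({buf + offset, take})   -- copy one chunk into a sequence
    hit = match(needle, chunk)
    if hit then
        found = offset + hit             -- 1-based position in the whole buffer
        exit
    end if
    offset += CHUNK
end while

printf(1, "found at byte %d\n", found)
free(buf)
```

In this demo the search text starts at byte 15 and runs past the first 16-byte boundary, which is the case the overlap is there to handle. Every chunk is copied into a sequence before it is searched, so this trades speed for the convenience of sequence routines.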
4. Re: Changing data types, Continued
- Posted by "H.W Overman" <euman at bellsouth.net> Mar 22, 2004
- 383 views
> I am very interested in seeing code for how you are loading and accessing
> 200meg file in ram now.
>
> Kat,

Sure Kat, I wrote it! Allen should've mentioned that...

Euman

-- message authentication: /589&*809/-452/+1205
5. Re: Changing data types, Continued
- Posted by "H.W Overman" <euman at bellsouth.net> Mar 22, 2004
- 407 views
Kat,

This was a preliminary version I wrote for Allen back at the beginning of his project. I have several versions I sent him, but this one is what you would probably be interested in...

Two parts (.exw programs) are here, one to create and write the file, the other to read the file...

I hope I haven't changed anything to screw this up over time. I'm in Linux right now so I can't test it, but it should still be ok...
Note: possible to tweak further!

-- Code part 1

-- PART 1 speedread_write.exw creates the 8 MB file

-- SpeedRead_write Binary file in Windows
-- Euman 2004
-- Based on Allen V. Robnett's idea for nodes rambase

without type_check
without trace
without warning

include get.e   -- wait_key()

integer fn, x

constant max_node_level = 20

fn = -1
while fn=-1 do
    fn = open("nodes.dat", "rb")
    if fn=-1 then
        puts(1,"No Nodes file. Create it (Y/N)? \n")
        x = wait_key()
        if x='y' or x='Y' then
            fn = open("nodes.dat", "wb")
            puts(1,"Creating node table...Please wait\n")
            -- 2^20 nodes of 8 bytes each, all filled with '_'
            for i=1 to power(2, max_node_level) do
                for j = 1 to 8 do
                    puts(fn, '_')
                end for
            end for
            puts(1,"Done...\n")
            close(fn)
        end if
    else
        close(fn)
    end if
    exit   -- one pass only, whether or not the file was created
end while

-- PART 2 speedread.exw reads the file written by the code above

-- SpeedRead Binary file in Windows
-- LOOK FOR: pass name of file (below) and .dat file containing 8 MB
-- Euman 2004
-- W/ minor tweak by Tommy Carlier noted below

without type_check
with trace

include machine.e
include file.e
include dll.e

atom t1, t2, fn, lFileLen, result
sequence line

atom kernel32
kernel32 = open_dll("kernel32.dll")

constant
    xCreateFile = define_c_func(kernel32,"CreateFileA",
                      {C_POINTER,C_LONG,C_LONG,C_POINTER,
                       C_LONG,C_LONG,C_INT},C_LONG),
    xReadFile = define_c_func(kernel32,"ReadFile",
                      {C_INT,C_POINTER,C_UINT,C_POINTER,
                       C_POINTER},C_LONG)

global constant
    GENERIC_READ = #80000000,
    FILE_ATTRIBUTE_NORMAL = #80,
    FILE_FLAG_SEQUENTIAL_SCAN= #8000000,
    OPEN_EXISTING = 3

atom hFile

global function CreateFile(sequence fname)
    atom FileName
    FileName = allocate_string(fname)
    hFile = c_func(xCreateFile,{FileName,
                   GENERIC_READ,
                   0,
                   NULL,
                   OPEN_EXISTING,
                   FILE_ATTRIBUTE_NORMAL+FILE_FLAG_SEQUENTIAL_SCAN,
                   NULL})
    free(FileName)   -- the name string is no longer needed after the call
    return hFile
end function

atom lpNumberOfBytesRead
lpNumberOfBytesRead = allocate(4) --lpNumberOfBytesRead

function ReadFile(atom hFile, atom lpBuffer, atom nNumberOfBytesToRead)
    return c_func(xReadFile,
                  {hFile,lpBuffer,nNumberOfBytesToRead,lpNumberOfBytesRead,
                   0})
end function

sequence yourfile
yourfile = "nodes.dat" -- pass name of file

-- get the file length; binary mode so the byte count is exact
fn = open(yourfile, "rb")
if fn = -1 then
    puts(1, "Cannot open " & yourfile & "\n")
    abort(1)
end if
lFileLen = seek(fn, -1)
lFileLen = where(fn)
close(fn)

atom lpFileBuff, cIndex, cMaxIndex
lpFileBuff = allocate(lFileLen)

t1 = time()
hFile = CreateFile(yourfile)
result = ReadFile(hFile, lpFileBuff, lFileLen)

-- we haven't checked lpNumberOfBytesRead; perhaps you may want to
-- compare this to lFileLen before you start the loop. In this case
-- the file is pretty small, '8,000,000 chars'

-- Plus, this assumes binary read and no crlf chars exist
trace(1)
line = repeat(0, floor(lFileLen / 8))   -- one entry per whole 8-byte node
atom lIndex

if result then

    cMaxIndex = lpFileBuff + 8 * length(line)
    cIndex = lpFileBuff
    lIndex = 1

    -- tweak by Tommy Carlier
    while cIndex < cMaxIndex do
        line[lIndex] = peek({cIndex, 8})
        cIndex += 8
        lIndex += 1
    end while

end if
free(lpFileBuff)

-- take trace out to get correct speed!
t2 = time()- t1
printf(1,"Average Time : %1.4f sec\n", t2 )

if getc(0) then end if

-- END CODES

Euman

On Monday 22 March 2004 11:17 am, Kat wrote:
> On 22 Mar 2004, at 10:35, H.W Overman wrote:
>
> > > I am very interested in seeing code for how you are loading and
> > > accessing 200meg file in ram now.
> > >
> > > Kat,
> >
> > Sure Kat, I wrote it! Allen shouldve mentioned that...
>
> I did a search in the RDS archives for Euman and Overman, and nothing
> showed on the memory stash code. I remember you were doing something
> about this, or someone was, last year, i think.
>
> Kat,
> looking interested,
> powered by caffine and cheetoes.

--
message authentication: /409&*777/-682/+905
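The code comments above suggest comparing lpNumberOfBytesRead with the file length before using the buffer. A minimal sketch of that check follows, assuming the count is stored as a 32-bit value at the address passed to ReadFile. The helper name read_was_complete() is hypothetical, and the byte counts are poked in by hand here so the snippet runs without the Windows calls.

```euphoria
include machine.e   -- allocate(), free()

-- hypothetical helper: true only if the call succeeded and
-- the reported byte count equals the expected length
function read_was_complete(atom result, atom lpCount, atom expected)
    return result != 0 and peek4u(lpCount) = expected
end function

atom lpCount
lpCount = allocate(4)

poke4(lpCount, 8388608)                    -- simulate a full read being reported
? read_was_complete(1, lpCount, 8388608)   -- prints 1

poke4(lpCount, 4096)                       -- simulate a short read
? read_was_complete(1, lpCount, 8388608)   -- prints 0

free(lpCount)
```

In the reader program the equivalent call would presumably be read_was_complete(result, lpNumberOfBytesRead, lFileLen), placed before the loop that copies nodes out of the buffer.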
6. Re: Changing data types, Continued
- Posted by "George Walters" <gwalters at sc.rr.com> Mar 22, 2004
- 449 views
euman, you sent this to the wrong address i believe

----- Original Message -----
From: "H.W Overman" <euman at bellsouth.net>
To: <gertie at visionsix.com>
Cc: "Euphoria Mailing List" <EUforum at topica.com>
Subject: Re: Changing data types, Continued

> <snipped>
7. Re: Changing data types, Continued
- Posted by Allen V Robnett <alrobnett at alumni.princeton.edu> Mar 23, 2004
- 390 views
>> I am very interested in seeing code for how you are loading and accessing
>> 200meg file in ram now.
>>
>> Kat,
>
> Sure Kat, I wrote it! Allen shouldve mentioned that...
>
> Euman

Yes, Euman was of inestimable help in telling me how to read the file into an allocated buffer using the API.

As far as accessing the data is concerned, the only trick is having the pointer to the start of the buffer. Then just use peek and poke with appropriate offsets.

Allen
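A small sketch of peek and poke with offsets from the start of the buffer, assuming 8-byte nodes padded with '_' as in Euman's test file. The routine names read_node() and write_node() are hypothetical, and the four-node buffer stands in for the real file.

```euphoria
include machine.e   -- allocate(), free()

constant NODE_SIZE = 8    -- assumed node length in bytes
constant NODE_COUNT = 4   -- small stand-in buffer

atom base                 -- pointer to the start of the buffer
base = allocate(NODE_COUNT * NODE_SIZE)
poke(base, repeat('_', NODE_COUNT * NODE_SIZE))

-- overwrite node n, padding short text with '_' and cutting long text
procedure write_node(integer n, sequence text)
    if length(text) < NODE_SIZE then
        text = text & repeat('_', NODE_SIZE - length(text))
    end if
    poke(base + (n - 1) * NODE_SIZE, text[1..NODE_SIZE])
end procedure

-- return node n as a sequence of NODE_SIZE bytes
function read_node(integer n)
    return peek({base + (n - 1) * NODE_SIZE, NODE_SIZE})
end function

write_node(3, "oak")
puts(1, read_node(3) & '\n')   -- the edited node
puts(1, read_node(2) & '\n')   -- an untouched node, still all '_'

free(base)
```

Keeping the offset arithmetic inside two small routines means the rest of an editor only deals in node numbers.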
8. Re: Changing data types, Continued
- Posted by "H.W Overman" <euman at bellsouth.net> Mar 23, 2004
- 385 views
On Monday 22 March 2004 07:44 pm, Allen Robnett wrote:

> Yes, Euman was of inestimable help in telling me how to read the file
> into an allocated buffer using the API.

inestimable:
1. Impossible to estimate or compute: inestimable damage.
2. Of immeasurable value or worth; invaluable: "shared all the inestimable advantages of being wealthy, good-looking, confident and intelligent"

Well, I learned a new inestimable word from Allen, you guys?

> As far as accessing the data is concerned, the only trick is having the
> pointer to the start of the buffer. Then just use peek and poke with
> appropriate offsets.

That was your question to the list last week in other words.....Answered! Darn Allen you are good!

> Allen
9. Re: Changing data types, Continued
- Posted by "Kat" <kathy at mists.net> Mar 24, 2004
- 387 views
On 22 Mar 2004, at 12:25, H.W Overman wrote:

> Kat,
>
> This was a preliminary version I wrote for Allen back at the beginning
> of his project, I have several versions I sent him but this one is what
> you would probably be interested in...
>
> Two parts (.exw programs) are here, one to create and write the file, the
> other to read the file...
>
> I hope I hadnt changed anything to screw this up over time, Im in Linux right
> now so I cant test it but it should still be ok... Note: possible to tweak
> further!

I tried running the code you sent, but when its memory use got up to 56 megabytes, I shut it down. That seems too high for an 8meg file, even for Euphoria. What is normal for it?

Kat