1. Changing data types, Continued
I have found that peek4u() works to get the RAM data into 4-byte atom
form, and that bytes_to_int() can be used to change the user input into an
atom. Since I appear to be restricted to using a multiple of 4 bytes, I
have elected to use a for loop and wait_key to get the 4-character
input. Is there a better way to accomplish all of this?
Allen
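For readers following along, here is a minimal sketch of the comparison described above. It assumes the standard machine.e routines allocate(), free() and bytes_to_int(); the helper name find4 and the 8-byte sample buffer are made up for illustration and are not Allen's code.

```euphoria
include machine.e   -- allocate(), free(), bytes_to_int()

-- Hypothetical helper: return the first byte offset in [0, len-4] where
-- the 4 bytes in RAM equal the 4 characters of word, or -1 if none match.
function find4(atom buf, integer len, sequence word)
    atom key
    key = bytes_to_int(word)    -- low-order byte first, as peek4u() reads on x86
    for offset = 0 to len - 4 do
        if peek4u(buf + offset) = key then
            return offset
        end if
    end for
    return -1
end function

atom buf
buf = allocate(8)               -- tiny stand-in for the real file buffer
poke(buf, "A_TREE__")           -- pretend these 8 bytes came from the file
printf(1, "found at offset %d\n", find4(buf, 8, "TREE"))
free(buf)
```

The 4 characters could equally come from four wait_key() calls collected into a sequence; the literal "TREE" only keeps the sketch short.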
2. Changing data types, Continued
Brian Broker <bkb at cnw.com> wrote:
>Allen V Robnett wrote:
>
><snipped response>
>
>
>>Thanks for the response. I believe match() is restricted to sequences.
>>Given the 200MB size of my data file, even without Euphoria's sequence
>>delimiters, and given the relative speed of reading the undelimited file
>>into RAM at the beginning of a session, and then back out at the
>>conclusion, it seemed preferable to me to use peek and poke on an
>>undelimited, allocated file buffer. The editor program will search the
>>entire file in 7 seconds.
>>
>>
>After reading this, I'm not sure if you still have a question. It might
>help to know at least what platform (DOS, *nix, Win) you are working
>with to determine the best (or fastest?) solution.
>
>-- Brian
>
>
Originally I had a problem. I figured out AN answer before getting any
response, but it may not be THE answer. I now have several questions
rather than a problem.
I am using Windows XP HE on a Dell Pentium 4, 2.53 GHz; 533 MHz FSB with
80 GB HD. The windows capabilities of my OS make it possible for me to
use the EU program in a window-like way, even though it is written in
vanilla DOS mode. As a first crack, I was looking for simplicity and
speed, and know little about programming and nothing about programming
for Windows.
The data file that I am building is a binary tree with fixed length
nodes. Each node (at location n) has 2 children nodes (at locations 2n
and 2n+1). Reading through and displaying all of the parent nodes in the
branch for any given node is not difficult. There is no need for any
system of record keys.
When I tried to implement the system using EU's sequences, the size of
the data file more than doubled, and the time taken to read the file
into memory was on the order of ten minutes. When I stripped out the
sequence delimiters and treated the whole file as one big number, I was
able to allocate a buffer and read the file into it in 6 seconds. Rob
Craig says all EU objects are either atoms or sequences. Of course this
file is not an EU object, but that was the point of my question. I have
discovered that there is no problem in pointing to a 4-letter part of a
word and telling EU to treat it as an integer. It is then compared with
a similarly constructed search integer. The method can search the entire
file in 7 seconds.
I know that there are many participants in this forum whose forte is
speed in searches. My current question is, is there a better way to
accomplish my goal within EU, either speedwise or stylewise? I am
actually content with the 7-second result, if it cannot be improved.
Thanks to the many who have taken time to be of help.
Allen
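The node numbering described above (children of node n at 2n and 2n+1) means the parent of any node is floor(n/2), so a branch can be walked with no record keys. A rough sketch follows. NODE_SIZE = 8 and the convention that node 1 sits at byte offset 0 are assumptions for illustration, and the function names are hypothetical.

```euphoria
constant NODE_SIZE = 8      -- assumption: bytes per fixed-length node

-- Byte offset of node n in the buffer, assuming node 1 (the root) is at 0.
function node_offset(integer n)
    return (n - 1) * NODE_SIZE
end function

-- Node numbers from n up to the root: the parent of n is floor(n/2).
function branch(integer n)
    sequence path
    path = {}
    while n >= 1 do
        path = append(path, n)
        n = floor(n / 2)
    end while
    return path
end function

sequence path
path = branch(11)           -- 11, then its parent 5, then 2, then the root 1
for i = 1 to length(path) do
    printf(1, "node %d at byte offset %d\n", {path[i], node_offset(path[i])})
end for
```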
3. Re: Changing data types, Continued
On 22 Mar 2004, at 7:47, Allen Robnett wrote:
> I know that there are many participants in this forum whose forte is
> speed in searches. My current question is, is there a better way to
> accomplish my goal within EU, either speedwise or stylewise? I am
> actually content with the 7-second result, if it cannot be improved.
No doubt this will be the slowest solution, but what if you stored the 200meg
file in ram as a series of smaller data chunks, small enough to fit into a
reasonable sequence size, and then looped thru the smaller chunks, reading
each chunk, one at a time, into a sequence, so you could use Euphoria
code to play with them? Then you could use such things as wildmatch(),
wildtok(), and DavidC's regexp lib on them.
I am very interested in seeing code for how you are loading and accessing
the 200meg file in RAM now.
Kat,
applying flame retardant clothing again.
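One way the chunked idea could look is sketched below, using only peek() and match(). The function name chunk_search, the sample data and the tiny chunk size are all illustrative assumptions. Each chunk is read with a small overlap so that a match straddling two chunks is not missed.

```euphoria
include machine.e   -- allocate(), free()

-- Hypothetical helper: search len bytes at buf for target, one chunk at a
-- time. Returns the 0-based byte offset of the first match, or -1.
function chunk_search(atom buf, integer len, sequence target, integer chunk_size)
    integer pos, n, hit
    pos = 0
    while pos < len do
        -- overlap by length(target)-1 bytes to catch a match spanning two chunks
        n = chunk_size + length(target) - 1
        if pos + n > len then
            n = len - pos
        end if
        hit = match(target, peek({buf + pos, n}))
        if hit then
            return pos + hit - 1
        end if
        pos += chunk_size
    end while
    return -1
end function

sequence data
atom buf
data = "____TREE________LEAF____"
buf = allocate(length(data))
poke(buf, data)
printf(1, "LEAF at byte offset %d\n", chunk_search(buf, length(data), "LEAF", 8))
free(buf)
```

A real chunk size would be far larger than 8. Each peek() widens the bytes into a sequence, so the chunk size trades memory against the number of loop passes.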
4. Re: Changing data types, Continued
> I am very interested in seeing code for how you are loading and accessing
> 200meg file in ram now.
>
> Kat,
Sure Kat, I wrote it! Allen should've mentioned that...
Euman
--
message authentication: /589&*809/-452/+1205
5. Re: Changing data types, Continued
Kat,
This was a preliminary version I wrote for Allen back at the beginning
of his project. I sent him several versions, but this one is what
you would probably be interested in...
Two parts (.exw programs) are here, one to create and write the file, the
other to read the file...
I hope I haven't changed anything to screw this up over time. I'm in Linux right
now so I can't test it, but it should still be OK...
Note: possible to tweak further!
-- Code part 1
-- PART 1 speedread_write.exw creates the 8 MB file
-- SpeedRead_write Binary file in Windows
-- Euman 2004
-- Based on Allen V. Robnett's idea for nodes rambase
without type_check
without trace
without warning
include get.e   -- wait_key()

integer fn, x
constant max_node_level = 20

-- only offer to create the file if it does not exist yet
fn = open("nodes.dat", "rb")
if fn = -1 then
    puts(1, "No Nodes file. Create it (Y/N)? \n")
    x = wait_key()
    if x = 'y' or x = 'Y' then
        fn = open("nodes.dat", "wb")
        puts(1, "Creating node table...Please wait\n")
        -- 2^20 nodes, each an 8-byte placeholder
        for i = 1 to power(2, max_node_level) do
            puts(fn, "________")
        end for
        puts(1, "Done...\n")
        close(fn)
    end if
else
    close(fn)
end if
-- PART 2 speedread.exw reads the file written by the code above
-- SpeedRead Binary file in Windows
-- LOOK FOR: pass name of file (below) and .dat file containing 8 MB
-- Euman 2004
-- W/ minor tweak by Tommy Carlier noted below
without type_check
without trace       -- tracing would distort the timing
include machine.e   -- allocate(), allocate_string(), free()
include file.e      -- seek(), where()
include dll.e       -- open_dll(), define_c_func(), C_* types, NULL

atom t1, t2, lFileLen, result
integer fn
sequence line

atom kernel32
kernel32 = open_dll("kernel32.dll")

constant
    xCreateFile = define_c_func(kernel32, "CreateFileA",
        {C_POINTER, C_LONG, C_LONG, C_POINTER, C_LONG, C_LONG, C_INT}, C_LONG),
    xReadFile = define_c_func(kernel32, "ReadFile",
        {C_INT, C_POINTER, C_UINT, C_POINTER, C_POINTER}, C_LONG),
    xCloseHandle = define_c_func(kernel32, "CloseHandle", {C_INT}, C_LONG)

global constant
    GENERIC_READ              = #80000000,
    FILE_ATTRIBUTE_NORMAL     = #80,
    FILE_FLAG_SEQUENTIAL_SCAN = #8000000,
    OPEN_EXISTING             = 3

atom hFile

global function CreateFile(sequence fname)
    atom FileName, h
    FileName = allocate_string(fname)
    h = c_func(xCreateFile, {FileName,
                             GENERIC_READ,
                             0,
                             NULL,
                             OPEN_EXISTING,
                             FILE_ATTRIBUTE_NORMAL + FILE_FLAG_SEQUENTIAL_SCAN,
                             NULL})
    free(FileName)  -- the name string is not needed after the call
    return h
end function

atom lpNumberOfBytesRead
lpNumberOfBytesRead = allocate(4)   -- receives the byte count from ReadFile

function ReadFile(atom hFile, atom lpBuffer, atom nNumberOfBytesToRead)
    return c_func(xReadFile, {hFile, lpBuffer, nNumberOfBytesToRead,
                              lpNumberOfBytesRead, 0})
end function

sequence yourfile
yourfile = "nodes.dat"  -- pass name of file

-- find the file length: seek to the end, then ask for the position
fn = open(yourfile, "rb")
if fn = -1 then
    puts(1, "Cannot open " & yourfile & "\n")
    abort(1)
end if
if seek(fn, -1) != 0 then
    puts(1, "Cannot seek in " & yourfile & "\n")
    abort(1)
end if
lFileLen = where(fn)
close(fn)

atom lpFileBuff, cIndex, cMaxIndex
integer lIndex
lpFileBuff = allocate(lFileLen)

t1 = time()
hFile = CreateFile(yourfile)
result = ReadFile(hFile, lpFileBuff, lFileLen)
if c_func(xCloseHandle, {hFile}) then end if

-- only proceed if the whole file arrived; this assumes a binary read
-- and that no CR/LF translation takes place
if result and peek4u(lpNumberOfBytesRead) = lFileLen then
    line = repeat(0, floor(lFileLen / 8))
    cMaxIndex = lpFileBuff + 8 * length(line)
    cIndex = lpFileBuff
    lIndex = 1
    -- tweak by Tommy Carlier
    while cIndex < cMaxIndex do
        line[lIndex] = peek({cIndex, 8})
        cIndex += 8
        lIndex += 1
    end while
end if
free(lpFileBuff)
free(lpNumberOfBytesRead)

t2 = time() - t1
printf(1, "Average Time : %1.4f sec\n", t2)
if getc(0) then end if
-- END CODES
Euman
On Monday 22 March 2004 11:17 am, Kat wrote:
> On 22 Mar 2004, at 10:35, H.W Overman wrote:
> >
> > > I am very interested in seeing code for how you are loading and
> > > accessing 200meg file in ram now.
> > >
> > > Kat,
> >
> > Sure Kat, I wrote it! Allen should've mentioned that...
>
> I did a search in the RDS archives for Euman and Overman, and nothing
> showed on the memory stash code. I remember you were doing something
> about this, or someone was, last year, i think.
>
> Kat,
> looking interested,
> powered by caffine and cheetoes.
--
message authentication: /409&*777/-682/+905
6. Re: Changing data types, Continued
Euman, you sent this to the wrong address, I believe.
----- Original Message -----
From: "H.W Overman" <euman at bellsouth.net>
To: <gertie at visionsix.com>
Cc: "Euphoria Mailing List" <EUforum at topica.com>
Subject: Re: Changing data types, Continued
7. Re: Changing data types, Continued
>>I am very interested in seeing code for how you are loading and accessing
>>200meg file in ram now.
>>
>>Kat,
>>
>>
>Sure Kat, I wrote it! Allen should've mentioned that...
>
>Euman
>
>
Yes, Euman was of inestimable help in telling me how to read the file
into an allocated buffer using the API.
As far as accessing the data is concerned, the only trick is having the
pointer to the start of the buffer. Then just use peek and poke with
appropriate offsets.
Allen
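As a rough illustration of "peek and poke with appropriate offsets", the sketch below reads and writes one fixed-length node given the buffer's base pointer. NODE_SIZE = 8, the 1-based node numbering and the names get_node and put_node are assumptions for the example, not Allen's actual layout.

```euphoria
include machine.e   -- allocate(), free()

constant NODE_SIZE = 8      -- assumption: bytes per fixed-length node
constant NODE_COUNT = 4     -- tiny stand-in for the real file

-- Read node n (1-based) from the buffer starting at base.
function get_node(atom base, integer n)
    return peek({base + (n - 1) * NODE_SIZE, NODE_SIZE})
end function

-- Overwrite node n; the caller must pass exactly NODE_SIZE bytes.
procedure put_node(atom base, integer n, sequence bytes)
    poke(base + (n - 1) * NODE_SIZE, bytes)
end procedure

atom base
base = allocate(NODE_COUNT * NODE_SIZE)
poke(base, repeat('_', NODE_COUNT * NODE_SIZE))   -- blank every node
put_node(base, 3, "OAK_____")
puts(1, get_node(base, 3) & '\n')
free(base)
```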
8. Re: Changing data types, Continued
On Monday 22 March 2004 07:44 pm, Allen Robnett wrote:
> Yes, Euman was of inestimable help in telling me how to read the file
> into an allocated buffer using the API.
inestimable:
1. Impossible to estimate or compute: inestimable damage.
2. Of immeasurable value or worth; invaluable: "shared all the inestimable
advantages of being wealthy, good-looking, confident and intelligent"
Well, I learned a new inestimable word from Allen, you guys?
> As far as accessing the data is concerned, the only trick is having the
> pointer to the start of the buffer. Then just use peek and poke with
> appropriate offsets.
That was your question to the list last week in other words.....Answered!
Darn Allen you are good!
>
> Allen
9. Re: Changing data types, Continued
On 22 Mar 2004, at 12:25, H.W Overman wrote:
>
>
> Kat,
>
> This was a preliminary version I wrote for Allen back at the beginning
> of his project, I have several versions I sent him but this one is what
> you would probably be interested in...
>
> Two parts (.exw programs) are here, one to create and write the file, the
> other to read the file...
>
> I hope I hadnt changed anything to screw this up over time, Im in Linux right
> now so I cant test it but it should still be ok... Note: possible to tweak
> further!
I tried running the code you sent, but when its memory use gets up to 56
megabytes, i shut it down. It seems too high for an 8meg file, even for
Euphoria. What is normal for it?
Kat
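A back-of-the-envelope estimate may help here. The reader program copies every 8-byte node into its own sequence, and the sketch below assumes 32-bit Euphoria stores each sequence element in 4 bytes. It ignores the per-sequence header and allocator overhead, so the real figure would be higher. The function name and constants are illustrative only.

```euphoria
constant MB = 1024 * 1024
constant NODES = power(2, 20)   -- from the writer program (max_node_level = 20)
constant NODE_SIZE = 8          -- bytes per node in nodes.dat
constant ELEM_BYTES = 4         -- assumption: bytes per sequence element

-- Lower bound on peak memory, in MB, while the buffer and line[] coexist.
function estimate_mb(integer nodes, integer node_size)
    atom raw, inner, outer
    raw = nodes * node_size                  -- the allocate()d file buffer
    inner = nodes * node_size * ELEM_BYTES   -- every byte widened to an element
    outer = nodes * ELEM_BYTES               -- line[] holds one entry per node
    return (raw + inner + outer) / MB
end function

printf(1, "at least %d MB\n", estimate_mb(NODES, NODE_SIZE))
```

Under those assumptions the floor is 44 MB before any per-sequence overhead for the roughly one million small sequences, so a figure in the 50s is plausible rather than a sign of a leak. Keeping the data in the allocated buffer and peeking at it in place avoids the expansion.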