1. Changing data types, Continued

I have found that peek4u() works to get the RAM data into 4-byte atom 
form, and that bytes_to_int() can be used to change the user input into an 
atom. Since I appear to be restricted to using a multiple of 4 bytes, I 
have elected to use a for loop and wait_key() to get the 4-character 
input. Is there a better way to accomplish all of this?
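
A minimal sketch of the comparison described above, assuming a little-endian machine (as on x86) so that bytes_to_int() and peek4u() agree on byte order. The key "TREE" and the 8-byte sample buffer are made up for illustration; the real program would point into the file buffer instead.

```euphoria
include machine.e   -- allocate(), free(), bytes_to_int()

atom buf, key
sequence word

word = "TREE"               -- hypothetical 4-character search key
key = bytes_to_int(word)    -- least significant byte first, like peek4u()

buf = allocate(8)           -- stand-in for the big file buffer
poke(buf, "PINETREE")       -- 8 bytes of sample data

-- peek4u() does not need an aligned address, only 4 readable bytes
if peek4u(buf + 4) = key then
    puts(1, "match at offset 4\n")
end if
free(buf)
```

For keys that are not exactly 4 characters long, comparing with equal(peek({addr, length(word)}), word) is one possible alternative, at the cost of building a small sequence for every comparison.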

Allen

2. Changing data types, Continued

Brian Broker <bkb at cnw.com> wrote:

>Allen V Robnett wrote:
>
><snipped response>
>  
>
>>Thanks for the response. I believe match() is restricted to sequences. 
>>Given the 200MB size of my data file, even without Euphoria's sequence 
>>delimiters, and given the relative speed of reading the undelimited file 
>>into RAM at the beginning of a session, and then back out at the 
>>conclusion, it seemed preferable to me to use peek and poke on an 
>>undelimited, allocated file buffer. The editor program will search the 
>>entire file in 7 seconds.
>>    
>>
>After reading this, I'm not sure if you still have a question.  It might 
>help to know at least what platform (DOS, *nix, Win) you are working 
>with to determine the best (or fastest?) solution.
>
>-- Brian
>  
>
Originally I had a problem. I figured out AN answer before getting 
any response, but it may not be THE answer. I now have several questions 
rather than a problem.

I am using Windows XP HE on a Dell Pentium 4, 2.53 GHz; 533 MHz FSB with 
80 GB HD. The windows capabilities of my OS make it possible for me to 
use the EU program in a window-like way, even though it is written in 
vanilla DOS mode. As a first crack, I was looking for simplicity and 
speed, and know little about programming and nothing about programming 
for Windows.

The data file that I am building is a binary tree with fixed length 
nodes. Each node (at location n) has 2 children nodes (at locations 2n 
and 2n+1). Reading through and displaying all of the parent nodes in the 
branch for any given node is not difficult. There is no need for any 
system of record keys.
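
A small sketch of the branch walk described above. With children at 2n and 2n+1, the parent of node n is floor(n/2), and node 1 is the root. The function name branch() is hypothetical, not taken from the actual program.

```euphoria
-- Return the node numbers from node n up to the root (node 1).
function branch(integer n)
    sequence path
    path = {}
    while n >= 1 do
        path = append(path, n)
        n = floor(n / 2)      -- parent of node n
    end while
    return path
end function

? branch(13)    -- prints {13,6,3,1}
```

Because the position of each parent follows from the node number alone, no stored keys or pointers are needed to find it.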

When I tried to implement the system using EU's sequences, the size of 
the data file more than doubled, and the time taken to read the file 
into memory was on the order of ten minutes. When I stripped out the 
sequence delimiters and treated the whole file as one big number, I was 
able to allocate a buffer and read the file into it in 6 seconds. Rob 
Craig says all EU objects are either atoms or sequences. Of course this 
file is not an EU object, but that was the point of my question. I have 
discovered that there is no problem in pointing to a 4-letter part of a 
word and telling EU to treat it as an integer. It is then compared with 
a similarly constructed search integer. The method can search the entire 
file in 7 seconds.
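
The search loop might look roughly like the sketch below. The 8-byte node size, the function name find_node() and the sample data are assumptions for illustration only; the key is compared against the first 4 bytes of each node.

```euphoria
include machine.e   -- allocate(), free(), bytes_to_int()

constant NODE_SIZE = 8    -- assumed fixed node length in bytes

-- Return the 1-based number of the first node whose first 4 bytes
-- equal key, or 0 if no node matches.
function find_node(atom base, integer node_count, atom key)
    atom addr
    addr = base
    for n = 1 to node_count do
        if peek4u(addr) = key then
            return n
        end if
        addr += NODE_SIZE
    end for
    return 0
end function

atom buf
buf = allocate(3 * NODE_SIZE)
poke(buf, "OAK_____ELM_____ASH_____")
? find_node(buf, 3, bytes_to_int("ELM_"))   -- prints 2
free(buf)
```

Each comparison is a single 4-byte read and an atom compare, with no sequence created per node, which is probably where most of the speed comes from.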

I know that there are many participants in this forum whose forte is 
speed in searches. My current question is, is there a better way to 
accomplish my goal within EU, either speedwise or stylewise? I am 
actually content with the 7-second result, if it cannot be improved.

Thanks to the many who have taken time to be of help.
Allen

3. Re: Changing data types, Continued

On 22 Mar 2004, at 7:47, Allen Robnett wrote:

> <snipped: full quote of message 2>

No doubt this will be the slowest solution, but what if you stored the 200meg 
file in ram as a series of smaller data chunks, small enough to fit into a 
reasonable sequence size, and then looped thru the smaller chunks, reading 
each chunk, one at a time, into a sequence, so you could use Euphoria 
code to play with them? Then you could use such things as wildmatch(), 
wildtok(), and DavidC's regexp lib on them.
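
A rough sketch of the chunking idea, assuming the data is already sitting in an allocated buffer. The function name find_in_buffer() and the tiny chunk size are made up for the demo. Each chunk is read a little long so that a match straddling a chunk boundary is not missed.

```euphoria
include machine.e   -- allocate(), free()

constant CHUNK = 16   -- tiny for the demo; a real chunk could be much larger

-- Return the 0-based offset of the first occurrence of pat within
-- the len bytes starting at base, or -1 if it is not found.
function find_in_buffer(atom base, integer len, sequence pat)
    integer pos, n, hit
    sequence s
    pos = 0
    while pos < len do
        -- read length(pat)-1 extra bytes to cover the chunk boundary
        n = CHUNK + length(pat) - 1
        if pos + n > len then
            n = len - pos
        end if
        s = peek({base + pos, n})   -- copy one chunk into a sequence
        hit = match(pat, s)
        if hit then
            return pos + hit - 1
        end if
        pos += CHUNK
    end while
    return -1
end function

atom buf
sequence data
data = "the quick brown fox jumps"
buf = allocate(length(data))
poke(buf, data)
? find_in_buffer(buf, length(data), "fox")   -- prints 16
free(buf)
```

Once a chunk is in a sequence, any sequence-based routine can be used on it in place of match().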

I am very interested in seeing code for how you are loading and accessing 
the 200meg file in ram now.

Kat,
applying flame retardant clothing again.

4. Re: Changing data types, Continued

> I am very interested in seeing code for how you are loading and accessing
> 200meg file in ram now.
>
> Kat,

Sure Kat, I wrote it! Allen should've mentioned that...

Euman

-- 
message authentication: /589&*809/-452/+1205

5. Re: Changing data types, Continued

Kat,

This was a preliminary version I wrote for Allen back at the beginning
of his project. I have several versions I sent him, but this one is what
you would probably be interested in...

Two parts (.exw programs) are here, one to create and write the file, the
other to read the file...

I hope I haven't changed anything to screw this up over time. I'm in Linux right
now so I can't test it, but it should still be OK... 
Note: possible to tweak further!

-- Code part 1

-- PART 1 speedread_write.exw creates the 8 MB file

-- SpeedRead_write Binary file in Windows
-- Euman 2004
-- Based on Allen V. Robnett's idea for nodes rambase

without type_check
without trace
without warning

include get.e

integer fn, x

constant max_node_level = 20

-- Create nodes.dat (2^20 nodes of 8 bytes each) if it does not exist yet
fn = open("nodes.dat", "rb")
if fn = -1 then
   puts(1, "No Nodes file. Create it (Y/N)? \n")
   x = wait_key()
   if x = 'y' or x = 'Y' then
      fn = open("nodes.dat", "wb")
      puts(1, "Creating node table...Please wait\n")
      for i = 1 to power(2, max_node_level) do
         puts(fn, repeat('_', 8))   -- one empty 8-byte node
      end for
      puts(1, "Done...\n")
      close(fn)
   end if
else
   close(fn)
end if

-- PART 2 speedread.exw reads written file from output of above code

-- SpeedRead Binary file in Windows
-- LOOK FOR: pass name of file (below) and .dat file containing 8 MB
-- Euman 2004
-- W/ minor tweak by Tommy Carlier noted below

without type_check
with trace

include machine.e
include file.e
include dll.e

atom t1, t2, fn, lFileLen, result
sequence lFile, line
integer char

atom kernel32
kernel32 = open_dll("kernel32.dll")

constant
xCreateFile = define_c_func(kernel32, "CreateFileA",
    {C_POINTER, C_LONG, C_LONG, C_POINTER, C_LONG, C_LONG, C_INT}, C_LONG),
xReadFile = define_c_func(kernel32, "ReadFile",
    {C_INT, C_POINTER, C_UINT, C_POINTER, C_POINTER}, C_LONG)

global constant 
  GENERIC_READ             = #80000000,
  FILE_ATTRIBUTE_NORMAL    = #80,
  FILE_FLAG_SEQUENTIAL_SCAN= #8000000,
  OPEN_EXISTING            = 3

atom hFile
  
global function CreateFile(sequence fname)
atom FileName 
     FileName = allocate_string(fname)
     hFile = c_func(xCreateFile,{FileName,
                                GENERIC_READ,
                                0,
                                NULL,
                                OPEN_EXISTING,
                                FILE_ATTRIBUTE_NORMAL+FILE_FLAG_SEQUENTIAL_SCAN,
                                NULL})
    return hFile
end function

atom lpNumberOfBytesRead
lpNumberOfBytesRead = allocate(4) --lpNumberOfBytesRead

function ReadFile(atom hFile, atom lpBuffer, atom nNumberOfBytesToRead)
  return c_func(xReadFile,
    {hFile, lpBuffer, nNumberOfBytesToRead, lpNumberOfBytesRead, 0})
end function

sequence yourfile
yourfile = "nodes.dat" -- pass name of file

fn = open(yourfile, "r") 
lFileLen = seek(fn, -1)                                 
lFileLen = where(fn)
close(fn)

atom lpFileBuff, cIN, cIndex, cMaxIndex
lpFileBuff = allocate(lFileLen)
cIN = 0

t1 = time()
hFile = CreateFile(yourfile) 
result = ReadFile(hFile, lpFileBuff, lFileLen) 

-- Compare the byte count reported by ReadFile with lFileLen before
-- starting the loop. In this case the file is pretty small,
-- 8,388,608 chars.
if result and peek4u(lpNumberOfBytesRead) != lFileLen then
   puts(1, "Warning: fewer bytes read than the file length\n")
end if

-- Plus, this assumes binary read and no crlf chars exist
trace(1)
line = repeat(0, lFileLen / 8)
atom lIndex

if result then
      
   cMaxIndex = lpFileBuff + lFileLen
   cIndex = lpFileBuff
   lIndex = 1
   
  -- tweak by Tommy Carlier
   while cIndex < cMaxIndex do
      line[lIndex] = peek({cIndex, 8})
      cIndex += 8
      lIndex += 1
   end while
     
end if
free(lpFileBuff)

-- take trace out to get correct speed!
t2 = time()- t1                                 
printf(1,"Average Time : %1.4f sec\n", t2 )

if getc(0) then end if

-- END CODES

Euman

On Monday 22 March 2004 11:17 am, Kat wrote:
> On 22 Mar 2004, at 10:35, H.W Overman wrote:
> >
> > > I am very interested in seeing code for how you are loading and
> > > accessing 200meg file in ram now.
> > >
> > > Kat,
> >
> > Sure Kat, I wrote it! Allen shouldve mentioned that...
>
> I did a search in the RDS archives for Euman and Overman, and nothing
> showed on the memory stash code. I remember you were doing something
> about this, or someone was, last year, i think.
>
> Kat,
> looking interested,
> powered by caffine and cheetoes.

-- 
message authentication: /409&*777/-682/+905

6. Re: Changing data types, Continued

Euman, you sent this to the wrong address, I believe.

----- Original Message -----
From: "H.W Overman" <euman at bellsouth.net>
To: <gertie at visionsix.com>
Cc: "Euphoria Mailing List" <EUforum at topica.com>
Subject: Re: Changing data types, Continued


> <snipped: full quote of message 5>

7. Re: Changing data types, Continued

>>I am very interested in seeing code for how you are loading and accessing
>>200meg file in ram now.
>>
>>Kat,
>>    
>>
>Sure Kat, I wrote it! Allen shouldve mentioned that...
>
>Euman
>  
>

Yes, Euman was of inestimable help in telling me how to read the file 
into an allocated buffer using the API.
As far as accessing the data is concerned, the only trick is having the 
pointer to the start of the buffer. Then just use peek and poke with 
appropriate offsets.
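
A minimal sketch of what "appropriate offsets" could look like, assuming 8-byte nodes numbered from 1 and stored back to back. The helper name node_addr() and the sample data are hypothetical.

```euphoria
include machine.e   -- allocate(), free()

constant NODE_SIZE = 8     -- assumed fixed node length in bytes

-- Address of node n (1-based) in the buffer that starts at base.
function node_addr(atom base, integer n)
    return base + (n - 1) * NODE_SIZE
end function

atom base
base = allocate(4 * NODE_SIZE)
poke(base, repeat('_', 4 * NODE_SIZE))           -- empty table of 4 nodes

poke(node_addr(base, 3), "MAPLE___")             -- write node 3
puts(1, peek({node_addr(base, 3), NODE_SIZE}))   -- read it back: MAPLE___
puts(1, "\n")
free(base)
```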

Allen

8. Re: Changing data types, Continued

On Monday 22 March 2004 07:44 pm, Allen Robnett wrote:

> Yes, Euman was of inestimable help in telling me how to read the file
> into an allocated buffer using the API.

inestimable:
1. Impossible to estimate or compute: inestimable damage. 
2. Of immeasurable value or worth; invaluable: "shared all the inestimable 
advantages of being wealthy, good-looking, confident and intelligent" 

Well, I learned a new inestimable word from Allen, you guys?

> As far as accessing the data is concerned, the only trick is having the
> pointer to the start of the buffer. Then just use peek and poke with
> appropriate offsets.

That was your question to the list last week in other words.....Answered!

Darn Allen you are good!

blink

>
> Allen

9. Re: Changing data types, Continued

On 22 Mar 2004, at 12:25, H.W Overman wrote:

> 
> 
> Kat,
> 
> This was a preliminary version I wrote for Allen back at the beginning
> of his project, I have several versions I sent him but this one is what
> you would probably be interested in...
> 
> Two parts (.exw programs) are here, one to create and write the file, the
> other to read the file...
> 
> I hope I hadnt changed anything to screw this up over time, Im in Linux right
> now so I cant test it but it should still be ok... Note: possible to tweak
> further!

I tried running the code you sent, but when its memory use got up to 56 
megabytes, I shut it down. It seems too high for an 8meg file, even for 
Euphoria. What is normal for it?

Kat
