RE: ramdisk


On 25 Mar 2004, at 16:27, Derek Parnell wrote:

> 
> 
> > -----Original Message-----
> > From: Allen Robnett [mailto:alrobnett at alumni.princeton.edu]
> > Sent: Thursday, 25 March 2004 3:35 PM
> > To: EUforum at topica.com
> > Subject: Re: ramdisk (was: Re: Changing data types Concluded)
> >
> >
> > Derek wrote:
> >
> > <<As you have all the bytes in RAM, you do not have to convert them to
> > Euphoria integers etc... You can use the RAM-based string searching
> > routines built in to Windows.
> >
> > <snip>
> >   while offset < FileSize do
> >     result = c_func(CompareString, {0, 0, RAMADDR+offset, len, FindStr, len})
> >     if result = CSTR_ERROR or result = CSTR_EQUAL then
> >          exit
> >     end if
> >     offset += recsize
> >   end while
> > >>
> >
> > Euman wrote:
> >
> > << Make sure you free that allocated string. >>
> >
> > Kat wrote:
> >
> > <<No, it means it took 14 sec to find it each time. Which sounds bad to
> > me, because it means a 200meg file will take about 6 minutes to find a
> > record at the end. There must be a gotcha somewhere.>>
> >
> > Many thanks to all for the generous assistance.
> > Using Derek's suggested CompareString on my 200MB+ file
> > (Pentium 4, 2.53 GHz, 512MB RAM) I got 7.69 seconds for a
> > 4-character search string and 8.36 seconds for a 12-character
> > search string. (The majority of the fields contained simply
> > underscores. The target string was in the last 12-character
> > record, number 16,777,216, ending with byte number 201,326,592.)
> >
> > Modifying the while statement to:
> >
> >   while offset < FileSize and
> >         c_func(CompareString, {0, 0, RAMADDR+offset, len, FindStr, len}) != CSTR_EQUAL do
> >      offset += recsize
> >   end while
> >
> > resulted in only a very slight improvement: 7.30 seconds.
> 
> CompareString() is a *very* expensive method as it takes regional locale
> aspects into consideration. It might be better to get a small machine-code
> routine written to scan from a RAM address for the first occurrence of the
> target string.

Still not in machine code, but for a 10 MB file with the search pattern in the
middle of it, I got the time down to 4.25 sec. Taking the allocation of
variables out of the timing loop doesn't change that. If I remove the puts()
from the loop so there is no DOS box during timing, the time drops to 3.7
seconds. I am still not terribly pleased.

But I am using peek() now, and making over twice as many calls as needed,
IMHO. Let's see if I can fix that somehow...
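
One way to cut the number of peek() calls is to pull a large chunk of RAM into
a sequence with a single peek({addr, len}) and let the built-in match() scan
it. The sketch below is only an illustration under that assumption, not the
loadsearch code itself; find_in_ram() and its chunk parameter are hypothetical
names. Chunks overlap by length(pattern)-1 bytes so a string that straddles
two chunks is still found.

```euphoria
include machine.e   -- allocate(), free() (std/machine.e on Euphoria 4)

-- Hypothetical helper: find the first occurrence of pattern in the len
-- bytes of RAM starting at addr. Returns a 0-based offset, or -1.
function find_in_ram(atom addr, integer len, sequence pattern, integer chunk)
    integer plen, pos, n, hit
    sequence buf
    plen = length(pattern)
    if plen = 0 or plen > len then
        return -1
    end if
    if chunk < plen then
        chunk = plen                -- a chunk must hold the whole pattern
    end if
    pos = 0
    while pos + plen <= len do
        n = chunk
        if pos + n > len then
            n = len - pos           -- last chunk may be short
        end if
        buf = peek({addr + pos, n}) -- one peek per chunk, not per byte
        hit = match(pattern, buf)
        if hit then
            return pos + hit - 1
        end if
        if pos + n >= len then
            exit                    -- that was the final chunk
        end if
        pos += n - (plen - 1)       -- overlap so straddling matches are seen
    end while
    return -1
end function

-- Small demonstration on 126 bytes of allocated RAM.
atom mem
sequence data
data = repeat('_', 100) & "NEEDLE" & repeat('_', 20)
mem = allocate(length(data))
poke(mem, data)
? find_in_ram(mem, length(data), "NEEDLE", 16)
free(mem)
```

With a tiny chunk of 16 bytes the demonstration prints the offset 100; in
practice a chunk of some tens of kilobytes would be a more sensible starting
point, and the best size is something to measure rather than assume.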

Yes, got it down to finding the string midway in a 10 megabyte file in 1
second (average over 10 searches). So I put the string at the end of the
file, and it was found in 2 sec (average over 10 searches). Scaling linearly,
that means finding a string at the end of a 200 megabyte file in about 40
seconds. That's more reasonable, at least for this slow computer.

Naturally, if there were a clue to the approximate location of the string in
the file, it could be found even faster. I haven't checked, but I figure from
what the code is doing that a binary search on a 200 MB file, the way I am
doing it, will take 2.25 sec, much faster than the previous scary 6 minutes.

Note that I am not using fixed-size records; with the method I am using now,
fixed-size records would slow the search down as the record size gets
smaller. But a fixed record size *may* speed up binary searches. Of course, a
separate index file would really speed it up.
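
To show why fixed-size records help a binary search, here is a minimal sketch.
It assumes the records sit contiguously in RAM and are already sorted in
ascending byte order, which nothing in this thread guarantees for the real
data. bsearch_ram() and its parameters are hypothetical names.

```euphoria
include machine.e   -- allocate(), free() (std/machine.e on Euphoria 4)

-- Hypothetical helper: binary search over nrecs records of recsize bytes
-- each, stored contiguously from addr and ASSUMED sorted ascending.
-- key must be recsize bytes long.
-- Returns the 1-based record number, or 0 if the key is not present.
function bsearch_ram(atom addr, integer nrecs, integer recsize, sequence key)
    integer lo, hi, mid, c
    lo = 1
    hi = nrecs
    while lo <= hi do
        mid = floor((lo + hi) / 2)
        -- fixed record size lets us jump straight to record number mid
        c = compare(key, peek({addr + (mid - 1) * recsize, recsize}))
        if c = 0 then
            return mid
        elsif c < 0 then
            hi = mid - 1
        else
            lo = mid + 1
        end if
    end while
    return 0
end function

-- Small demonstration: five sorted 4-byte records.
atom recs
recs = allocate(20)
poke(recs, "aaaa" & "bbbb" & "cccc" & "dddd" & "eeee")
? bsearch_ram(recs, 5, 4, "dddd")
free(recs)
```

Each probe is a single peek, and 16,777,216 records (2 to the power 24) need
at most 25 probes, so the cost depends almost entirely on whether the data can
be kept sorted.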

Something else interesting, which points to Eu not closing files properly:
two instances of the loadsearch code (which I tried to post here in a .zip
file) cannot be run at the same time. The 2nd instance errors out with
"cannot open readfile", because the first instance (finished running, but
sitting there waiting for me to press Enter) still owns the test file. So let
me ask again: why is this?
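
One possible explanation, offered as a guess since the loadsearch source is
not shown here: Euphoria closes any files that are still open when the program
terminates, so a program that pauses for a keypress without calling close()
still holds its file handle. Whether that blocks a second instance depends on
the platform's file sharing. The sketch below closes the file as soon as the
data has been read; load_bytes() is a hypothetical helper, not part of the
original code.

```euphoria
-- Hypothetical helper: read a whole file into a sequence of bytes and
-- close the handle before returning.
-- Returns -1 if the file cannot be opened.
function load_bytes(sequence fname)
    integer fn, c
    sequence bytes
    fn = open(fname, "rb")
    if fn = -1 then
        return -1
    end if
    bytes = {}
    c = getc(fn)
    while c != -1 do            -- getc() returns -1 at end of file
        bytes = append(bytes, c)
        c = getc(fn)
    end while
    close(fn)                   -- release the file here, not at program exit
    return bytes
end function
```

A sequence costs several bytes per element, so for a 200 MB file the data
would still go into allocated RAM as before; the point of the sketch is only
that close() comes before the program waits for Enter.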

Kat
