1. RE: ramdisk

On 25 Mar 2004, at 16:27, Derek Parnell wrote:

> 
> 
> > -----Original Message-----
> > From: Allen Robnett [mailto:alrobnett at alumni.princeton.edu]
> > Sent: Thursday, 25 March 2004 3:35 PM
> > To: EUforum at topica.com
> > Subject: Re: ramdisk (was: Re: Changing data types Concluded)
> >
> >
> > Derek wrote:
> >
> > <<As you have all the bytes in RAM, you do not have to convert them to
> > Euphoria integers etc... You can use the RAM-based string
> > searching routines
> > built in to Windows.
> >
> > <snip>
> >   while offset < FileSize do
> >     result = c_func(CompareString,{0, 0, RAMADDR+offset, len,
> > FindStr, len})
> >     if result = CSTR_ERROR or result = CSTR_EQUAL then
> >          exit
> >     end if
> >     offset += recsize
> >   end while
> > >>
> >
> > Euman wrote:
> >
> > << Make sure you free that allocated string. >>
> >
> > Kat wrote:
> >
> > <<No, it means it took 14 sec to find it each time. Which
> > sounds bad to me,
> > because it means a 200meg file will take about 6 minutes to
> > find a record at
> > the end. There must be a gotcha somewhere.>>
> >
> > Many thanks to all for the generous assistance.
> > Using Derek's suggested CompareString on my 200MB+ file
> > (Pentium 4, 2.53 GHz, 512MB RAM) I got 7.69 seconds for a
> > 4-character search string and 8.36 seconds for a 12-character
> > search string. (The majority of the fields contained simply
> > underscores. The target string was in the last 12-character
> > record, number 16,777,216, ending with byte number 201,326,592.)
> >
> > Modifying the while statement to:
> >
> >   while offset < FileSize and c_func(CompareString,{0, 0,
> > RAMADDR+offset, len, FindStr, len}) != CSTR_EQUAL do
> >      offset += recsize
> >   end while
> >
> > resulted in only a very slight improvement: 7.30 seconds.
> 
> CompareString() is a *very* expensive method, as it takes regional locale
> aspects into consideration. It might be better to get a small machine-code
> routine written to scan from a RAM address for the first occurrence of the
> target string.
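Derek's suggestion can be illustrated without any locale machinery: a plain byte comparison per record. Below is a minimal C sketch of that "small routine to scan from a RAM address" idea; the function name and signature are mine, not from the thread, and it assumes fixed-size records whose leading `len` bytes are compared.

```c
#include <stddef.h>
#include <string.h>

/* Scan fixed-size records in a RAM buffer for one whose first `len`
   bytes equal `target`.  Returns the byte offset of the matching
   record, or -1 if none matches.  A plain memcmp() skips all the
   locale handling that makes CompareString() expensive. */
long find_record(const unsigned char *buf, long file_size,
                 long recsize, const unsigned char *target, long len)
{
    for (long offset = 0; offset + len <= file_size; offset += recsize) {
        if (memcmp(buf + offset, target, len) == 0)
            return offset;
    }
    return -1;
}
```

Because it touches each record once and does nothing else, the loop's cost is essentially memory bandwidth, which is the best a linear scan can do.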

Still not in machine code, but for a 10 MB file with the find-pattern in the
middle of it, I got the time down to 4.25 seconds. If I take the allocation of
vars out of the timing loop, it doesn't change. If I remove the puts() from
the loop so there is no DOS box during timing, the time drops to 3.7
seconds. I am still not terribly pleased yet.

But I am using peek() now, and making over twice as many calls as needed,
imho. Let's see if I can fix that somehow...

Yes, got it down to finding the string midway in a 10-megabyte file in 1
second (average over 10 searches). So I put the string at the end of the
file, and it was found in 2 seconds (average over 10 searches). That works
out to finding the end of a 200-megabyte file in 40 seconds. That's more
reasonable, at least for this slow computer.

Naturally, if there were a clue to the approximate location of the string in
the file, it could be found even faster. I haven't checked, but from what the
code is doing I figure a binary search on a 200 MB file, the way I am doing
it, would take 2.25 seconds, much faster than the previous scary 6 minutes.

Note I am not using fixed-size records; with the method I am using now, the
search slows down as the record sizes get smaller. But a fixed record size
*may* speed up binary searches. Of course, a separate index file would
really speed things up.
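The binary search estimated above assumes sorted, fixed-size records. A C sketch of that case (helper name and key layout are my assumptions; the key is taken to be the leading bytes of each record):

```c
#include <stddef.h>
#include <string.h>

/* Binary search over sorted fixed-size records in a RAM buffer.
   nrec = file_size / recsize; the key is the first `len` bytes of
   each record.  Returns the record index, or -1 if absent.  Only
   about log2(nrec) records are touched -- roughly 25 probes for
   16 million records, which is why it beats a linear scan. */
long bsearch_record(const unsigned char *buf, long nrec, long recsize,
                    const unsigned char *key, long len)
{
    long lo = 0, hi = nrec - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int cmp = memcmp(buf + mid * recsize, key, len);
        if (cmp == 0) return mid;
        if (cmp < 0) lo = mid + 1;
        else         hi = mid - 1;
    }
    return -1;
}
```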

Something else interesting, which points up Eu not closing out files
properly: two instances of the loadsearch code (which I tried to post here
in a .zip file) cannot be run at the same time. The second instance errors
out with "cannot open readfile", because the first instance (finished
running, but sitting there waiting for me to press enter) still owns the
test file. So let me ask again: why is this?

Kat


2. RE: ramdisk

On 26 Mar 2004 at 3:50, Kat wrote:

> 
> On 25 Mar 2004, at 16:27, Derek Parnell wrote:
> 
> > 
> > > -----Original Message-----
> > > From: Allen Robnett [mailto:alrobnett at alumni.princeton.edu]
> > > Sent: Thursday, 25 March 2004 3:35 PM
> > > To: EUforum at topica.com
> > > Subject: Re: ramdisk (was: Re: Changing data types Concluded)
> > >
> > >
> > > Derek wrote:
> > >
> > > <<As you have all the bytes in RAM, you do not have to convert them to
> > > Euphoria integers etc... You can use the RAM-based string
> > > searching routines
> > > built in to Windows.
> > >
> > > <snip>
> > >   while offset < FileSize do
> > >     result = c_func(CompareString,{0, 0, RAMADDR+offset, len,
> > > FindStr, len})
> > >     if result = CSTR_ERROR or result = CSTR_EQUAL then
> > >          exit
> > >     end if
> > >     offset += recsize
> > >   end while
> > > >>
> > >
> > > Euman wrote:
> > >
> > > << Make sure you free that allocated string. >>
> > >
> > > Kat wrote:
> > >
> > > <<No, it means it took 14 sec to find it each time. Which
> > > sounds bad to me,
> > > because it means a 200meg file will take about 6 minutes to
> > > find a record at
> > > the end. There must be a gotcha somewhere.>>
> > >
> > > Many thanks to all for the generous assistance.
> > > Using Derek's suggested CompareString on my 200MB+ file
> > > (Pentium 4, 2.53 GHz, 512MB RAM) I got 7.69 seconds for a
> > > 4-character search string and 8.36 seconds for a 12-character
> > > search string. (The majority of the fields contained simply
> > > underscores. The target string was in the last 12-character
> > > record, number 16,777,216, ending with byte number 201,326,592.)
> > >
> > > Modifying the while statement to:
> > >
> > >   while offset < FileSize and c_func(CompareString,{0, 0,
> > > RAMADDR+offset, len, FindStr, len}) != CSTR_EQUAL do
> > >      offset += recsize
> > >   end while
> > >
> > > resulted in only a very slight improvement: 7.30 seconds.
> > 
> > CompareString() is a *very* expensive method, as it takes regional locale
> > aspects into consideration. It might be better to get a small machine-code
> > routine written to scan from a RAM address for the first occurrence of the
> > target string.
> 
> Still not in machine code, but for a 10 MB file with the find-pattern in the
> middle of it, I got the time down to 4.25 seconds. If I take the allocation of
> vars out of the timing loop, it doesn't change. If I remove the puts() from
> the loop so there is no DOS box during timing, the time drops to 3.7
> seconds. I am still not terribly pleased yet.
> 
> But I am using peek() now, and making over twice as many calls as needed,
> imho. Let's see if I can fix that somehow...
> 
> Yes, got it down to finding the string midway in a 10-megabyte file in 1
> second (average over 10 searches). So I put the string at the end of the
> file, and it was found in 2 seconds (average over 10 searches). That works
> out to finding the end of a 200-megabyte file in 40 seconds. That's more
> reasonable, at least for this slow computer.
> 
> Naturally, if there were a clue to the approximate location of the string in
> the file, it could be found even faster. I haven't checked, but from what the
> code is doing I figure a binary search on a 200 MB file, the way I am doing
> it, would take 2.25 seconds, much faster than the previous scary 6 minutes.
> 
> Note I am not using fixed-size records; with the method I am using now, the
> search slows down as the record sizes get smaller. But a fixed record size
> *may* speed up binary searches. Of course, a separate index file would
> really speed things up.
> 
> snip <
> 
> Kat
> 
Have you tried a Boyer-Moore search?
Is it applicable in this case?

Karl Bochert
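For reference, the Boyer-Moore idea Karl raises can be sketched in its simpler Horspool variant: precompute, for each byte value, how far the pattern may safely shift when that byte ends the current window, so long shifts let most of the text go unexamined. This C sketch is mine, not from the thread:

```c
#include <limits.h>
#include <string.h>

/* Boyer-Moore-Horspool substring search.  Returns the byte offset of
   the first occurrence of pat (length m) in text (length n), or -1.
   The shift table holds, for each possible last-window byte, how far
   the window can jump without missing a match. */
long bmh_search(const unsigned char *text, long n,
                const unsigned char *pat, long m)
{
    long shift[UCHAR_MAX + 1];
    if (m == 0 || m > n) return -1;
    for (long i = 0; i <= UCHAR_MAX; i++) shift[i] = m;   /* default: full jump */
    for (long i = 0; i < m - 1; i++) shift[pat[i]] = m - 1 - i;
    long pos = 0;
    while (pos <= n - m) {
        if (memcmp(text + pos, pat, m) == 0) return pos;
        pos += shift[text[pos + m - 1]];                  /* jump by table entry */
    }
    return -1;
}
```

As Kat notes below, the payoff depends on how cheaply single bytes can be read; in an interpreted language the per-byte overhead can swamp the skipped comparisons.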


3. RE: ramdisk

On 26 Mar 2004, at 2:38, kbochert at copper.net wrote:

<snip>

> > Yeas, got it down to finding the string midway in a 10 megabyte file in 1
> > second (average time on 10 searches). So i put the string at the end of the
> > file, and it was found in 2 sec (average time on 10 searches). This makes
> > for
> > finding the end of a 200 megabyte file in 40 seconds. That's more
> > reasonable,
> > at least for this slow computer.
> > 
> > Naturally, if there was a clue to the approximate location of the string in
> > the file, it could be found even faster. I haven't checked, but i figure
> > from
> > what the code is doing that a binary search on a 200 meg file, the way i am
> > doing it, will take 2.25 sec, much faster than the previous scarey 6
> > minutes.
> > 
> > Note i am not using fixed size records, which would slow the speed as the
> > record sizes get smaller using the method i am using now. But a fixed record
> > size *may* speed up binary searches. Of course, a separate index file would
> > really speed it up.
> > 
> > snip <
> > 
> > Kat
> > 
> Have you tried a Boyer-Moore search?
> Is it applicable in this case?

I am using Eu's built-in match() to locate the string in the megabyte file,
without putting the whole megabyte file into match(). (I'm sure you figured
it out, but it's not as easy as that; consider a peek(fn,x) where x lands in
the middle of what you are looking for.) Using the BM search on the file
would mean accessing the file one byte at a time, and would work well (I
suspect) only in machine code. I use Euphoria to avoid machine code.
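The boundary problem mentioned here (a match that straddles two peeked chunks) is commonly handled by carrying the last pattern-length-minus-one bytes of each chunk into the next one, so a straddling match is still seen in full. A C sketch of that technique, with names of my own choosing:

```c
#include <stdio.h>
#include <string.h>

/* Search a file for `pat` (length m) by reading fixed-size chunks,
   keeping the last m-1 bytes of each chunk at the front of the next
   read, so a match straddling two chunks is still found.  Returns
   the byte offset of the first match, or -1.  Caller keeps
   chunk <= 4096 and m <= 64 to fit the buffer. */
long chunked_find(FILE *fp, const char *pat, size_t m, size_t chunk)
{
    char buf[4096 + 64];
    size_t have = 0;      /* bytes carried over from the previous read */
    long base = 0;        /* file offset corresponding to buf[0]       */
    size_t got;
    while ((got = fread(buf + have, 1, chunk, fp)) > 0) {
        size_t total = have + got;
        for (size_t i = 0; i + m <= total; i++)
            if (memcmp(buf + i, pat, m) == 0)
                return base + (long)i;
        size_t keep = (m > 1) ? m - 1 : 0;   /* overlap for next round */
        if (keep > total) keep = total;
        memmove(buf, buf + total - keep, keep);
        base += (long)(total - keep);
        have = keep;
    }
    return -1;
}
```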

The next problem is being able to do a string insert() on a file too big to
duplicate in memory, such as inserting a record 40% of the way through the
file, in a way that binary search still works quickly. I suspect the only
way to accomplish this is a separate index list. However, if the 200-megabyte
file IS the index list, which it could be, I am back to square one.
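The separate-index idea can be made concrete: records stay wherever they were appended in the data area, while a sorted array of their offsets carries the ordering. Inserting a record then shifts a few machine words instead of rewriting 200 MB of data. A tiny C sketch (names and layout are my assumptions):

```c
#include <stddef.h>
#include <string.h>

/* Insert data-file offset `off` at sorted position `pos` of an
   offset index holding `n` entries (capacity must be at least n+1).
   Only (n - pos) longs move, rather than the data file itself; the
   record bytes never move at all. */
void index_insert(long *index, long n, long pos, long off)
{
    memmove(index + pos + 1, index + pos,
            (size_t)(n - pos) * sizeof(long));
    index[pos] = off;
}
```

Binary search then runs over the index (comparing the records the offsets point at), so insertion and fast lookup coexist; the trade-off is keeping the index consistent with the data file.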

Kat

