1. RE: Changing data types Concluded

> -----Original Message-----
> From: Allen Robnett [mailto:alrobnett at alumni.princeton.edu]
> Subject: Changing data types Concluded
>
>
>
> Derek wrote:
>
> <<Yes, it was a rather salient point as we all assumed you
> were using standard
> Euphoria code to read and search the file. Using the API and
> raw ram buffers
> instead changes things a lot, as we are no longer talking
> about atoms and
> sequences.
> Derek >>
>
>
> Sorry about the confusion. I am a math teacher, not a
> programmer, and I learn the lingo by having programmers kick
> me when I make a mistake.

No problem.

> Back to my original question.
> I am reading a 200MB text file into a preallocated buffer via
> an API method suggested by Euman. The file consists of 2^24
> fixed length fields, each 12 bytes long. I would like to take
> advantage of EU's features to the greatest extent possible in
> searching the file. I am currently doing this by successively
> peeking at the first 4 characters of each field and telling
> EU that they represent an unsigned integer (actually an atom).
>
> The search of the 16,777,216 fields is done in 6.6 seconds if
> "not equal()" is used, and in 5.8 seconds if != is used. The
> latter is possible because the 4-character sub-field is
> treated as an integer.
>
> Questions:
> 1. Does anyone see a way to improve on this time?

Yes. See below.

> 2. Could I also arrange to use an 8-byte sub-field?

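As you have all the bytes in RAM, you do not have to convert them to
Euphoria integers at all. You can use the RAM-based string comparison
routines built in to Windows. They take the length of the key as a
parameter, so an 8-byte sub-field is no harder than a 4-byte one.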

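  include dll.e      -- open_dll(), define_c_func(), C_* constants
  include machine.e  -- allocate_string(), used further down

  atom kernel32      -- open_dll() returns an address, which may not fit in an integer
  integer CompareString

  kernel32 = open_dll("kernel32.dll")
  CompareString = define_c_func(kernel32, "CompareStringA",
                    {C_UINT,C_UINT,C_UINT,C_INT,C_UINT,C_INT},C_INT)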

And assuming you have the file loaded into RAM at the address stored in
'RAMADDR' then you can do

 constant CSTR_ERROR = 0
 constant CSTR_LESS_THAN = 1
 constant CSTR_EQUAL = 2
 constant CSTR_GREATER_THAN = 3

  atom FindStr
  integer offset
  atom result
  integer len
  constant recsize = 12
  sequence TheRecord

  -- Find first text record that begins with 'abcd'.
  FindStr = allocate_string("abcd")
  len = 4
  offset = 0
  while offset < FileSize do
    result = c_func(CompareString,{0, 0, RAMADDR+offset, len, FindStr, len})
    if result = CSTR_ERROR or result = CSTR_EQUAL then
         exit
    end if
    offset += recsize
  end while
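  -- Copy the whole record into a sequence, but only if a match was found.
  -- Without this check a failed search would peek() past the end of the buffer.
  if offset < FileSize and result = CSTR_EQUAL then
    TheRecord = peek({RAMADDR+offset, recsize})
  else
    TheRecord = {}
  end if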
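
If you would rather stay in plain Euphoria for the 8-byte case, one option is
to treat the key as two 4-byte unsigned values and compare each with peek4u().
This is only a sketch that I have not timed against CompareString. It assumes
the buffer address and size are passed in (RAMADDR and FileSize above), and
find_key8 is a made-up name, not a library routine.

```euphoria
include machine.e   -- bytes_to_int()

constant RECSIZE = 12

-- Hypothetical helper: returns the offset of the first record whose first
-- 8 bytes equal key8 (a sequence of 8 byte values), or -1 if none matches.
function find_key8(atom base, atom size, sequence key8)
    atom lo, hi
    integer offset

    -- bytes_to_int() uses the same low-byte-first order as peek4u()
    lo = bytes_to_int(key8[1..4])
    hi = bytes_to_int(key8[5..8])

    offset = 0
    while offset + RECSIZE <= size do
        if peek4u(base + offset) = lo then
            if peek4u(base + offset + 4) = hi then
                return offset
            end if
        end if
        offset += RECSIZE
    end while
    return -1
end function
```

You would call it as find_key8(RAMADDR, FileSize, "abcdefgh") and check for
-1 before peeking at the record. The second peek4u() only runs when the first
four bytes already match, so the common case costs about the same as your
current 4-byte test.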

> 3. Is there some way to persuade EU "position()" to allow the
> display line to be greater than 24 (using vanilla DOS)?

  include graphics.e  -- text_rows()

  integer  maxlines
  -- Values of 25, 28, 43 and 50 lines are supported by most video cards.
  maxlines = text_rows(50)
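
Once text_rows() has switched modes, position() should accept any row up to
the value it returned. A small sketch, assuming a DOS text mode; row 43 is
just an example value:

```euphoria
include graphics.e   -- text_rows()

integer maxlines
maxlines = text_rows(50)     -- returns the number of lines actually set

if maxlines >= 43 then
    position(43, 1)          -- row 43, column 1
    puts(1, "This is row 43")
else
    printf(1, "Only %d lines available\n", {maxlines})
end if
```

Checking the return value matters because the card may not support the
number of lines you asked for.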

--
Derek
