RE: Changing data types Concluded
- Posted by "Derek Parnell" <ddparnell at bigpond.com> Mar 24, 2004
> -----Original Message-----
> From: Allen Robnett [mailto:alrobnett at alumni.princeton.edu]
> Subject: Changing data types Concluded
>
> Derek wrote:
>
> <<Yes, it was a rather salient point as we all assumed you were using
> standard Euphoria code to read and search the file. Using the API and
> raw ram buffers instead changes things a lot, as we are no longer
> talking about atoms and sequences.
> Derek >>
>
> Sorry about the confusion. I am a math teacher, not a
> programmer, and I learn the lingo by having programmers kick
> me when I make a mistake.

No problem.

> Back to my original question.
> I am reading a 200MB text file into a preallocated buffer via
> an API method suggested by Euman. The file consists of 2^24
> fixed length fields, each 12 bytes long. I would like to take
> advantage of EU's features to the greatest extent possible in
> searching the file. I am currently doing this by successively
> peeking at the first 4 characters of each field and telling
> EU that they represent an unsigned integer (actually an atom).
>
> The search of the 16,777,216 fields is done in 6.6 seconds if
> "not equal()" is used, and in 5.8 seconds if != is used. The
> latter is possible because the 4-character sub-field is
> treated as an integer.
>
> Questions:
> 1. Does anyone see a way to improve on this time?

Yes. See below.

> 2. Could I also arrange to use an 8-byte sub-field?

As you have all the bytes in RAM, you do not have to convert them to
Euphoria integers etc... You can use the RAM-based string searching
routines built in to Windows.
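include dll.e      -- open_dll(), define_c_func(), C_UINT, C_INT
include machine.e  -- allocate_string(), free()

atom kernel32          -- DLL handle; an address, so it may not fit in an integer
integer CompareString  -- routine id

kernel32 = open_dll("kernel32.dll")
-- CompareStringA(Locale, dwCmpFlags, lpString1, cchCount1, lpString2, cchCount2)
CompareString = define_c_func(kernel32, "CompareStringA",
        {C_UINT,C_UINT,C_UINT,C_INT,C_UINT,C_INT}, C_INT)

And assuming you have the file loaded into RAM at the address stored in
'RAMADDR', and its size in bytes in 'FileSize', then you can do

constant CSTR_ERROR = 0
constant CSTR_LESS_THAN = 1
constant CSTR_EQUAL = 2
constant CSTR_GREATER_THAN = 3

atom FindStr
integer offset
atom result
integer len
constant recsize = 12
sequence TheRecord

-- Find first text record that begins with 'abcd'.
FindStr = allocate_string("abcd")
len = 4     -- for an 8-byte sub-field, use an 8-character string and len = 8
offset = 0
result = CSTR_ERROR
while offset < FileSize do
    result = c_func(CompareString, {0, 0, RAMADDR+offset, len, FindStr, len})
    if result = CSTR_ERROR or result = CSTR_EQUAL then
        exit
    end if
    offset += recsize
end while
free(FindStr)

-- Copy the whole record into a sequence, but only if a match was found
TheRecord = {}
if result = CSTR_EQUAL then
    TheRecord = peek({RAMADDR+offset, recsize})
end if

> 3. Is there some way to persuade EU "position()" to allow the
> display line to be greater than 24 (using vanilla DOS)?

include graphics.e  -- text_rows()

integer maxlines
maxlines = text_rows(50)  -- returns the number of lines actually set

-- Values of 25, 28, 43 and 50 lines are supported by most video cards.

--
Derek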