1. HELP with 'seek'
- Posted by Jim <futures8 at PCOLA.GULF.NET> Aug 18, 2000
- 542 views
I'm having a problem using the 'seek' function. Being new to Euphoria, I'm sure the problem is my own, but after struggling with it for days I haven't been able to figure out why the 'gets' function seems not to read from the file position that 'seek' pointed to. I hope someone on the list can explain what I'm doing wrong. The program and a test data file are below. (The entire program is included, since I don't know where in the program the problem originates.)

The program is intended to do the following:

(1) Read all records in the test data file to verify (a) the number of records and (b) the number of bytes per record, which should be 99 (including the line feed).
(2) From the end-of-file position, back up 3 records, then read and process the last 3 records.

The problem begins at the comment "back up 3 records". After reading the 10 records, 'seek' seems to position correctly at the first byte of record no. 7. However, the 'gets' that follows the 'seek' does not read the file beginning at the byte position pointed to by 'seek', but begins reading at a position 7 bytes before that point. The number of bytes (too early) varies with the number of records in the file (a 229-record file causes 'gets' to begin reading 219 bytes too far back in the file).

Any suggestions or solutions will be gratefully appreciated. (Using Windows 98, the Complete Edition of Euphoria, and the ED editor.)

Thanks.
Jim

-- Program begins here ---
-- read last 3 records of a file
include get.e
include file.e
with trace
allow_break(1)

sequence Characteristics,fullpath,drive,file
integer Loops,FieldNo
object LowerShadow1,Day1Range,Body1,Record,AvgClose
atom z,sum,EOF,byteno

function getpath()
--------------------------------------------------------------------
-- read a text file into a sequence
-- TEMPORARILY COMMENTED OUT FOR TESTING
-- drive = "c"
-- puts(1, "\n Enter Drive Letter: ")
-- drive = gets(0) -- get the drive letter in a sequence.
--It has a linefeed attached that will need to be stripped.
-- puts(1, "\n Enter Directory Name: ") -- start new line for better readability
-- directry = gets(0) -- get the directry name in a sequence
-- puts(1,"\n")
-- puts(1, "\n Enter File Name: ") -- start new line for better readability
-- file = gets(0) -- get the file name in a sequence
-- puts(1,"\n")
-- fullpath = drive[1..length(drive)-1] & ":\\" -- strip the \n and concatenate the ":\\" to the end of the drive
-- directry = directry[1..length(directry)-1] & "\\"
-- fullpath = fullpath &directry &file[1..length(file)-1] -- concatenate the file & dir name to the drive letter and ":\\"
---------------------------------------
-- FOR TESTING ONLY ---
    fullpath = "C:\\learn\\testdata.txt" --
---------------------------------------
    return fullpath
end function
--------------------------------------------------------------------------
x = repeat (0,99)
RecLen = 0 -- should be 99; 98 bytes of data, one byte line feed marker probably '\n'
RecNo = 0
true = 1
false = 0
len = 0
AvgClose = 0
Array = {}
Loops = 0
Characteristics = {}
fullpath = getpath()
DataFile = open(fullpath,"r") -- open for reading
if DataFile = -1 then
    puts(1, "Couldn't Open specified textfile \n")
    abort(1)
end if
-- how many records are in the file?
while true do
    Record = gets(DataFile)
    if atom(Record) then
        exit
    end if -- eof
    Date1 = Record[2..11]
    puts(1,"\n Date is ")
    puts(1,Date1)
    -- verify that all record lengths are what's expected
    Loops += 1
    if Loops < 2000 then -- tot. recs is lots less--- eof will occur first
        if sequence(Record) then
            RecLen = length(Record)
            if RecLen != 99 then
                printf(1,"Unexpected Record Length is %5d\n ...Should Be 99...Aborting",RecLen)
                close(DataFile)
                abort(1)
            end if
        end if
    end if
    RecNo += 1
end while
-- at eof
trace(1)
TotRecs = RecNo
TotBytes = TotRecs*RecLen
------------------------
-- back up 3 records --
------------------------
Back3 = TotBytes - (3*RecLen)
StartAt = seek(DataFile,Back3)
if StartAt != 0 then
    puts(1, "seek didn't work\n")
    abort(1)
end if
byteno = where(DataFile)
? byteno -- s/b 693
RecNo = 0
while true do
    Record = gets(DataFile) -- error -- does not read from correct position.
    if atom(Record) then
        exit -- if Record = -1, at end of file
    end if
    RecNo += 1
    Date1 = Record[2..11]
    Open1 = Record[15..25]
    High1 = Record[29..39]
    Low1 = Record[43..53]
    Close1 = Record[57..67]
    Volume1 = Record[71..81]
    OpenInt1 = Record[85..97] -- final quote mark in 98, line feed in 99
    Open1Val = value(Open1)
    High1Val = value(High1)
    Low1Val = value(Low1)
    Close1Val = value(Close1)
    Volume1Val = value(Volume1)
    OpenInt1Val = value(OpenInt1)
    Day1Range = High1Val[2] - Low1Val[2]
    if Open1Val[2] > Close1Val[2] then
        Body1 = "black"
        UpperShadow1 = High1Val[2] - Open1Val[2]
        LowerShadow1 = Close1Val[2] - Low1Val[2]
    elsif Open1Val[2] < Close1Val[2] then
        Body1 = "white"
        UpperShadow1 = High1Val[2] - Close1Val[2]
        LowerShadow1 = Open1Val[2] - Low1Val[2]
    elsif Open1Val[2] = Close1Val[2] then
        Body1 = "Doji"
        UpperShadow1 = High1Val[2] - Open1Val[2]
        LowerShadow1 = Close1Val[2] - Low1Val[2]
    end if
    printf(1,"Daily Range = %9.5f\n", Day1Range)
    printf(1,"Upper Shadow = %9.5f\n", UpperShadow1)
    printf(1,"Lower Shadow = %9.5f\n", LowerShadow1)
    puts(1,Body1)
    puts(1,Record)
    printf(1,"Record No. %5f\n",RecNo)
    -- put the record into an array (7x3) (7 fields, 3 days)
    Array = append(Array,Date1)
    Array = append(Array,Open1Val[2])
    Array = append(Array,High1Val[2])
    Array = append(Array,Low1Val[2])
    Array = append(Array,Close1Val[2])
    Array = append(Array,Volume1Val[2])
    Array = append(Array,OpenInt1Val[2])
end while
--trace(1)
for i = 1 to FieldNo*7 by 1 do -- 7 fields per record
    if i = 1 or i = 8 or i = 15 then -- fields containing Date mm\dd\yyyy
        printf(1,"\n%s", {Array[i]})
    else
        printf(1,"%11.5f", Array[i]) -- field width 11, 5 dec places
    end if
end for
-- now let's average the closes
for i = 5 to FieldNo*7 by 7 do -- 'cuz 7 fields per rec, close is field 5
    TotClose = TotClose + Array[i]
end for
AvgClose = AvgClose / RecNo
printf(1,"\n%11.5f", AvgClose) -- field width 11, 5 dec places
close (DataFile)
---------------------------
-- ascii data file follows:
---------------------------
"07/31/2000"," 1437.00000"," 1449.00000"," 1429.80000"," 1438.90000"," 70442"," 375470"
"08/01/2000"," 1442.00000"," 1454.50000"," 1439.00000"," 1447.50000"," 62190"," 376523"
"08/02/2000"," 1447.00000"," 1461.00000"," 1443.10000"," 1452.50000"," 58432"," 374653"
"08/03/2000"," 1437.00000"," 1465.00000"," 1433.00000"," 1461.50000"," 58007"," 376075"
"08/04/2000"," 1467.50000"," 1473.50000"," 1461.00000"," 1471.70000"," 65827"," 379646"
"08/07/2000"," 1474.00000"," 1490.50000"," 1470.50000"," 1486.20000"," 53438"," 378166"
"08/08/2000"," 1482.00000"," 1494.00000"," 1481.50000"," 1491.70000"," 41723"," 377278"
"08/09/2000"," 1497.00000"," 1498.00000"," 1480.00000"," 1481.00000"," 47953"," 374466"
"08/10/2000"," 1482.00000"," 1483.00000"," 1468.00000"," 1474.30000"," 51867"," 375371"
"08/11/2000"," 1470.00000"," 1484.50000"," 1461.00000"," 1478.50000"," 42623"," 376597"
2. Re: HELP with 'seek'
- Posted by Robert Craig <rds at ATTCANADA.NET> Aug 18, 2000
- 513 views
Jim writes:
> I'm having a problem in using the 'seek' function...
> ...DataFile = open(fullpath,"r") -- open for reading

When you use seek(), you have to keep track of byte offsets within a file very carefully. Opening a file as "r" (read in text mode) is going to cause you no end of trouble. It would be better to open the file as "rb" (read in binary mode) and realize that each line of text (each record) normally ends with two characters: \r\n.

I believe seek() looks at all the characters in the file, whereas gets() will only "see" \n, not \r\n as is actually stored in the file.

The division of I/O on DOS/Windows into "text" vs. "binary" modes was completely unnecessary, and has caused confusion for many years. I guess it was a quick kludge to make some IBM printer work better, back in the early 80's.

Regards,
Rob Craig
Rapid Deployment Software
http://www.RapidEuphoria.com
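The advice above can be tried out with a minimal sketch, assuming fixed-length records that each end in \r\n. The scratch file name `seekdemo.txt`, the 12-byte record layout and the constant names are made up for illustration and are not part of Jim's program; the sketch writes its own small file so that it does not depend on the test data.

```euphoria
-- Sketch only: file name and record layout are hypothetical.
include file.e                      -- seek(), where()

constant FNAME   = "seekdemo.txt"   -- scratch file made up for this demo
constant REC_LEN = 12               -- 10 data bytes plus "\r\n"

integer fn
object rec
atom total

-- Write 5 fixed-length records in binary mode so "\r\n" is stored as two bytes.
fn = open(FNAME, "wb")
if fn = -1 then
    puts(1, "Couldn't create " & FNAME & "\n")
    abort(1)
end if
for i = 1 to 5 do
    printf(fn, "record %03d\r\n", i)
end for
close(fn)

-- Reopen in binary mode: gets() now returns every stored byte, including '\r',
-- so the record length seen by gets() matches the offsets used by seek().
fn = open(FNAME, "rb")
if fn = -1 then
    puts(1, "Couldn't open " & FNAME & "\n")
    abort(1)
end if

if seek(fn, -1) != 0 then           -- a position of -1 means end of file
    puts(1, "seek to end of file failed\n")
    abort(1)
end if
total = where(fn)                   -- file size in bytes

if seek(fn, total - 3 * REC_LEN) != 0 then
    puts(1, "seek back 3 records failed\n")
    abort(1)
end if

while 1 do
    rec = gets(fn)
    if atom(rec) then
        exit                        -- -1 at end of file
    end if
    -- strip the trailing "\r\n" before printing
    puts(1, rec[1..length(rec)-2] & "\n")
end while
close(fn)
```

With the five records written above, this should print records 003 through 005. The design point is that the record length used in the seek() arithmetic counts the '\r' as well as the '\n'.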
3. Re: HELP with 'seek'
- Posted by Irv Mullins <irv at ELLIJAY.COM> Aug 18, 2000
- 521 views
On Fri, 18 Aug 2000, you wrote:
> It would be better to open the file as "rb" (read in binary mode) and realize
> that each line of text (each record) normally ends with two characters: \r\n.

In addition, it may be easier to use get() to read these lines in. This will allow for format variations and eliminate the need to cast strings into values.

Depending upon how many lines are in the actual data you're working with, it would probably be much easier to just load them all into a sequence, and work with them there. See code below.

Beyond that, I note that the sample data contains only 86 characters plus a line feed per line (except for the last line), not the 99 that your code is looking for. Maybe your e-mail client (or mine) clipped the lines?

Regards,
Irv

include get.e

constant DATE = 1, OPEN1 = 2, HIGH1 = 3, LOW1 = 4,
         CLOSE1 = 5, VOLUME1 = 6, OPENINT1 = 7

object data, line
atom fn

function readline()
    object line, item
    line = {}
    for i = DATE to OPENINT1 do
        item = get(fn)
        if item[1] = GET_SUCCESS then
            line = append(line,item[2])
        else
            return -1
        end if
    end for
    return line
end function

fn = open("testdata.txt","r")
data = {}
while 1 do
    line = readline()
    if atom(line) then
        exit
    else
        data = append(data,line)
    end if
end while
close(fn)

printf(1,"%d records read\n",length(data))

for i = 1 to length(data) do
    printf(1,"Date: %s\n",{data[i][DATE]})
end for
4. Re: HELP with 'seek'
- Posted by Jim <futures8 at PCOLA.GULF.NET> Aug 18, 2000
- 503 views
Rob,

Thanks for the help. I wasn't aware that the 'seek' function looks at any character beyond the '\n'. I'll do as you suggest re: open(fn,"rb").

Sorry my e-mail composer messed up the indentation and wrapping of the program lines so badly. It's almost unreadable.

Regards,
Jim
5. Re: HELP with 'seek'
- Posted by Irv Mullins <irv at ELLIJAY.COM> Aug 18, 2000
- 523 views
On Fri, 18 Aug 2000, I sent the backup version of the file. Let's try again with the _current_ one :)

-- Am I reading your code correctly, you're only using the last three records?
-- If so, the following should work

without warning
include get.e

constant DATE = 1, OPEN1 = 2, HIGH1 = 3, LOW1 = 4,
         CLOSE1 = 5, VOLUME1 = 6, OPENINT1 = 7

atom DayRange1, UpperShadow1, LowerShadow1

object data, line
atom fn

function readline()
    object line, item
    line = {}
    for i = DATE to OPENINT1 do
        item = get(fn)
        if item[1] = GET_SUCCESS then
            line = append(line,item[2])
        else
            return -1
        end if
    end for
    return line
end function

fn = open("testdata.txt","r")
data = {}
while 1 do
    line = readline()
    if atom(line) then
        exit
    else
        data = append(data,line)
    end if
end while
close(fn)

printf(1,"%d records read\n",length(data))

for i = 1 to length(data) do
    printf(1,"Date: %s\n",{data[i][DATE]})
    for j = OPEN1 to OPENINT1 do
        data[i][j] = value(data[i][j])
        data[i][j] = data[i][j][2]
    end for
end for

-- discard all but last 3 records
-- (length-2..length is three records; length-3 would keep four)
data = data[length(data)-2..length(data)]

clear_screen()
puts(1,"Last 3 days\n")
puts(1,"Date Open High Low Close Volume OpenInt\n")
for i = 1 to length(data) do
    -- one format per field: date, four prices, volume, open interest
    printf(1,"%s %4.5f %4.5f %4.5f %4.5f %8d %8d \n",data[i])
end for
6. Re: HELP with 'seek'
- Posted by Jim <futures8 at PCOLA.GULF.NET> Aug 18, 2000
- 515 views
Rob,

Thanks for the information. Your solution solved the problem, and taught me some things I didn't know. Thanks for being a willing teacher.

Regards,
Jim
7. Re: HELP with 'seek'
- Posted by Al Getz <xaxo at AOL.COM> Aug 18, 2000
- 515 views
- Last edited Aug 19, 2000
On reading files using r, rb, seek(), get(), gets(), where() etc.:

I don't know if this helps, but I was working on a project some time ago that required repeated comparisons of data stored in a file in the form of text (the file written using printf() statements). The problem was, because you never knew where you were going to have to go next in the file and the file was very large, you couldn't take the time to read through the whole file time after time just to get one or two items randomly placed in the file. Since it was all text lines, gets() seemed like a good candidate method for reading back the data without having to program a get() interface to allow random accessing of variable length 'records'.

The solution came out somewhat simple:

[A]

1. open the file in "r" mode
2. while reading all the records through once: do a where() followed by a gets(), and log all the addresses returned by where() in a sequence. If you want, you can save the first two letters in the same sequence to function as a hash alpha lookup. This would yield sub-sequences such as: {100,"ab"},{123,"ac"},...
3. now that you have the address for every text group, it's simply a matter of using seek() followed by gets() to get to the data.

[B]

If your data is mixed (not all text), then you simply use an alternating series of gets() for text fields and get() for other fields. You only need to log where() addresses once, for the first field of each 'record'. Ultimately, also record the next record address within the file and you've got a linked list on the next run. Record the previous address also and you can query up and down the list as well.

[C]

If your data is doubly mixed (not all records are the same type), then simply make the first field the format type identifier.

[D]

If your file is so large it's not practical to read through the whole file at startup, you can start a separate file to record where()'s whenever a record IS found during the normal application run. Each time the app runs, more and more records are located, making the time to locate data less and less each time. Of course, when something is added to the file, the location is stored at the same time.

[E]

I've also had great success with using home-made delimiters chosen such that an occurrence of the delimiter char(s) never or seldom occurs naturally in the target data, or only in known locations. Filenames are a good example, as quite a few characters are not allowed.

Hope this helps some, if not perhaps in another project. Good luck,
--Al Getz
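Approach [A] can be sketched roughly as follows. This is an illustration under stated assumptions, not Al's actual code: the scratch file `indexdemo.txt` and its three lines are made up, and the file is opened in "rb" rather than "r" (following Rob's earlier advice) so that the offsets logged by where() are plain byte counts.

```euphoria
-- Sketch only: file name and contents are hypothetical.
include file.e                      -- seek(), where()

constant FNAME = "indexdemo.txt"    -- scratch file made up for this demo

integer fn
object line
atom pos
sequence index                      -- one {offset, first two letters} pair per line

-- Create a small file of variable-length text lines.
fn = open(FNAME, "wb")
if fn = -1 then
    puts(1, "Couldn't create " & FNAME & "\n")
    abort(1)
end if
puts(fn, "alpha\n")
puts(fn, "bravo charlie\n")
puts(fn, "delta\n")
close(fn)

-- Pass 1: note where() before each gets() to learn where every line starts.
fn = open(FNAME, "rb")
if fn = -1 then
    puts(1, "Couldn't open " & FNAME & "\n")
    abort(1)
end if
index = {}
while 1 do
    pos = where(fn)
    line = gets(fn)
    if atom(line) then
        exit                        -- -1 at end of file
    end if
    index = append(index, {pos, line[1..2]})
end while

-- Random access: jump straight to the second line using the logged offset.
if seek(fn, index[2][1]) != 0 then
    puts(1, "seek failed\n")
    abort(1)
end if
line = gets(fn)
puts(1, line)
close(fn)
```

Here the index would hold {0,"al"}, {6,"br"} and {20,"de"}, and the final gets() reads the "bravo charlie" line without rereading the file from the start. Saving `index` to a separate file between runs would be one way to approach [D].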
8. Re: HELP with 'seek'
- Posted by Jim <futures8 at PCOLA.GULF.NET> Aug 20, 2000
- 519 views
Irv,

Again, thanks. Lots of interesting stuff for me to study here. I like your assignment of field numbers as constants. It makes following the code a lot clearer.

Regards,
Jim Duffy
9. Re: HELP with 'seek'
- Posted by Jim <futures8 at PCOLA.GULF.NET> Aug 20, 2000
- 505 views
Al,

> Hope this helps some, if not perhaps in another project. Good luck,
> --Al Getz

Yes, it helped a lot, although I'm not finished studying your concepts. Your response has been something of a mini-course on data access for me. Thanks for taking so much time to explain your thinking. I will be a while studying it and implementing your ideas.

> if your file is so large its not practical to read through
> the whole file at startup,

Yeah, some of the files I'll be processing are pretty large: 17,000 records of 100 bytes each (1.7 million bytes). Your solution 'D' sounds unique... and is one of the first I'll implement.

Thanks for all the help.

Regards,
Jim Duffy
10. Re: HELP with 'seek'
- Posted by Irv Mullins <irv at ELLIJAY.COM> Aug 20, 2000
- 502 views
On Sun, 20 Aug 2000, you wrote:
> Irv,
>
> Again, Thanks. Lots of interesting stuff for me to study here. I like
> your assignment of field numbers as constants. Make following the code
> a lot clearer.

I'm wondering how large is your data set? The technique of reading back to pick up the last three records would be necessary for HUGE data files, but more work than necessary for smaller sets (of, let's say, 10 - 20,000 lines). Better and faster to just read them into a sequence, and use the portion you want, especially since you eventually want the data in an array anyway.

If you really are reading huge data sets, maybe a better way would be to find the end of the file, then read backwards until you count 3 linefeeds, then read forward from there.

Regards,
Irv
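The read-backwards idea might look something like the sketch below. It is only an illustration: the scratch file `taildemo.txt`, its contents and the constant WANTED are made up, the file is assumed to end with a linefeed, and it steps back one byte at a time for clarity rather than speed.

```euphoria
-- Sketch only: file name and contents are hypothetical.
include file.e                      -- seek(), where()

constant FNAME  = "taildemo.txt"    -- scratch file made up for this demo
constant WANTED = 3                 -- how many trailing lines to read

integer fn, ch, found
atom pos
object line

-- Create a small file of variable-length lines.
fn = open(FNAME, "wb")
if fn = -1 then
    puts(1, "Couldn't create " & FNAME & "\n")
    abort(1)
end if
puts(fn, "one\ntwo\nthree\nfour\nfive\n")
close(fn)

fn = open(FNAME, "rb")
if fn = -1 then
    puts(1, "Couldn't open " & FNAME & "\n")
    abort(1)
end if
if seek(fn, -1) != 0 then           -- a position of -1 means end of file
    puts(1, "seek to end of file failed\n")
    abort(1)
end if
pos = where(fn) - 1                 -- offset of the last byte in the file

-- Walk backwards counting linefeeds. The linefeed that ends the last line
-- is counted too, so stop at linefeed number WANTED + 1.
found = 0
while pos >= 0 do
    if seek(fn, pos) != 0 then
        exit
    end if
    ch = getc(fn)
    if ch = '\n' then
        found += 1
        if found > WANTED then
            exit
        end if
    end if
    pos -= 1
end while

-- pos is now the linefeed just before the wanted lines,
-- or -1 if the file holds WANTED lines or fewer.
if seek(fn, pos + 1) != 0 then
    puts(1, "seek failed\n")
    abort(1)
end if
while 1 do
    line = gets(fn)
    if atom(line) then
        exit                        -- -1 at end of file
    end if
    puts(1, line)
end while
close(fn)
```

For the five-line demo file this should print "three", "four" and "five". Unlike the fixed-record arithmetic in the original program, this does not depend on every record having the same length.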
11. Re: HELP with 'seek'
- Posted by ck lester <cklester at YAHOO.COM> Aug 20, 2000
- 521 views
Or use EDS...

> If you really are reading huge data sets, maybe a better way would be to
> find the end of the file, then read backwards until you count 3
> linefeeds, then read forward from there.
12. Re: HELP with 'seek'
- Posted by Irv Mullins <irv at ELLIJAY.COM> Aug 20, 2000
- 513 views
On Sun, 20 Aug 2000, Jim wrote:
> > Al wrote:
> > if your file is so large its not practical to read through
> > the whole file at startup,
>
> Yeah, some of the files I'll be processing are pretty large, 17,000 records
> of 100 bytes each (1.7million bytes). Your solution 'D' sounds unique...
> and one of the first I'll implment.

A couple of questions come to mind: Are these files always comma delimited, quoted strings? Do you have a choice?

Secondly, are the files one-time use-and-discard, or are they more of a cumulative thing, which would make the use of EDS (suggested by ck lester) a "good thing"?

Regards,
Irv
13. Re: HELP with 'seek'
- Posted by Jim <futures8 at PCOLA.GULF.NET> Aug 20, 2000
- 516 views
Irv,

Thanks for the latest info. I'm new to Euphoria, and in study mode. What I'm currently doing is experimental, just to see how things work: how many records I can read into a buffer, then how large an array I can make from the fields within those records, etc. In the process, I'm learning a lot about Euphoria, thanks to you, Rob Craig and others who've offered insights and suggestions.

There are simpler ways of doing things, and you've pointed out many, which I appreciate and am implementing for study purposes. One day, I'll be ready to get back to commercial development, using Euphoria. Thanks again for your kind help.

Regards,
Jim Duffy