Re: Mac text files and gets()
- Posted by Fernando Bauer <fmbauer at ho?ma?l.com> Sep 14, 2007
Andy Drummond wrote:
>
> Robert Craig wrote:
> >
> > CChris wrote:
> > > Actually, the mod as I implemented it works fine under DOS/Windows,
> > > because the OS takes care of removing the \r.
> > > Under Linux/BSD:
> > > * currently, doing gets() on a DOS text file results in returned lines
> > >   being terminated with \r (I doubt the OS filters it out).
> > > * with my current implementation, the trailing \r would disappear, but a
> > >   spurious extra empty line would be generated by the \n part.
> > >
> > > If we are to avoid this, then it looks like, under Linux/BSD, we'd need a
> > > one char lookahead buffer for gets(). Its contents would be needed so as
> > > to recognise \r\n as \n, eliminating both the trailing \r and the spurious
> > > empty line. However, this also means that an extra char has been read at
> > > almost all times:
> > > * open(), close() and seek() must invalidate the buffer;
> > > * where() must adjust back one char if the buffer is valid.
> > > This doesn't incur any noticeable performance penalty under Windows: it's
> > > the same trick I used to remove the get() quirk of needing an extra space
> > > always.
> > > Didn't take the time to test under colinux - my fault, should have read
> > > their wiki first so as to get it running.
> >
> > On Linux and FreeBSD, there is no distinction
> > (in Euphoria or in general) between opening
> > a file in "r" mode (text) versus "rb" mode (binary).
> > The Unix world got it right. There is no silly distinction between
> > "text" and "binary" files. A file is a file. Period.
> > And '\n' is the *only* character that indicates the end of a line.
> > '\r' is irrelevant. The O/S never tries to secretly fiddle around
> > with any bytes or combinations of bytes.
> >
> > So you wouldn't be able to limit your "enhancement" to just "r" mode,
> > since Linux/FreeBSD users expect "r" and "rb" to both mean
> > exactly the same thing:
> > "give me the straight goods - no smoke and mirrors, no lying
> > about what bytes are really there".
> >
>
> I am trying to think back to my DOS days. Isn't it true that when you
> write a text file with CRLF, the OS strips the CR and stores just a LF?
> Then when you read a text file, the OS adds the CR to create the CRLF
> pair which, as Matt says, back in the days of teletypes, you needed.
> CR first and LF second was vital!

No. In a DOS text file, ENTER is stored as CRLF (#0D #0A).
Using edit.com: Ctrl-J (LF) is stored as #0A, while Ctrl-M (CR) and ENTER
are stored as #0D #0A.
Using Notepad.exe: Ctrl-J, Ctrl-M and ENTER are all stored as #0D #0A.

> So if we open ALL files as binary, and apply gets() to the binary file,
> it merely has to look for LF to terminate a line. The OS has less work
> to do, less memory allocation to fool with. And gets() doesn't have to
> look ahead to see if CR is actually the first byte of CRLF. Then gets()
> could just accept CR or LF as end-of-line and return the line. DOS files,
> Unix files, Mac files, all work the same.
> Oh, and thanks for the comments on string types; I think if it involves
> fundamental design changes then we forget it; Euphoria is too good to
> risk messing up at this stage of the game. Yes, if I want real strings
> I can use C or write a DLL....
> Andy
>
> > Regards,
> >    Rob Craig
> >    Rapid Deployment Software
> >    http://www.RapidEuphoria.com

Regards,
   Fernando
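
P.S. For anyone who wants to experiment, the one-char lookahead that CChris describes could be sketched roughly as below. This is only an illustration in Python, not Euphoria's gets() and not CChris's actual mod. The class name LookaheadReader, the "pending" field and the use of None for end of file (Euphoria's gets() returns -1) are assumptions made for the sketch. It reads the file as raw bytes and treats \r\n, a lone \r and a lone \n all as one line end, so a bare CR followed by LF does not produce a spurious empty line.

```python
import io


class LookaheadReader:
    """Hypothetical gets()-style reader with a one-byte lookahead buffer."""

    def __init__(self, stream):
        self.stream = stream   # binary stream supporting read/seek/tell
        self.pending = None    # lookahead byte; None means the buffer is invalid

    def _next_byte(self):
        # Serve the buffered byte first, otherwise read from the stream.
        if self.pending is not None:
            b, self.pending = self.pending, None
            return b
        return self.stream.read(1)

    def gets(self):
        """Return the next line normalised to end in b"\\n", or None at EOF."""
        line = bytearray()
        while True:
            b = self._next_byte()
            if b == b"":
                # End of file: return a final unterminated line if there is one.
                return bytes(line) if line else None
            if b == b"\n":
                return bytes(line) + b"\n"
            if b == b"\r":
                # Look one byte ahead: swallow the LF of a CRLF pair,
                # otherwise keep the byte for the next call.
                nxt = self.stream.read(1)
                if nxt != b"\n" and nxt != b"":
                    self.pending = nxt
                return bytes(line) + b"\n"
            line += b

    def where(self):
        # Adjust back one byte if the lookahead buffer is valid.
        return self.stream.tell() - (1 if self.pending is not None else 0)

    def seek(self, pos):
        # Repositioning must invalidate the lookahead buffer.
        self.pending = None
        self.stream.seek(pos)


# DOS, Unix and old-Mac line ends mixed in one "file".
reader = LookaheadReader(io.BytesIO(b"dos\r\nunix\nmac\rlast"))
lines = []
while True:
    line = reader.gets()
    if line is None:
        break
    lines.append(line)
print(lines)  # [b'dos\n', b'unix\n', b'mac\n', b'last']
```

The extra byte is only ever held after a lone \r, which is why where() and seek() need to know about it; after a plain \n or a full \r\n the buffer is empty and the reported position is exact.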