1. Re: Mac text files and gets()
- Posted by Andy Drummond <andy at ?estrelt?le.com> Sep 14, 2007
- 517 views
- Last edited Sep 15, 2007
Robert Craig wrote:
>
> CChris wrote:
> > Actually, the mod as I implemented it works fine under DOS/Windows, because
> > the OS takes care of removing the \r.
> > Under Linux/BSD:
> > * currently, doing gets() on a DOS text file results in returned lines being
> >   terminated with \r (I doubt the OS filters it out).
> > * with my current implementation, the trailing \r would disappear, but a
> >   spurious extra empty line would be generated by the \n part.
> >
> > If we are to avoid this, then it looks like, under Linux/BSD, we'd need a
> > one-char lookahead buffer for gets(). Its contents would be needed so as to
> > recognise \r\n as \n, eliminating both the trailing \r and the spurious
> > empty line. However, this also means that an extra char has been read at
> > almost all times:
> > * open(), close() and seek() must invalidate the buffer;
> > * where() must adjust back one char if the buffer is valid.
> > This doesn't incur any noticeable performance penalty under Windows: it's
> > the same trick I used to remove the get() quirk of needing an extra space
> > always.
> > Didn't take the time to test under colinux - my fault, should have read
> > their wiki first so as to get it running.
>
> On Linux and FreeBSD, there is no distinction
> (in Euphoria or in general) between opening
> a file in "r" mode (text) versus "rb" mode (binary).
> The Unix world got it right. There is no silly distinction between
> "text" and "binary" files. A file is a file. Period.
> And '\n' is the *only* character that indicates the end of a line.
> '\r' is irrelevant. The O/S never tries to secretly fiddle around
> with any bytes or combinations of bytes.
>
> So you wouldn't be able to limit your "enhancement" to just "r" mode,
> since Linux/FreeBSD users expect "r" and "rb" to both mean
> exactly the same thing:
> "give me the straight goods - no smoke and mirrors, no lying
> about what bytes are really there".
>

I am trying to think back to my DOS days.
Isn't it true that when you write a text file with CRLF, the OS strips the CR and stores just a LF? Then when you read a text file, the OS adds the CR to create the CRLF pair which, as Matt says, back in the days of teletypes, you needed. CR first and LF second was vital!

So if we open ALL files as binary, and apply gets() to the binary file, it merely has to look for LF to terminate a line. The OS has less work to do, less memory allocation to fool with. And gets() doesn't have to look ahead to see if CR is actually the first byte of CRLF. Then gets() could just accept CR or LF as end-of-line and return the line. DOS files, Unix files, Mac files, all work the same.

Oh, and thanks for the comments on string types; I think if it involves fundamental design changes then we forget it; Euphoria is too good to risk messing up at this stage of the game. Yes, if I want real strings I can use C or write a DLL....

Andy

> Regards,
> Rob Craig
> Rapid Deployment Software
> http://www.RapidEuphoria.com
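
For anyone who wants to play with the idea, here is a rough sketch of a line reader that opens nothing in text mode and treats LF, CR, or CRLF as one end-of-line. It is written in Python purely for illustration, not Euphoria, and it is not how gets() is actually implemented. The names read_line and read_all_lines are made up for this example, and the stream is assumed to be binary and seekable. It uses the one-char lookahead CChris describes, because accepting a bare CR without peeking at the next byte would turn every CRLF into a line followed by a spurious empty line.

```python
import io


def read_line(stream):
    # Hypothetical helper: returns the next line from a binary, seekable
    # stream, always terminated by a single LF, or an empty bytes object
    # at end of file. The last line is returned without LF if the file
    # does not end in a terminator.
    line = bytearray()
    while True:
        ch = stream.read(1)
        if ch == b"":
            # End of file: return whatever was collected (possibly nothing).
            return bytes(line)
        if ch == b"\n":
            # Unix-style terminator.
            line += b"\n"
            return bytes(line)
        if ch == b"\r":
            # Mac-style CR or the first half of a DOS CRLF.
            # Peek one byte ahead and swallow the LF of a CRLF pair.
            nxt = stream.read(1)
            if nxt not in (b"\n", b""):
                # Not part of the terminator: step back one byte so the
                # next call starts at the right place.
                stream.seek(-1, io.SEEK_CUR)
            line += b"\n"
            return bytes(line)
        line += ch


def read_all_lines(data):
    # Convenience wrapper for experimenting with in-memory data.
    stream = io.BytesIO(data)
    lines = []
    while True:
        line = read_line(stream)
        if line == b"":
            return lines
        lines.append(line)
```

Feeding it a mixed buffer such as b"one\r\ntwo\rthree\n" gives three lines, each ending in a single LF, with no empty line after the CRLF. The step-back after the peek is the same bookkeeping CChris mentions for where(): whenever a byte has been read ahead, the reported position has to account for it.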