Re: Mac text files and gets()

Andy Drummond wrote:
> 
> Robert Craig wrote:
> > 
> > CChris wrote:
> > > Actually, the mod as I implemented it works fine under DOS/Windows,
> > > because the OS takes care of removing the \r.
> > > Under Linux/BSD:
> > > * currently, doing gets() on a DOS text file results in returned lines
> > > being terminated with \r (I doubt the OS filters it out).
> > > * with my current implementation, the trailing \r would disappear, but a
> > > spurious extra empty line would be generated by the \n part.
> > > 
> > > If we are to avoid this, then it looks like, under Linux/BSD, we'd need a
> > > one-char lookahead buffer for gets(). Its contents would be needed so as to
> > > recognise \r\n as \n, eliminating both the trailing \r and the spurious
> > > empty line. However, this also means that an extra char has been read
> > > almost all the time:
> > > * open(), close() and seek() must invalidate the buffer;
> > > * where() must adjust back one char if the buffer is valid.
> > > This doesn't incur any noticeable performance penalty under Windows: it's
> > > the same trick I used to remove the get() quirk of always needing an extra
> > > space.
> > > Didn't take the time to test under colinux - my fault, should have read
> > > their wiki first so as to get it running.
> > 
> > On Linux and FreeBSD, there is no distinction 
> > (in Euphoria or in general) between opening
> > a file in "r" mode (text) versus "rb" mode (binary). 
> > The Unix world got it right. There is no silly distinction between
> > "text" and "binary" files. A file is a file. Period.
> > And '\n' is the *only* character that indicates the end of a line.
> > '\r' is irrelevant. The O/S never tries to secretly fiddle around
> > with any bytes or combinations of bytes.
> > 
> > So you wouldn't be able to limit your "enhancement" to just "r" mode,
> > since Linux/FreeBSD users expect "r" and "rb" to both mean 
> > exactly the same thing:
> >   "give me the straight goods - no smoke and mirrors, no lying
> >   about what bytes are really there".
> > 
> 
> I am trying to think back to my DOS days. Isn't it true that when you
> write a text file with CRLF, the OS strips the CR and stores just a LF?
> Then when you read a text file, the OS adds the CR to create the CRLF
> pair which, as Matt says, back in the days of teletypes, you needed.
> CR first and LF second was vital!

No. In a text file, ENTER is stored as CRLF (#0D #0A).
Using edit.com: CTRL-J (LF) is stored as #0A, while CTRL-M (CR) and ENTER are stored as #0D #0A.
Using Notepad.exe: CTRL-J, CTRL-M and ENTER are all stored as #0D #0A.
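
One way to check what is really on disk is to read the file back in binary mode and dump its bytes. The sketch below is in Python purely for illustration (it is not Euphoria, edit.com or Notepad): it uses Python's own `newline` argument to `open()` to emulate a DOS-style text-mode write that turns each `\n` into `\r\n`, then reads the file with "rb", which applies no translation. `hex_bytes` and `stored_bytes` are hypothetical helper names.

```python
import os
import tempfile


def hex_bytes(data):
    """Format raw bytes as space-separated uppercase hex, e.g. b'AB' -> '41 42'."""
    return " ".join(f"{b:02X}" for b in data)


def stored_bytes(text, newline):
    """Write text in text mode, then read the file back in binary ("rb") mode."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\r\n" makes Python translate every "\n" written into "\r\n",
        # emulating a DOS/Windows text-mode write; newline="\n" writes "\n" as-is.
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        # Binary mode returns exactly the bytes on disk, with no translation.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(hex_bytes(stored_bytes("A\nB\n", "\r\n")))  # 41 0D 0A 42 0D 0A
print(hex_bytes(stored_bytes("A\nB\n", "\n")))    # 41 0A 42 0A
```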

> So if we open ALL files as binary, and apply gets() to the binary file,
> it merely has to look for LF to terminate a line. The OS has less work
> to do, less memory allocation to fool with. And gets() doesn't have to
> look ahead to see if CR is actually the first byte of CRLF. Then gets()
> could just accept CR or LF as end-of-line and return the line. DOS files,
> Unix files, Mac files, all work the same.
> Oh, and thanks for the comments on string types; I think if it involves
> fundamental design changes then we forget it; Euphoria is too good to
> risk messing up at this stage of the game. Yes, if I want real strings
> I can use C or write a DLL....
> Andy
> 
> > Regards,
> >    Rob Craig
> >    Rapid Deployment Software
> >    http://www.RapidEuphoria.com
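
Accepting a lone CR or a lone LF as end-of-line with no lookahead splits every CR LF pair in two, which produces the spurious empty line CChris describes. The Python sketch below contrasts that rule with a reader that keeps a one-byte lookahead buffer, so CR LF, LF and CR each end exactly one line, and it shows the where()/seek() adjustments CChris lists. `naive_lines` and the `LineReader` class with its `gets`, `where` and `seek` methods are hypothetical names for illustration, not Euphoria's actual implementation.

```python
import io


def naive_lines(data):
    """Treat a lone CR or a lone LF as end-of-line, with no lookahead."""
    lines, cur = [], bytearray()
    for b in data:
        if b in (0x0D, 0x0A):  # CR or LF ends the current line
            lines.append(bytes(cur))
            cur = bytearray()
        else:
            cur.append(b)
    if cur:  # last line without a terminator
        lines.append(bytes(cur))
    return lines


class LineReader:
    """gets()-style reader accepting LF, CR or CR LF, via a one-byte lookahead."""

    def __init__(self, stream):
        self.stream = stream  # a binary stream, i.e. opened in "rb" mode
        self.pending = None   # byte read ahead but not yet consumed, or None

    def _getc(self):
        """Return the next byte (length-1 bytes), or b'' at end of file."""
        if self.pending is not None:
            c, self.pending = self.pending, None
            return c
        return self.stream.read(1)

    def gets(self):
        """Return the next line without its terminator, or None at end of file."""
        c = self._getc()
        if c == b"":
            return None
        line = bytearray()
        while c not in (b"", b"\n", b"\r"):
            line += c
            c = self._getc()
        if c == b"\r":
            # Peek one byte: an LF right after the CR belongs to the same pair.
            nxt = self._getc()
            if nxt not in (b"", b"\n"):
                self.pending = nxt  # not part of the terminator; keep it
        return bytes(line)

    def where(self):
        """Logical position: step back one byte if the buffer holds a byte."""
        pos = self.stream.tell()
        return pos - 1 if self.pending is not None else pos

    def seek(self, pos):
        """Seeking invalidates the lookahead buffer."""
        self.pending = None
        self.stream.seek(pos)


data = b"one\r\ntwo\rthree\nfour"
print(naive_lines(data))  # [b'one', b'', b'two', b'three', b'four']

reader = LineReader(io.BytesIO(data))
lines = []
while (line := reader.gets()) is not None:
    lines.append(line)
print(lines)  # [b'one', b'two', b'three', b'four']
```

The cost of the lookahead is the bookkeeping in where() and seek(); the gain is that DOS, Unix and Mac line endings all come back as the same lines.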

Regards,
Fernando
