Re: Mac text files and gets()


Pete Lomax wrote:
> 
> Rob Craig wrote:
> > Code breakage would be rare, provided you limit this feature to
> > "r" mode, (I sometimes use gets() in binary "rb" mode). 
> 
> That gets my vote, however AFAICT, the code CChris is proposing to modify is:
> 
> 	    // not stdin - faster loop
> 	    do { 
> 		TempBuff[i++] = c;
> 		if (c <= '\n') {
> 		    if (c == '\n') {
> 			break;
> 		    }
> 		    else if (c == EOF) {
> 			i--;
> 			break;
> 		    }
> 		}
> 		if (i == TEMP_SIZE)
> 		    break;
> 		c = getc(f);
> 	    } while (TRUE);
> 
> which, cmiiw, is not the cause of discarding the '\r', so this whole thread
> is probably moot anyway...
> 

Actually, the mod as I implemented it works fine under DOS/Windows, because the
OS takes care of removing the \r.
Under Linux/BSD:
* currently, doing gets() on a DOS text file results in returned lines being
terminated with \r (I doubt the OS filters it out).
* with my current implementation, the trailing \r would disappear, but a spurious
extra empty line would be generated by the \n part.
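
To make the second point concrete, here is a minimal sketch, not the actual patch. It assumes the mod simply treats '\r' as a line terminator in the same way as '\n'; the helper name count_lines_naive is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the lines a reader would return if it
 * treated BOTH '\r' and '\n' as line terminators, with no lookahead.
 * This is an assumed stand-in for the mod, not the real backend code. */
static int count_lines_naive(const char *s)
{
    int lines = 0;
    size_t start = 0;   /* index where the current line begins */
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\r' || s[i] == '\n') {
            lines++;            /* every terminator ends a line */
            start = i + 1;
        }
    }
    if (i > start)
        lines++;                /* unterminated last line */
    return lines;
}
```

Under that assumption, a two-line Unix file ("a\nb\n") and a two-line Mac file ("a\rb\r") both count as two lines, but the same text with DOS endings ("a\r\nb\r\n") counts as four, because each "\n" following a "\r" ends an extra, empty line.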

If we are to avoid this, then it looks like, under Linux/BSD, we'd need a one
char lookahead buffer for gets(). Its contents would be needed so as to recognise
\r\n as \n, eliminating both the trailing \r and the spurious empty line. However,
this also means that an extra char has been read at almost all times:
* open(), close() and seek() must invalidate the buffer;
* where() must adjust back one char if the buffer is valid.
This doesn't incur any noticeable performance penalty under Windows: it's the
same trick I used to remove the get() quirk of needing an extra space always.
I didn't take the time to test under coLinux; my fault, I should have read their
wiki first so as to get it running.
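
For what it's worth, the lookahead idea could be sketched roughly as below. This is an illustration under stated assumptions, not the code in the backend: the la_file struct and the la_* function names are hypothetical, the state is kept per file, and this variant only holds a lookahead char after a lone '\r' (rather than at almost all times), which is a simplification of the sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-file state: one char of lookahead. */
typedef struct {
    FILE *f;
    int   have_ahead;  /* non-zero when 'ahead' holds a char already read */
    int   ahead;       /* the char read past a lone '\r' (may be EOF) */
} la_file;

static void la_init(la_file *lf, FILE *f)
{
    lf->f = f;
    lf->have_ahead = 0;
    lf->ahead = EOF;
}

/* Return the buffered char if there is one, else read from the file. */
static int la_getc(la_file *lf)
{
    if (lf->have_ahead) {
        lf->have_ahead = 0;
        return lf->ahead;
    }
    return getc(lf->f);
}

/* Read one line into buf (cap >= 2), mapping "\r\n" and a lone '\r' to '\n'.
 * Returns the number of chars stored, or -1 at end of file. */
static int la_gets(la_file *lf, char *buf, int cap)
{
    int i = 0;
    int c;

    assert(cap >= 2);
    c = la_getc(lf);
    if (c == EOF)
        return -1;
    while (c != EOF) {
        if (c == '\r') {
            int next = getc(lf->f);      /* peek one char ahead */
            if (next != '\n') {          /* lone '\r': keep the peeked char */
                lf->ahead = next;
                lf->have_ahead = 1;
            }
            c = '\n';                    /* either way, report one '\n' */
        }
        buf[i++] = (char)c;
        if (c == '\n' || i == cap - 1)
            break;
        c = la_getc(lf);
    }
    buf[i] = '\0';
    return i;
}

/* seek() must invalidate the buffer. */
static int la_seek(la_file *lf, long pos)
{
    lf->have_ahead = 0;
    return fseek(lf->f, pos, SEEK_SET);
}

/* where() must step back one char if a real char is buffered. */
static long la_where(la_file *lf)
{
    long pos = ftell(lf->f);
    if (pos >= 0 && lf->have_ahead && lf->ahead != EOF)
        pos -= 1;
    return pos;
}
```

With a file containing "a\r\nb\rc\n\r\r" opened in binary mode, successive la_gets() calls would return "a\n", "b\n", "c\n", "\n", "\n" and then -1, so \r\n is one line and \r\r is two. open() and close() would reset the state the same way la_init() does.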

CChris

> CChris wrote:
> >Only trick is that \r\r means two lines, \n\n too,
> No probs.
> > \r\n is one line
> Looks to me like your suggested mod would just treat that as \n\n anyway
> > and \n\r... well, left alone as unsupported.
> FWIW, the way I would have attempted this is something like:
> 
> change:
> 
>     // get first char
>     c = getc(f);
> 
>     if (c == EOF)
> ...
> 	    do { 
> 		TempBuff[i++] = c;
> 		if (c <= '\n') {
> 		    if (c == '\n') {
> 			break;
> 		    }
> 
> to:
> 
>     // get first char
>     c = getc(f);
> 
>     if (c == EOF)
> 
>     else if (c==skippable) {
>        c=getc(f);
>        skippable=EOF;   // put back to initial state
>        }
> ...
> 	    do { 
> 		TempBuff[i++] = c;
> 		if (c <= '\r') {
> 		    if (c == '\n') {
> 			skippable = '\r';
> 			break;
> 		    }
> 		    if (c == '\r') {
> 			c = '\n';
> 			skippable = '\n';
> 			break;
> 		    }
> 
> I would expect any cost this might have to be well under 4% for the average
> text file, but of course that would need to be verified. Plus the new var
> skippable would best be file-specific.
> 
> Regards,
> Pete
