Re: Mac text files and gets()


CChris wrote:
> > > > Since it is about as likely, under Windows, to have Unix files or Mac
> > > > files on one's HD,

I don't agree.
The Internet is largely based on Unix / Linux servers,
not Mac. Many Internet files use simply '\n' as the line terminator
which can generally be handled OK by most DOS/Windows programs.
On DOS/Windows "\r\n" is the standard for files opened in "text" mode.
On DOS/Windows, '\r' by itself does not mean end-of-line.

> > > > isn't fixing/extending gets() to handle both formats a good idea?
> > > > No need to modify the behaviour of EGets() for stdin, so changing
> > > > lines 2668-72 in be_runtime.c from
> > > > 		if (c <= '\n') {
> > > > 		    if (c == '\n') {
> > > > 			break;
> > > > 		    }
> > > > 
> > > > to
> > > > 		if (c <= '\r') {
> > > > 		    if (c == '\n') {
> > > > 			break;
> > > > 		    }
> > > > 		    if (c == '\r') {
> > > > 			c = '\n';

So you are going to lie, and say that there's a '\n'
in the input, when it's really a '\r'?

> > > >                         break;
> > > > 		    }
> > > > should be enough. The extra test is taken only if a character in the
> > > > 0..13 range (a control character) is read, so the impact should be
> > > > nil on text files.

Admittedly very small, but not "nil".

> > > > Did I miss some ripple effect?

We won't know how many ripples until code starts breaking,
or until other places in Euphoria that subtly assume DOS/Windows
(not Mac) line terminators start popping up.
 
> > > Since there were exactly zero comments regarding this earlier post,
> > > I'll update the backend and the doc for gets() this weekend, unless
> > > there is a late outcry.
> > 
> > I don't think this is a good idea.
> > 
> > > Additionally, I'll consider changing the second test to "<=", so that
> > > VT and FF are also treated as EOL by gets(). This can be useful when
> > > importing files output by mainframe computers, and is already
> > > implemented in the D language.
> > 
> > I don't think this is a good idea either.
> > 
> > You will break some existing Euphoria for Windows/DOS programs
> > in the name of added convenience in reading files from the Mac, an O/S
> > that currently is not supported by Euphoria. Wait until there
> > is a Mac version of Euphoria.
> > I'm sure there are other Euphoria programs, such as ed,
> > and possibly other places in the interpreter,
> > that also assume DOS/Windows (or sometimes Linux)
> > line terminators, so your "enhancement" will not be complete,
> > and will just cause confusion.
> > There's already enough confusion in DOS/Windows
> > due to the silly distinction Microsoft makes between 
> > "text" and "binary" files. Don't make it even more confusing.
> Huh?
> The point is to make isolated \r (and possibly ASCII 11 and 12) be _also_
> recognised as line terminators, not instead, not anything more.

You make it sound like this is purely an enhancement,
that can't harm anyone. Not true.

> If gets() encounters ASCII 10, it currently treats it as EOL and returns a
> line ending in \n (which happens to be the same);
> If gets() encounters ASCII 13, then ASCII 10, it does the same.

DOS/Windows itself (strictly, the C runtime's file I/O layer) does that
"\r\n" -> "\n" thing automatically when the file is opened in "r" (text)
mode, but not when it's opened in "rb" (binary) mode. If you look at the
source code for gets(), you won't see any test for '\r', just '\n'.
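
To make that concrete, here is a rough model in C of what text mode does to
the bytes before gets() ever sees them. It is only a sketch under stated
assumptions: crlf_to_lf() is a hypothetical name, not anything in
be_runtime.c, and a real runtime does more than this (buffering, and so on).
The point it shows is that a "\r\n" pair collapses to "\n", while a lone
'\r' passes through as an ordinary character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Euphoria or the C library.
 * Rough model of text-mode input translation: every "\r\n" pair becomes
 * a single '\n'; a '\r' not followed by '\n' is left untouched.
 * Works in place and returns the new length. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL);

    for (in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;            /* drop the '\r'; the '\n' is copied next */
        buf[out++] = buf[in];
    }
    return out;
}
```

With this model, a reader that only tests for '\n' handles Unix and DOS
files alike, and treats an old-style Mac file as one long line.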

> The proposed enhancement is to treat ASCII 13 not followed by ASCII 10 in the
> same way. This can be done by treating \r as line terminator on its own, and
> ignoring a \n that would immediately follow.
> 
> I have the feeling, as I read your comment, that you understood something else
> and farther reaching. Not sure what, though. The behaviour of gets() on
> Unix/DOS/Windows text files would _not_ be changed at all.
> The only change is that, if a file is opened in text mode and has isolated
> \r (i.e. not followed by \n) inside, these will be read as EOL while they
> currently aren't. Which sort of code will this break, and which sort of
> text files Eu code currently processes would exhibit this pattern?

Code breakage would be rare, provided you limit this feature to
"r" mode (I sometimes use gets() in binary "rb" mode).
However, Euphoria programs that want to read Mac text files
directly on DOS/Windows would also be rare.

Why not simply write a trivial program to convert the 
Mac \r's to \n's or \r\n's?
Or write a tiny subroutine, mac_gets() that uses getc() in "rb" mode
to look for the \r's and return the line of text?
You could do the same for IBM characters.
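
As an illustration of how small such a routine can be, here is a minimal
sketch written in C rather than Euphoria. The name mac_gets() and its
signature are assumptions made up for this example. It expects a stream
opened in "rb" mode, reads it with getc(), and accepts "\n", "\r\n", or a
lone "\r" as the end of a line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read one line from a stream opened in "rb" mode.
 * The terminator is not stored. Lines longer than size-1 characters are
 * silently truncated. Returns the number of characters stored, or -1 if
 * end of file is reached before anything is read. */
int mac_gets(FILE *fp, char *buf, size_t size)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL && size > 1);

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;                      /* Unix terminator */
        if (c == '\r') {
            int next = getc(fp);        /* peek at the following byte */
            if (next != '\n' && next != EOF)
                ungetc(next, fp);       /* lone '\r': Mac terminator */
            break;                      /* "\r\n": the '\n' was swallowed */
        }
        if (n + 1 < size)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return (c == EOF && n == 0) ? -1 : (int)n;
}
```

Calling it in a loop until it returns -1 walks through a file one line at a
time, whatever mix of terminators it contains, and gets() itself stays
untouched.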

Don't make the rest of us DOS/Windows users have to worry 
about the line terminator standards of foreign systems.
The DOS/Windows "\r\n" nonsense is already complicated enough.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com
