1. Accented characters in identifiers

Currently, if you use characters in the 128-255 range in identifiers, you will
get incongruous error messages, like "Result of a function must be assigned"
because you used a ó.

This comes from the shrouding method Euphoria used long ago. Rob
himself admitted that supporting it was becoming obsolete.

Implementation-wise, the move is simple: change the character class of all those
chars from KEYWORD or BUILTIN to LETTER in the scanner. Nothing else is needed
(a couple of if branches and constants will become dead code).

Since characters that display as a letter in some code page may display
differently on another, I think including the whole 128..255 range as valid
characters is better than restricting it. If a char is valid somewhere, it must
be valid anywhere, even if it displays funny.
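
The change described above can be sketched roughly as follows. This is an illustration in Python, not the actual Euphoria scanner: the class names echo the ones mentioned in this post, but the table layout, the 192 split between KEYWORD and BUILTIN, and the helper names are assumptions made up for the example.

```python
# Illustrative character classes; the names echo the post, the values are made up.
LETTER, DIGIT, KEYWORD, BUILTIN, ILLEGAL = range(5)


def build_char_classes(high_bytes_are_letters):
    """Return a 256-entry table mapping each byte value to a character class."""
    table = [ILLEGAL] * 256
    for c in range(ord("a"), ord("z") + 1):
        table[c] = LETTER
    for c in range(ord("A"), ord("Z") + 1):
        table[c] = LETTER
    table[ord("_")] = LETTER
    for c in range(ord("0"), ord("9") + 1):
        table[c] = DIGIT
    for c in range(128, 256):
        if high_bytes_are_letters:
            # Proposed behaviour: the whole 128..255 range is a letter.
            table[c] = LETTER
        else:
            # Old behaviour (hypothetical split): high bytes are shrouded tokens.
            table[c] = KEYWORD if c < 192 else BUILTIN
    return table


def scan_identifier(source, start, table):
    """Return the longest identifier in `source` (bytes) beginning at `start`."""
    if start >= len(source) or table[source[start]] != LETTER:
        return b""
    end = start + 1
    while end < len(source) and table[source[end]] in (LETTER, DIGIT):
        end += 1
    return source[start:end]


old_table = build_char_classes(False)
new_table = build_char_classes(True)
line = "funci\u00f3n = 1".encode("latin-1")  # "función = 1" in a Latin-1 code page
```

With `old_table` the identifier is cut off just before the ó, and the byte that follows is handed to the parser as a token, which is the kind of thing that produces unrelated error messages. With `new_table` the whole name is scanned as one identifier.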

What do you think?

CChris


2. Re: Accented characters in identifiers

CChris wrote:
> 
> 
> Currently, if you use characters in the 128-255 range in identifiers, you will
> get incongruous error messages, like "Result of a function must be assigned"
> because you used a ó.
> 
> 
> What do you think?
> 

Being a Unicode (I should say any non-ASCII) dummy, how would this affect people
reading and using code with such characters? I do not want to discriminate at
all, but will we create two divisions in the libraries and also in code
contributions to the Euphoria core? For instance, I know English and a tiny, tiny
bit of Spanish. I'm sure others know multiple languages, but spoken languages
were just never something I was interested in. If a library comes out that does
some very cool things but even its function names use characters > 128 that I
don't even know how to type on my keyboard, let alone what they mean, I cannot
use it.

Now, this is *obviously* bigger than me. I can understand that if you are not a
native English speaker you're probably clenching your fists and steam is rising
from your head, but I am just trying to understand the impact. I mean no offense,
nor do I mean to say that all programmers should speak English and forget their
native tongue. I'm just trying to understand.

--
Jeremy Cowgar
http://jeremy.cowgar.com


3. Re: Accented characters in identifiers

Jeremy Cowgar wrote:
> I do not want to discriminate at all...

"Look! They are one people and there is one language for them all...  Why
now, there is NOTHING that they may have in mind that will be unattainable
for them"

You may recognize this quote from the story of Babel.  The process of
thinking IS discriminating (whatever your native language, call me a bigot.)

Unicode is useful for data, not for code.  Otherwise, why not have the
language's keywords come in multiple translations?  It's because of the
principle in that quote above.  Do you want things to be attainable?

Sometimes you have to limit yourself to make great results.  It would be
perfectly logical to say, "Let's do all our code in Unicode for one day
we might want to bring out different language versions of our compiler."

Except my keyboard expects me to write in English even if other language
keyboards are available.  You can do a lot with 127 characters.  It's not
a valid assumption that you can do a lot more with 65,536 or so.

I shudder to think of the finely crafted Euphoria language trying to
become all things to all people.  It will never achieve greatness if
that's the road taken.  Keep in mind I say that while also on a quest
for the 'one true language.'  Perhaps we should code in hebrew? blink


4. Re: Accented characters in identifiers

Jeremy Cowgar wrote:
> 
> CChris wrote:
> > 
> > What do you think?
> 
> Being a unicode (I should say any non-ascii) dummy, how would this affect
> people
> reading and using code with such characters?

I think it would be great! I would definitely use the special characters.  
Whenever I write an open library I try anyway to choose function names 
that are obvious for the majority of the programmers. (See English speakers.)
But whenever I write a code for my own (CGI, database, etc.) I prefer my 
own language/words/characters. Why not?

Regards,

Salix
(hu-en-de)


5. Re: Accented characters in identifiers

I would have to agree with Jeremy and Ken in their posts. CChris had suggested
that accented characters be permitted in identifiers. While this might be
appealing to some, it may cause more trouble than it is worth.

The trouble with code pages is that they all interpret characters 128-255
differently. Most of the Latin-based code pages are close, but the same cannot
be said of others, such as those for Cyrillic, Greek, etc. If a program used
a character in this range it may display differently on a system with a different
code page. Depending on the font and code page used they may not even be
readable. I am sure the developers already know this but others may not.
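
To make the code page point concrete, here is a small Python sketch (not Euphoria) that decodes one and the same byte under four code pages. The byte 0xF3 and the choice of code pages are just examples picked for illustration; the codec names are the ones Python's standard library uses.

```python
import unicodedata

RAW = bytes([0xF3])  # the byte that shows as "ó" in Western European code pages
CODE_PAGES = ["cp1252", "cp1251", "cp437", "iso8859_7"]


def render(raw, code_pages):
    """Decode the same raw bytes under each code page."""
    return {cp: raw.decode(cp) for cp in code_pages}


views = render(RAW, CODE_PAGES)
for cp, text in views.items():
    # Print code point and Unicode name rather than the glyph itself,
    # so the output does not depend on the console's own code page.
    print(f"{cp:10} U+{ord(text):04X} {unicodedata.name(text)}")
```

The same identifier byte comes out as a Latin letter, a Cyrillic letter, a Greek letter, or a mathematical symbol, depending only on which code page the reader's system uses.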

These problems could be minimized (but not eliminated) if Euphoria were to use
some form of Unicode, such as UTF-8. But I don't think that the developers wish
to travel that road.

I think it best that identifiers be restricted to the characters common to all
code pages: ASCII. This may annoy some whose native language is not English, but
I think they will understand.

Larry Miller


6. Re: Accented characters in identifiers

I agree with Salix on this.  We could give programmers the freedom to use
accented characters, but standard libraries and other code distributed with
Euphoria should be restricted to English. I remember reading a contribution from
a Euphoria user whose identifiers, although not accented, were in a language I
don't understand. I gave up because it is hard to read code where all identifiers
are from an unknown language. But it is not a problem as long as it is not part
of the distribution. English is a de facto common language on this planet.

jacques
 

Salix wrote:
> 
> Jeremy Cowgar wrote:
> > 
> > CChris wrote:
> > > 
> > > What do you think?
> > 
> > Being a unicode (I should say any non-ascii) dummy, how would this affect
> > people
> > reading and using code with such characters?
> 
> I think it would be great! I would definitely use the special characters.  
> Whenever I write an open library I try anyway to choose function names 
> that are obvious for the majority of the programmers. (See English speakers.)
> But whenever I write a code for my own (CGI, database, etc.) I prefer my 
> own language/words/characters. Why not?
> 
> Regards,
> 
> Salix
> (hu-en-de)


7. Re: Accented characters in identifiers

I see two sides to this debate:
One says we should use identifiers everyone can type, that is, a subset of
ASCII.  I don't understand how Ada programmers cope with their character set.

The other says okay, let's use English identifiers in libraries and core
keywords, but in programmers' own code let us allow them to use their native
tongue.

I would like to add that routines that were intended to be internal
to a program sometimes get put into a .e file.  Sometimes these .e files
get uploaded to the archive through some altruism of the programmer.  This would
become less likely if the programmer also had to translate their routine
names.

I think if we decide to include accented characters we should use the 16-bit
Unicode format.  The interpreter could branch and do the I/O in a Unicode manner
if it finds the byte-order mark at the beginning of the file.  Alternatively,
we could have a browser-like usage of character sets, where the encoding is set
at the beginning of the file in a comment.

#!/usr/bin/exu
-- encoding: utf-8
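
As a rough illustration of the two detection ideas above (byte-order mark first, then an encoding comment), here is a minimal Python sketch. It is not how any Euphoria interpreter works: the function name, the rule of looking only at the first two lines, and the "latin-1" fallback are all assumptions made for the example.

```python
import codecs
import re

# Matches a comment such as "-- encoding: utf-8" at the start of a line.
ENCODING_COMMENT = re.compile(rb"^--\s*encoding:\s*([A-Za-z0-9_.-]+)")


def detect_encoding(data, default="latin-1"):
    """Guess the encoding of source bytes: BOM first, then a comment, then a default."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        # Python's "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16"
    # Only the first two lines are checked, so a shebang line may come first.
    for line in data.splitlines()[:2]:
        match = ENCODING_COMMENT.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


sample = b"#!/usr/bin/exu\n-- encoding: utf-8\nputs(1, \"hi\")\n"
print(detect_encoding(sample))
```

Checking the byte-order mark before the comment matters, because in a 16-bit file the comment itself cannot be read as plain bytes.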


I am not trying to be ironic or sarcastic here but I am just brainstorming what
could be implemented.  I say so because sometimes, I sense that I come across 
as sarcastic when I do not mean to.

Shawn Pringle B.Sc.


8. Re: Accented characters in identifiers

CChris wrote:
> 
> 
> Currently, if you use characters in the 128-255 range in identifiers, you will
> get incongruous error messages, like "Result of a function must be assigned"
> because you used a ó.
> 
> This comes from the shrouding method Euphoria had been using long ago. Rob
> himself
> admitted supporting was becoming obsolte.
> 
> Implementationwise, the move is simple: change the character class of all
> those
> chars from KEYWORD or BUILTIN to LETTER in the scanner. Nothing else (a couple
> if branches and constants will become dead code).
> 
> Since characters that display as a letter in some code page may display
> differently
> on another, I think including the whole 128..255 range as valid characters is
> better than restricting it. If a char is valid somewhere, it must be valid
> anywhere,
> even if it displays funny.
> 
> What do you think?

Hi Chris,

It is a very good idea, I think, but its implementation is not
too simple. There is a Bilingual Euphoria 2.5 in the Archive.
It understands any characters in identifiers as well as
English and Russian keywords, has English or Russian
error messages, and can translate the program text from English
to Russian and back, but there is still an unknown bug on the
Linux platform (DOS32 and WIN32 are very stable, I work on them all
the time).

http://www.rapideuphoria.com/ru_eu_11.zip

Sorry, I do not have any spare time to implement these
features in 3.2 now - my vegetable garden takes all my
summer time   smile

So please ask Rob for that code, just to see various details
of that interpreter, if you want. That was strictly licensed
2.5 stuff; it was joint work with Rob, and he didn't want
to open that code at that time.


Regards,
Igor Kachan
kinz at peterlink.ru


9. Re: Accented characters in identifiers

Igor Kachan wrote:
> 
> CChris wrote:
> > 
> > 
> > Currently, if you use characters in the 128-255 range in identifiers, you
> > will
> > get incongruous error messages, like "Result of a function must be assigned"
> > because you used a ó.
> > 
> > This comes from the shrouding method Euphoria had been using long ago. Rob
> > himself
> > admitted supporting was becoming obsolte.
> > 
> > Implementationwise, the move is simple: change the character class of all
> > those
> > chars from KEYWORD or BUILTIN to LETTER in the scanner. Nothing else (a
> > couple
> > if branches and constants will become dead code).
> > 
> > Since characters that display as a letter in some code page may display
> > differently
> > on another, I think including the whole 128..255 range as valid characters
> > is
> > better than restricting it. If a char is valid somewhere, it must be valid
> > anywhere,
> > even if it displays funny.
> > 
> > What do you think?
> 
> Hi Chris,
> 
> It is very good idea, I think, but its implementation is not
> too simple. There is Bilingual Euphoria 2.5 in the Archive.
> It understands any characters in identifiers and 
> English and Russian keywords and has the English or Russian
> error messages and can translate the program text from English
> to Russian and back, but there is still unknown bug on
> Linux platform (DOS32, WIN32 are very stable, I work on it all
> the time).
> 
> http://www.rapideuphoria.com/ru_eu_11.zip
> 
> Sorry, I do not have some spare time to implement these
> features in 3.2 now - my vegetable-garden takes all my
> summer time   smile
> 
> So ask please Rob for that code just to see various details
> of that interpreter, if you want. That was strongly licensed
> 2.5 stuff, that was our co-work with Rob and he didn't want
> to open that code that time.
> 
> 
> Regards,
> Igor Kachan
> kinz at peterlink.ru

You may be aware that Euphoria has been open source for one year now. I'm not
sure using any licensed material with restrictions worse than GPL would be
possible or desirable. I may be wrong there though.

Could you elaborate on how the bug shows up on Linux? Inasmuch as this doesn't
infringe on any NDA of course.

CChris


10. Re: Accented characters in identifiers

CChris wrote:
> 
> Igor Kachan wrote:
> > 
> > CChris wrote:
> > > 
> > > [snip]
> > > 
> > > What do you think?
> > 
> > Hi Chris,
> > 
> > It is very good idea, I think, but its implementation is not
> > too simple. There is Bilingual Euphoria 2.5 in the Archive.
> > It understands any characters in identifiers and 
> > English and Russian keywords and has the English or Russian
> > error messages and can translate the program text from English
> > to Russian and back, but there is still unknown bug on
> > Linux platform (DOS32, WIN32 are very stable, I work on it all
> > the time).
> > 
> > http://www.rapideuphoria.com/ru_eu_11.zip
> > 
> > Sorry, I do not have some spare time to implement these
> > features in 3.2 now - my vegetable-garden takes all my
> > summer time   smile
> > 
> > So ask please Rob for that code just to see various details
> > of that interpreter, if you want. That was strongly licensed
> > 2.5 stuff, that was our co-work with Rob and he didn't want
> > to open that code that time.
> > 
> > 
> You may be aware that Euphoria has been open source for one year now.

Yes, I do know that EU 3.0 is open source, but the 2.5 source
code was a commercial product with strong license restrictions.

After 3.0, I asked Rob to open the bilingual EU 2.5 too - I did not
have the spare time to develop the bilingual EU 3.0 by myself, so
why not allow someone who wants to do this work to proceed without
reinventing all that stuff?

At that time Rob preferred to wait for me. But this waiting is getting too long.

> I'm not sure using any licensed material with restrictions worse
> than GPL would be possible or desirable. I may be wrong there though.

There are official developers of the open source EU now, so
why not open that 2.5 bilingual interpreter just for them?

Rob?

> Could you elaborate on how the bug shows up on Linux?

Ok, I'll try to find that interpreter on my old reserved
HDD and run it to make the screen-shots
on Linux Mandrake 10.0.

> Inasmuch as this doesn't infringe on any NDA of course.

What is NDA? Sorry, I do not know this abbreviation. 

Regards,
Igor Kachan
kinz at peterlink.ru


11. Re: Accented characters in identifiers

Igor Kachan wrote:
> 
> CChris wrote:
> > 
> > Igor Kachan wrote:
> > > 
> > > CChris wrote:
> > > > 
> > > > [snip]
> > > > 
> > > > What do you think?
> > > 
> > > Hi Chris,
> > > 
> > > It is very good idea, I think, but its implementation is not
> > > too simple. There is Bilingual Euphoria 2.5 in the Archive.
> > > It understands any characters in identifiers and 
> > > English and Russian keywords and has the English or Russian
> > > error messages and can translate the program text from English
> > > to Russian and back, but there is still unknown bug on
> > > Linux platform (DOS32, WIN32 are very stable, I work on it all
> > > the time).
> > > 
> > > http://www.rapideuphoria.com/ru_eu_11.zip
> > > 
> > > Sorry, I do not have some spare time to implement these
> > > features in 3.2 now - my vegetable-garden takes all my
> > > summer time   smile
> > > 
> > > So ask please Rob for that code just to see various details
> > > of that interpreter, if you want. That was strongly licensed
> > > 2.5 stuff, that was our co-work with Rob and he didn't want
> > > to open that code that time.
> > > 
> > > 
> > You may be aware that Euphoria has been open source for one year now.
> 
> Yes, I do know that EU 3.0 is open source, but the 2.5 source
> code was a commercial product with strong license restrictions.
> 
> After 3.0, I asked Rob to open the bilingual EU 2.5 too - I did not
> have the spare time to develop the bilingual EU 3.0 by myself, so
> why not to allow this work to someone who wants to work without
> reinventing of all that stuff?
> 
> That time Rob prefered to wait me. But this waiting gets too long.
> 
> > I'm not sure using any licensed material with restrictions worse
> > than GPL would be possible or desirable. I may be wrong there though.
> 
> There are the official developers of the Open source EU now,
> why not to open just for them just that 2.5 bilingual interpreter?
> 
> Rob?
> 
> > Could you elaborate on how the bug shows up on Linux?
> 
> Ok, I'll try to find that interpreter on my old reserved
> HDD and run it to make the screen-shots
> on Linux Mandrake 10.0.
> 
> > Inasmuch as this doesn't infringe on any NDA of course.
> 
> What is NDA? Sorry, I do not know this abbreviation. 
> 
> Regards,
> Igor Kachan
> kinz at peterlink.ru

Sorry: Non Disclosure Agreement.

CChris


12. Re: Accented characters in identifiers

Igor Kachan wrote:
> 
> CChris wrote:
> > 
> > Igor Kachan wrote:
> > > 
> > > CChris wrote:
> > > > 
> > > > [snip]
> > > > 
> > > > What do you think?
>
>[snip]
>
> > Could you elaborate on how the bug shows up on Linux?
> 
> Ok, I'll try to find that interpreter on my old reserved
> HDD and run it to make the screen-shots
> on Linux Mandrake 10.0.

There is a buggy bilingual interpreter for
Linux, exu_r, along with ex.err files for two euphoria/demos/linux
programs, in this package:

http://www.private.peterlink.ru/kinz/exu_r_25.zip

Try please, if you want.

sanity.ex works ok with exu_r - 100% passed.

Regards,
Igor Kachan
kinz at peterlink.ru


13. Re: Accented characters in identifiers

Igor Kachan wrote:
> 
> Igor Kachan wrote:
> > 
> > CChris wrote:
> > > 
> > > Igor Kachan wrote:
> > > > 
> > > > CChris wrote:
> > > > > 
> > > > > [snip]
> > > > > 
> > > > > What do you think?
> >
> >[snip]
> >
> > > Could you elaborate on how the bug shows up on Linux?
> > 
> > Ok, I'll try to find that interpreter on my old reserved
> > HDD and run it to make the screen-shots
> > on Linux Mandrake 10.0.
> 
> There are buggy bilingual interpreter for
> Linux exu_r and ex.err files for two euphoria/demos/linux
> programs in this package:
> 
> http://www.private.peterlink.ru/kinz/exu_r_25.zip
> 
> Try please, if you want.
> 
> sanity.ex works ok with exu_r - 100% passed.
> 
> Regards,
> Igor Kachan
> kinz at peterlink.ru

Got those files, which are hardly informative indeed.
I think any implementation of accented chars would be done with the new tools
in 4.0, and there will be many. (Allowing any UTF-8 char in identifiers is
trivial; such chars just may cause display concerns when the code page is not
the original one.)
Perhaps you, Rob and Jeremy might want to discuss this?
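
One reason the UTF-8 case can be called trivial: every byte of a non-ASCII character encoded in UTF-8 has its top bit set, so a byte-oriented scanner that accepts 128..255 as letters accepts such characters without knowing anything about UTF-8. A small Python check of that property (the helper name and sample characters are only illustrations):

```python
def high_bytes_only(ch):
    """True if every byte of the UTF-8 encoding of `ch` is in the 128..255 range."""
    return all(b >= 0x80 for b in ch.encode("utf-8"))


# o-acute, Cyrillic zhe, and a CJK character, written as escapes
samples = ["\u00f3", "\u0436", "\u8a9e"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} high bytes only: {high_bytes_only(ch)}")
```

ASCII characters keep their single-byte values below 128, so existing source files are read exactly as before.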

CChris


14. Accented characters in identifiers

Currently, Eu interprets characters with the most significant bit set as
opcodes. Only old shrouded files store Eu opcodes this way.

Isn't it time to remove that restriction, so as to be able to use non-English
identifiers in programs? Other languages frequently use accented
characters.

Is anyone running these legacy shrouded files?


CChris


15. Re: Accented characters in identifiers

CChris wrote:
> Currently, Eu interprets characters with the most significant bit set as
> opcodes. Only old shrouded files store Eu opcodes this way.
> 
> Isn't it time to remove that restriction, so as to be able to use non 
> english identifiers in programs? Other languages frequently use accented
> characters.

Yes, I agree. I'll do that fairly soon, if nobody objects.
Others, such as Igor Kachan, have also mentioned the lack of support
for the higher ASCII codes for non-English languages.

The 3.0 open source version of Euphoria does not have the
ability to decrypt files that are both "shrouded" and "encrypted".
"shrouding" used to mean just conversion of keywords and 
built-in names to single-byte codes and converting variable
and routine names to short meaningless identifier names. 
With version 2.0 came the option to also "encrypt".
In 2.5, a whole new binder/shrouder was developed where 
conversion to byte-codes no longer occurred, but a form of 
IL encryption continued, for bound executables only.
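
For readers who never saw the old scheme, the byte-code form of shrouding described above can be pictured with a toy Python sketch. Everything specific in it is hypothetical: the thread does not give the real byte values or file layout, so the two tables and the space-separated token format are invented purely to show the idea.

```python
# Hypothetical token tables: the real byte values are not given in this thread.
KEYWORD_CODES = {"if": 0x80, "then": 0x81, "end": 0x82}
BUILTIN_CODES = {"puts": 0xC0, "length": 0xC1}
ALL_CODES = {**KEYWORD_CODES, **BUILTIN_CODES}
CODE_TO_WORD = {code: word for word, code in ALL_CODES.items()}


def shroud(tokens):
    """Replace known keywords and built-ins with single bytes; keep other tokens."""
    out = []
    for tok in tokens:
        if tok in ALL_CODES:
            out.append(bytes([ALL_CODES[tok]]))
        else:
            out.append(tok.encode("ascii"))
    return b" ".join(out)


def unshroud(data):
    """Expand single high bytes back into the words they stand for."""
    words = []
    for part in data.split(b" "):
        if len(part) == 1 and part[0] in CODE_TO_WORD:
            words.append(CODE_TO_WORD[part[0]])
        else:
            words.append(part.decode("latin-1"))
    return words


shrouded = shroud(["if", "x", "then", "puts", "end", "if"])
```

In this toy table a lone byte 0xC0, which is an accented capital A in Latin-1, is read back as `puts`, which shows why an accented character in plain source can end up being treated as a keyword or built-in.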

> Is anyone running these legacy shrouded files?

I believe there are very few files out there
that are "shrouded", but not also "encrypted" or bound 
into an executable, so there is little point now in 
maintaining support for the single-byte codes (in scanner.e).

Note that executable programs that were "bound" with the interpreter, 
pre-3.0, are not affected, since they contain the required 
interpreter version. The only breakage here would be very old, 
probably pre-2.0 (1997 or earlier) files, "shrouded", 
but not "encrypted".

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


16. Re: Accented characters in identifiers

Robert Craig wrote:
> In 2.5, a whole new binder/shrouder was developed where 
> conversion to byte-codes no longer occurred, but a form of 
> IL encryption continued, for bound executables only.

Actually, you could also make an encrypted separate .il file, to be
run by the backend, but the point remains that if 3.0 can't "decrypt",
there is little point in handling the special keyword/built-in 
byte codes in the 128-255 ASCII range.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


17. Re: Accented characters in identifiers

Robert Craig wrote:

> CChris wrote:
>> Currently, Eu interprets characters with the most significant bit set as
>> opcodes. Only old shrouded files store Eu opcodes this way.
>> 
>> Isn't it time to remove that restriction, so as to be able to use non 
>> english identifiers in programs? Other languages frequently use accented
>> characters.
> 
> Yes, I agree. I'll do that fairly soon, if nobody objects.
> Others, such as Igor Kachan, have also mentioned the lack of support
> for the higher ASCII codes for non-English languages.

<snip>

Sorry, I don't think that this is a good idea, because:

a) The usage of this feature will bring a considerable disadvantage.
   When someone creates identifiers that contain special characters of
   her/his language, it is likely that other people somewhere else in
   the world will have problems reading that code.
   You recently reminded us of a post from you on 12 Feb 2002:
<http://www.listfilter.com/cgi-bin/esearch.exu?fromMonth=2&fromYear=7&toMonth=2&toYear=7&postedBy=rds&keywords=declaration+initialize>

   In this message it reads:
   | I like it better the way it is. You could argue that I don't have to
   | use variable inits if I don't want to. You could argue that I don't
   | have to use goto if I don't want to. A language does not exist just
   | to serve the isolated programmer. It exists to serve a community
   | of programmers. In situations where it really doesn't matter 
   | how something is written, I think there are advantages to 
   | reducing the number of choices.

   IMHO the same is true concerning special characters in identifiers,
   especially since many of them are not equal in different languages.
   The Euphoria community is small enough, Euphoria shouldn't encourage
   people to write code that can only be read by a fraction of this
   small community.

b) It is not necessary at all. We currently have a sufficient number of
   characters for creating identifiers. The German language also has
   some special characters, but I _never_ had the need to use one of
   them in an identifier.

Regards,
   Juergen


18. Re: Accented characters in identifiers

Robert Craig wrote:
> CChris wrote:
> > Is anyone running these legacy shrouded files?
Seems unlikely. I was not even aware that pre-2.0 files could possibly be run by
3.0 anyway. Don't suppose there is any chance of partially resurrecting that
feature [ie unencrypted] in any form is there Rob?
(OK, don't sweat it, I know the answer was NO when I asked before)

Juergen Luethje wrote:
>   When someone creates identifiers that contain special characters of
>   her/his language, it is likely that other people somewhwre else in
>   the world will have problems to read that code.
However if they write code in ascii-7 but comments in Japanese...

>   The Euphoria community is small enough
I strongly disagree, we should rise to the (formidable) challenge of a wider and
multi-lingual community. If I cannot read some future code written in, say, Urdu,
that may well be annoying[1], but I cannot believe there is or ever will be any
benefit to us deciding now that such should never exist.

Regards,
Pete
[1] aku has previously submitted quality code which was both named and commented
in a foreign language; this problem already exists. To his (or her?!) credit, an
english wrapper was provided.


19. Re: Accented characters in identifiers

Pete Lomax wrote:

> Juergen Luethje wrote:
>>   When someone creates identifiers that contain special characters of
>>   her/his language, it is likely that other people somewhwre else in
>>   the world will have problems to read that code.
> However if they write code in ascii-7 but comments in Japanese...

Then I can still read the code.

>>   The Euphoria community is small enough
>I strongly disagree, we should rise to the (formidable) challenge of a wider
> and multi-lingual community. If I cannot read some future code written in,
> say,
> Urdu, that may well be annoying[1], but I cannot believe there is or ever will
> be any benefit to us deciding now that such should never exist.

The benefit is in considerably reducing the chance that someone will
encounter such an annoying situation.

Regards,
   Juergen


20. Re: Accented characters in identifiers

Robert Craig wrote:

> 
> CChris wrote:
> > Currently, Eu interprets characters with the most significant bit set as
> > opcodes. Only old shrouded files store Eu opcodes this way.
> > 
> > Isn't it time to remove that restriction, so as to be able to use non 
> > english identifiers in programs? Other languages frequently use accented
> > characters.
> 
> Yes, I agree. I'll do that fairly soon, if nobody objects.

If nobody objects, that is simply consensus, too rare a thing here  smile

> Others, such as Igor Kachan, have also mentioned the lack of support
> for the higher ASCII codes for non-English languages.

Yes, I like this feature very much; the bilingual EU 2.5 works
OK for me and I have had thanks from people for that package.

That 2.5 can execute an EU code with *any* non-English names
for identifiers (128..255 codes), not only Russian and English.

Anyway, I plan to expand the 3.0.2(3..) source for this
feature plus multilingual EU messages, some time later on,
just my spare time is very limited for now.

The automatic code translation from any foreign language to
standard 100% pure Euphoria and back is simple (except for comments)
and works OK in 2.5 from English to Russian and
from Russian to English.

[snip]

Regards,
Igor Kachan
kinz at peterlink.ru


21. Re: Accented characters in identifiers

Juergen Luethje wrote:
> Robert Craig wrote:
> > CChris wrote:
> >> Currently, Eu interprets characters with the most significant bit set as
> >> opcodes. Only old shrouded files store Eu opcodes this way.
> >> 
> >> Isn't it time to remove that restriction, so as to be able to use non 
> >> english identifiers in programs? Other languages frequently use accented
> >> characters.
> > 
> > Yes, I agree. I'll do that fairly soon, if nobody objects.
> > Others, such as Igor Kachan, have also mentioned the lack of support
> > for the higher ASCII codes for non-English languages.
> 
> <snip>
> 
> Sorry, I don't think that this is a good idea, because:
> 
> a) The usage of this feature will bring a considerable disadvantage.
>    When someone creates identifiers that contain special characters of
>    her/his language, it is likely that other people somewhwre else in
>    the world will have problems to read that code.
>    You recently reminded us of a post from you on 12 Feb 2002:
>    <http://www.listfilter.com/cgi-bin/esearch.exu?fromMonth=2&fromYear=7&toMonth=2&toYear=7&postedBy=rds&keywords=declaration+initialize>
> 
>    In this message it reads:
>    | I like it better the way it is. You could argue that I don't have to
>    | use variable inits if I don't want to. You could argue that I don't
>    | have to use goto if I don't want to. A language does not exist just
>    | to serve the isolated programmer. It exists to serve a community
>    | of programmers. In situations where it really doesn't matter 
>    | how something is written, I think there are advantages to 
>    | reducing the number of choices.
> 
>    IMHO the same is true concerning special characters in identifiers,
>    especially since many of them are not equal in different languages.
>    The Euphoria community is small enough, Euphoria shouldn't encourage
>    people to write code that can only be read by a fraction of this
>    small community.
> 
> b) It is not necessary at all. We currently have a sufficient number of
>    characters for creating identifiers. The German language also has
>    some special characters, but I _never_ had the need to use one of
>    them in an identifier.

OK, thanks for that insight.
I guess I'll hold off, for at least several days, 
until we hear from some other non-English programmers.

It just seemed to me that if I had to do without
some of the English alphabet in my identifiers, 
it would be annoying to me, so I figured it must be 
annoying to non-English programmers. Also, if someone
creates identifiers that are not English-related,
I wouldn't understand them anyway, regardless of
whether they contain accents or funny-looking characters.

I guess it could be a problem though if some characters
resemble punctuation and other confusing shapes, 
like some of the English ASCII 128-255 characters do on my
English region computer.

Regards,
   Rob Craig
   Rapid Deployment Software
   http://www.RapidEuphoria.com


22. Re: Accented characters in identifiers

Robert Craig wrote:
> Juergen Luethje wrote:
>> Robert Craig wrote:
>>> CChris wrote:
>>>> Currently, Eu interprets characters with the most significant bit set as
>>>> opcodes. Only old shrouded files store Eu opcodes this way.
>>>> 
>>>> Isn't it time to remove that restriction, so as to be able to use non 
>>>> english identifiers in programs? Other languages frequently use accented
>>>> characters.
>>> 
>>> Yes, I agree. I'll do that fairly soon, if nobody objects.
>>> Others, such as Igor Kachan, have also mentioned the lack of support
>>> for the higher ASCII codes for non-English languages.
>> 
>> <snip>
>> 
>> Sorry, I don't think that this is a good idea, because:
>> 
>> a) The usage of this feature will bring a considerable disadvantage.
>>    When someone creates identifiers that contain special characters of
>>    her/his language, it is likely that other people somewhere else in
>>    the world will have problems to read that code.
>>    You recently reminded us of a post from you on 12 Feb 2002:
>>    <http://www.listfilter.com/cgi-bin/esearch.exu?fromMonth=2&fromYear=7&toMonth=2&toYear=7&postedBy=rds&keywords=declaration+initialize>
>> 
>>    In this message it reads:
>>    | I like it better the way it is. You could argue that I don't have to
>>    | use variable inits if I don't want to. You could argue that I don't
>>    | have to use goto if I don't want to. A language does not exist just
>>    | to serve the isolated programmer. It exists to serve a community
>>    | of programmers. In situations where it really doesn't matter 
>>    | how something is written, I think there are advantages to 
>>    | reducing the number of choices.
>> 
>>    IMHO the same is true concerning special characters in identifiers,
>>    especially since many of them are not equal in different languages.
>>    The Euphoria community is small enough, Euphoria shouldn't encourage
>>    people to write code that can only be read by a fraction of this
>>    small community.
>> 
>> b) It is not necessary at all. We currently have a sufficient number of
>>    characters for creating identifiers. The German language also has
>>    some special characters, but I _never_ had the need to use one of
>>    them in an identifier.
> 
> OK, thanks for that insight.
> I guess I'll hold off, for at least several days, 
> until we hear from some other non-English programmers.

Many ex-USSR programmers are simply too self-conscious about
their not very good English to discuss these things here ...

So let me please be the second voice - I'm not
too modest anyway, as you know  :)

And this second voice of mine says that the modern educational
BlackBox system, well known in Russia (by Oberon microsystems,
Inc., Switzerland), does support identifier characters above 127,
with some technical exceptions.

> It just seemed to me that if I had to do without
> some of the English alphabet in my identifiers, 
> it would be annoying to me, so I figured it must be 
> annoying to non-English programmers.
 
... to learn English and to hunt for Latin
letters on a Russian (totally different) keyboard
just to begin programming with the simplest:
puts(1,"Hello World!")


I have to say that any professional programmer
must learn and know English well enough, but I do
know some very talented people who are almost
absolutely unable to learn a foreign language.

And I have to say that just switching between
keyboard layouts (Lat - Rus) is very annoying,
for me too, but programming in *pure* Russian or
in *pure* English is handy either way, as long as
there is no perpetual switching.

> Also, if someone creates identifiers that are not
> English-related, I wouldn't understand them anyway,
> regardless of whether they contain accents or
> funny-looking characters.

Rob, I think you are now one of those very talented
persons who are almost absolutely unable to learn
Russian, no?    :)
 
> I guess it could be a problem though if some characters
> resemble punctuation and other confusing shapes, 
> like some of the English ASCII 128-255 characters do on my
> English region computer.

If you have a proper code page set up on your machine,
all things will be clear without these confusing shapes.
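
To make the code page point concrete, here is a minimal Python sketch (not Euphoria, and not part of any interpreter discussed here) that decodes one and the same byte, 252, under three code pages. The helper name `show_byte` is made up for this illustration.

```python
# One byte value in the 128..255 range, as it might appear in a source file.
RAW = bytes([0xFC])

def show_byte(raw, code_pages):
    """Decode the same raw byte under several code pages (illustrative helper)."""
    return {cp: raw.decode(cp) for cp in code_pages}

glyphs = show_byte(RAW, ["cp1252", "cp437", "cp866"])
for code_page, glyph in glyphs.items():
    print(code_page, repr(glyph))
```

Under the Western Windows code page the byte is a lowercase u-umlaut, while the DOS code pages 437 and 866 map it to unrelated symbols. The byte in the file never changes; only the table used to display it does.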

I must say that I understand all of Juergen's objections
very well but, sorry, I cannot agree.

Regards,
Igor Kachan
kinz at peterlink.ru


23. Re: Accented characters in identifiers

Robert Craig wrote:

> Juergen Luethje wrote:
> > Robert Craig wrote:
> > > CChris wrote:
> > >> Currently, Eu interprets characters with the most significant bit set as
> > >> opcodes. Only old shrouded files store Eu opcodes this way.
> > >> 
> > >> Isn't it time to remove that restriction, so as to be able to use non 
> > >> english identifiers in programs? Other languages frequently use accented
> > >> characters.
> > > 
> > > Yes, I agree. I'll do that fairly soon, if nobody objects.
> > > Others, such as Igor Kachan, have also mentioned the lack of support
> > > for the higher ASCII codes for non-English languages.
> > 
> > <snip>
> > 
> > Sorry, I don't think that this is a good idea, because:
> > 
> > a) The usage of this feature will bring a considerable disadvantage.
> >    When someone creates identifiers that contain special characters of
> >    her/his language, it is likely that other people somewhere else in
> >    the world will have problems to read that code.
> >    You recently reminded us of a post from you on 12 Feb 2002:
> >    <http://www.listfilter.com/cgi-bin/esearch.exu?fromMonth=2&fromYear=7&toMonth=2&toYear=7&postedBy=rds&keywords=declaration+initialize>
> > 
> >    In this message it reads:
> >    | I like it better the way it is. You could argue that I don't have to
> >    | use variable inits if I don't want to. You could argue that I don't
> >    | have to use goto if I don't want to. A language does not exist just
> >    | to serve the isolated programmer. It exists to serve a community
> >    | of programmers. In situations where it really doesn't matter 
> >    | how something is written, I think there are advantages to 
> >    | reducing the number of choices.
> > 
> >    IMHO the same is true concerning special characters in identifiers,
> >    especially since many of them are not equal in different languages.
> >    The Euphoria community is small enough, Euphoria shouldn't encourage
> >    people to write code that can only be read by a fraction of this
> >    small community.
> > 
> > b) It is not necessary at all. We currently have a sufficient number of
> >    characters for creating identifiers. The German language also has
> >    some special characters, but I _never_ had the need to use one of
> >    them in an identifier.
> 
> OK, thanks for that insight.
> I guess I'll hold off, for at least several days, 
> until we hear from some other non-English programmers.
> 
> It just seemed to me that if I had to do without
> some of the English alphabet in my identifiers, 
> it would be annoying to me, so I figured it must be 
> annoying to non-English programmers.

Well, I must admit that German with its 7 special characters (and I
think e.g. French, Spanish or Swedish don't contain many more non-ASCII
characters) is much closer to English than e.g. Russian or Japanese. So
I especially understand Igor's intention here.

> Also, if someone
> creates identifiers that are not English-related,
> I wouldn't understand them anyway, regardless of
> whether they contain accents or funny-looking characters.

:) I agree.

I wanted to say that allowing special characters in identifiers
_encourages_ programmers to write code that is hard to read for a lot of
other people. So I think it increases the chance that an Eu programmer
will see identifiers that (s)he wouldn't understand.

> I guess it could be a problem though if some characters
> resemble punctuation and other confusing shapes, 
> like some of the English ASCII 128-255 characters do on my
> English region computer.

I also think so. When you see non-English identifiers e.g. 'Pferd' and
'Ente', even when you do not know their meaning (which is btw. 'horse'
and 'duck') you probably can easily recognize and distinguish them from
each other in the whole code anyway. This might not be so easy with
identifiers that consist of "very special" (from the point of view of
the reader) characters.

If I tried to read important code that contained identifiers which
are meaningless to me, and which I could hardly recognize and distinguish
from each other, then I think I would try to guess appropriate German or
English names for them, and then "search and replace" these identifiers.

This leads to another point, which I had almost forgotten:
Special characters can confuse editors. In the past I have repeatedly
found that editors handle some special characters as word delimiters.

I just tested the following with the current Metapad version 3.51:
When I double-click anywhere on the expression 'FooBar', Metapad always
selects the whole expression, i.e. the entire "word". This does _not_
happen with the expression 'FoüBar'. (I hope it will read here on the
message board as expected -- I replaced the third character with the
lowercase German u-Umlaut.) Metapad handles this special German character
as a word delimiter, so it "sees" the two words 'Fo' and 'Bar'!

When I "search and replace" identifiers in program source code, I use
the option:
   [v] whole words only

With an editor that behaves as described above, I think this can lead to
unexpected and unwanted results.
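
The effect can be reproduced outside any particular editor. The following Python sketch is only an assumption about how such an editor might define a "word" (ASCII letters, digits and underscore only); it is not Metapad's actual code, and the names `words` and `replace_whole_word` are invented for the illustration.

```python
import re

# Assumed editor rule: only ASCII letters, digits and underscore belong to a word.
ASCII_WORD = re.compile(r"[A-Za-z0-9_]+")

def words(text):
    """Split text into 'words' the way an ASCII-only editor would see them."""
    return ASCII_WORD.findall(text)

def replace_whole_word(text, old, new):
    """Replace `old` only where it is bounded by non-word characters (ASCII rule)."""
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(old) + r"(?![A-Za-z0-9_])"
    return re.sub(pattern, new, text)

print(words("FooBar = FoüBar + 1"))
print(replace_whole_word("Bar = FoüBar", "Bar", "Baz"))
```

Under this rule 'FoüBar' is seen as the two words 'Fo' and 'Bar', so a "whole words only" replacement of 'Bar' also rewrites the tail of 'FoüBar', which is exactly the kind of unwanted result meant above.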

Regards,
   Juergen


24. Re: Accented characters in identifiers

In the past I tried to read some code from Aku, but because the identifiers were in
a language I don't understand, it was hard to follow and in the end I didn't
pursue it.
As a French-speaking programmer, I have always used English identifiers for code I
distribute on the web, because I consider English the common language of
programmers all around the world.
But when I write code for myself I use French identifiers and comments, and I
don't really miss accents in identifiers.

regards,
Jacques Deschênes




25. Re: Accented characters in identifiers

CChris wrote:
> 
> Igor Kachan wrote:
> > 
> > 
> > There is a buggy bilingual interpreter for
> > Linux, exu_r, and ex.err files for two euphoria/demos/linux
> > programs in this package:
> > 
> > http://www.private.peterlink.ru/kinz/exu_r_25.zip
> > 
> > Try please, if you want.
> > 
> > sanity.ex works ok with exu_r - 100% passed.
> > 
> 
> Got those files, which are hardly informative indeed.

Yes, it is a good puzzle  :)
OK, you can see the changes to the 2.5 source
code here (produced with the diff program):

http://www.private.peterlink.ru/kinz/changes.zip

> I think any implementation of accented chars (allowing any UTF-8 char in
> identifiers
> is trivial, they just may cause display concerns when the code page is not the
> original one) would be done with the new tools in 4.0, and there will be many.

The code above uses just the DOS code pages, not UTF-8.
Watcom has a single code page for the DOS pixel
modes, so you have to replace the fonts in your localised
interpreter.
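
As a rough illustration of the difference between a DOS code page and UTF-8, this small Python sketch (not taken from the patch above; the helper name `encoded_bytes` is made up) lists the byte values one letter occupies in each encoding.

```python
def encoded_bytes(text, encoding):
    """Return the byte values that `text` occupies in the given encoding."""
    return list(text.encode(encoding))

# German u-umlaut: one byte in DOS code page 437, two bytes in UTF-8.
print(encoded_bytes("ü", "cp437"), encoded_bytes("ü", "utf-8"))

# Cyrillic "ya": one byte in DOS code page 866, two bytes in UTF-8.
print(encoded_bytes("я", "cp866"), encoded_bytes("я", "utf-8"))
```

In both encodings every byte of these letters lies in the 128..255 range, so a scanner that accepts that whole range as letters would let both through; the difference is that a code page needs the matching font or locale to display them, while UTF-8 uses more than one byte per letter.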

> Perhaps you, Rob and Jeremy might want to discuss this?

I do not think Rob wants those changes in official EU
just now, maybe in 8.0    :)

Regards,
Igor Kachan
kinz at peterlink.ru

