Euphoria Ticket #715: Use enums instead of literals for messages

In msgtext.e, we have a list of messages, used in help, errors and warnings. The format is a sequence, with the message number followed by the message itself. We use the message number to emit a particular message.

We should convert the literals into enums. This will have at least a couple of benefits:

  • Code using these messages will be a little bit clearer (assuming we choose reasonable names for the messages)
  • Merging branches that add messages will be much simpler. It's common for code in a branch to require new messages. However, when merging, there may have been other messages created. This requires merging not only the list, but all of the usages of the new messages, which can be difficult. Using enums would make this nearly trivial.

Details

Type: Task Severity: Normal Category: Front End
Assigned To: SDPringle Status: Accepted Reported Release:
Fixed in SVN #: View VCS: none Milestone: 4.1.0

1. Comment by DerekParnell Sep 11, 2011

However, realize that once a specific message has been assigned a number, that number should never change over the life of Euphoria. This is so each message has a unique number that can be referred to in historical documents. With enums, re-ordering the messages (or merging from different sources) can alter a message's unique number.

2. Comment by mattlewis Sep 11, 2011

Good point. In light of that, I'd amend this to "constants" instead of "enums".

3. Comment by jeremy Sep 11, 2011

In what way are we refering to error numbers in historical documents? Couldn't we just refer to the enum name? I'm a bit lost on where we do this, so if my example is totally off, forgive me.

You may get error 302 during blah blah blah

to

You may get an ERR_INDEX_OUT_OF_RANGE error during blah blah blah

4. Comment by jimcbrown Sep 11, 2011

Well, we already have places in the standard library where the meaning of numbers has changed.

E.g., machine_func() #4 was originally M_PIXEL but now has been reused as M_SOCK_INFO.

5. Comment by DerekParnell Sep 11, 2011

Jeremy: The number is displayed on screen and ex.err rather than the 'error name'. Number's do not need language translation.

Jim: An excellent example of a mistake that should never have happened.

6. Comment by jeremy Sep 11, 2011

Do either the ex.err or screen need to preserve historical data? Seems both of those are pretty temporal. If they do need historical reference, maybe we should include the Euphoria version number in at least the ex.err?

7. Comment by jimcbrown Sep 11, 2011

My point was that if this is such an important and pressing need, then we should make a separate task to fix this by renumbering all the reused ones. This sort of thing needs to be done consistently, imvho.

I'm not sure I buy the argument that numbers don't need to be translated. That'strue, but only for the same reason enum names don't need to be translated. Both are recognizable, recordable, and easily searched for on the web. For those who can read the language the enum name is written in, the meaning that the name conveys is a bonus, but we should be translating error messages anyways.

Finally, I guess constants make it more explicit when through merging or whatnot a number gets reused (whereas using enums, we might just get things renumbered without anyone noticing it), but if two different branches have used the same number to represent two new values in a pre-existing enum (e.g. dlmopen-multithread uses 229 for a machine_func() to create a new thread and the expat branch uses that same number for a machine_func() to get the value of an XML element) then renumbering is unavoidable regardless of whether constants or enums are used.

Also, as a side point, even if we have to manually assign numbers to every enum - which sigificantly erodes the advantage of using an enum in the first place, the ability to get the name of a enum (instead of having to create another constant as a string and then create a list that converts from number to string) is still a nice-to-have feature.

8. Comment by jimcbrown Sep 11, 2011

I see these numbers as hidden implementation details that the user of the language and the stdlib should never see or know about. They just need to know about the names of the constants and enums (and meaning associated with the names) and use those. If the numbers change, the user shouldn't have to care - it's really as if the enums are a separate, nonnumerical datatype that represents information (this is an apple, that is an orange, but they all are fruits).

On the other hand, this is a big deal when it comes to binary compatibility. I don't think we guarantee binary compatibility between different versions. (But shouldn't we?)

9. Comment by DerekParnell Sep 11, 2011

Internal items that are displayed (or otherwise presented) to the end-user need to be consistent across time in order to avoid potential confusion when reporting issues. So if we are displaying numbers, the numbers shouldn't really change once assigned to an internal thingy, likewise with names.

For internal items that are not meant for end-user consumption, we should be free to renumber/rename as we need to.

10. Comment by SDPringle Sep 11, 2011

Please note we will probably always need to go to msgtext.e when we want to use an old error message anyway because there are too many for us to memorize. So don't worry about making the constant names too long. In fact we can take the strings and replace illegal id characters into underscores and you'll have a really understandable constant name that looks like the string. So instead of using CompileErr(4, foo), you use CompileErr( ErrMsg_1__not_understood, foo ) which uses the string "[1] not understood". This could be converted in a script rather than manually. We know the script only should look into calls to functions that take these values.

11. Comment by mattlewis Sep 12, 2011

For an example, I've started doing this in the memstruct branch. The most important thing, IMHO, is that we start doing this going forward. We're getting better at developing features inside of branches. It's very common for new features to require new messages. Also, we might even get a new message from a bug fix in 4.0. I'm personally going to be using constants for all new messages, and I encourage everyone else to follow this standard. It would be nice if we could retrofit the new code (at least in 4.1 forward) as well for clarity.

The most difficult part of merging new messages (from experience) is updating the places that use them. You can't really do a search on 359 (or whatever) since you'll find not only your uses, but the conflicting uses. And if you've added multiple messages, this is even more difficult, since you have to ensure that you don't update twice.

Using names (i.e., constants) makes this trivial. We just have to adjust the constant definitions to agree with the message table.

Also, magic numbers are major code smells.

12. Comment by mattlewis Sep 12, 2011

Looking at that example, it seems like an enum might actually be easier to maintain, since we only have to maintain order, rather than counting.

In the documentation, we could autogenerate some creole to document the messages by number, name and message format.

13. Comment by jeremy Sep 12, 2011

With pretty good success, you can search for "(359". Or a regular expression "(\s*359". You may have some false positives but it should get every place you need to change.

14. Comment by jeremy Sep 12, 2011

I'm wondering, though, with so many error codes is anyone really going to care about a number other than to ask questions about it on the forum? For example...

Skipping [1:3.0]% [2] (up-to-date)

What are we actually going to put in the documentation?

I can understand documenting error codes when the possibility is limited, but we currently have 355 warnings or errors. Further, we have all sorts of places where RTFatal() is called w/messages that contain no error/warning number. Thus, even documenting our 355 warnings/errors found in msgtext is not even going to get 1/2 of the possible warnings/errors. I did a grep RTFatal *|wc and got 428 uses (probably a few false positives for func definitions, ... but not many I wouldn't think).

15. Comment by mattlewis Sep 12, 2011

Jeremy,

Yes you can find all of the 359s, but you only want to change your instances, not the instances from the other branch. Also, if you create 359, 360, 361, you have to go in descending order so that you don't change it multiple times. This is a giant PITA, making sure you got them all correct in all files.

Our numbered messages are all front end affairs. We haven't done anything about backend stuff. I'd envision the documentation of messages to be in the euphoria internals section.

16. Comment by DerekParnell Sep 12, 2011

Another consideration is that these messages are not assumed to be hardcoded. The text used in msgtext.e is just the default text to be used if the actual text is not found in the message text database. That database (called 'teksto.edb') is expected to contain the message text in the local language of the user. Each message uses the message number as its key. Therefore we must be very careful not to change the number assigned to a particular message.

The numbers do not have to be in any order or in any particular sequence. The only requirement is that they be integers. So, maybe each branch can get assigned a 'starting' number for its own messages and thus when merged into the main branch, they won't clash with any other branch's message numbers.

Whether one uses enums or constants is not important for that consideration, but it is important that each message in the system be assign a unique number and that the number is never changed once assigned.

17. Comment by jimcbrown Sep 12, 2011

I completely agree with all the arguments made by Derek in the previous comment, but I have to point out that the same purpose can be accomplished by using static neverchanging enum names. So, if we do have to use numbers, then I am sold on the fact that it must necessarily follow that the numbers must then become "set in stone."

However, I have not yet seen a reason to actually favor numbers over enums. I realize that in some cases (such as machine functions), it can be impractical to hide the actual numbers used from the user.

In that case, I can see this going as one of two routes: we either say those numbers are not meant to be used and are used at a users own peril (and such usage is subject to breakage) - ala the C way - or we act smart and do the extra effort required to completely hide these numbers from the user.

I'm willing to accept that, if we can't (or at least decide against) implementing a method to completely hide these numbers from the users, but at the same time don't want to go the C way, then we should consider these numbers as part of the API and unchangable.

Either way, I'm still not sold on the idea that error numbers themselves have a benefit that justifies their direct use.

The only argument I can see is that numbers are inheriently language-neutral while enum names will most likely be English (or at least Latin character) centric. But enum names can and are still used by speakers and writers of other (non-Latin character) languages, and considering Euphoria's historical closeness to the English language (using syntax structures that are similar to phrases that an English speaker would use), I personally find this a very weak argument.

18. Comment by mattlewis Sep 12, 2011

For me, the enum names are really for the benefit of the developers (or anyone else looking at the source code). The advantage it gives to simply merging code (and the headaches I've already experienced with doing this!) is reason enough for all future development to use this.

Eventual translations provide a sensible reason as to why we shouldn't reuse or change numbers, but this is mostly from the numbers as they exist once released.

19. Comment by jimcbrown Sep 12, 2011

Hmm, I accidently used enum instead of enum names in a few places. To clarify, I believe we should be using names of enums and constants, not the numbers that are used internally, as the API that the user sees and uses.

Sure, enums and constants use integers internally, but that's because integers are small to store and fast for comparison - otherwise, we could just as easily use actual strings for these values. I consider the fact that they actually are integers on the inside to be an internal implementation detail that the user shouldn't have to care about.

20. Comment by DerekParnell Sep 12, 2011

Names are nice for developers but after many years of handling end-user issues over the phone, I think its a lot easier and clearer that when you ask the user "what's the error being displayed" they answer "error number 23". Some of the attempts at saying message codes (usually abbreviations) can be hilarious.

21. Comment by SDPringle Sep 13, 2011

Suppose we adopt this system and we use the same number twice in these error messages. It's only one line to change to fix the conflict, instead of hunting for the calls where these.

Derek, when error messages are displayed it shows the filename and line number and error number inside angled brackets. That is not being discussed here. It is only that we put stop putting 'magic literals' and use constant names for CompileErr( ..., ) and WarnErr( ... ) calls.

22. Comment by DerekParnell Sep 14, 2011

Shawn, sorry but I got the impression (incorrectly it seems) that someone was also suggesting that the enum name be displayed in the error message too.

23. Comment by jimcbrown Sep 14, 2011

That's the correct impression. That might not have been Matt's intent, but this is what I was arguing for.

Keep in mind that if we end up showing the number to the end user anyways, then we must also set these numbers in stone. My arguments only make sense if we show the enum name as well as the error message, but hide the (internal) error number.

That said, Derek's point about telephone calls has me reconsidering my position.

24. Comment by SDPringle Dec 27, 2011

I have written software that does this. The little program maps the actual strings into identifiers so you get things like MSG_COULDN_T_OPEN_1. Surprisingly, this method results in seven name collisions. Four of these name collisions are identical strings. Why isn't this a Milestone for 4.0?

25. Comment by mattlewis Dec 27, 2011

I can't see any benefit for this for 4.0. We really shouldn't be adding much in the way of error messages for 4.0, though it would make sense to use a constant for any new 4.0 error messages, since that would make merging with 4.1 easier.

26. Comment by SDPringle Dec 27, 2011

Yes, any change in code around an error statement would merge better into 4.1. In 4.0, the similar errors are numbered 208 and 242; 51 and 276; and 301 and 317. These messages only differ by case or punctuation. We could get rid of 242, 276 and 317.

27. Comment by DerekParnell Dec 27, 2011

Removing those duplicated message texts seems harmless enough.

28. Comment by SDPringle Jan 02, 2012

See: hg:euphoria/rev/0899e1ae3d8d

changeset: 5440:0899e1ae3d8d branch: 4.0 parent: 5406:09cc7785b11a user: Shawn Pringle <shawn.pringle@gmail.com> date: Mon Jan 02 20:26:20 2012 -0300 files: source/il.e source/main.e source/msgtext.e description:

  • changed numbers of nearly exactly the same messages for earlier numbers
  • this is preparation for ticket 715

29. Comment by SDPringle Jan 02, 2012

We can refer by enum name instead of number when we have alternative_literals merged into 4.1.

I wrote a small program for digesting strings into ids and it processes the source of EUPHORIA. It can sometimes leave us with really long constant names however.

function only_alphanumeric(sequence s) 
	sequence t = "" 
	for si = 1 to length(s) do 
		if t_alnum(s[si]) then 
			t = upper(append(t,s[si])) 
		elsif length(t) and t[$] != '_' then 
			t = append(t,'_') 
		end if 
	end for 
	if length(t) and t[$] = '_' then 
		t = t[1..$-1] 
	end if		 
	return t 
end function 
		 
function make_id( integer val ) 
	sequence t = GetMsgText(val) 
	t = t[10..$] 
	return "MSG_" & only_alphanumeric(t) 
end function 

Basically the ids are about the length of the string itself. It is common to have symbols as long as 'MSG_PREPARES_A_FILE_FOR_USE_ON_WINDOWS'. Most of them are shorter, there are a few that are very long. Of course one can always truncate them. Only one is longer than 100 characters long.

30. Comment by SDPringle Jan 02, 2012

Run this program in a source directory http://openeuphoria.org/pastey/107.wc. I think we should mark this as blocked for now. Until we can get name_of working in all instances, Derek's original objection will be an issue.

31. Comment by jimcbrown Jan 02, 2012

After a lot of consideration, I've finished reconsidering my views and I believe Derek is correct.

The error number should be shown to the user (and thus set in stone and etc). Acronyms are not pleasant to deal with (especially when we have hundreds of them), and completely spelling out the entire phrase using complete words (MSG_ERROR_OUT_OF_MEMORY) is only practical for English speakers (unless we internationalise this part as well and then one day have to deal with the headache of cross-referencing error messages in different languages).

We could develop tools that could deal with cross-referencing error messages in French, Korean, Arabic, Greek, etc. to the original English, but that seems to be a violation of the KISS principle. I'm not a fan of magic numbers, but for error codes it seems like best practices.

32. Comment by SDPringle Jun 28, 2014

  1. Error messages are designed to be read by the end user. Why should
  2. Between patch releases, this change will not affect the numbering of the messages (because non will be added) and will only change in minor updates. Given the pace of development this should hardly be a concern.
  3. When merging new branches these ids will be completely new except for developers and users following that branch.
  4. Unless you are helping people over the phone, there is no advantage between three digit numbers or descriptive constant names.

33. Comment by jimcbrown Jun 28, 2014

re 1. Why should what?

re 2. The numbering shouldn't change at all, ever, since it's set in stone. Lets say that we went the other way, though, with enum names or descriptive names. I think it's pretty easy to get a numbering change in this case - two different devs find and fix two different bugs at the same time, but both of them require adding several new error messages each. This leads to...

re 3. When merging both feature branches into mainline, it's possible to get a conflict. So someone has to change their numbers in the end. Since these are feature branches, we can probably live with that. Things aren't set in stone until a post RC release.

re 4. Or in person, away from a computer. Or over any other set of communications that is primarily verbal. The other thing is, if using names, and english is not a language that person understands, and especially if that person does not know or use a Latin alphabet, then... it'd probably be pretty easy to confuse error messages without numbers.

Now, no one is going to code Euphoria without knowing the latin alphabet at least, but I can see Irv one day selling some of his custom products to a retailer, who in turn uses it in a product they sell to a rural village in China... now someone has to figure out how to pronouce MSG_COULDN_T_OPEN_1 over the phone.

34. Comment by SDPringle Jul 01, 2014

We can replace the magic numbers with enum without changing the values thereof. Is that acceptable?

35. Comment by SDPringle Jul 01, 2014

See: hg:euphoria/rev/505a4d8d1b36

changeset: 6239:505a4d8d1b36 branch: alternative_literals parent: 6234:7e622dc9b22d user: Shawn Pringle <shawn.pringle@gmail.com> date: Tue Jul 01 12:14:55 2014 -0300 files: source/block.e source/buildsys.e source/c_decl.e source/cominit.e source/compile.e source/coverage.e source/emit.e source/error.e source/fwdref.e source/global.e source/il.e source/inline.e source/intinit.e source/main.e source/make_message_enum.ex source/msgtext.e source/parser.e source/platform.e source/scanner.e source/symtab.e source/traninit.e description:

  • use type enum message_index for messages
  • fixes ticket 715

36. Comment by SDPringle Jul 01, 2014

See: hg:euphoria/rev/783448c7d99b

changeset: 6242:783448c7d99b tag: tip parent: 6236:faffe693f1dc user: Shawn Pringle <shawn.pringle@gmail.com> date: Tue Jul 01 09:40:03 2014 -0300 files: source/block.e source/buildsys.e source/c_decl.e source/cominit.e source/compile.e source/coverage.e source/emit.e source/error.e source/fwdref.e source/global.e source/il.e source/inline.e source/intinit.e source/main.e source/make_message_enum.ex source/msgtext.e source/parser.e source/platform.e source/scanner.e source/symtab.e source/traninit.e description:

  • added an enum type statement for message_index
  • enum type constants are used in the calls to message functions and in the implementation in the data structure used by said functions.
  • fixes ticket 715

37. Comment by jimcbrown Jul 01, 2014

The tricky bit there is that it's easier with an enum to accidently reorder something and change the magic numbers.

That said, maybe that's not a big enough deal to stop us from making that change. After some thought, I find that I don't have much opinion either way.

Search



Quick Links

User menu

Not signed in.

Misc Menu