Re: load_map in map.e - unstructured programming example.
- Posted by jimcbrown (admin) May 08, 2014
- 1862 views
Unfortunately this is a long post because I feel it is necessary to provide background as the code is quite obscure.
Understandable.
However nothing is improved unless it is questioned.
Agreed.
I have been reluctant to post because I could be seen as 'picking-on' Euphoria and posting the negative.
You say that, but you also say this.
5 Conclusion
It seems to me the writer of this procedure was more taken with the cleverness of nested while with entry loop than with any integrity the program might have.
The latter is more 'picking-on' than any direct criticism of code could be.
A key-value line is <key> = <value> where <key> and <value> are Eu objects.
Lines are evaluated applying get:value() to the key and value and placed into a map.
They can also be unquoted text, which are turned into strings in Eu.
load_map is a potentially valuable procedure. It is a pity it is all but unusable.
It's designed to work with the output of save_map() and only that. Are you aware of any cases where save_map() will create a file that load_map() can't load? If not, then you are just plain wrong here.
Of course, perhaps there are enhancements to load_map() to make it more usuable when dealing with hand-cranked map files. I won't disagree there, I'll only point out that this wasn't the original purpose of load_map() but a concept that was added later.
load_map and save_map have been mentioned several times as having problems but the code has not been corrected (a major task?)
References to that would help. I admit that I generally don't follow stuff about maps in general, hence my lack of familiarity with these in particular.
or removed.
They work, so why would they be removed? This just seems silly.
start with the basic structure: using shorter names
I disagree in general - we want the names to be as readable as possible. ll is really bad in that regard, though changing line_in to line is probably ok.
ll = -1 -- not needed while sequence(line) with entry do if not sequence(ll) then ll = "" end if -- consequence of ll = -1
Conceded. We can probably do this instead:
object ll = "", ... ... while sequence(line) with entry do
though perhaps with at least some of the original variable names, instead of shorthand ones.
if length(line) then -- ignoring blank lines??
Yep. That's in spec. As you said,
Blank lines and comments are ignored.
Perhaps the documentation could be clearer about that.
if line[$] not in conts then -- clean ll exit -- exit condition should be end if -- length(ll) and not ll[$] in conts
I disagree. If that were the case, wouldn't the function exit as soon as it read the first line in, regardless of line continuation? It'd break the line continuation functionality.
-- but this is unsatisfactory because if ll is not "" -- and line = -1 then the loop exits with an -- incomplete ll is fed into the main loop.
I think we can live with that. We've hit the end of the file in that case, so it is guaranteed that there's no more line input to add to the logical line. We don't have the ability to make the logical line any more complete in that case.
What is, should be, happening in the inner loop - the inner loop is combining getting ll with completing ll. Hence the mess.
Note: it is much simpler to get ll before considering continuations
How do you get the logical line without completing it??? The logical line is the line that has all line continuations combined (so the multiple physical lines have already been merged into a single logical line).
ll line action correct action ----------------------------------------------------- "" -1 exit inner exit outer -- unnecessary
You're the one saying that the correct action is unnecessary here... I agree with that.
cont -1 exit inner error -- ll is incomplete
Like I said, I don't believe that this means the logical line is incomplete. There's no more physical lines left to complete it.
cont "" loop error -- comments should be separate
This case is either a blank line, or a comment. We just strip off comments (as we should be doing), so I'm not convinced that this is an error.
Also, for a logical line, we might have comments dispersed throughout the many physical lines. We don't want to drop the rest of the logical line the moment we see the first comment.
-- rewrite the inner loop while TRUE label "MAIN" do object ll, loop do ll = getline(fh) -- ** we are not defining getline here (see Part 2) if not sequence(ll) then exit "MAIN" end if until length(ll) != 0 end loop while find(ll[$], line_conts) do object ln = getline(fh) if not sequence(ln) then goto "ERROR" end if if not length(ln) then goto "ERROR" end if end while -- clean ll -- process ll end while
I have not included functions for getline() and clean() as they are not particularly relevant.
getline() must be handling the comments and line continuation. I think that's relevant. The entire algorithm as a whole can't be judged until it is completely revealed.
has_comment = search:rmatch("--", ln) -- note search from the right if has_comment != 0 then ln = text:trim(ln[1..has_comment-1]) else ln = text:trim(ln) end if
Conceeded. We should be looking for the first --, not the last. A comment can have multiple of -- in it.
[quote gimlet]
ll = search:match_replace(`",$"`, ll, "") -- assumes these sequences ll = search:match_replace(`,$`, ll, "") -- will not contain spaces delim_pos = find('=', ll)
Why does it matter if they have spaces or not? We just want to strip out the line continuation characters from the logical line. Then the delim_pos is looking for the equal sign - not a space.
Maybe we have a problem here of not signalling an error if a line continuation character is in the middle or beginning of a line, instead of at the end.
2 ignoring the possible error
Where is else clause?
This silent passing over of errors is totally anti-user
if delim_pos > 0 then ... end if if length(data_key) > 0 then ... end if
Conceded. Of course, I'd prefer a load_map() that had the option of going on and processing as many lines as it could, but some sort of error mechanism to warn the user about any invalid lines found makes sense too.
if conv_res[1] = stdget:GET_SUCCESS then ... end if
I think this last one is not an error - it's just that for strings, the quotes are optional.
If one record is misread then
The code skips it and goes to the next line. Basically, it conceptually filters out bad records. I can live with that.
Assuming the line ends when there are no more continuations. Or that the line doesn't end until there are no more continuations.
are unjustified assumptions.
I disagree. By definition, a logical line is a physical line unless that physical line has a line continuation - in which case it is the combination of all physical lines until the line continuation stops. You say so yourself, in fact:
A line may be laid out over several lines if each line except the last ends with a continuation character ( `,${` ).
Now, should having a line continuation at the end of a file be an error? I think not, for the same reason we decided to allow a list of constants to end with a comma - it's much easier to add a line to the bottom of the list in the future, lowers teh chance of svn/hg conflicts, etc.
Lines contain sequences among other things and sequences can nest and one can have a sequence as a key or a value.
But these should all be on one logical line.
One can easily miss a comma at line-end or before a comment.
Conceded. value()/get() reports this error to us, but we just ignore the entire record in that case.
Why does he ignore the possibility of special characters `=,${` appearing in quotes?
Conceded. That seems like a bug.
Why allow unquoted text?
Because this way there are no errors?
Also because it's easier to turn a keyvalue text file into a map this way?
4 the true test of fidelity
If the text read in produces the same number of key-value pairs as are defined and when read out the resultant text defines the same key-value pairs as in the original then you have a successful read (of one text).
I agree, but I guess the issue is what should load_map() do on error? Should it abort the program, return failure and no valid data, or keep going and get as much valid data as possible? Of course it should report errors (which it sadly does not), but I think it should also try to handle erors gracefully.

