Euphoria Ticket #59: Doesn't detect up to date correctly

The testing that euc does to see if object files are up to date in order to avoid compiling them is incorrect. Even if a particular euphoria source file does not change, its translation can, due to optimizations in the translator. Consider the following:

-- foo.ex 
include bar.e 
bar(1) 
-- bar.e 
export procedure bar( object x ) 
    ? x & 3 
end procedure 

Translate and run:

>euc -keep -con foo 
Translating code, pass: 1 2 3  generating 
Compiling with Watcom 
Compiling  10% init-.c 
Compiling  40% foo.c 
Compiling  60% main-.c 
Compiling  80% bar.c 
Linking 100% foo.exe 
 
To run your project, type foo.exe 
 
>foo 
{1,3} 

Now, change foo.ex:

-- foo.ex 
include bar.e 
bar(1) 
bar("ex") 

>euc -keep -con foo 
Linking 100% foo.exe 
 
>euc -keep -con foo 
Translating code, pass: 1 2 3  generating 
Compiling with Watcom 
Compiling  10% init-.c 
Compiling  40% foo.c 
Compiling  60% main-.c 
Skipping   80% bar.c (up-to-date) 
Linking 100% foo.exe 
 
To run your project, type foo.exe 
>foo 
{1,3} 
{1,3} 

In fact, bar.c changed between the two, but since it wasn't recompiled, we get incorrect output.

Details

Type: Feature Request Severity: Minor Category: Translator
Assigned To: unknown Status: Accepted Reported Release: 2806
Fixed in SVN #: 2863, 2864, 2865, 2928, 2935 View VCS: 2863, 2864, 2865, 2928, 2935 Milestone: 4.1.0

1. Comment by jeremy Sep 13, 2009

Wow. This puts a wrench in the whole process! The only way I know how to handle this then is write to a temporary .c file, then do either a md5 sum or trust that we can detect based on file size. If it changed, then move it to the correct .c file name.

Any other suggestions?

2. Comment by mattlewis Sep 13, 2009

Yeah, that's basically what I was thinking. I'd try file size first, since it should be a lot faster (though I suppose we could test), and only do more detailed checks if it passes the file size test.

Or maybe we write out a checksum / md5 at the end of the file in a comment after the last pass. That should be easy to compare versions. If the file exists, then you could read the size / hash value before overwriting it. This would prevent the need for duplicate temp files.

3. Comment by DerekParnell Sep 13, 2009

Are you sure you've got the dependency check right? I had a quick glance at the code and I'm not sure its checking the right files. I would have thought that a .C file needs to be compiled if it is newer than the .EXE file. Comparing the date of the .C to the Euphoria file is not really relevant.

But maybe I'm misunderstanding the code in c_decl.e and buildsys.e

4. Comment by jeremy Sep 13, 2009

There are a few parts to the system:

  1. The translator first does a deep scan of the main source file and all includes. If any of those files are newer than the prjname.bld file, then the translator enters a full translation, otherwise, translation is skipped and a simple relink is done (i.e. you've updated one of the be_*.c files).
  2. If files have changed, then when the translator begins to parse a file, it is checked to see if the last .c file is older than the existing euphoria file. If so, then the .c file is opened and written to. If it is not, i.e. the .c file is newer than the .e/ex file, then the file handle that c_puts, etc... uses is set to 0. Thus, a new .c file is never written. This is needed because build tools don't check the content, they see that the actual timestamp has changed and they rebuild. Thus, if we write the .c file, a full rebuild happens all the time.

Due to #2, we cannot check the .c file vs. the .exe because we need to check to see if we should even write a .c file.

5. Comment by jeremy Sep 17, 2009

Matt, can you make it write a md5sum on the last pass? I'll take over from there. I am assuming this should be done only if direct build and the keep option is present.

6. Comment by mattlewis Sep 18, 2009

I've been thinking (a little) about what to use. If you look at how things are output, there are multiple ways. Do we need something like an md5sum? Would something simpler work? For instance, what if we simply xor'ed the length of the stmt parameter to c_stmt(), and then rotated it as a 30bit integer on each call? I suppose we could experiment a little with something like that. It seems like it would be good enough at detecting changes.

7. Comment by jeremy Sep 18, 2009

I don't think we need anything fancy. Just something that will be unique for that group of text and will be consistent, of course.

8. Comment by DerekParnell Sep 18, 2009

LOL, I almost wrote a comment about this too. I don't think MD5 is required. Something faster and not-so-perfect will suffice nicely.

Look at the 'crc.ex' program in /source for instance.

9. Comment by mattlewis Sep 18, 2009

Ok, I've added a checksum. It's based on the length of the 'format' strings sent to write to the c files, in addition to the number of statements that have been emitted to the file. See buildsys.e:update_checksum() for the function. I haven't tested it extensively, though it does detect the above scenario correctly.

It occurs to me (now that I've committed it) that it probably won't detect simple changes, like changing a digit, so it's not quite right, but better than what we currently have, so you can start updating the detection logic.

10. Comment by mattlewis Sep 18, 2009

Ok, svn:2864 catches small things like digit changes. I also moved the output and resetting into buildsys.e to make it easier for Jeremy to use that data.

11. Comment by DerekParnell Sep 18, 2009

I saw the generated C code for the update_checksum() function and figured we could do better by using the built-in hash() function. The generated code is now much smaller and a bit faster too.

12. Comment by mattlewis Oct 12, 2009

I temporarily disabled up-to-date checking in svn:2928 until we finish the checksum logic. I started trying to implement that, but got into trouble trying to figure out how exactly the build system was tracking files.

13. Comment by jeremy Oct 12, 2009

What do you mean by tracking files?

14. Comment by mattlewis Oct 12, 2009

Basically, how it figured out which files were out of date or not. I'd started messing with outdated_files, but that wasn't quite right. I think I needed to understand generated_files a little better than I did.

15. Comment by jeremy Oct 12, 2009

Take a look at quick_has_changed in buildsys.e. That's how it currently (and incorrectly) determines out of date files. When translating with the -keep option, euc writes a .bld file which contains a list of the files that it needs to translate. Then when building again, it compares the file dates to the .bld file.

Now, this has to change as I am comparing to see if the .e/.ex files have changed. With the new info you gave me it seems that we have to translate each file regardless if the .e/.ex has changed. Thus, the preliminary checks I am doing are almost totally invalid. The only thing it will catch is if only the library has changed, which is an important check. The way Euphoria should be built is to compile the backend into a .lib file, and then just call euc. That's the way the Bakefiles work. Therefore, when changing just backend source, euc will not trigger an entire build just a relink.

16. Comment by mattlewis Oct 12, 2009

quick_has_changed() is part of it, but there's also the handling of outdated_files in c_decl.e and buildsys.e. Basically, I was having trouble connecting a checksum with its associated element in outdated_files.

The quick check makes sense, however, we need to do it for all of the source files to make it accurate. We'd probably still have to scan and parse the code, but we could determine whether the euphoria code had changed since the last translation at least.

17. Comment by mattlewis Oct 12, 2009

There's also the issue where the translated C code didn't change, but the object file is missing, either because the user deleted it or because a previous translation was stopped before it completed.

18. Comment by jeremy Oct 12, 2009

Well, I think that if the quick out of date check shows no files out of date, then parsing of the code is not necessary. Now, if any of those are out of date, then every one has to be parsed and translated to it's own, new .c file. And only if the new resulting .c file is different (checksum) than the old .c file, then move it to it's real .c name.

I think most of the code needs to be scrapped to make this all happen. I don't think you can get by with just updating or changing the outdated files sequence.

19. Comment by mattlewis Oct 14, 2009

I disabled the quick checking for now, too. Lowered the severity to normal. We need to fix this (as it's a great feature) but I don't think it should be listed as blocking.

20. Comment by DerekParnell Aug 04, 2010

So the simple story is, if any source file is modified then the target needs to be rebuilt from scratch. Or in other words, the target does not need to be built only if all its source files have not changed.

If this is so, I'd argue that checking for updated files is probably a waste of time.

21. Comment by jimcbrown Aug 04, 2010

You do save time not compiling C code that has not changed since the previous translation and compile. Calculating and comparing checksums is a lot faster than compiling C code.

Since we need to retranslate the C code anyways, and also make sure that the object code still exists (otherwise we need to recompile the C code anyways even if it is unchanged), implementing this feature is arguably quite hard to do.

However, I'm strongly inclined to agree with Derek, since a) the translation phase is probably the larger of the two, and b) ccache implements this feature for gcc already, so the gain from doing it ourselves is quite small and really only benefits OpenWatcom compiler users...

22. Comment by jimcbrown Aug 04, 2010

I'm ok with closing this out as a "won't fix" or downgrading this to a feature request.

23. Comment by DerekParnell Aug 05, 2010

Changed this to a feature request.

Search



Quick Links

User menu

Not signed in.

Misc Menu