1. 2.4 Official Release - machine level exception

Rob,
You are correct, I was running sanity.exw, probably some copy I made
18 months ago so I could double click it to run it with exw. Quite the
detective!  I've deleted that file now.
exw sanity.ex runs fine (I was kinda hoping it wouldn't).

I get a machine-level exception during the statement
	pos[P_yto]-=rely
where
	pos = {24646,7901,32476,9990}
	rely = -3543
	and P_yto is the constant 4

If I just insert eg 'if integer(rely) then end if' before that
statement it works fine.

Worrying.

If making such a tiny alteration makes the problem go away, it will
probably work fine on most other machines (or any with a slightly
different kernel32.dll etc)...

Any ideas?

Pete


2. Re: 2.4 Official Release - machine level exception

Pete Lomax wrote:
> I get a machine-level exception during the statement
> 	pos[P_yto]-=rely
> where
> 	pos = {24646,7901,32476,9990}
> 	rely = -3543
> 	and P_yto is the constant 4
> 
> If I just insert eg 'if integer(rely) then end if' before that
> statement it works fine.
> 
> Worrying.
> 
> If making such a tiny alteration makes the problem go away, it will
> probably work fine on most other machines (or any with a slightly
> different kernel32.dll etc)...
> 
> Any ideas?

Your little code fragment above works fine for me.
I assume it's part of a much bigger program that
has pokes, calls to C routines etc.
Chances are, you have gone out of bounds with a poke,
passed a bad value to a C routine, etc. The program
could work fine on one run, but fail mysteriously
on another, after any kind of trivial and
seemingly irrelevant change. In this particular case
you may be clobbering a Euphoria variable, stored
in memory near one of your allocated blocks. Perhaps this
is showing up now with 2.4, since the memory is organized
differently. When using Win32Lib, I think you are now
more likely to trash a Euphoria variable when you
go out of bounds, rather than trashing another
memory block that you allocated.

I would at least try safe.e (see instructions at top of safe.e).
Since Win32Lib has gone back to using Euphoria's allocate(),
safe.e is now more effective. By default, in 2.4 Official,
safe.e will only check the edges of blocks allocated
by allocate(), so you shouldn't get any false alarms,
but a lot of memory accesses will not be checked.
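As an illustration of the kind of mistake described above, here is a minimal sketch of a poke that stays inside an allocated block, with an out-of-bounds variant left commented out. The buffer size and byte values are made up for the example and are not taken from Pete's program.

```euphoria
include machine.e   -- allocate() and free()

atom buf
buf = allocate(4)               -- a 4-byte block

-- In bounds: exactly 4 bytes written into a 4-byte block.
poke(buf, {1, 2, 3, 4})

-- Out of bounds (deliberately commented out): this would write a 5th
-- byte past the end of the block and could clobber whatever is stored
-- next to it in memory.
-- poke(buf, {1, 2, 3, 4, 5})

? peek({buf, 4})                -- read the 4 bytes back

free(buf)
```

With safe.e set up as described at the top of that file, an overrun like the commented-out line is the sort of error it is meant to report, rather than letting the program fail later at an unrelated statement.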

Regards,
    Rob Craig
    Rapid Deployment Software
    http://www.RapidEuphoria.com


3. Re: 2.4 Official Release - machine level exception

Andy Serpa wrote:
> I still get a mystery crash once in a while.  I can never isolate or 
> duplicate it precisely (usually random numbers are involved and I can 
> never find a seed that reproduces the problem), but it seems to happen 
> in programs that sum up a large number of floating point values, often 
> incredibly small values, and then do something like invert it.  Can you 
> think of a reason adding up a bunch of floats (there are probably 
> precision issues here) would ever cause a problem?  

No, I haven't come across anything like that.

> Sometimes a number 
> that isn't quite zero is taken to be zero in some operations but not 
> others (precision again, I assume).  What happens if I do 
> 1/not_quite_zero ?

There was a problem in 2.3, fixed in 2.4...

"bug fixed: Translated code compiled with Borland C was not
  producing INF's and NAN's, like Watcom and Lcc. Rather, it was
  crashing when a floating-point overflow (over 1e308), or an
  undefined f.p. result was calculated. The Interpreter Source Code
  was also corrected for those who wish to compile exw.exe
  using Borland. Thanks to Andy Serpa."

I know you are still using 2.3 for some things.
I also noticed that Windows ME provides less
detail when a crash occurs than Win98 did, so you
might not see that it was a floating-point exception.
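A small sketch of the 1/not_quite_zero case, for anyone who wants to try it on their own setup. The variable names and the particular values are assumptions chosen only so that the result exceeds 1e308; they do not come from Andy's programs.

```euphoria
atom not_quite_zero, result

-- Build a tiny but nonzero value, roughly 1e-310.
not_quite_zero = 1e-300 * 1e-10

-- Mathematically this is about 1e310, which is over the 1e308 limit,
-- so it is a floating-point overflow. Per the release note quoted
-- above, this is the situation where an INF should be produced and
-- where Borland-compiled code used to crash instead.
result = 1 / not_quite_zero

? result
```

Note that this is different from dividing by an exact 0, which Euphoria reports as an ordinary run-time error rather than a floating-point overflow.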

Regards,
    Rob Craig
    Rapid Deployment Software
    http://www.RapidEuphoria.com


4. Re: 2.4 Official Release - machine level exception

> Pete Lomax wrote:
> I get a machine-level exception during the statement
>     pos[P_yto]-=rely
> ...
> If I just insert eg 'if integer(rely) then end if' before that
> statement it works fine.
>
> Worrying.
>
> If making such a tiny alteration makes the problem go away, it will
> probably work fine on most other machines (or any with a slightly
> different kernel32.dll etc)...
>
> Any ideas?

I've found the bug. If you use a statement with
subscripting on the left, and an assignment op,
e.g. +=, -= etc., and the internal representation
of the statement occurs at just the wrong point
in my internal data structure, then you may get a
machine exception on that statement. The chances of
this happening are zero for programs under a couple of
thousand statements, and very remote, maybe
1 program in 1000, for larger programs that use
subscripted assignment ops. (simple stuff like x += 1 is fine).
Any change that shifts the statement slightly in memory will
make the bug go away. e.g. adding or removing some earlier code,
turning trace or type_check on or off etc. If the bug is
there, it will always crash on that statement.
If it's not there, it will never crash.
It's not random in that sense. The Translator is not affected.
The bug has been there ever since assignment operators
were introduced in version 2.1, January 1999, and was never
detected before now. The new ability in 2.4 to catch
machine exceptions obviously helped a lot here.
The fix will be in the next release.
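To make the distinction concrete, here is a sketch using the values from Pete's report. It shows the subscripted assignment op that can be affected, the simple form that cannot, and (commented out) the equivalent plain assignment. The variable x is added only for illustration.

```euphoria
constant P_yto = 4
sequence pos
integer rely, x

pos = {24646, 7901, 32476, 9990}
rely = -3543
x = 0

-- Simple assignment op, no subscript on the left: not affected.
x += 1

-- Subscripted assignment op: the form that can hit the bug when the
-- statement lands at the wrong point in the interpreter's internal data.
pos[P_yto] -= rely

-- Equivalent plain assignment, which avoids the assignment op entirely:
-- pos[P_yto] = pos[P_yto] - rely

? pos[P_yto]   -- 9990 - (-3543) = 13533
```

On its own this fragment is far too small to trigger the problem, since the bug needs a program of a couple of thousand statements or more.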

Regards,
    Rob Craig
    Rapid Deployment Software
    http://www.RapidEuphoria.com


5. Re: 2.4 Official Release - machine level exception

On 12 Jul 2003, at 1:27, Robert Craig wrote:

> 
> 
> > Pete Lomax wrote:
> > I get a machine-level exception during the statement
> >     pos[P_yto]-=rely
> > ...
> > If I just insert eg 'if integer(rely) then end if' before that
> > statement it works fine.
> >
> > Worrying.
> >
> > If making such a tiny alteration makes the problem go away, it will
> > probably work fine on most other machines (or any with a slightly
> > different kernel32.dll etc)...
> >
> > Any ideas?
> 
> I've found the bug. If you use a statement with
> subscripting on the left, and an assignment op,
> e.g. +=, -= etc., and the internal representation
> of the statement occurs at just the wrong point
> in my internal data structure, then you may get a
> machine exception on that statement. The chances of
> this happening are zero for programs under a couple of
> thousand statements, and very remote, maybe
> 1 program in 1000, for larger programs that use
> subscripted assignment ops. (simple stuff like x += 1 is fine).
> Any change that shifts the statement slightly in memory will
> make the bug go away. e.g. adding or removing some earlier code,
> turning trace or type_check on or off etc. If the bug is
> there, it will always crash on that statement.

Hmm, that might explain why some programs only run with trace, and crash 
without trace. I think i'll delete all those assignment ops.

Kat


> If it's not there, it will never crash.
> It's not random in that sense. The Translator is not affected.
> The bug has been there ever since assignment operators
> were introduced in version 2.1, January 1999, and was never
> detected before now. The new ability in 2.4 to catch
> machine exceptions obviously helped a lot here.
> The fix will be in the next release.
> 
> Regards,
>     Rob Craig
>     Rapid Deployment Software
>     http://www.RapidEuphoria.com


6. Re: 2.4 Official Release - machine level exception

On Sat, 12 Jul 2003 00:43:52 -0500, gertie at visionsix.com wrote:

>Hmm, that might explain why some programs only run with trace, and crash
>without trace. I think i'll delete all those assignment ops.
BTW, It's only subscripted assignment ops, Kat.

I spent pretty much all of Thursday on this, anyone running on 2.4
should not be worried at all (IMHO), since it identifies the line it
is having a problem with interpreting correctly (2.4 tells you about
the machine level exception), and virtually any change you make to the
code makes it go away.

In contrast, a global edit (which I accept may be a valid option for
pre-2.4 users) just might introduce a much more annoying bug, or
indeed move a previously working subscripted assignment op in 3rd
party code onto a critical boundary. (rare, but possible)

Same deal, I think with library authors: even if we assume zero
introduced typos, if they chose to remove subscripted assignment ops
they are just as likely (being highly unlikely in either case) to stop
users' code working as to help them any.

Just my tuppenceworth,
Pete

