1. Casting int <--> double and signs

I reverted my 4.1 source changes back to revision 6130, except for a few more minor changes to include/std. Right now, on OS X, the 32-bit interpreter fails t_callc.e, t_inline.e, t_io.e, t_net_http.e, and t_socket.e; the 64-bit interpreter fails t_callc.e, t_de_dep.e, t_de_safe.e, t_dep.e, t_inline.e, t_io.e, t_legacy_callback.e, t_net_http.e, t_safe.e, and t_socket.e.

I had fixed a few of these, but I got lost on the way and I think that I introduced some subtle bugs where the wrong pointer is being dereferenced. Once I discovered a hard failure (Bus Error or eui unable to even open Sanity.ex), I started backing out changes one at a time but I couldn't link the change to the error.

Plus, one time when I was tracing be_callc.c with gdb, it would hit line 1513 but then jump to line 1562, skipping a lot of logic in between and returning the wrong value. It must have had something to do with optimizations, but I couldn't figure out what.

So while I think I was on the right track, I figured that I would step back and take the changes more slowly.

Here are some other things that I found, along with some questions:

  • I found that the printf specifier %d does not work correctly with integers larger than 0x7fffffffffffffc9. I can fix it; I just wanted to make a note.

Reading the manual, I've learned the following:

  • HEX constants are always positive.
  • and_bits, or_bits, not_bits, xor_bits should return signed integers.
    • The implementation currently returns unsigned.
  • The allocate routines return uint addresses. Therefore, we should assume that addresses are uints, but should we require them to be? We should be able to work with integers from INTPTR_MIN to UINTPTR_MAX without issue.
  • Data can be int or uint, we have signed and unsigned versions of poke. Are they bounds checked? Should they be?

Going through the source:

  • Why does the hex reading routine in scanner.e divide TMAXINT by 32?

2. Re: Casting int <--> double and signs

jaygade said...

I had fixed a few of these, but I got lost on the way and I think that I introduced some subtle bugs where the wrong pointer is being dereferenced. Once I discovered a hard failure (Bus Error or eui unable to even open Sanity.ex), I started backing out changes one at a time but I couldn't link the change to the error.

Those sorts of issues can be very tricky. I had tracked down a couple of bad casts:

      -		return (object)(int)(DBL_PTR(x)->dbl); 
      +		return (object)(DBL_PTR(x)->dbl); 
 
      -							? _x : NewDouble((eudouble)_x);})) 
      +							? _x : NewDouble((eudouble)(uintptr_t)_x);})) 

These were causing the problems with t_rand.e on 64-bit. But t_rand.e was segfaulting on 32-bit, and I hadn't had a chance to track that down.

jaygade said...
  • and_bits, or_bits, not_bits, xor_bits should return signed integers.
    • The implementation currently returns unsigned.

I think this is a problem with the manual. There was a lot of discussion about this, and the decision was made to return unsigned values.

jaygade said...
  • The allocate routines return uint addresses. Therefore, we should assume that addresses are uints, but should we require them to be? We should be able to work with integers from INTPTR_MIN to UINTPTR_MAX without issue.

When using a pointer (in the back end or translated) it will be cast to a pointer type, and it's interpreted as unsigned. I suppose we could return it as a signed value, which on 32-bits would keep more pointers as integers, but this seems like a bad idea.

jaygade said...
  • Data can be int or uint, we have signed and unsigned versions of poke. Are they bounds checked? Should they be?

We have signed and unsigned versions of peek. Pokes don't matter. What sort of bounds checking? Something beyond what std/safe.e does?

jaygade said...
  • Why does the hex reading routine in scanner.e divide TMAXINT by 32?

It's to prevent overflow. It's been a while, but I think the problem is that if it's greater than that, then the next digit (if one exists) will cause the value to be stored as floating point. If it was the last digit, then it doesn't really matter, but if there are more, this keeps everything OK. We resume checking digits below.

Matt


3. Re: Casting int <--> double and signs

Thanks for the answers, Matt.

I knew I had probably introduced some bad casts, when my intention was either to remove redundant casts (casting the result of NewDouble to an object when it already returns an object, and the like) or to fix casts where the result could be undefined (casting a double to an int or uint without checking sign and over/underflow).

I went a little overboard and got lost in the weeds. I didn't even realize that t_rand.e was causing a bus error/segfault on 32-bit because that didn't show up in test-report.html when I ran 'make test'. I assumed it was only failing on 64-bit.

mattlewis said...

I think this is a problem with the manual. There was a lot of discussion about this, and the decision was made to return unsigned values.

I think I want to revisit this. I'll check the sourceforge dev list archives to see if I can find the previous discussion, but my argument would be this: If the inputs are signed, then the outputs should be, otherwise they should be unsigned, with the exception of not_bits which should always return a signed value.

This is for consistency. We can either have not_bits(not_bits(-1)) = -1 or we can have not_bits(not_bits(#FFFFFFFF)) = #FFFFFFFF (32-bits used for brevity), because #FFFFFFFF != -1. I'll think about it some more and see if we can have both; I really think that we can.

On the plus side if it returns signed, that can be a way of converting an unsigned hex constant to a signed integer.

mattlewis said...

When using a pointer (in the back end or translated) it will be cast to a pointer type, and it's interpreted as unsigned. I suppose we could return it as a signed value, which on 32-bits would keep more pointers as integers, but this seems like a bad idea.

Maybe I'm misunderstanding, but all pointers in the back end are signed types (object is typedef'd to intptr_t) so that it can be determined if it is an integer, atom, or sequence.

I agree that it should be returned to the user as an unsigned pointer; but that users should be allowed to use signed pointers as arguments if they want. That's what I meant -- we should assume that addresses are uints for output back to the user, but we should accept either uints or signed ints FROM the user. Maybe they got a signed value from an expression? I don't know.

I guess what I mean is that the policy should be that the user shouldn't have to worry about signed vs. unsigned unless they want to do so.

mattlewis said...

We have signed and unsigned versions of peek. Pokes don't matter. What sort of bounds checking? Something beyond what std/safe.e does?

I'm not sure what safe.e does. I'll have to go look. What happens if you do

poke2u(addr, -16380) 
poke2s(addr, 60956) 

Do you get an error, or does it convert signed<-->unsigned silently?

mattlewis said...

It's to prevent overflow. It's been a while, but I think the problem is that if it's greater than that, then the next digit (if one exists) will cause the value to be stored as floating point. If it was the last digit, then it doesn't really matter, but if there are more, this keeps everything OK. We resume checking digits below.

Thanks, that actually makes sense thinking about it. Was 'i' originally declared as an integer? Because it is declared an atom now.

As I think about this, does it make more sense to be one of

i < TMAXINT/16 -- leave room for one more digit 
-- equivalent to i <= TMAXINT/32?  
--snip-- 
if i>= TMAXINT then 
    is_int = FALSE 

Or is that still subject to overflow?

Regardless, it seems to work as-is.


4. Re: Casting int <--> double and signs

jaygade said...
mattlewis said...

When using a pointer (in the back end or translated) it will be cast to a pointer type, and it's interpreted as unsigned. I suppose we could return it as a signed value, which on 32-bits would keep more pointers as integers, but this seems like a bad idea.

Maybe I'm misunderstanding, but all pointers in the back end are signed types (object is typedef'd to intptr_t) so that it can be determined if it is an integer, atom, or sequence.

I agree that it should be returned to the user as an unsigned pointer; but that users should be allowed to use signed pointers as arguments if they want. That's what I meant -- we should assume that addresses are uints for output back to the user, but we should accept either uints or signed ints FROM the user. Maybe they got a signed value from an expression? I don't know.

I guess what I mean is that the policy should be that the user shouldn't have to worry about signed vs. unsigned unless they want to do so.

An object isn't a pointer (I'm talking about the C code now). Whatever its value is, before we use it as a pointer in C code, it gets cast to some pointer type. Everything you said users should be able to do ought to work in practice, even if you got pointers whose representation as a signed integer is negative. That definitely won't happen in 64-bit; I'm not sure how often it might happen in 32-bit.

jaygade said...
mattlewis said...

We have signed and unsigned versions of peek. Pokes don't matter. What sort of bounds checking? Something beyond what std/safe.e does?

I'm not sure what safe.e does. I'll have to go look. What happens if you do

poke2u(addr, -16380) 
poke2s(addr, 60956) 

Do you get an error, or does it convert signed<-->unsigned silently?

That would give you an error, since there are no such things as poke2u or poke2s. Just poke2. Yes, it's converted automatically for you. But you could do the following:

include std/machine.e 
 
atom addr = allocate( 100 ) 
poke2(addr, -16380) 
? peek2u( addr )  -- 49156 
poke2(addr, 60956) 
? peek2s( addr )  -- -4580 
free( addr )  
jaygade said...

Thanks, that actually makes sense thinking about it. Was 'i' originally declared as an integer? Because it is declared an atom now.

It probably was an integer originally. But this has obvious problems if you're translating 64-bit code with a 32-bit translator, which I think is why it was changed.

jaygade said...

As I think about this, does it make more sense to be one of

i < TMAXINT/16 -- leave room for one more digit 
-- equivalent to i <= TMAXINT/32?  
--snip-- 
if i>= TMAXINT then 
    is_int = FALSE 

Or is that still subject to overflow?

Regardless, it seems to work as-is.

I try not to think too hard about this, because I usually come up with the wrong answer, so I test it out. :P

Matt


5. Re: Casting int <--> double and signs

mattlewis said...

An object isn't a pointer (I'm talking about the C code now). Whatever its value is, before we use it as a pointer in C code, it gets cast to some pointer type. Everything you said users should be able to do ought to work in practice, even if you got pointers whose representation as a signed integer is negative. That definitely won't happen in 64-bit; I'm not sure how often it might happen in 32-bit.

But an object can represent a pointer, an integer, a Euphoria integer, or a munged pointer to a DBL or a SEQ. We don't know until we test it.

And there are points when an object which represents a value outside of the range of a Euphoria integer (or outside of the range of an unsigned Euphoria integer) gets cast into a eudouble.

I see what you're saying, I have to think about this a bit more. The reason it's cast to unsigned when converting to a DBL_PTR or SEQ_PTR is because shifts of some signed values are either undefined or implementation-defined in C. C only cares if a pointer is signed if you are doing math on it.

Like I said, I need to wrap my head around this some more. Sometimes going through all the casts makes me think I'm working with LISP. But then, I've never worked with LISP.

What I really want to understand is: what assumptions is the code making at any given point? Are those assumptions correct? And if not, how do we correct them?

mattlewis said...

That would give you an error, since there are no such things as poke2u or poke2s. Just poke2. Yes, it's converted automatically for you.

Heh -- I knew this, but forgot it when I didn't want to type in a 64-bit signed value for poke8. I was trying to think of a situation where an unsigned value would be assumed, but a signed value would be possible.

mattlewis said...

It probably was an integer originally. But this has obvious problems if you're translating 64-bit code with a 32-bit translator, which I think is why it was changed.

That makes sense. I'll take your advice and test it.


6. Re: Casting int <--> double and signs

jaygade said...
mattlewis said...

An object isn't a pointer (I'm talking about the C code now). Whatever its value is, before we use it as a pointer in C code, it gets cast to some pointer type. Everything you said users should be able to do ought to work in practice, even if you got pointers whose representation as a signed integer is negative. That definitely won't happen in 64-bit; I'm not sure how often it might happen in 32-bit.

But an object can represent a pointer, an integer, a Euphoria integer, or a munged pointer to a DBL or a SEQ. We don't know until we test it.

And there are points when an object which represents a value outside of the range of a Euphoria integer (or outside of the range of an unsigned Euphoria integer) gets cast into a eudouble.

Yes, but in C, it's still just a signed int of some sort. We do conversions and some bit twiddling to recover the pointer (which relies on the memory alignment of pointers, such that we can discard the last few bits). After we do that twiddling, we cast it to an actual pointer type; in these cases, struct s1 * or struct d * (which are typedef'd to s1_ptr and d_ptr).

jaygade said...

I see what you're saying, I have to think about this a bit more. The reason it's cast to unsigned when converting to a DBL_PTR or SEQ_PTR is because shifts of some signed values are either undefined or implementation-defined in C. C only cares if a pointer is signed if you are doing math on it.

Actual C pointers are never signed. In fact, if you look at those macros, you'll see that we cast objects to uintptr_t before we fiddle the bits and use them as pointers.

Matt


7. Re: Casting int <--> double and signs

It's cast to uintptr_t so that the << and >> operators are defined to work correctly. Shifting signed values can lead to undefined or unpredictable results.

This could be by design or by accident. Either way, it wouldn't otherwise matter. When casting to an actual pointer type, it doesn't matter if the original value is an intptr_t or uintptr_t. They are equal so long as all of their bits are the same.

The only reason to use uintptr_t or intptr_t is to perform unsigned or signed arithmetic.

Anyway, I'm arguing semantics on stuff that doesn't matter. I'm not trying to be argumentative -- even as I'm responding to this I'm learning new stuff that I didn't know before.

But like I said, I need to learn the different assumptions in different areas of the code.

