LLVM’s shifty semantics

When we designed Alive, our interpretation for the shift instruction produced undefined behavior when the shift amount is greater than the width of the integer. We have been thinking about alternative interpretations for the shift instruction because we think this interpretation is inconsistent with LLVM’s language reference.

When using a shift operation on n-bit integers, LLVM requires the second argument (i.e., shift amount) to be less than n. One might expect that, say, left-shifting by more than n would result in zero, since all the bits have been shifted out, but LLVM does not require this because several popular processor architectures, including x86 and ARM, do not work this way. On x86, for example, shifting a 32-bit integer by 33 places is the same as shifting it 1 place. By restricting the shift amount, LLVM is free to map its shift instructions directly to the processor’s instructions without needing to insert range checks.

This leaves the question of what to do when the shift amount is too large, and this is where Alive (i.e., the main branch of Alive) diverged from LLVM. Alive treated too-large shifts as undefined behavior, similar to division by zero.

However, the relevant text in the LLVM Language Reference  states:

If [the shift amount] is (statically or dynamically) equal to or larger than the number of bits in [the first argument], the result is undefined.

Compare this with the language used for the division and remainder operations:

Division by zero is undefined behavior.

While the language reference doesn’t actually define what “the result is undefined” means, the intention seems to be something similar to undef, LLVM’s most restricted form of undefined behavior. That is, a shift with a too-large shift amount can produce an arbitrary bit-pattern, but cannot produce a poison value or have any side-effect.

A broken optimization

We have implemented this semantics in a variant of Alive-NJ, and using it we have discovered an optimization in the Alive suite, which Alive had incorrectly validated earlier.

  %Op0 = shl 1, %Y
  %r   = mul %Op0, %Op1
  %r   = shl %Op1, %Y

This transforms (1 << Y) * X to X << Y, an optimization which seems acceptable. But Alive-NJ reports a counterexample:

ERROR: Mismatch in values for i4 %r                                                                 

i4 %Y   = 0x8 (8, -8)
i4 %Op0 = 0x0 (0)
i4 %Op1 = 0x0 (0)
source: 0x0 (0)
target: 0x1 (1)

Essentially, if %Y is too large, then the shift %Op0 returns an undefined value. In the source, this might be subsequently multiplied by 0 or an even number, resulting in 0 or a partially-defined value. In contrast, the target produces a fully-undefined (undef) value regardless of the value of %Op1

For example, if %Y is 8 and %Op1 is 2, then %r will always be an even number in the source, but %r in the target may take any value. The optimization has added new possible behaviors to the program, and must therefore be rejected.

We have confirmed that this optimization is performed by LLVM 4.0. For example, this function:

int foo(int x, int y) {
    return (1 << y) * x;

will have a single shift operation after optimization.

But is it actually broken?

The argument that (1 << y) * x and x << y are different relies on LLVM’s semantics for undef, but these arguably do not reflect actual processor behavior. For example, LLVM says 2 << 33 could be any 32-bit integer, but we are not aware of any processor that would produce an odd value here.

In the forthcoming paper, “Taming undefined behavior” (PLDI 2017), Lee and others propose combining undef with LLVM’s notion of poison values. The relevant part here is that any arithmetic operator with a poison argument always returns poison. While multiplying undef by 0 yields 0, multiplying poison by 0 yields poison.

If shifts return poison when the shift amount is too large, then the optimization above is valid, as the original and optimized code both return poison in that case.

What about nsw, nuw, and exact?

LLVM provides three attributes which make stronger claims about the range of values its arguments may take. LLVM’s left shift, shl, can be modified by one or both of nsw, meaning no-signed-wrap, and nuw, meaning no-unsigned-wrap. Essentially, these are promises that the shift x << y is actually the same as x⋅2y. Similarly, the arithmetic and logical right shifts, ashr and lshr, can be modified by exact, which promises that no one bits are shifted out. If a shift with an attribute is given arguments which do not meet its requirements, it produces a poison value.

It is tempting to assume that a shift with an attribute always produces poison if the shift amount is too large, but this does not appear to be what the language reference specifies.

For shl nsw, the reference states that it is poison “if it shifts out any bits that disagree with the resultant sign bit.” When the shift amount is too large, the resultant sign bit is undefined, so we can always choose a sign bit such that the shifted bits disagree.

In other words, this optimization is valid:

Pre: C u>= width(%x)
  %r = shl nsw %x, C
  %r = poison

In contrast, the reference states that shl nuw is poison “if it shifts out any non-zero bits.” If the argument is 0, the shift will not shift out any non-zero bits, so a shl nuw with a too-large shift amount produces poison for every input except 0.

That means that we cannot perform this optimization:

Pre: C u>= width(%x)
  %r = shl nuw %x, C
  %r = poison

Because %r is not poison when %x is zero.

Similarly, ashr exact and lshr exact are poison “if any of the bits shifted out are non-zero.”

Of course, if undef and poison are merged, then it would no longer be necessary to make this distinction.

The proposal for combining undef and poison in “Taming undefined behavior” (PLDI 2017) addresses many of the inconsistencies that are currently present in the LLVM language reference  including the issues highlighted in the above examples.

LLVM’s shifty semantics

Verification with Type Polymorphism in Alive

Over the last few years, we have been designing light weight formal tools to help LLVM developers. Alive domain-specific language to express peephole optimizations in LLVM is one such example,  which was developed  in collaboration with my student David Menendez, John Regehr and Nuno lopes. You can read more about Alive here.

One of the less elegant aspects of Alive is the way we handle type polymorphism. All values in LLVM are typed, and instructions are typically polymorphic over types, subject to some constraints (e.g., the arguments to add can be integers of any width, but they must have the same width). Similar to LLVM, Alive optimizations are usually generic over types with additional constraints  to ensure correctness.

Alive verifies the correctness of the optimization by encoding the optimization into SMT queries. When the Alive interpreter encodes optimizations with SMT bit vector theory, it needs to pick a width. Alive performs multiple translations using different type assignments.  Since LLVM does not have an upper bound on integer width, the Alive interpreter  arbitrarily picks 64 as the upper limit for bit widths.

Even with this limit, an Alive transformation with multiple type variables can easily require lots of SMT queries to prove correctness—and even this is only a partial proof, since we haven’t ruled out the possibility that the optimization goes wrong for, say, 79-bit integers.

This is unsatisfying. Most optimizations are correct for all feasible typings, or correct for none. Exceptions typically are invalid for small widths. For example, Alive optimization below is defined for all higher widths except for bit width  1, where shifting by 1 is undefined for 1-bit integers:

Pre: width(%b) > 1
%a = add %b, %b
%a = shl %b, 1

It is also possible to describe an optimization which is valid only for widths below a threshold:

Pre: width(%a) <= 2
%b = add %a, %a
%c = add %b, %b
%c = 0

These typically are only valid for 1-bit integers, but the fact that we can define artificial examples means we cannot rule out actual examples.

The question then becomes, do all cases follow one of these patterns? If some optimization has a type variable ranging over integer types, will the range of widths where the optimization is valid always be a contiguous subrange of the possible widths?

If so, it may be possible to find these ranges for an optimization and determine validity using only a small number of SMT queries. In particular, testing an optimization for 1-bit and 64-bit arithmetic would be enough to show that it is valid for all widths between 1 and 64.

Unfortunately, this is not the case with the current design choices in Alive. Consider this optimization:

Pre: C1 + C2 < width(%x)
%y = shl %x, C1
%z = shl %y, C2
%z = shl %x, C1 + C2

This is very similar to an optimization used in LLVM, except the precondition uses a signed comparison instead of an unsigned comparison. Amazingly, this optimization is valid for every bit width except three.

ERROR: Target introduces undefined behavior for i3 %z                                               

i3 %x = 0x0 (0)
i3 C1 = 0x2 (2)
i3 %y = 0x0 (0)
i3 C2 = 0x2 (2)
source: 0x0 (0)
target: undefined

In the counter-example provided by Alive, it has attempted to replace two shifts by 2 bits with a single shift by 4 bits, which has undefined behavior in 3-bit arithmetic. We might expect the precondition to exclude this case, since two plus two equals four, which exceeds width(%x), but the addition of C1 and C2 and  the comparison of the results with width(%x) are also done in 3-bit arithmetic, and the addition wraps around to -4. Because the precondition uses a signed comparison, it accepts this case, but the well-defined condition for shift uses an unsigned comparison (because LLVM interprets the right argument to a shift as an unsigned integer).

This problem does not arise for any other bit width. In the 1-bit case, no shifts are possible, so the optimization can never apply. For 2-bits, width(%x) gets interpreted as -2, the smallest signed value, so the precondition can never be satisfied. For widths larger than 3, the addition can only wrap in cases where one of the initial shifts is too large, meaning the behavior of the input code was undefined, so any transformation is valid.

This means we cannot assume that an optimization which is valid for 1-bit and 64-bit arithmetic is valid for all widths between 1 and 64, which means we must rely on exhaustive testing (up to a bound) for the foreseeable future.

(However, note that the 1- and 2-bit cases involve an unsatisfiable precondition. It remains to be seen if an optimization exists which is valid and applicable to a non-contiguous range of integer types.)

Verification with Type Polymorphism in Alive