Over the last few years, we have been designing lightweight formal tools to help LLVM developers. The Alive domain-specific language for expressing peephole optimizations in LLVM is one such example, developed in collaboration with my student David Menendez, John Regehr, and Nuno Lopes. You can read more about Alive here.
One of the less elegant aspects of Alive is the way we handle type polymorphism. All values in LLVM are typed, and instructions are typically polymorphic over types, subject to some constraints (e.g., the arguments to add can be integers of any width, but they must have the same width). Similar to LLVM, Alive optimizations are usually generic over types with additional constraints to ensure correctness.
Alive verifies the correctness of an optimization by encoding it into SMT queries. When the Alive interpreter encodes an optimization using the SMT bit-vector theory, it must pick a concrete width for each type, so it performs multiple translations using different type assignments. Since LLVM does not impose an upper bound on integer width, the Alive interpreter arbitrarily picks 64 as the upper limit for bit widths.
Even with this limit, an Alive transformation with multiple type variables can easily require lots of SMT queries to prove correctness—and even this is only a partial proof, since we haven’t ruled out the possibility that the optimization goes wrong for, say, 79-bit integers.
This is unsatisfying. Most optimizations are correct for all feasible typings, or correct for none. The exceptions are typically invalid only for small widths. For example, the Alive optimization below is valid for all widths greater than 1, but not for bit width 1, where shifting by 1 is undefined (the shift amount equals the width):
Pre: width(%b) > 1
%a = add %b, %b
  =>
%a = shl %b, 1
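The width-1 exclusion can be seen with a quick brute-force check in plain Python (a sketch, independent of Alive; the function name is my own). The value identity b + b == b << 1 actually holds modulo 2^w at every width; width 1 is excluded only because the shift amount 1 is then equal to the width, making shl undefined.

```python
# Brute-force check: compare (b + b) and (b << 1) modulo 2^width
# over every w-bit value b.
def add_matches_shl(width):
    mask = (1 << width) - 1
    return all(((b + b) & mask) == ((b << 1) & mask)
               for b in range(1 << width))

# The arithmetic identity holds at every width, including width 1;
# the precondition width(%b) > 1 is needed only to rule out the
# undefined shift, not a value mismatch.
for w in range(1, 9):
    assert add_matches_shl(w)
```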
It is also possible to describe an optimization which is valid only for widths below a threshold:
Pre: width(%a) <= 2
%b = add %a, %a
%c = add %b, %b
  =>
%c = 0
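A brute-force sketch (plain Python, names my own) confirms the threshold: the source computes 4 * %a, which is always 0 in w-bit arithmetic exactly when w <= 2.

```python
# Check whether 4 * a is 0 modulo 2^width for every w-bit value a.
def four_a_always_zero(width):
    mask = (1 << width) - 1
    return all(((4 * a) & mask) == 0 for a in range(1 << width))

# Matches the precondition width(%a) <= 2.
assert four_a_always_zero(1)
assert four_a_always_zero(2)
assert not four_a_always_zero(3)
```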
Optimizations like this are typically only valid for 1-bit integers, but the fact that we can construct artificial examples means we cannot rule out real ones.
The question then becomes, do all cases follow one of these patterns? If some optimization has a type variable ranging over integer types, will the range of widths where the optimization is valid always be a contiguous subrange of the possible widths?
If so, it may be possible to find these ranges for an optimization and determine validity using only a small number of SMT queries. In particular, testing an optimization for 1-bit and 64-bit arithmetic would be enough to show that it is valid for all widths between 1 and 64.
Unfortunately, this is not the case with the current design choices in Alive. Consider this optimization:
Pre: C1 + C2 < width(%x)
%y = shl %x, C1
%z = shl %y, C2
  =>
%z = shl %x, C1 + C2
This is very similar to an optimization used in LLVM, except that the precondition uses a signed comparison instead of an unsigned one. Amazingly, this optimization is valid for every bit width except 3.
ERROR: Target introduces undefined behavior for i3 %z

Example:
i3 %x = 0x0 (0)
i3 C1 = 0x2 (2)
i3 %y = 0x0 (0)
i3 C2 = 0x2 (2)
source: 0x0 (0)
target: undefined
In the counter-example provided by Alive, the optimization has replaced two shifts by 2 bits with a single shift by 4 bits, which has undefined behavior in 3-bit arithmetic. We might expect the precondition to exclude this case, since 2 + 2 = 4 exceeds width(%x), but the addition of C1 and C2, and the comparison of the result with width(%x), are also performed in 3-bit arithmetic, where the addition wraps around to -4. Because the precondition uses a signed comparison, it accepts this case; the well-definedness condition for shift, however, uses an unsigned comparison (because LLVM interprets the right argument of a shift as an unsigned integer).
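The mismatch between the signed precondition and the unsigned shift amount can be replayed in plain Python (a sketch, not Alive itself; helper names are my own):

```python
def as_signed(v, width):
    """Interpret the low `width` bits of v as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >= (1 << (width - 1)) else v

width = 3
C1, C2 = 2, 2

# The precondition adds C1 and C2 in 3-bit arithmetic: 2 + 2 = 4 wraps to -4.
pre_sum = as_signed(C1 + C2, width)
assert pre_sum == -4

# The signed comparison in the precondition accepts this case: -4 < 3.
assert pre_sum < as_signed(width, width)

# But shl reads its right operand as an unsigned integer: the shift
# amount is 4, and 4 >= 3 makes the target shift undefined in i3.
unsigned_amount = (C1 + C2) & ((1 << width) - 1)
assert unsigned_amount >= width
```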
This problem does not arise for any other bit width. In the 1-bit case, no shifts are possible, so the optimization can never apply. For 2-bits, width(%x) gets interpreted as -2, the smallest signed value, so the precondition can never be satisfied. For widths larger than 3, the addition can only wrap in cases where one of the initial shifts is too large, meaning the behavior of the input code was undefined, so any transformation is valid.
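This case analysis can be checked exhaustively for small widths. The sketch below (plain Python, names my own) enumerates every %x, C1, and C2 at a given width and checks that whenever the signed precondition holds and the source is defined, the target is defined and computes the same value:

```python
def as_signed(v, w):
    """Interpret the low w bits of v as a two's-complement value."""
    v &= (1 << w) - 1
    return v - (1 << w) if v >= (1 << (w - 1)) else v

def optimization_valid(w):
    mask = (1 << w) - 1
    for x in range(1 << w):
        for c1 in range(1 << w):
            for c2 in range(1 << w):
                # Precondition C1 + C2 < width(%x), evaluated as a
                # signed comparison in w-bit arithmetic.
                if not (as_signed(c1 + c2, w) < as_signed(w, w)):
                    continue
                # If either source shift is undefined, any target is allowed.
                if c1 >= w or c2 >= w:
                    continue
                src = (((x << c1) & mask) << c2) & mask
                amt = (c1 + c2) & mask   # shift amount read as unsigned
                if amt >= w:
                    return False         # target undefined, source was not
                if ((x << amt) & mask) != src:
                    return False         # target computes a different value
    return True

# Among the widths checked here, width 3 is the only failure.
assert [w for w in range(1, 7) if not optimization_valid(w)] == [3]
```

Note that for widths 1 and 2 the loop never reaches the validity checks, because the precondition is unsatisfiable there, matching the case analysis above.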
This means we cannot assume that an optimization which is valid for 1-bit and 64-bit arithmetic is valid for all widths between 1 and 64, which means we must rely on exhaustive testing (up to a bound) for the foreseeable future.
(However, note that the 1- and 2-bit cases involve an unsatisfiable precondition. It remains to be seen if an optimization exists which is valid and applicable to a non-contiguous range of integer types.)