This is a corrected version of:
bc886e9b58
I made a copy-paste error that created an "add" instead of the
intended "sub" on that attempt. The regression tests showed the
bug, but I overlooked that.
As I said in a comment on issue #58717, the bug reports resulting
from the botched patch confirm that the pattern does occur in
many real-world applications, so hopefully eliminating the multiply
results in better code.
I added one more regression test in this version of the patch,
and here's an Alive2 proof to show that exact example:
https://alive2.llvm.org/ce/z/dge7VC
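For reference, here is roughly what the corrected fold looks like in IR. This is not the new regression test itself, just the general shape, assuming the multiplier's add-of-constant has already been canonicalized to xor (the i8 width and @src/@tgt names are illustrative):
```
define i8 @src(i8 %x, i8 %z) {
  %notmask = shl i8 -1, %z      ; -1 << Z
  %mask = xor i8 %notmask, -1   ; canonical form of (1 << Z) - 1
  %r = mul i8 %x, %mask
  ret i8 %r
}

define i8 @tgt(i8 %x, i8 %z) {
  %shl = shl i8 %x, %z
  %r = sub i8 %shl, %x          ; the "sub" that the botched patch emitted as "add"
  ret i8 %r
}
```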
Original commit message:
This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e
I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.
Differential Revision: https://reviews.llvm.org/D137157
X * ((1 << Z) + 1) --> (X << Z) + X
https://alive2.llvm.org/ce/z/P-7WK9
It's possible that we could do better with propagating
no-wrap, but this carries over the existing logic and
appears to be correct.
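A minimal IR illustration of that fold (the i8 width and @src/@tgt names are arbitrary; the Alive2 link above is the actual proof):
```
define i8 @src(i8 %x, i8 %z) {
  %pow2 = shl i8 1, %z
  %fact = add i8 %pow2, 1       ; (1 << Z) + 1
  %r = mul i8 %x, %fact
  ret i8 %r
}

define i8 @tgt(i8 %x, i8 %z) {
  %shl = shl i8 %x, %z
  %r = add i8 %shl, %x          ; (X << Z) + X
  ret i8 %r
}
```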
The naming differences on the existing folds are a result
of using getName() to set the final value via Builder.
That makes it easier to transfer no-wrap flags than the
gymnastics required by the raw instruction-creation APIs.
If the divisor is a power-of-2 or negative-power-of-2 and the dividend
is known to have at least as many trailing zeros as the divisor, the division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)
This isn't the most direct optimization (we could create ashr in these
examples instead of relying on existing folds for exact divides), but
it's possible that there's a more general constraint than just a pow2
divisor, so this might be extended in the future.
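As a rough illustration (not one of the regression tests): a dividend with at least 4 trailing zero bits divided by 8 can be marked exact, and existing folds can then turn the exact divide by a power-of-2 into a shift:
```
define i32 @src(i32 %x) {
  %a = shl i32 %x, 4        ; at least 4 trailing zero bits
  %d = sdiv i32 %a, 8       ; divisor has only 3 trailing zero bits
  ret i32 %d
}

define i32 @tgt(i32 %x) {
  %a = shl i32 %x, 4
  %d = sdiv exact i32 %a, 8 ; remainder is known to be 0
  ret i32 %d
}
```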
This should solve issue #58348.
Differential Revision: https://reviews.llvm.org/D135970
This should be functionally equivalent - both calls are thin
wrappers around computeKnownBits(). We'll probably want to use
known-bits directly in follow-up patches because that could
determine "exact" for example (see issue #58348).
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/CLKzqT
This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.
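A sketch with one flag combination that appears sufficient: nsw on both shifts so the shifted values track the original signed values, plus nuw on the divisor's shift (which, for a non-zero shift amount, rules out a -1 divisor). The committed preconditions are what the Alive2 proof above verifies; the i8 width, names, and flags here are my assumptions:
```
define i8 @src(i8 %x, i8 %y, i8 %z) {
  %xs = shl nsw i8 %x, %z
  %ys = shl nuw nsw i8 %y, %z
  %d = sdiv i8 %xs, %ys
  ret i8 %d
}

define i8 @tgt(i8 %x, i8 %y, i8 %z) {
  %d = sdiv i8 %x, %y
  ret i8 %d
}
```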
This extends 008a89037a and is another transform
derived from issue #58137.
(X << Z) / (Y << Z) --> X / Y
https://alive2.llvm.org/ce/z/E5eaxU
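For the unsigned form, nuw on both shifts is one sufficient precondition. A minimal sketch (the i8 width, @src/@tgt names, and flags are illustrative assumptions; the proof link above covers the committed version):
```
define i8 @src(i8 %x, i8 %y, i8 %z) {
  %xs = shl nuw i8 %x, %z
  %ys = shl nuw i8 %y, %z
  %d = udiv i8 %xs, %ys
  ret i8 %d
}

define i8 @tgt(i8 %x, i8 %y, i8 %z) {
  %d = udiv i8 %x, %y
  ret i8 %d
}
```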
This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
((Op1 * X) / Y) / Op1 --> X / Y
https://alive2.llvm.org/ce/z/JYxWjA
InstSimplify handles the more basic mul+div pattern with
shared operand, but we don't seem to have any reassociation
folds to handle cases where the common op is further away.
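A minimal unsigned sketch, assuming nuw on the multiply so the product does not wrap (the width, names, and flag choice are my assumptions; the Alive2 proof above is the authoritative statement):
```
define i8 @src(i8 %x, i8 %y, i8 %op1) {
  %m = mul nuw i8 %op1, %x
  %d1 = udiv i8 %m, %y
  %d2 = udiv i8 %d1, %op1   ; the common op is one division away
  ret i8 %d2
}

define i8 @tgt(i8 %x, i8 %y, i8 %op1) {
  %d = udiv i8 %x, %y
  ret i8 %d
}
```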
This is a generalization of 9cff4711ac and another
transform derived from issue #58137.
The 1st attempt failed to update the test checks as expected.
Original commit message:
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)
https://alive2.llvm.org/ce/z/kB6VF7
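A minimal sketch, using nsw on the shl as one way of establishing that the power-of-2 is non-negative (the i32 width and @src/@tgt names are just for illustration):
```
define i32 @src(i32 %x, i32 %sh) {
  %pow2 = shl nsw i32 1, %sh     ; known non-negative
  %d = sdiv exact i32 %x, %pow2
  ret i32 %d
}

define i32 @tgt(i32 %x, i32 %sh) {
  %d = ashr exact i32 %x, %sh
  ret i32 %d
}
```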
It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).
InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.
I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR
Differential Revision: https://reviews.llvm.org/D133667
We already support the transform: `(X+C1)*CI -> X*CI+C1*CI`
Here the case is a little special, as `(X+C1)*CI` has already been canonicalized into the form `(X|C1)*CI`,
so we should also support the transform: `(X|C1)*CI -> X*CI+C1*CI`
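For illustration, a small example where the 'or' is known to act like an 'add' because the operands share no set bits (the width and constants are arbitrary choices, not taken from the patch):
```
define i8 @src(i8 %a) {
  %x = shl i8 %a, 4        ; low 4 bits are known zero
  %o = or i8 %x, 3         ; no common bits, so this is really X + 3
  %r = mul i8 %o, 5
  ret i8 %r
}

define i8 @tgt(i8 %a) {
  %x = shl i8 %a, 4
  %m = mul i8 %x, 5
  %r = add i8 %m, 15       ; 3 * 5
  ret i8 %r
}
```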
Fixes https://github.com/llvm/llvm-project/issues/57278
Reviewed By: bcl5980, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D132658
Currently, clang does not take advantage of the 0 initialisation when finite math is enabled.
For example:
```
double f_prod = 0;
double arr[1000];
// Under finite math (no NaNs or infinities), f_prod can only stay zero,
// so the whole loop could be folded away.
for (size_t i = 0; i < 1000; i++) {
  f_prod *= arr[i];
}
```
Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop.
Reviewed By: fhahn, spatel
Differential Revision: https://reviews.llvm.org/D131672
The isa<Constant> check could misfire on an instruction with 2 constant
operands. This bug was introduced with bb789381fc (D36988).
See issue #56810 for a C source example that exposed the bug.
Replace ConstantExpr::getFAdd etc. with a call to
ConstantFoldBinaryOpOperands(). I'm using the constant folding API
rather than IRBuilder here to ensure that this does actually
constant fold. These transforms don't use m_ImmConstant(), so this
would not otherwise be guaranteed (and apparently, they can't use
m_ImmConstant because they want to handle scalable vector splats).
There is an opportunity here to further migrate these to the
ConstantFoldFPInstOperands() API, which would respect the denormal
mode. I've held off on doing so here, because some of this code
explicitly checks for denormal results, and I don't want to touch
it in a mostly NFC change.
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName". This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.
This is the alternative to the less invasive clang-format only patch: D126783
Reviewed By: spatel, rengolin
Differential Revision: https://reviews.llvm.org/D126889
https://alive2.llvm.org/ce/z/o7rQ5q
This shows an extra instruction in some cases, but that is
caused by an existing canonicalization of trunc -> and+icmp.
Codegen should be better for any target where a multiply is
more costly than the most simple ALU op.
This ends up producing the requested x86 asm from issue #55618,
but it's not the same IR. We are missing a canonicalization
from the negate+mask pattern to the trunc+select created here.
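The fold itself is not restated above, so purely as an illustration of this kind of mul-to-select rewrite (my own example, not necessarily the committed pattern), a multiply by a low-bit mask can become the trunc+select form mentioned here:
```
define i8 @src(i8 %x, i8 %y) {
  %m = and i8 %x, 1
  %r = mul i8 %m, %y
  ret i8 %r
}

define i8 @tgt(i8 %x, i8 %y) {
  %b = trunc i8 %x to i1
  %r = select i1 %b, i8 %y, i8 0
  ret i8 %r
}
```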
We could go either way on this and several similar matches.
Just matching as a binop is possibly slightly more efficient;
we don't need to re-confirm the opcode of the instruction.
(ashr i32 X, 31) * C --> (X < 0) ? -C : 0
https://alive2.llvm.org/ce/z/G8u9SS
With a constant operand, this is an improvement in IR
and codegen (where it can be converted to a mask op).
Without a constant operand, we would have to negate
the operand, so that is probably better left to the backend.
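A minimal sketch with an arbitrary constant (C = 42 is my choice; the Alive2 link above is the general proof):
```
define i32 @src(i32 %x) {
  %sign = ashr i32 %x, 31        ; 0 or -1
  %r = mul i32 %sign, 42
  ret i32 %r
}

define i32 @tgt(i32 %x) {
  %isneg = icmp slt i32 %x, 0
  %r = select i1 %isneg, i32 -42, i32 0
  ret i32 %r
}
```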
This is similar but not the same optimization that is requested
in #55618.
This is effectively NFC (intentionally no test diffs)
because we already have the related fold that converts
the 'and' pattern to select. So this is just an efficiency
improvement.
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
This is similar to recent changes for urem and sdiv:
d428f09b2c
99ef341ce9
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
Presumably, in real cases that are similar to the tests
where a subsequent transform removes the rem, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
This is similar to a recent change for urem:
d428f09b2c
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
Presumably, in real cases that are similar to the tests
where a subsequent transform removes the select, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.
There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
libcalls." (was 0f8c626). This reverts commit 14d9390.
The patch previously failed to recognize cases where the user had defined a
function alias with the same name as a library function. Module::getFunction()
would then return nullptr, which is what the sanitizer discovered.
In this updated version, a new function isLibFuncEmittable() has been
introduced, which is now used instead of TLI->has() whenever a library function
is to be emitted. It additionally makes sure there is, e.g., no function
alias with the same name in the module.
Reviewed By: Eli Friedman
Differential Revision: https://reviews.llvm.org/D123198
By adding a parameter to the function FoldOpIntoSelect, we can fold more Ops into Select.
For this example, we want to fold the division instruction,
so we no longer care whether the SelectInst has one use.
This patch solves a TODO left in InstCombine/div.ll.
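As a rough sketch of the kind of fold this enables (the constants and width are arbitrary, not taken from div.ll): when both arms of the select fold to constants, the division disappears into the select:
```
define i32 @src(i1 %c) {
  %s = select i1 %c, i32 16, i32 64
  %d = udiv i32 %s, 8
  ret i32 %d
}

define i32 @tgt(i1 %c) {
  %d = select i1 %c, i32 2, i32 8
  ret i32 %d
}
```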
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D122967
Rather than queuing up actions, have one function that does the
log2() fold in the obvious way, but with a flag that allows us
to check whether the fold will succeed without actually performing
it.
What we're really doing here is converting Op0 udiv Op1 into
Op0 lshr log2(Op1), so phrase it in that way. Actually pushing
the lshr into the log2(Op1) expression should be seen as a separate
transform.
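A minimal sketch of that phrasing, where log2(Op1) is recoverable because Op1 is a shifted 1 (the i8 width and @src/@tgt names are illustrative):
```
define i8 @src(i8 %x, i8 %y) {
  %op1 = shl i8 1, %y       ; log2(Op1) == %y
  %d = udiv i8 %x, %op1
  ret i8 %d
}

define i8 @tgt(i8 %x, i8 %y) {
  %d = lshr i8 %x, %y
  ret i8 %d
}
```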