Commit Graph

389 Commits

Author SHA1 Message Date
Matt Arsenault e661185fb3 InstCombine: Fold fdiv nnan x, 0 -> copysign(inf, x)
https://alive2.llvm.org/ce/z/gLBFKB
2022-11-07 22:00:15 -08:00
Sanjay Patel f03b069c5b [InstCombine] fold mul with decremented "shl -1" factor (2nd try)
This is a corrected version of:
bc886e9b58

I made a copy-paste error that created an "add" instead of the
intended "sub" on that attempt. The regression tests showed the
bug, but I overlooked that.

As I said in a comment on issue #58717, the bug reports resulting
from the botched patch confirm that the pattern does occur in
many real-world applications, so hopefully eliminating the multiply
results in better code.

I added one more regression test in this version of the patch,
and here's an Alive2 proof to show that exact example:
https://alive2.llvm.org/ce/z/dge7VC

Original commit message:

This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e

I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.

Differential Revision: https://reviews.llvm.org/D137157
2022-11-02 09:30:01 -04:00
Florian Mayer e1de7ac20f Revert "[InstCombine] fold mul with decremented "shl -1" factor"
This reverts commit bc886e9b58.

Broke MSAN bootstrap buildbots with Assertion `RangeAfterCopy % ExtraScale == 0 && "Extra instruction requires immediate to be aligned"' failed.
2022-10-31 17:39:05 -07:00
Sanjay Patel bc886e9b58 [InstCombine] fold mul with decremented "shl -1" factor
This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e

I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.
2022-10-31 09:06:55 -04:00
zhongyunde f58311796c [InstCombine] refactor the SimplifyUsingDistributiveLaws NFC
Precommit for D136015
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D137019
2022-10-30 21:04:06 +08:00
Sanjay Patel 6064e92b0a [InstCombine] fold mul with incremented "shl 1" factor
X * ((1 << Z) + 1) --> (X << Z) + X

https://alive2.llvm.org/ce/z/P-7WK9

It's possible that we could do better with propagating
no-wrap, but this carries over the existing logic and
appears to be correct.

The naming differences on the existing folds are a result
of using getName() to set the final value via Builder.
That makes it easier to transfer no-wrap rather than the
gymnastics required from the raw create instruction APIs.
2022-10-29 12:50:19 -04:00
Sanjay Patel 50000ec2cb [InstCombine] create helper function for mul patterns with 1<<X; NFC
There are at least 2 other potential patterns that could go here.
2022-10-29 12:50:19 -04:00
Sanjay Patel d344146857 [InstCombine] reduce code duplication in visitMul(); NFC 2022-10-29 09:26:02 -04:00
Sanjay Patel 44b7da89d7 [InstCombine] fmul nnan X, 0.0 --> copysign(0.0, X)
https://alive2.llvm.org/ce/z/ybgM5F

Differential Revision: https://reviews.llvm.org/D136166
2022-10-18 11:34:02 -04:00
Sanjay Patel e5ee0b06d6 [InstCombine] try to determine "exact" for sdiv
If the divisor is a power-of-2 or negative-power-of-2 and the dividend
is known to have >= trailing zeros than the divisor, the division is exact:
https://alive2.llvm.org/ce/z/UGBksM (general proof)
https://alive2.llvm.org/ce/z/D4yPS- (examples based on regression tests)

This isn't the most direct optimization (we could create ashr in these
examples instead of relying on existing folds for exact divides), but
it's possible that there's a more general constraint than just a pow2
divisor, so this might be extended in the future.

This should solve issue #58348.

Differential Revision: https://reviews.llvm.org/D135970
2022-10-16 10:59:56 -04:00
Sanjay Patel 340ae45be0 [InstCombine] use isKnownNonNegative() for readability; NFCI
This should be functionally equivalent - both calls are thin
wrappers around computeKnownBits(). We'll probably want to use
known-bits directly in follow-up patches because that could
determine "exact" for example (see issue #58348).
2022-10-16 10:59:56 -04:00
Sanjay Patel 7b9482df3d [InstCombine] fold sdiv with common shl amount in operands
(X << Z) / (Y << Z) --> X / Y

https://alive2.llvm.org/ce/z/CLKzqT

This requires a surprising "nuw" constraint because we have
to guard against immediate UB via signed-div overflow with
-1 divisor.

This extends 008a89037a and is another transform
derived from issue #58137.
2022-10-12 11:32:15 -04:00
Sanjay Patel 008a89037a [InstCombine] fold udiv with common shl amount in operands
(X << Z) / (Y << Z) --> X / Y

https://alive2.llvm.org/ce/z/E5eaxU

This fixes the motivating example from issue #58137,
but it is not the most general transform. We should
probably also convert left-shift in the divisor to
right-shift in the dividend for that, but that exposes
another missed canonicalization for shifts and adds.
2022-10-12 11:12:26 -04:00
Sanjay Patel fe97f95036 [InstCombine] propagate "exact" through folds of div
These folds were added recently with:
6b869be810
8da2fa856f
...but they didn't account for the "exact" attribute,
and that can be safely propagated:
https://alive2.llvm.org/ce/z/F_WhnR
https://alive2.llvm.org/ce/z/ft9Cgr
2022-10-12 09:25:05 -04:00
Sanjay Patel d117ee25b8 [InstCombine] add helper function for div+shl folds; NFC
There are at least 2 similar patterns that could be added here,
and the existing fold can be improved because it fails to
propagate "exact".
2022-10-12 09:25:04 -04:00
Sanjay Patel 7ec604a317 [InstCombine] try harder to cancel out mul/div
((Op1 * X) / Y) / Op1 --> X / Y
https://alive2.llvm.org/ce/z/JYxWjA

InstSimplify handles the more basic mul+div pattern with
shared operand, but we don't seem to have any reassociation
folds to handle cases where the common op is further away.

This is a generalization of 9cff4711ac and another
transform derived from issue #58137.
2022-10-11 09:51:51 -04:00
Sanjay Patel 9cff4711ac [InstCombine] fold udiv with common factor
((X *nuw Y) >> Z) / X --> Y >> Z

https://alive2.llvm.org/ce/z/x3kKnq

This is similar to 6b869be810 / 8da2fa856f, but I have
not found a signed equivalent, so it's just an unsigned
match for now.
2022-10-10 08:12:06 -04:00
Sanjay Patel eccb9a77c6 [InstCombine] fold exact sdiv to ashr (2nd try)
The 1st attempt failed to updated the test checks as expected.

Original commit message:

sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)

https://alive2.llvm.org/ce/z/kB6VF7

It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
2022-10-08 10:09:44 -04:00
Sanjay Patel 68d4dbc2c1 Revert "[InstCombine] fold exact sdiv to ashr"
This reverts commit fe15290e0c.
The test checks were not updated as expected.
2022-10-08 10:02:03 -04:00
Sanjay Patel fe15290e0c [InstCombine] fold exact sdiv to ashr
sdiv exact X, (1<<ShAmt) --> ashr exact X, ShAmt (if shl is non-negative)

https://alive2.llvm.org/ce/z/kB6VF7

It would probably be better to use ValueTracking to replace this
and the existing transform above it, but the analysis does not
account for the no-wrap properly, and it's not immediately clear
to me how to fix it.
2022-10-08 09:23:46 -04:00
Sanjay Patel bdfefac9a4 [InstCombine] refactor sdiv by (negative) power-of-2 folds; NFCI
It's probably better to try harder on this kind of
pattern by using ValueTracking.
2022-10-07 11:35:17 -04:00
Sanjay Patel 8da2fa856f [InstCombine] fold sdiv with hidden common factor
(X * Y) s/ (X << Z) --> Y s/ (1 << Z)

https://alive2.llvm.org/ce/z/yRSddG

issue #58137
2022-10-06 13:11:50 -04:00
Sanjay Patel 6b869be810 [InstCombine] fold udiv with hidden common factor
(X * Y) u/ (X << Z) --> Y u>> Z

https://alive2.llvm.org/ce/z/4G9D_W
2022-10-06 11:35:27 -04:00
Sanjay Patel 2e87333bfe [InstCombine] convert mul by negative-pow2 to negate and shift
This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).

InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.

I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR

Differential Revision: https://reviews.llvm.org/D133667
2022-10-02 12:22:25 -04:00
zhongyunde 23a5de4294 [InstCombine] Distributive or+mul with const operand
We aleady support the transform: `(X+C1)*CI -> X*CI+C1*CI`
Here the case is a little special as the form of `(X+C1)*CI` is transformed into `(X|C1)*CI`,
so we should also support the transform: `(X|C1)*CI -> X*CI+C1*CI`
Fixes https://github.com/llvm/llvm-project/issues/57278

Reviewed By: bcl5980, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D132658
2022-08-30 20:36:52 +08:00
zhongyunde 84d6966e4d [InstCombine] Propagate the nuw for combine of add+mul
As the commit of D132658, make the 'nuw' change separately.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D132777
2022-08-28 23:01:11 +08:00
Zain Jaffal f61f99a105
[instcombine] Optimise for zero initialisation of product given fast flags are enabled
Currently, clang ignores the 0 initialisation in finite math
For example:

```
double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
  f_prod *= arr[i];
 }
```
Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop.

Reviewed By: fhahn, spatel

Differential Revision: https://reviews.llvm.org/D131672
2022-08-17 11:12:15 +01:00
Sanjay Patel f95a6aea1b [InstCombine] avoid splitting a constant expression with div/rem fold
Follow-up to d4940c0f3d to further limit the transform
to avoid an unintended pattern/fold of a constant expression.
2022-07-30 09:45:25 -04:00
Sanjay Patel d4940c0f3d [InstCombine] fix miscompile from urem/udiv transform with constant expression
The isa<Constant> check could misfire on an instruction with 2 constant
operands. This bug was introduced with bb789381fc (D36988).

See issue #56810 for a C source example that exposed the bug.
2022-07-29 17:14:30 -04:00
Nikita Popov 5eaeeed8cb [InstCombine] Avoid ConstantExpr::getFNeg() calls (NFCI)
Instead call the constant folding API, which can fail. For now,
this should be NFC, as we still allow the creation of fneg
constant expressions.
2022-07-29 16:01:46 +02:00
Nikita Popov fc18a88231 [InstCombine] Avoid creating float binop ConstantExprs
Replace ConstantExpr:getFAdd etc with call to
ConstantFoldBinaryOpOperands(). I'm using the constant folding API
rather than IRBuilder here to ensure that this does actually
constant fold. These transforms don't use m_ImmConstant(), so this
would not otherwise be guaranteed (and apparently, they can't use
m_ImmConstant because they want to handle scalable vector splats).

There is an opportunity here to further migrate these to the
ConstantFoldFPInstOperands() API, which would respect the denormal
mode. I've held off on doing so here, because some of this code
explicitly checks for denormal results, and I don't want to touch
it in a mostly NFC change.
2022-07-08 16:36:04 +02:00
Simon Moll b8c2781ff6 [NFC] format InstructionSimplify & lowerCaseFunctionNames
Clang-format InstructionSimplify and convert all "FunctionName"s to
"functionName".  This patch does touch a lot of files but gets done with
the cleanup of InstructionSimplify in one commit.

This is the alternative to the less invasive clang-format only patch: D126783

Reviewed By: spatel, rengolin

Differential Revision: https://reviews.llvm.org/D126889
2022-06-09 16:10:08 +02:00
Sanjay Patel 3f33d67d8a [InstCombine] fold mul with masked low bit operand to trunc+select
https://alive2.llvm.org/ce/z/o7rQ5q

This shows an extra instruction in some cases, but that is
caused by an existing canonicalization of trunc -> and+icmp.

Codegen should be better for any target where a multiply is
more costly than the most simple ALU op.

This ends up producing the requested x86 asm from issue #55618,
but it's not the same IR. We are missing a canonicalization
from the negate+mask pattern to the trunc+select created here.
2022-06-05 20:07:18 -04:00
Sanjay Patel 8689463bfb [InstCombine] make pattern matching more consistent; NFC
We could go either way on this and several similar matches.
Just matching as a binop is possibly slightly more efficient;
we don't need to re-confirm the opcode of the instruction.
2022-06-02 16:01:23 -04:00
zhongyunde 3e6ba89055 [InstCombine] Fold a mul with bool value into and
Fixes https://github.com/llvm/llvm-project/issues/55599
  X * Y --> X & Y, iff X, Y can be only {0, 1}.
https://alive2.llvm.org/ce/z/_RsTKF

Reviewed By: spatel, nikic

Differential Revision: https://reviews.llvm.org/D126040
2022-05-30 21:05:00 +08:00
Sanjay Patel b5b6aa4d53 [InstCombine] fold multiply by signbit-splat to cmp+select
(ashr i32 X, 31) * C --> (X < 0) ? -C : 0
https://alive2.llvm.org/ce/z/G8u9SS

With a constant operand, this is an improvement in IR
and codegen (where it can be converted to a mask op).

Without a constant operand, we would have to negate
the operand, so that is probably better left to the backend.

This is similar but not the same optimization that is requested
in #55618.
2022-05-27 11:54:19 -04:00
Sanjay Patel 5a6e085757 [InstCombine] reduce code duplication; NFC 2022-05-27 11:54:19 -04:00
Sanjay Patel c4c750058f [InstCombine] fold mul of signbit directly to X < 0 ? Y : 0
This is effectively NFC (intentionally no test diffs)
because we already have the related fold that converts
the 'and' pattern to select. So this is just an efficiency
improvement.
2022-05-26 16:19:15 -04:00
Sanjay Patel e8c20d995b [IR] add and use pattern match specialization for sqrt intrinsic; NFC
This was included in D126190 originally, but it's
independent and a useful change for readability.
2022-05-23 14:16:30 -04:00
Sanjay Patel be7f09f7b2 [IR] create and use helper functions that test the signbit; NFCI 2022-05-16 11:26:23 -04:00
Sanjay Patel 2fa8fc3d0a [InstCombine] freeze operand in div+mul fold
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

This is similar to recent changes for urem and sdiv:
d428f09b2c
99ef341ce9

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.

Presumably, in real cases that are similar to the tests
where a subsequent transform removes the rem, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
2022-05-12 13:49:29 -04:00
Sanjay Patel 99ef341ce9 [InstCombine] freeze operand in sdiv expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

This is similar to a recent change for urem:
d428f09b2c

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.

Presumably, in real cases that are similar to the tests
where a subsequent transform removes the select, we
will also be able to remove the freeze by seeing that
the parameter has 'noundef'.
2022-05-11 14:01:28 -04:00
Sanjay Patel d428f09b2c [InstCombine] freeze operand in urem expansion
As discussed in issue #37809, this transform is not safe
if the input is an undefined value.

There is no difference in codegen on the basic examples,
but this could lead to regressions. We may need to
improve freeze analysis or lowering if that happens.
2022-05-11 12:47:26 -04:00
Jonas Paulsson 304378fd09 Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building
libcalls." (was 0f8c626). This reverts commit 14d9390.

The patch previously failed to recognize cases where user had defined a
function alias with an identical name as that of the library
function. Module::getFunction() would then return nullptr which is what the
sanitizer discovered.

In this updated version a new function isLibFuncEmittable() has as well been
introduced which is now used instead of TLI->has() anytime a library function
is to be emitted . It additionally also makes sure there is e.g. no function
alias with the same name in the module.

Reviewed By: Eli Friedman

Differential Revision: https://reviews.llvm.org/D123198
2022-05-02 19:37:00 +02:00
Liqin Weng fa4b4f0fcb [InstCombine] fold more constant remainder to select-of-constants remainder
Reviewed By: xbolva00, spatel, Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D123486
2022-04-12 09:40:56 +08:00
Chenbing Zheng 467cbb6249 [InstCombine] fold more constant divisor to select-of-constants divisor
By adding a parameter to function FoldOpIntoSelect, we can fold more Ops to Select.
For this example, we tend to fold the division instruction,
so we no longer care whether SelectInst is one use.

This patch slove TODO left in InstCombine/div.ll.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D122967
2022-04-08 10:19:24 +08:00
serge-sans-paille 59630917d6 Cleanup includes: Transform/Scalar
Estimated impact on preprocessor output line:
before: 1062981579
after:  1062494547

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120817
2022-03-03 07:56:34 +01:00
Nikita Popov 587c7ff15c [InstCombine] Support min/max intrinsics in udiv->lshr fold
This complements the existing fold for selects. This fold is a bit
more conservative, requiring one-use. The other folds here should
probably also be subjected to a one-use restriction.

https://alive2.llvm.org/ce/z/Q9eCDU
https://alive2.llvm.org/ce/z/8YK2CJ
2022-02-23 15:51:36 +01:00
Nikita Popov 03e6efb8c2 [InstCombine] Further simplify udiv -> lshr folding
Rather than queuing up actions, have one function that does the
log2() fold in the obvious way, but with a flag that allows us
to check whether the fold will succeed without actually performing
it.
2022-02-23 15:29:21 +01:00
Nikita Popov 5ccb0582c2 [InstCombine] Simplify udiv -> lshr folding
What we're really doing here is converting Op0 udiv Op1 into
Op0 lshr log2(Op1), so phrase it in that way. Actually pushing
the lshr into the log2(Op1) expression should be seen as a separate
transform.
2022-02-23 14:55:23 +01:00