This recommits the following patches now that D85684 has landed
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Similar to what we do in IIQ, add an isUndefValue() helper that
checks for undef values while respecting CanUseUndef. This makes
it much easier to search for places that don't respect the flag
yet.
This is the replacement for D84250 based on D84792. As we recursively
fold with the same value twice, we need to disable undef folds,
to prevent an undef from being folded to two different values.
With rG00f3579aea6e3d4a4b7464c3db47294f71cef9e4 reverted, the
test case from https://reviews.llvm.org/D83360#2145793 no longer
performs the incorrect fold.
Differential Revision: https://reviews.llvm.org/D85684
I think this is the last remaining translation of an existing
instcombine transform for the corresponding cmp+sel idiom.
This interpretation is more general though - we can remove
mismatched signed/unsigned combinations in addition to the
more obvious cases.
min/max(X, Y) must produce X or Y as the result, so this is
just another clause in the existing transform that was already
matching a min/max of min/max.
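For illustration, one instance this reasoning covers (a sketch, not taken from the patch's tests): since smax(x, y) is always >= x, the outer max is redundant.
```
declare i8 @llvm.smax.i8(i8, i8)

define i8 @src(i8 %x, i8 %y) {
  %m = call i8 @llvm.smax.i8(i8 %x, i8 %y)
  ; smax(smax(x, y), x) always equals smax(x, y)
  %m2 = call i8 @llvm.smax.i8(i8 %m, i8 %x)
  ret i8 %m2  ; simplifies to %m
}
```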
Making use of undef is not safe if the simplification result is not used
to replace all uses of the simplified value. This leads to problems in NewGVN,
which does not replace all uses in the IR directly. See PR33165 for more
details.
This patch adds an option to SimplifyQuery to disable the use of undef.
Note that I've only guarded uses of isa<UndefValue>/m_Undef where
SimplifyQuery is currently available. If we agree on the general
direction, I'll update the remaining uses.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84792
https://rise4fun.com/Alive/pZEr
Name: mul nuw with icmp eq
Pre: (C2 %u C1) != 0
%a = mul nuw i8 %x, C1
%r = icmp eq i8 %a, C2
=>
%r = false
Name: mul nuw with icmp ne
Pre: (C2 %u C1) != 0
%a = mul nuw i8 %x, C1
%r = icmp ne i8 %a, C2
=>
%r = true
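As a concrete instance of the proofs above (hypothetical constants C1 = 4, C2 = 6, where 6 urem 4 != 0):
```
define i1 @src(i8 %x) {
  ; with nuw, %a is an exact multiple of 4 (wrapping would be poison),
  ; so it can never equal 6
  %a = mul nuw i8 %x, 4
  %r = icmp eq i8 %a, 6
  ret i1 %r  ; folds to false
}
```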
There are potentially several other transforms we need to add based on:
D51625
...but it doesn't look like there was follow-up to that patch.
This revision adds the following peephole optimization
and its negation:
%a = urem i64 %x, %y
%b = icmp ule i64 %a, %x
====>
%b = true
With John Regehr's help, this optimization was checked with Alive2,
which suggests it should be valid.
This pattern occurs in the bounds checks of Rust code such as this program:
const N: usize = 3;
type T = u8;
pub fn split_multiple(slice: &[T]) -> (&[T], &[T]) {
    let len = slice.len() / N;
    slice.split_at(len * N)
}
The method call slice.split_at checks that len * N is within
the bounds of slice. After some transformations, this bounds check is
turned into the urem seen above, and LLVM then fails to optimize it
any further. Adding this optimization allows the bounds check
to be fully optimized away.
ref: https://github.com/rust-lang/rust/issues/74938
Differential Revision: https://reviews.llvm.org/D85092
This is based on the existing code for the non-intrinsic idioms
in InstCombine.
The vector constant constraint is non-obvious: undefs should be
ok in the outer call, but they can't propagate safely from the
inner call in all cases. Example:
https://alive2.llvm.org/ce/z/-2bVbM
define <2 x i8> @src(<2 x i8> %x) {
%0:
%m = umin <2 x i8> %x, { 7, undef }
%m2 = umin <2 x i8> { 9, 9 }, %m
ret <2 x i8> %m2
}
=>
define <2 x i8> @tgt(<2 x i8> %x) {
%0:
%m = umin <2 x i8> %x, { 7, undef }
ret <2 x i8> %m
}
Transformation doesn't verify!
ERROR: Value mismatch
Example:
<2 x i8> %x = < undef, undef >
Source:
<2 x i8> %m = < #x00 (0) [based on undef value], #x00 (0) >
<2 x i8> %m2 = < #x00 (0), #x00 (0) >
Target:
<2 x i8> %m = < #x07 (7), #x10 (16) >
Source value: < #x00 (0), #x00 (0) >
Target value: < #x07 (7), #x10 (16) >
It's always safe to pick the earlier abs regardless of the nsw flag. We'll just lose it if it is on the outer abs but not the inner abs.
Differential Revision: https://reviews.llvm.org/D85053
abs() should be rare enough that using value tracking is not going
to be a compile-time cost burden, so use it to reduce a variety of
potential patterns. We do this in DAGCombiner too.
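One plausible pattern of this kind (a sketch under the assumption that value tracking proves the operand non-negative; not necessarily one of the patch's tests):
```
declare i8 @llvm.abs.i8(i8, i1)

define i8 @src(i8 %v) {
  ; the mask makes %x provably non-negative, so abs is a no-op
  %x = and i8 %v, 15
  %r = call i8 @llvm.abs.i8(i8 %x, i1 false)
  ret i8 %r  ; simplifies to %x
}
```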
Differential Revision: https://reviews.llvm.org/D85043
This matches the behavior of simplify calls for regular opcodes -
rely on ConstantFolding before spending time on folds with variables.
I am not aware of any diffs from this re-ordering currently, but there was
potential for unintended behavior from the min/max intrinsics because that
code is implicitly assuming that only 1 of the input operands is constant.
This is the main icmp simplification shortcoming seen in D84655.
Alive2 agrees that the basic examples are correct at least:
define <2 x i1> @src(<2 x i8> %x) {
%0:
%r = icmp sle <2 x i8> { undef, 128 }, %x
ret <2 x i1> %r
}
=>
define <2 x i1> @tgt(<2 x i8> %x) {
%0:
ret <2 x i1> { 1, 1 }
}
Transformation seems to be correct!
define <2 x i1> @src(<2 x i32> %X) {
%0:
%A = or <2 x i32> %X, { 63, 63 }
%B = icmp ult <2 x i32> %A, { undef, 50 }
ret <2 x i1> %B
}
=>
define <2 x i1> @tgt(<2 x i32> %X) {
%0:
ret <2 x i1> { 0, 0 }
}
Transformation seems to be correct!
https://alive2.llvm.org/ce/z/omt2ee
https://alive2.llvm.org/ce/z/GW4nP_
Differential Revision: https://reviews.llvm.org/D84762
This is a step towards trying to remove unnecessary FP compares
with infinity when compiling with -ffinite-math-only or similar.
I'm intentionally not checking FMF on the fcmp itself because
I'm assuming that will go away eventually.
The analysis part of this was added with rGcd481136 for use with
isKnownNeverNaN. Similarly, that could be an enhancement here to
get predicates like 'one' and 'ueq'.
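A minimal sketch of the kind of compare this targets (assuming the operand is known never to be infinity, e.g. because it comes from uitofp of a narrow integer):
```
define i1 @src(i16 %i) {
  ; uitofp from i16 can never produce infinity
  %x = uitofp i16 %i to float
  %r = fcmp oeq float %x, 0x7FF0000000000000  ; +inf
  ret i1 %r  ; folds to false
}
```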
Differential Revision: https://reviews.llvm.org/D84035
This reverts most of the following patches due to reports of miscompiles.
I've left the added test cases with comments updated to be FIXMEs.
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Follow-up from the transform being removed in D83360. If X is provably not poison, then the transform is safe.
Still plan to remove or adjust the code from ConstantFolding after this.
Differential Revision: https://reviews.llvm.org/D83440
We can't fold to the non-undef value unless we know it isn't poison. So check each element with isGuaranteedNotToBeUndefOrPoison. This currently rules out all constant expressions.
Differential Revision: https://reviews.llvm.org/D83442
These represent the same thing, but 64BIT only showed up from
getHostCPUFeatures providing a list of features to clang, while
EM64T showed up from getting the features for a named CPU.
EM64T didn't have a string specifically, so it would not be passed
up to clang when getting features for a named CPU, while 64bit
needed a name since that's how it is indexed.
Merge them by filtering 64bit out before sending features to clang
for named CPUs.
This is picking up a loose thread from D69006: We can simplify
(zext x) ule (sext x) and (zext x) sge (sext x) to true, with
various permutations. Oddly, SCEV knows about this identity,
but nothing on the IR level does.
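A minimal sketch of one permutation:
```
define i1 @src(i8 %x) {
  %z = zext i8 %x to i32
  %s = sext i8 %x to i32
  ; equal when %x is non-negative; when %x is negative, the sext has
  ; the high bits set and is unsigned-greater, so this is always true
  %r = icmp ule i32 %z, %s
  ret i1 %r  ; folds to true
}
```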
Differential Revision: https://reviews.llvm.org/D83081
If we assume(x > y), then we should be able to fold the basic
implications of that, like x >= y. This already happens if either
one of the operands is constant (LVI) or if the conditions are
exactly the same (GVN), but not if we have an implication with
non-constant operands. Support this by querying AssumptionCache.
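A sketch of the shape of implication that can now fold (hypothetical test, not from the patch):
```
declare void @llvm.assume(i1)

define i1 @src(i32 %x, i32 %y) {
  %gt = icmp sgt i32 %x, %y
  call void @llvm.assume(i1 %gt)
  ; x > y implies x >= y
  %ge = icmp sge i32 %x, %y
  ret i1 %ge  ; folds to true
}
```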
Fixes https://bugs.llvm.org/show_bug.cgi?id=40149.
Differential Revision: https://reviews.llvm.org/D82717
Summary:
simplifyDivRem attempts to walk a VectorType elementwise. Ensure that it
only does so for FixedVectorType.
Reviewers: efriedma, spatel, lebedev.ri, david-arm, kmclaughlin
Reviewed By: spatel, david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81856
This intrinsic implements the IEEE-754 operation roundToIntegralTiesToEven,
performing rounding to the nearest integer value and rounding halfway
cases to even. The intrinsic covers the previously missing case among the
IEEE-754 rounding operations, so LLVM now provides full support for the
rounding operations defined by the standard.
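For illustration (a sketch; halfway cases round to even rather than away from zero):
```
declare float @llvm.roundeven.f32(float)

define float @f0() {
  ; roundeven(2.5) = 2.0, whereas round(2.5) = 3.0
  %r = call float @llvm.roundeven.f32(float 2.5)
  ret float %r
}
```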
Differential Revision: https://reviews.llvm.org/D75670
No changes relative to last time; this recommit comes after a
mitigation for an AMDGPU regression landed.
---
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.
On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):
instsimplify.NumKnownBits: 216
instsimplify.NumKnownBitsComputed: 13828375
valuetracking.NumKnownBitsComputed: 45860806
Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)
There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.
Differential Revision: https://reviews.llvm.org/D79294
The more general fold was not poison-safe, so it was removed:
rG5486e00
...but it is ok to have this transform if analysis can determine
the vector contains no poison. The test shows a simple example
of that: constant integer elements are not poison.
PR45481:
https://bugs.llvm.org/show_bug.cgi?id=45481
SDAG has an identical transform to this, so there's little
chance of any real-world impact. OTOH, that means we are
effectively sweeping the bug out of sight because poison
exists in codegen too.
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, removing uses
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sunfish, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77273
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.
Differential Revision: https://reviews.llvm.org/D72930
Summary:
Skip folds that rely on DataLayout::getTypeAllocSize(). For scalable
vectors, only the minimal type alloc size is known at compile time.
Reviewers: sdesmalen, efriedma, spatel, apazos
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75892
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector types; this requires ConstantVector::getSplat() to take an
'ElementCount' instead of an 'unsigned' element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
This is part of the IR sibling for:
D75576
(I'm splitting part of the transform as a separate commit
to reduce risk. I don't know of any bugs that might be
exposed by this improved folding, but it's hard to see
those in advance...)
Summary:
For scalable vectors, an out-of-bounds index cannot be determined at compile time.
The same applies to VectorUtils' findScalarElement().
Add test cases to check the functionality of SimplifyInsert/ExtractElementInst for scalable vectors.
Reviewers: sdesmalen, efriedma, spatel, apazos
Reviewed By: efriedma
Subscribers: cameron.mcinally, tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75782
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. The "-inst-simplify" pass
already handled this for the constant integer argument case via
known bits, which is invoked in SimplifyInstruction. However,
non-constant (or non-int) arguments are not handled at all right now.
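A sketch of the non-constant case this enables (hypothetical function name):
```
declare i32 @passthrough(i32 returned)

define i32 @caller(i32 %v) {
  ; the 'returned' attribute guarantees the call returns its argument
  %r = call i32 @passthrough(i32 %v)
  ret i32 %r  ; %r simplifies to %v
}
```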
This addresses one of the regressions from D75801.
Differential Revision: https://reviews.llvm.org/D75815
As pointed out by jdoerfert on D75815, we must be careful when
simplifying musttail calls: We can only replace the return value
if we can eliminate the call entirely. As we can't make this
guarantee for all consumers of InstSimplify, this patch disables
simplification of musttail calls. Without this patch, musttail
simplification currently results in module verification errors.
Differential Revision: https://reviews.llvm.org/D75824
Summary:
```
br i1 c, BB1, BB2:
BB1:
use1(c)
BB2:
use2(c)
```
In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB.
This is a resubmission of 952ad47 with a crash fix for llvm/test/Transforms/LoopRotate/freeze-crash.ll.
Checked with Alive2
Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy
Reviewed By: reames
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75401
InstSimplify can fold icmps of gep where the base pointers are the
same and the offsets are constant. It does so by constructing a
constant expression icmp and assumes that it gets folded -- but
this doesn't actually happen, because GEP expressions can usually
only be folded by the target-dependent constant folding layer.
As such, we need to explicitly invoke it here.
Differential Revision: https://reviews.llvm.org/D75407
Spin-off from D75407. As described there, ConstantFoldConstant()
currently returns null for non-ConstantExpr/ConstantVector inputs,
but otherwise always returns non-null, independently of whether
any folding has happened or not.
This is confusing and makes consumer code more complicated.
I would expect either that ConstantFoldConstant() returns only if
it actually folded something, or that it always returns non-null.
I'm going with the latter possibility here, which appears to be more
useful considering existing usage.
Differential Revision: https://reviews.llvm.org/D75543
Summary:
```
br i1 c, BB1, BB2:
BB1:
use1(c)
BB2:
use2(c)
```
In BB1 and BB2, c is never undef or poison because otherwise the branch would have triggered UB.
Checked with Alive2
Reviewers: xbolva00, spatel, lebedev.ri, reames, jdoerfert, nlopes, sanjoy
Reviewed By: reames
Subscribers: jdoerfert, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75401
Summary:
* Most of the simplifications in SimplifyShuffleVectorInst depend on the
concrete value of, or the length of, the mask vector. For scalable
vectors, this cannot be known at compile time.
** For these cases, detect whether the vector is scalable before attempting
the transformation.
* The functions ShuffleVectorInst::getMaskValue and
ShuffleVectorInst::getShuffleMask access the value of the constant mask.
However, since the length of the mask is unknown at compile time, these
functions do not work for scalable vectors. Add asserts to ensure that
the input mask is not scalable.
Reviewers: efriedma, sdesmalen, apazos, chrisj, huihuiz
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73555
This addresses https://bugs.llvm.org/show_bug.cgi?id=42801.
The m_c_ICmp() matcher is changed to provide the swapped predicate
if the operands are swapped.
Existing uses of m_c_ICmp() fall into one of two categories: working
on equality predicates only, where swapping is irrelevant,
or performing a manual swap, in which case this patch removes it.
The only exception is the foldICmpWithLowBitMaskedVal() fold, which
does not swap the predicate, and instead reasons about whether
a swap occurred or not for each predicate. Getting the swapped
predicate allows us to merge the logic for pairs of predicates,
instead of duplicating it.
Differential Revision: https://reviews.llvm.org/D72976
As mentioned in D72643, we'd like to be able to assert that any select
of equivalent constants has been removed before we're deep into InstCombine.
But there's a loophole in that assertion for vectors with undef elements
that don't match exactly.
This patch should close that gap. If we have undefs, we can't safely
propagate those unless both constants elements for that lane are undef.
Differential Revision: https://reviews.llvm.org/D72958
This is step 1 of damage control assuming that we need to remove several
over-reaching folds for select-of-booleans because they can cause
miscompiles as shown in D72396.
The scalar case seems obviously safe:
https://rise4fun.com/Alive/jSj
And I don't think there's any danger for vectors either - if the
condition is poisoned, then the select must be poisoned too, so undef
elements don't make any difference.
Differential Revision: https://reviews.llvm.org/D72412
shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
This is another missing shuffle fold pattern uncovered by the
shuffle correctness fix from D70246.
The problem was visible in the post-commit thread example, but
we managed to overcome the limitation for that particular case
with D71220.
This is something like the inverse of the previous fix - there
we didn't demand the inserted scalar, and here we are only
demanding an inserted scalar.
Differential Revision: https://reviews.llvm.org/D71488
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with InstCombine's visitPtrToInt; but the unit tests were incorrect, so this remained undiscovered.
This fixes the buildbot failures.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
Removed code duplication in ThreadCmpOverSelect and broke it
into several smaller functions so they can be reused.
Differential Revision: https://reviews.llvm.org/D71158
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with InstCombine's visitPtrToInt; but the unit tests were incorrect, so this remained undiscovered.
Differential Revision: https://reviews.llvm.org/D68328
Patch by Joseph Faulls!
Summary:
Same as D60846 and D69571 but with a fix for the problem encountered
after them. Both times it was a missing context adjustment in the
handling of PHI nodes.
The reproducers created from the bugs that caused the old commits to be
reverted are included.
Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam, hans
Subscribers: hiraditya, bollu, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71181
This is another transform suggested in PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
The backend for some targets already manages to get
this if it converts copysign to bitwise logic.
This is correct for any value including NaN/inf.
We don't have this fold directly in the backend either,
but x86 manages to get it after converting things to bitops.
This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced
repro is small enough to fit here:
$ cat /tmp/a.c
unsigned char f(unsigned char *p) {
unsigned char result = 0;
for (int shift = 0; shift < 1; ++shift)
result |= p[0] << (shift * 8);
return result;
}
$ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
f: # @f
.cfi_startproc
# %bb.0: # %entry
xorl %eax, %eax
retq
That's nicely optimized, but I don't think it's the right result :-)
> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f277.
>
> Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571
This reverts commit 57dd4b03e4.
Same as D60846 but with a fix for the problem encountered there which
was a missing context adjustment in the handling of PHI nodes.
The test that caused D60846 to be reverted was added in e15ab8f277.
Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69571
In similar fashion to D67721, we can simplify FMA multiplications if any
of the operands is NaN or undef. In instcombine, we will simplify the
FMA to an fadd with a NaN operand, which in turn gets folded to NaN.
Note that this just changes SimplifyFMAFMul, so we still do not catch the
case where only the Add part of the FMA is NaN/undef.
Reviewers: cameron.mcinally, mcberg2017, spatel, arsenm
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D68265
llvm-svn: 373459
This is intended to be similar to the constant folding results from
D67446
and earlier, but not all operands are constant in these tests, so the
responsibility for folding is left to InstSimplify.
Differential Revision: https://reviews.llvm.org/D67721
llvm-svn: 373455
Because we do not constant fold multiplications in SimplifyFMAFMul,
we match 1.0 and 0.0 for both operands, as multiplying by them
is guaranteed to produce an exact result (if it is allowed to do so).
Note that it is not enough to just swap the operands to ensure a
constant is on the RHS, as we want to also cover the case with
2 constants.
Reviewers: lebedev.ri, spatel, reames, scanon
Reviewed By: lebedev.ri, reames
Differential Revision: https://reviews.llvm.org/D67553
llvm-svn: 372915
As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.
This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifyFMulInst as SimplifyFMAFMul.
Reviewers: spatel, lebedev.ri, reames, scanon
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D67434
llvm-svn: 372899
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.
This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.
The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it seems good not to overlook very similar patterns.
Proofs: https://rise4fun.com/Alive/21b
Also, there is something odd with `isKnownNonZero()`: if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267).
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67411
llvm-svn: 371718
This was actually the original intention in D67332,
but I messed up and forgot about it.
This patch was originally part of D67411, but precommitting this.
llvm-svn: 371630
Summary:
This is motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.
In this particular case, given
```
char* test(char& base, unsigned long offset) {
return &base + offset;
}
```
it will end up producing something like
https://godbolt.org/z/LK5-iH
which after optimizations reduces down to roughly
```
define i1 @t0(i8* nonnull %base, i64 %offset) {
%base_int = ptrtoint i8* %base to i64
%adjusted = add i64 %base_int, %offset
%non_null_after_adjustment = icmp ne i64 %adjusted, 0
%no_overflow_during_adjustment = icmp uge i64 %adjusted, %base_int
%res = and i1 %non_null_after_adjustment, %no_overflow_during_adjustment
ret i1 %res
}
```
Without D67122 there was no `%non_null_after_adjustment`,
and in this particular case we can get rid of the overhead:
Here we add some offset to a non-null pointer,
and check that the result does not overflow and is not a null pointer.
But since the base pointer is already non-null, and we check for overflow,
that overflow check will already catch the null pointer,
so the separate null check is redundant and can be dropped.
Alive proofs:
https://rise4fun.com/Alive/WRzq
There are more patterns of "unsigned-add-with-overflow"; they are not handled here,
but this is the main pattern, the one we currently consider canonical,
so it makes sense to handle it.
https://bugs.llvm.org/show_bug.cgi?id=43246
Reviewers: spatel, nikic, vsk
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits, reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67332
llvm-svn: 371349
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow or zero
%iszero = icmp eq i4 %y, 0
%umul = smul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%umul.ov.not = xor %umul.ov, -1
%retval.0 = or i1 %iszero, %umul.ov.not
ret i1 %retval.0
=>
%iszero = icmp eq i4 %y, 0
%umul = smul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%umul.ov.not = xor %umul.ov, -1
%retval.0 = or i1 %iszero, %umul.ov.not
ret i1 %umul.ov.not
Done: 1
Optimization is correct!
```
Note that this is inverted from what we had in a previous patch;
here we are looking for the inverted overflow bit.
And that inversion is somewhat problematic - given this particular
pattern, we neither hoist that `not` closer to `ret` (in which case
the pattern would be identical to the one without inversion
and would have been handled by the previous patch), nor
do the opposite transform. But regardless, we should handle this too.
I've filed [[ https://bugs.llvm.org/show_bug.cgi?id=42720 | PR42720 ]].
Reviewers: nikic, spatel, xbolva00, RKSimon
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65151
llvm-svn: 370351
Summary:
Now that with D65143/D65144 we've produce `@llvm.umul.with.overflow`,
and with D65147 we've flattened the CFG, we now can see that
the guard may have been there to prevent division by zero is redundant.
We can simply drop it:
```
----------------------------------------
Name: no overflow and not zero
%iszero = icmp ne i4 %y, 0
%umul = umul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%retval.0 = and i1 %iszero, %umul.ov
ret i1 %retval.0
=>
%iszero = icmp ne i4 %y, 0
%umul = umul_overflow i4 %x, %y
%umul.ov = extractvalue {i4, i1} %umul, 1
%retval.0 = and i1 %iszero, %umul.ov
ret i1 %umul.ov
Done: 1
Optimization is correct!
```
Reviewers: nikic, spatel, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65150
llvm-svn: 370350
As discussed in PR42696:
https://bugs.llvm.org/show_bug.cgi?id=42696
...but won't help that case yet.
We have an odd situation where a select operand equivalence fold was
implemented in InstSimplify when it could have been done more generally
in InstCombine if we allow dropping of {nsw,nuw,exact} from a binop operand.
Here's an example:
https://rise4fun.com/Alive/Xplr
%cmp = icmp eq i32 %x, 2147483647
%add = add nsw i32 %x, 1
%sel = select i1 %cmp, i32 -2147483648, i32 %add
=>
%sel = add i32 %x, 1
I've left the InstSimplify code in place for now, but my guess is that we'd
prefer to remove that as a follow-up to save on code duplication and
compile-time.
Differential Revision: https://reviews.llvm.org/D65576
llvm-svn: 367695
Summary:
SimplifyFPBinOp is a variant of SimplifyBinOp that lets you specify
fast math flags, but the name is misleading because both functions
can simplify both FP and non-FP ops. Instead, overload SimplifyBinOp
so that you can optionally specify fast math flags.
Likewise for SimplifyFPUnOp.
Reviewers: spatel
Reviewed By: spatel
Subscribers: xbolva00, cameron.mcinally, eraman, hiraditya, haicheng, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64902
llvm-svn: 366902
Summary:
- As the pointer stripping can now trace through `addrspacecast`, we need
to sext/trunc the offset to ensure it has the same width as the
pointer after stripping.
Reviewers: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64768
llvm-svn: 366162
The interface predates CallBase, so both it and the implementation were
significantly more complicated than they needed to be. There was even
some redundancy that could be eliminated.
Should also help with OpaquePointers by not trying to derive a
function's type from its PointerType.
llvm-svn: 365767
This patch replaces the three almost identical "strip & accumulate"
implementations for constant pointer offsets with a single one,
combining the respective functionalities. The old interfaces are kept
for now.
Differential Revision: https://reviews.llvm.org/D64468
llvm-svn: 365723
As discussed in PR42314:
https://bugs.llvm.org/show_bug.cgi?id=42314
Improving the canonicalization for these patterns:
rL363956
...means we should adjust/enhance the related simplification.
https://rise4fun.com/Alive/w1cp
Name: isPow2 or zero
%x = and i32 %xx, 2048
%a = add i32 %x, -1
%r = and i32 %a, %x
=>
%r = i32 0
llvm-svn: 363997
Fix folds of addo and subo with an undef operand to be:
`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
as per LLVM undef rules.
Same for commuted variants.
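For illustration (a sketch):
```
declare { i8, i1 } @llvm.uadd.with.overflow.i8(i8, i8)

define { i8, i1 } @src(i8 %x) {
  ; undef may be chosen as a value that never overflows (e.g. 0),
  ; so the overflow bit is false, while the sum remains undef
  %r = call { i8, i1 } @llvm.uadd.with.overflow.i8(i8 %x, i8 undef)
  ret { i8, i1 } %r  ; folds to { undef, false }
}
```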
Based on the original version of the patch by @nikic.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]
Differential Revision: https://reviews.llvm.org/D63065
llvm-svn: 363522
This is another step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.
By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.
This is a continuation of D62979 / rL362879.
llvm-svn: 362903
This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.
By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.
I'll update the 'ult' case below here as a follow-up assuming no problems here.
Differential Revision: https://reviews.llvm.org/D62979
llvm-svn: 362879
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.
llvm-svn: 361559
This is the sibling transform for rL360899 (D61691):
maxnum(X, GreaterC) == C --> false
maxnum(X, GreaterC) <= C --> false
maxnum(X, GreaterC) < C --> false
maxnum(X, GreaterC) >= C --> true
maxnum(X, GreaterC) > C --> true
maxnum(X, GreaterC) != C --> true
llvm-svn: 361118
minnum(X, LesserC) == C --> false
minnum(X, LesserC) >= C --> false
minnum(X, LesserC) > C --> false
minnum(X, LesserC) != C --> true
minnum(X, LesserC) <= C --> true
minnum(X, LesserC) < C --> true
maxnum siblings will follow if there are no problems here.
We should be able to perform some other combines when the constants
are equal or greater-than too, but that would go in instcombine.
We might also generalize this by creating an FP ConstantRange
(similar to what we do for integers).
Differential Revision: https://reviews.llvm.org/D61691
llvm-svn: 360899
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with having those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because ValueTracking
did not provide a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.
llvm-svn: 358913
As discussed on PR41125 and D59363, we have a mismatch between icmp eq/ne cases with an undef operand:
1. When the other operand is constant, we fold to undef (handled in ConstantFoldCompareInstruction).
2. When the other operand is non-constant, we fold to a bool constant based on isTrueWhenEqual (handled in SimplifyICmpInst).
Neither is really wrong, but this patch changes the logic in SimplifyICmpInst to consistently fold to undef.
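For example (a sketch of the non-constant case):
```
define i1 @src(i32 %x) {
  %r = icmp eq i32 %x, undef
  ret i1 %r  ; now consistently folds to undef
}
```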
The NewGVN test change is annoying (as with most heavily reduced tests) but AFAICT I have kept the purpose of the test based on rL291968.
Differential Revision: https://reviews.llvm.org/D59541
llvm-svn: 356456
This is preparation for D59506. The InstructionSimplify abs handling
is moved into computeConstantRange(), which is the general place for
such calculations. This is NFC and doesn't affect the existing tests
in test/Transforms/InstSimplify/icmp-abs-nabs.ll.
Differential Revision: https://reviews.llvm.org/D59511
llvm-svn: 356409
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.
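For instance (a sketch):
```
declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @src(i32 %x, i32 %y) {
  ; a shift amount of 33 on i32 is equivalent to 33 mod 32 = 1
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %y, i32 33)
  ret i32 %r  ; canonicalized to use a shift amount of 1
}
```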
We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.
Differential Revision: https://reviews.llvm.org/D59374
llvm-svn: 356192
InstructionSimplify currently has some code to determine the constant
range of integer instructions for some simple cases. It is used to
simplify icmps.
This change moves the relevant code into ValueTracking as
llvm::computeConstantRange(), so it can also be reused for other
purposes.
In particular this is with the optimization of overflow checks in
mind (ref D59071), where constant ranges cover some cases that
known bits don't.
llvm-svn: 355781
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-February/130491.html
We can't remove the compare+select in the general case because
we are treating funnel shift like a standard instruction (as
opposed to a special instruction like select/phi).
That means that if one of the operands of the funnel shift is
poison, the result is poison regardless of whether we know that
the operand is actually unused based on the instruction's
particular semantics.
The motivating case for this transform is the more specific
rotate op (rather than funnel shift), and we are preserving the
fold for that case because there is no chance of introducing
extra poison when there is no anonymous extra operand to the
funnel shift.
llvm-svn: 354905
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201
The previous attempt at this in rL354406 had a logic bug that
actually triggered a regression test failure, but I failed to
notice it the first time.
llvm-svn: 354467
The m_APFloat matcher does not work with anything but strict
splat vector constants, so we could miss these folds and then
trigger an assertion in instcombine:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13201
llvm-svn: 354406
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
If a saturating add/sub has one constant operand, then we can
determine the possible range of outputs it can produce, and simplify
an icmp comparison based on that.
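A sketch of the idea (hypothetical constants):
```
declare i8 @llvm.uadd.sat.i8(i8, i8)

define i1 @src(i8 %x) {
  ; uadd.sat(%x, 42) always produces a value in [42, 255]
  %s = call i8 @llvm.uadd.sat.i8(i8 %x, i8 42)
  %r = icmp ult i8 %s, 42
  ret i1 %r  ; folds to false
}
```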
The implementation is based on a similar existing mechanism for
simplifying binary operator + icmps.
Differential Revision: https://reviews.llvm.org/D55735
llvm-svn: 349369
We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.
llvm-svn: 348088
This is an almost direct move of the functionality from InstCombine to
InstSimplify. There's no reason not to do this in InstSimplify because
we never create a new value with this transform.
(There's a question of whether any dominance-based transform belongs in
either of these passes, but that's a separate issue.)
I've changed 1 of the conditions for the fold (1 of the blocks for the
branch must be the block we started with) into an assert because I'm not
sure how that could ever be false.
We need 1 extra check to make sure that the instruction itself is in a
basic block because passes other than InstCombine may be using InstSimplify
as an analysis on values that are not wired up yet.
The 3-way compare changes show that InstCombine has some kind of
phase-ordering hole. Otherwise, we would have already gotten the intended
final result that we now show here.
llvm-svn: 347896
This is a problem seen in common rotate idioms as noted in:
https://bugs.llvm.org/show_bug.cgi?id=34924
Note that we are not canonicalizing standard IR (shifts and logic) to the intrinsics yet.
(Although I've written this before...) I think this is the last step before we enable
that transform. Ie, we could regress code by doing that transform without this
simplification in place.
In PR34924, I questioned whether this is a valid transform for target-independent IR,
but I convinced myself this is ok. If we're speculating a funnel shift by turning cmp+br
into select, then SimplifyCFG has already determined that the transform is justified.
It's possible that SimplifyCFG is not taking into account profile or other metadata,
but if that's true, then it's a bug independent of funnel shifts.
Also, we do have CGP code to restore a guard like this around an intrinsic if it can't
be lowered cheaply. But that isn't necessary for funnel shift because the default
expansion in SelectionDAGBuilder includes this same cmp+select.
Differential Revision: https://reviews.llvm.org/D54552
llvm-svn: 346960
This is NFCI for InstCombine because it calls InstSimplify,
so I left the tests for this transform there. As noted in
the code comment, we can allow this fold more often by using
FMF and/or value tracking.
llvm-svn: 346169
This is a bit awkward in a handful of places where we didn't even have
an instruction and now we have to see if we can build one. But on the
whole, this seems like a win and at worst a reasonable cost for removing
`TerminatorInst`.
All of this is part of the removal of `TerminatorInst` from the
`Instruction` type hierarchy.
llvm-svn: 340701
Remove duplicate tests from InstCombine that were added with
D50582. I left negative tests there to verify that nothing
in InstCombine tries to go overboard. If isKnownNeverNaN is
improved to handle the FP binops or other cases, we should
have coverage under InstSimplify, so we could remove more
duplicate tests from InstCombine at that time.
llvm-svn: 340279
NewGVN uses InstructionSimplify for simplifications of leaders of
congruence classes. It is not guaranteed that the metadata or other
flags/keywords (like nsw or exact) of the leader are available for all members
in a congruence class, so we cannot use them for simplification.
This patch adds a InstrInfoQuery struct with a boolean field
UseInstrInfo (which defaults to true to keep the current behavior as
default) and a set of helper methods to get metadata/keywords for a
given instruction, if UseInstrInfo is true. The whole thing might need a
better name, to avoid confusion with TargetInstrInfo but I am not sure
what a better name would be.
The current patch threads InstrInfoQuery through to the required
places, which is messier than it would need to be if
InstructionSimplify and ValueTracking shared the same Query struct.
The reason I added it as a separate struct is that it can be shared
between InstructionSimplify and ValueTracking's query objects. Also,
some places do not need a full query object, just the InstrInfoQuery.
It also updates some interfaces that do not take a Query object, but a
set of optional parameters to take an additional boolean UseInstrInfo.
See https://bugs.llvm.org/show_bug.cgi?id=37540.
Reviewers: dberlin, davide, efriedma, sebpop, hiraditya
Reviewed By: hiraditya
Differential Revision: https://reviews.llvm.org/D47143
llvm-svn: 340031
This is the second patch of the series which intends to enable jump threading for an inlined method whose return type is std::pair<int, bool> or std::pair<bool, int>.
The first patch is https://reviews.llvm.org/rL338485.
This patch handles code sequences that merge two values using `shl` and `or`, then extract one value using `and`.
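A sketch of the shape of code this handles (hypothetical values, not from the patch's tests):
```
define i32 @extract_low(i32 %hi, i32 %lo) {
  %h = zext i32 %hi to i64
  %l = zext i32 %lo to i64
  %sh = shl i64 %h, 32
  %pack = or i64 %sh, %l
  ; mask off the high half of the packed pair
  %low = and i64 %pack, 4294967295
  %r = trunc i64 %low to i32
  ret i32 %r  ; simplifies back to %lo
}
```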
Differential Revision: https://reviews.llvm.org/D49981
llvm-svn: 338817
This patch intends to enable jump threading when a method whose return type is std::pair<int, bool> or std::pair<bool, int> is inlined.
For example, jump threading does not happen for the if statement in func.
std::pair<int, bool> callee(int v) {
  int a = dummy(v);
  if (a) return std::make_pair(dummy(v), true);
  else return std::make_pair(v, v < 0);
}
int func(int v) {
  std::pair<int, bool> rc = callee(v);
  if (rc.second) {
    // do something
  }
  return rc.first;
}
SROA, executed before the method inlining, replaces std::pair by i64 without splitting in both callee and func, since at this point no accesses to the individual fields are seen by SROA.
After inlining, jump threading fails to identify that the incoming value is a constant due to additional instructions (like or, and, trunc).
This series of patches adds patterns to InstructionSimplify to fold extraction of members of std::pair. To help jump threading, we actually need to optimize the code sequence spanning multiple BBs.
These patches do not handle phis by themselves, but these additional patterns help the NewGVN pass, which calls InstSimplify to check opportunities for simplifying instructions over phis and applies the phi-of-ops optimization, resulting in successful jump threading.
SimplifyDemandedBits in InstCombine can do more general optimization, but this patch aims to provide opportunities for other optimizers by supporting a simple but common case in InstSimplify.
This first patch in the series handles code sequences that merge two values using shl and or and then extract one value using lshr.
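A sketch of that shape (hypothetical values; see also the `and` variant in the patch above):
```
define i32 @extract_high(i32 %hi, i32 %lo) {
  %h = zext i32 %hi to i64
  %l = zext i32 %lo to i64
  %sh = shl i64 %h, 32
  %pack = or i64 %sh, %l
  ; shift out the low half of the packed pair
  %high = lshr i64 %pack, 32
  %r = trunc i64 %high to i32
  ret i32 %r  ; simplifies back to %hi
}
```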
Differential Revision: https://reviews.llvm.org/D48828
llvm-svn: 338485
This fold is repeated/misplaced in instcombine, but I'm
not sure if it's safe to remove that yet because some
other folds appear to be asserting that the transform
has occurred within instcombine itself.
This isn't the best fix for PR37776, but it probably
hides the bug with the given code example:
https://bugs.llvm.org/show_bug.cgi?id=37776
We have another test to demonstrate the more general bug.
llvm-svn: 337127
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this, implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
When both operands are unsigned, the following optimizations are valid but currently missing:
1. X > Y && X != 0 --> X > Y
2. X > Y || X != 0 --> X != 0
3. X <= Y || X != 0 --> true
4. X <= Y || X == 0 --> X <= Y
5. X > Y && X == 0 --> false
unsigned foo(unsigned x, unsigned y) { return x > y && x != 0; }
should fold to x > y, but I found we don't do that right now.
Besides, unsigned foo(unsigned x, unsigned y) { return x < y && y != 0; }
has already been folded to x < y, so there may be a bug.
Patch by: Li Jia He!
Differential Revision: https://reviews.llvm.org/D47922
llvm-svn: 335129
Summary:
`%ret = add nuw i8 %x, C`
From [[ https://llvm.org/docs/LangRef.html#add-instruction | langref ]]:
nuw and nsw stand for “No Unsigned Wrap” and “No Signed Wrap”,
respectively. If the nuw and/or nsw keywords are present,
the result value of the add is a poison value if unsigned
and/or signed overflow, respectively, occurs.
So if `C` is `-1`, `%x` can only be `0`, and the result is always `-1`.
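Concretely (a sketch):
```
define i8 @src(i8 %x) {
  ; any nonzero %x would wrap unsigned, which nuw makes poison,
  ; so %x must be 0 and the result is always -1
  %r = add nuw i8 %x, -1
  ret i8 %r  ; folds to -1
}
```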
I'm not sure we want to use `KnownBits`/`LVI` here, because there is
exactly one possible value (all bits set, `-1`), so some other pass
should take care of replacing the known-all-ones with constant `-1`.
The `test/Transforms/InstCombine/set-lowbits-mask-canonicalize.ll` change *is* confusing.
What is happening is, before this (omitting `nuw` for simplicity):
1. First, InstCombine D47428/rL334127 folds `shl i32 1, %NBits` to `shl nuw i32 -1, %NBits`
2. Then, InstSimplify D47883/rL334222 folds `shl nuw i32 -1, %NBits` to `-1`,
3. `-1` is inverted to `0`.
But now:
1. *This* InstSimplify fold `%ret = add nuw i32 %setbit, -1` -> `-1` happens first,
before InstCombine D47428/rL334127 fold could happen.
Thus we now end up with the opposite constant,
and it is all good: https://rise4fun.com/Alive/OA9
https://rise4fun.com/Alive/sldC
Was mentioned in D47428 review.
Follow-up for D47883.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47908
llvm-svn: 334298
Summary:
`%r = shl nuw i8 C, %x`
As per langref:
```
If the nuw keyword is present, then the shift produces
a poison value if it shifts out any non-zero bits.
```
Thus, if the sign bit is set on `C`, then `%x` can only be `0`,
which means that `%r` can only be `C`.
Or in other words, set sign bit means that the signed value
is negative, so the constant is `<= 0`.
https://rise4fun.com/Alive/WMk
https://rise4fun.com/Alive/udv
Was mentioned in D47428 review.
We already handle the `0` constant, https://godbolt.org/g/UZq1sJ, so this only handles negative constants.
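Concretely (a sketch with a hypothetical constant):
```
define i8 @src(i8 %x) {
  ; -2 has the sign bit set; shifting by any nonzero amount would
  ; shift out a set bit, which nuw makes poison, so %x must be 0
  %r = shl nuw i8 -2, %x
  ret i8 %r  ; folds to -2
}
```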
Could use computeKnownBits() / LazyValueInfo,
but the cost-benefit analysis (https://reviews.llvm.org/D47891)
suggests it isn't worth it.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47883
llvm-svn: 334222
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
I was reminded today that this patch got reverted in r301885. I can no
longer reproduce the failure that caused the revert locally (...almost
one year later), and the patch applied pretty cleanly, so I guess we'll
see if the bots still get angry about it.
The original breakage was InstSimplify complaining (in "assertion
failed" form) about getting passed some crazy IR when running `ninja
check-sanitizer`. I'm unable to find traces of what, exactly, said crazy
IR was. I suppose we'll find out pretty soon if that's still the case.
:)
Original commit:
Author: gbiv
Date: Mon May 1 18:12:08 2017
New Revision: 301880
URL: http://llvm.org/viewvc/llvm-project?rev=301880&view=rev
Log:
[InstSimplify] Handle selects of GEPs with 0 offset
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)
Differential Revision: https://reviews.llvm.org/D31435
llvm-svn: 330667
This is the last step in getting constant pattern matchers to allow
undef elements in constant vectors.
I'm adding a dedicated m_ZeroInt() function and building m_Zero() from
that. In most cases, calling code can be updated to use m_ZeroInt()
directly when there's no need to match pointers, but I'm leaving that
efficiency optimization as a follow-up step because it's not always
clear when that's ok.
There are just enough icmp folds in InstSimplify that can be used for
integer or pointer types, that we probably still want a generic m_Zero()
for those cases. Otherwise, we could eliminate it (and possibly add a
m_NullPtr() as an alias for isa<ConstantPointerNull>()).
We're conservatively returning a full zero vector (zeroinitializer) in
InstSimplify/InstCombine on some of these folds (see diffs in InstSimplify),
but I'm not sure if that's actually necessary in all cases. We may be
able to propagate an undef lane instead. One test where this happens is
marked with 'TODO'.
llvm-svn: 330550
As shown in the code comment, we don't need all of 'fast',
but we do need reassoc + nsz + nnan.
Differential Revision: https://reviews.llvm.org/D43765
llvm-svn: 327796
This matcher implementation appears to be slightly more efficient than
the generic constant check that it is replacing because every use was
for matching FP patterns, but the previous code would check int and
pointer type nulls too.
llvm-svn: 327627
From the LangRef definition for frem:
"The value produced is the floating-point remainder of the two operands.
This is the same output as a libm ‘fmod‘ function, but without any
possibility of setting errno. The remainder has the same sign as the
dividend. This instruction is assumed to execute in the default
floating-point environment."
llvm-svn: 327626
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=27151
...the existing fold could miscompile when X is NaN.
The fold was also dependent on 'ninf' but that's not necessary.
From IEEE-754 (with default rounding which we can assume for these opcodes):
"When the sum of two operands with opposite signs (or the difference of two
operands with like signs) is exactly zero, the sign of that sum (or difference)
shall be +0...However, x + x = x − (−x) retains the same sign as x even when
x is zero."
llvm-svn: 327575