This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
It was pointed out the verifier rejects inttoptr and ptrtoint casts with inputs and outputs whose scalability doesn't match. As such, checking the input type separately from the type of the cast itself is redundant.
This is a follow up to D136470 which extends the same scheme used there to ComputeNumSignBits and isKnownNonNull. As a reminder, for scalable vectors we track a single bit which is implicitly broadcast to all lanes. We do not know how many lanes there are statically, and thus have to be conservative along paths which require exact sizes.
Differential Revision: https://reviews.llvm.org/D137046
programUndefinedIfUndefOrPoison used to eagerly propagate the fact that
a value is poison to the users of the value. The problem is that if the
value has a lot of uses (orders of magnitude more than the scanning
limit we use in this function), then we spend the bulk of our time in
eagerly propagating the poison property, which we will mostly never use
later anyway due to the scanning limit.
I have a test case (of ~50k lines of machine generated C++), where this
results in ~60% of 35s compilation time being spent doing just this
eager propagation.
This patch changes programUndefinedIfUndefOrPoison to only propagate to
instructions actually visited, looking back to see if their operands are
poison. This should be equivalent and no functional change is intended,
but we regain virtually all of the 60% compilation time spent in this
function in my test case (i.e.: a 2.5x total compilation speedup).
Differential Revision: https://reviews.llvm.org/D137027
We have similar code to translate a demanded elements mask for a shuffle's operands in multiple places - this patch adds a helper function to VectorUtils and updates a number of locations to use it directly.
Differential Revision: https://reviews.llvm.org/D136832
This extends the computeKnownBits analysis to support scalable vectors. The critical detail is in deciding how to represent the demanded elements of a vector whose length is unknown at compile time.
For this patch, I adopt the convention that we track one bit which corresponds to all lanes. That is, that bit is implicitly broadcast to all lanes of the scalable vector resulting in all lanes being demanded. This is the same convention we use in getSplatValue in SelectionDAG.
Note that this convention doesn't actually impact much. Most of the code is agnostic to the interpretation of the demanded elements, and the few cases which actually care need case by case handling anyways. In this patch, I just bail out of those cases.
A prior patch (D128159) proposed using a different convention in SDAG. I don't see any strong reason to prefer one scheme over the other, so I propose we go with this one as it's conceptually the simplest. Getting known and demanded bit optimizations unblocked at all is a significant win.
I've locally implemented this scheme in reasonable large parts of ValueTracking.cpp and SelectionDAG equivalents, and have not hit any blockers. If this is approved, I plan to post a series of patches plumbing this through all the relevant parts.
In the discussion on that patch, a preference was expressed for introducing some form of abstraction around the demanded elements. I'll note that I've played with several variations on that idea locally, and have yet to find anything which results in more readable code. If anyone has concrete ideas in this area, I'm happy to explore in follow up patches. I'd strongly prefer to be making API changes in NFC manner with tests in place.
Differential Revision: https://reviews.llvm.org/D136470
When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (to match with the size in bits of a wchar_t).
Problem I've seen is that llvm::getConstantDataArrayInfo is taking
both an "ElementSize" argument (basically indicating size of a
char/element in bits) and an "Offset" which afaict is an offset
in the unit "number of elements". Then it also use
stripAndAccumulateConstantOffsets to get a "StartIdx" which afaict
is calculated in bytes. The returned Slice.Length is based on
arithmetics that add/subtract variables that are having different
units (bytes vs elements). Most notably I think the "StartIdx" must
be scaled using the "ElementSize" to get correct results.
The symptom of the above problem was seen in the wcslen-1.ll test
case which miscompiled.
This patch is supposed to resolve the bug by converting between
bytes and elements when needed.
Differential Revision: https://reviews.llvm.org/D135263
This is a long-standing FIXME with a non-FMF test that exposes
the bug as shown in issue #57357.
It's possible that there's still a way to miscompile by
mis-identifying/mis-folding FP min/max patterns, but
this patch only exposes a couple of seemingly minor
regressions while preventing the broken transform.
Currently, clang ignores the 0 initialisation in finite math
For example:
```
double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
f_prod *= arr[i];
}
```
Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop.
Reviewed By: fhahn, spatel
Differential Revision: https://reviews.llvm.org/D131672
If computeKnownBits encounters a phi node, and we fail to determine any known bits through direct analysis, see if the incoming value is part of a branch condition feeding the phi.
Handle cases where icmp(IncomingValue PRED Constant) is driving a branch instruction feeding that phi node - at the moment this only handles EQ/ULT/ULE predicate cases as they are the most straightforward to handle and most likely for branch-loop 'max upper bound' cases - we can extend this if/when necessary.
I investigated a more general icmp(LHS PRED RHS) KnownBits system, but the hard limits we put on value tracking depth through phi nodes meant that we were mainly catching constants anyhow.
Fixes the pointless vectorization in PR38280 / Issue #37628 (excessive unrolling still needs handling though)
Differential Revision: https://reviews.llvm.org/D131838
As constant expressions can no longer trap, it only makes sense to
call isSafeToSpeculativelyExecute on Instructions, so limit the
API to accept only them, rather than general Operators or Values.
As integer div/rem constant expressions are no longer supported,
constants can no longer trap and are always safe to speculate.
Remove the Constant::canTrap() method and its usages.
Enhance getConstantDataArrayInfo to let the memchr and memcmp library
call folders look through arbitrarily long sequences of bitcast and
GEP instructions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D128364
A llvm.vscale will always be at least 1, never zero. Teaching that to
isKnownNonZero can help fold away some statically known compares.
Differential Revision: https://reviews.llvm.org/D128217
Remove the known limitation of the library function call folders to only
work with top-level arrays of characters (as per the TODO comment in
the code) and allows them to also fold calls involving subobjects of
constant aggregates such as member arrays.
This extends a similar pattern from D125500 and D127754.
If we know that operand 1 (RHS) of a subtract is itself a
non-overflowing subtract from operand 0 (LHS), then the
final/outer subtract is also non-overflowing:
https://alive2.llvm.org/ce/z/Bqan8v
InstCombine uses this analysis to trigger a narrowing
optimization, so that is what the first changed test shows.
The last test models a motivating case from issue #48013.
In that example, we determine 'nuw' on the first sub from
the urem, then we determine that the 2nd sub can be narrowed,
and that leads to eliminating both subtracts.
here are still several missing subtract narrowing optimizations
demonstrated in the tests above the diffs shown here - those
should be handled in InstCombine with another set of patches.
This extends a similar pattern from D125500.
If we know that operand 1 (RHS) of a subtract is itself a
non-overflowing subtract from operand 0 (LHS), then the
final/outer subtract is also non-overflowing:
https://alive2.llvm.org/ce/z/Bqan8v
InstCombine uses this analysis to trigger a narrowing
optimization, so that is what the first changed test shows.
The last test models the motivating case from issue #48013.
In that example, we determine 'nsw' on the first sub from
the srem, then we determine that the 2nd sub can be narrowed,
and that leads to eliminating both subtracts.
This works for unsigned sub too, but I left that out to keep
the patch minimal. If this looks ok, I will follow up with
that change. There are also several missing subtract narrowing
optimizations demonstrated in the tests above the diffs shown
here - those should be handled in InstCombine with another set
of patches.
Differential Revision: https://reviews.llvm.org/D127754
Now that SimpleLoopUnswitch and other transforms no longer introduce
branch on poison, enable the -branch-on-poison-as-ub option by
default. The practical impact of this is mostly better flag
preservation in SCEV, and some freeze instructions no longer being
necessary.
Differential Revision: https://reviews.llvm.org/D125299