ritht is initialized with either exact (if available) or
with constant max exit count. In the future, this can be improved.
Hypothetically this is not an NFC (it is possible that exact is not
known and max is known for a particular exit), but for how we use
it now it seems be an NFC (or at least I could not find an example
where it differs). constant max exit count. In the future, this can
be improved.
Differential Revision: https://reviews.llvm.org/D138699
Reviewed By: lebedev.ri
Currently this is NFC, because SymbolicMaximum for BB is not implemented and just
reuses exact result. However, from code purity perspective, it's a necessary step
to do. Plans to implement symbolic max for blocks are underway.
At the moment, getRangeRef may overflow the stack for very deeply nested
expressions.
This patch introduces a new getRangeRefIter function, which first builds
a worklist of N-ary expressions and phi nodes, followed by their
operands iteratively.
getRangeRef has been extended to also take a Depth argument and it
switches to use getRangeRefIter once the depth reaches a certain
threshold.
This ensures compile-time is not impacted in general. Note that
the iterative algorithm may lead to a slightly different evaluation
order, which could result in slightly worse ranges for cyclic phis.
https://llvm-compile-time-tracker.com/compare.php?from=23c3eb7cdf3478c9db86f6cb5115821a8f0f5f40&to=e0e09fa338e77e53242bfc846e1484350ad79773&stat=instructionsFixes#49579.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D130728
This patch removes the bail out for signed predicates and non-positive
strides in howManyLessThans and updates computeMaxBECountForLT to return
SCEVCouldNotCompute for signed predicates with negative strides.
AFAICT bail-out was only added because computeMaxBECountForLT may not
handle negative signed strides correctly. Instead of not calling
computeMaxBECountForLT at all because we bail out earlier, we can
instead return SCEVCouldNotCompute in computeMaxBECountForLT.
The max backedge taken count will be computed as the max value of the
symbolic backedge taken count.
This improves precision in cases where we can compute symbolic backedge
taken counts and also fixes a crash.
Fixes#57818.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135667
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.
Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the start expression S was removed from BlockDispositions in the loop
but not the current values. This was also exposed by the new test case.
Fixes#58439.
This extends the existing SCEV verification to catch cache invalidation
issues as in #57837.
The validation logic is similar to the recently added loop disposition
cache validation in bb68b2402d.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134531
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.
We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134614
breakLoopBackedge may remove blocks and loops. Also clear block &
loop disposition to avoid the cache containing invalid blocks and loops.
The coverage for the change is provided when using an ASAN build of opt
to run the LoopDeletion unit tests; without the fix, pointers to invalid
objects would be used.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134663
After unrolling a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.
This fixes a verification failure surfaced by D134531.
I am planning on reviewing/updating the existing uses of
forgetLoopDispositions to check if they should be replaced by
forgetBlockAndLoopDispositions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134612
This patch replaces calls to GreatestCommonDivisor64 with std::gcd
where both arguments are known to be of unsigned types no larger than
64 bits in size.
Initial implementation had too weak requirements to positive/negative
range crossings. Not crossing zero with nuw is not enough for two reasons:
- If ArLHS has negative step, it may turn from positive to negative
without crossing 0 boundary from left to right (and crossing right to
left doesn't count for unsigned);
- If ArLHS crosses SINT_MAX boundary, it still turns from positive to
negative;
In fact we require that ArLHS always stays non-negative or negative,
which an be enforced by the following set of preconditions:
- both nuw and nsw;
- positive step (looks liftable);
Because of positive step, boundary crossing is only possible from left
part to the right part. And because of no-wrap flags, it is guaranteed
to never happen.
This reverts commit 354fa0b480.
Returning as is. The patch was reverted due to a miscompile, but
this patch is not causing it. This patch made it possible to infer
some nuw flags in code guarded by `false` condition, and then someone
else to managed to propagate the flag from dead code outside.
Returning the patch to be able to reproduce the issue.
This reverts commit 34ae308c73.
Our internal testing found a miscompile. Not sure if it's caused by
this patch or it revealed something else. Reverting while investigating.
Contextual knowledge may be used to prove invariance of some conditions.
For example, in this case:
```
; %len >= 0
guard(%iv = {start,+,1}<nuw> <s %len)
guard(%iv = {start,+,1}<nuw> <u %len)
```
the 2nd check always fails if `start` is negative and always passes otherwise.
It looks like there are more opportunities of this kind that are still to be
implemented in the future.
Differential Revision: https://reviews.llvm.org/D129753
Reviewed By: apilipenko
Unfortunately, this overflow is extremely hard to reproduce reliably (in fact, I was unable to do so). The issue is that:
- getOperandsToCreate sometimes skips creating an SCEV for the LHS
- then, createSCEV is called for the BinaryOp
- ... which calls getNoWrapFlagsFromUB
- ... which under certain circumstances calls isSCEVExprNeverPoison
- ... which under certain circumstances requires the SCEVs of all operands
For certain deep dependency trees, this causes a stack overflow.
Reviewed By: bkramer, fhahn
Differential Revision: https://reviews.llvm.org/D129745
Sometimes SCEV cannot infer nuw/nsw from something as simple as
```
len in [0, MAX_INT]
...
iv = phi(0, iv.next)
guard(iv <s len)
guard(iv <u len)
iv.next = iv + 1
```
just because flag strenthening only relies on definition and does not use local facts.
This patch adds support for the simplest case: inference of flags of `add(x, constant)`
if we can contextually prove that `x <= max_int - constant`.
In case if it has negative CT impact, we can add an option to switch it off. I woudln't
expect that though.
Differential Revision: https://reviews.llvm.org/D129643
Reviewed By: apilipenko
At the moment, proveNoSignedWrapViaInduction may be called for the
same AddRec a large number of times via getSignExtendExpr. This can have
a severe compile-time impact for very loop-heavy code.
If proveNoSignedWrapViaInduction failed to prove NSW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.
This is the signed version of 8daa338297 / D130648.
This can drastically improve compile-time in some excessive cases and
also has a slightly positive compile-time impact on CTMark:
NewPM-O3: -0.06%
NewPM-ReleaseThinLTO: -0.04%
NewPM-ReleaseLTO-g: -0.04%
https://llvm-compile-time-tracker.com/compare.php?from=8daa338297d533db4d1ae8d3770613eb25c29688&to=aed126a196e7a5a9803543d9b4d6bdb233d0009c&stat=instructions
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D130694
At the moment, proveNoUnsignedWrapViaInduction may be called for the
same AddRec a large number of times via getZeroExtendExpr. This can have
a severe compile-time impact for very loop-heavy code. One one
particular workload, LSR takes ~51s without this patch, almost
exlusively in proveNoUnsignedWrapViaInduction. With this patch, the time
in LSR drops to ~0.4s.
If proveNoUnsignedWrapViaInduction failed to prove NUW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.
Besides drastically improving compile-time in some excessive cases, this
also has a slightly positive compile-time impact on CTMark:
NewPM-O3: -0.07%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.06
https://llvm-compile-time-tracker.com/compare.php?from=b435da027d7774c24cdb8c88d09f6b771e07fb14&to=f2729e33e8284b502f6c35a43345272252f35d12&stat=instructions
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D130648
Handle guards uniformly with assumes, rather than iterating through all
block instructions in attempt to find them.
Differential Revision: https://reviews.llvm.org/D129874
Reviewed By: nikic
In fact, in unreached code we can say that every fact is true. So do not waste time trying to
do something smarter.
Formally it's not an NFC because it may change query results in unreached code, but they
won't have any impact on execution.
Hypothetical CT boost expected but not measured in practice.
Differential Revision: https://reviews.llvm.org/D129878
Explicitly list all binops rather than having a default case. There
were two bugs here:
1. U->getOpcode() was used instead of BO->Opcode, which means we
used the logic for the wrong opcode in some cases.
2. SCEV construction does not support LShr. We should return
unknown for it rather than recursing into the operands.
After 675080a453, we always create SCEVs for all operands of a
SelectInst. This can cause notable compile-time regressions compared to
the recursive algorithm, which only evaluates the operands if the select
is in a form we can create a usable expression.
This approach adds additional logic to getOperandsToCreate to only
queue operands for selects if we will later be able to construct a
usable SCEV.
Unfortunately this introduces a bit of coupling between actual SCEV
construction for selects and getOperandsToCreate, but I am not sure if
there are better alternatives to address the regression mentioned for
675080a453.
This doesn't have any notable compile-time impact on CTMark.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D129731
Motivation here is to unblock LSRs ability to use ICmpZero uses - the major effect of which is to enable count down IVs. The test changes reflect this goal, but the potential impact is much broader since this isn't a change in LSR at all.
SCEVExpander needs(*) to prove that expanding the expression is safe anywhere the SCEV expression is valid. In general, we can't expand any node which might fault (or exhibit UB) unless we can either a) prove it won't fault, or b) guard the faulting case. We'd been allowing non-zero constants here; this change extends it to non-zero values.
vscale is never zero. This is already implemented in ValueTracking, and this change just adds the same logic in SCEV's range computation (which in turn drives isKnownNonZero). We should common up some logic here, but let's do that in separate changes.
(*) As an aside, "needs" is such an interesting word here. First, we don't actually need to guard this at all; we could choose to emit a select for the RHS of ever udiv and remove this code entirely. Secondly, the property being checked here is way too strong. What the client actually needs is to expand the SCEV at some particular point in some particular loop. In the examples, the original urem dominates that loop and yet we completely ignore that information when analyzing legality. I don't plan to actively pursue either direction, just noting it for future reference.
Differential Revision: https://reviews.llvm.org/D129710