Commit Graph

11972 Commits

Author SHA1 Message Date
Nikita Popov 9c0314f54e [ValueTracking] Switch isKnownNonZero() to switch over opcodes (NFCI)
The change in the assume-queries-counter.ll test is because we skip
an unnecessary known bits query for arguments.
2022-10-04 10:54:28 +02:00
Florian Hahn db720dc17c
[LAA] Use LoopAccessInfoManager in legacy pass.
Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from
D134606. As a side-effect this also removes printing support from
LoopAccessLegacyAnalysis.

Depends on D134606.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134608
2022-10-04 08:37:11 +01:00
Guozhi Wei ded26bf6b9 [IVDescriptors] Before moving an instruction in SinkAfter, check if it is the target of other instructions
The attached test case can cause an LLVM crash in buildVPlanWithVPRecipes because
an invalid VPlan is generated.

  FIRST-ORDER-RECURRENCE-PHI ir<%792> = phi ir<%501>, ir<%806>
  CLONE ir<%804> = fdiv ir<1.000000e+00>, vp<%17>      // use of %17
  CLONE ir<%806> = load ir<%805>
  EMIT vp<%17> = first-order splice ir<%792> ir<%806>   // def of %17
  ...

There is a use before def error on %17.

When the vectorizer generates a VPlan, it generates a "first-order splice"
instruction for a loop-carried variable after its definition. All related PHI
users are changed to use this "first-order splice" result and are moved after
it. The move is guided by a MapVector named SinkAfter, whose contents are
filled by RecurrenceDescriptor::isFixedOrderRecurrence.

Let's look at the first PHI and its related instructions:

  %v792 = phi double [ %v806, %Loop ], [ %d1, %Entry ]
  %v802 = fdiv double %v794, %v792
  %v804 = fdiv double 1.000000e+00, %v792
  %v806 = load double, ptr %v805, align 8

%v806 is a loop-carried variable and %v792 is the related PHI instruction. The
vectorizer will generate a new "first-order splice" instruction for %v806, and
it will be used by %v802 and %v804. So %v802 and %v804 will be moved after
%v806 and its "first-order splice" instruction, which means SinkAfter contains

   %v802   ->  %v806
   %v804   ->  %v802

This means %v802 should be moved after %v806 and %v804 should be moved after
%v802. Note that the order is important.

When isFixedOrderRecurrence processes PHI instruction %v794, the related
instructions are

  %v793 = phi double [ %v813, %Loop ], [ %d1, %Entry ]
  %v794 = phi double [ %v793, %Loop ], [ %d2, %Entry ]
  %v802 = fdiv double %v794, %v792
  %v813 = load double, ptr %v812, align 8

This time the related loop-carried variable is %v813 and its user is %v802, so
%v802 should also be moved after %v813. But %v802 is already in SinkAfter;
because %v813 comes later than %v806, the original %v802 entry in SinkAfter is
deleted and a new %v802 entry is added. Now SinkAfter contains

  %v804   ->  %v802
  %v802   ->  %v813

With these data, %v802 can still be moved after all its operands, but %v804
can't be moved after %v806 and its "first-order splice" instruction, which
causes the use-before-def error.

So when removing and re-inserting an instruction I in SinkAfter, we should
also recursively remove the instructions targeting I and re-insert them into
SinkAfter. For simplicity, this patch just bails out in that case.

Differential Revision: https://reviews.llvm.org/D134083
2022-10-03 18:47:51 +00:00
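To make the ordering hazard in the commit above concrete, here is a small stand-alone C++ demonstration (plain STL, not LLVM's MapVector or the actual pass code) of how an erase-and-reappend update reorders the insertion-ordered SinkAfter entries:

```
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
  // Insertion-ordered map of "instruction -> sink target", as SinkAfter is.
  std::vector<std::pair<std::string, std::string>> SinkAfter = {
      {"%v802", "%v806"}, {"%v804", "%v802"}};

  // Re-targeting %v802 is done by erasing the old entry and appending a new
  // one, which moves %v802 to the end of the insertion order.
  SinkAfter.erase(SinkAfter.begin());
  SinkAfter.push_back({"%v802", "%v813"});

  for (const auto &Entry : SinkAfter)
    std::cout << Entry.first << " sinks after " << Entry.second << "\n";
  // Prints "%v804 sinks after %v802" before "%v802 sinks after %v813": the
  // chain that used to place %v804 behind %v806 (via %v802) is gone, which is
  // the broken state the commit message describes.
  return 0;
}
```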
Sanjay Patel ba7da14d83 Revert "[InstSimplify] reduce code duplication for fmul folds; NFC"
This reverts commit 7b7940f9da.
This missed a test update.
2022-10-03 11:21:23 -04:00
Sanjay Patel 7b7940f9da [InstSimplify] reduce code duplication for fmul folds; NFC
The constant is already commuted for an fmul opcode,
but this code can be called more directly for fma,
so we have to swap for that caller. There are tests
in InstSimplify and InstCombine to verify that this
works as expected.
2022-10-03 10:36:02 -04:00
Bjorn Pettersson 66fcdfca4d [Analysis][SimplifyLibCalls] Refactor code related to size_t in lib func signatures. NFC
Added a helper in TargetLibraryInfo to get size of "size_t" in bits,
given a Module reference. The new getSizeTSize helper is using the
same strategy as for example isValidProtoForLibFunc has been using
in the past, assuming that the size can be derived by asking
DataLayout about the size/type of a pointer to int.

FortifiedLibCallSimplifier::optimizeStrpCpyChk was changed to use
the new getSizeTSize helper instead of assuming that sizeof(size_t)
is equal to sizeof(int*) by itself (that is the assumption used in
TargetLibraryInfoImpl::getSizeTSize so the result will be the same).

Having a common helper for this ensures that we use the same strategy
when deriving the size of "size_t" in different parts of the code.
One bonus with this refactoring (basing it on Module instead of just
DataLayout) is that it makes it easier to override this for a specific
target triple, in case the assumption of using getPointerSizeInBits
wouldn't hold.

Differential Revision: https://reviews.llvm.org/D110585
2022-10-03 12:02:50 +02:00
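A minimal sketch of the strategy described above, assuming size_t has the width of a pointer in the default address space; the helper name mirrors the commit, but the body is illustrative, not the actual TargetLibraryInfo implementation:

```
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Module.h"

// Illustrative only: derive the width of "size_t" from the module's
// DataLayout by asking for the pointer width in the default address space,
// i.e. assume sizeof(size_t) == sizeof(void *).
static unsigned getSizeTSizeInBits(const llvm::Module &M) {
  return M.getDataLayout().getPointerSizeInBits(/*AddressSpace=*/0);
}
```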
Sanjay Patel 4490cfbaf4 [ValueTracking] peek through fpext in isKnownNeverInfinity()
https://alive2.llvm.org/ce/z/BkNoRW
2022-10-02 11:20:23 -04:00
Florian Hahn 7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions, and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
within a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
Teresa Johnson 0d7f3464ce [MemProf] Update metadata during inlining
Update both memprof and callsite metadata to reflect inlined functions.

For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite's.

For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original), via utilities in MemoryProfileInfo.

Depends on D128142.

Differential Revision: https://reviews.llvm.org/D128143
2022-09-30 16:46:17 -07:00
Teresa Johnson f9403ca41e Profile matching and IR annotation for memprof profiles.
See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*

* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
allocation type. This is the same attribute that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].

Depends on D128141 and D128854.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d

Differential Revision: https://reviews.llvm.org/D128142
2022-09-30 16:46:17 -07:00
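As a rough illustration of the cold/notcold classification described above (the struct, field, and threshold names below are hypothetical, not the MemoryProfileInfo API):

```
#include <cstdint>
using std::uint64_t;

struct MIBInfo {
  uint64_t TotalSize;   // bytes allocated in this context
  uint64_t AccessCount; // profiled accesses to those bytes
  uint64_t LifetimeMs;  // average lifetime of the allocations
};

enum class AllocType { NotCold, Cold };

// Long-lived, rarely touched allocation contexts get the "cold" hint.
AllocType classify(const MIBInfo &MIB, uint64_t MinLifetimeMs = 1000,
                   uint64_t MaxAccessesPerByte = 1) {
  if (MIB.LifetimeMs >= MinLifetimeMs &&
      MIB.AccessCount <= MaxAccessesPerByte * MIB.TotalSize)
    return AllocType::Cold;
  return AllocType::NotCold;
}
```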
Eric Wang 5b26f4f042 Reland "[MLGO] ML Regalloc Priority Advisor"
This relands commit 8f4f26ba5b, which was reverted in 91c96a806c because of Buildbot failures. The previous model test is not compatible with tflite. e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041

Differential Revision: https://reviews.llvm.org/D133616
2022-09-30 16:27:26 -05:00
Nikita Popov 57f7f0d6cf [AST] Use BatchAA in aliasesPointer() (NFC) 2022-09-30 16:22:29 +02:00
Sanjay Patel 3f906f057c [InstSimplify] look through vector select (shuffle) in min/max fold
This is an extension of the existing min/max+select fold (which already
has a very large number of variations) to allow a vector shuffle because
that's what we have in the motivating example from issue #42100.

A couple of Alive2 checks of variants (I don't know how to generalize
these in Alive):
https://alive2.llvm.org/ce/z/jUFAqT

And verify the PR42100 test:
https://alive2.llvm.org/ce/z/3EcASf

It's possible there is some generalization of the fold or a
VectorCombine/SLP answer for the motivating test, but I haven't found a
better/smaller solution yet.

We can also add even more variants here as follow-up patches. For example,
we can have shuffle followed by min/max; we also don't have this
canonicalization or the reverse:
https://alive2.llvm.org/ce/z/StHD9f

Differential Revision: https://reviews.llvm.org/D134879
2022-09-30 08:27:00 -04:00
Florian Hahn 8ae0d9aa07
[LoopDeletion] Clear block & loop dispo cache after breaking backedge.
breakLoopBackedge may remove blocks and loops. Also clear block &
loop disposition to avoid the cache containing invalid blocks and loops.

The coverage for the change is provided when using an ASAN build of opt
to run the LoopDeletion unit tests; without the fix, pointers to invalid
objects would be used.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134663
2022-09-30 11:21:58 +01:00
Mircea Trofin 91c96a806c Revert "[MLGO] ML Regalloc Priority Advisor"
This reverts commit 8f4f26ba5b.

Buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041
2022-09-29 18:26:40 -07:00
Eric Wang 8f4f26ba5b [MLGO] ML Regalloc Priority Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information and then produce the log file.

Differential Revision: https://reviews.llvm.org/D133616
2022-09-29 16:55:15 -05:00
Arthur Eubanks 0cdd671df9 [CGSCC][DevirtWrapper] Properly handle invalidating analyses for invalidated SCCs
f77342693 handled the adaptor and pass manager but missed the devirt wrapper.
2022-09-29 09:55:23 -07:00
Kazu Hirata 4e9dd21015 [ModuleInliner] Add a cost-benefit-based priority
This patch teaches the module inliner a traversal order designed for
the instrumentation FDO (+ThinLTO) scenario.

The new traversal order prioritizes call sites in the following order:

1. Those call sites that are expected to reduce the caller size

2. Those call sites that have gone through the cost-benefit analysis

3. The remaining call sites

With this fairly simple traversal order, a large internal benchmark
yields performance comparable to the bottom-up inliner -- both in
terms of the execution performance and .text* sizes.

Big thanks goes to Liqiang Tao for the module inliner infrastructure.

I still have hacks outside this patch to prevent excessively long
compilation or .text* size explosion.  I'm trying to come up with
acceptable solutions in near future.

Differential Revision: https://reviews.llvm.org/D134376
2022-09-29 09:00:38 -07:00
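A minimal sketch of the three-tier ordering described above; the struct and function names are hypothetical, and the tie-breaking is illustrative only:

```
#include <tuple>

struct Priority {
  bool ReducesCallerSize; // tier 1: inlining shrinks the caller
  bool HasCostBenefit;    // tier 2: passed cost-benefit analysis
  int Cost;               // tie-breaker inside a tier (lower is better)
};

// Returns true if L should be popped from the priority queue before R.
bool hasHigherPriority(const Priority &L, const Priority &R) {
  return std::make_tuple(!L.ReducesCallerSize, !L.HasCostBenefit, L.Cost) <
         std::make_tuple(!R.ReducesCallerSize, !R.HasCostBenefit, R.Cost);
}
```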
Nikita Popov aa25c92f33 [ValueTracking] Fix CannotBeOrderedLessThanZero() for fdiv (PR58046)
When checking the RHS of fdiv, we should set the SignBitOnly flag,
because a negative zero can become -Inf, which is ordered less
than zero.

Fixes https://github.com/llvm/llvm-project/issues/58046.

Differential Revision: https://reviews.llvm.org/D134876
2022-09-29 17:07:48 +02:00
Vitaly Buka 01f3e2d619 [StackLifetime] More efficient loop for LivenessType::Must
A CFG with cycles may require additional passes of the "while (Changed)"
iteration to propagate data back from later blocks to earlier blocks,
ordered according to depth_first.

OR logic, used for ::May, converges to a stable state faster than the AND
logic used for ::Must.

The better solution would be to switch to some form of queue, but given that
this one is good enough, I will consider doing that later.

We can switch ::Must to OR logic if we calculate "may be dead" instead of
"must be alive" directly and then convert the values to match the existing
interface.

Additionally, this fixes correctness in the "@cycle" test.

Reviewed By: kstoimenov, fmayer

Differential Revision: https://reviews.llvm.org/D134796
2022-09-28 16:28:45 -07:00
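A toy sketch of the fixed-point iteration being discussed, assuming a CFG given as predecessor lists; OR-merging ("may") converges quickly, while the commit notes that AND-merging ("must") can need extra passes on cyclic CFGs:

```
#include <bitset>
#include <vector>

using Facts = std::bitset<64>; // one bit per tracked slot

// OR-merge "may" facts from predecessors until nothing changes.
void propagateMay(const std::vector<std::vector<int>> &Preds,
                  const std::vector<Facts> &Gen, std::vector<Facts> &In) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (size_t B = 0; B < Preds.size(); ++B) {
      Facts New = Gen[B];
      for (int P : Preds[B])
        New |= In[P]; // OR logic: a fact from any path is enough
      if (New != In[B]) {
        In[B] = New;
        Changed = true;
      }
    }
  }
  // "Must be alive" can then be obtained by computing "may be dead" this way
  // and flipping the bits, as the commit message suggests.
}
```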
serge-sans-paille 16544cbe64 [iwyu] Move <cmath> out of llvm/Support/MathExtras.h
Interestingly, MathExtras.h doesn't use any <cmath> declarations, so move it
out of that header and include it where needed.

No functional change intended, but there's no longer a transitive include
from MathExtras.h to <cmath>.
2022-09-28 20:49:01 +02:00
Vitaly Buka 07cf1a25a3 [NFC][StackLifetime] Rename local variable
The next patch will require a more generic name.
2022-09-27 23:25:39 -07:00
Vitaly Buka c1bf3df576 [NFC][StackLifetime] Remove local variable 2022-09-27 23:07:03 -07:00
Philip Reames f6d110e26f [LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc]
This is purely NFC restructure in advance of a change which actually exposes zero strides.  This is mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
Nikita Popov 18b7c44086 [BasicAA] Use ScopeExit to clear Visited set (NFC) 2022-09-27 10:26:30 +02:00
Florian Hahn 275bee32ad
[LoopUnroll] Forget block and loop dispositions during unrolling.
After unrolling a loop, the block and loop dispositions need to be
cleared. As we don't know which SCEVs in the loop/blocks may be
impacted, completely clear the cache. This should also fix some cases
where deleted loops remained in the LoopDispositions cache.

This fixes a verification failure surfaced by D134531.

I am planning on reviewing/updating the existing uses of
forgetLoopDispositions to check if they should be replaced by
forgetBlockAndLoopDispositions.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134612
2022-09-27 08:49:04 +01:00
Sanjay Patel 222e1c73f3 [InstSimplify] don't commute constant expression operand in min/max calls
The test shows that we would fail to consistently fold the
instruction based on the max value operand. This is also
the root cause for issue #57986, but I'll add an instcombine
test + assert for that exact problem in another commit.
2022-09-26 16:01:09 -04:00
Sanjay Patel e9e994838a [InstSimplify] rearrange matching for select-of-min/max folds; NFC
This makes the code a little shorter and should be easier to extend
for a pattern like in issue #42100.
2022-09-26 15:02:40 -04:00
Sanjay Patel 096f1c4db4 [InstSimplify] remove redundant predicate check; NFC
It's still possible that there's a simpler way to specify
the conditions needed for this set of folds, but "getStrictPred"
converts >= to > for example, so there's no need to explicitly
check that.
2022-09-26 15:02:40 -04:00
Kazu Hirata e2398a4d7c [Analysis] Introduce getStaticBonusApplied (NFC)
InlineCostCallAnalyzer encourages inlining of the last call to the
static function by subtracting LastCallToStaticBonus from Cost.

This patch introduces getStaticBonusApplied to make available the
amount of LastCallToStaticBonus applied.

The intent is to allow the module inliner to determine whether
inlining a given call site is expected to reduce the caller size with
an expression like:

  IC.getCost() + IC.getStaticBonusApplied() < 0

This patch does not add a use of getStaticBonus yet.

Differential Revision: https://reviews.llvm.org/D134373
2022-09-25 23:21:40 -07:00
Sanjay Patel b0bfefb6ec [InstSimplify] fold redundant select of min/max, part 2
This extends e5d15e1162 to handle the inverse predicates
(there's probably a more elegant way to specify the preds).

These patterns correspond to the existing simplify:
max (min X, Y), X --> X
...and extra preds for (non)equality.

The tests cycle through all 10 icmp preds for each min/max
variant with 4 swapped operand patterns each (and the min/max
operands are commuted in every other test within those).

Some Alive2 examples to verify:
https://alive2.llvm.org/ce/z/XMvEKQ
https://alive2.llvm.org/ce/z/QpMChr
2022-09-25 07:06:43 -04:00
Sanjay Patel e5d15e1162 [InstSimplify] fold redundant select of min/max
This is similar to the existing simplify:
max (max X, Y), X --> max X, Y
...but the select condition can be one of
several predicates as shown in the tests.

The tests cycle through all 10 icmp preds for
each min/max variant with 4 swapped operand
patterns each (and the min/max operands are
commuted in every other test within those).

Some Alive2 examples to verify:
https://alive2.llvm.org/ce/z/lCAQm4
https://alive2.llvm.org/ce/z/kzxVXC
2022-09-24 11:34:05 -04:00
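A small self-contained check of the underlying identity, max(max(X, Y), X) == max(X, Y), brute-forced over a small integer range:

```
#include <algorithm>
#include <cassert>

int main() {
  for (int X = -8; X <= 8; ++X)
    for (int Y = -8; Y <= 8; ++Y)
      assert(std::max(std::max(X, Y), X) == std::max(X, Y));
  return 0;
}
```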
Yi Kong 32994b7357 Make MLIR model URLs cache variables
This allows us to directly use the models published on Github.

Differential Revision: https://reviews.llvm.org/D134566
2022-09-23 15:21:53 -07:00
Teresa Johnson b1926f308f Restore "[MemProf] Memprof profile matching and annotation"
This reverts commit 794b7ea960, and
thus restores commit a212d8da94, and
follow on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.

Use a hash function (BLAKE3) instead of hash_combine/hash_code which are
not guaranteed to be stable across executions.

Additionally, it adds a "REQUIRES: x86_64-linux" to the tests that have
raw profile inputs to avoid failures on big endian bots.

Reviewers: snehasish, davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D128142
2022-09-23 11:38:47 -07:00
Simon Pilgrim c94cbc343e Fix gcc warning about ambiguous if-else chain
Fixes warnings introduced by D111968
2022-09-23 14:36:28 +01:00
Simon Pilgrim a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Nikita Popov 6c6b48434e [BasicAA] Clean up calculation of FMRB from attributes
The current implementation for call sites is pretty convoluted
when you take the underlying implementation of the used APIs
into account. We will query the call site attributes, and then
fall back to the function attributes while taking into account
operand bundles. However, getModRefBehavior() already has its
own (more accurate) logic for combining call-site FMRB with
function FMRB.

Clean this up by extracting a function that only fetches FMRB
from attributes, which can be directly used in getModRefBehavior()
for functions, and needs to be combined with an operand-bundle
respecting fallback in the call site case.

One caveat (that makes this non-NFC) is that CallBase function
attribute lookups allow using attributes from functions with
mismatching signature. To ensure we don't regress quality, do
the same for the function FMRB fallback.
2022-09-23 12:05:35 +02:00
Teresa Johnson 794b7ea960 Revert "[MemProf] Memprof profile matching and annotation"
This reverts commit a212d8da94, and follow
on fixes 0cd6763fa9,
e9ff53d42f, and
37c6a25e9a.

After re-reading the documentation for hash_combine, I don't think this
is the appropriate hash function to use for computing the hash to use as
a stack id in the metadata, since it is not guaranteed to produce stable
values across executions. I have not hit this problem, but plan to
switch to using an MD5 hash. I am hitting an issue with one of the bots
(https://lab.llvm.org/buildbot/#/builders/171/builds/20732)
where the values produced are only the lower 32 bits of the expected
hash values, however, which I assume is related to the implementation of
hash_combine and hash_code.

I believe I fixed all of the other bot failures with the follow on fixes,
which I'll merge into the new version before reapplying.
2022-09-22 16:08:03 -07:00
Arthur Eubanks a8f1da128d [LazyCallGraph] Handle spurious ref edges when deleting a dead function
Spurious ref edges are ref edges that still exist in the call graph even
though the corresponding IR reference no longer exists. This can cause
issues when deleting a dead function which has a spurious ref edge
pointed at it because currently we expect the dead function's RefSCC to
be trivial.

In the case that the dead function's RefSCC is not trivial, remove all
ref edges from other nodes in the RefSCC to it.

Removing a ref edge can result in splitting RefSCCs. There's actually no
reason to revisit those RefSCCs because currently we only run passes on
SCCs, and we've already added all SCCs in the RefSCC to the worklist.
(as opposed to removing the ref edge in
updateCGAndAnalysisManagerForPass() which can modify the call graph of
SCCs we have not visited yet). We also don't expect that RefSCC
refinement will allow us to glean any more information for optimization
use. Also, doing so would drastically increase the complexity of
LazyCallGraph::removeDeadFunction(), requiring us to return a list of
invalidated RefSCCs and new RefSCCs to add to the worklist.

Fixes #56503

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D133907
2022-09-22 15:01:15 -07:00
Teresa Johnson 37c6a25e9a [MemProf] Fix buildbot error due to hasValue deprecation warning
Use has_value instead of hasValue to address a deprecation warning from
a212d8da94. E.g.:

https://lab.llvm.org/buildbot/#/builders/57/builds/22166/steps/5/logs/stdio
2022-09-22 13:26:53 -07:00
Teresa Johnson a212d8da94 [MemProf] Memprof profile matching and annotation
Profile matching and IR annotation for memprof profiles.

See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*

* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceding patch adding the basic
metadata and verification support.

The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.

The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.

Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
allocation type. This is the same attribute that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].

Depends on D128141 and D128854.

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d

Differential Revision: https://reviews.llvm.org/D128142
2022-09-22 12:48:31 -07:00
Nikita Popov 60990d9042 [BasicAA] Move experimental.guard modelling to getModRefBehavior()
While we can't express this with attributes yet, we can model
these intrinsics as readonly + writing inaccessible memory (for
the control dependence) in FMRB. This way we don't need to
special-case them in getModRefInfo(), it falls out of the usual
logic.
2022-09-22 16:48:51 +02:00
Nikita Popov ab25ea6d35 [AA] Model operand bundles more precisely
Based on D130896, we can model operand bundles more precisely. In
addition to the baseline ModRefBehavior, a reading/clobbering operand
bundle may also read/write all locations. For example, a memcpy with
deopt bundle can read any memory, but only write argument memory.

This means that getModRefInfo() for memcpy with a pointer that does
not alias the arguments results in Ref, rather than ModRef, without
the need to implement any special handling.

Differential Revision: https://reviews.llvm.org/D130980
2022-09-22 11:15:20 +02:00
Nikita Popov 41dde5d858 [InstSimplify] Support vectors in simplifyWithOpReplaced()
We can handle vectors inside simplifyWithOpReplaced(), as long as
cross-lane operations are excluded. The equality can hold (or not
hold) for each vector lane independently, so we shouldn't use the
replacement value from other lanes.

I believe the only operations relevant here are shufflevector (where
all previous bugs were seen) and calls (which might use shuffle-like
intrinsics and would require more careful classification).

Differential Revision: https://reviews.llvm.org/D134348
2022-09-22 10:45:42 +02:00
Philip Reames 8c46881a53 [TTI] Recognize fp constants in getOperandInfo
We were recognizing vectors of floats, but not scalars.  That's a tad odd.
2022-09-21 14:34:34 -07:00
Arthur Eubanks f77342693b [CGSCC] Properly handle invalidating analyses for invalidated SCCs
Currently if we mark an SCC as invalid, if we haven't set UR.UpdatedC, we won't propagate the PreservedAnalyses up to the parent pass (adaptor/pass manager).

In the provided test case, we inline the function into itself then delete it as it has no users. The SCC is marked as invalid without providing a replacement UR.UpdatedC. Then the CGSCC pass manager and adaptor discard the PreservedAnalyses. Instead, handle PreservedAnalyses first before bailing due to the invalid SCC.

Fixes crashes due to out of date analyses.
2022-09-21 09:50:00 -07:00
Kazu Hirata 0a0ccc85bb [ModuleInliner] Factor out common code in InlineOrder.cpp (NFC)
This patch factors out common code in InlineOrder.cpp.

Without this patch, the model is to ask classes like SizePriority and
CostPriority to compare a pair of call sites:

  bool hasLowerPriority(const CallBase *L, const CallBase *R) const override {

while these priority classes have their own caches of priorities:

  DenseMap<const CallBase *, PriorityT> Priorities;

This model results in a lot of duplicate code like hasLowerPriority
and updateAndCheckDecreased.

This patch changes the model so that priority classes just have two
methods to compute a priority for a given call site and to compare two
previously computed priorities (as opposed to call sites).

Facilities like hasLowerPriority and updateAndCheckDecreased move to
PriorityInlineOrder along with the map from call sites to their
priorities.  PriorityInlineOrder becomes a template class so that it
can accommodate different priority classes.

Differential Revision: https://reviews.llvm.org/D134149
2022-09-21 08:50:30 -07:00
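A schematic sketch of the refactored model described above (names and types are stand-ins, not the actual InlineOrder.cpp classes): a priority class only computes and compares priorities, while the shared ordering container owns the call-site-to-priority map:

```
#include <map>

struct CallSite { int CalleeSize; }; // stand-in for llvm::CallBase

// A priority class only knows how to compute a priority for one call site
// and how to compare two previously computed priorities.
struct SizePriority {
  using PriorityT = int;
  static PriorityT evaluate(const CallSite &CS) { return CS.CalleeSize; }
  static bool isMoreDesirable(PriorityT L, PriorityT R) { return L < R; }
};

// The shared container owns the call-site-to-priority map, so helpers like
// hasLowerPriority are written once and reused for every priority class.
template <typename PriorityClass> class PriorityInlineOrder {
  std::map<const CallSite *, typename PriorityClass::PriorityT> Priorities;

public:
  void add(const CallSite &CS) { Priorities[&CS] = PriorityClass::evaluate(CS); }
  bool hasLowerPriority(const CallSite *L, const CallSite *R) const {
    return PriorityClass::isMoreDesirable(Priorities.at(R), Priorities.at(L));
  }
};
```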
Graham Hunter 3c74ed9ee3 [LAA] Fix ICE with scAddExpr in forked pointers
The IR from https://github.com/llvm/llvm-project/issues/57368 results
in an assert firing when trying to create a runtime check for the
forked pointer. One of the forks is fine since it's loop invariant,
but the other is a scAddExpr (containing a scAddRecExpr, so not
invariant) when RtCheck::insert expects a scAddRecExpr.

This is a simple fix to just avoid forks which aren't AddRec or
loop invariant. We can allow it as a forked pointer later with
more work.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133020
2022-09-21 10:27:06 +01:00
Nikita Popov 17994ed919 [MemorySSA] Remove PerformedPhiTranslation flag
I believe this is no longer necessary, as the underlying problem
has been fixed in a different way: Nowadays, we will adjust the
location size to beforeOrAfterPointer() if the pointer is not loop
invariant. This makes merging results translated across loop
backedges safe.

The two tests in phi-translation.ll show an improvement while still
being correct: The loads in the loop no longer alias with noalias
pointers, but still alias with the store in the entry block (which
they originally did not -- this is the bug that
PerformedPhiTranslation originally fixed).

Differential Revision: https://reviews.llvm.org/D133404
2022-09-21 10:32:09 +02:00
Matt Arsenault 1a18fe65d3 Analysis: Remove redundant assertion
This assert guards the same assertion inside getTypeStoreSizeInBits
2022-09-20 09:39:45 -04:00
Matt Arsenault 1e1aefbf70 Analysis: Pass AssumptionCache through isKnownNonZero
Pass this through now that isDereferenceableAndAlignedPointer has
access to this.
2022-09-20 09:25:18 -04:00
Mateusz Mikuła 4f30c5808a [TargetLibraryInfo] Mark memrchr as unavailable on Windows
Otherwise LLVM will optimise strrchr into memrchr on Windows, resulting in a linker error:
```
$ cat memrchr_test.c
int main(int argc, char **argv) {
    return (long)strrchr("KkMm", argv[argc-1][0]);
}

$ clang memrchr_test.c -O
memrchr_test.c:3:12: warning: cast to smaller integer type 'long' from 'char *' [-Wpointer-to-int-cast]
    return (long)strrchr("KkMm", argv[argc-1][0]);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
ld.lld: error: undefined symbol: memrchr
>>> referenced by D:/msys64/tmp/memrchr_test-e7aabd.o:(main)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```

Example taken from MSYS2 Discord and tested with windows-gnu target.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134134
2022-09-20 10:50:31 +03:00
luxufan bfd31c6f12 [MemorySSA][NFC] Use const whenever possible
Differential Revision: https://reviews.llvm.org/D134162
2022-09-20 02:21:02 +00:00
Matt Arsenault 2adae8e1b7 VectorCombine: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Matt Arsenault ce44357216 Analysis: Add AssumptionCache to isSafeToSpeculativelyExecute
Does not update any of the uses.
2022-09-19 19:25:22 -04:00
Matt Arsenault 0d8ffcc532 Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer
This does not try to pass it through from the end users.
2022-09-19 18:57:33 -04:00
Nikita Popov 36f325413e [SCEV] Don't verify dispositions of invalid loops
This should fix the expensive checks build. Ideally we would not
have invalid loops in LoopDispositions.
2022-09-19 15:07:44 +02:00
Max Kazantsev bb68b2402d [SCEV] Verify contents of loop disposition cache
It seems that it is sometimes broken. Initial motivation for this was
investigation of https://github.com/llvm/llvm-project/issues/56260, but
it also seems that we have found an unrelated bug in LoopFusion that leaves
broken caches.

Differential Revision: https://reviews.llvm.org/D134158
Reviewed By: nikic
2022-09-19 17:43:00 +07:00
Max Kazantsev 818b1ab84e [SCEV][NFC] Remove unused parameter from forgetLoopDispositions
Let's be honest about it, we don't drop loop dispositions for
particular loops. Remove the parameter that misleadingly makes
it apparent that we do.
2022-09-19 14:06:42 +07:00
Kazu Hirata 71b12030b9 [ModuleInliner] Capitalize a variable name (NFC) 2022-09-18 14:35:09 -07:00
Kazu Hirata 82293ed486 [ModuleInliner] Remove unused using declarations (NFC) 2022-09-18 14:27:06 -07:00
Kazu Hirata 00d982699b [ModuleInliner] Move getInlineCostWrapper to an anonymous namespace (NFC)
This patch moves getInlineCostWrapper to an anonymous namespace.
While I am at it, I'm moving the function closer to the beginning of
the file so that I can use it elsewhere in the file without a forward
declaration.
2022-09-18 14:09:21 -07:00
Kazu Hirata d3b95ecc98 [ModuleInliner] Remove InlineOrder::front (NFC)
InlineOrder::front is a remnant from the era when we had nested
"while" loops in the module inliner, with the inner one grouping the
call sites with the same caller.

Now that we have a simple "while" loop draining the priority queue, we
can just use InlineOrder::pop.

Differential Revision: https://reviews.llvm.org/D134121
2022-09-18 08:49:44 -07:00
Kazu Hirata cf355bf36e [Analysis] Introduce isSoleCallToLocalFunction (NFC)
We check to see if a given CallBase is a sole call to a local function
at multiple places in InlineCost.cpp.  This patch factors out the
common code.

Differential Revision: https://reviews.llvm.org/D134114
2022-09-17 20:59:54 -07:00
Kazu Hirata 20d764aff0 [llvm] Don't including SetVector.h (NFC)
llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without
including SetVector.h, so this patch adds an appropriate #include
there.
2022-09-17 12:36:43 -07:00
Kazu Hirata 5faf4bf195 [ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC)
UseInlinePriority specifies the priority function.  This patch
simplifies the code by moving UseInlinePriority closer to the actual
consumer -- the switch statement inside getInlineOrder.

Differential Revision: https://reviews.llvm.org/D134100
2022-09-17 11:41:28 -07:00
Kazu Hirata 6e30a9cc08 [Inliner] Retire DefaultInlineOrder (NFC)
DefaultInlineOrder was largely an exercise in generalizing the
traversal order of call sites within the inliner.

Now that the module inliner is starting to form its shape, there is no
point in sharing DefaultInlineOrder between the module inliner and the
CGSCC inliner.  DefaultInlineOrder and all the other inline orders are
mutually exclusive in the following sense:

- The use of DefaultInlineOrder doesn't make sense in the module
  inliner because there is no priority inherent in the order in which
  call sites are added to the list of call sites -- SmallVector.

- The use of any other inline order doesn't make sense in the CGSCC
  inliner because little prioritization can be done within one CGSCC.

This patch essentially reverts the addition of DefaultInlineOrder so
that the loop structure of Inliner.cpp looks like the state just
before we started working on the module inliner (circa June 2021).

At the same time, we remove the choice of DefaultInlineOrder from
UseInlinePriority.

Differential Revision: https://reviews.llvm.org/D134080
2022-09-16 15:36:40 -07:00
Kazu Hirata e0bc76eb23 [ModuleInliner] Move InlinePriority and its derived classes to InlineOrder.cpp (NFC)
These classes are referred to only from getInlineOrder in
InlineOrder.cpp.  This patch hides the entire class declarations and
definitions in InlineOrder.cpp.

Differential Revision: https://reviews.llvm.org/D134056
2022-09-16 12:32:16 -07:00
Nikita Popov b1cd393f9e [AA] Tracking per-location ModRef info in FunctionModRefBehavior (NFCI)
Currently, FunctionModRefBehavior tracks whether the function reads
or writes memory (ModRefInfo) and which locations it can access
(argmem, inaccessiblemem and other). This patch changes it to track
ModRef information per-location instead.

To give two examples of why this is useful:

* D117095 highlights a weakness of ModRef modelling in the presence
  of operand bundles. For a memcpy call with deopt operand bundle,
  we want to say that it can read any memory, but only write argument
  memory. This would allow them to be treated like any other calls.
  However, we currently can't express this and have to say that it
  can read or write any memory.
* D127383 would ideally be modelled as a separate threadid location,
  where threadid Refs outside pre-split coroutines can be ignored
  (like other accesses to constant memory). The current representation
  does not allow modelling this precisely.

The patch as implemented is intended to be NFC, but there are some
obvious opportunities for improvements and simplification. To fully
capitalize on this we would also want to change the way we represent
memory attributes on functions, but that's a larger change, and I
think it makes sense to separate out the FunctionModRefBehavior
refactoring.

Differential Revision: https://reviews.llvm.org/D130896
2022-09-14 16:34:41 +02:00
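A conceptual sketch of the per-location representation described above (not the actual FunctionModRefBehavior layout):

```
enum class ModRefInfo { NoModRef, Ref, Mod, ModRef };
enum Location { ArgMem, InaccessibleMem, OtherMem, NumLocations };

struct PerLocationModRef {
  ModRefInfo Info[NumLocations] = {ModRefInfo::NoModRef, ModRefInfo::NoModRef,
                                   ModRefInfo::NoModRef};

  // Example from the text: a memcpy carrying a deopt operand bundle may read
  // any memory, but it only writes its arguments.
  static PerLocationModRef memcpyWithDeoptBundle() {
    PerLocationModRef MRB;
    MRB.Info[ArgMem] = ModRefInfo::ModRef;
    MRB.Info[InaccessibleMem] = ModRefInfo::Ref;
    MRB.Info[OtherMem] = ModRefInfo::Ref;
    return MRB;
  }
};
```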
Nikita Popov 1cfbbba15b [AA] Remove unnecessary intersections from getModRefBehavior() (NFC)
Intersection with other providers is performed by AAResults. Doing
this here is both pointless and confusing.
2022-09-14 14:26:39 +02:00
Nikita Popov 31cc0ab321 [BasicAA] Delay getAllocTypeSize() call (NFC)
This call is expensive, so don't perform it for zero indices.

Also rename the variable to use Alloc rather than Alloca, this
doesn't have anything to do with allocas in particular.
2022-09-13 10:24:50 +02:00
Matthias Gehre c1502425ba Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth
Also remove new-pass-manager version of ExpandLargeDivRem because there is no way
yet to access TargetLowering in the new pass manager.

Differential Revision: https://reviews.llvm.org/D133691
2022-09-12 17:06:16 +01:00
zhongyunde 8a15695be2 [AA] Improve the BasicAA analysis capability
According to https://discourse.llvm.org/t/memoryssa-does-the-accessedbetween-support-scalable-vector-pointer/65052,
scalable vector support in BasicAA is currently quite limited. It can be
improved for a constant-offset GEP when the scalable index is zero, e.g.:
  getelementptr <vscale x 4 x i32>, ptr %p, i64 0, i64 %i

Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D133567
2022-09-12 19:41:17 +08:00
Junduo Dong 6975ab7126 [Clang] Reimplement time tracing of NewPassManager by PassInstrumentation framework
The previous implementation of time tracing in NewPassManager is direct but messy.

The key codes are like the demo below:
```
  /// Runs the function pass across every function in the module.
  PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
                        LazyCallGraph &CG, CGSCCUpdateResult &UR) {
      /// ...
      PreservedAnalyses PassPA;
      {
        TimeTraceScope TimeScope(Pass.name());
        PassPA = Pass.run(F, FAM);
      }
      /// ...
 }
```

It is tedious to judge by hand where we should add the tracing code.

With the PassInstrumentation framework, we can easily add `Before/After` callback
functions to add time tracing code.

Differential Revision: https://reviews.llvm.org/D131960
2022-09-11 05:42:55 -07:00
Aiden Grossman ec83c7e358 [MLGO] Make TFLiteUtils throw an error if some features haven't been passed to the model
In the Tensorflow C lib utilities, an error gets thrown if some features
haven't been passed into the model (due to differences in ordering
which now don't exist with the transition to TFLite). However, this is
not currently the case when using TFLiteUtils. This patch makes some
minor changes to throw an error when not all inputs of the model have
been passed, which when not handled will result in a seg fault within
TFLite.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D133451
2022-09-10 22:59:03 +00:00
Nikita Popov a9f312c7f4 [AST] Use BatchAA in aliasesUnknownInst() (NFCI) 2022-09-09 15:54:48 +02:00
Mircea Trofin a219a8a822 [mlgo][nfc] Set logging level to warning or higher for TFLite 2022-09-08 12:10:56 -07:00
Nikita Popov 98a3a340c3 [ConstantExpr] Don't create fneg expressions
Don't create fneg expressions unless explicitly requested by IR or
bitcode.
2022-09-07 11:27:25 +02:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion)
X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Sanjay Patel a8fcb51242 [InstSimplify] allow poison/undef in constant match for "C - X ==/!= X -> false/true"
This fold was added with 5e9522c311, but over-specified.
We can assume that an undef element is an odd number:
https://alive2.llvm.org/ce/z/djQmWU
2022-09-06 08:19:30 -04:00
eopXD ea3630e8d4 [CMake][MLGO] Fix cmake for MLGO
The if-statement should check whether TFLITE is on or not, rather than whether the variable is specified.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132902
2022-09-06 00:32:08 -07:00
Arthur Eubanks 7e3aa8f01a Revert "[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests"
This reverts commit 57fd866551.

Causes crashes, see comments in D132581.
2022-09-05 15:42:48 -07:00
LiaoChunyu 456c7ef68e [InstSimplify][NFC] shortened the code 2022-09-05 23:57:53 +08:00
LiaoChunyu 5e9522c311 [InstSimplify] Odd - X ==/!= X -> false/true
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D132989
2022-09-05 23:51:45 +08:00
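A quick exhaustive check of the fold over 8-bit values; for any odd constant C, C - X == X would force C == 2 * X, which is even modulo 2^N, so the comparison is always false:

```
#include <cassert>

int main() {
  for (unsigned C = 1; C < 256; C += 2)     // every odd 8-bit constant
    for (unsigned X = 0; X < 256; ++X)
      assert((unsigned char)(C - X) != (unsigned char)X);
  return 0;
}
```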
Kazu Hirata 3850edd9e0 Use llvm::count_if (NFC) 2022-09-03 11:17:35 -07:00
Simon Pilgrim e2d140e9c3 [TTI] Add isExpensiveToSpeculativelyExecute wrapper
CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if its better to move an expensive instruction used in a select behind a branch instead.

This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so its compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding a isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.
2022-09-03 13:12:22 +01:00
Arthur Eubanks 57fd866551 [LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests
The current code is basically just emulating what the analysis manager does.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D132581
2022-09-02 10:55:53 -07:00
Rong Xu 0caa4a9559 [PGO] Support PGO annotation of CallBrInst
We currently instrument CallBrInst but do not annotate it with
the branch weight. This patch enables PGO annotation of CallBrInst.

Differential Revision: https://reviews.llvm.org/D133040
2022-09-01 14:13:50 -07:00
Nikita Popov 4f046bc8e0 [PHITranslateAddr] Require dominance when searching for translated address (PR57025)
This is a fix for PR57025 and an alternative to D131776. The problem
in the phi-translation-to-wrong-context.ll test case is that phi
translation of %gep.j into if2 picks %gep.i as the result. While this
instruction has the correct pointer address, it occurs in a context
where %i != 0. As such, we get a NoAlias result for the store in
if2, even though they do alias for %i == 0 (which is legal in the
original context of the pointer).

PHITranslateValue already has a MustDominate option, which can be
used to restrict PHI translation results to values that dominate the
translated-into block. However, this is more aggressive than what we
need and would significantly regress GVN results. In particular, if
we have a pointer value that does not require any translation, then
it is fine to continue using that value in the predecessor, because
the context is still correct for the original query. We only run into
problems if PHITranslateSubExpr() picks a completely random
instruction in a context that may have preconditions that do not hold.

Fix this by always performing the dominance checks in
PHITranslateSubExpr(), without enabling the more general MustDominate
requirement.

Fixes https://github.com/llvm/llvm-project/issues/57025. This also
fixes the test case for https://github.com/llvm/llvm-project/issues/30999,
but I'm not sure whether that's just the particular test case,
or a general solution to the problem.

Differential Revision: https://reviews.llvm.org/D132935
2022-09-01 16:26:42 +02:00
Pavel Samolysov 88581db62f [LazyCallGraph] Reformat the code in accordance with the code style. NFC
Also, some local variables were renamed in accordance with the code
style as well as `std::tie` occurrences and `.first`/`.second` member
uses were replaced with structure bindings.

Differential Revision: https://reviews.llvm.org/D132806
2022-08-30 11:06:42 +03:00
Kazu Hirata 6ed2cb4ad5 Revert "[llvm] Use llvm::is_contained (NFC)"
This reverts commit ebf574f59a.

This patch seems to cause build failures on Windows.
2022-08-28 18:52:49 -07:00
Kazu Hirata ebf574f59a [llvm] Use llvm::is_contained (NFC) 2022-08-28 17:35:03 -07:00
Kazu Hirata ce9f007c7c [llvm] Use llvm::find_if (NFC) 2022-08-28 10:41:48 -07:00
Kazu Hirata 7a617fdf39 Use std::gcd (NFC)
This patch replaces calls to GreatestCommonDivisor64 with std::gcd
where both arguments are known to be of unsigned types no larger than
64 bits in size.
2022-08-27 21:20:59 -07:00
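For reference, a minimal example of the replacement (std::gcd from <numeric>, C++17) for unsigned 64-bit values:

```
#include <cstdint>
#include <numeric>

std::uint64_t strideGCD(std::uint64_t A, std::uint64_t B) {
  return std::gcd(A, B); // was: GreatestCommonDivisor64(A, B)
}
```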
Arthur Eubanks 7a94d189ad [LazyCallGraph] Update libcall list when replacing a libcall node's function
Otherwise when we visit all libcalls in
updateCGAndAnalysisManagerForPass(), the old libcall is dead and doesn't
have a node.

We treat libcalls conservatively in LazyCallGraph because any function
may introduce calls to them out of thin air.

It is weird to change the signature of a libcall since introducing calls
to the libcall with a different signature may break things, but other passes
like deadargelim already do it, so let's preserve this behavior for now.

Fixes an issue found in D128830.

Reviewed By: psamolysov

Differential Revision: https://reviews.llvm.org/D132764
2022-08-27 10:57:53 -07:00
Mircea Trofin 3546b5c520 [mlgo] Fix flaky test
The source of the flakyness is internal uninitialized buffers due to
a dangling variable in the model.
2022-08-26 21:29:25 -07:00
Florian Hahn 9405af1c85
[LAA] Require AddRecs to be in the innermost loop for diff-checks.
The simpler diff-checks require pointers with add-recs from the same
innermost loop, but this property wasn't checked completely. Add the
missing check to ensure both addrecs are in the innermost loop.

Fixes #57315.
2022-08-26 20:39:52 +01:00
Philip Reames 86b67a310d [LAA] Prune dependencies with distance large than access implied by trip count
When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances.

Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all.

Differential Revision: https://reviews.llvm.org/D131924
2022-08-25 14:24:13 -07:00
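A small illustration of the reasoning (a hypothetical loop, not taken from the patch): with a trip count of 4 and a dependence distance of 100 elements, no iteration can touch memory written by another iteration, so the dependence cannot constrain the vectorization factor:

```
void example(double *A) {
  // Dependence distance (100 elements) exceeds the trip count (4), so no
  // iteration reads a value written by another iteration of this loop.
  for (int I = 0; I < 4; ++I)
    A[I] = A[I + 100] + 1.0;
}
```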
Sanjay Patel 4e44c22c97 [ValueTracking][InstCombine] restrict FP min/max matching to avoid miscompile
This is a long-standing FIXME with a non-FMF test that exposes
the bug as shown in issue #57357.

It's possible that there's still a way to miscompile by
mis-identifying/mis-folding FP min/max patterns, but
this patch only exposes a couple of seemingly minor
regressions while preventing the broken transform.
2022-08-25 16:52:40 -04:00
Florian Hahn c035efc814
[LAA] Cache PSE.getSE() in variable (NFC).
Preparation for follow-up patches will introduce additional uses
of SE.
2022-08-25 21:40:22 +01:00
Mircea Trofin b2b460b0a0 [mlgo] Fix tests
Missed a few tests in D119507
2022-08-24 17:31:40 -07:00
Mircea Trofin 5ce4c9aa04 [mlgo] Use TFLite for 'development' mode.
TFLite is a lightweight, statically linkable [1] model evaluator, supporting a
subset of what the full tensorflow library does, sufficient for the
types of scenarios we envision having. It is also faster.

We still use saved models as "source of truth" - 'release' mode's AOT
starts from a saved model; and the ML training side operates in terms of
saved models.

Using TFLite solves the following problems compared to using the full TF
C API:

- a compiler-friendly implementation for runtime-loadable (as opposed
  to AOT-embedded) models: it's statically linked; it can be built via
  cmake;
- solves an issue we had when building the compiler with both AOT and
  full TF C API support, whereby, due to a packaging issue on the TF
  side, we needed to have the pip package and the TF C API library at
  the same version. We have no such constraints now.

The main liability is it supporting a subset of what the full TF
framework does. We do not expect that to cause an issue, but should that
be the case, we can always revert back to using the full framework
(after also figuring out a way to address the problems that motivated
the move to TFLite).

Details:

This change switches the development mode to TFLite. Models are still
expected to be placed in a directory - i.e. the parameters to clang
don't change; what changes is the directory content: we still need
an `output_spec.json` file; but instead of the saved_model protobuf and
the `variables` directory, we now just have one file, `model.tflite`.

The change includes a utility showing how to take a saved model and
convert it to TFLite, which it uses for testing.

The full TF implementation can still be built (not side-by-side). We
intend to remove it shortly, after patching downstream dependencies. The
build behavior, however, prioritizes TFLite - i.e. trying to enable both
full TF C API and TFLite will just pick TFLite.

[1] thanks to @petrhosek's changes to TFLite's cmake support and its deps!
2022-08-24 16:07:24 -07:00
Jakub Kuderski 6fa87ec10f [ADT] Deprecate is_splat and replace all uses with all_equal
See the discussion thread for more details:
https://discourse.llvm.org/t/adt-is-splat-and-empty-ranges/64692

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D132335
2022-08-23 11:36:27 -04:00
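A minimal usage example of the replacement (llvm::all_equal from llvm/ADT/STLExtras.h instead of the deprecated llvm::is_splat):

```
#include "llvm/ADT/STLExtras.h"
#include <vector>

bool allLanesEqual(const std::vector<int> &Values) {
  return llvm::all_equal(Values); // was: llvm::is_splat(Values)
}
```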
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Aditya Kumar 0af3ab02fd [NFC] LoopAccess: Move expressions close to usage
Avoids useless evaluation of these expressions.

Reviewed By: michaelmaitland, fhahn

Differential Revision: https://reviews.llvm.org/D132337
2022-08-23 07:08:42 -07:00
liqinweng 9181ab9223 [NFC] Use llvm::all_of instead of std::all_of
Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D131886
2022-08-23 12:21:53 +08:00
Philip Reames 104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as a result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly depended for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possibly affected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
Philip Reames 27d3321c4f [TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc]
This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allow use of the same properties in store costing as arithmetic costing.
2022-08-22 11:26:31 -07:00
Philip Reames 274f86e7a6 [TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc]
This completes the client side transition to the OperandValueInfo version of this routine.  Backend TTI implementations still use the prior versions for now.
2022-08-22 11:06:32 -07:00
Philip Reames c42a5f1cc2 [TTI] Migrate getOperandInfo to OperandValueInfo [nfc]
This is part of merging OperandValueKind and OperandValueProperties.
2022-08-22 10:19:02 -07:00
Philip Reames 5cd427106d [TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc]
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling.  We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.

This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works.  Target TTI implementations still use the split flags.  I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
2022-08-22 09:48:15 -07:00
Max Kazantsev e587199a50 [SCEV] Prove condition invariance via context, try 2
The initial implementation had too-weak requirements on positive/negative
range crossings. Not crossing zero with nuw is not enough, for two reasons:

- If ArLHS has negative step, it may turn from positive to negative
  without crossing 0 boundary from left to right (and crossing right to
  left doesn't count for unsigned);
- If ArLHS crosses SINT_MAX boundary, it still turns from positive to
  negative;

In fact we require that ArLHS always stays non-negative or negative,
which can be enforced by the following set of preconditions:

- both nuw and nsw;
- positive step (looks liftable);

Because of positive step, boundary crossing is only possible from left
part to the right part. And because of no-wrap flags, it is guaranteed
to never happen.
2022-08-22 14:31:19 +07:00
Ting Wang d2d77e050b [PowerPC][Coroutines] Add tail-call check with call information for coroutines
Fixes #56679.

Reviewed By: ChuanqiXu, shchenz

Differential Revision: https://reviews.llvm.org/D131953
2022-08-21 22:20:40 -04:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata 258531b7ac Remove redundant initialization of Optional (NFC) 2022-08-20 21:18:28 -07:00
Sanjay Patel 2981a94902 [EarlyCSE][ConstantFolding] do not constant fold atan2(+/-0.0, +/-0.0), part 2
Follow-up to 7f1262a322. That patch avoided removing the
call, but it still allowed the constant-folded result. This
makes the behavior consistent with 1-arg libm folding: if the
call potentially raises an exception, then we just bail out.

It seems likely that there are other corner-cases like this,
but the tests are incomplete, so we have lived with these
discrepancies for a long time. This was untested before
the constant folding was expanded in D127964.
2022-08-20 10:16:06 -04:00
Philip Reames b0a2c48e9f [tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc] 2022-08-19 16:22:22 -07:00
Sanjay Patel 7f1262a322 [EarlyCSE][ConstantFolding] do not constant fold atan2(+/-0.0, +/-0.0)
These may raise an error (set errno) as discussed in the post-commit
comments for D127964, so we can't fold away the call and potentially
alter that behavior.
2022-08-19 12:27:29 -04:00
Alexey Bataev d53e245951 [COST][NFC] Introduce OperandValueKind in getMemoryOpCost, NFC.
Added an OperandValueKind OpdInfo parameter to the getMemoryOpCost functions to
better estimate the cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Max Kazantsev f798c042f4 Revert "[SCEV] Prove condition invariance via context"
This reverts commit a3d1fb3b59.

Reverting until investigation of https://github.com/llvm/llvm-project/issues/57247
has concluded.
2022-08-19 21:02:06 +07:00
Michael Maitland f29401fcdf [LoopVectorize][LoopAccessAnalysis] add newline to debug message
A debug message in `LoopAccessAnalysis` did not have a newline in it, causing printed debug messages to be formatted incorrectly.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D132172
2022-08-18 13:44:05 -07:00
Florian Hahn b8709a9d03
[LV] Support fixed order recurrences.
If the incoming previous value of a fixed-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous; all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.

At the moment, fixed-order recurrences are modeled as a chain of
first-order recurrences.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D119661
2022-08-18 19:15:52 +01:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Simon Pilgrim b994f87184 [Analysis] CostModel.cpp - merge isa<IntrinsicInst> and dyn_cast<IntrinsicInst> checks
Pulled out of D79483
2022-08-18 10:43:29 +01:00
Simon Pilgrim 1d522a39f7 [TTI] Remove getInstructionThroughput cost helper.
Pulled out of D79483 - we can just as easily use getUserCost directly
2022-08-17 11:41:47 +01:00
Zain Jaffal f61f99a105
[instcombine] Optimise for zero initialisation of product given fast flags are enabled
Currently, clang ignores the 0 initialisation in finite math.
For example:

```
double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
  f_prod *= arr[i];
 }
```
Clang will not exploit the fact that `f_prod` is set to zero, and it will still generate assembly to iterate over the loop.

Reviewed By: fhahn, spatel

Differential Revision: https://reviews.llvm.org/D131672
2022-08-17 11:12:15 +01:00
Graham Hunter 70d35443dc [LAA] Handle forked pointers with add/sub instructions
Handle cases where a forked pointer has an add or sub instruction
before reaching a select.

Reviewed By: fhahn
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130278
2022-08-17 09:51:13 +01:00
Simon Pilgrim 08d153d806 [ValueTracking] computeKnownBits - attempt to use a branch condition feeding a phi to improve known bits range (PR38280)
If computeKnownBits encounters a phi node, and we fail to determine any known bits through direct analysis, see if the incoming value is part of a branch condition feeding the phi.

Handle cases where icmp(IncomingValue PRED Constant) is driving a branch instruction feeding that phi node - at the moment this only handles EQ/ULT/ULE predicate cases as they are the most straightforward to handle and most likely for branch-loop 'max upper bound' cases - we can extend this if/when necessary.

I investigated a more general icmp(LHS PRED RHS) KnownBits system, but the hard limits we put on value tracking depth through phi nodes meant that we were mainly catching constants anyhow.

Fixes the pointless vectorization in PR38280 / Issue #37628 (excessive unrolling still needs handling though)

Differential Revision: https://reviews.llvm.org/D131838
2022-08-16 16:54:44 +01:00
Max Kazantsev ebabd6bf18 Return "[SCEV] Use context to strengthen flags of BinOps"
This reverts commit 354fa0b480.

Returning as is. The patch was reverted due to a miscompile, but
this patch is not causing it. This patch made it possible to infer
some nuw flags in code guarded by a `false` condition, and then someone
else managed to propagate the flag from dead code outside.

Returning the patch to be able to reproduce the issue.
2022-08-16 14:12:36 +07:00
Craig Topper ef8c34e954 [InstSimplify] sle on i1 also encodes implication
We already support SGE, so the same logic should hold for SLE with
the LHS and RHS swapped.
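For reference, a short derivation of the claim, assuming the usual i1 interpretation where true is the signed value -1 and false is 0: sge a, b is false only in the single case a = true, b = false, which is exactly when the implication a => b fails, and sle is the same statement with the operands swapped:

$$\texttt{icmp sge}\ a, b \iff (a \Rightarrow b), \qquad \texttt{icmp sle}\ a, b \iff (b \Rightarrow a)$$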

I didn't see this in the wild. Just happened to walk past this code
and thought it was odd that it was asymmetric in what condition
codes it handled.

Reviewed By: spatel, reames

Differential Revision: https://reviews.llvm.org/D131805
2022-08-15 08:27:23 -07:00
Max Kazantsev 354fa0b480 Revert "[SCEV] Use context to strengthen flags of BinOps"
This reverts commit 34ae308c73.

Our internal testing found a miscompile. Not sure if it's caused by
this patch or it revealed something else. Reverting while investigating.
2022-08-15 18:51:59 +07:00
Wolfgang Pieb 7ddfb4dfeb [Inlining] Introduce the function attribute "inline-max-stacksize"
The value of the attribute is a size in bytes. It has the effect of
suppressing inlining of functions whose stack sizes exceed the given value.
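A minimal sketch of the kind of check this enables; the helper name is made up, and whether the limit is read from the caller or from the callee is an assumption of this sketch rather than a statement about the actual implementation:

```
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include <cstdint>

using namespace llvm;

// Returns true when F carries "inline-max-stacksize" and the callee's
// estimated stack size exceeds that limit, i.e. inlining should be skipped.
static bool exceedsInlineStackLimit(const Function &F,
                                    uint64_t CalleeStackSizeBytes) {
  Attribute A = F.getFnAttribute("inline-max-stacksize");
  if (!A.isValid())
    return false; // no limit requested
  uint64_t Limit = 0;
  if (A.getValueAsString().getAsInteger(/*Radix=*/10, Limit))
    return false; // malformed value: don't block inlining
  return CalleeStackSizeBytes > Limit;
}
```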

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D129904
2022-08-12 11:07:18 -07:00
Max Kazantsev a3d1fb3b59 [SCEV] Prove condition invariance via context
Contextual knowledge may be used to prove invariance of some conditions.
For example, in this case:
```
  ; %len >= 0
  guard(%iv = {start,+,1}<nuw> <s %len)
  guard(%iv = {start,+,1}<nuw> <u %len)
```
the 2nd check always fails if `start` is negative and always passes otherwise.

It looks like there are more opportunities of this kind that are still to be
implemented in the future.

Differential Revision: https://reviews.llvm.org/D129753
Reviewed By: apilipenko
2022-08-12 14:23:35 +07:00
Mircea Trofin 3486b1b736 [mlgo][nfc] regalloc test model generator: prep for TFLite
Casting operator to make TFLite happy.

Reviewed By: yundiqian

Differential Revision: https://reviews.llvm.org/D131584
2022-08-11 15:53:23 -07:00
Fangrui Song 57f334d817 [Support] Remove Log2 workaround for Android API level < 18
The function added by D9467 is unneeded.
https://github.com/android/ndk/wiki/Changelog-r24 shows that the NDK has
moved forward to at least a minimum target API of 19.

Reviewed By: srhines

Differential Revision: https://reviews.llvm.org/D131656
2022-08-11 17:39:41 +00:00
Kevin P. Neal de64d0076e [FPEnv][InstSimplify] Fix formatting error.
My most recent change for D131607 had a formatting error that I didn't
notice until after I committed it. Let me fix it now so changes to this
file will be back-to-back from me.
2022-08-11 12:10:05 -04:00
Kevin P. Neal 7bdb010d7c [FPEnv][InstSimplify] 0.0 - -X ==> X
Another ticket split out of D107285, this extends the optimization
of 0.0 - -X to just X when using constrained intrinsics and the
optimization is allowed.

If the negation of X is done with fsub then the match fails because of
the lack of IR Matcher support for constrained intrinsics.

While I'm here, remove some TODO notices since the work is no longer
planned.

Differential Revision: https://reviews.llvm.org/D131607
2022-08-11 11:35:33 -04:00
Martin Sebor 0dcfe7aa35 [InstCombine] Tighten up known library function signature tests (PR #56463)
Replace a switch statement used to validate arguments to known library
functions with a more consistent table-driven approach and tighten it
up.
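As a standalone illustration of the table-driven idea (the table contents and type encoding below are simplified examples, not the actual LLVM tables):

```
#include <cstdio>
#include <cstring>

enum class Ty { Void, Int, SizeT, Ptr };

// Each known library function maps to an expected return type and parameter types.
struct Sig {
  const char *Name;
  Ty Ret;
  Ty Params[3];
  int NumParams;
};

static const Sig KnownSigs[] = {
    {"strlen", Ty::SizeT, {Ty::Ptr}, 1},
    {"memcpy", Ty::Ptr, {Ty::Ptr, Ty::Ptr, Ty::SizeT}, 3},
};

static const Sig *lookup(const char *Name) {
  for (const Sig &S : KnownSigs)
    if (std::strcmp(S.Name, Name) == 0)
      return &S;
  return nullptr;
}

int main() {
  if (const Sig *S = lookup("strlen"))
    std::printf("%s expects %d parameter(s)\n", S->Name, S->NumParams);
  return 0;
}
```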
2022-08-10 14:15:46 -06:00
Simon Pilgrim 77d33f4c1b [Analysis] Remove unused CostModelAnalysis::getInstructionCost helper. NFCI.
Everything now uses TTI cost calls directly.
2022-08-10 17:21:46 +01:00
Mohammed Nurul Hoque 30abc1a6a1 [ConstantFolding] Eliminate atan and atan2 calls
From the Open Group specifications, atan2 may fail if the result
underflows and atan may fail if the argument is subnormal, but
we assume that does not happen and eliminate the calls if we
can constant fold the result at compile-time.

Differential Revision: https://reviews.llvm.org/D127964
2022-08-10 11:01:50 -04:00
Dinar Temirbulatov cab6cd6834 [AArch64][LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding.
After D121595 was committed, I noticed regressions associated with small trip
count vectorisation by tail folding with scalable vectors. As a solution
for those issues I propose to introduce a minimal trip count threshold value.

  Differential Revision: https://reviews.llvm.org/D130755
2022-08-09 22:10:17 +01:00
yundiqian 3edd8978c3 Fix mlgo regalloc test model generation for tflite
In moving from the TF C API to TFLite, we found that the argmax op in TFLite does not work for int64 inputs, so cast the int64 inputs to int32 to make the TFLite argmax op work.

Differential Revision: https://reviews.llvm.org/D131462
2022-08-09 12:36:28 -07:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Kazu Hirata e20d210eef [llvm] Qualify auto (NFC)
Identified with readability-qualified-auto.
2022-08-07 23:55:27 -07:00
Sanjay Patel 74b5e797d5 [InstSimplify] fold scalable vectors with over-shift splat constant to poison
Fixes #56968
2022-08-07 16:26:05 -04:00
Sanjay Patel 8148c28fad [ConstFolding] fix overzealous assert when converting FP half
Fixes #56981
2022-08-07 13:34:51 -04:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Kazu Hirata c8e6ebd74e Use value instead of getValue (NFC) 2022-08-06 11:21:39 -07:00
Vitaly Buka 8d2901d537 [NFC][Inliner] Add Load/Store handler
This is an additional signal which may benefit sanitizers.

Reviewed By: kda

Differential Revision: https://reviews.llvm.org/D131129
2022-08-05 13:42:17 -07:00
Sanjay Patel b63fc26d33 [InstSimplify] make uses of isImpliedCondition more efficient (NFCI)
As suggested in the post-commit comments for 019d76196f,
this makes the usage symmetric with the 'and' patterns and should
be more efficient.
2022-08-05 12:06:47 -04:00
Sanjay Patel 019d76196f [InstSimplify] use isImpliedCondition() instead of semi-duplicated code
We get a couple of improvements from recognizing swapped
operand patterns that were not handled by the replicated
code.

This should also enable simplifying larger patterns as
seen in issue #56653 and issue #56654, but that requires
enhancements to isImpliedCondition() itself.
2022-08-05 10:59:09 -04:00
David Green b2de84633a [ConstProp] Don't fall through for poison constants on vctp and active_lane_mask.
Given a poison constant as input, the dyn_cast to a ConstantInt would
fail so we would fall through to the generic code that attempts to fold
each element of the input vectors. The inputs to these intrinsics are
not vectors though, leading to a compile time crash. Instead bail out
properly for poison values by returning nullptr. This doesn't try to
define what poison means for these intrinsics.

Fixes #56945
2022-08-05 11:19:36 +01:00
Fangrui Song 7d6017fd31 [TTI] Change new getVectorInstrCost overload to use const reference after D131114
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D131197
2022-08-04 15:16:51 -07:00
Sanjay Patel 8e7acb670b [ValueTracking] improve readability in isImpliedCond helper functions; NFC
This matches the caller code naming scheme and avoids the
potentially confusing transition from left/right to A/B.
2022-08-04 17:43:31 -04:00
Sanjay Patel 657bfa364f [ValueTracking] reduce code in isImpliedCondICmps; NFC
This copies the implementation of the subsequent match with constants.
2022-08-04 17:03:42 -04:00
Mingming Liu bc8f2f3649 [AArch64][TTI][NFC] Overload method 'getVectorInstrCost' to provide vector instruction itself, as a context information for cost estimation.
1) Overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
   SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.

Differential Revision: https://reviews.llvm.org/D131114
2022-08-04 12:58:25 -07:00
Arthur Eubanks 203296d642 [BoundsChecking] Fix merging of sizes
BoundsChecking uses ObjectSizeOffsetEvaluator to keep track of the
underlying size/offset of pointers in allocations.  However,
ObjectSizeOffsetVisitor (something ObjectSizeOffsetEvaluator
uses to check for constant sizes/offsets)
doesn't quite treat sizes and offsets the same way as
BoundsChecking.  BoundsChecking wants to know the size of the
underlying allocation and the current pointer's offset within
it, but ObjectSizeOffsetVisitor only cares about the size
from the pointer to the end of the underlying allocation.
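A small standalone illustration of the two conventions (the struct and numbers are only for exposition): for a pointer 8 bytes into a 32-byte allocation, the "size to the end of the object" view sees 24 bytes, while the view BoundsChecking wants sees size 32 at offset 8:

```
#include <cstdint>
#include <cstdio>

struct SizeOffset {
  uint64_t Size;
  int64_t Offset;
};

int main() {
  const uint64_t AllocSize = 32; // bytes in the underlying allocation
  const int64_t PtrOffset = 8;   // current pointer is 8 bytes into it

  SizeOffset ToEnd{AllocSize - PtrOffset, 0};  // size from pointer to end
  SizeOffset Underlying{AllocSize, PtrOffset}; // whole allocation + offset

  std::printf("to-end: size=%llu  underlying: size=%llu offset=%lld\n",
              (unsigned long long)ToEnd.Size,
              (unsigned long long)Underlying.Size,
              (long long)Underlying.Offset);
  return 0;
}
```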

This only comes up when merging two size/offset pairs. Add a new mode to
ObjectSizeOffsetVisitor which cares about the underlying size/offset
rather than the size from the current pointer to the end of the
allocation.

Fixes a false positive with -fsanitize=bounds.

Reviewed By: vitalybuka, asbirlea

Differential Revision: https://reviews.llvm.org/D131001
2022-08-03 17:21:19 -07:00
Vitaly Buka a2aa6809a8 [NFC][Inliner] Add cl::opt<int> to tune InstrCost
The plan is to tune this for sanitizers.
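For illustration, the usual shape of such a knob; the option name and default below are placeholders, not necessarily what this patch adds:

```
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// A hidden integer flag overriding the per-instruction cost used by the
// inline cost model; sanitizer builds could pass a larger value.
static cl::opt<int> InlineInstrCost(
    "inline-instr-cost", cl::Hidden, cl::init(5),
    cl::desc("Cost of a single instruction when inlining"));
```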

Differential Revision: https://reviews.llvm.org/D131123
2022-08-03 17:14:10 -07:00
Congzhe Cao 76be554931 [DependenceAnalysis][PR56275] Normalize negative dependence analysis results
This patch is the first of the two-patch series (D130188, D130179) that
resolves PR56275 (https://github.com/llvm/llvm-project/issues/56275),
which is a missed opportunity, where a perfectly valid case for loop
interchange failed interchange legality.

If the distance/direction vector produced by dependence analysis (DA) is
negative, it needs to be normalized (reversed). This patch provides helper
functions `isDirectionNegative()` and `normalize()` in DA that do the
normalization, and clients can query DA to do normalization if needed.
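A standalone sketch of what normalization means here (the enum and function below are for exposition, not the DA API): a negative vector is reversed by flipping each direction and negating each distance:

```
#include <cstdio>
#include <vector>

enum Dir { LT, EQ, GT };

// Reverse a dependence: LT becomes GT (and vice versa), distances are negated.
static void normalize(std::vector<Dir> &Dirs, std::vector<int> &Dists) {
  for (Dir &D : Dirs)
    D = (D == LT) ? GT : (D == GT ? LT : EQ);
  for (int &Dist : Dists)
    Dist = -Dist;
}

int main() {
  std::vector<Dir> Dirs = {GT, EQ}; // "negative": leading non-EQ entry is GT
  std::vector<int> Dists = {-2, 0};
  normalize(Dirs, Dists);
  std::printf("dir[0]=%s dist[0]=%d\n", Dirs[0] == LT ? "LT" : "GT", Dists[0]);
  return 0;
}
```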

A pass option `<normalized-results>` is added to DependenceAnalysisPrinterPass,
and we leverage it to update DA test cases to ensure test coverage. The
test cases added in `Banerjee.ll` show that negative vectors are normalized
with `print<da><normalized-results>`.

Reviewed By: bmahjour, Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D130188
2022-08-03 19:59:00 -04:00
Mircea Trofin 0cb9746a7d [nfc][mlgo] Separate logger and training-mode model evaluator
This just shuffles implementations and declarations around. Now the
logger and the TF C API-based model evaluator are separate.

Differential Revision: https://reviews.llvm.org/D131116
2022-08-03 16:20:28 -07:00
Vitaly Buka 26dd42705c [NFC][Inliner] Simplify clamping in addCost 2022-08-03 14:54:37 -07:00
Vitaly Buka e056e74dda [NFC][inline] Add const to an argument 2022-08-03 13:20:47 -07:00
Johannes Reifferscheid 3e9e43b48e Fix compiler error: init-statements in if/switch.
Reviewed By: pifon2a

Differential Revision: https://reviews.llvm.org/D131061
2022-08-03 11:36:41 +02:00
Johannes Reifferscheid 7ae5d00afa Fix a stack overflow in ScalarEvolution.
Unfortunately, this overflow is extremely hard to reproduce reliably (in fact, I was unable to do so). The issue is that:

- getOperandsToCreate sometimes skips creating an SCEV for the LHS
- then, createSCEV is called for the BinaryOp
- ... which calls getNoWrapFlagsFromUB
- ... which under certain circumstances calls isSCEVExprNeverPoison
- ... which under certain circumstances requires the SCEVs of all operands

For certain deep dependency trees, this causes a stack overflow.

Reviewed By: bkramer, fhahn

Differential Revision: https://reviews.llvm.org/D129745
2022-08-03 11:08:01 +02:00
Nikita Popov b128e057c1 [AA] Make ModRefInfo a bitmask enum (NFC)
Mark ModRefInfo as a bitmask enum, which allows using normal
& and | operators on it. This supersedes various functions like
unionModRef() and intersectModRef(). I think this makes the code
cleaner than going through helper functions...
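A minimal standalone sketch of the bitmask-enum pattern (simplified, not the actual LLVM declarations):

```
#include <cstdio>

enum class ModRefInfo : unsigned {
  NoModRef = 0,
  Ref = 1,
  Mod = 2,
  ModRef = 3,
};

constexpr ModRefInfo operator|(ModRefInfo A, ModRefInfo B) {
  return ModRefInfo(unsigned(A) | unsigned(B));
}
constexpr ModRefInfo operator&(ModRefInfo A, ModRefInfo B) {
  return ModRefInfo(unsigned(A) & unsigned(B));
}

int main() {
  // What previously went through unionModRef()/intersectModRef() helpers:
  ModRefInfo MRI = ModRefInfo::Ref | ModRefInfo::Mod;
  bool Mods = (MRI & ModRefInfo::Mod) != ModRefInfo::NoModRef;
  std::printf("mods=%d\n", (int)Mods);
  return 0;
}
```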

Differential Revision: https://reviews.llvm.org/D130870
2022-08-03 10:05:55 +02:00
Max Kazantsev 34ae308c73 [SCEV] Use context to strengthen flags of BinOps
Sometimes SCEV cannot infer nuw/nsw from something as simple as
```
  len in [0, MAX_INT]
...
  iv = phi(0, iv.next)
  guard(iv <s len)
  guard(iv <u len)
  iv.next = iv + 1
```
just because flag strengthening only relies on the definition and does not use local facts.
This patch adds support for the simplest case: inference of flags of `add(x, constant)`
if we can contextually prove that `x <= max_int - constant`.

In case it has a negative compile-time impact, we can add an option to switch it off. I wouldn't
expect that, though.

Differential Revision: https://reviews.llvm.org/D129643
Reviewed By: apilipenko
2022-08-03 14:08:57 +07:00
Paul Kirth d434e40f39 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-08-03 00:09:45 +00:00
David Sherwood 4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
2022-08-02 09:52:33 +01:00
jacquesguan e38af7ba95 [LV] Refactor getExtendedAddReductionCost to support extended reductions other than Add.
Currently the API getExtendedAddReductionCost is used to determine the cost of an extended Add reduction with an optional Mul. For Arm, it covers the cases, but other targets, for example RISCV, support other kinds of extended reduction, such as FAdd.

This patch makes the following changes:
1. Split getExtendedAddReductionCost into 2 new APIs: getExtendedReductionCost, which handles the extended reduction with an additional Opcode input, and getMulAccReductionCost, which handles the MLA cases that getExtendedAddReductionCost used to handle.
2. Refactor getReductionPatternCost, adding constraints to make sure getMulAccReductionCost only handles the reduction of Add + Mul.

Differential Revision: https://reviews.llvm.org/D130868
2022-08-02 16:02:38 +08:00
Nikita Popov 4ec22ba9c8 [GlobalsAA] Remove unnecessary AAResultBase fallback (NFC)
This is unnecessary, as AA result chaining is implemented at a
higher level now.
2022-08-01 08:34:58 +02:00
Nikita Popov 5b1d10bda6 [AA] Drop setModAndRef() function (NFC)
Without the "must" state, this function is pointless, because we
can just directly create a ModRef instead.
2022-08-01 07:55:39 +02:00
Nikita Popov 34683c3e35 [MSSA] Fix expensive checks build 2022-08-01 07:28:52 +02:00
Nikita Popov f96ea53e89 [AA] Do not track Must in ModRefInfo
getModRefInfo() queries currently track whether the result is a
MustAlias on a best-effort basis. The only user of this functionality
is the optimized memory access type in MemorySSA -- which in turn
has no users. Given that this functionality has not found a user
since it was introduced five years ago (in D38862), I think we
should drop it again.

The context is that I'm working to separate FunctionModRefBehavior
to track mod/ref for different location kinds (like argmem or
inaccessiblemem) separately, and the fact that ModRefInfo also has
an unrelated Must flag makes this quite awkward, especially as this
means that NoModRef is not a zero value. If we want to retain the
functionality, I would probably split getModRefInfo() results into
a part that just contains the ModRef information, and a separate
part containing a (best-effort) AliasResult.

Differential Revision: https://reviews.llvm.org/D130713
2022-08-01 07:14:31 +02:00
Kazu Hirata bf6021709a Use drop_begin (NFC) 2022-07-31 15:17:09 -07:00
Sanjay Patel 02b3a35892 [InstSimplify] fold FP rounding intrinsic with rounded operand
issue #56775

I rearranged the Thumb2 codegen test to avoid simplifying the chain
of rounding instructions. I'm assuming the intent of the test is
to verify lowering of each of those intrinsics.
2022-07-31 10:00:27 -04:00
Kazu Hirata 60db8d9b4e Use nullptr instead of 0 (NFC)
Identified with modernize-use-nullptr.
2022-07-30 10:35:48 -07:00
Nuno Lopes d4b4747de5 ConstantFolding: fold OOB accesses to poison instead of undef 2022-07-30 15:20:32 +01:00
Nuno Lopes fffabd5348 [NFC] Switch a few uses of undef to poison as placeholders for unreachable code 2022-07-30 13:55:56 +01:00
Florian Hahn 214e2d8fe5
[SCEV] Avoid repeated proveNoSignedWrapViaInduction calls.
At the moment, proveNoSignedWrapViaInduction may be called for the
same AddRec a large number of times via getSignExtendExpr. This can have
a severe compile-time impact for very loop-heavy code.

If proveNoSignedWrapViaInduction failed to prove NSW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.

This is the signed version of 8daa338297 / D130648.

This can drastically improve compile-time in some excessive cases and
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.06%
NewPM-ReleaseThinLTO: -0.04%
NewPM-ReleaseLTO-g: -0.04%

https://llvm-compile-time-tracker.com/compare.php?from=8daa338297d533db4d1ae8d3770613eb25c29688&to=aed126a196e7a5a9803543d9b4d6bdb233d0009c&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130694
2022-07-29 09:15:03 +01:00
Liqiang Tao d52e775b05 [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 22:44:03 +08:00
Liqiang Tao c113594378 Revert "[llvm][ModuleInliner] Add inline cost priority for module inliner"
This reverts commit bb7f62bbbd.
2022-07-28 22:36:28 +08:00
Liqiang Tao bb7f62bbbd [llvm][ModuleInliner] Add inline cost priority for module inliner
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D130012
2022-07-28 21:28:07 +08:00
Florian Hahn 8daa338297
[SCEV] Avoid repeated proveNoUnsignedWrapViaInduction calls.
At the moment, proveNoUnsignedWrapViaInduction may be called for the
same AddRec a large number of times via getZeroExtendExpr. This can have
a severe compile-time impact for very loop-heavy code. On one
particular workload, LSR takes ~51s without this patch, almost
exclusively in proveNoUnsignedWrapViaInduction. With this patch, the time
in LSR drops to ~0.4s.

If proveNoUnsignedWrapViaInduction failed to prove NUW the first time,
it is unlikely to succeed on subsequent tries and the cost doesn't seem
to be justified.
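A minimal standalone sketch of the "don't retry a failed proof" idea; the names and caching structure are illustrative, not the SCEV implementation:

```
#include <cstdio>
#include <unordered_set>

struct Expr { int Id; };

// Stand-in for the expensive analysis; here it never succeeds.
static bool expensiveProof(const Expr *E) { (void)E; return false; }

static std::unordered_set<const Expr *> FailedProofs;

static bool proveNoWrapOnce(const Expr *E) {
  if (FailedProofs.count(E))
    return false;           // failed before: don't pay the cost again
  if (expensiveProof(E))
    return true;
  FailedProofs.insert(E);   // remember the failure for later queries
  return false;
}

int main() {
  Expr E{42};
  proveNoWrapOnce(&E);      // runs the expensive proof once
  proveNoWrapOnce(&E);      // second query skips it
  std::printf("cached failures: %zu\n", FailedProofs.size());
  return 0;
}
```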

Besides drastically improving compile-time in some excessive cases, this
also has a slightly positive compile-time impact on CTMark:

NewPM-O3: -0.07%
NewPM-ReleaseThinLTO: -0.08%
NewPM-ReleaseLTO-g: -0.06%

https://llvm-compile-time-tracker.com/compare.php?from=b435da027d7774c24cdb8c88d09f6b771e07fb14&to=f2729e33e8284b502f6c35a43345272252f35d12&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D130648
2022-07-28 10:02:19 +01:00
Max Kazantsev 2d1c6e0b44 [LAA] Remove block order sensitivity in LAA algorithm. PR56672
As the test in PR56672 shows, LAA produces different results, which lead to either
positive or negative vectorization decisions depending on the order of blocks
in the loop. The exact reason for this is not clear to me; however, it makes
investigation of related bugs extremely complex.

The current order of blocks in the loop is arbitrary. It may change, for example, if
loop info analysis is dropped and recomputed. It seems that this interferes with LAA's logic.
This patch chooses a fixed traversal order of blocks in loops, making it RPOT.

Note: this is *not* a fix for the bug with the incorrect analysis result. It just makes
the answer more robust, to make the investigation easier.

Differential Revision: https://reviews.llvm.org/D130482
Reviewed By: aeubanks, fhahn
2022-07-28 13:36:56 +07:00
Paul Kirth 6e9bab71b6 Revert "[llvm][NFC] Refactor code to use ProfDataUtils"
This reverts commit 300c9a7881.

We will reland once these issues are ironed out.
2022-07-27 21:38:11 +00:00
Paul Kirth 300c9a7881 [llvm][NFC] Refactor code to use ProfDataUtils
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.

Reviewed By: bogner, davidxl

Differential Revision: https://reviews.llvm.org/D128860
2022-07-27 21:13:54 +00:00
Stanislav Mekhanoshin 0562cf442f Allow data prefetch into non-default address space
I am playing with the LoopDataPrefetch pass and found out that it
refuses to work with a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.
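A standalone sketch of the hook's shape (the names are illustrative, not the actual TTI interface): the default admits only address space 0, and a target overrides it to admit more:

```
#include <cstdio>

struct PrefetchPolicy {
  virtual ~PrefetchPolicy() = default;
  // Default: only the generic address space 0 is considered for prefetching.
  virtual bool shouldPrefetchAddressSpace(unsigned AS) const { return AS == 0; }
};

struct MyTargetPolicy : PrefetchPolicy {
  bool shouldPrefetchAddressSpace(unsigned AS) const override {
    return AS == 0 || AS == 3; // also admit a target-specific address space
  }
};

int main() {
  MyTargetPolicy P;
  std::printf("AS0=%d AS3=%d\n", P.shouldPrefetchAddressSpace(0),
              P.shouldPrefetchAddressSpace(3));
  return 0;
}
```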

This does not currently affect any known targets, but seems to be
generally useful for the future.

Differential Revision: https://reviews.llvm.org/D129795
2022-07-27 10:01:26 -07:00
Martin Sebor 4447603616 [InstCombine] Fold strtoul and strtoull and avoid PR #56293
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D129224
2022-07-26 14:11:40 -06:00
Sanjay Patel dcd09467b0 [InstSimplify] remove redundant calls to 'isImplied'; NFCI
We already call the more general isImpliedCondition() (which calls
isImpliedTrueByMatchingCmp() internally) from simplifyAndInst()
and simplifyOrInst().

There was a difference visible with this change on a vector test
before a925bef70c, but I can't find any gaps now.
2022-07-26 14:47:21 -04:00
Arthur Eubanks 2eade1dba4 [WPD] Use new llvm.public.type.test intrinsic for potentially publicly visible classes
Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`.

Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`.

To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D128955
2022-07-26 08:01:08 -07:00
Sinan Lin 7a2f5dca09 [CodeMetrics] use hasOneLiveUse instead of hasOneUse while analyzing inlinable callsites
It would be better for CodeMetrics to use hasOneLiveUse while analyzing
static and called once callsites, since inline cost now uses
hasOneLiveUse instead of hasOneUse to avoid overpessimization on dead
constant cases (since this patch https://reviews.llvm.org/D109294).

This change has no noticeable influence now, but it helps improve the
accuracy of cost models of passes that use CodeMetrics.

Reviewed By: fhahn, nikic

Differential Revision: https://reviews.llvm.org/D130461
2022-07-26 13:46:19 +08:00
Augie Fackler 85063090e9 MemoryBuiltins: remove malloc-family funcs from list
We no longer need specialized knowledge of these allocator functions in
this file since we have the correct attributes available now.

As far as I can tell the changes in the attributor tests are due to
things getting more consistent on alloc-family once we remove the static
list entries.

The two test changes in NewGVN merit extra scrutiny: NewGVN appears to
be _extremely_ sensitive to inaccessiblememonly for reasons that
are beyond me. As a result, I hand-enumerated all the attributes on
allocation functions in those two tests instead of using -inferattrs.
I assumed that the two -disable-simplify-libcalls tests there are no
longer sensible since the function declarations now include all the
relevant attributes.

Differential Revision: https://reviews.llvm.org/D130107
2022-07-25 17:29:01 -04:00
Benjamin Kramer 5fde785186 [ValueTracking] Fix unused variable warning in release builds. NFC 2022-07-25 13:28:32 +02:00
Peter Waller f8919d2f7e [NFC][GVN] Put phi-translation of 'add' behind a switch
The code in this `#if 0` block appears to be a net benefit. Put it
behind a switch defaulting to off to support experimentation and as a
request for comment.

The codegen impact of enabling this, which I'm currently pursuing, is that
it allows PRE to take place more frequently, particularly in loops with
second-order recurrences.

Preliminary experimental data:

Across LNT on AArch64, 54 benchmarks are sped up by >1%, and 42 are
regressed by >1%; the geomean (exec_time_enabled / exec_time_disabled)
of these 96 "1% or greater significance" benchmarks is 0.991. For the
full set of 770 benchmarks it's 0.998.

There are two benchmarks which experience a >30% speedup, the worst
slowdown is ~12%, and for every benchmark with a slowdown there is a
benchmark which is sped up by a greater factor.

Differential Revision: https://reviews.llvm.org/D130241
2022-07-25 07:59:47 +00:00
Max Kazantsev a053f35990 [SCEV][NFC][CT] Cheaper handling of guards in isBasicBlockEntryGuardedByCond
Handle guards uniformly with assumes, rather than iterating through all
block instructions in attempt to find them.

Differential Revision: https://reviews.llvm.org/D129874
Reviewed By: nikic
2022-07-25 13:38:59 +07:00
Kazu Hirata b5188591a0 [llvm] Remove redundant virtual specifiers (NFC)
Identified with modernize-use-override.
2022-07-24 21:50:35 -07:00
Kazu Hirata acf648b5e9 Use llvm::less_first and llvm::less_second (NFC) 2022-07-24 16:21:29 -07:00
Sanjay Patel a925bef70c [ValueTracking] allow vector types in isImpliedCondition()
The matching of constants assumed integers, but we can handle
splat vector constants seamlessly with m_APInt.
2022-07-24 17:46:48 -04:00
Kazu Hirata 97718180d7 [Analysis] Remove a redundant return statement (NFC)
Identified with readability-redundant-control-flow.
2022-07-23 11:35:19 -07:00
Malhar Jajoo 41958f76d8 [Costmodel] Add "type-based-intrinsic-cost" cli option
This patch adds a command-line flag to be able to test
the type-based cost-model analysis for intrinsics.

Differential Revision: https://reviews.llvm.org/D129109
2022-07-22 15:50:57 +01:00