llvm-project

Commit Graph

Author	SHA1	Message	Date
Nikita Popov	28b31d9ccc	[ValueLattice] Move getCompare() out of line (NFC) This is a fairly large method that is unlikely to benefit from inlining.	2022-11-02 10:33:44 +01:00
Nikita Popov	41dba9e6a3	[AA] Remove some overloads (NFC) Having all these instruction-specific overloads does not seem to provide any compile-time benefit, so drop them in favor of the generic methods accepting "const Instruction *". Only leave behind the per-instruction AAQI overloads, which are part of the internal implementation.	2022-11-02 10:21:10 +01:00
Philip Reames	9472a810ed	Address post commit review feedback from D137046 It was pointed out the verifier rejects inttoptr and ptrtoint casts with inputs and outputs whose scalability doesn't match. As such, checking the input type separately from the type of the cast itself is redundant.	2022-11-01 13:36:13 -07:00
Philip Reames	2e999b7dd1	Allow scalable vectors in ComputeNumSignBits and isKnownNonNull This is a follow up to D136470 which extends the same scheme used there to ComputeNumSignBits and isKnownNonNull. As a reminder, for scalable vectors we track a single bit which is implicitly broadcast to all lanes. We do not know how many lanes there are statically, and thus have to be conservative along paths which require exact sizes. Differential Revision: https://reviews.llvm.org/D137046	2022-11-01 09:29:42 -07:00
Nikita Popov	45143240b2	[AA] Add missing const qualifier (NFC)	2022-11-01 16:17:18 +01:00
Patrick Walton	01859da84b	[AliasAnalysis] Introduce getModRefInfoMask() as a generalization of pointsToConstantMemory(). The pointsToConstantMemory() method returns true only if the memory pointed to by the memory location is globally invariant. However, the LLVM memory model also has the semantic notion of locally-invariant: memory that is known to be invariant for the life of the SSA value representing that pointer. The most common example of this is a pointer argument that is marked readonly noalias, which the Rust compiler frequently emits. It'd be desirable for LLVM to treat locally-invariant memory the same way as globally-invariant memory when it's safe to do so. This patch implements that, by introducing the concept of a ModRefInfo mask. A ModRefInfo mask is a bound on the Mod/Ref behavior of an instruction that writes to a memory location, based on the knowledge that the memory is globally-constant memory (in which case the mask is NoModRef) or locally-constant memory (in which case the mask is Ref). ModRefInfo values for an instruction can be combined with the ModRefInfo mask by simply using the & operator. Where appropriate, this patch has modified uses of pointsToConstantMemory() to instead examine the mask. The most notable optimization change I noticed with this patch is that now redundant loads from readonly noalias pointers can be eliminated across calls, even when the pointer is captured. Internally, before this patch, AliasAnalysis was assigning Ref to reads from constant memory; now AA can assign NoModRef, which is a tighter bound. Differential Revision: https://reviews.llvm.org/D136659	2022-10-31 13:03:41 -07:00
Philip Reames	93798fb740	Address post commit style comment from `087bb0f`	2022-10-31 11:16:14 -07:00
Geza Lore	d5e59e99f4	[ValueTracking] Improve performance of programUndefinedIfUndefOrPoison (NFC) programUndefinedIfUndefOrPoison used to eagerly propagate the fact that a value is poison to the users of the value. The problem is that if the value has a lot of uses (orders of magnitude more than the scanning limit we use in this function), then we spend the bulk of our time in eagerly propagating the poison property, which we will mostly never use later anyway due to the scanning limit. I have a test case (of ~50k lines of machine generated C++), where this results in ~60% of 35s compilation time being spent doing just this eager propagation. This patch changes programUndefinedIfUndefOrPoison to only propagate to instructions actually visited, looking back to see if their operands are poison. This should be equivalent and no functional change is intended, but we regain virtually all of the 60% compilation time spent in this function in my test case (i.e.: a 2.5x total compilation speedup). Differential Revision: https://reviews.llvm.org/D137027	2022-10-31 10:20:11 +01:00
Nikita Popov	efbb4d0245	[BasicAA] Include MayBeCrossIteration in cache key Rather than switching to a new AAQI instance with empty cache when MayBeCrossIteration is toggled, include the value in the cache key. The implementation redundantly includes the information in both sides of the pair, but that seems simpler than trying to store it only on one side. Differential Revision: https://reviews.llvm.org/D136175	2022-10-31 09:59:42 +01:00
Philip Reames	35a1161c24	[ValueTracking] Assert known bits sanity in isKnownNonZero These are the same asserts we have in other query routines; cover this interface too.	2022-10-30 10:53:52 -07:00
Simon Pilgrim	55a11b542e	[VectorUtils] Add getShuffleDemandedElts helper We have similar code to translate a demanded elements mask for a shuffle's operands in multiple places - this patch adds a helper function to VectorUtils and updates a number of locations to use it directly. Differential Revision: https://reviews.llvm.org/D136832	2022-10-30 17:03:55 +00:00
Philip Reames	087bb0f1fe	Allow scalable vectors in computeKnownBits This extends the computeKnownBits analysis to support scalable vectors. The critical detail is in deciding how to represent the demanded elements of a vector whose length is unknown at compile time. For this patch, I adopt the convention that we track one bit which corresponds to all lanes. That is, that bit is implicitly broadcast to all lanes of the scalable vector resulting in all lanes being demanded. This is the same convention we use in getSplatValue in SelectionDAG. Note that this convention doesn't actually impact much. Most of the code is agnostic to the interpretation of the demanded elements, and the few cases which actually care need case by case handling anyways. In this patch, I just bail out of those cases. A prior patch (D128159) proposed using a different convention in SDAG. I don't see any strong reason to prefer one scheme over the other, so I propose we go with this one as it's conceptually the simplest. Getting known and demanded bit optimizations unblocked at all is a significant win. I've locally implemented this scheme in reasonably large parts of ValueTracking.cpp and SelectionDAG equivalents, and have not hit any blockers. If this is approved, I plan to post a series of patches plumbing this through all the relevant parts. In the discussion on that patch, a preference was expressed for introducing some form of abstraction around the demanded elements. I'll note that I've played with several variations on that idea locally, and have yet to find anything which results in more readable code. If anyone has concrete ideas in this area, I'm happy to explore in follow up patches. I'd strongly prefer to be making API changes in NFC manner with tests in place. Differential Revision: https://reviews.llvm.org/D136470	2022-10-30 08:44:37 -07:00
Nikita Popov	8e5f57d738	[BasicAA] Remove redundant libcall handling The writeonly attribute for memset_pattern16 (and other referenced libcalls) is being added by InferFunctionAttrs nowadays. No need to special-case it here.	2022-10-27 12:01:33 +02:00
Haojian Wu	41b1669ca5	Fix a -Wunused-const-variable warning.	2022-10-27 10:51:28 +02:00
Nikita Popov	6c269a3f89	[BasicAA] Replace VisitedPhiBBs with a single flag When looking through phis, BasicAA has to guard against the possibility that values from two separate cycle iterations are being compared -- in this case, even though the SSA values may be the same, they cannot be considered as equal. This is currently done by keeping a set of VisitedPhiBBs for any phis we looked through, and then checking whether the relevant instruction is reachable from one of the phis. This patch replaces this set with a single flag. If the flag is set, then we will not assume equality for any instruction part of a cycle. While this is nominally less accurate, it makes essentially no difference in practice. Here are the AA stats for test-suite: aa.NumMayAlias \| 3072005 \| 3072016 aa.NumMustAlias \| 337858 \| 337854 aa.NumNoAlias \| 13255345 \| 13255349 The motivation for the change is to expose the MayBeCrossIteration flag to AA users, which will allow fixing miscompiles related to incorrect handling of cross-iteration AA queries. Differential Revision: https://reviews.llvm.org/D136174	2022-10-27 10:29:41 +02:00
Philip Reames	269bc684e7	[LV][RISCV] Disable vectorization of epilogue loops Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder. In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV. In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV. As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination. As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV. Differential Revision: https://reviews.llvm.org/D136695	2022-10-25 14:28:02 -07:00
Kazu Hirata	3f8d2c917c	Ensure newlines at the end of files (NFC)	2022-10-22 09:29:40 -07:00
Arthur Eubanks	4153f989ba	[ObjCARC] Remove legacy PM versions of optimization passes This doesn't touch objc-arc-contract because that's in the codegen pipeline. However, this does move its corresponding initialize function into initializeCodegen(). Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135041	2022-10-21 13:40:54 -07:00
Nikita Popov	eb470e67c1	[ModuleSummaryAnalysis] Use helper methods to check readnone/readonly (NFC) This makes sure that this code continues working when switching to the memory attribute. A caveat here is that onlyReadsMemory() will also be true for readnone. To be conservative, I'm explicitly excluding that case here.	2022-10-21 12:18:57 +02:00
Paul Walker	ab8257ca0e	[NFC] Fix a few whitespace inconsistencies.	2022-10-20 14:52:25 +00:00
Florian Hahn	1625224fbb	[SCEV] Replace assert with returning CouldNotComp in computeMaxBECountForLT. This patch removes the bail out for signed predicates and non-positive strides in howManyLessThans and updates computeMaxBECountForLT to return SCEVCouldNotCompute for signed predicates with negative strides. AFAICT bail-out was only added because computeMaxBECountForLT may not handle negative signed strides correctly. Instead of not calling computeMaxBECountForLT at all because we bail out earlier, we can instead return SCEVCouldNotCompute in computeMaxBECountForLT. The max backedge taken count will be computed as the max value of the symbolic backedge taken count. This improves precision in cases where we can compute symbolic backedge taken counts and also fixes a crash. Fixes #57818. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D135667	2022-10-19 11:24:10 +01:00
Nikita Popov	747f27d97d	[AA] Rename getModRefBehavior() to getMemoryEffects() (NFC) Follow up on D135962, renaming the method name to match the new type name.	2022-10-19 11:03:54 +02:00
Nikita Popov	1a9d9823c5	[AA] Rename uses of FunctionModRefBehavior (NFC) Followup to D135962 to rename remaining uses of FunctionModRefBehavior to MemoryEffects. Does not touch API names yet, but also updates variables names FMRB/MRB to ME, to match the new type name.	2022-10-19 10:54:47 +02:00
Arthur Eubanks	743087fb63	Port print-cfg-sccs to new pass manager This is actually used, see https://discourse.llvm.org/t/use-print-callgrapg-sccs-from-opt/65782. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135718	2022-10-18 08:47:08 -07:00
Florian Hahn	a8e9742bd4	[IndVarSimplify] Clear block and loop dispositions after moving instr. Moving an instruction can invalidate the cached block dispositions of the corresponding SCEV. Invalidate the cached dispositions. Also fixes a copy-paste error in forgetBlockAndLoopDispositions where the start expression S was removed from BlockDispositions in the loop but not the current values. This was also exposed by the new test case. Fixes #58439.	2022-10-18 16:18:14 +01:00
Nikita Popov	d06131fda2	[AST] Pass BatchAA to mergeSetIn() (NFCI)	2022-10-18 16:54:55 +02:00
Nikita Popov	e162a73e41	[CFG] Add const qualifier to isPotentiallyReachableFromMany() (NFC) Accept a const pointer for StopBB. Unfortunately the worklist has to use non-const pointers due to LoopInfo interaction.	2022-10-18 10:06:07 +02:00
Daniel Sanders	021e6e05d3	[instsimplify] Move (extelt (inselt Vec, Value, Index), Index) -> Value from InstCombine As requested in https://reviews.llvm.org/D135625#3858141 Differential Revision: https://reviews.llvm.org/D136099	2022-10-17 15:22:06 -07:00
Nikita Popov	ac74e7a780	[InstSimplify] Only check self-simplify in simplifyInstruction() InstSimplify currently checks whether the instruction simplifies back to itself, and returns undef in that case. Generally, this should only occur in unreachable code. However, this was also done for the simplifyInstructionWithOperands() API. In that case, the instruction only serves as a template that provides the opcode and other non-operand data. In this case, simplifying back to the same "instruction" may be expected. This caused PR58401 in conjunction with D134954. As such, move this check into simplifyInstruction() only. The only other caller of simplifyInstructionWithOperands() also handles the self-simplification case explicitly.	2022-10-17 15:52:38 +02:00
Nikita Popov	436fb27186	[BasicAA] Support loop phis in pointsToConstantMemory() When looking for underlying objects, if we encounter one that we have already seen, then we should skip it (as it has already been checked) rather than bail out. In particular, this adds support for the case where we have a loop use of a phi recurrence.	2022-10-17 12:34:55 +02:00
Florian Hahn	16cf666bb7	[Loop] Move block and loop dispo invalidation to makeLoopInvariant. makeLoopInvariant may recursively move its operands to make them invariant, before moving the passed in instruction. Those recursively moved instructions are currently missed when invalidating block and loop dispositions. To address this, move the invalidation code to Loop::makeLoopInvariant. Fixes #58314. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D135909	2022-10-14 21:58:14 +01:00
Nikita Popov	237b962031	[BasicAA] Account for cycles when checking for same select condition If we have translated across a cycle backedge, the same SSA value for the condition might be referring to two different loop iterations. Use the isValueEqualInPotentialCycles() helper to avoid assuming equality in that case.	2022-10-14 10:37:40 +02:00
Nikita Popov	03f9d0ff22	[TBAA] Model call accessing immutable type as readnone Accesses to constant memory are not observable and should be reported as readnone, not readonly. This is consistent with what we do for normal (non-call) instructions: For those, the TBAA metadata will result in pointsToConstantMemory() returning true, which will then result in a NoModRef result, not a Ref result. Differential Revision: https://reviews.llvm.org/D135864	2022-10-14 10:08:37 +02:00
Jacob Hegna	17095dfe36	Move interpreter check before modifying the allocation type.	2022-10-12 19:50:36 +00:00
Jacob Hegna	9d93a98f85	[MLGO] Force persistency in tflite buffers. When training large models, we encounter use-after-free bugs when writing to the input tensors for various MLGO models. This patch fixes the issue by marking the tensors as "persistent". Differential Revision: https://reviews.llvm.org/D135739	2022-10-12 19:50:36 +00:00
Arthur Eubanks	60e4af7ab8	[CallGraph] Port -print-callgraph-sccs to new pass manager And remove the legacy opt-specific pass. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D135487	2022-10-11 14:43:16 -07:00
Nikita Popov	884bb97dca	[MustExec][LICM] Handle latch being part of an inner cycle (PR57780) The algorithm in allLoopPathsLeadToBlock() does not handle the case where the loop latch is part of the predecessor set correctly: In this case, we may take the backedge (escaping to a different loop iteration) and not execute other latch successors. This can happen if the latch is part of an inner cycle. Fixes https://github.com/llvm/llvm-project/issues/57780. Differential Revision: https://reviews.llvm.org/D134279	2022-10-11 09:30:13 +02:00
Florian Hahn	4b599fa1ee	[SCEV] Verify block disposition cache. This extends the existing SCEV verification to catch cache invalidation issues as in #57837. The validation logic is similar to the recently added loop disposition cache validation in `bb68b2402d`. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D134531	2022-10-10 20:42:19 +01:00
Shubham Narlawar	b920407cf5	[LICM] Disable thread-safety checks in single-thread model If the single-thread model is used, or the -licm-force-thread-model-single flag is specified, skip checks related to thread-safety. This means that store promotion for conditionally executed stores only requires proof of dereferenceability and writability, but not of thread-safety. For example, this enables promotion of stores to (non-constant) globals, as well as captured allocas. Fixes https://github.com/llvm/llvm-project/issues/50537. Differential Revision: https://reviews.llvm.org/D130466	2022-10-10 16:51:16 +02:00
Florian Hahn	19ad1cd5ce	Recommit "[SCEV] Support clearing Block/LoopDispositions for a single value." This reverts commit `92f698f01f`. The updated version of the patch includes handling for non-SCEVable types. A test case has been added in `ec86e9a99b`.	2022-10-07 20:15:44 +01:00
Florian Hahn	92f698f01f	Revert "[SCEV] Support clearing Block/LoopDispositions for a single value." This reverts commit `9e931439dd`. This commit causes a crash under TSan, e.g. with https://lab.llvm.org/buildbot/#/builders/70/builds/28309/steps/10/logs/stdio Reverting while I extract a reproducer and submit a fix.	2022-10-07 17:58:54 +01:00
Florian Hahn	9e931439dd	[SCEV] Support clearing Block/LoopDispositions for a single value. Extend forgetBlockAndLoopDisposition to allow clearing information for a single value. This can be useful when only a single value is changed, e.g. because the instruction is moved. We also need to clear the cached values for all SCEV users, because they may depend on the starting value's disposition. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D134614	2022-10-07 16:07:17 +01:00
Bjorn Pettersson	01e1f32971	[ValueTracking][SimplifyLibCalls] Fix bug in getConstantDataArrayInfo for wchar_t When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen) it uses ValueTracking helpers with a CharSize/ElementSize that isn't 8, but rather 16 or 32 (to match with the size in bits of a wchar_t). The problem I've seen is that llvm::getConstantDataArrayInfo is taking both an "ElementSize" argument (basically indicating size of a char/element in bits) and an "Offset" which afaict is an offset in the unit "number of elements". Then it also uses stripAndAccumulateConstantOffsets to get a "StartIdx" which afaict is calculated in bytes. The returned Slice.Length is based on arithmetic that adds/subtracts variables with different units (bytes vs elements). Most notably I think the "StartIdx" must be scaled using the "ElementSize" to get correct results. The symptom of the above problem was seen in the wcslen-1.ll test case which miscompiled. This patch is supposed to resolve the bug by converting between bytes and elements when needed. Differential Revision: https://reviews.llvm.org/D135263	2022-10-07 15:29:32 +02:00
Nikita Popov	ccf53cae32	[ValueTracking] Remove unused Offset argument in getConstantStringInfo() (NFC)	2022-10-07 11:35:55 +02:00
Nikita Popov	c5bf452022	[AA] Pass AAResults through AAQueryInfo Currently, AAResultBase (from which alias analysis providers inherit) stores a reference back to the AAResults aggregation it is part of, so it can perform recursive alias analysis queries via getBestAAResults(). This patch removes the back-reference from AAResultBase to AAResults, and instead passes the used aggregation through the AAQueryInfo. This can be used to perform recursive AA queries using the full aggregation. Differential Revision: https://reviews.llvm.org/D94363	2022-10-06 10:10:19 +02:00
Nikita Popov	6053b37e45	[AA] Thread AAQI through getModRefBehavior() (NFC) This is in preparation for D94363, as we will need AAQI to perform the recursive call to the function variant.	2022-10-06 09:57:42 +02:00
Sanjay Patel	0a1210e482	[InstSimplify] try harder to fold fmul with 0.0 operand https://alive2.llvm.org/ce/z/oShzr3 This was noted as a missing fold in D134876 (with additional examples based on issue #58046). I'm assuming that fmul with a zero operand is rare enough that the use of ValueTracking will not noticeably increase compile-time. This adjusts a PowerPC codegen test that was added with D88388 because it would get folded away and no longer provide coverage for the bug fix.	2022-10-04 11:20:01 -04:00
Sanjay Patel	7f7a0f2f83	[InstSimplify] reduce code duplication for fmul folds; NFC This is a modification of the earlier attempt from: `7b7940f9da` For fma callers, we only want to swap a 0.0 or 1.0 constant.	2022-10-04 10:29:53 -04:00
Nikita Popov	6e504d637d	[ValueTracking] Handle constant exprs in isKnownNonZero() Handle constant expressions by falling through to the general operator-based code. In particular, this adds support for bitcast and GEP expressions.	2022-10-04 11:58:07 +02:00
Nikita Popov	45dec8f5fd	[ValueTracking] Avoid known bits fallthrough for freeze (NFCI) The known bits logic should never produce a better result than the direct recursive non-zero query here, so skip the fallthrough.	2022-10-04 11:02:31 +02:00
Nikita Popov	9c0314f54e	[ValueTracking] Switch isKnownNonZero() to switch over opcodes (NFCI) The change in the assume-queries-counter.ll test is because we skip an unnecessary known bits query for arguments.	2022-10-04 10:54:28 +02:00
Florian Hahn	db720dc17c	[LAA] Use LoopAccessInfoManager in legacy pass. Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from D134606. As a side-effect this also removes printing support from LoopAccessLegacyAnalysis. Depends on D134606. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D134608	2022-10-04 08:37:11 +01:00
Guozhi Wei	ded26bf6b9	[IVDescriptors] Before moving an instruction in SinkAfter checking if it is target of other instructions The attached test case can cause an LLVM crash in buildVPlanWithVPRecipes because an invalid VPlan is generated. FIRST-ORDER-RECURRENCE-PHI ir<%792> = phi ir<%501>, ir<%806> CLONE ir<%804> = fdiv ir<1.000000e+00>, vp<%17> // use of %17 CLONE ir<%806> = load ir<%805> EMIT vp<%17> = first-order splice ir<%792> ir<%806> // def of %17 ... There is a use before def error on %17. When the vectorizer generates a VPlan, it generates a "first-order splice" instruction for a loop carried variable after its definition. All related PHI users are changed to use this "first-order splice" result, and are moved after it. The move is guided by a MapVector SinkAfter. And the content of SinkAfter is filled by RecurrenceDescriptor::isFixedOrderRecurrence. Let's look at the first PHI and related instructions %v792 = phi double [ %v806, %Loop ], [ %d1, %Entry ] %v802 = fdiv double %v794, %v792 %v804 = fdiv double 1.000000e+00, %v792 %v806 = load double, ptr %v805, align 8 %v806 is a loop carried variable, %v792 is the related PHI instruction. The vectorizer will generate a new "first-order splice" instruction for %v806, and it will be used by %v802 and %v804. So %v802 and %v804 will be moved after %v806 and its "first-order splice" instruction. So SinkAfter contains %v802 -> %v806 %v804 -> %v802 It means %v802 should be moved after %v806 and %v804 will be moved after %v802. Please pay attention that the order is important. When isFixedOrderRecurrence processes PHI instruction %v794, the related instructions are %v793 = phi double [ %v813, %Loop ], [ %d1, %Entry ] %v794 = phi double [ %v793, %Loop ], [ %d2, %Entry ] %v802 = fdiv double %v794, %v792 %v813 = load double, ptr %v812, align 8 This time its related loop carried variable is %v813, its user is %v802. So %v802 should also be moved after %v813. 
But %v802 is already in SinkAfter; because %v813 is later than %v806, the original %v802 entry in SinkAfter is deleted and a new %v802 entry is added. Now SinkAfter contains %v804 -> %v802 %v802 -> %v813 With these data, %v802 can still be moved after all its operands, but %v804 can't be moved after %v806 and its "first-order splice" instruction, which causes the use before def error. So when removing and re-inserting an instruction I in SinkAfter, we should also recursively remove instructions targeting I and re-insert them into SinkAfter. But for simplicity I just bail out in this case. Differential Revision: https://reviews.llvm.org/D134083	2022-10-03 18:47:51 +00:00
Sanjay Patel	ba7da14d83	Revert "[InstSimplify] reduce code duplication for fmul folds; NFC" This reverts commit `7b7940f9da`. This missed a test update.	2022-10-03 11:21:23 -04:00
Sanjay Patel	7b7940f9da	[InstSimplify] reduce code duplication for fmul folds; NFC The constant is already commuted for an fmul opcode, but this code can be called more directly for fma, so we have to swap for that caller. There are tests in InstSimplify and InstCombine to verify that this works as expected.	2022-10-03 10:36:02 -04:00
Bjorn Pettersson	66fcdfca4d	[Analysis][SimplifyLibCalls] Refactor code related to size_t in lib func signatures. NFC Added a helper in TargetLibraryInfo to get size of "size_t" in bits, given a Module reference. The new getSizeTSize helper is using the same strategy as for example isValidProtoForLibFunc has been using in the past, assuming that the size can be derived by asking DataLayout about the size/type of a pointer to int. FortifiedLibCallSimplifier::optimizeStrpCpyChk was changed to use the new getSizeTSize helper instead of assuming that sizeof(size_t) is equal to sizeof(int*) by itself (that is the assumption used in TargetLibraryInfoImpl::getSizeTSize so the result will be the same). Having a common helper for this ensures that we use the same strategy when deriving the size of "size_t" in different parts of the code. One bonus with this refactoring (basing it on Module instead of just DataLayout) is that it makes it easier to override this for a specific target triple, in case the assumption of using getPointerSizeInBits wouldn't hold. Differential Revision: https://reviews.llvm.org/D110585	2022-10-03 12:02:50 +02:00
Sanjay Patel	4490cfbaf4	[ValueTracking] peek through fpext in isKnownNeverInfinity() https://alive2.llvm.org/ce/z/BkNoRW	2022-10-02 11:20:23 -04:00
Florian Hahn	7c0ff64b0f	[LAA] Change to function analysis for new PM. At the moment, LoopAccessAnalysis is a loop analysis for the new pass manager. The issue with that is that LAI caches SCEV expressions and modifications in a loop may impact SCEV expressions in other loops, but we do not have a convenient way to invalidate LAI for other loops within a loop pipeline. To avoid this issue, turn it into a function analysis which returns a manager object that keeps track of the individual LAI objects per loop. Fixes #50940. Fixes #51669. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D134606	2022-10-01 15:44:27 +01:00
Teresa Johnson	0d7f3464ce	[MemProf] Update metadata during inlining Update both memprof and callsite metadata to reflect inlined functions. For callsite metadata this is simply a concatenation of each cloned call's call stack with that of the inlined callsite's. For memprof metadata, each profiled memory info block (MIB) is either moved to the cloned allocation call or left on the original allocation call depending on whether its context matches the newly refined call stack context on the cloned call. We also reapply context trimming optimizations based on the refined set of contexts on each of the calls (cloned and original), via utilities in MemoryProfileInfo. Depends on D128142. Differential Revision: https://reviews.llvm.org/D128143	2022-09-30 16:46:17 -07:00
Teresa Johnson	f9403ca41e	Profile matching and IR annotation for memprof profiles. See also related RFCs: RFC: Sanitizer-based Heap Profiler [1] RFC: A binary serialization format for MemProf [2] RFC: IR metadata format for MemProf [3]* * Note that the IR metadata format has changed from the RFC during implementation, as described in the preceding patch adding the basic metadata and verification support. The matching is performed during the normal PGO annotation phase, to ensure that the inlines applied in the IR at that point are a subset of the inlines in the profiled binary and thus reflected in the profile's call stacks. This is important because the call frames are associated with functions in the profile based on the inlining in the symbolized call stacks, and this simplifies locating the subset of profile data relevant for matching onto each function's IR. The PGOInstrumentationUse pass is enhanced to perform matching for whatever combination of memprof and regular PGO profile data exists in the profile. Using the utilities introduced in D128854: The memprof profile data for each context is converted to "cold" or "notcold" based on parameterized thresholds for size, access count, and lifetime. The memprof allocation contexts are trimmed to the minimal amount of context required to uniquely identify whether the context is cold or not cold. For allocations where all profiled contexts have the same allocation type, no memprof metadata is attached and instead the allocation call is directly annotated with an attribute specifying the allocation type. This is the same attribute that will be applied to allocation calls once cloned for different contexts, and later used during LibCall simplification to emit allocation hints [4]. Depends on D128141 and D128854. 
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html [2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html [3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165 [4] `ab87cf382d` Differential Revision: https://reviews.llvm.org/D128142	2022-09-30 16:46:17 -07:00
Eric Wang	5b26f4f042	Reland "[MLGO] ML Regalloc Priority Advisor" This relands commit `8f4f26ba5b`, which was reverted in `91c96a806c` because of Buildbot failures. The previous model test is not compatible with tflite. e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041 Differential Revision: https://reviews.llvm.org/D133616	2022-09-30 16:27:26 -05:00
Nikita Popov	57f7f0d6cf	[AST] Use BatchAA in aliasesPointer() (NFC)	2022-09-30 16:22:29 +02:00
Sanjay Patel	3f906f057c	[InstSimplify] look through vector select (shuffle) in min/max fold This is an extension of the existing min/max+select fold (which already has a very large number of variations) to allow a vector shuffle because that's what we have in the motivating example from issue #42100. A couple of Alive2 checks of variants (I don't know how to generalize these in Alive): https://alive2.llvm.org/ce/z/jUFAqT And verify the PR42100 test: https://alive2.llvm.org/ce/z/3EcASf It's possible there is some generalization of the fold or a VectorCombine/SLP answer for the motivating test, but I haven't found a better/smaller solution yet. We can also add even more variants here as follow-up patches. For example, we can have shuffle followed by min/max; we also don't have this canonicalization or the reverse: https://alive2.llvm.org/ce/z/StHD9f Differential Revision: https://reviews.llvm.org/D134879	2022-09-30 08:27:00 -04:00
Florian Hahn	8ae0d9aa07	[LoopDeletion] Clear block & loop dispo cache after breaking backedge. breakLoopBackedge may remove blocks and loops. Also clear block & loop disposition to avoid the cache containing invalid blocks and loops. The coverage for the change is provided when using an ASAN build of opt to run the LoopDeletion unit tests; without the fix, pointers to invalid objects would be used. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D134663	2022-09-30 11:21:58 +01:00
Mircea Trofin	91c96a806c	Revert "[MLGO] ML Regalloc Priority Advisor" This reverts commit `8f4f26ba5b`. Buildbot failures, e.g. https://lab.llvm.org/buildbot/#/builders/6/builds/14041	2022-09-29 18:26:40 -07:00
Eric Wang	8f4f26ba5b	[MLGO] ML Regalloc Priority Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information and then produce the log file. Differential Revision: https://reviews.llvm.org/D133616	2022-09-29 16:55:15 -05:00
Arthur Eubanks	0cdd671df9	[CGSCC][DevirtWrapper] Properly handle invalidating analyses for invalidated SCCs `f77342693` handled the adaptor and pass manager but missed the devirt wrapper.	2022-09-29 09:55:23 -07:00
Kazu Hirata	4e9dd21015	[ModuleInliner] Add a cost-benefit-based priority This patch teaches the module inliner a traversal order designed for the instrumentation FDO (+ThinLTO) scenario. The new traversal order prioritizes call sites in the following order: 1. Those call sites that are expected to reduce the caller size 2. Those call sites that have gone through the cost-benefit analysis 3. The remaining call sites With this fairly simple traversal order, a large internal benchmark yields performance comparable to the bottom-up inliner -- both in terms of the execution performance and .text* sizes. Big thanks goes to Liqiang Tao for the module inliner infrastructure. I still have hacks outside this patch to prevent excessively long compilation or .text* size explosion. I'm trying to come up with acceptable solutions in near future. Differential Revision: https://reviews.llvm.org/D134376	2022-09-29 09:00:38 -07:00
Nikita Popov	aa25c92f33	[ValueTracking] Fix CannotBeOrderedLessThanZero() for fdiv (PR58046) When checking the RHS of fdiv, we should set the SignBitOnly flag, because a negative zero can become -Inf, which is ordered less than zero. Fixes https://github.com/llvm/llvm-project/issues/58046. Differential Revision: https://reviews.llvm.org/D134876	2022-09-29 17:07:48 +02:00
Vitaly Buka	01f3e2d619	[StackLifetime] More efficient loop for LivenessType::Must A CFG with cycles may require additional passes of the "while (Changed)" iteration to propagate data back from later blocks to earlier blocks, ordered according to depth_first. OR logic, used for ::May, converges to a stable state faster than the AND logic used for ::Must. A better solution would be to switch to some form of queue, but given that this one is good enough, I will consider doing that later. We can switch ::Must to OR logic if we calculate "may be dead" instead of direct "must be alive" and then convert values to match existing interface. Additionally it fixes correctness in "@cycle" test. Reviewed By: kstoimenov, fmayer Differential Revision: https://reviews.llvm.org/D134796	2022-09-28 16:28:45 -07:00
serge-sans-paille	16544cbe64	[iwyu] Move <cmath> out of llvm/Support/MathExtras.h Interestingly, MathExtras.h doesn't use any <cmath> declarations, so move it out of that header and include it when needed. No functional change intended, but there's no longer a transitive include from MathExtras.h to cmath.	2022-09-28 20:49:01 +02:00
Vitaly Buka	07cf1a25a3	[NFC][StackLifetime] Rename local variable The next patch will require more generic name.	2022-09-27 23:25:39 -07:00
Vitaly Buka	c1bf3df576	[NFC][StackLifetime] Remove local variable	2022-09-27 23:07:03 -07:00
Philip Reames	f6d110e26f	[LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc] This is purely NFC restructure in advance of a change which actually exposes zero strides. This is mostly because I find this interface confusing each time I look at it.	2022-09-27 15:55:44 -07:00
Nikita Popov	18b7c44086	[BasicAA] Use ScopeExit to clear Visited set (NFC)	2022-09-27 10:26:30 +02:00
Florian Hahn	275bee32ad	[LoopUnroll] Forget block and loop dispositions during unrolling. After unrolling a loop, the block and loop dispositions need to be cleared. As we don't know which SCEVs in the loop/blocks may be impacted, completely clear the cache. This should also fix some cases where deleted loops remained in the LoopDispositions cache. This fixes a verification failure surfaced by D134531. I am planning on reviewing/updating the existing uses of forgetLoopDispositions to check if they should be replaced by forgetBlockAndLoopDispositions. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D134612	2022-09-27 08:49:04 +01:00
Sanjay Patel	222e1c73f3	[InstSimplify] don't commute constant expression operand in min/max calls The test shows that we would fail to consistently fold the instruction based on the max value operand. This is also the root cause for issue #57986, but I'll add an instcombine test + assert for that exact problem in another commit.	2022-09-26 16:01:09 -04:00
Sanjay Patel	e9e994838a	[InstSimplify] rearrange matching for select-of-min/max folds; NFC This makes the code a little shorter and should be easier to extend for a pattern like in issue #42100.	2022-09-26 15:02:40 -04:00
Sanjay Patel	096f1c4db4	[InstSimplify] remove redundant predicate check; NFC It's still possible that there's a simpler way to specify the conditions needed for this set of folds, but "getStrictPred" converts >= to > for example, so there's no need to explicitly check that.	2022-09-26 15:02:40 -04:00
Kazu Hirata	e2398a4d7c	[Analysis] Introduce getStaticBonusApplied (NFC) InlineCostCallAnalyzer encourages inlining of the last call to the static function by subtracting LastCallToStaticBonus from Cost. This patch introduces getStaticBonusApplied to make available the amount of LastCallToStaticBonus applied. The intent is to allow the module inliner to determine whether inlining a given call site is expected to reduce the caller size with an expression like: IC.getCost() + IC.getStaticBonusApplied() < 0 This patch does not add a use of getStaticBonus yet. Differential Revision: https://reviews.llvm.org/D134373	2022-09-25 23:21:40 -07:00
Sanjay Patel	b0bfefb6ec	[InstSimplify] fold redundant select of min/max, part 2 This extends `e5d15e1162` to handle the inverse predicates (there's probably a more elegant way to specify the preds). These patterns correspond to the existing simplify: max (min X, Y), X --> X ...and extra preds for (non)equality. The tests cycle through all 10 icmp preds for each min/max variant with 4 swapped operand patterns each (and the min/max operands are commuted in every other test within those). Some Alive2 examples to verify: https://alive2.llvm.org/ce/z/XMvEKQ https://alive2.llvm.org/ce/z/QpMChr	2022-09-25 07:06:43 -04:00
Sanjay Patel	e5d15e1162	[InstSimplify] fold redundant select of min/max This is similar to the existing simplify: max (max X, Y), X --> max X, Y ...but the select condition can be one of several predicates as shown in the tests. The tests cycle through all 10 icmp preds for each min/max variant with 4 swapped operand patterns each (and the min/max operands are commuted in every other test within those). Some Alive2 examples to verify: https://alive2.llvm.org/ce/z/lCAQm4 https://alive2.llvm.org/ce/z/kzxVXC	2022-09-24 11:34:05 -04:00
Yi Kong	32994b7357	Make MLIR model URLs cache variables This allows us to directly use the models published on Github. Differential Revision: https://reviews.llvm.org/D134566	2022-09-23 15:21:53 -07:00
Teresa Johnson	b1926f308f	Restore "[MemProf] Memprof profile matching and annotation" This reverts commit `794b7ea960`, and thus restores commit `a212d8da94`, and follow on fixes `0cd6763fa9`, `e9ff53d42f`, and `37c6a25e9a`. Use a hash function (BLAKE3) instead of hash_combine/hash_code which are not guaranteed to be stable across executions. Additionally, it adds a "REQUIRES: x86_64-linux" to the tests that have raw profile inputs to avoid failures on big endian bots. Reviewers: snehasish, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D128142	2022-09-23 11:38:47 -07:00
Simon Pilgrim	c94cbc343e	Fix gcc warning about ambiguous if-else chain Fixes warnings introduced by D111968	2022-09-23 14:36:28 +01:00
Simon Pilgrim	a6e9141505	[TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436) The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both. This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling. Fixes #50778 Differential Revision: https://reviews.llvm.org/D111968	2022-09-23 14:03:18 +01:00
Nikita Popov	6c6b48434e	[BasicAA] Clean up calculation of FMRB from attributes The current implementation for call sites is pretty convoluted when you take the underlying implementation of the used APIs into account. We will query the call site attributes, and then fall back to the function attributes while taking into account operand bundles. However, getModRefBehavior() already has its own (more accurate) logic for combining call-site FMRB with function FMRB. Clean this up by extracting a function that only fetches FMRB from attributes, which can be directly used in getModRefBehavior() for functions, and needs to be combined with an operand-bundle respecting fallback in the call site case. One caveat (that makes this non-NFC) is that CallBase function attribute lookups allow using attributes from functions with mismatching signature. To ensure we don't regress quality, do the same for the function FMRB fallback.	2022-09-23 12:05:35 +02:00
Teresa Johnson	794b7ea960	Revert "[MemProf] Memprof profile matching and annotation" This reverts commit `a212d8da94`, and follow on fixes `0cd6763fa9`, `e9ff53d42f`, and `37c6a25e9a`. After re-reading the documentation for hash_combine, I don't think this is the appropriate hash function to use for computing the hash to use as a stack id in the metadata, since it is not guaranteed to produce stable values across executions. I have not hit this problem, but plan to switch to using an MD5 hash. I am hitting an issue with one of the bots (https://lab.llvm.org/buildbot/#/builders/171/builds/20732) where the values produced are only the lower 32 bits of the expected hash values, however, which I assume is related to the implementation of hash_combine and hash_code. I believe I fixed all of the other bot failures with the follow on fixes, which I'll merge into the new version before reapplying.	2022-09-22 16:08:03 -07:00
Arthur Eubanks	a8f1da128d	[LazyCallGraph] Handle spurious ref edges when deleting a dead function Spurious ref edges are ref edges that still exist in the call graph even though the corresponding IR reference no longer exists. This can cause issues when deleting a dead function which has a spurious ref edge pointed at it because currently we expect the dead function's RefSCC to be trivial. In the case that the dead function's RefSCC is not trivial, remove all ref edges from other nodes in the RefSCC to it. Removing a ref edge can result in splitting RefSCCs. There's actually no reason to revisit those RefSCCs because currently we only run passes on SCCs, and we've already added all SCCs in the RefSCC to the worklist. (as opposed to removing the ref edge in updateCGAndAnalysisManagerForPass() which can modify the call graph of SCCs we have not visited yet). We also don't expect that RefSCC refinement will allow us to glean any more information for optimization use. Also, doing so would drastically increase the complexity of LazyCallGraph::removeDeadFunction(), requiring us to return a list of invalidated RefSCCs and new RefSCCs to add to the worklist. Fixes #56503 Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D133907	2022-09-22 15:01:15 -07:00
Teresa Johnson	37c6a25e9a	[MemProf] Fix buildbot error due to hasValue deprecation warning Use has_value instead of hasValue to address a deprecation warning from `a212d8da94`. E.g.: https://lab.llvm.org/buildbot/#/builders/57/builds/22166/steps/5/logs/stdio	2022-09-22 13:26:53 -07:00
Teresa Johnson	a212d8da94	[MemProf] Memprof profile matching and annotation Profile matching and IR annotation for memprof profiles. See also related RFCs: RFC: Sanitizer-based Heap Profiler [1] RFC: A binary serialization format for MemProf [2] RFC: IR metadata format for MemProf [3]* * Note that the IR metadata format has changed from the RFC during implementation, as described in the preceding patch adding the basic metadata and verification support. The matching is performed during the normal PGO annotation phase, to ensure that the inlines applied in the IR at that point are a subset of the inlines in the profiled binary and thus reflected in the profile's call stacks. This is important because the call frames are associated with functions in the profile based on the inlining in the symbolized call stacks, and this simplifies locating the subset of profile data relevant for matching onto each function's IR. The PGOInstrumentationUse pass is enhanced to perform matching for whatever combination of memprof and regular PGO profile data exists in the profile. Using the utilities introduced in D128854: The memprof profile data for each context is converted to "cold" or "notcold" based on parameterized thresholds for size, access count, and lifetime. The memprof allocation contexts are trimmed to the minimal amount of context required to uniquely identify whether the context is cold or not cold. For allocations where all profiled contexts have the same allocation type, no memprof metadata is attached and instead the allocation call is directly annotated with an attribute specifying the allocation type. This is the same attribute that will be applied to allocation calls once cloned for different contexts, and later used during LibCall simplification to emit allocation hints [4]. Depends on D128141 and D128854.
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html [2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html [3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165 [4] `ab87cf382d` Differential Revision: https://reviews.llvm.org/D128142	2022-09-22 12:48:31 -07:00
Nikita Popov	60990d9042	[BasicAA] Move experimental.guard modelling to getModRefBehavior() While we can't express this with attributes yet, we can model these intrinsics as readonly + writing inaccessible memory (for the control dependence) in FMRB. This way we don't need to special-case them in getModRefInfo(), it falls out of the usual logic.	2022-09-22 16:48:51 +02:00
Nikita Popov	ab25ea6d35	[AA] Model operand bundles more precisely Based on D130896, we can model operand bundles more precisely. In addition to the baseline ModRefBehavior, a reading/clobbering operand bundle may also read/write all locations. For example, a memcpy with deopt bundle can read any memory, but only write argument memory. This means that getModRefInfo() for memcpy with a pointer that does not alias the arguments results in Ref, rather than ModRef, without the need to implement any special handling. Differential Revision: https://reviews.llvm.org/D130980	2022-09-22 11:15:20 +02:00
Nikita Popov	41dde5d858	[InstSimplify] Support vectors in simplifyWithOpReplaced() We can handle vectors inside simplifyWithOpReplaced(), as long as cross-lane operations are excluded. The equality can hold (or not hold) for each vector lane independently, so we shouldn't use the replacement value from other lanes. I believe the only operations relevant here are shufflevector (where all previous bugs were seen) and calls (which might use shuffle-like intrinsics and would require more careful classification). Differential Revision: https://reviews.llvm.org/D134348	2022-09-22 10:45:42 +02:00
Philip Reames	8c46881a53	[TTI] Recognize fp constants in getOperandInfo We were recognizing vectors of floats, but not scalars. That's a tad odd.	2022-09-21 14:34:34 -07:00
Arthur Eubanks	f77342693b	[CGSCC] Properly handle invalidating analyses for invalidated SCCs Currently if we mark an SCC as invalid, if we haven't set UR.UpdatedC, we won't propagate the PreservedAnalyses up to the parent pass (adaptor/pass manager). In the provided test case, we inline the function into itself then delete it as it has no users. The SCC is marked as invalid without providing a replacement UR.UpdatedC. Then the CGSCC pass manager and adaptor discard the PreservedAnalyses. Instead, handle PreservedAnalyses first before bailing due to the invalid SCC. Fixes crashes due to out of date analyses.	2022-09-21 09:50:00 -07:00
Kazu Hirata	0a0ccc85bb	[ModuleInliner] Factor out common code in InlineOrder.cpp (NFC) This patch factors out common code in InlineOrder.cpp. Without this patch, the model is to ask classes like SizePriority and CostPriority to compare a pair of call sites: bool hasLowerPriority(const CallBase L, const CallBase R) const override { while these priority classes have their own caches of priorities: DenseMap<const CallBase *, PriorityT> Priorities; This model results in a lot of duplicate code like hasLowerPriority and updateAndCheckDecreased. This patch changes the model so that priority classes just have two methods to compute a priority for a given call site and to compare two previously computed priorities (as opposed to call sites). Facilities like hasLowerPriority and updateAndCheckDecreased move to PriorityInlineOrder along with the map from call sites to their priorities. PriorityInlineOrder becomes a template class so that it can accommodate different priority classes. Differential Revision: https://reviews.llvm.org/D134149	2022-09-21 08:50:30 -07:00
Graham Hunter	3c74ed9ee3	[LAA] Fix ICE with scAddExpr in forked pointers The IR from https://github.com/llvm/llvm-project/issues/57368 results in an assert firing when trying to create a runtime check for the forked pointer. One of the forks is fine since it's loop invariant, but the other is a scAddExpr (containing a scAddRecExpr, so not invariant) when RtCheck::insert expects a scAddRecExpr. This is a simple fix to just avoid forks which aren't AddRec or loop invariant. We can allow it as a forked pointer later with more work. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133020	2022-09-21 10:27:06 +01:00
Nikita Popov	17994ed919	[MemorySSA] Remove PerformedPhiTranslation flag I believe this is no longer necessary, as the underlying problem has been fixed in a different way: Nowadays, we will adjust the location size to beforeOrAfterPointer() if the pointer is not loop invariant. This makes merging results translated across loop backedges safe. The two tests in phi-translation.ll show an improvement while still being correct: The loads in the loop no longer alias with noalias pointers, but still alias with the store in the entry block (which they originally did not -- this is the bug that PerformedPhiTranslation originally fixed). Differential Revision: https://reviews.llvm.org/D133404	2022-09-21 10:32:09 +02:00
Matt Arsenault	1a18fe65d3	Analysis: Remove redundant assertion This assert guards the same assertion inside getTypeStoreSizeInBits	2022-09-20 09:39:45 -04:00
Matt Arsenault	1e1aefbf70	Analysis: Pass AssumptionCache through isKnownNonZero Pass this through now that isDereferenceableAndAlignedPointer has access to this.	2022-09-20 09:25:18 -04:00
Mateusz Mikuła	4f30c5808a	[TargetLibraryInfo] Mark memrchr as unavailable on Windows Otherwise LLVM will optimise strrchr into memrchr on Windows resulting in linker error: ``` $ cat memrchr_test.c int main(int argc, char *argv) { return (long)strrchr("KkMm", argv[argc-1][0]); } $ clang memrchr_test.c -O memrchr_test.c:3:12: warning: cast to smaller integer type 'long' from 'char ' [-Wpointer-to-int-cast] return (long)strrchr("KkMm", argv[argc-1][0]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. ld.lld: error: undefined symbol: memrchr >>> referenced by D:/msys64/tmp/memrchr_test-e7aabd.o:(main) clang: error: linker command failed with exit code 1 (use -v to see invocation) ``` Example taken from MSYS2 Discord and tested with windows-gnu target. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D134134	2022-09-20 10:50:31 +03:00
luxufan	bfd31c6f12	[MemorySSA][NFC] Use const whenever possible Differential Revision: https://reviews.llvm.org/D134162	2022-09-20 02:21:02 +00:00
Matt Arsenault	2adae8e1b7	VectorCombine: Pass through AssumptionCache	2022-09-19 19:25:22 -04:00
Matt Arsenault	ce44357216	Analysis: Add AssumptionCache to isSafeToSpeculativelyExecute Does not update any of the uses.	2022-09-19 19:25:22 -04:00
Matt Arsenault	0d8ffcc532	Analysis: Add AssumptionCache argument to isDereferenceableAndAlignedPointer This does not try to pass it through from the end users.	2022-09-19 18:57:33 -04:00
Nikita Popov	36f325413e	[SCEV] Don't verify dispositions of invalid loops This should fix the expensive checks build. Ideally we would not have invalid loops in LoopDispositions.	2022-09-19 15:07:44 +02:00
Max Kazantsev	bb68b2402d	[SCEV] Verify contents of loop disposition cache It seems that it is sometimes broken. Initial motivation for this was investigation of https://github.com/llvm/llvm-project/issues/56260, but it also seems that we have found an unrelated bug in LoopFusion that leaves broken caches. Differential Revision: https://reviews.llvm.org/D134158 Reviewed By: nikic	2022-09-19 17:43:00 +07:00
Max Kazantsev	818b1ab84e	[SCEV][NFC] Remove unused parameter from forgetLoopDispositions Let's be honest about it, we don't drop loop dispositions for particular loops. Remove the parameter that misleadingly makes it apparent that we do.	2022-09-19 14:06:42 +07:00
Kazu Hirata	71b12030b9	[ModuleInliner] Capitalize a variable name (NFC)	2022-09-18 14:35:09 -07:00
Kazu Hirata	82293ed486	[ModuleInliner] Remove unused using declarations (NFC)	2022-09-18 14:27:06 -07:00
Kazu Hirata	00d982699b	[ModuleInliner] Move getInlineCostWrapper to an anonymous namespace (NFC) This patch moves getInlineCostWrapper to an anonymous namespace. While I am at it, I'm moving the function closer to the beginning of the file so that I can use it elsewhere in the file without a forward declaration.	2022-09-18 14:09:21 -07:00
Kazu Hirata	d3b95ecc98	[ModuleInliner] Remove InlineOrder::front (NFC) InlineOrder::front is a remnant from the era when we had nested "while" loops in the module inliner, with the inner one grouping the call sites with the same caller. Now that we have a simple "while" loop draining the priority queue, we can just use InlineOrder::pop. Differential Revision: https://reviews.llvm.org/D134121	2022-09-18 08:49:44 -07:00
Kazu Hirata	cf355bf36e	[Analysis] Introduce isSoleCallToLocalFunction (NFC) We check to see if a given CallBase is a sole call to a local function at multiple places in InlineCost.cpp. This patch factors out the common code. Differential Revision: https://reviews.llvm.org/D134114	2022-09-17 20:59:54 -07:00
Kazu Hirata	20d764aff0	[llvm] Don't include SetVector.h (NFC) llvm/lib/ProfileData/RawMemProfReader.cpp uses SetVector without including SetVector.h, so this patch adds an appropriate #include there.	2022-09-17 12:36:43 -07:00
Kazu Hirata	5faf4bf195	[ModuleInliner] Move UseInlinePriority to InlineOrder.cpp (NFC) UseInlinePriority specifies the priority function. This patch simplifies the code by moving UseInlinePriority closer to the actual consumer -- the switch statement inside getInlineOrder. Differential Revision: https://reviews.llvm.org/D134100	2022-09-17 11:41:28 -07:00
Kazu Hirata	6e30a9cc08	[Inliner] Retire DefaultInlineOrder (NFC) DefaultInlineOrder was largely an exercise in generalizing the traversal order of call sites within the inliner. Now that the module inliner is starting to form its shape, there is no point in sharing DefaultInlineOrder between the module inliner and the CGSCC inliner. DefaultInlineOrder and all the other inline orders are mutually exclusive in the following sense: - The use of DefaultInlineOrder doesn't make sense in the module inliner because there is no priority inherent in the order in which call sites are added to the list of call sites -- SmallVector. - The use of any other inline order doesn't make sense in the CGSCC inliner because little prioritization can be done within one CGSCC. This patch essentially reverts the addition of DefaultInlineOrder so that the loop structure of Inliner.cpp looks like the state just before we started working on the module inliner (circa June 2021). At the same time, we remove the choice of DefaultInlineOrder from UseInlinePriority. Differential Revision: https://reviews.llvm.org/D134080	2022-09-16 15:36:40 -07:00
Kazu Hirata	e0bc76eb23	[ModuleInliner] Move InlinePriority and its derived classes to InlineOrder.cpp (NFC) These classes are referred to only from getInlineOrder in InlineOrder.cpp. This patch hides the entire class declarations and definitions in InlineOrder.cpp. Differential Revision: https://reviews.llvm.org/D134056	2022-09-16 12:32:16 -07:00
Nikita Popov	b1cd393f9e	[AA] Tracking per-location ModRef info in FunctionModRefBehavior (NFCI) Currently, FunctionModRefBehavior tracks whether the function reads or writes memory (ModRefInfo) and which locations it can access (argmem, inaccessiblemem and other). This patch changes it to track ModRef information per-location instead. To give two examples of why this is useful: * D117095 highlights a weakness of ModRef modelling in the presence of operand bundles. For a memcpy call with deopt operand bundle, we want to say that it can read any memory, but only write argument memory. This would allow them to be treated like any other calls. However, we currently can't express this and have to say that it can read or write any memory. * D127383 would ideally be modelled as a separate threadid location, where threadid Refs outside pre-split coroutines can be ignored (like other accesses to constant memory). The current representation does not allow modelling this precisely. The patch as implemented is intended to be NFC, but there are some obvious opportunities for improvements and simplification. To fully capitalize on this we would also want to change the way we represent memory attributes on functions, but that's a larger change, and I think it makes sense to separate out the FunctionModRefBehavior refactoring. Differential Revision: https://reviews.llvm.org/D130896	2022-09-14 16:34:41 +02:00
Nikita Popov	1cfbbba15b	[AA] Remove unnecessary intersections from getModRefBehavior() (NFC) Intersection with other providers is performed by AAResults. Doing this here is both pointless and confusing.	2022-09-14 14:26:39 +02:00
Nikita Popov	31cc0ab321	[BasicAA] Delay getAllocTypeSize() call (NFC) This call is expensive, so don't perform it for zero indices. Also rename the variable to use Alloc rather than Alloca, this doesn't have anything to do with allocas in particular.	2022-09-13 10:24:50 +02:00
Matthias Gehre	c1502425ba	Move TargetTransformInfo::maxLegalDivRemBitWidth -> TargetLowering::maxSupportedDivRemBitWidth Also remove new-pass-manager version of ExpandLargeDivRem because there is no way yet to access TargetLowering in the new pass manager. Differential Revision: https://reviews.llvm.org/D133691	2022-09-12 17:06:16 +01:00
zhongyunde	8a15695be2	[AA] Improve the BasicAA analysis capability According to https://discourse.llvm.org/t/memoryssa-does-the-accessedbetween-support-scalable-vector-pointer/65052, scalable vector support in BasicAA is currently quite limited; it can be improved for a constant-offset GEP when the scalable index is zero, e.g.: getelementptr <vscale x 4 x i32>, ptr %p, i64 0, i64 %i Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D133567	2022-09-12 19:41:17 +08:00
Junduo Dong	6975ab7126	[Clang] Reimplement time tracing of NewPassManager by PassInstrumentation framework The previous implementation of time tracing in NewPassManager is direct but messy. The key code looks like the demo below: ``` /// Runs the function pass across every function in the module. PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM, LazyCallGraph &CG, CGSCCUpdateResult &UR) { /// ... PreservedAnalyses PassPA; { TimeTraceScope TimeScope(Pass.name()); PassPA = Pass.run(F, FAM); } /// ... } ``` It is bothersome to decide by hand where the tracing code should be added. With the PassInstrumentation framework, we can easily add `Before/After` callback functions that add the time tracing code. Differential Revision: https://reviews.llvm.org/D131960	2022-09-11 05:42:55 -07:00
Aiden Grossman	ec83c7e358	[MLGO] Make TFLiteUtils throw an error if some features haven't been passed to the model In the Tensorflow C lib utilities, an error gets thrown if some features haven't gotten passed into the model (due to differences in ordering which now don't exist with the transition to TFLite). However, this is not currently the case when using TFLiteUtils. This patch makes some minor changes to throw an error when not all inputs of the model have been passed, which when not handled will result in a seg fault within TFLite. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D133451	2022-09-10 22:59:03 +00:00
Nikita Popov	a9f312c7f4	[AST] Use BatchAA in aliasesUnknownInst() (NFCI)	2022-09-09 15:54:48 +02:00
Mircea Trofin	a219a8a822	[mlgo][nfc] Set logging level to warning or higher for TFLite	2022-09-08 12:10:56 -07:00
Nikita Popov	98a3a340c3	[ConstantExpr] Don't create fneg expressions Don't create fneg expressions unless explicitly requested by IR or bitcode.	2022-09-07 11:27:25 +02:00
Matthias Gehre	2090e85fee	[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 This adds the ExpandLargeDivRem to the default pass pipeline. The limit at which it expands div/rem instructions is configured via a new TargetTransformInfo hook (default: no expansion) X86, Arm and AArch64 backends implement this hook to expand div/rem instructions with more than 128 bits. Differential Revision: https://reviews.llvm.org/D130076	2022-09-06 15:32:04 +01:00
Sanjay Patel	a8fcb51242	[InstSimplify] allow poison/undef in constant match for "C - X ==/!= X -> false/true" This fold was added with `5e9522c311`, but over-specified. We can assume that an undef element is an odd number: https://alive2.llvm.org/ce/z/djQmWU	2022-09-06 08:19:30 -04:00
eopXD	ea3630e8d4	[CMake][MLGO] Fix cmake for MLGO The if-statement should check whether TFLITE is on or not rather than if the variable is specified. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D132902	2022-09-06 00:32:08 -07:00
Arthur Eubanks	7e3aa8f01a	Revert "[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests" This reverts commit `57fd866551`. Causes crashes, see comments in D132581.	2022-09-05 15:42:48 -07:00
LiaoChunyu	456c7ef68e	[InstSimplify][NFC] shortened the code	2022-09-05 23:57:53 +08:00
LiaoChunyu	5e9522c311	[InstSimplify] Odd - X ==/!= X -> false/true Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D132989	2022-09-05 23:51:45 +08:00
Kazu Hirata	3850edd9e0	Use llvm::count_if (NFC)	2022-09-03 11:17:35 -07:00
Simon Pilgrim	e2d140e9c3	[TTI] Add isExpensiveToSpeculativelyExecute wrapper CGP uses a raw `getInstructionCost(I, TargetTransformInfo::TCK_SizeAndLatency) >= TCC_Expensive` check to see if it's better to move an expensive instruction used in a select behind a branch instead. This is causing issues with upcoming improvements to TCK_SizeAndLatency costs on X86 as we need to use TCK_SizeAndLatency as an uop count (so it's compatible with various target-specific buffer sizes - see D132288), but we can have instructions that have a low TCK_SizeAndLatency value but should still be treated as 'expensive' (FDIV for example) - by adding an isExpensiveToSpeculativelyExecute wrapper we can keep the current behaviour but still add an x86 override in a future patch when the cost tables are updated to compensate.	2022-09-03 13:12:22 +01:00
Arthur Eubanks	57fd866551	[LoopPassManager] Implement and use LoopNestAnalysis::run() instead of manually creating LoopNests The current code is basically just emulating what the analysis manager does. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D132581	2022-09-02 10:55:53 -07:00
Rong Xu	0caa4a9559	[PGO] Support PGO annotation of CallBrInst We currently instrument CallBrInst but do not annotate it with the branch weight. This patch enables PGO annotation of CallBrInst. Differential Revision: https://reviews.llvm.org/D133040	2022-09-01 14:13:50 -07:00
Nikita Popov	4f046bc8e0	[PHITranslateAddr] Require dominance when searching for translated address (PR57025) This is a fix for PR57025 and an alternative to D131776. The problem in the phi-translation-to-wrong-context.ll test case is that phi translation of %gep.j into if2 picks %gep.i as the result. While this instruction has the correct pointer address, it occurs in a context where %i != 0. As such, we get a NoAlias result for the store in if2, even though they do alias for %i == 0 (which is legal in the original context of the pointer). PHITranslateValue already has a MustDominate option, which can be used to restrict PHI translation results to values that dominate the translated-into block. However, this is more aggressive than what we need and would significantly regress GVN results. In particular, if we have a pointer value that does not require any translation, then it is fine to continue using that value in the predecessor, because the context is still correct for the original query. We only run into problems if PHITranslateSubExpr() picks a completely random instruction in a context that may have preconditions that do not hold. Fix this by always performing the dominance checks in PHITranslateSubExpr(), without enabling the more general MustDominate requirement. Fixes https://github.com/llvm/llvm-project/issues/57025. This also fixes the test case for https://github.com/llvm/llvm-project/issues/30999, but I'm not sure whether that's just the particular test case, or a general solution to the problem. Differential Revision: https://reviews.llvm.org/D132935	2022-09-01 16:26:42 +02:00
Pavel Samolysov	88581db62f	[LazyCallGraph] Reformat the code in accordance with the code style. NFC Also, some local variables were renamed in accordance with the code style as well as `std::tie` occurrences and `.first`/`.second` member uses were replaced with structure bindings. Differential Revision: https://reviews.llvm.org/D132806	2022-08-30 11:06:42 +03:00
Kazu Hirata	6ed2cb4ad5	Revert "[llvm] Use llvm::is_contained (NFC)" This reverts commit `ebf574f59a`. This patch seems to cause build failures on Windows.	2022-08-28 18:52:49 -07:00
Kazu Hirata	ebf574f59a	[llvm] Use llvm::is_contained (NFC)	2022-08-28 17:35:03 -07:00
Kazu Hirata	ce9f007c7c	[llvm] Use llvm::find_if (NFC)	2022-08-28 10:41:48 -07:00
Kazu Hirata	7a617fdf39	Use std::gcd (NFC) This patch replaces calls to GreatestCommonDivisor64 with std::gcd where both arguments are known to be of unsigned types no larger than 64 bits in size.	2022-08-27 21:20:59 -07:00
Arthur Eubanks	7a94d189ad	[LazyCallGraph] Update libcall list when replacing a libcall node's function Otherwise when we visit all libcalls in updateCGAndAnalysisManagerForPass(), the old libcall is dead and doesn't have a node. We treat libcalls conservatively in LazyCallGraph because any function may introduce calls to them out of thin air. It is weird to change the signature of a libcall since introducing calls to the libcall with a different signature may break, but other passes like deadargelim already do it, so let's preserve this behavior for now. Fixes an issue found in D128830. Reviewed By: psamolysov Differential Revision: https://reviews.llvm.org/D132764	2022-08-27 10:57:53 -07:00
Mircea Trofin	3546b5c520	[mlgo] Fix flaky test The source of the flakiness is internal uninitialized buffers due to a dangling variable in the model.	2022-08-26 21:29:25 -07:00
Florian Hahn	9405af1c85	[LAA] Require AddRecs to be in the innermost loop for diff-checks. The simpler diff-checks require pointers with add-recs from the same innermost loop, but this property wasn't checked completely. Add the missing check to ensure both addrecs are in the innermost loop. Fixes #57315.	2022-08-26 20:39:52 +01:00
Philip Reames	86b67a310d	[LAA] Prune dependencies with distance larger than access implied by trip count When we have a dependency with a dependence distance which can only be hit on an iteration beyond the actual trip count of the loop, we can ignore that dependency when analyzing said loop. We already had this code, but had restricted it solely to unknown dependence distances. This change applies it to all dependence distances. Without this code, we relied on the vectorizer reducing VF such that our infeasible dependence was respected. This usually worked out to about the same result, but not always. For fixed length vectorization, this could mean a smaller VF than optimal being chosen or additional runtime checks. For scalable vectorization - where the bounds on access implied by VF are broader - we could often not find a feasible VF at all. Differential Revision: https://reviews.llvm.org/D131924	2022-08-25 14:24:13 -07:00
Sanjay Patel	4e44c22c97	[ValueTracking][InstCombine] restrict FP min/max matching to avoid miscompile This is a long-standing FIXME with a non-FMF test that exposes the bug as shown in issue #57357. It's possible that there's still a way to miscompile by mis-identifying/mis-folding FP min/max patterns, but this patch only exposes a couple of seemingly minor regressions while preventing the broken transform.	2022-08-25 16:52:40 -04:00
Florian Hahn	c035efc814	[LAA] Cache PSE.getSE() in variable (NFC). Preparation for follow-up patches that will introduce additional uses of SE.	2022-08-25 21:40:22 +01:00


11972 Commits
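
Note on `5e9522c311` and `a8fcb51242` ("Odd - X ==/!= X -> false/true"): the fold rests on the fact that `C - X == X` is equivalent to `C == 2 * X` in wrapping arithmetic, and `2 * X` is always even. The following is a minimal sketch that checks this exhaustively for a small bit width. It is not LLVM code; the helper name `find_equal_x` and the 8-bit default are assumptions made for illustration.

```python
from typing import Optional


def find_equal_x(c: int, bits: int = 8) -> Optional[int]:
    """Return the smallest X with (C - X) == X in wrapping `bits`-bit
    arithmetic, or None if no such X exists.

    Hypothetical helper for illustration only; not part of LLVM.
    """
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        # Emulate fixed-width wrapping subtraction by masking the result.
        if ((c - x) & mask) == x:
            return x
    return None


def odd_constants_never_match(bits: int = 8) -> bool:
    """Check that no odd constant C has a solution to C - X == X."""
    return all(find_equal_x(c, bits) is None for c in range(1, 1 << bits, 2))
```

For an even constant a solution exists (for example `C = 2`, `X = 1`), which is why the fold is restricted to odd constants.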