Commit Graph

32158 Commits

Author SHA1 Message Date
Chuanqi Xu 2c61848c9d [Coroutines] Handle the writes to promise alloca prior to
llvm.coro.begin

Previously we've taken care of the writes to allocas prior to llvm.coro.begin.
However, since the promise alloca is special so that we never handled it
before. For the long time, since the programmers can't access the
promise_type due to the c++ language specification, we still failed to
recognize the problem until a recent report:
https://github.com/llvm/llvm-project/issues/57861

And we've tested many codes that the problem gone away after we handle
the writes to the promise alloca prior to @llvm.coro.begin()
prope until a recent report:
https://github.com/llvm/llvm-project/issues/57861

And we've tested many codes that the problem gone away after we handle
the writes to the promise alloca prior to @llvm.coro.begin() properly.

Closes https://github.com/llvm/llvm-project/issues/57861
2022-11-18 15:39:39 +08:00
Alexey Bataev 07015e12f0 [SLP]Fix PR59053: trying to erase instruction with users.
Need to count the reduced values, vectorized in the tree but not in the top node. Such scalars still must be extracted out of the vector node instead of the original scalar.
2022-11-17 17:23:48 -08:00
Fangrui Song 6b852ffa99 [Sink] Process basic blocks with a single successor
This condition seems unnecessary.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93511
2022-11-18 01:23:12 +00:00
Fangrui Song fc91c70593 Revert D135411 "Add generic KCFI operand bundle lowering"
This reverts commit eb2a57ebc7.

llvm/include/llvm/Transforms/Instrumentation/KCFI.h including
llvm/CodeGen is a layering violation. We should use an approach where
Instrumementation/ doesn't need to include CodeGen/.
Sorry for not spotting this in the review.
2022-11-17 22:45:30 +00:00
Sami Tolvanen eb2a57ebc7 Add generic KCFI operand bundle lowering
The KCFI sanitizer emits "kcfi" operand bundles to indirect
call instructions, which the LLVM back-end lowers into an
architecture-specific type check with a known machine instruction
sequence. Currently, KCFI operand bundle lowering is supported only
on 64-bit X86 and AArch64 architectures.

As a lightweight forward-edge CFI implementation that doesn't
require LTO is also useful for non-Linux low-level targets on
other machine architectures, add a generic KCFI operand bundle
lowering pass that's only used when back-end lowering support is not
available and allows -fsanitize=kcfi to be enabled in Clang on all
architectures.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D135411
2022-11-17 21:55:00 +00:00
Mengxuan Cai cd58333a62 [LoopInterchange] Refactor and rewrite validDepInterchange()
The current code of validDepInterchange() enumerates cases that are
legal for interchange. This could be simplified by checking
lexicographically order of the swapped direction matrix.

Reviewed By: congzhe, Meinersbur, bmahjour

Differential Revision: https://reviews.llvm.org/D137461
2022-11-17 13:41:02 -05:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Evgeniy Brevnov 50f8eb05af Revert "[JT] Preserve exisiting BPI/BFI during JumpThreading"
This reverts commit 52a4018506.
2022-11-17 17:11:47 +07:00
Evgeniy Brevnov 52a4018506 [JT] Preserve exisiting BPI/BFI during JumpThreading
Currently, JT creates and updates local instances of BPI\BFI. As a result global ones have to be invalidated if JT made any changes.
In fact, JT doesn't use any information from BPI/BFI for the sake of the transformation itself. It only creates BPI/BFI to keep them up to date. But since it updates local copies (besides cases when it updates profile metadata) it just waste of time.

Current patch is a rework of D124439. D124439 makes one step and replaces local copies with global ones retrieved through AnalysisPassManager. Here we do one more step and don't create BPI/BFI if the only reason of creation is to keep BPI/BFI up to date. Overall logic is the following. If there is cached BPI/BFI then update it along the transformations. If there is no existing BPI/BFI, then create it only if it is required to update profile metadata.

Please note if BPI/BFI exists on exit from JT (either cached or created) it is always up to date and no reason to invalidate it.

Differential Revision: https://reviews.llvm.org/D136827
2022-11-17 17:00:00 +07:00
Fangrui Song 12050a3fb7 [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

This fixes two problems.

(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```

(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```

If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.

Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
GlobalAlias may refer to a ConstantExpr, not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche.
For simplicity we don't handle it yet). In addition, available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.

See windows-vftable.ll: Windows vftable uses an alias pointing to a
private constant where the alias is the COMDAT leader. The COMDAT use case
is skeptical and ThinLTO does not discard the alias in the non-prevailing COMDAT.
This patch retains the behavior.

See new tests ctor-dtor-alias2.ll: depending on whether the complete object
destructor emitted, when ctor/dtor aliases are used, we may see D0/D2 COMDATs in
one TU and D0/D1/D2 in a D5 COMDAT in another TU.
Allow such a mix-and-match with `if (GO->getComdat()->getName() == GO->getName()) NonPrevailingComdats.insert(GO->getComdat());`

GlobalAlias handling in ThinLTO is still weird, but this patch should hopefully
improve the situation for at least all cases I can think of.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-11-16 22:13:22 -08:00
Fangrui Song 2c239da691 Revert D135427 "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally"
This reverts commit 8901635423.

This change broke the following example and we need to check `if (GO->getComdat()->getName() == GO->getName())`
before `NonPrevailingComdats.insert(GO->getComdat());`
Revert for clarify.

```
// a.cc
template <typename T>
struct A final { virtual ~A() {} };
extern "C" void aa() { A<int> a; }
// b.cc
template <typename T>
struct A final { virtual ~A() {} };
template struct A<int>;
extern "C" void bb(A<int> *a) { delete a; }

clang -c -fpic -O0 -flto=thin a.cc && ld.lld -shared a.o b.o
```
2022-11-16 21:43:50 -08:00
Florian Hahn 55f56cdc33
[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC).
This clarifies the intention of code that uses the helper.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:12:40 +00:00
Florian Hahn aa16689f82
[VPlan] Use recipe type to avoid getDefiningRecipe call (NFC).
Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:03:34 +00:00
Florian Hahn 239b52d4b6
[VPlan] Update stale comment (NFC).
Update comment to reflect current code, which also allows for
VPScalarIVStepsRecipes to be uniform.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:39:50 +00:00
Florian Hahn bcc9c5d959
[LV] Replace unnecessary cast_or_null with cast (NFC).
The existing code already unconditionally dereferences RepR, so
cast_or_null can be replaced by just cast.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:31:59 +00:00
Florian Hahn 32f1c5531b
[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC)
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually case to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.

Also rename it to getDefiningRecipe to make the name a bit clearer.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136068
2022-11-16 22:12:08 +00:00
Alexey Bataev 9f9fdab9f1 [SLP]Fix PR58766: deleted value used after vectorization.
If same instruction is reduced several times, but in one graph is part
of buildvector sequence and in another it is vectorized, we may loose
information that it was part of buildvector and must be extracted from
later vectorized value.
2022-11-16 10:57:03 -08:00
Alexey Bataev 2f8f17c157 [SLP]Fix PR58956: fix insertpoint for reduced buildvector graphs.
If the graph is only the buildvector node without main operation, need
to inherit insrtpoint from the redution instruction. Otherwise the
compiler crashes trying to insert instruction at the entry block.
2022-11-16 07:38:49 -08:00
OCHyams 4898568caa [Assignment Tracking][11/*] Update RemoveRedundantDbgInstrs
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Update the RemoveRedundantDbgInstrs utility to avoid sometimes losing
information when deleting dbg.assign intrinsics.

removeRedundantDbgInstrsUsingBackwardScan - treat dbg.assign intrinsics that
are not linked to any instruction just like dbg.values. That is, in a block of
contiguous debug intrinsics, delete all other than the last definition for a
fragment. Leave linked dbg.assign intrinsics in place.

removeRedundantDbgInstrsUsingForwardScan - Don't delete linked dbg.assign
intrinsics and don't delete the next intrinsic found even if it would otherwise
be eligible for deletion.

remomveUndefDbgAssignsFromEntryBlock - Delete undef and unlinked dbg.assign
intrinsics encountered in the entry block that come before non-undef
non-unlinked intrinsics for the same variable.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133294
2022-11-16 12:27:18 +00:00
Paul Kirth 0cc8752fa1 Revert "[pgo] Avoid introducing relocations by using private alias"
This reverts commit 2b8917f8ad.

This breaks with lld and gold
2022-11-16 03:38:14 +00:00
eopXD c0ef83e3b9 [LSR] Check if terminating value is safe to expand before transformation
According to report by @JojoR, the assertion error was hit hence we need
to have this check before the actual transformation.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D136415
2022-11-15 14:56:47 -08:00
Arthur Eubanks 70dc3b811e [AggressiveInstCombine] Remove legacy PM pass
As part of legacy PM optimization pipeline removal.

This shouldn't be used in codegen pipelines so it should be ok to remove.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D137116
2022-11-15 14:35:15 -08:00
Alexey Bataev 0a33ceee01 [SLP]Fix a crash on analysis of the vectorized node.
Need to use advanced check for the same vectorized node to avoid
possible compiler crash. We may have 2 similar nodes (vector one and
gather) after graph nodes rotation, need to do extra checks for the
exact match.
2022-11-15 13:40:28 -08:00
Mikhail Goncharov 4a77d96903 [LegacyPM] remove unset variables in PassManagerBuilder
D137915 stopped setting this variables but NewGVN was still used and caused asan failure

Differential Revision: https://reviews.llvm.org/D138034
2022-11-15 17:57:44 +01:00
Paul Kirth 2b8917f8ad [pgo] Avoid introducing relocations by using private alias
Instead of using the public, interposable symbol, we can use a private
alias and avoid relocations and addends.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D137982
2022-11-15 16:05:24 +00:00
OCHyams 139e08efc5 [Assignment Tracking][23/*] Account for assignment tracking in SLP Vectorizer
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

The SLP-Vectorizer can merge a set of scalar stores into a single vectorized
store. Merge DIAssignID intrinsics from the scalar stores onto the new
vectorized store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133320
2022-11-15 15:20:18 +00:00
Fraser Cormack cfb0d628a7 [MergeICmps][NFC] Fix a couple of typos in a comment 2022-11-15 14:46:23 +00:00
OCHyams bfa7f62412 [Assignment Tracking][20/*] Account for assignment tracking in DSE
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

DeadStoreElimmination shortens stores that are shadowed by later stores such
that the overlapping part of the earlier store is omitted. Insert an unlinked
dbg.assign intrinsic with a variable fragment that describes the omitted part
to signal that that fragment of the variable has a stale value in memory.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133315
2022-11-15 13:42:56 +00:00
Sameer Sahasrabuddhe 376d0469b9 [AAPointerInfo] refactor how offsets and Access objects are tracked
This restores commit b756096b0c, which was
originally reverted in 00b09a7b18.

AAPointerInfo now maintains a list of all Access objects that it owns, along
with the following maps:

- OffsetBins: OffsetAndSize -> { Access }
- InstTupleMap: RemoteI x LocalI -> Access

A RemoteI is any instruction that accesses memory. RemoteI is different from
LocalI if and only if LocalI is a call; then RemoteI is some instruction in the
callgraph starting from LocalI.

Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets
the value to Unknown if the new offset is not the same as the old offset. The
instruction must now be moved from its current bin to the bin corresponding to
the new offset. This happens for example, when:

- A PHINode has operands that result in different offsets.
- The same remote inst is reachable from the same local inst via different paths
  in the callgraph:

```
               A (local inst)
               |
               B
              / \
             C1  C2
              \ /
               D (remote inst)

```
This fixes a bug where a store is incorrectly eliminated in a lit test.

Reviewed By: jdoerfert, ye-luo

Differential Revision: https://reviews.llvm.org/D136526
2022-11-15 18:52:11 +05:30
OCHyams 98562e8bb7 [Assignment Tracking][19/*] Account for assignment tracking in ADCE
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

In an attempt to preserve more info, don't delete dbg.assign intrinsics that
are considered "out of scope" if they're linked to instructions.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133314
2022-11-15 13:13:38 +00:00
OCHyams 2da67e8053 [Assignment Tracking][18/*] Account for assignment tracking in LICM
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Merge DIAssignID attachments on stores that are merged and sunk out of
loops. The store may be sunk into multiple exit blocks, and in this case all
the copies of the store get the same DIAssignID.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133313
2022-11-15 12:24:16 +00:00
OCHyams e292f91291 [Assignment Tracking][17/*] Account for assignment tracking in memcpyopt
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Maintain and propagate DIAssignID attachments in memcpyopt.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133312
2022-11-15 11:51:10 +00:00
OCHyams 98c1d11492 [Assignment Tracking][16/*] Account for assignment tracking in mldst-motion
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

mldst-motion will merge and sink the stores in if-diamond branches into the
common successor. Attach a merged DIAssignID to the merged store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133311
2022-11-15 11:28:20 +00:00
OCHyams 0946e463e8 [Assignment Tracking][12/*] Account for assignment tracking in mem2reg
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

The changes for assignment tracking in mem2reg don't require much of a
deviation from existing behaviour. dbg.assign intrinsics linked to an alloca
are treated much in the same way as dbg.declare users of an alloca, except that
we don't insert dbg.value intrinsics to describe assignments when there is
already a dbg.assign intrinsic present, e.g. one linked to a store that is
going to be removed.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133295
2022-11-15 11:11:57 +00:00
Arthur Eubanks cbcf123af2 [LegacyPM] Remove cl::opts controlling optimization pass manager passes
Move these to the new PM if they're used there.

Part of removing the legacy pass manager for optimization pipeline.

Reland with UseNewGVN usage in clang removed.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D137915
2022-11-14 09:38:17 -08:00
Arthur Eubanks d7c1427953 Revert "[LegacyPM] Remove cl::opts controlling optimization pass manager passes"
This reverts commit 7ec05fec71.

Breaks bots, e.g. https://lab.llvm.org/buildbot#builders/217/builds/15008
2022-11-14 09:33:38 -08:00
Arthur Eubanks 7ec05fec71 [LegacyPM] Remove cl::opts controlling optimization pass manager passes
Move these to the new PM if they're used there.

Part of removing the legacy pass manager for optimization pipeline.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D137915
2022-11-14 09:23:17 -08:00
Nikita Popov 6d98f3a6df [LoopVersioningLICM] Clarify scope of AST (NFC)
Make it clearer that the AST is only temporarily used during the
legality check, and does not have to survive into the transformation
phase.
2022-11-14 15:28:59 +01:00
Nikita Popov 47eddbbf33 [LoopVersioningLICM] Remove unnecessary reset code (NFC)
The LoopVersioningLICM object is only ever used for a single loop,
but there was various unnecessary code for handling the case where
it is reused across loops. Drop that code, and pass the loop to the
constructor.
2022-11-14 15:18:26 +01:00
HanSheng Zhang 200f3410cd [reg2mem] Skip non-sized Instructions (PR58890)
We can only convert sized values into alloca/load/store, skip
instructions returning other types.

Fixes https://github.com/llvm/llvm-project/issues/58890.

Differential Revision: https://reviews.llvm.org/D137700
2022-11-14 12:47:39 +01:00
Nikita Popov 42d9261417 [ConstraintElimination] Use SmallVectorImpl (NFC)
When passing a SmallVector by reference, don't specify its size.
2022-11-14 11:01:15 +01:00
Sebastian Neubauer ce879a03c9 [Coroutines] Do not add allocas for retcon coroutines
Same as for async-style lowering, if there are no resume points in a
function, the coroutine frame pointer will be replaced by an undef,
making all accesses to the frame undefinde behavior.

Fix this by not adding allocas to the coroutine frame if there are no
resume points.

Differential Revision: https://reviews.llvm.org/D137866
2022-11-14 10:46:46 +01:00
Nikita Popov e82b5b5bbd [ConstraintElimination] Add Decomposition struct (NFCI)
Replace the vector of DecompEntry with a struct that stores the
constant offset separately. I think this is cleaner than giving the
first element special handling.

This probably also fixes some potential ubsan errors by more
consistently using addWithOverflow/multiplyWithOverflow.
2022-11-14 10:44:16 +01:00
Nikita Popov 30982a595d [ConstraintElimination] Make decompose() infallible
decompose() currently returns a mix of {} and 0 + 1*V on failure.
This changes it to always return the 0 + 1*V form, thus making
decompose() infallible.

This makes the code marginally more powerful, e.g. we now fold
sub_decomp_i80 by treating the constant as a symbolic value.

Differential Revision: https://reviews.llvm.org/D137847
2022-11-14 10:42:04 +01:00
Dmitry Makogon 10ab29ec6e [IRCE] Bail out if AddRec in icmp is for another loop (PR58912)
When IRCE runs on outer loop and sees a check of an AddRec of
inner loop, it crashes with an assert in SCEV that the AddRec
must be loop invariant.
This adds a bail out if the AddRec which is checked in icmp
is for another loop.

Fixes https://github.com/llvm/llvm-project/issues/58912.

Differential Revision: https://reviews.llvm.org/D137822
2022-11-14 15:06:13 +07:00
luxufan 98eb917939 [LoopFlatten] Forget all block and loop dispositions after flatten
Method forgetLoop only forgets expression of phi or its users. SCEV
expressions except the above mentioned may still has loop dispositions
that point to the destroyed loop, which might cause a crash.

Fixes: https://github.com/llvm/llvm-project/issues/58865

Reviewed By: nikic, fhahn

Differential Revision: https://reviews.llvm.org/D137651
2022-11-14 10:19:11 +08:00
Florian Hahn 7854a1abfd
[SimpleLoopUnswitch] Forget SCEVs for replaced phis.
Forget SCEVs based on exit phis in case SCEV looked through the phi.
After unswitching, it may not be possible to look through the phi due to
it having multiple incoming values, so it needs to be re-computed.

Fixes #58868
2022-11-13 17:38:39 +00:00
Sanjay Patel 362c23500a Revert "[InstCombine] allow more folds for multi-use selects (2nd try)"
This reverts commit 6eae6b3722.

This version of the patch results in the same DFSAN bot failure as before,
so my guess about the SimplifyQuery context instruction was wrong.
I don't know what the real bug is.
2022-11-13 11:47:21 -05:00
Sanjay Patel 6eae6b3722 [InstCombine] allow more folds for multi-use selects (2nd try)
The 1st try ( 681a6a3990 ) was reverted because it caused
a DataFlowSanitizer bot failure.

This try modifies the existing calls to simplifyBinOp() to
not use a query that sets the context instruction because
that seems like a likely source of failure. Since we already
try those simplifies with multi-use patterns in some cases,
that means the bug is likely present even without this patch.

However, I have not been able to reduce a test to prove that
this was the bug, so if we see any bot failures with this patch,
then it should be reverted again.

The reduced simplify power does not affect any optimizations
in existing, motivating regression tests.

Original commit message:

The 'and' case showed up in a recent bug report and prevented
more follow-on transforms from happening.

We could handle more patterns (for example, the select arms
simplified, but not to constant values), but this seems
like a safe, conservative enhancement. The backend can
convert select-of-constants to math/logic in many cases
if it is profitable.

There is a lot of overlapping logic for these kinds of patterns
(see SimplifySelectsFeedingBinaryOp() and FoldOpIntoSelect()),
so there may be some opportunity to improve efficiency.

There are also optimization gaps/inconsistency because we do
not call this code for all bin-opcodes (see TODO for ashr test).
2022-11-13 10:28:06 -05:00
Michał Górny f6f1fd443f Revert "[InstCombine] allow more folds more multi-use selects"
This reverts commit 681a6a3990.
It broke sanitizer tests (as seen on buildbots), see:
https://reviews.llvm.org/rG681a6a399022#1143137
2022-11-13 07:27:01 +01:00
Mengxuan Cai ec210f3942 [LoopFuse] Ensure inner loops are in loop simplified form under new PM
LoopInfo doesn't give all loops in a loop nest, it gives top level loops
only. While isLoopSimplifyForm() only checkes for the outter most loop of a
loop nest. As a result, inner loops that are not in simplied form can
not be simplified with the original code.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D137672
2022-11-11 15:55:59 -05:00
Sanjay Patel 681a6a3990 [InstCombine] allow more folds more multi-use selects
The 'and' case showed up in a recent bug report and prevented
more follow-on transforms from happening.

We could handle more patterns (for example, the select arms
simplified, but not to constant values), but this seems
like a safe, conservative enhancement. The backend can
convert select-of-constants to math/logic in many cases
if it is profitable.

There is a lot of overlapping logic for these kinds of patterns
(see SimplifySelectsFeedingBinaryOp() and FoldOpIntoSelect()),
so there may be some opportunity to improve efficiency.

There are also optimization gaps/inconsistency because we do
not call this code for all bin-opcodes (see TODO for ashr test).
2022-11-11 15:26:54 -05:00
Jordan Rupprecht 81896f88ce [NFC] Remove unused OrigLoopID vars 2022-11-11 07:51:40 -08:00
Florian Hahn 2d7e5e29b7
[LV] Remove unused OrigLoopID argument from completeLoopSekelton (NFC).
The argument is not used any longer and can be removed.
2022-11-11 15:39:08 +00:00
Nikita Popov 4d33cf4166 [MemCpyOpt] Avoid moving lifetime marker above def (PR58903)
This is unlikely to happen with opaque pointers, so just bail out
of the transform, rather than trying to move bitcasts/etc as well.

Fixes https://github.com/llvm/llvm-project/issues/58903.
2022-11-11 15:06:34 +01:00
Fangrui Song 8901635423 [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

This fixes two problems.

(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```

(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```

If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.

Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
GlobalAlias may refer to a ConstantExpr, not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche.
For simplicity we don't handle it yet). In addition, available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.

See windows-vftable.ll: Windows vftable uses an alias pointing to a
private constant where the alias is the COMDAT leader. The COMDAT use case
is skeptical and ThinLTO does not discard the alias in the non-prevailing COMDAT.
This patch retains the behavior.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-11-10 21:54:43 -08:00
Alan Zhao 885e6105b4 Revert "[LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally"
This reverts commit 89ddcff1d2.

Reason: This breaks bootstrapping builds of LLVM on Windows using
ThinLTO; see https://crbug.com/1382839
2022-11-10 17:48:18 -08:00
William Huang bd2b5ec803 [InstCombine] PR58901 - fix bug with swapping GEP of different types
Fix https://github.com/llvm/llvm-project/issues/58901 by adding stricter check whether non-opaque GEP can be swapped. This will not affect GEP swapping optimization in the future since we are switching to opaque GEP

Reviewed By: clin1

Differential Revision: https://reviews.llvm.org/D137752
2022-11-10 20:24:41 +00:00
Sanjay Patel b57819e130 [VectorCombine] widen a load with subvector insert
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.

We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining loads on those patterns is not
solved though.

Differential Revision: https://reviews.llvm.org/D137341
2022-11-10 14:11:32 -05:00
Alexey Bataev b505fd559d [SLP]Redesign vectorization of the gather nodes.
Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.

Differential Revision: https://reviews.llvm.org/D135174
2022-11-10 10:59:54 -08:00
Wu, Yingcong 7f07c4d513 [SanitizerCoverage] Fix wrong pointer type return from CreateSecStartEnd()
`CreateSecStartEnd()` will return pointer to the input type, so when called with `CreateSecStartEnd(M, SanCovCFsSectionName, IntptrPtrTy)`, `SecStartEnd.first` and `SecStartEnd.second` will have type `IntptrPtrPtrTy`, not `IntptrPtrTy`.

This problem should not impact the functionality and with opaque pointer enable, this will not trigger any alarm. But if runs with `-no-opaque-pointers`, this mismatch pointer type will cause type check assertion in `CallInst::init()` to fail.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D137310
2022-11-09 23:29:04 -08:00
wlei 47b0758049 [SampleFDO] Persist profile staleness metrics into binary
With https://reviews.llvm.org/D136627, now we have the metrics for profile staleness based on profile statistics, monitoring the profile staleness in real-time can help user quickly identify performance issues. For a production scenario, the build is usually incremental and if we want the real-time metrics, we should store/cache all the old object's metrics somewhere and pull them in a post-build time. To make it more convenient, this patch add an option to persist them into the object binary, the metrics can be reported right away by decoding the binary rather than polling the previous stdout/stderrs from a cache system.

For implementation, it writes the statistics first into a new metadata section(llvm.stats) then encode into a special ELF `.llvm_stats` section. The section data is formatted as a list of key/value pair so that future statistics can be easily extended. This is also under a new switch(`-persist-profile-staleness`)

In terms of size overhead, the metrics are computed at module level, so the size overhead should be small, measured on one of our internal service, it costs less than < 1MB for a 10GB+ binary.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D136698
2022-11-09 22:34:33 -08:00
OCHyams 23bb4735ca [Assignment Tracking][10/*] salvageDebugInfo for dbg.assign intrinsics
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Plumb in salvaging for the address part of dbg.assign intrinsics.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133293
2022-11-09 11:49:46 +00:00
Nikita Popov ce2f9ba2c9 [SCCP] Add helper for getting constant range (NFC)
Add a helper for the recurring pattern of getting a constant range
if the value lattice element is one, or a full range otherwise.
2022-11-09 12:42:36 +01:00
OCHyams a9025f57ba [Assignment Tracking][8/*] Add DIAssignID merging utilities
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

Add method:

  Instruction::mergeDIAssignID(
      ArrayRef<const Instruction* > SourceInstructions)

which merges the DIAssignID metadata attachments on `SourceInstructions` and
`this` and replaces uses of the original IDs with the new shared one.

This is used when stores are merged, for example sinking stores out of a
if-diamond CFG or vectorizing contiguous stores.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133291
2022-11-09 10:46:04 +00:00
Akira Hatanaka 295861514e [ObjC][ARC] Fix non-deterministic behavior in ProvenanceAnalysis
ProvenanceAnalysis::relatedCheck was giving different answers depending
on the order in which the pointers were passed.

Specifically, it was returning different values when A and B were both
loads and were both referring to identifiable objects, but only one was
used by a store instruction.
2022-11-08 15:05:25 -08:00
OCHyams fd16ff3a7e Reapply: [NFC] Move getDebugValueLoc from static in Local.cpp to DebugInfo.h
Reverted in b22d80dc6a.

Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the
Assignment Tracking patch stack and remove redundant parameter Src.

Reviewed By: jryans

Differential Revision: https://reviews.llvm.org/D132357
2022-11-08 16:25:39 +00:00
Alexey Bataev b5d91ab73e [SLP]Fix PR58863: Mask index beyond mask size for non-power-2 insertelement analysis.
Need to check if the insertelement mask size is reached during cost analysis to avoid compiler crash.

Differential Revision: https://reviews.llvm.org/D137639
2022-11-08 07:54:57 -08:00
skc7 42bce72536 Reapply "[SLP] Extend reordering data of tree entry to support PHInodes".
Reapplies 87a2086 (which was reverted in 656f1d8).
Fix for scalable vectors in getInsertIndex merged in 46d53f4.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137537
2022-11-08 21:21:28 +05:30
Nathan James 6aa050a690 Reland "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 632a389f96.

This relands commit
1834a310d0.

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 14:15:15 +00:00
Nathan James 632a389f96 Revert "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 1834a310d0.
2022-11-08 13:11:41 +00:00
skc7 46d53f45d8 [SLP][NFC] Restructure getInsertIndex
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137567
2022-11-08 18:07:50 +05:30
Nathan James 1834a310d0
[llvm][NFC] Use c++17 style variable type traits
This was done as a test for D137302 and it makes sense to push these changes

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 12:22:52 +00:00
Dmitry Makogon ebac59999f [SimpleLoopUnswitch] Skip trivial selects in guards conditions unswitch candidates
We do this for conditional branches, but not for guards for some reason.

Fixes pr58666.

Differential Revision: https://reviews.llvm.org/D137249
2022-11-08 13:29:27 +07:00
Matt Arsenault e661185fb3 InstCombine: Fold fdiv nnan x, 0 -> copysign(inf, x)
https://alive2.llvm.org/ce/z/gLBFKB
2022-11-07 22:00:15 -08:00
skc7 9d96feb19b [SLP][NFC] Restructure areTwoInsertFromSameBuildVector
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137569
2022-11-08 09:32:19 +05:30
Shubham Sandeep Rastogi b22d80dc6a Revert "[NFC] Move getDebugValueLoc from static in Local.cpp to DebugInfo.h"
This reverts commit 80378a4ca7.

    I am reverting this patch because I need to revert 171f7024cc and without reverting this patch, reverting 171f7024cc causes conflicts.

     Patch 171f7024cc introduced a cyclic dependancy in the module build.

        https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/48197/consoleFull#-69937453049ba4694-19c4-4d7e-bec5-911270d8a58c

        In file included from <module-includes>:1:
        /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/Argument.h:18:10: fatal error: cyclic dependency in module 'LLVM_IR': LLVM_IR -> LLVM_intrinsic_gen -> LLVM_IR
                 ^
        While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
        While building module 'LLVM_IR' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:
        In file included from <module-includes>:12:
        /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/DebugInfo.h:24:10: fatal error: could not build module 'LLVM_intrinsic_gen'
         ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
        While building module 'LLVM_MC' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:
        In file included from <module-includes>:15:
        In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCContext.h:23:
        /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/MC/MCPseudoProbe.h:57:10: fatal error: could not build module 'LLVM_IR'
         ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~
        /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/MC/MCAsmInfoCOFF.cpp:14:10: fatal error: could not build module 'LLVM_MC'
         ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
        4 errors generated.
2022-11-07 15:19:04 -08:00
Fangrui Song 89ddcff1d2 [LTO] Make local linkage GlobalValue in non-prevailing COMDAT available_externally
For a local linkage GlobalObject in a non-prevailing COMDAT, it remains defined while its
leader has been made available_externally. This violates the COMDAT rule that
its members must be retained or discarded as a unit.

To fix this, update the regular LTO change D34803 to track local linkage
GlobalValues, and port the code to ThinLTO (GlobalAliases are not handled.)

This fixes two problems.

(a) `__cxx_global_var_init` in a non-prevailing COMDAT group used to
linger around (unreferenced, hence benign), and is now correctly discarded.
```
int foo();
inline int v = foo();
```

(b) Fix https://github.com/llvm/llvm-project/issues/58215:
as a size optimization, we place private `__profd_` in a COMDAT with a
`__profc_` key. When FuncImport.cpp makes `__profc_` available_externally due to
a non-prevailing COMDAT, `__profd_` incorrectly remains private. This change
makes the `__profd_` available_externally.

```
cat > c.h <<'eof'
extern void bar();
inline __attribute__((noinline)) void foo() {}
eof
cat > m1.cc <<'eof'
#include "c.h"
int main() {
  bar();
  foo();
}
eof
cat > m2.cc <<'eof'
#include "c.h"
__attribute__((noinline)) void bar() {
  foo();
}
eof

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw

clang -O2 -fprofile-generate=./t m1.cc m2.cc -flto=thin -fuse-ld=lld -o t_gen
rm -fr t && ./t_gen && llvm-profdata show -function=foo t/default_*.profraw
```

If a GlobalAlias references a GlobalValue which is just changed to
available_externally, change the GlobalAlias as well (e.g. C5/D5 comdats due to
cc1 -mconstructor-aliases). The GlobalAlias may be referenced by other
available_externally functions, so it cannot easily be removed.

Depends on D137441: we use available_externally to mark a GlobalAlias in a
non-prevailing COMDAT, similar to how we handle GlobalVariable/Function.
GlobalAlias may refer to a ConstantExpr, not changing GlobalAlias to
GlobalVariable gives flexibility for future extensions (the use case is niche.
For simplicity we don't handle it yet). In addition, available_externally
GlobalAlias is the most straightforward implementation and retains the aliasee
information to help optimizers.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D135427
2022-11-07 10:07:10 -08:00
Miguel Saldivar de36d39e24 [InstCombine] Avoid passing pow attributes to sqrt
As described in issue #58475, we could pass the attributes of pow to sqrt and crash.

Differential Revision: https://reviews.llvm.org/D137454
2022-11-07 12:07:37 -05:00
Alexey Bataev ecd0b5a532 Revert "[SLP]Redesign vectorization of the gather nodes."
This reverts commit 8ddd1ccdf8 to fix
buildbots failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839
2022-11-07 08:35:21 -08:00
Matt Devereau a8c24d57b8 [InstCombine] Remove redundant splats in InstCombineVectorOps
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Differential Revision: https://reviews.llvm.org/D135876
2022-11-07 15:39:05 +00:00
Matt Arsenault 0f68ffe1e2 InstCombine: Fold compare with smallest normal if input denormals are flushed
Try to simplify comparisons with the smallest normalized value. If
denormals will be treated as 0, we can simplify by using an equality
comparison with 0.

fcmp olt fabs(x), smallest_normalized_number -> fcmp oeq x, 0.0
fcmp ult fabs(x), smallest_normalized_number -> fcmp ueq x, 0.0
fcmp oge fabs(x), smallest_normalized_number -> fcmp one x, 0.0
fcmp ult fabs(x), smallest_normalized_number -> fcmp ueq x, 0.0

The device libraries have a few range checks that look like
this for denormal handling paths.
2022-11-07 07:16:47 -08:00
OCHyams 80378a4ca7 [NFC] Move getDebugValueLoc from static in Local.cpp to DebugInfo.h
Move getDebugValueLoc so that it can be accessed from DebugInfo.h for the
Assignment Tracking patch stack and remove redundant parameter Src.

Reviewed By: jryans

Differential Revision: https://reviews.llvm.org/D132357
2022-11-07 15:14:43 +00:00
Alexey Bataev 8ddd1ccdf8 [SLP]Redesign vectorization of the gather nodes.
Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.

Differential Revision: https://reviews.llvm.org/D135174
2022-11-07 07:04:38 -08:00
Nikita Popov 9a45e4beed [MemCpyOpt] Move lifetime marker before call to enable call slot optimization
Currently call slot optimization may be prevented because the
lifetime markers for the destination only start after the call.
In this case, rather than aborting the transform, we should move
the lifetime.start before the call to enable the transform.

Differential Revision: https://reviews.llvm.org/D135886
2022-11-07 15:26:00 +01:00
luxufan 49143f9d14 [IndVars] Forget the SCEV when the instruction has been sunk.
In the past, the SCEV expression of the sunk instruction was not
forgetted. This led to the incorrect block dispositions after the
instruction be sunk.

Fixes https://github.com/llvm/llvm-project/issues/58662

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D137060
2022-11-06 22:46:49 +08:00
Sanjay Patel bff6880a5f [SimplifyLibCalls] improve code readability for AttributeList propagation; NFC
It is possible that we can do better on some of these transforms
by passing some subset of attributes, but we were not doing that
in any of the changed code. So it's better to give that a name
to indicate we're clearing attributes or make that more obvious
by using the default-constructed empty list.
2022-11-06 09:07:17 -05:00
Sanjay Patel 1c6ebe29d3 [InstCombine] reduce multi-use casts+masks
As noted in the code comment, we could generalize this:
https://alive2.llvm.org/ce/z/N5m-eZ

It saves an instruction even without a constant operand,
but the 'and' is wider. We can do that as another step
if it doesn't harm anything.

I noticed that this missing pattern with a constant operand
inhibited other transforms in a recent bug report, so this
is enough to solve that case.
2022-11-06 09:07:17 -05:00
David Green 656f1d8b74 Revert "[SLP] Extend reordering data of tree entry to support PHI nodes"
This reverts commit 87a20868eb as it has
problems with scalable vectors and use-list orders. Test to follow.
2022-11-06 11:43:51 +00:00
Florian Hahn a41cb8bf58
[SimpleLoopUnswitch] Forget block & loop dispos during trivial unswitch.
Unswitching adjusts the CFG in ways that may invalidate cached loop
dispositions. Clear all cached block and loop dispositions during
trivial unswitching. The same is already done for non-trivial
unswitching.

Fixes #58751.
2022-11-05 16:56:06 +00:00
chenglin.bi 6703290361 [InstCombine] fold `sub + and` pattern with specific const value
`C1 - ((C3 - X) & C2) --> (X & C2) + (C1 - (C2 & C3))`
when:
    (C3 - ((C2 & C3) - 1)) is pow2 &&
    ((C2 + C3) & ((C2 & C3) - 1)) == ((C2 & C3) - 1) &&
    C2 is negative pow2 || (C3 - X) is nuw

https://alive2.llvm.org/ce/z/HXQJV-

Fix: #58523

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D136582
2022-11-05 12:58:45 +08:00
Matthew Voss a4b543a5a5 [llvm-profdata] Check for all duplicate entries in MemOpSize table
Previously, we only checked for duplicate zero entries when merging a
MemOPSize table (see D92074), but a user recently provided a reproducer
demonstrating that other entries can also be duplicated. As demonstrated
by the test in this patch, PGOMemOPSizeOpt can potentially generate
invalid IR for non-zero, non-consecutive duplicate entries. This seems
to be a rare case, since the duplicate entry is often below the
threshold, but possible. This patch extends the existing warning to
check for any duplicate values in the table, both in the optimization
and in llvm-profdata.

Differential Revision: https://reviews.llvm.org/D136211
2022-11-04 17:08:54 -07:00
Florian Hahn 9a456b7ad3
[IndVars] Forget SCEV for replaced PHI.
Additional SCEV verification highlighted a case where the cached loop
dispositions where incorrect after simplifying a phi node in IndVars.
Fix it by invalidating the phi before replacing it.

Fixes #58750
2022-11-04 18:42:07 +00:00
Sanjay Patel 710e34e136 [VectorCombine] move load safety checks to helper function; NFC
These checks can be re-used with other potential transforms
such as a load of a subvector-insert.
2022-11-04 10:39:37 -04:00
Juan Manuel MARTINEZ CAAMAÑO 96ad51e3eb [StructurizeCFG][DebugInfo] Avoid use-after-free
Reviewed By: dstuttard

Differential Revision: https://reviews.llvm.org/D137408
2022-11-04 13:39:49 +00:00
Alex Gatea 7d0648cb6c [GVN] Patch for invalid GVN replacement
If PRE is performed as part of the main GVN pass (to PRE GEP
operands before processing loads), and it is performed across a
backedge, we will end up adding the new instruction to the leader
table of a block that has not yet been processed. When it will be
processed, GVN will incorrectly assume that the value is already
available, even though it is only available at the end of the
block.

Avoid this by not performing PRE across backedges.

Fixes https://github.com/llvm/llvm-project/issues/58418.

Differential Revision: https://reviews.llvm.org/D136095
2022-11-04 14:28:17 +01:00
Nikita Popov 304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Nikita Popov 01ec0ff2dc [SimplifyCFG] Allow speculating block containing assume()
SpeculativelyExecuteBB(), which converts a branch + phi structure
into a select, currently bails out if the block contains an assume
(because it is not speculatable).

Adjust the fold to ignore ephemeral values (i.e. assumes and values
only used in assumes) for cost modelling purposes, and drop them
when performing the fold.

Theoretically, we could try to preserve the assume information by
generating a assume(br_cond || assume_cond) style assume, but this
is very unlikely to to be useful (because we don't do anything
useful with assumes of this form) and it would make things
substantially more complicated once we take operand bundle assumes
into account (which don't really support a || operation).
I'd prefer not to do that without good motivation.

Differential Revision: https://reviews.llvm.org/D137339
2022-11-04 09:26:35 +01:00
Congzhe Cao 75b33d6bd5 [LoopInterchange] Check phis in all subloops
This is the bugfix to the miscompile mentioned in
https://reviews.llvm.org/D132055#3814831. The IR
that reproduced the bug is added as the test case in
this patch.

What this patch does is that, during legality phase
instead of checking the phi nodes only in `InnerLoop`
and `OuterLoop`, we check phi nodes in all subloops
of the `OuterLoop`. Suppose if the loop nest is triply
nested, and `InnerLoop` and `OuterLoop` is the middle
loop and the outermost loop respectively, we'll check
phi nodes in the innermost loop as well, in addition to
the ones in the middle and outermost loops.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D134930
2022-11-04 00:20:52 -04:00
Florian Hahn 8086b0c8a8
[ConstraintElim] Drop bail out for scalable vectors after using getTrue
ConstantInt::getTrue/getFalse can materialize scalable vectors with all
lanes true/false.
2022-11-03 19:05:45 +00:00
LiDongjin d1cee3539f [LoopVectorize] Fix crash on "Cannot dereference end iterator!"(PR56627)
Check hasOneUser before user_back().

Differential Revision: https://reviews.llvm.org/D136227
2022-11-03 23:13:37 +08:00
Nikita Popov 68b24c3b44 [CVP] Simplify comparisons without constant operand
CVP currently only tries to simplify comparisons if there is a
constant operand. However, even if both are non-constant, we may
be able to determine the result of the comparison based on range
information.

IPSCCP is already capable of doing this, but because it runs very
early, it may miss some cases.

Differential Revision: https://reviews.llvm.org/D137253
2022-11-03 15:35:27 +01:00
bipmis 150fc73dda [AggressiveInstCombine] Avoid load merge/widen if stores are present b/w loads
This patch is to address the test cases in which the load has to be inserted at a right point. This happens when there is a store b/w the loads.

This patch reverts the loads merge in all cases when stores are present b/w loads and will eventually be replaced with proper fix and test cases.

Differential Revision: https://reviews.llvm.org/D137333
2022-11-03 14:32:07 +00:00
Nikita Popov b03f7c3365 [SimplifyCFG] Use range based for loop (NFC) 2022-11-03 15:21:45 +01:00
Nikita Popov 592a96c03b [SimplifyCFG] Extract code for tracking ephemeral values (NFC)
To allow reusing this in more places in SimplifyCFG.
2022-11-03 14:28:12 +01:00
Alexey Bataev f090e3c00f [SLP]Fix write after bounds.
Need to use comma instead of + symbol to prevent writing after bounds.
2022-11-03 05:30:41 -07:00
Alexey Bataev 8b015b2078 [SLP][NFC]Formatting and reduce number of iterations, NFC. 2022-11-03 05:30:13 -07:00
Florian Hahn 44d8f80b26
[ConstraintElim] Use ConstantInt::getTrue to create constants (NFC).
Use existing ConstantInt::getTrue/getFalse functionality instead of
custom getScalarConstOrSplat as suggested by @nikic.
2022-11-03 12:20:23 +00:00
Peter Waller e1790c8c29 Revert "[InstCombine] Remove redundant splats in InstCombineVectorOps"
This reverts commit 957eed0b1a.
2022-11-03 07:56:03 +00:00
Ye Luo 00b09a7b18 Revert "[AAPointerInfo] refactor how offsets and Access objects are tracked"
This reverts commit b756096b0c.
See regression https://github.com/llvm/llvm-project/issues/58774
2022-11-03 00:01:51 -05:00
Youling Tang d16b5c3504 [asan] Use proper shadow offset for loongarch64 in instrumentation passes
Instrumentation passes now use the proper shadow offset. There will be many
asan test failures without this patch. For example:

```
$ ./lib/asan/tests/LOONGARCH64LinuxConfig/Asan-loongarch64-calls-Test
AddressSanitizer:DEADLYSIGNAL
=================================================================
==651209==ERROR: AddressSanitizer: SEGV on unknown address 0x1ffffe2dfa9b (pc 0x5555585e151c bp 0x7ffffb9ec070 sp 0x7ffffb9ebfd0 T0)
==651209==The signal is caused by a UNKNOWN memory access.
```

Before the patch:

```
$ make check-asan
Testing Time: 36.13s
  Unsupported      : 205
  Passed           :  83
  Expectedly Failed:   1
  Failed           : 239
```

After the patch:
```
$ make check-asan
Testing Time: 58.98s
  Unsupported      : 205
  Passed           : 421
  Expectedly Failed:   1
  Failed           :  89
```

Differential Revision: https://reviews.llvm.org/D137013
2022-11-03 11:04:47 +08:00
Fangrui Song 1ada819c23 [asan] Default to -fsanitize-address-use-odr-indicator for non-Windows
This enables odr indicators on all platforms and private aliases on non-Windows.
Note that GCC also uses private aliases: this fixes bogus
`The following global variable is not properly aligned.` errors for interposed global variables

Fix https://github.com/google/sanitizers/issues/398
Fix https://github.com/google/sanitizers/issues/1017
Fix https://github.com/llvm/llvm-project/issues/36893 (we can restore D46665)

Global variables of non-hasExactDefinition() linkages (i.e.
linkonce/linkonce_odr/weak/weak_odr/common/external_weak) are not instrumented.
If an instrumented variable gets interposed to an uninstrumented variable due to
symbol interposition (e.g. in issue 36893, _ZTS1A in foo.so is resolved to _ZTS1A in
the executable), there may be a bogus error.

With private aliases, the register code will not resolve to a definition in
another module, and thus prevent the issue.

Cons: minor size increase. This is mainly due to extra `__odr_asan_gen_*` symbols.
(ELF) In addition, in relocatable files private aliases replace some relocations
referencing global symbols with .L symbols and may introduce some STT_SECTION symbols.

For lld, with -g0, the size increase is 0.07~0.09% for many configurations I
have tested: -O0, -O1, -O2, -O3, -O2 -ffunction-sections -fdata-sections
-Wl,--gc-sections. With -g1 or above, the size increase ratio will be even smaller.

This patch obsoletes D92078.

Don't migrate Windows for now: the static data member of a specialization
`std::num_put<char>::id` is a weak symbol, as well as its ODR indicator.
Unfortunately, link.exe (and lld without -lldmingw) generally doesn't support
duplicate weak definitions (weak symbols in different TUs likely pick different
defined external symbols and conflict).

Differential Revision: https://reviews.llvm.org/D137227
2022-11-02 19:21:33 -07:00
Florian Hahn 74d8628cf7
[ConstraintElimination] Skip compares with scalable vector types.
Materializing scalable vectors with boolean values is not implemented
yet. Skip those cases for now and leave a TODO.
2022-11-02 23:57:14 +00:00
Joseph Huber cccbd2a2b2 Revert "[Attributor][NFCI] Move MemIntrinsic handling into the initializer"
This was causing failures when optimizing codes with complex numbers.
Revert until a fix can be implemented.

This reverts commit 7fdf3564c0.
2022-11-02 15:28:50 -05:00
Florian Hahn b478d8b966
[ConstraintElimination] Generate true/false vectors for vector cmps.
This fixes crashes when vector compares can be simplified to true/false.
2022-11-02 18:13:35 +00:00
Rong Xu 8acb881c19 [PGO] Add a threshold for number of critical edges in PGO
For some auto-generated sources, we have a huge number of critical
edges (like from switch statements). We have seen instance of 183777
critical edges in one function.

After we split the critical edges in PGO instrumentation/profile-use
pass, the CFG is so large that we have compiler time issues in
downstream passes (like in machine CSE and block placement). Here I
add a threshold to skip PGO if the number of critical edges are too
large.

The threshold is large enough so that it will not affect the majority
of PGO compilation.

Also sync the logic for skipping instrumentation and profile-use. I
think this is the correct thing to do.

Differential Revision: https://reviews.llvm.org/D137184
2022-11-02 10:14:04 -07:00
Sanjay Patel f03b069c5b [InstCombine] fold mul with decremented "shl -1" factor (2nd try)
This is a corrected version of:
bc886e9b58

I made a copy-paste error that created an "add" instead of the
intended "sub" on that attempt. The regression tests showed the
bug, but I overlooked that.

As I said in a comment on issue #58717, the bug reports resulting
from the botched patch confirm that the pattern does occur in
many real-world applications, so hopefully eliminating the multiply
results in better code.

I added one more regression test in this version of the patch,
and here's an Alive2 proof to show that exact example:
https://alive2.llvm.org/ce/z/dge7VC

Original commit message:

This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e

I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.

Differential Revision: https://reviews.llvm.org/D137157
2022-11-02 09:30:01 -04:00
OCHyams 88d34d4626 [DebugInfo] Fix minor debug info bug in deleteDeadLoop
Using a DebugVariable as the set key rather than std::pair<DIVariable *,
DIExpression *> ensures we don't accidently confuse multiple instances of
inlined variables.

Reviewed By: jryans

Differential Revision: https://reviews.llvm.org/D133303
2022-11-02 13:28:05 +00:00
Sanjay Patel b24e2f6ef6 [InstCombine] use logical-and matcher to avoid crash
Follow-on to:
ec0b406e16

This should prevent crashing for example like issue #58552
by not matching a select-of-vectors-with-scalar-condition.

The test that shows a regression seems unlikely to occur
in real code.

This also picks up an optimization in the case where a real
(bitwise) logic op is used. We could already convert some
similar select ops to real logic via impliesPoison(), so
we don't see more diffs on commuted tests. Using commutative
matchers (when safe) might also handle one of the TODO tests.
2022-11-02 08:23:52 -04:00
Matt Devereau 957eed0b1a [InstCombine] Remove redundant splats in InstCombineVectorOps
Splatting the first vector element of the result of a BinOp, where any of the
BinOp's operands are the result of a first vector element splat can be simplified to
splatting the first vector element of the result of the BinOp

Differential Revision: https://reviews.llvm.org/D135876
2022-11-02 11:57:05 +00:00
Max Kazantsev 2baabd2c19 [LoopPredication][NFCI] Perform 'visited' check before pushing to worklist
This prevents duplicates to be pushed into the stack and hypothetically
should reduce memory footprint on ugly cornercases with multiple repeating
duplicates in 'and' tree.
2022-11-02 18:04:23 +07:00
Nikita Popov 134bda4b61 [ValueLattice] Use DL-aware folding in getCompare()
Use DL-aware ConstantFoldCompareInstOperands() API instead of
ConstantExpr API. The practical effect of this is that SCCP can
now fold comparisons that require DL.
2022-11-02 10:41:11 +01:00
Nikita Popov 7ad3bb527e Reapply [ValueLattice] Fix getCompare() for undef values
Relative to the previous attempt, this also updates the
ValueLattice unit tests.

-----

Resolve the TODO about incorrect getCompare() behavior. This can
be made more precise (e.g. by materializing the undef value and
performing constant folding on it), but for now just return an
unknown result to fix the correctness issue.

This should be NFC in terms of user-visible behavior, because the
only user of this method (SCCP) was already guarding against
UndefValue results.
2022-11-02 10:23:06 +01:00
Bjorn Pettersson 44bb4099cd [ConstraintElimination] Do not crash on vector GEP in decomposeGEP
Commit 359bc5c541 caused
 Assertion `isa<To>(Val) && "cast<Ty>() argument of incompatible type!"'
failures in decomposeGEP when the GEP pointer operand is a vector.

Fix is to use DataLayout::getIndexTypeSizeInBits when fetching the
index size, as it will use the scalar type in case of a ptr vector.

Differential Revision: https://reviews.llvm.org/D137185
2022-11-02 10:04:11 +01:00
Fangrui Song 87d98a7300 [hwasan] Remove no-op setDSOLocal. NFC
PrivateLinkage and HiddenVisibility are implicitly dso_local.
2022-11-02 00:17:16 -07:00
Johannes Doerfert 7fdf3564c0 [Attributor][NFCI] Move MemIntrinsic handling into the initializer
The handling for MemIntrinsic is not dependent on optimistic
information, no need to put it in update for now. Added a TODO though.
2022-11-01 20:37:53 -07:00
Johannes Doerfert f89deef13e [Attributor][NFC] Hide verbose output behind `attributor-verbose` 2022-11-01 20:37:53 -07:00
Sanjay Patel ec0b406e16 [InstCombine] use logical-or matcher to avoid crash
This should prevent crashing for the example in issue #58552
by not matching a select-of-vectors-with-scalar-condition.

A similar change is likely needed for the related fold to
properly fix that kind of bug.

The test that shows a regression seems unlikely to occur
in real code.

This also picks up an optimization in the case where a real
(bitwise) logic op is used. We could already convert some
similar select ops to real logic via impliesPoison(), so
we don't see more diffs on commuted tests. Using commutative
matchers (when safe) might also handle one of the TODO tests.
2022-11-01 16:47:41 -04:00
Arthur Eubanks 8c49b01a1e [GlobalOpt] Don't remove inalloca from varargs functions
Varargs and inalloca have a weird interaction where varargs are actually
passed via the inalloca alloca. Removing inalloca breaks the varargs
because they're still not passed as separate arguments.

Fixes #58718

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D137182
2022-11-01 13:04:05 -07:00
Florian Hahn e83e289fd1
[SCEVExpander] Forget SCEV when replacing congruent phi.
Otherwise there may be stale values left over in the cache, causing a
SCEV verification failure.

Fixes #58702.
2022-11-01 17:49:38 +00:00
Nikita Popov 094d1298d1 Revert "[ValueLattice] Fix getCompare() for undef values"
This reverts commit 6de41e6d76.

Missed a unit test affected by this change, revert for now.
2022-11-01 17:03:40 +01:00
Nikita Popov 6de41e6d76 [ValueLattice] Fix getCompare() for undef values
Resolve the TODO about incorrect getCompare() behavior. This can
be made more precise (e.g. by materializing the undef value and
performing constant folding on it), but for now just return an
unknown result.
2022-11-01 16:41:24 +01:00
Sanjay Patel 4299b28a9b [InstCombine] add helper function for select-of-bools folds; NFC
This set of folds keeps growing, and it contains
bugs like issue #58552, so make it easier to
spot those via backtrace.
2022-11-01 11:06:18 -04:00
Florian Hahn 27f6091bb0
[ConstraintElimination] Fix nested GEP check.
At the moment, the implementation requires that the outer GEP has a
single index, the inner GEP can have an arbitrary indices, because the
general `decompose` helper is used.
2022-11-01 14:43:42 +00:00
skc7 87a20868eb [SLP] Extend reordering data of tree entry to support PHI nodes
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D136757
2022-11-01 04:50:04 +00:00
Sameer Sahasrabuddhe b756096b0c [AAPointerInfo] refactor how offsets and Access objects are tracked
AAPointerInfo now maintains a list of all Access objects that it owns, along
with the following maps:

- OffsetBins: OffsetAndSize -> { Access }
- InstTupleMap: RemoteI x LocalI -> Access

A RemoteI is any instruction that accesses memory. RemoteI is different from
LocalI if and only if LocalI is a call; then RemoteI is some instruction in the
callgraph starting from LocalI.

Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets
the value to Unknown if the new offset is not the same as the old offset. The
instruction must now be moved from its current bin to the bin corresponding to
the new offset. This happens for example, when:

- A PHINode has operands that result in different offsets.
- The same remote inst is reachable from the same local inst via different paths
  in the callgraph:

```
               A (local inst)
               |
               B
              / \
             C1  C2
              \ /
               D (remote inst)

```
This fixes a bug where a store is incorrectly eliminated in a lit test.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D136526
2022-11-01 09:57:12 +05:30
Florian Mayer e1de7ac20f Revert "[InstCombine] fold mul with decremented "shl -1" factor"
This reverts commit bc886e9b58.

Broke MSAN bootstrap buildbots with Assertion `RangeAfterCopy % ExtraScale == 0 && "Extra instruction requires immediate to be aligned"' failed.
2022-10-31 17:39:05 -07:00
Usman Nadeem 32755786e0 [JumpThreading] Put a limit on the PHI nodes when duplicating a BB.
Do not duplicate a BB if it has a lot of PHI nodes.
If a threadable chain is too long then the number of duplicated PHI nodes
can add up, leading to a substantial increase in compile time when rewriting
the SSA.

Fixes https://github.com/llvm/llvm-project/issues/58203
Differential Revision: https://reviews.llvm.org/D136716

The threshold of 76 in this patch is reasonably high and reduces the compile
time of cldwat2m_macro.f90 in SPEC2017/cam4 from 80+min to <2min.

Change-Id: I153c89a8e0d89b206a5193dc1b908c67e320717e
2022-10-31 15:51:56 -07:00
Sanjay Patel 1f8ac37e2d [InstCombine] adjust branch on logical-and fold
The transform was just added with:
115d2f69a5
...but as noted in post-commit feedback, it was
confusingly coded. Now, we create the final
expected canonicalized form directly and put
an extra use check on the match, so we should
not ever end up with more instructions.
2022-10-31 17:39:29 -04:00
Philip Reames a819f6c8d1 [InstCombine] Allow simplify demanded transformations on scalable vectors
Differential Revision: https://reviews.llvm.org/D136475
2022-10-31 13:39:36 -07:00
Patrick Walton 01859da84b [AliasAnalysis] Introduce getModRefInfoMask() as a generalization of pointsToConstantMemory().
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.

It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.

The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.

Differential Revision: https://reviews.llvm.org/D136659
2022-10-31 13:03:41 -07:00
Sanjay Patel 115d2f69a5 [InstCombine] canonicalize branch with logical-and-not condition
https://alive2.llvm.org/ce/z/EfHlWN

In the motivating case from issue #58313,
this allows forming a duplicate 'not' op
which then gets CSE'd and simplifyCFG'd
and combined into the expected 'xor'.
2022-10-31 15:51:45 -04:00
Alexey Bataev 99f9bd4807 [SLP]Fix a crash in the analysis of the compatible cmp operands.
We can skip the analysis of the operands opcodes, can compare directly
them in some cases.
2022-10-31 09:47:25 -07:00
Sameer Sahasrabuddhe 2e7636c13b [AAPointerInfo] check for Unknown offsets in callee
When translating offset info from the callee at a call site, first check if the
offset is Unknown. Any offset in the caller should be added only if the callee
offset is valid.

Differential Revision: https://reviews.llvm.org/D137011
2022-10-31 22:15:56 +05:30
Mengxuan Cai eda3c93486 [LoopFuse] Ensure loops are in loop simplified form under new PM
Loop Fusion (Function Pass) requires loops in simplified form. With
legacy-pm, loop-simplify pass is added as a dependency for loop-fusion.
But the new pass manager does not always ensure this format. This patch
tries to invoke simplifyLoop() on loops that are not in simplified form
only for new PM.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D136781
2022-10-31 11:46:28 -04:00
Brendon Cahoon f59205aef9 [BasicBlockUtils] Add a new way for CreateControlFlowHub()
The existing way of creating the predicate in the guard blocks uses
a boolean value per outgoing block. This increases the number of live
booleans as the number of outgoing blocks increases. The new way added
in this change is to store one integer to represent the outgoing block
we want to branch to, then at each guard block, an integer equality
check is performed to decide which a specific outgoing block is taken.

Using an integer reduces the number of live values and decreases
register pressure especially in cases where there are a large number
of outgoing blocks. The integer based approach is used when the
number of outgoing blocks crosses a threshold, which is currently set
to 32.

Patch by Ruiling Song.

Differential review: https://reviews.llvm.org/D127831
2022-10-31 08:58:54 -05:00
Brendon Cahoon 9265b7fa83 NFC: restructure code for CreateControlFlowHub()
Differential review: https://reviews.llvm.org/D127830
2022-10-31 08:39:09 -05:00
Sanjay Patel bc886e9b58 [InstCombine] fold mul with decremented "shl -1" factor
This is a sibling to:
6064e92b0a
...but we canonicalize the shl+add to shl+xor,
so the pattern is different than I expected:
https://alive2.llvm.org/ce/z/8CX16e

I have not found any patterns that are safe
to propagate no-wrap, so that is not included
here.
2022-10-31 09:06:55 -04:00
Patrick Walton 36bbd68667 [InstCombine] Allow memcpys from constant memory to readonly nocapture parameters to be elided.
Currently, InstCombine can elide a memcpy from a constant to a local alloca if
that alloca is passed as a nocapture parameter to a *function* that's readnone
or readonly, but it can't forward the memcpy if the *argument* is marked
readonly nocapture, even though readonly guarantees that the callee won't
mutate the pointee through that pointer. This patch adds support for detecting
and handling such situations, which arise relatively frequently in Rust, a
frontend that liberally emits readonly.

A more general version of this optimization would use alias analysis to check
the call's ModRef info for the pointee, but I was concerned about blowing up
compile time, so for now I'm just checking for one of readnone on the function,
readonly on the function, or readonly on the parameter.

Differential Revision: https://reviews.llvm.org/D136822
2022-10-30 14:41:03 -07:00
Florian Hahn 1f1fb208da
[SimpleLoopUnswitch] Forget block and loop dispositions.
Also invalidate block and loop dispositions during non-trivial
unswitching.

Fixes #58564.
2022-10-30 20:44:22 +00:00
zhongyunde f58311796c [InstCombine] refactor the SimplifyUsingDistributiveLaws NFC
Precommit for D136015
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D137019
2022-10-30 21:04:06 +08:00
Patrick Walton cb2cb2d201 [InstCombine] Avoid deleting volatile memcpys.
InstCombine can replace memcpy to an alloca with a pointer directly to the
source in certain cases. Unfortunately, it also did so for volatile memcpys.
This patch makes it stop doing that.

This was discovered in D136822.

Differential Revision: https://reviews.llvm.org/D137031
2022-10-30 02:40:16 -07:00
Florian Hahn 43f0f1a66f
[VPlan] Use onlyFirstLaneUsed in sinkScalarOperands.
Replace custom code to check if only the first lane is used by generic
helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few
additional cases and was suggested in D133760.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136368
2022-10-29 19:45:19 +01:00
Sanjay Patel 6064e92b0a [InstCombine] fold mul with incremented "shl 1" factor
X * ((1 << Z) + 1) --> (X << Z) + X

https://alive2.llvm.org/ce/z/P-7WK9

It's possible that we could do better with propagating
no-wrap, but this carries over the existing logic and
appears to be correct.

The naming differences on the existing folds are a result
of using getName() to set the final value via Builder.
That makes it easier to transfer no-wrap rather than the
gymnastics required from the raw create instruction APIs.
2022-10-29 12:50:19 -04:00
Sanjay Patel 50000ec2cb [InstCombine] create helper function for mul patterns with 1<<X; NFC
There are at least 2 other potential patterns that could go here.
2022-10-29 12:50:19 -04:00
Sanjay Patel d344146857 [InstCombine] reduce code duplication in visitMul(); NFC 2022-10-29 09:26:02 -04:00
Arthur Eubanks 5404fe3456 Revert "[LegacyPM] Remove pipeline extension mechanism"
This reverts commit 4ea6ffb7e8.

Breaks various backends.
2022-10-28 10:26:58 -07:00
Arthur Eubanks 4ea6ffb7e8 [LegacyPM] Remove pipeline extension mechanism
Part of gradually removing the legacy PM optimization pipeline.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D136622
2022-10-28 10:16:52 -07:00
Momchil Velikov c9b4dc3a81 [FuncSpec][NFC] Avoid redundant computations of DominatorTree/LoopInfo
The `FunctionSpecialization` pass needs loop analysis results for its
cost function. For this purpose, it computes the `DominatorTree` and
`LoopInfo` for a function in `getSpecializationBonus`.  This function,
however, is called O(number of call sites x number of arguments), but
the DominatorTree/LoopInfo can be computed just once.

This patch plugs into the PassManager infrastructure to obtain
LoopInfo for a function and removes ad-hoc computation from
`getSpecializatioBonus`.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D136332
2022-10-28 16:08:41 +01:00
Momchil Velikov 14384c96df Recommit: [FuncSpec][NFC] Refactor finding specialisation opportunities
[recommitting after recommitting a dependency]

This patch reorders the traversal of function call sites and function
formal parameters to:

* do various argument feasibility checks (`isArgumentInteresting` )
  only once per argument, i.e. doing N-args checks instead of
  N-calls x  N-args checks.

* do hash table lookups only once per call site, i.e. N-calls
  lookups/inserts instead of N-call x N-args lookups/inserts.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D135968

Change-Id: I7d21081a2479cbdb62deac15f903d6da4f3b8529
2022-10-28 11:26:25 +01:00
Simon Pilgrim 3c71253eb7 ConstraintElimination - pass const DataLayout by reference in (recursive) MergeResults lambda capture. NFC.
There's no need to copy this and fixes a coverity remark about large copy by value
2022-10-28 11:20:09 +01:00
David Green 5dd7d2ce67 [InstCombine] Don't change switch table from desirable to illegal types
In InstCombine we treat i8/i16 as desirable, even if they are not legal.
The current logic in shouldChangeType will decide to convert from an
illegal but desirable type (such as an i8) to an illegal and undesirable
type (such as i3). This patch prevents changing the switch conditions to
an irregular type on like Arm/AArch64 where i8/i16 are not legal.

This is the same issue as https://reviews.llvm.org/D54115. In the case I
was looking it is was converting an i32 switch to an i8 switch, which
then became a i3 switch.

Differential Revision: https://reviews.llvm.org/D136763
2022-10-28 10:15:41 +01:00
Juan Manuel MARTINEZ CAAMAÑO 256f8b06c6 [StructurizeCFG][DebugInfo] Maintain DILocations in the branches created by StructurizeCFG
Make StructurizeCFG preserve the debug locations of the branch instructions it introduces.

Differential Revision: https://reviews.llvm.org/D135967
2022-10-28 02:51:02 -05:00
Alexey Bataev 2ec51f1c75 [SLP]Improve analysis of same/alternate code ops and scheduling.
Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531
2022-10-27 16:29:16 -07:00
Alexey Bataev 8ce0c7b1c9 Revert "[SLP]Improve analysis of same/alternate code ops and scheduling."
This reverts commit dad64448c6 to fix
a crash in https://lab.llvm.org/buildbot/#/builders/74/builds/14584
2022-10-27 15:21:35 -07:00
wlei 63c27c5c75 Use getCanonicalFnName for callee name 2022-10-27 13:14:47 -07:00
Sanjay Patel fd90f542cf [InstCombine] improve efficiency of sub demanded bits; NFC
There's no reason to shrink a constant or simplify
an operand in 2 steps.

This matches what we currently do for 'add' (although that
seems like it should be altered to handle the commutative
case).
2022-10-27 15:28:05 -04:00
Alexey Bataev dad64448c6 [SLP]Improve analysis of same/alternate code ops and scheduling.
Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531
2022-10-27 11:31:18 -07:00
eopXD 10da9844d0 [LSR] Drop LSR solution if it is less profitable than baseline
The LSR may suggest less profitable transformation to the loop. This
patch adds check to prevent LSR from generating worse code than what
we already have.

Since LSR affects nearly all targets, the patch is guarded by the
option 'lsr-drop-solution' and default as disable for now.

The next step should be extending an TTI interface to allow target(s)
to enable this enhancememnt.

Debug log is added to remind user of such choice to skip the LSR
solution.

Reviewed By: Meinersbur, #loopoptwg

Differential Revision: https://reviews.llvm.org/D126043
2022-10-27 10:13:57 -07:00
Alexandros Lamprineas dbeaf6baa2 [FuncSpec] Do not overestimate the specialization bonus for users inside loops.
When calculating the specialization bonus for a given function argument,
we recursively traverse the chain of (certain) users, accumulating the
instruction costs. Then we exponentially increase the bonus to account
for loop nests. This is problematic for two reasons: (a) the users might
not themselves be inside the loop nest, (b) if they are we are accounting
for it multiple times. Instead we should be adjusting the bonus before
traversing the user chain.

This reduces the instruction count for CTMark (newPM-O3) when Function
Specialization is enabled without actually reducing the amount of
specializations performed (geomean: -0.001% non-LTO, -0.406% LTO).

Differential Revision: https://reviews.llvm.org/D136692
2022-10-27 15:26:11 +01:00
Sanjay Patel d2d23795ca [InstCombine] improve demanded bits for Sub operand 0
This is copying the code that was added for 'add' with D130075.
(That patch removed a fallthrough in the cases, but we can
probably still share at least some code again as a follow-up
cleanup, but I didn't want to risk it here.)

The reasoning is similar to the carry propagation for 'add':
if we don't demand low bits of the subtraction and the
subtrahend (aka RHS or operand 1) is known zero in those low
bits, then there can't be any borrowing required from the
higher bits of operand 0, so the low bits don't matter.

Also, the no-wrap flags can be propagated (and I think that
should be true for add too).

Here's an attempt to prove that in Alive2:
https://alive2.llvm.org/ce/z/xqh7Pa
(can add nsw or nuw to src and tgt, and it should still pass)

Differential Revision: https://reviews.llvm.org/D136788
2022-10-27 09:41:57 -04:00
Momchil Velikov 38f44ccfba Recommit: [FuncSpec] Fix specialisation based on literals
[fixed test to work with reverse iteration]

The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` .  There are a few
issues with the implementation, though:

* even with the default, the pass will still specialise based on
   floating-point literals

* even when it's enabled, the pass will specialise only for the `i1`
    type (or `i2` if all of the possible 4 values occur, or `i3` if all
    of the possible 8 values occur, etc)

The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.

This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:

* unknown or undef
* a constant
* a constant range with a single element

on the basis that specialisation is pointless for those cases.

Is also changes the criteria for picking up an actual argument to
specialise if the argument is:

* a LLVM IR constant
* has `constant` lattice value
 has `constantrange` lattice value with a single element.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135893

Change-Id: Iea273423176082ec51339aa66a5fe9fea83557ee
2022-10-27 12:48:20 +01:00
Sameer Sahasrabuddhe 0fd018f9a9 [NFC] [AAPointerInfo] OffsetAndSize is no longer an std::pair
The struct OffsetAndSize is a simple tuple of two int64_t. Treating it as a
derived class of std::pair has no special benefit, but it makes the code
verbose since we need get/set functions that avoid using "first" and "second" in
client code. Eliminating the std::pair makes this more readable.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D136745
2022-10-27 11:00:28 +05:30
wlei d6a0585dd1 [SampleFDO] Compute and report profile staleness metrics
When a profile is stale and profile mismatch could happen, the mismatched samples are discarded, so we'd like to compute the mismatch metrics to quantify how stale the profile is, which will suggest user to refresh the profile if the number is high.

Two sets of metrics are introduced here:

 - (Num_of_mismatched_funchash/Total_profiled_funchash), (Samples_of_mismached_func_hash / Samples_of_profiled_function) : Here it leverages the FunctionSamples's checksums attribute which is a feature of pseudo probe. When the source code CFG changes, the function checksums will be different, later sample loader will discard the whole functions' samples, this metrics can show the percentage of samples are discarded due to this.
 -  (Num_of_mismatched_callsite/Total_profiled_callsite), (Samples_of_mismached_callsite / Samples_of_profiled_callsite) : This shows how many mismatching for the callsite location as callsite location mismatch will affect the inlining which is highly correlated with the performance. It goes through all the callsite location in the IR and profile, use the call target name to match, report the num of samples in the profile that doesn't match a IR callsite.

This is implemented in a new class(SampleProfileMatcher) and under a switch("--report-profile-staleness"), we plan to extend it with a fuzzy profile matching feature in the future.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D136627
2022-10-26 21:06:52 -07:00
Johannes Rudolf Doerfert 41a278f56a [OpenMP][FIX] Do not add custom state machine eagerly in LTO runs
If we run LTO optimization we migth end up introducing a custom state machine
and later transforming the region into SPMD. This is a problem. While a follow
up will introduce a check for the SPMD conversion, this already prevents the
eager custom state machine generation. Only if the kernel init function is
defined, rather then declared, we will emit a custom state machine. SPMD-zation
can happen eagerly though. Tests are adjusted via a weak definition. The LTO
test was added to verify this works as expected.

Differential Revision: https://reviews.llvm.org/D136740
2022-10-26 10:40:11 -07:00
Alex Brachet 443e2a10f6 Reland "[PGO] Make emitted symbols hidden"
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562

Then reverted again because it broke tests on MacOS, they should be
fixed now.

Bug: https://github.com/llvm/llvm-project/issues/58265

Differential Revision: https://reviews.llvm.org/D135340
2022-10-26 17:13:05 +00:00
Momchil Velikov 9901583968 Revert "[FuncSpec] Fix specialisation based on literals"
This reverts commit a8b0f58017 because
of "reverse-iteration" buildbot failure.
2022-10-26 13:54:12 +01:00
Momchil Velikov 2c8a4c6e62 Revert "[FuncSpec][NFC] Refactor finding specialisation opportunities"
This reverts commit a8853924bd due to dependency
on a8b0f58017
2022-10-26 13:54:12 +01:00
Guillaume Chatelet 1a726cfa83 Take memset_inline into account in analyzeLoadFromClobberingMemInst
This appeared in https://reviews.llvm.org/D126903#3884061

Differential Revision: https://reviews.llvm.org/D136752
2022-10-26 09:50:13 +00:00
Momchil Velikov a8853924bd [FuncSpec][NFC] Refactor finding specialisation opportunities
This patch reorders the traversal of function call sites and function
formal parameters to:

* do various argument feasibility checks (`isArgumentInteresting` ) only once per argument, i.e. doing N-args checks instead of N-calls x N-args checks.

* do hash table lookups only once per call site, i.e. N-calls lookups/inserts instead of N-call x N-args lookups/inserts.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D135968
2022-10-26 10:18:35 +01:00
Momchil Velikov 606d25e545 [FuncSpec] Compute specialisation gain even when forcing specialisation
When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g.  for a function

    void f(int x, int y)

the call like `f(1, 2)` could be matched by either

    void f.1(int x /* int y == 2 */);

or

    void f.2(/* int x == 1, int y == 2 */);

The `FunctionSpecialisation` pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag `-force-function-specialization`, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.

This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when `-force-function-specialization`
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.

Reviewed By: ChuanqiXu, labrinea

Differential Revision: https://reviews.llvm.org/D136180
2022-10-26 10:08:03 +01:00
Momchil Velikov a8b0f58017 [FuncSpec] Fix specialisation based on literals
The `FunctionSpecialization` pass has support for specialising
functions, which are called with literal arguments. This functionality
is disabled by default and is enabled with the option
`-function-specialization-for-literal-constant` .  There are a few
issues with the implementation, though:

* even with the default, the pass will still specialise based on
   floating-point literals

* even when it's enabled, the pass will specialise only for the `i1`
    type (or `i2` if all of the possible 4 values occur, or `i3` if all
    of the possible 8 values occur, etc)

The reason for this is incorrect check of the lattice value of the
function formal parameter. The lattice value is `overdefined` when the
constant range of the possible arguments is the full set, and this is
the reason for the specialisation to trigger. However, if the set of
the possible arguments is not the full set, that must not prevent the
specialisation.

This patch changes the pass to NOT consider a formal parameter when
specialising a function if the lattice value for that parameter is:

* unknown or undef
* a constant
* a constant range with a single element

on the basis that specialisation is pointless for those cases.

Is also changes the criteria for picking up an actual argument to
specialise if the argument is:

* a LLVM IR constant
* has `constant` lattice value
 has `constantrange` lattice value with a single element.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135893
2022-10-26 09:55:33 +01:00
Momchil Velikov 1a525dec7f [FuncSpec] Fix missed opportunities for function specialisation
When collecting the possible constant arguments to
specialise a function the compiler will abandon the search
on the first argument that is for some reason unsuitable as
a specialisation constant. Thus, depending on the traversal
order of the functions and call sites, the compiler can end
up with a different set of possible constants, hence with
different set of specialisations.

With this patch, the compiler will skip unsuitable
constants, but nevertheless will continue searching for
more.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135867
2022-10-25 23:19:48 +01:00
Philip Reames 269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorize intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder.

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straight forward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
Arthur Eubanks ef37504879 [Instrumentation] Remove legacy passes
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D136615
2022-10-25 13:11:07 -07:00
Alina Sbirlea d1b19da854 [LoopPeeling] Add flag to disable support for peeling loops with non-latch exits
Add a flag to allow disabling the changes in
https://reviews.llvm.org/D134803.

Differential Revision: https://reviews.llvm.org/D136643
2022-10-25 12:19:14 -07:00
Momchil Velikov c47739b45c [FuncSpec] Consider small noinline functions for specialisation
Small functions with size under a given threshold are not
considered for specialisaion on the presumption that they
are easy to inline. This does not apply to `noinline`
functions, though.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D135862
2022-10-25 19:49:04 +01:00
Fangrui Song a527bda520 [LegacyPM] Remove DataFlowSanitizerLegacyPass
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove DataFlowSanitizerLegacyPass.

Differential Revision: https://reviews.llvm.org/D124594
2022-10-25 10:55:29 -07:00
Simon Pilgrim 50fe87a5c8 [Transforms] classifyArgUse - don't deference pointer before null test
Reported here: https://pvs-studio.com/en/blog/posts/cpp/1003/ (N11)
2022-10-25 17:24:00 +01:00
Yaxun (Sam) Liu 9d5adc7e49 Revert "reland e5581df60a [SimplifyCFG] accumulate bonus insts cost"
This reverts commit bd7949bcd8.

Revert this patch since reviwers have different opinions regarding
the approach in post-commit review.

Will open RFC for further discussion.

Differential Revision: https://reviews.llvm.org/D132408
2022-10-25 12:15:39 -04:00
zhongyunde 620cff096a [InstCombine] Fold series of instructions into mull for more types
Relax the constraint of wide/vectors types.
Address the comment https://reviews.llvm.org/D136015?id=469189#inline-1314520

Reviewed By: spatel, chfast
Differential Revision: https://reviews.llvm.org/D136661
2022-10-25 23:04:46 +08:00
Nico Weber 76745d2b58 Revert "[PGO] Make emitted symbols hidden"
This reverts commit 04877284b4.
Looks like this is still breaking the test
Profile-x86_64 :: instrprof-darwin-dead-strip.c
(see comment on https://reviews.llvm.org/D135340).
2022-10-25 08:54:47 -04:00
LiaoChunyu e6c8418aab [ObjCARC][NFC] Fix defined but not used warning from D135041
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D136665
2022-10-25 15:16:42 +08:00
Kevin Athey 31bfa4a69b [MSAN] Add handleCountZeroes for ctlz and cttz.
This addresses a bug where vector versions of ctlz are creating false positive reports.

Depends on D136369

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D136523
2022-10-24 17:31:34 -07:00
Roy Sundahl 0c35b6165c [ASAN] Don't inline when -asan-max-inline-poisoning-size=0
When -asan-max-inline-poisoning-size=0, all shadow memory access should be
outlined (through asan calls). This was not occuring when partial poisoning
was required on the right side of a variable's redzone. This diff contains
the changes necessary to implement and utilize  __asan_set_shadow_01() through
__asan_set_shadow_07(). The change is necessary for the full abstraction of
the asan implementation and will enable experimentation with alternate strategies.

Differential Revision: https://reviews.llvm.org/D136197
2022-10-24 14:17:59 -07:00
Sanjay Patel 5dcfc32822 [InstCombine] allow more commutative matches for logical-and to select fold
This is a sibling transform to the fold just above it. That was changed
to allow the corresponding commuted patterns with:
3073074562
e1bd759ea5
8628e6df70
2022-10-24 16:40:43 -04:00
Yaxun (Sam) Liu bd7949bcd8 reland e5581df60a [SimplifyCFG] accumulate bonus insts cost
Fixed compile time increase due to always constructing LocalCostTracker.
Now only construct LocalCostTracker when needed.
2022-10-24 15:43:53 -04:00
Craig Topper 1edc51b56a [InstCombine] Explicitly check for scalable TypeSize.
Instead of assuming it is a fixed size.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D136517
2022-10-24 12:29:06 -07:00
Alex Brachet 04877284b4 [PGO] Make emitted symbols hidden
This was reverted because it was breaking when targeting Darwin which
tried to export these symbols which are now hidden. It should be safe
to just stop attempting to export these symbols in the clang driver,
though Apple folks will need to change their TAPI allow list described
in the commit where these symbols were originally exported
f538018562

Bug: https://github.com/llvm/llvm-project/issues/58265

Differential Revision: https://reviews.llvm.org/D135340
2022-10-24 19:05:10 +00:00
Alexey Bataev da4e0f7ac5 [SLP][NFC]Fix PR58476: Fix compile time for reductions, NFC.
Improve O(N^2) to O(N) in some cases, reduce number of allocations by
reserving memory.
Also, improve analysis of loads reduction values to avoid analysis
of not vectorizable cases.
2022-10-24 10:13:24 -07:00