* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency; it is not implemented by any backend, and folding its functionality into getUserCost (now getInstructionCost) makes it easier for targets to handle all the cost kinds with their existing cost callbacks.
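As a hedged sketch (illustrative, not taken from the patch), a pass can now route every cost kind through the single entry point:
```
// Illustrative sketch: with the unified API, all cost kinds go through
// getInstructionCost instead of separate getUserCost/getInstructionLatency
// calls.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

InstructionCost costOf(const Instruction &I, const TargetTransformInfo &TTI,
                       TargetTransformInfo::TargetCostKind Kind) {
  // Kind may be TCK_RecipThroughput, TCK_Latency, TCK_CodeSize, or
  // TCK_SizeAndLatency; one callback now covers them all.
  return TTI.getInstructionCost(&I, Kind);
}
```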
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
NewGVN tables are not cleared out between the initial run of NewGVN and the
verification. In case of the phi-of-ops optimization, OpSafeForPHIOfOps goes
out of sync between the two runs: an operand might not be safe for one basic
block, but it might be safe for one of its successors, in which case the
operand is added to the OpSafeForPHIOfOps map. In the verification phase, we
reuse OpSafeForPHIOfOps without updating it again. As a result, the operand is
considered safe for the phi-of-ops optimization even in cases where it is not.
This patch fixes that problem.
Fixes issue #53807.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D130910
In D131869 we noticed that we jump through some hoops because we parse the
tolerance option used in MisExpect.cpp into a 64-bit integer. This is
unnecessary, since the value can only be in the range [0, 100).
This patch changes the underlying type to a 32-bit integer, from where it is
parsed in Clang through to its use in LLVM.
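A minimal sketch of the LLVM side, where the flag name and description are illustrative, not the exact code:
```
// Hedged sketch: the tolerance only ever holds values in [0, 100),
// so a 32-bit option suffices.
#include "llvm/Support/CommandLine.h"
#include <cstdint>
using namespace llvm;

static cl::opt<uint32_t> MisExpectTolerance(
    "misexpect-tolerance", cl::init(0),
    cl::desc("Tolerance, as a percentage in [0, 100), applied when checking "
             "profile counts against llvm.expect hints"));
```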
Reviewed By: jloser
Differential Revision: https://reviews.llvm.org/D131935
If a function only has a few instructions, instrumentation can significantly increase the size and performance overhead of that function. Add the `-pgo-function-size-threshold` option to select a size threshold so these small functions are not instrumented.
A similar option `-fxray-instruction-threshold=<N>` is used for XRay to reduce binary size overhead [1].
[1] https://www.llvm.org/docs/XRay.html
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D131816
Currently, Clang does not take advantage of the 0 initialisation under finite-math assumptions.
For example:
```
double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
  f_prod *= arr[i];
}
```
Clang will ignore that `f_prod` is set to zero, so the product can only ever be zero, and it will generate assembly that iterates over the loop.
Reviewed By: fhahn, spatel
Differential Revision: https://reviews.llvm.org/D131672
We don't have a dominator tree in this pass, so we
can't bail out sooner by checking for unreachable
code, but this is a minimal fix for the example in
issue #56875.
Currently, we try to vectorize values feeding into stores only if the
slp-vectorize-hor-store option is provided. We can safely enable
vectorization of the value operand of a single store in the basic block
if the operand value is used only in that store.
It should enable extra vectorization and should not increase compile
time significantly.
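A hedged illustration (not taken from the test suite) of the kind of code this enables:
```
// Illustrative example: the value stored to q[0] is a reduction tree over
// p[0..3] and has no users other than the store, so SLP may now consider
// vectorizing it even without -slp-vectorize-hor-store.
void storeReduction(double *q, const double *p) {
  q[0] = (p[0] + p[1]) + (p[2] + p[3]);
}
```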
Fixes https://github.com/llvm/llvm-project/issues/51320
Differential Revision: https://reviews.llvm.org/D131894
This should fix these build bot errors:
Step 6 (build-check-mlir-build-only) failure: build (failure)
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(129): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1386): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1388): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
Previously we would only CSE constrained FP intrinsics when they used the
default floating-point environment; now non-default environments are handled
as well. An exception behavior of "strict" is still not allowed, since we are
not allowed to remove any traps in that case.
There are no restrictions on CSE across function calls inside a function.
Differential Revision: https://reviews.llvm.org/D112256
Using the legacy PM for the optimization pipeline is deprecated and in
the process of being removed. This is a small step in that direction.
For an example of migrating to the new PM:
853b57fe80
The basic patterns look like this:
https://alive2.llvm.org/ce/z/MDj9EC
The tests also include a use of the overflow value; without that extra use,
existing folds already reduce the pattern.
This was noted as a missing IR fold in:
926e7312b2
Hopefully, this makes it easier to implement a backend
fix because we should get the same IR regardless of
whether the source used builtins or inline code.
(A | ?) | (A ^ B) --> (A | ?) | B
https://alive2.llvm.org/ce/z/dbNQw4
This extends the existing transform to peek through
another 'or' instruction for the common operand.
This is the underlying missing fold that should allow
issue #56711 and issue #57120 to reduce even more.
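As a standalone sanity check of the underlying Boolean identity (plain C++, not LLVM code):
```
// Exhaustively verify (a | c) | (a ^ b) == (a | c) | b over 3-bit values;
// a ^ b only contributes the bits of b once a is already OR'd in.
#include <cassert>
#include <cstdint>

int main() {
  for (uint8_t a = 0; a < 8; ++a)
    for (uint8_t b = 0; b < 8; ++b)
      for (uint8_t c = 0; c < 8; ++c)
        assert(((a | c) | (a ^ b)) == ((a | c) | b));
}
```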
Allows for even more savings in the binary image while simultaneously removing the name of the offending stack variable.
Depends on D131631
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D131728
We're seeing non-determinism with loading sample profiles. It seems to
be related to the order in which we merge FunctionSamples in
promoteMergeNotInlinedContextSamples(). Use a MapVector to iterate over
NonInlinedCallSites in the order entries were inserted.
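A minimal sketch of why MapVector restores determinism, assuming nothing about the surrounding profile-loader code:
```
// MapVector keeps a vector of keys alongside the map, so iteration follows
// insertion order on every run, unlike hash-based containers whose order
// can vary between runs.
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/StringRef.h"

int sumInInsertionOrder(llvm::MapVector<llvm::StringRef, int> &M) {
  int Sum = 0;
  for (auto &KV : M) // deterministic: first-inserted entry comes first
    Sum += KV.second;
  return Sum;
}
```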
Reviewed By: wenlei, davidxl
Differential Revision: https://reviews.llvm.org/D131592
The goal is to reduce the size of MSan binaries built with track-origins by
making the variable name locations constant, which will allow the linker to
compress them.
Follows: https://reviews.llvm.org/D131415
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D131631
Contextual knowledge may be used to prove invariance of some conditions.
For example, in this case:
```
; %len >= 0
guard(%iv = {start,+,1}<nuw> <s %len)
guard(%iv = {start,+,1}<nuw> <u %len)
```
the 2nd check always fails if `start` is negative and always passes otherwise.
It looks like there are more opportunities of this kind still to be
implemented.
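A standalone check of that reasoning (plain C++, illustrative only):
```
// With len >= 0: if iv is non-negative, the signed and unsigned checks
// agree; if iv is negative, its unsigned value is >= 2^31 > len, so the
// unsigned check always fails.
#include <cassert>
#include <cstdint>

void checkGuardEquivalence(int32_t iv, int32_t len) {
  assert(len >= 0);
  if (iv >= 0)
    assert((iv < len) == (uint32_t(iv) < uint32_t(len)));
  else
    assert(!(uint32_t(iv) < uint32_t(len)));
}
```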
Differential Revision: https://reviews.llvm.org/D129753
Reviewed By: apilipenko
Closing https://github.com/llvm/llvm-project/issues/56329
The problem happens when we try to simplify the suspend points. We might
break the assumption that the final suspend lives in the last slot of
Shape.CoroSuspends. This patch tries to maintain that assumption and fixes
the problem.
We manage to iteratively achieve this result with no extra
uses, and the reassociate pass can also do this, but this
pattern falls through the cracks in the example from
issue #57053.
Other sanitizers (ASan, TSan, see added tests) already handle
memcpy.inline and memset.inline by not relying on InstVisitor to turn
the intrinsics into calls. Only MSan instrumentation currently does not
support them due to missing InstVisitor callbacks.
Fix it by actually making InstVisitor handle Mem*InlineInst.
While the mem*.inline intrinsics promise no calls to external functions
as an optimization, for the sanitizers we need to break this guarantee
since access into the runtime is required either way, and performance
can no longer be guaranteed. All other cases, where generating a call is
incorrect, should instead use no_sanitize.
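A hedged sketch of what the InstVisitor side enables, assuming the visitor delegates the inline variants to their base-class callbacks as this patch arranges:
```
// MemCpyInlineInst derives from MemCpyInst, so once InstVisitor dispatches
// llvm.memcpy.inline here, an instrumentation pass sees both variants in
// one callback and can emit its runtime call for either.
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

struct MemIntrinsicCounter : InstVisitor<MemIntrinsicCounter> {
  unsigned NumMemCpys = 0;
  void visitMemCpyInst(MemCpyInst &I) { ++NumMemCpys; }
};
```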
Fixes: https://github.com/llvm/llvm-project/issues/57048
Reviewed By: vitalybuka, dvyukov
Differential Revision: https://reviews.llvm.org/D131577
This is done by calling __msan_set_alloca_origin and providing the location of the variable by using the call stack.
This is preparatory work for dropping variable names when track-origins is enabled.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D131205
The relevant property of allocation functions here is their uniqueness (in
the sense of disjoint provenance), which is encoded by the noalias return
attribute.
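For illustration, this is the property that allocator attributes communicate at the source level (the allocator name is hypothetical):
```
// Hedged illustration: marking an allocator with __attribute__((malloc))
// lets Clang attach the noalias return attribute, i.e. each call returns
// storage whose provenance is disjoint from all other pointers.
__attribute__((malloc)) void *my_alloc(unsigned long n);

int disjoint(unsigned long n) {
  void *p = my_alloc(n);
  void *q = my_alloc(n);
  return p != q; // distinct successful allocations never alias
}
```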
Differential Revision: https://reviews.llvm.org/D130225
After D121595 was committed, I noticed regressions associated with
vectorisation of small trip count loops by tail folding with scalable
vectors. As a solution for those issues, I propose to introduce a minimal
trip count threshold value.
Differential Revision: https://reviews.llvm.org/D130755
The RelLookupTableConverter pass currently only supports 64-bit
pointers. This is currently enforced using an isArch64Bit() check
on the target triple. However, we consider x32 to be a 64-bit target,
even though the pointers are 32-bit. (And independently of that
specific example, there may be address spaces with different pointer
sizes.)
As such, add an additional guard for the size of the pointers that
are actually part of the lookup table.
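A hedged sketch of the added guard (names are illustrative, not the exact code):
```
// Query the width of pointers in the global's address space via the
// DataLayout instead of trusting Triple::isArch64Bit(); on x32 the triple
// is 64-bit but pointers are 32 bits wide.
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
using namespace llvm;

static bool has64BitTablePointers(const GlobalVariable &GV) {
  const DataLayout &DL = GV.getParent()->getDataLayout();
  return DL.getPointerSizeInBits(GV.getAddressSpace()) == 64;
}
```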
Differential Revision: https://reviews.llvm.org/D131399
With profile data, non-trivial LoopUnswitch will only be applied to non-cold loops, as unswitching cold loops may gain little benefit while significantly increasing code size.
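A minimal sketch of the kind of check involved, with names assumed for illustration:
```
// With a profile summary available, treat a loop as unprofitable to
// unswitch non-trivially when its header is cold.
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"
using namespace llvm;

static bool skipNonTrivialUnswitch(const Loop &L, ProfileSummaryInfo *PSI,
                                   BlockFrequencyInfo *BFI) {
  return PSI && PSI->hasProfileSummary() && BFI &&
         PSI->isColdBlock(L.getHeader(), BFI);
}
```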
Reviewed By: aeubanks, asbirlea
Differential Revision: https://reviews.llvm.org/D129599
We are seeing significant performance loss when an alloca fails to get promoted
to a register. I have observed that this is due to the common type found when
attempting to rewrite partition users being unviable for promotion, whereas if
we had continued looking for a type, we would have found a subtype in the
original allocated type that would have enabled promotion. Thus, first check
whether the initial common type found is viable for promotion, and if not,
continue looking instead of stopping with the initial common type found.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D128073
When EarlyCSE tries to common vector masked loads/stores, it first checks that
they have the same base operand and then assumes that this is enough for the
mask types to be equal. This is true for typed pointers but false for opaque
ones: two loads of different vector sizes from the same base pointer `%b` look
identical, since both operate on `ptr %b`. (With typed pointers, `%b` was cast
to the appropriate vector pointer type, so the bases were different.)
Change the assert into a return from the lambda `isSubmask` so this
transformation works properly with opaque pointers.
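A hedged sketch of the shape of the fix (illustrative, not the exact code):
```
// Where the helper previously asserted that both masks had the same fixed
// vector width, it now reports failure, since opaque pointers make
// mismatched widths reachable from the same base pointer.
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Value.h"
using namespace llvm;

static bool sameMaskWidth(const Value *M1, const Value *M2) {
  auto *V1 = dyn_cast<FixedVectorType>(M1->getType());
  auto *V2 = dyn_cast<FixedVectorType>(M2->getType());
  return V1 && V2 && V1->getNumElements() == V2->getNumElements();
}
```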
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D131251