3689 Commits

Author SHA1 Message Date
Matt Arsenault 7cf5581712 Analysis: Update some tests for opaque pointers
StackSafetyAnalysis/lifetime.ll had one bitcast removed that may have
mattered. The concluded lifetime is longer when based on the underlying
alloca instead of the bitcasted pointer, so that was left as a pointless
cast.

local.ll and memintrin.ll needed some manual fixes.
2022-12-02 18:47:43 -05:00
Matt Arsenault 81c163e3e1 StackSafetyAnalysis: Don't use anonymous values in test 2022-12-02 18:47:43 -05:00
Matt Arsenault a74c5707be Fix some test files with executable permissions 2022-12-02 17:12:03 -05:00
Bjorn Pettersson a11faeed44 [test] Switch to use -passes syntax in various test cases 2022-12-01 21:25:59 +01:00
Philip Reames 73eacf94e0 [RISCV] Incorporate LMUL into costs for arithmetic and shuffles
This reuses the routine implemented in 0e6f0b7 to implement several existing TODOs. Many of the operations scale linearly with LMUL; this change represents that in the cost model.

Differential Revision: https://reviews.llvm.org/D139039
2022-12-01 10:46:27 -08:00
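The linear scaling described in 73eacf94e0 can be sketched roughly as below. This is an illustration only, not the LLVM code: the helper name `lmul_scaled_cost` is hypothetical, and treating a fractional LMUL like LMUL=1 is an assumption made for the example.

```python
from fractions import Fraction


def lmul_scaled_cost(base_cost: int, lmul: Fraction) -> int:
    """Scale a per-register operation cost by LMUL (hypothetical helper).

    Assumption: a fractional LMUL (1/2, 1/4, 1/8) still occupies one
    vector register, so it is costed the same as LMUL=1.
    """
    if lmul <= 0:
        raise ValueError("LMUL must be positive")
    factor = max(lmul, Fraction(1))
    return int(base_cost * factor)


# An operation that costs 1 at LMUL=1 costs 4 at LMUL=4.
print(lmul_scaled_cost(1, Fraction(4)))
```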
Roman Lebedev 7850ab2112 [NFC] Port an assortment of tests that invoke SROA to new pass manager 2022-12-01 21:17:18 +03:00
Philip Reames 7d82c99403 [RISCV][TTI] Account for constant materialization cost when costing arithmetic operations
At the IR level, we generally assume that constants are free to materialize. However, for RISCV, due to some quirks of the ISA, materializing arbitrary constants can be rather expensive. We frequently fall back to constant pool loads.

We've been slowly moving in the direction of modeling the cost of the remat as part of the instruction cost. This has the effect of disincentivizing vectorization - mostly SLP - when we'd have to materialize an expensive constant.

We need better modeling of which constants are expensive and which are not, but for the moment let's be consistent with how we model arithmetic and memory instructions. The difference between the two is that arithmetic can sometimes fold a splat operation, which stores cannot.

Differential Revision: https://reviews.llvm.org/D138941
2022-11-30 07:20:51 -08:00
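The idea in 7d82c99403 of charging for constant materialization, except where a splat can be folded, might be sketched as follows. All names (`Operand`, `arithmetic_cost`, `const_mat_cost`) and the cost numbers are hypothetical; the real TTI code is more involved.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Operand:
    is_constant: bool = False
    # Assumption: True when the instruction can absorb the constant as a
    # splat operand, so no separate materialization is needed.
    is_foldable_splat: bool = False


def arithmetic_cost(base_cost: int, operands: Sequence[Operand],
                    const_mat_cost: int) -> int:
    """Add a materialization cost for each constant that cannot be folded."""
    total = base_cost
    for op in operands:
        if op.is_constant and not op.is_foldable_splat:
            total += const_mat_cost
    return total


# A non-foldable constant operand makes the operation more expensive.
print(arithmetic_cost(1, [Operand(), Operand(is_constant=True)], 2))
```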
Paul Robinson 3558da3d89 [Sanitizers] Fix test that never ran anywhere
Incorrect REQUIRES clause. Also fixed the incorrect 'opt' line
and removed a redundant -mtriple option.
2022-11-30 07:20:27 -08:00
David Green f2a92db29e [AArch64] Don't treat SVE scalable extends as free widening instructions
The logic in isWideningInstruction handles instructions like uaddw and
smull, where 'add(x, zext(y))' or 'mul(sext(x), sext(y))' can be
converted to single instructions, making the extends free. This doesn't
apply the same to SVE instructions though.
https://godbolt.org/z/695d3nhGd

(There are instructions like SMULLT/B, but they require top/bottom lane
interleaving. That is similar to MVE instructions, which required a
special pass to perform the lane interleaving).

This patch just bails out of the call to isWideningInstruction if the
vector is scalable, getting a more accurate cost.

Differential Revision: https://reviews.llvm.org/D138591
2022-11-30 13:09:48 +00:00
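A rough sketch of the bail-out described in f2a92db29e follows. The function name `extend_is_free` and the opcode set are assumptions for illustration; the real isWideningInstruction also checks types and element widths.

```python
# Assumption: opcodes that have widening forms such as uaddw or smull.
WIDENING_OPCODES = {"add", "sub", "mul"}


def extend_is_free(opcode: str, has_extended_operand: bool,
                   is_scalable: bool) -> bool:
    """Return True if an extend feeding this operation can be treated as free."""
    if is_scalable:
        # SVE has no direct equivalent, so the extend is costed normally.
        return False
    return opcode in WIDENING_OPCODES and has_extended_operand


print(extend_is_free("add", True, is_scalable=False))  # fixed-width vector
print(extend_is_free("add", True, is_scalable=True))   # scalable vector
```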
ShihPo Hung 0e6f0b7cc3 [RISCV] Add cost model for fixed broadcast shuffle
This patch adds basic broadcast shuffle costs in order to enable SLP vectorization.
It also adds `getLMULCost` to account for reciprocal throughput at different LMUL.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D137276
2022-11-30 04:58:52 -08:00
Philip Reames 3c9d247112 [RISCV] Add test coverage for vector constant materialization costs on arithmetic instructions 2022-11-29 12:00:58 -08:00
Philip Reames e726c5879a [RISCV] Add cost model coverage for vector arithmetic 2022-11-29 11:50:52 -08:00
Mateja Marjanovic 595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
David Green 57dc4a8cab [AArch64] Extend testing for widening conditions under SVE. NFC 2022-11-29 15:53:39 +00:00
Slava Zakharin 5bd8175dd7 [AA] A global cannot escape through nocapture/nocallback call.
When an internal global is passed to a 'nocallback' call as
a 'nocapture' pointer, it cannot escape through this call and
be indirectly referenced in this module.
So it must not alias with any pointer in the module.

This may provide some remedy for Fortran module-private array descriptors
that are usually passed by address to some runtime functions
(e.g. to allocation/deallocation functions). In general, a good aliasing
information derived from Fortran language rules would solve the same issue,
but I think this change may be beneficial as-is (given that nocapture,
nocallback attributes are properly set).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D138336
2022-11-28 12:50:31 -08:00
Philip Reames db07d79ab0 [RISCV] Add cost model for integer and float vector arithmetic instructions.
This patch implements getArithmeticInstrCost for RISCV, supports cost
model for integer and float vector arithmetic instructions.

Differential Revision: https://reviews.llvm.org/D133552 (Original patch by jacquesguan.  Subset by me with todos added.)
2022-11-28 09:04:38 -08:00
Matt Arsenault 8c58a9ace0 DivergenceAnalysis: Convert tests to opaque pointers 2022-11-28 08:42:38 -05:00
Zain Jaffal 6e4cea55f0 [AArch64] Fix cost model for `udiv` instruction when one of the operands is a uniform constant
Currently the model overestimates the cost of a udiv instruction with one constant operand. The correct cost for a udiv instruction is
insert_cost * extract_cost * num_elements

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D135991
2022-11-28 10:38:17 +02:00
Max Kazantsev 06c4103d41 [Test] Add couple more tests where we can compute symbolic max exit count (fixed) 2022-11-25 14:40:32 +07:00
Max Kazantsev eb95ab5745 Revert "[Test] Add couple more tests where we can compute symbolic max exit count"
This reverts commit 7e3373c9e1.

Some changes that were not supposed to be committed came with it.
2022-11-25 13:37:24 +07:00
Max Kazantsev 7e3373c9e1 [Test] Add couple more tests where we can compute symbolic max exit count 2022-11-25 13:35:16 +07:00
Max Kazantsev b9c1d73725 [Test] Add test showing that SCEV fails to evaluate symbolic max for 'and' conditions 2022-11-25 11:45:10 +07:00
Max Kazantsev 4496d553bd [SCEV] Fix misplaced \n in printout of max symbolic exit counts 2022-11-25 11:41:36 +07:00
Florian Hahn ae852750b3 [MemoryLocation] Support memcpy_chk in getForArgument.
Similar to 9f9e8ba114, add support for memcpy_chk to
MemoryLocation::getForArgument.

The size argument for memcpy_chk is an upper bound for the size of the
pointer argument. memcpy_chk may read/write less than the specified length,
if it exceeds the specified max size and aborts.

Reviewed By: xbolva00, jdoerfert

Differential Revision: https://reviews.llvm.org/D138613
2022-11-24 19:17:48 +00:00
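The distinction drawn in ae852750b3 between an exact size and an upper bound can be sketched like this. `LocationSize` here is a simplified stand-in and `size_for_pointer_argument` is a hypothetical helper, not the MemoryLocation API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationSize:
    num_bytes: int
    precise: bool  # False means "at most this many bytes"


def size_for_pointer_argument(callee: str, length: int) -> LocationSize:
    """Describe how many bytes a call may access through its pointer argument."""
    if callee == "memcpy":
        # memcpy accesses exactly `length` bytes.
        return LocationSize(length, precise=True)
    if callee == "__memcpy_chk":
        # The checked variant may abort before copying, so `length`
        # is only an upper bound on what is accessed.
        return LocationSize(length, precise=False)
    raise ValueError(f"unhandled callee: {callee}")


print(size_for_pointer_argument("__memcpy_chk", 16))
```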
Max Kazantsev e5fa7eb120 [SCEV] Add printout of symbolic max backedge-taken and block exit count
We do compute it and use in optimizations, but never print it out. We need
to do it in order to be able to track improvements in its computation.
2022-11-24 19:29:58 +07:00
Max Kazantsev 211d941188 [SCEV] Rename max backedge-taken count -> constant max backedge taken-count in printout
This is a preparatory step for introducing symbolic max backedge-taken count.
2022-11-24 18:43:42 +07:00
Florian Hahn 4b4cbbd7fb [BasicAA] Add tests with __memcpy_chk. 2022-11-23 22:09:53 +00:00
Haohai Wen 1215e86a0e [CostModel][X86] Fix permute latency cost
AVX512 permute latency should be 3 instead of 1.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138427
2022-11-23 19:17:16 +08:00
Haohai Wen 2dfe76e989 [CostModel][X86] Add CostKinds test coverage for shufflevector instruction
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138485
2022-11-23 10:30:48 +08:00
Florian Hahn 5dad4c6788 [SCEV] Iteratively compute ranges for deeply nested expressions.
At the moment, getRangeRef may overflow the stack for very deeply nested
expressions.

This patch introduces a new getRangeRefIter function, which first builds
a worklist of N-ary expressions and phi nodes, followed by their
operands iteratively.

getRangeRef has been extended to also take a Depth argument and it
switches to use getRangeRefIter once the depth reaches a certain
threshold.

This ensures compile-time is not impacted in general. Note that
the iterative algorithm may lead to a slightly different evaluation
order, which could result in slightly worse ranges for cyclic phis.

https://llvm-compile-time-tracker.com/compare.php?from=23c3eb7cdf3478c9db86f6cb5115821a8f0f5f40&to=e0e09fa338e77e53242bfc846e1484350ad79773&stat=instructions

Fixes #49579.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D130728
2022-11-21 21:56:14 +00:00
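The approach in 5dad4c6788, recursing until a depth threshold and then switching to a worklist, can be sketched as below. This is a simplified model under stated assumptions: `Expr`, `DEPTH_THRESHOLD`, and the interval-sum `combine` are hypothetical, and cyclic phis are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Range = Tuple[int, int]
DEPTH_THRESHOLD = 8  # hypothetical cut-over point


@dataclass(eq=False)
class Expr:
    leaf_range: Range = (0, 0)
    operands: List["Expr"] = field(default_factory=list)


def combine(expr: Expr, cache: Dict[int, Range]) -> Range:
    """Range of a node whose operands are already cached (add-like node)."""
    if not expr.operands:
        return expr.leaf_range
    lo = sum(cache[id(op)][0] for op in expr.operands)
    hi = sum(cache[id(op)][1] for op in expr.operands)
    return (lo, hi)


def range_iter(root: Expr, cache: Dict[int, Range]) -> Range:
    """Post-order evaluation with an explicit stack instead of recursion."""
    stack = [(root, False)]
    while stack:
        expr, operands_done = stack.pop()
        if id(expr) in cache:
            continue
        if operands_done:
            cache[id(expr)] = combine(expr, cache)
        else:
            stack.append((expr, True))
            for op in expr.operands:
                if id(op) not in cache:
                    stack.append((op, False))
    return cache[id(root)]


def range_ref(expr: Expr, cache: Dict[int, Range], depth: int = 0) -> Range:
    """Recursive evaluation that falls back to range_iter when too deep."""
    if id(expr) in cache:
        return cache[id(expr)]
    if depth > DEPTH_THRESHOLD:
        return range_iter(expr, cache)
    for op in expr.operands:
        range_ref(op, cache, depth + 1)
    cache[id(expr)] = combine(expr, cache)
    return cache[id(expr)]
```

A chain nested several thousand levels deep would exceed Python's default recursion limit if evaluated purely recursively; with the fallback it is handled by the worklist once the threshold is passed.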
Florian Hahn 535c2da58d [SCEV] Add range test with phi and division.
Extra test coverage for D130728.
2022-11-21 19:58:43 +00:00
Yeting Kuo ed9638c44b [VP][RISCV] Add vp.nearbyint and RISC-V support.
nearbyint must execute without raising exceptions.
To avoid modifying fflags, the patch adds a new machine opcode,
PseudoVFROUND_NOEXCEPT_V, that expands to vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137685
2022-11-16 14:05:35 +08:00
Yeting Kuo 5c3ca10b09 [VP][RISCV] Add vp.bswap and RISC-V support.
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137928
2022-11-16 11:36:38 +08:00
Roman Lebedev 11abb7fedb [NFC][X86][Costmodel] Drop redundant interleaved cost test coverage
These are already covered by the more general tests I've added.
2022-11-15 21:30:06 +03:00
Roman Lebedev 8e37b53360 [X86] Rewrite `getScalarizationOverhead()`
All of our insert/extract ops work on 128-bit lanes.

For `Insert`, we need to extract affected 128-bit lane,
unless it's being fully overwritten (FIXME: do we need to be
careful about legalization-induced padding that we obviously don't demand?),
perform insertions, and then insert the 128-bit lane back.

But hold on. If we are operating on a 256-bit legal vector,
and thus have two 128-bit subvectors, and are fully overwriting them both,
we don't actually need to insert *both* subvectors,
only the second one, into the implicitly-widened first one.

Also, `Insert` wasn't actually querying the costs,
but just assuming them to be `1`.

`getShuffleCost(TTI::SK_ExtractSubvector)` notes:
```
  // Note that in general, the insertion starting at the beginning of a vector
  // isn't free, because we need to preserve the rest of the wide vector.
```
... so as far as I can tell, we didn't account for that.

I was hoping this would allow vectorization at a higher VF in one case I looked at,
but the subvector insertion cost still advises against that.

The change for `Extract` is NFC, and is for consistency only,
I wanted to get rid of that weird explicit discounting of insertion of 0'th element,
since the general code should already deal with that.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137913
2022-11-15 21:07:12 +03:00
Nikita Popov 458ae539df [AST] Remove legacy AliasSetPrinter pass
A NewPM version of this pass exists, drop the legacy version of
this testing-only pass.
2022-11-14 15:50:38 +01:00
Matt Arsenault 583450fa09 AMDGPU: Fix DivergenceAnalysis for llvm.read_register
This was treating all calls as uniform by default, which
is wrong if used to read a VGPR.
2022-11-07 10:42:35 -08:00
Matt Arsenault 541041d1ea AMDGPU: Fix faulty divergence analysis tests
These were supposed to be checking that atomics were treated
as divergence sources. However, they were using function arguments
which are always treated as divergent, so they could have
been found divergent for the wrong reason.
2022-11-06 22:14:12 -08:00
Matt Arsenault f72416e974 AMDGPU: Fix missing divergence tests for csub intrinsics 2022-11-06 22:14:12 -08:00
Nikita Popov 304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
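The auto-upgrade described in 304f1d59ca can be illustrated with a small mapping from the old function-level attributes to the `memory(...)` spelling. This is a sketch of the mapping only; `upgrade_memory_attrs` is a hypothetical helper, not the actual AutoUpgrade code.

```python
from typing import Iterable, Optional

# Old access-kind attributes and the access they allow.
ACCESS_ATTRS = {"readnone": "none", "readonly": "read", "writeonly": "write"}

# Old location-restricting attributes and the locations they name.
LOCATION_ATTRS = {
    "argmemonly": ["argmem"],
    "inaccessiblememonly": ["inaccessiblemem"],
    "inaccessiblemem_or_argmemonly": ["argmem", "inaccessiblemem"],
}


def upgrade_memory_attrs(old_attrs: Iterable[str]) -> Optional[str]:
    """Return the memory(...) attribute equivalent to the old attributes.

    Returns None when no memory-related attribute is present.
    """
    access = None
    locations = None
    for attr in old_attrs:
        if attr in ACCESS_ATTRS:
            access = ACCESS_ATTRS[attr]
        elif attr in LOCATION_ATTRS:
            locations = LOCATION_ATTRS[attr]
    if access is None and locations is None:
        return None
    if access == "none":
        return "memory(none)"
    access = access or "readwrite"
    if locations is None:
        return f"memory({access})"
    return "memory(" + ", ".join(f"{loc}: {access}" for loc in locations) + ")"


print(upgrade_memory_attrs(["nounwind", "argmemonly", "readonly"]))
```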
Philip Reames 73482b457e [RISCV] Fix cost of legal fixed length masked load and stores
We can cost them the same way as a scalable masked load/store. By hitting the default path, we were costing them as if they were being scalarized. This is a significant overestimate.

Differential Revision: https://reviews.llvm.org/D137218
2022-11-02 07:24:38 -07:00
Nikita Popov 5fe9273c73 [BasicAA] Re-enable cs-cs-arm.ll test (PR58738)
Fixes https://github.com/llvm/llvm-project/issues/58738.
2022-11-02 14:22:44 +01:00
Paul Robinson 9a4aa37dbf Patch up attributes on a newly enabled test 2022-11-01 14:14:40 -07:00
Paul Robinson 4f0a1201a4 [lit][REQUIRES] Fix some tests with incorrect REQUIRES clauses
These weren't running anywhere because of bad specifications.
One test has bit-rotted and had to be XFAILed, the rest are okay.

Differential Revision: https://reviews.llvm.org/D136612
2022-11-01 13:49:23 -07:00
Nikita Popov 6aa672f141 [IR] Take operand bundles into account for call argument readonly/writeonly
We currently only take operand bundle effects into account when
querying the function-level memory attributes. However, I believe
that we also need to do the same for parameter attributes. For
example, a call with deopt bundle to a function with readnone
parameter attribute cannot treat that parameter as readnone,
because the deopt bundle may read it.

Differential Revision: https://reviews.llvm.org/D136834
2022-11-01 09:30:03 +01:00
Yeting Kuo 71e4e35581 [VP][RISCV] Add vp.rint and RISC-V support.
FRINT uses the dynamic rounding mode instead of a static rounding mode. The patch
renames VFCVT_X_F_VL to VFCVT_RM_X_F_VL for static rounding mode uses and adds a
new ISD node, VFCVT_X_F_VL, that is selected directly to PseudoVFCVT_X_F_V.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136662
2022-11-01 14:52:47 +08:00
Patrick Walton 01859da84b [AliasAnalysis] Introduce getModRefInfoMask() as a generalization of pointsToConstantMemory().
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.

It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.

The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.

Differential Revision: https://reviews.llvm.org/D136659
2022-10-31 13:03:41 -07:00
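The masking step described in 01859da84b can be sketched with a small flag type. The enum mirrors the NoModRef/Ref/Mod/ModRef names from the commit message; `mask_for` and its string arguments are hypothetical stand-ins for the real query.

```python
from enum import IntFlag


class ModRefInfo(IntFlag):
    NoModRef = 0
    Ref = 1
    Mod = 2
    ModRef = 3


def mask_for(memory_kind: str) -> ModRefInfo:
    """Bound on Mod/Ref behavior for a location (hypothetical classifier)."""
    if memory_kind == "globally_constant":
        return ModRefInfo.NoModRef
    if memory_kind == "locally_constant":  # e.g. a readonly noalias argument
        return ModRefInfo.Ref
    return ModRefInfo.ModRef


def apply_mask(info: ModRefInfo, mask: ModRefInfo) -> ModRefInfo:
    """Combine an instruction's ModRefInfo with the location's mask."""
    return info & mask


# A call that might both read and write can only read locally-constant memory.
print(apply_mask(ModRefInfo.ModRef, mask_for("locally_constant")))
```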
Patrick Walton 81767f2d18 [test][AliasAnalysis] Add some baseline tests in preparation for getModRefInfoMask().
This commit adds some tests in preparation for D136659, which allows alias
analysis to treat locally-invariant memory pointed to by readonly noalias
pointers the same as globally-invariant memory in some cases. The existing
behavior for these tests is marked as expected and will be changed when that
diff lands.

Differential Revision: https://reviews.llvm.org/D136993
2022-10-29 15:08:54 -07:00
Patrick Walton f3d49dbcb1 [test] Remove readonly from some parameters that are written through in tests.
In D136659 I found a few tests that write through readonly parameters:

* Analysis/BasicAA/pr18573.ll: @foo1 writes through %arr.ptr, but declares it
readonly. I removed the readonly annotation.

* CodeGen/ARM/ParallelDSP/aliasing.ll: @restrict writes through the readonly
%arg3, @store_alias_arg3_illegal_1 writes through the readonly %arg3, and
@store_alias_arg3_illegal_2 writes through the readonly %arg3. I removed
readonly from all three. Also, I added some CHECK-LABEL directives to make it
harder for FileCheck output to be mixed up.

* Transforms/LoopVectorize/AArch64/sve-gather-scatter.ll:
@gather_nxv4i32_ind64_stride2 writes through the readonly %a. I removed the
readonly attribute.

* Transforms/LoopVectorize/interleaved-accesses.ll: @load_gap_reverse writes
through the readonly %P1 and %P2. Also, the corresponding C code in the comment
didn't match the test. I removed the readonly attribute from both parameters
and corrected the C code.

Differential Revision: https://reviews.llvm.org/D136880
2022-10-29 15:05:20 -07:00
Craig Topper e94dc58dff [RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven.
This avoids the call overhead as well as the save/restore of
fflags and the snan handling in the libm function.

The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.

The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.

I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.

We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.

Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either is valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<52 and 1<<53 since
they will all fit in i64. We only have a problem if the double can't fit
in i64.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136508
2022-10-26 14:36:49 -07:00
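The shape of the inline sequence described in e94dc58dff can be sketched for the trunc case on doubles. This models only the value computation with the 1<<52 comparison and the final sign copy; floating-point exception flags are not modeled, and `inline_trunc` is a hypothetical name.

```python
import math


def inline_trunc(x: float) -> float:
    """Truncate toward zero via a float->int->float round trip."""
    # NaN and infinities fail this comparison and are returned unchanged,
    # as are values that already have no fractional bits.
    if abs(x) < float(1 << 52):
        rounded = float(int(x))  # int() truncates toward zero
        # Copy the sign back so that, for example, -0.3 yields -0.0
        # (the role of the final fsgnj mentioned in the commit message).
        return math.copysign(rounded, x)
    return x


print(inline_trunc(2.7), inline_trunc(-0.3))
```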