Commit Graph

5956 Commits

Author SHA1 Message Date
Jonas Paulsson 122efef8ee Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions.""
This reverts commit 17db0de330.

Some more bots got broken - need to investigate.
2022-12-05 00:52:00 +01:00
Jonas Paulsson 17db0de330 Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).

RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
2022-12-03 14:15:15 -06:00
Matt Arsenault 27387896cf SROA: Simplify addrspacecasted allocas with volatile accesses
If the alloca is accessed through an addrspacecasted pointer, allow
the normal changes on the alloca. Cast back to the original use
address space instead of the new alloca's natural address space.
2022-12-02 15:20:56 -05:00
Valery Pykhtin 5ce3273ebf [AMDGPU] Scheduler: Don't revert the schedule if the register pressure isn't changed for a region
This one-linear fix improves compilation time for about ~40% on ASAN enabled code.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136069
2022-12-02 15:59:38 +01:00
Matt Arsenault d85e849ff4 AMDGPU: Convert some assorted tests to opaque pointers 2022-12-01 21:40:30 -05:00
Matt Arsenault 0fb74d0ff8 AMDGPU: Fix broken attribute usage in test 2022-12-01 21:32:20 -05:00
Matt Arsenault 75a3f22a90 AMDGPU: Convert some stack tests to opaque pointers 2022-12-01 21:32:20 -05:00
Matt Arsenault 4998de4dcc AMDGPU: Update some wait tests to opaque pointers
The script mangled the constantexprs in waitcnt-looptest.ll, so fixed
those manually.
2022-12-01 21:01:58 -05:00
Jonas Paulsson 8ef4632681 Revert "[CodeGen] Add new pass for late cleanup of redundant definitions."
Temporarily revert and fix buildbot failure.

This reverts commit 6d12599fd4.
2022-12-01 13:29:24 -05:00
Jonas Paulsson 6d12599fd4 [CodeGen] Add new pass for late cleanup of redundant definitions.
A new pass MachineLateInstrsCleanup is added to be run after PEI.

This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameInde().

This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.

This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.

Differential Revision: https://reviews.llvm.org/D123394

Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
2022-12-01 13:21:35 -05:00
Roman Lebedev 7850ab2112
[NFC] Port an assortment of tests that invoke SROA to new pass manager 2022-12-01 21:17:18 +03:00
Freddy Ye 89f36dd8f3 [X86] Add ExpandLargeFpConvert Pass and enable for X86
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem, which expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp convertions with more
than 128 bits. The expanded nodes are referring from the IR generated by
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
and etc.

Corner cases:
1. For fp16: as there is no related builtins added in compliler-rt. So I
mainly utilized the fp32 <-> fp16 lib calls to implement.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can
help in this problem. I recommend users to deprecate this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/ext at top/end of the function.
3. For bf16: as clang FE currently doesn't support bf16 algorithm operations
(convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for
now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc are
ignoring "returns 0 for negative input" spec. This pass follows this old way
to ignore unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M

The end-to-end tests are uploaded at https://reviews.llvm.org/D138261

Reviewed By: LuoYuanke, mgehre-amd

Differential Revision: https://reviews.llvm.org/D137241
2022-12-01 13:47:43 +08:00
Marco Elver b95646fe70 Revert "Use-after-return sanitizer binary metadata"
This reverts commit d3c851d3fc.

Some bots broke:

- https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview
- https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio
2022-11-30 23:35:50 +01:00
Jay Foad 3d9e226081 [AMDGPU] Use s_cmp instead of s_cmpk
Don't bother pre-shrinking "s_cmp_lg_u32 reg, 0" to s_cmpk_lg_u32
because 0 is already an inline constant so the s_cmpk form is no
smaller.

This is just for consistency with the surrounding code and to simplify a
downstream patch.

Differential Revision: https://reviews.llvm.org/D138993
2022-11-30 18:02:39 +00:00
Dmitry Vyukov d3c851d3fc Use-after-return sanitizer binary metadata
Currently per-function metadata consists of:
(start-pc, size, features)

This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D136078
2022-11-30 14:50:22 +01:00
Pierre van Houtryve a88deb4b65 [AMDGPU] Use aperture registers instead of S_GETREG
Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register).

Patch based on D31874 by @kzhuravl
Depends on D137767

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137542
2022-11-30 12:25:10 +00:00
Nicolai Hähnle 4e79764f0a AMDGPU: Fixup tests 2022-11-30 12:37:40 +01:00
Nicolai Hähnle b7f44f7cf9 AMDGPU: Remove ImagePSV and move images to addrspace 7
Following up on the removal of BufferPSV in commit 43b86bf992 ("AMDGPU:
Remove BufferPseudoSourceValue")

It is unclear what exactly the right address space for images should be.
They seem morally closest to buffers, so that's what I went with. In
practical terms, address space 7 is better than address space 0 because
it can't alias with LDS.

Differential Revision: https://reviews.llvm.org/D138949
2022-11-30 11:32:34 +01:00
Matt Arsenault 6a91a5e826 AMDGPU: Convert some cast tests to opaque pointers 2022-11-29 19:04:58 -05:00
Matt Arsenault b5bc205d75 AMDGPU: Convert some bit operation tests to opaque pointers 2022-11-29 18:36:53 -05:00
Matt Arsenault c08d55623d AMDGPU: Fix creating illegal f16 fp_class
We were missing legality checks. The device library build was broken
for targets without f16 support. Technically the first pattern isn't
tested by this patch; it only triggers with the isBeforeLegalize check
in performAndCombine removed. I'm not sure how to trick this into
appearing post-legalization.
2022-11-29 18:24:30 -05:00
Matt Arsenault fac34c01f6 AMDGPU: Bulk update some call tests to use opaque pointers 2022-11-29 18:10:38 -05:00
Matt Arsenault fb1d166e1d AMDGPU: Bulk update some generic intrinsic tests to opaque pointers
Done purely with the script.
2022-11-29 18:09:59 -05:00
Matt Arsenault a1ac8902e3 AMDGPU: Convert amdgpu-alias-analysis.ll to opaque pointers
This one was slightly tricky. The AA debug printing usually, but not
always, uses the old pointer syntax. Also, we need to stop folding out
0 index GEPs in a few of these cases.
2022-11-29 18:08:53 -05:00
Matt Arsenault 177ff42d8e AMDGPU: Convert some fp op tests to opaque issues
fmax_legacy.ll had one test that produced "ptraddrspace(1)", since
somehow "i1addrspace(1)*" used to parse.
2022-11-29 18:08:53 -05:00
Ron Lieberman ca856fff1c Revert "enable code-object-version=5"
very sorry wrong repo.

This reverts commit d882ba7aea.
2022-11-29 15:21:09 -06:00
Nicolai Hähnle 43b86bf992 AMDGPU: Remove BufferPseudoSourceValue
The use of a PSV for buffer intrinsics is misleading because it may be
misinterpreted as all buffer intrinsics accessing the same address in
memory, which is clearly not true.

Instead, build MachineMemOperands without a pointer value but with an
address space, so that address space-based alias analysis can still
work.

There is a lot of test churn because previously address space 4
(constant address space) was used as an address space for buffer
intrinsics. This doesn't make much sense and seems to have been an
accident -- see the change in
AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind.

Differential Revision: https://reviews.llvm.org/D138711
2022-11-29 22:15:11 +01:00
Ron Lieberman d882ba7aea enable code-object-version=5 2022-11-29 15:11:57 -06:00
Brendon Cahoon b32a5666a8 [AMDGPU] Unify uniform return and divergent unreachable blocks
This patch fixes a "failed to annotate CFG" error in
SIAnnotateControlFlow. The problem occurs when there are
divergent and uniform unreachable/return blocks in the same
region. In this case, AMDGPUUnifyDivergentExitNodes does not
create a unified block so the region contains multiple exits.

StructurizeCFG does not work properly when there are multiple
exits, so the neccessary CFG transformations do not occur along
divergent control flow. Subsequently, SIAnnotateControlFlow
processes the path to the divergent exit block, but may only
partially process blocks along a unform control flow path to
another exit block.

This patch fixes the bug by creating a single exit block when
there is a divergent exit block in the function.

Differential revision: https://reviews.llvm.org/D136892
2022-11-29 13:25:56 -06:00
Matt Arsenault 600e9d33a5 AMDGPU: Fix broken test
From ee29a846c6
2022-11-29 12:33:44 -05:00
Matt Arsenault ee29a846c6 DAG: Fix assert when alloca has inconsistent pointer size
Take the type from the alloca, not the type to use for allocas.

Fixes issue 59250.
2022-11-29 11:48:46 -05:00
Matt Arsenault a74ea40cb6 AMDGPU: Remove unnecessary metadata from test
The pass isn't doing anything with it, and the line wrapping is
confusing update_test_checks.
2022-11-29 11:12:08 -05:00
Mateja Marjanovic 595a08847a [AMDGPU] Add support for new LLVM vector types
Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384.

Differential Revision: https://reviews.llvm.org/D138205
2022-11-29 17:02:04 +01:00
Simon Pilgrim 30eff7f29f [DAG] Attempt to replace a mul node with an existing umul_lohi/smul_lohi node (PR59217)
As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization.

This patch attempts to replace MUL nodes with additional uses of the LO result from the MUL_LOHI node

Differential Revision: https://reviews.llvm.org/D138790
2022-11-29 12:51:30 +00:00
Thomas Symalla 5f77ee4011 [NFC][AMDGPU] Pre-commit tests for D136432 2022-11-29 12:57:59 +01:00
Stanislav Mekhanoshin 28eb9ed3bb [AMDGPU] Fine tune LDS misaligned access speed
Differential Revision: https://reviews.llvm.org/D124219
2022-11-28 16:12:02 -08:00
Janek van Oirschot 322966f8f8 [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp
class support and introduce GlobalISel implementation for AMDGPU

Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic
for llvm.is.fpclass
2022-11-28 16:00:36 -05:00
Matt Arsenault ad386a886b AMDGPU: Bulk update some intrinsic tests to opaque pointers
Done entirely with the script.
2022-11-28 14:21:31 -05:00
Matt Arsenault 7a3fb6a6e3 AMDGPU: Convert some memcpy test to opaque pointers
memcpy-scoped-aa.ll required manually updating the IR references in
the MMOs
2022-11-28 14:11:56 -05:00
Matt Arsenault 8e0fadda10 AMDGPU: Bulk update all GlobalISel tests to use opaque pointers 2022-11-28 11:51:36 -05:00
Matt Arsenault da0293e3cc AMDGPU: Bulk update some r600 tests to opaque pointers
r600.amdgpu-alias-analysis.ll has something strange going on where
AliasAnalyisEvaluator's printing is reproducing the typed pointer
syntax.
2022-11-28 11:25:44 -05:00
Matt Arsenault 50caf6936b AMDGPU: Convert promote alloca tests to opaque pointers 2022-11-28 10:36:38 -05:00
Matt Arsenault b3df889b71 AMDGPU: Convert test to generated checks
These checks were too thin to begin with, and required slightly
trickier updates for opaque pointers.
2022-11-28 10:35:29 -05:00
Matt Arsenault 8f071fecfe AMDGPU: Use named values in a test
As always, these were an obstacle to test updates.
2022-11-28 10:35:29 -05:00
Matt Arsenault c1710e7779 AMDGPU: Use modern address spaces in some tests
This was way out of date, still using 4 for generic and 0 for private.
2022-11-28 10:05:06 -05:00
Matt Arsenault 1ab9fa6f0d AMDGPU/GlobalISel: Fix hardcoded virtual register numbers in test 2022-11-28 08:41:31 -05:00
David Stuttard 7940888c59 [AMDGPU] Intrinsic to expose s_wait_event for export ready
Differential Revision: https://reviews.llvm.org/D138216
2022-11-28 11:26:15 +00:00
Roman Lebedev 25f01d593c
Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)"
TableGen is still getting miscompiled on PPC buildbots.
Sent a mail with request for help.

This reverts commit 3c4d2a0396.
2022-11-27 00:00:06 +03:00
Roman Lebedev 3c4d2a0396
[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)
This is a recommit of cf624b23bc,
which was reverted in 5cfc22cafe,
because the cut-off on the number of vector elements was not low enough,
and it triggered both SDAG SDNode operand number assertions,
and caused compile time explosions in some cases.

Let's try with something really *REALLY* conservative first,
just to get somewhere, and try to bump it (to 64/128) later.

FIXME: should this respect TTI reg width * num vec regs?

Original commit message:

Now, there's a big caveat here - these bytes
are abstract bytes, not the i8 we have in LLVM,
so strictly speaking this is not exactly legal,
see e.g. https://github.com/AliveToolkit/alive2/issues/860
^ the "bytes" "could" have been a pointer,
and loading it as an integer inserts an implicit ptrtoint.

But at the same time,
InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()`
would expand a memtransfer of 1/2/4/8 bytes
into integer-typed load+store,
so this isn't exactly a new problem.

Note that in memory, poison is byte-wise,
so we really can't widen elements,
but SROA seems to be inconsistent here.

Fixes #59116.
2022-11-26 23:19:15 +03:00
Ivan Kosarev ec8ede8177 [AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones.
Differential Revision: https://reviews.llvm.org/D138215
2022-11-24 10:50:26 +00:00