Commit Graph

171 Commits

Author SHA1 Message Date
Matt Arsenault dd7e407d81 AMDGPU: Move SpilledReg from MFI to SIRegisterInfo
This isn't the most natural place for it, but it avoids a circular
include dependency in an out of tree patch.
2022-06-02 17:11:24 -04:00
hsmahesha 5bd87350a5 [AMDGPU] On gfx908, reserve VGPR for AGPR copy based on register budget.
Based on available register budget, reserve highest available VGPR for
AGPR copy before RA. After RA, shift it to lowest unused VGPR if the one
exist.

Fixes SWDEV-330006.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D123525
2022-04-21 07:57:26 +05:30
Matt Arsenault e0d585d75a AMDGPU: Defer creation of WWM VGPR spill slots
There's no reason to create these immediately. They can be created in
the prolog/epilog code like CSR spills. There's probably a cleaner way
to do this by utilizing the CSR spill code.

This makes the frame index used transient state for
PrologEpilogInserter, and thus makes serialization easier. Really this
doesn't need to be saved here but there isn't really a better place
for it.
2022-04-19 21:07:13 -04:00
Christudasan Devadasan 34a68037dd [AMDGPU][SIFrameLowering] Refactor custom SGPR spills (NFC).
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123666
2022-04-17 13:42:42 +05:30
Venkata Ramanaiah Nalamothu 04fff547e2 [AMDGPU] Move call clobbered return address registers s[30:31] to callee saved range
Currently the return address ABI registers s[30:31], which fall in the call
clobbered register range, are added as a live-in on the function entry to
preserve its value when we have calls so that it gets saved and restored
around the calls.

But the DWARF unwind information (CFI) needs to track where the return address
resides in a frame and the above approach makes it difficult to track the
return address when the CFI information is emitted during the frame lowering,
due to the involvment of understanding the control flow.

This patch moves the return address ABI registers s[30:31] into callee saved
registers range and stops adding live-in for return address registers, so that
the CFI machinery will know where the return address resides when CSR
save/restore happen during the frame lowering.

And doing the above poses an issue that now the return instruction uses undefined
register `sgpr30_sgpr31`. This is resolved by hiding the return address register
use by the return instruction through the `SI_RETURN` pseudo instruction, which
doesn't take any input operands, until the `SI_RETURN` pseudo gets lowered to the
`S_SETPC_B64_return` during the `expandPostRAPseudo()`.

As an added benefit, this patch simplifies overall return instruction handling.

Note: The AMDGPU CFI changes are there only in the downstream code and another
version of this patch will be posted for review for the downstream code.

Reviewed By: arsenm, ronlieb

Differential Revision: https://reviews.llvm.org/D114652
2022-03-09 12:18:02 +05:30
Sebastian Neubauer 6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Matt Arsenault d6fdbbcace AMDGPU: Add second emergency slot for SGPR to vmem for large frames
In a future change, we will sometimes use a VGPR offset for doing
spills to memory, in which case we need 2 free VGPRs to do the SGPR
spill. In most cases we could spill the VGPR along with the SGPR being
spilled, but we don't have any free lanes for SGPR_1024 in wave32 so
we could still potentially need a second scavenging slot.
2022-02-02 19:05:05 -05:00
Matt Arsenault 18aabae8e2 AMDGPU: Fix assertion on fixed stack objects with VGPR->AGPR spills
These have negative / out of bounds frame index values and would
assert when trying to set the BitVector. Fixed stack objects can't be
colored away so ignore them.
2022-01-24 09:45:41 -05:00
Austin Kerbow 8470bf2b08 [AMDGPU] Do not reserve any VGPR for SGPR spills
After the split register allocation changes in eebe841a47 it is no
longer necessary to reserve a VGPR before RA. This can also create bugs
when IPRA is enabled since we cannot predict that a called function may
not reserve any register if it does not have any SGPR spills. If that
happens those functions may override reserved registers that are
normally callee saved. Added a test to show this.

Fixes: SWDEV-309900

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D115551
2022-01-11 22:14:59 -08:00
Brendon Cahoon d45a247998 [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info
Removing dead frame indices for VGPR to AGPR spills is incorrect
when the frame index is shared by multiple objects, which may
occur due to stack slot coloring. The problem is that subsequent
code that processes the other object will assert because the stack
frame index is marked dead.

Removing dead frame indices is needed prior to stack slot
coloring, which is what happens with SGPR to VGPR spills. These
spills are lowered prior to stack slot coloring, but the VGPR
to AGPR spills are processed afterwards during the Prolog/Epilog
Inserter pass. This patch marks the VGPR to AGPR spill slot as
dead if the slot is not used by another object.

Differential Revision: https://reviews.llvm.org/D115996
2021-12-23 11:09:19 -06:00
Matt Arsenault 273a0c8bc9 PrologEpilogInserter: Use explicit control for scavenge slot placement
AMDGPU is unusual in that the both stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.

I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
2021-11-23 18:01:12 -05:00
Matt Arsenault 659887b405 AMDGPU: Mark prolog/epilog SCC defs as dead
A future change will add SCC liveness checks. Since we are still
relying on forward register scavenging, add dead flags to avoid
spuriously detecting SCC as live.
2021-11-15 21:35:06 -05:00
Stanislav Mekhanoshin 476ab0f809 [AMDGPU] Fixed stack pointer init with architected flat scratch
Even if wave offset is not present we still need to do the rest
of the initialization. The mov into s32 was missing in the
kernels.

Fixes: SWDEV-310935

Differential Revision: https://reviews.llvm.org/D113628
2021-11-10 17:18:38 -08:00
RamNalamothu 539f500e78 [AMDGPU] Do not add debug locations to the code inside prologue
There is no real source location for code inside prologue as it is
generated by compiler but source locations are being added to code
inside prologue as a side effect of https://reviews.llvm.org/D99269
because buildSpillLoadStore() is using source location of the real
instruction in the basic block if any.

Fixes: SWDEV-307590

Reviewed By: scott.linder, sebastian-ne

Differential Revision: https://reviews.llvm.org/D113100
2021-11-04 08:02:41 +05:30
Kazu Hirata 4bef0304e1 [AArch64, AMDGPU] Use make_early_inc_range (NFC) 2021-11-03 09:22:51 -07:00
Jack Andersen bd4dad87f4 [MachineInstr] Move MIParser's DBG_VALUE RegState::Debug invariant into MachineInstr::addOperand
Based on the reasoning of D53903, register operands of DBG_VALUE are
invariably treated as RegState::Debug operands. This change enforces
this invariant as part of MachineInstr::addOperand so that all passes
emit this flag consistently.

RegState::Debug is inconsistently set on DBG_VALUE registers throughout
LLVM. This runs the risk of a filtering iterator like
MachineRegisterInfo::reg_nodbg_iterator to process these operands
erroneously when not parsed from MIR sources.

This issue was observed in the development of the llvm-mos fork which
adds a backend that relies on physical register operands much more than
existing targets. Physical RegUnit 0 has the same numeric encoding as
$noreg (indicating an undef for DBG_VALUE). Allowing debug operands into
the machine scheduler correlates $noreg with RegUnit 0 (i.e. a collision
of register numbers with different zero semantics). Eventually, this
causes an assert where DBG_VALUE instructions are prohibited from
participating in live register ranges.

Reviewed By: MatzeB, StephenTozer

Differential Revision: https://reviews.llvm.org/D110105
2021-10-07 16:08:52 +01:00
Stanislav Mekhanoshin 11b7ee974a [AMDGPU] Avoid assert for saved FP
With spilling into AGPRs enabled we cannot reliably predict
if we need to save FP or not. We may finally spill everything
into AGPRs and never touch stack. In this case we still may
save FP. This is deficiency but not an error, so avoid the
assert.

Differential Revision: https://reviews.llvm.org/D107404
2021-08-25 09:50:59 -07:00
RamNalamothu 1a8c57179a [AMDGPU] We would need FP if there is call and caller save VGPR spills
Since https://reviews.llvm.org/D98319, determineCalleeSavesSGPR() needs
to consider caller save VGPR spills as well while anticipating if we
require FP.

Fixes: SWDEV-295978

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106758
2021-07-28 11:12:55 +05:30
Sebastian Neubauer 4359b870b1 [AMDGPU] Init scratch only if necessary
If no scratch or flat instructions are used, we do not need to
initialize the flat scratch hardware register.

Differential Revision: https://reviews.llvm.org/D105920
2021-07-14 10:45:22 +02:00
Matt Arsenault eebe841a47 RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run.  This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQB is not currently supported, so this also
prevents using the unhandled allocator.
2021-07-13 18:49:29 -04:00
Sebastian Neubauer 96e1fcb1e0 [AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and
v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU
instruction.

Differential Revision: https://reviews.llvm.org/D103322
2021-06-07 16:09:48 +02:00
Andy Wingo 82f92e35c6 [WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal.  This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.

For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1.  SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there.  Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.

Differential Revision: https://reviews.llvm.org/D101140
2021-06-01 11:31:39 +02:00
Andy Wingo bc1ad6e3c4 Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables"
This reverts commit bf35f4af51.  There was
an error in a shared-library build.
2021-05-31 10:55:15 +02:00
Andy Wingo bf35f4af51 [WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal.  This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.

For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1.  SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there.  Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.

Differential Revision: https://reviews.llvm.org/D101140
2021-05-31 10:40:38 +02:00
Nemanja Ivanovic e0c8265437 Revert "Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI."
Since ca5f07f8c4 already reverted
the cause for this warning, this commit now causes warnings about
a default label in a switch that covers the enum.

This reverts commit cf2eeb114c.
2021-05-28 10:53:49 -05:00
Simon Pilgrim cf2eeb114c Fix "enumerator 'llvm::TargetStackID::WasmLocal' in switch of enum 'llvm::TargetStackID::Value' is not handled" MSVC warnings. NFCI. 2021-05-28 12:47:22 +01:00
Stanislav Mekhanoshin 6fb02596a2 [AMDGPU] Add support for architected flat scratch
Add support for the readonly flat Scratch register initialized
by the SPI.

Differential Revision: https://reviews.llvm.org/D102432
2021-05-14 10:53:48 -07:00
David Stuttard 606d4e8061 AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Retrying after revert and fix (removed implicit def flag from operand). Now
passes with expensive_checks enabled.

Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Ie3b8b2921237968caca91527dd0c97b1b0cc0360
2021-05-07 13:42:57 +01:00
David Stuttard 793b4b2603 Revert "AMDGPU: Correct const_index_stride for wave 32 for PAL ABI"
This reverts commit 442de0c1ad.
2021-05-07 12:49:17 +01:00
David Stuttard 442de0c1ad AMDGPU: Correct const_index_stride for wave 32 for PAL ABI
Since there is a single scratch resource descriptor for all shaders, if there is
a wave32 and a wave64 shader (for instance for VsFs pairs)
then the const_index_stride will be incorrect for wave32 shaders.

Differential Revision: https://reviews.llvm.org/D101830

Change-Id: Id8de5566b0d1a07a814e2e7db016df9d20bf6d2c
2021-05-07 12:19:49 +01:00
Sebastian Neubauer 9569d5ba02 [AMDGPU] Allow buildSpillLoadStore in empty bb
This allows calling buildSpillLoadStore for an empty basic block, where
MI points at the end of the block instead of to an instruction.

This only happens with downstream CFI changes, so I was not able to
create a testcase that works with upstream LLVM.

Differential Revision: https://reviews.llvm.org/D101356
2021-04-29 12:53:20 +02:00
Sebastian Neubauer 3366d81153 [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes needs to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429

Reapply with fixed tests on window.
2021-04-23 18:09:24 +02:00
Sebastian Neubauer 22d99cb63f Revert "[AMDGPU] Save WWM registers in functions"
This reverts commit 91464c30bf.

Seems to break tests on windows.
2021-04-23 16:38:50 +02:00
Sebastian Neubauer 91464c30bf [AMDGPU] Save WWM registers in functions
The values of registers in inactive lanes needs to be saved during
function calls.

Save all registers used for whole wave mode, similar to how it is done
for VGPRs that are used for SGPR spilling.

Differential Revision: https://reviews.llvm.org/D99429
2021-04-23 16:09:31 +02:00
Sebastian Neubauer b76c2a6c2b [AMDGPU] Fix saving fp and bp
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.

This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
   epilog is a subset of exec in the prolog.

Differential Revision: https://reviews.llvm.org/D96869
2021-04-12 11:52:55 +02:00
Sebastian Neubauer 32bc9a9bc3 [AMDGPU] Unify spill code
Instead of reimplementing spilling in prolog and epilog, reuse
buildSpillLoadStore.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D99269
2021-04-12 11:19:08 +02:00
Sebastian Neubauer f9a8c6a0e5 [AMDGPU] Save VGPR of whole wave when spilling
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.

The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but
everytime the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)

Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know if the VGPR we use is
live before or not, otherwise the machine verifier complains.

Differential Revision: https://reviews.llvm.org/D96336
2021-04-12 11:01:38 +02:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Sebastian Neubauer c10cc4ea27 [AMDGPU] Fix computing live registers in prolog
ScratchExecCopy needs to be marked as live, we cannot use that register
while EXEC is stored in there.

Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as
available is unnecessary, they should not be live at that point anway.

Differential Revision: https://reviews.llvm.org/D100098
2021-04-08 14:52:50 +02:00
Sebastian Neubauer 2dc6be5209 [AMDGPU] Update SGPRSpillVGPRCSR name. NFC
The struct is used for both, callee and caller-save registers now.
The frame index is not set for entrypoints, as we do not need to save
the registers then.
Update the struct name to reflect that.

Differential Revision: https://reviews.llvm.org/D99722
2021-04-07 16:30:40 +02:00
Tomas Matheson a9968c0a33 [NFC][CodeGen] Tidy up TargetRegisterInfo stack realignment functions
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.

This patch attempts to clarify the situation by separating them and introducing
new names:

 - shouldRealignStack - true if there is any reason the stack should be
   realigned

 - canRealignStack - true if we are still able to realign the stack (e.g. we
   can still reserve/have reserved a frame pointer)

 - hasStackRealignment = shouldRealignStack && canRealignStack (not target
   customisable)

Targets can now override shouldRealignStack to indicate that stack realignment
is required.

This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).

Differential Revision: https://reviews.llvm.org/D98716

Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
2021-03-30 17:31:39 +01:00
Sebastian Neubauer 1c3b74f0ab [AMDGPU] Remove outdated TODOs. NFC
spillSGPRToVGPR is already respected in these places since D95768.

Differential Revision: https://reviews.llvm.org/D99570
2021-03-30 15:18:49 +02:00
RamNalamothu 43f2d269b3 [AMDGPU, NFC] Refactor FP/BP spill index code in emitPrologue/emitEpilogue
Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D98617
2021-03-16 19:19:45 +05:30
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Sebastian Neubauer 6c59dc474d [AMDGPU] Save all lanes for reserved VGPRs
When SGPRs are spilled to VGPRs, they can overwrite any lane. We need
to preserve the value of inactive lanes in function calls, so we save
the register even if it is marked as caller saved.

Also, teach buildPrologSpill to work when no registers are free like in
CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir and update the comment on
findScratchNonCalleeSaveRegister as it is not used anymore to realign
the stack pointer since D95865.

Differential Revision: https://reviews.llvm.org/D95946
2021-02-04 09:56:36 +01:00
Sebastian Neubauer 8b898b19a8 [AMDGPU] Remove unused tmp register
The temporary register is only used to compute the frame pointer.
The frame pointer is overwritten and not used in between, so we
can reuse the frame pointer for the computation, saving one register.

Differential Revision: https://reviews.llvm.org/D95865
2021-02-02 17:17:54 +01:00
Sebastian Neubauer 6b6ae583cf [AMDGPU] Save fp/bp after csr saves
Saving callee-save registers happens in whole wave mode. Exec is saved
to a free register, which can be reused to save the frame pointer.
Therefore, saving the fp needs to happen after saving csrs.

Differential Revision: https://reviews.llvm.org/D95861
2021-02-02 17:17:54 +01:00
Sebastian Neubauer b91afa474e [AMDGPU] Mark epilog restores as frame-destroy
I guess instructions were marked as frame-setup by accident, they are
restores as part of the epilog.

Differential Revision: https://reviews.llvm.org/D95783
2021-02-02 10:24:37 +01:00
Austin Kerbow e068e236c3 [AMDGPU] Fix release build after 0397dca0. 2021-02-01 08:55:14 -08:00