Commit Graph

176 Commits

Author SHA1 Message Date
Jay Foad 3eb2281bc0 [AMDGPU] Aggressively fold immediates in SIFoldOperands
Previously SIFoldOperands::foldInstOperand would only fold a
non-inlinable immediate into a single user, so as not to increase code
size by adding the same 32-bit literal operand to many instructions.

This patch removes that restriction, so that a non-inlinable immediate
will be folded into any number of users. The rationale is:
- It reduces the number of registers used for holding constant values,
  which might increase occupancy. (On the other hand, many of these
  registers are SGPRs which no longer affect occupancy on GFX10+.)
- It reduces ALU stalls between the instruction that loads a constant
  into a register, and the instruction that uses it.
- The above benefits are expected to outweigh any increase in code size.

Differential Revision: https://reviews.llvm.org/D114643
2022-05-18 10:19:35 +01:00
Christudasan Devadasan 6dd21d1db1 [AMDGPU][SIFoldOperands] Consider the alignment constraints
Enforced an alignment check while folding the operands.
2022-03-17 08:27:53 +05:30
Shengchen Kan 37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
Stanislav Mekhanoshin c4500de255 [AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin 36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Christudasan Devadasan 0d849b8249 AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users
Use TII::getRegClass to return a valid regclass or a nullptr
if the RC is unknown for a given OpIdx. This fixes a potential
crash occurred while getting the RC from a variadic instruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D120813
2022-03-08 10:11:57 +05:30
Stanislav Mekhanoshin e7b362d75d [AMDGPU] Add v_mov_b64 gfx940 opcode
Differential Revision: https://reviews.llvm.org/D121023
2022-03-07 12:07:12 -08:00
Jay Foad 69ab233a15 [AMDGPU] Return better Changed status from SIFoldOperands
Differential Revision: https://reviews.llvm.org/D120023
2022-02-18 10:35:48 +00:00
Stanislav Mekhanoshin dbf278b984 [AMDGPU] Prevent aliasing of SrcC and Dst in MAI
Form the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs
or Src_C and vDst are completely separated. The case that Src_C and vDst
are overlapping should be avoid as new value could be written to accumulator
input before it gets read.

Note that this inevitably increases register pressure to the point where
some programs will become uncompilable.

This patch separates MAC and FMA versions of MFMA instructions using either
tied dst and src2 or earlyclobber dst.

Fixes: SWDEV-318900

Differential Revision: https://reviews.llvm.org/D117844
2022-01-26 14:48:20 -08:00
Jack Andersen f108c7f59d [GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.
Expanding on D109750.

Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a
substantial performance improvement.

The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vregs uses as valid.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D112852
2021-12-05 15:55:59 -05:00
Christudasan Devadasan 654c89d85a [AMDGPU] Make vector superclasses allocatable
The combined vector register classes with both
VGPRs and AGPRs are currently unallocatable.
This patch turns them into allocatable as a
prerequisite to enable copy between VGPR and
AGPR registers during regalloc.

Also, added the missing AV register classes from
192b to 1024b.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D109300
2021-11-26 00:42:12 -05:00
Zarko Todorovski 5b8bbbecfa [NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target
Reworded removed code comments that contain `sanity check` and `sanity
test`.
2021-11-17 21:59:00 -05:00
Jay Foad 3264e95938 [CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493
2021-11-17 10:16:47 +00:00
Neubauer, Sebastian d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Jay Foad 6cef28ed2d [TII] Remove the MFI argument to convertToThreeAddress. NFC.
This simplifies the API and addresses a FIXME in
TwoAddressInstructionPass::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D110229
2021-09-23 08:58:46 +01:00
Mikael Holmen e7b169a8ae [AMDGPU] Fix gcc warnings about unused variables [NFC] 2021-09-23 08:08:00 +02:00
Jay Foad 0205806d0f [AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers
Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac
instruction, so we might as well convert it to the more flexible VOP3-
only mad/fma form.

With this change, the only way we should emit VOP3-encoded mac/fmac is
if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs
for both src0 and src1. In all other cases the mac/fmac should either be
converted to mad/fma or shrunk to VOP2 encoding.

Differential Revision: https://reviews.llvm.org/D110156
2021-09-22 09:36:34 +01:00
Sebastian Neubauer f3fe44fa05 [AMDGPU] Fix too many constants with flat scratch
Prevent SIFoldOperands from creating SALU instructions with a constant
and a frame index. Previously, only one operand was checked to be a
frame index, leading to too many constants when flat scratch is enabled
and stack offsets are large.

Differential Revision: https://reviews.llvm.org/D108368
2021-08-20 08:21:36 +02:00
Matt Arsenault 39f8a792f0 AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions
These used to consistently be zeroed pre-gfx9, but gfx9 made the
situation complicated since now some still do and some don't. This
also manages to pick up a few cases that the pattern fails to optimize
away.

We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.
2021-06-22 13:42:49 -04:00
Jay Foad 7c706af03b [AMDGPU] SIFoldOperands: clean up tryConstantFoldOp
First clean up the strange API of tryConstantFoldOp where it took an
immediate operand value, but no indication of which operand it was the
value for.

Second clean up the loop that calls tryConstantFoldOp so that it does
not have to restart from the beginning every time it folds an
instruction.

This is NFCI but there are some minor changes caused by the order in
which things are folded.

Differential Revision: https://reviews.llvm.org/D100031
2021-05-06 09:55:22 +01:00
Matt Arsenault b58332774f AMDGPU: Fix assert on inline asm on gfx90a
This was assuming all mayLoad instructions have one def.
2021-04-23 09:00:25 -04:00
Matt Arsenault 987e52851e AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies 2021-04-21 21:58:18 -04:00
Jay Foad 323ef0eb45 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Reapply with a fix for using InstToErase after it had been erased.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-19 12:05:41 +01:00
Mitch Phillips 3d4730a73f Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs"
This reverts commit d19a42eba9.

Reason: Broke the ASan buildbots. See the original phabricator review
for more details: https://reviews.llvm.org/D100188
2021-04-09 15:47:44 -07:00
Jay Foad d19a42eba9 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-09 20:41:09 +01:00
Jay Foad a4ced03d34 [AMDGPU] SIFoldOperands: eagerly delete dead copies
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.

Differential Revision: https://reviews.llvm.org/D100117
2021-04-09 13:52:54 +01:00
Jay Foad a1a372dfb5 [AMDGPU] SIFoldOperands: remove an unneeded isReg check. NFC. 2021-04-08 16:37:43 +01:00
Jay Foad a250e91d10 [AMDGPU] SIFoldOperands: make use of emplace_back. NFC. 2021-04-08 14:34:10 +01:00
Jay Foad 2724b57ecd [AMDGPU] SIFoldOperands: remove an unneeded make_early_inc_range. NFC. 2021-04-08 14:32:36 +01:00
Jay Foad c28f79a0e3 [AMDGPU] SIFoldOperands: try harder to fold cndmask instructions
Look through copies to find more cases where the two values being
selected are identical. The motivation for this is just to be able to
remove the weird special case where tryFoldCndMask was called from
foldInstOperand, part way through folding a move-immediate into its
users, without regressing any lit tests.
2021-04-08 14:26:12 +01:00
Jay Foad 3344cd3a14 [AMDGPU] SIFoldOperands: make tryFoldCndMask a member function. NFC. 2021-04-08 14:05:29 +01:00
Jay Foad 94a6fe43de [AMDGPU] SIFoldOperands: refactor tryFoldCndMask with early-outs. NFC. 2021-04-08 13:16:07 +01:00
Jay Foad bf6cab6f07 [AMDGPU] SIFoldOperands: don't dump extra '\n' after MachineInstr. NFC. 2021-04-07 14:13:00 +01:00
Jay Foad 8f798566a3 [AMDGPU] SIFoldOperands: use isUseMIInFoldList. NFC. 2021-04-06 17:53:48 +01:00
Jay Foad efc7bf27f5 [AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad 005dcd196e [AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range
NFC.
2021-04-06 15:23:58 +01:00
Jay Foad ce9cca6c3a [AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask
This follows the pattern of the other tryFold* functions. NFC.
2021-04-06 15:23:58 +01:00
Jay Foad cf4f5292f6 [AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef
We are in SSA so getVRegDef is equivalent but simpler. NFC.
2021-04-06 15:23:58 +01:00
Brendon Cahoon 65c8bfb509 [AMDGPU] Enable output modifiers for double precision instructions
Update SIFoldOperands pass to recognize v_add_f64 and v_mul_f64
instructions for folding output modifiers.

Differential Revision: https://reviews.llvm.org/D99505
2021-04-01 10:08:17 -04:00
Stanislav Mekhanoshin 619b88849e [AMDGPU] Fix "Sequence" spelling. NFC. 2021-03-29 12:11:36 -07:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Christudasan Devadasan ff8a1cae18 [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
During instruction selection, there is an inconsistency in choosing
the initial soffset value. With certain early passes, this value is
getting modified and that brought additional fixup during
eliminateFrameIndex to work for all cases. This whole transformation
looks trivial and can be handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged before eliminateFrameIndex. The initial value must be zero
for MUBUF with a frame index. The non-frame index MUBUF forms that
use a raw offset from SP will have the stack register for soffset.
During frame elimination, the soffset remains zero for entry functions
with zero dynamic allocas and no callsites, or else is updated to the
appropriate frame/stack register.

Also, did some code clean up and made all asserts around soffset
stricter to match.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D95071
2021-01-22 14:20:59 +05:30
dfukalov 560d7e0411 [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Jay Foad 3914bebe91 [AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands
Convert it to v_fma_legacy_f32 if it is profitable to do so, just like
other mac instructions that are converted to their mad equivalents.

Differential Revision: https://reviews.llvm.org/D94010
2021-01-05 11:55:33 +00:00
Jay Foad 4e6054a86c [AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC.
Differential Revision: https://reviews.llvm.org/D94009
2021-01-05 11:54:48 +00:00
Kazu Hirata d6ff5cf995 [Target] Use llvm::any_of (NFC) 2020-12-24 19:43:26 -08:00
Stanislav Mekhanoshin ae8f4b2178 [AMDGPU] Folding of FI operand with flat scratch
Differential Revision: https://reviews.llvm.org/D93501
2020-12-22 10:48:04 -08:00
Michael Liao 1fd1f638b6 [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.
- Once an instruction is simplified, foldable candidates from it should
  be invalidated or skipped as the operand index is no longer valid.

Differential Revision: https://reviews.llvm.org/D93174
2020-12-14 13:08:13 -05:00