llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	3eb2281bc0	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643	2022-05-18 10:19:35 +01:00
Christudasan Devadasan	6dd21d1db1	[AMDGPU][SIFoldOperands] Consider the alignment constraints Enforced an alignment check while folding the operands.	2022-03-17 08:27:53 +05:30
Shengchen Kan	37b378386e	[NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments	2022-03-16 20:25:42 +08:00
Stanislav Mekhanoshin	c4500de255	[AMDGPU] gfx940: disable OP_SEL on V_DOT instructions Differential Revision: https://reviews.llvm.org/D121634	2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin	36fe3f13a9	[AMDGPU] flat scratch SVS addressing mode for gfx940 Both VADDR and SADDR are used in SVS mode. Differential Revision: https://reviews.llvm.org/D121254	2022-03-14 15:23:36 -07:00
Christudasan Devadasan	0d849b8249	AMDGPU: Skip folding REG_SEQUENCE if found unknown regclasses for its users Use TII::getRegClass to return a valid regclass or a nullptr if the RC is unknown for a given OpIdx. This fixes a potential crash that occurred while getting the RC from a variadic instruction. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D120813	2022-03-08 10:11:57 +05:30
Stanislav Mekhanoshin	e7b362d75d	[AMDGPU] Add v_mov_b64 gfx940 opcode Differential Revision: https://reviews.llvm.org/D121023	2022-03-07 12:07:12 -08:00
Jay Foad	69ab233a15	[AMDGPU] Return better Changed status from SIFoldOperands Differential Revision: https://reviews.llvm.org/D120023	2022-02-18 10:35:48 +00:00
Stanislav Mekhanoshin	dbf278b984	[AMDGPU] Prevent aliasing of SrcC and Dst in MAI From the MAI spec: It’s ok that Src_C and vDst are the exact same VGPRs or Src_C and vDst are completely separated. The case that Src_C and vDst are overlapping should be avoided as new value could be written to accumulator input before it gets read. Note that this inevitably increases register pressure to the point where some programs will become uncompilable. This patch separates MAC and FMA versions of MFMA instructions using either tied dst and src2 or earlyclobber dst. Fixes: SWDEV-318900 Differential Revision: https://reviews.llvm.org/D117844	2022-01-26 14:48:20 -08:00
Jack Andersen	f108c7f59d	[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues. Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852	2021-12-05 15:55:59 -05:00
Christudasan Devadasan	654c89d85a	[AMDGPU] Make vector superclasses allocatable The combined vector register classes with both VGPRs and AGPRs are currently unallocatable. This patch turns them into allocatable as a prerequisite to enable copy between VGPR and AGPR registers during regalloc. Also, added the missing AV register classes from 192b to 1024b. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D109300	2021-11-26 00:42:12 -05:00
Zarko Todorovski	5b8bbbecfa	[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target Reworded and removed code comments that contain `sanity check` and `sanity test`.	2021-11-17 21:59:00 -05:00
Jay Foad	3264e95938	[CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress Delegate updating of LiveIntervals to each target's convertToThreeAddress implementation, instead of repairing LiveIntervals after the fact in TwoAddressInstruction::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D113493	2021-11-17 10:16:47 +00:00
Neubauer, Sebastian	d1f45ed58f	[AMDGPU][NFC] Fix typos Differential Revision: https://reviews.llvm.org/D113672	2021-11-12 11:37:21 +01:00
Jay Foad	6cef28ed2d	[TII] Remove the MFI argument to convertToThreeAddress. NFC. This simplifies the API and addresses a FIXME in TwoAddressInstructionPass::convertInstTo3Addr. Differential Revision: https://reviews.llvm.org/D110229	2021-09-23 08:58:46 +01:00
Mikael Holmen	e7b169a8ae	[AMDGPU] Fix gcc warnings about unused variables [NFC]	2021-09-23 08:08:00 +02:00
Jay Foad	0205806d0f	[AMDGPU] Convert mac/fmac to mad/fma when folding output modifiers Use of output modifiers forces VOP3 encoding for a VOP2 mac/fmac instruction, so we might as well convert it to the more flexible VOP3-only mad/fma form. With this change, the only way we should emit VOP3-encoded mac/fmac is if regalloc chooses registers that require the VOP3 encoding, e.g. sgprs for both src0 and src1. In all other cases the mac/fmac should either be converted to mad/fma or shrunk to VOP2 encoding. Differential Revision: https://reviews.llvm.org/D110156	2021-09-22 09:36:34 +01:00
Sebastian Neubauer	f3fe44fa05	[AMDGPU] Fix too many constants with flat scratch Prevent SIFoldOperands from creating SALU instructions with a constant and a frame index. Previously, only one operand was checked to be a frame index, leading to too many constants when flat scratch is enabled and stack offsets are large. Differential Revision: https://reviews.llvm.org/D108368	2021-08-20 08:21:36 +02:00
Matt Arsenault	39f8a792f0	AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions These used to consistently be zeroed pre-gfx9, but gfx9 made the situation complicated since now some still do and some don't. This also manages to pick up a few cases that the pattern fails to optimize away. We handle some cases with instruction patterns, but some get through. In particular this improves the integer cases.	2021-06-22 13:42:49 -04:00
Jay Foad	7c706af03b	[AMDGPU] SIFoldOperands: clean up tryConstantFoldOp First clean up the strange API of tryConstantFoldOp where it took an immediate operand value, but no indication of which operand it was the value for. Second clean up the loop that calls tryConstantFoldOp so that it does not have to restart from the beginning every time it folds an instruction. This is NFCI but there are some minor changes caused by the order in which things are folded. Differential Revision: https://reviews.llvm.org/D100031	2021-05-06 09:55:22 +01:00
Matt Arsenault	b58332774f	AMDGPU: Fix assert on inline asm on gfx90a This was assuming all mayLoad instructions have one def.	2021-04-23 09:00:25 -04:00
Matt Arsenault	987e52851e	AMDGPU: Fix assert when trying to fold reg_sequence of physreg copies	2021-04-21 21:58:18 -04:00
Jay Foad	323ef0eb45	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Reapply with a fix for using InstToErase after it had been erased. Differential Revision: https://reviews.llvm.org/D100188	2021-04-19 12:05:41 +01:00
Mitch Phillips	3d4730a73f	Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs" This reverts commit `d19a42eba9`. Reason: Broke the ASan buildbots. See the original phabricator review for more details: https://reviews.llvm.org/D100188	2021-04-09 15:47:44 -07:00
Jay Foad	d19a42eba9	[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs This is fairly cheap to implement and means less work for future passes like MachineDCE. Differential Revision: https://reviews.llvm.org/D100188	2021-04-09 20:41:09 +01:00
Jay Foad	a4ced03d34	[AMDGPU] SIFoldOperands: eagerly delete dead copies This is cheap to implement, means less work for future passes like MachineDCE, and slightly improves the folding in some cases. Differential Revision: https://reviews.llvm.org/D100117	2021-04-09 13:52:54 +01:00
Jay Foad	a1a372dfb5	[AMDGPU] SIFoldOperands: remove an unneeded isReg check. NFC.	2021-04-08 16:37:43 +01:00
Jay Foad	a250e91d10	[AMDGPU] SIFoldOperands: make use of emplace_back. NFC.	2021-04-08 14:34:10 +01:00
Jay Foad	2724b57ecd	[AMDGPU] SIFoldOperands: remove an unneeded make_early_inc_range. NFC.	2021-04-08 14:32:36 +01:00
Jay Foad	c28f79a0e3	[AMDGPU] SIFoldOperands: try harder to fold cndmask instructions Look through copies to find more cases where the two values being selected are identical. The motivation for this is just to be able to remove the weird special case where tryFoldCndMask was called from foldInstOperand, part way through folding a move-immediate into its users, without regressing any lit tests.	2021-04-08 14:26:12 +01:00
Jay Foad	3344cd3a14	[AMDGPU] SIFoldOperands: make tryFoldCndMask a member function. NFC.	2021-04-08 14:05:29 +01:00
Jay Foad	94a6fe43de	[AMDGPU] SIFoldOperands: refactor tryFoldCndMask with early-outs. NFC.	2021-04-08 13:16:07 +01:00
Jay Foad	bf6cab6f07	[AMDGPU] SIFoldOperands: don't dump extra '\n' after MachineInstr. NFC.	2021-04-07 14:13:00 +01:00
Jay Foad	8f798566a3	[AMDGPU] SIFoldOperands: use isUseMIInFoldList. NFC.	2021-04-06 17:53:48 +01:00
Jay Foad	efc7bf27f5	[AMDGPU] SIFoldOperands: use MachineRegisterInfo::hasOneNonDBGUser NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	005dcd196e	[AMDGPU] SIFoldOperands: use range-based loops and make_early_inc_range NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	ce9cca6c3a	[AMDGPU] SIFoldOperands: rename tryFoldInst to tryFoldCndMask This follows the pattern of the other tryFold* functions. NFC.	2021-04-06 15:23:58 +01:00
Jay Foad	cf4f5292f6	[AMDGPU] SIFoldOperands: use getVRegDef instead of getUniqueVRegDef We are in SSA so getVRegDef is equivalent but simpler. NFC.	2021-04-06 15:23:58 +01:00
Brendon Cahoon	65c8bfb509	[AMDGPU] Enable output modifiers for double precision instructions Update SIFoldOperands pass to recognize v_add_f64 and v_mul_f64 instructions for folding output modifiers. Differential Revision: https://reviews.llvm.org/D99505	2021-04-01 10:08:17 -04:00
Stanislav Mekhanoshin	619b88849e	[AMDGPU] Fix "Sequence" spelling. NFC.	2021-03-29 12:11:36 -07:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
Christudasan Devadasan	ff8a1cae18	[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. During instruction selection, there is an inconsistency in choosing the initial soffset value. With certain early passes, this value is getting modified and that brought additional fixup during eliminateFrameIndex to work for all cases. This whole transformation looks trivial and can be handled better. This patch clearly defines the initial value for soffset and keeps it unchanged before eliminateFrameIndex. The initial value must be zero for MUBUF with a frame index. The non-frame index MUBUF forms that use a raw offset from SP will have the stack register for soffset. During frame elimination, the soffset remains zero for entry functions with zero dynamic allocas and no callsites, or else is updated to the appropriate frame/stack register. Also, did some code clean up and made all asserts around soffset stricter to match. Reviewed By: scott.linder Differential Revision: https://reviews.llvm.org/D95071	2021-01-22 14:20:59 +05:30
dfukalov	560d7e0411	[NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets ... to reduce headers dependency. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D95036	2021-01-20 22:22:45 +03:00
Joe Nash	314e29ed2b	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
dfukalov	6a87e9b08b	[NFC][AMDGPU] Reduce include files dependency. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D93813	2021-01-07 22:22:05 +03:00
Jay Foad	3914bebe91	[AMDGPU] Handle v_fmac_legacy_f32 in SIFoldOperands Convert it to v_fma_legacy_f32 if it is profitable to do so, just like other mac instructions that are converted to their mad equivalents. Differential Revision: https://reviews.llvm.org/D94010	2021-01-05 11:55:33 +00:00
Jay Foad	4e6054a86c	[AMDGPU] Split out new helper function macToMad in SIFoldOperands. NFC. Differential Revision: https://reviews.llvm.org/D94009	2021-01-05 11:54:48 +00:00
Kazu Hirata	d6ff5cf995	[Target] Use llvm::any_of (NFC)	2020-12-24 19:43:26 -08:00
Stanislav Mekhanoshin	ae8f4b2178	[AMDGPU] Folding of FI operand with flat scratch Differential Revision: https://reviews.llvm.org/D93501	2020-12-22 10:48:04 -08:00
Michael Liao	1fd1f638b6	[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. - Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer valid. Differential Revision: https://reviews.llvm.org/D93174	2020-12-14 13:08:13 -05:00

176 Commits