Commit Graph

308 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 791ec1c68e [AMDGPU] Add intrinsics llvm.amdgcn.{raw|struct}.buffer.load.lds
Differential Revision: https://reviews.llvm.org/D124884
2022-05-17 10:32:13 -07:00
Petar Avramovic e06290e53f AMDGPU/GlobalISel: Fix isVCC for uniform s1 with reg class on wave32
Fix isVCC for register that was assigned register class during
inst-selection. This happens when register has multiple uses.
For wave32, uniform i1 to vcc copy was selected like vcc to vcc
copy when uniform i1 had assigned register class.
Uniform i1 register with assigned register class will have s1 LLT,
be defined using G_TRUNC and class will be SReg_32RegClass.
Vcc i1 register with assigned register class will have s1 LLT,
class will be SReg_32RegClass for wave32 and SReg_64RegClass for
wave64 and register will not be defined by G_TRUNC.

Differential Revision: https://reviews.llvm.org/D124163
2022-04-21 16:12:04 +02:00
Jay Foad 1f91512268 [AMDGPU] Simplify calls to getDefSrcRegIgnoringCopies. NFC.
getDefSrcRegIgnoringCopies never returns None on valid MIR.
2022-04-20 12:37:24 +01:00
Matt Arsenault 463bc93e5f AMDGPU/GlobalISel: Remove unused parameter 2022-04-11 19:43:37 -04:00
Stanislav Mekhanoshin 6e3e14f600 [AMDGPU] Support gfx940 smfmac instructions
Differential Revision: https://reviews.llvm.org/D122191
2022-03-24 12:40:42 -07:00
Abinav Puthan Purayil f59cb41ba1 [AMDGPU] Select buffer_atomic_cmpswap* in tblgen
This change replaces the manual selection of buffer_atomic_cmpswap*
instructions in SelectionDAG and GlobalISel with a tblgen based
selection in BUFInstructions.td. This allows us to select the return and
no-return variants in tblgen.

Differential Revision: https://reviews.llvm.org/D121770
2022-03-17 10:12:32 +05:30
Shengchen Kan 37b378386e [NFC][CodeGen] Rename some functions in MachineInstr.h and remove duplicated comments 2022-03-16 20:25:42 +08:00
serge-sans-paille 989f1c72e0 Cleanup codegen includes
This is a (fixed) recommit of https://reviews.llvm.org/D121169

after:  1061034926
before: 1063332844

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121681
2022-03-16 08:43:00 +01:00
Stanislav Mekhanoshin c4500de255 [AMDGPU] gfx940: disable OP_SEL on V_DOT instructions
Differential Revision: https://reviews.llvm.org/D121634
2022-03-14 17:02:00 -07:00
Stanislav Mekhanoshin 36fe3f13a9 [AMDGPU] flat scratch SVS addressing mode for gfx940
Both VADDR and SADDR are used in SVS mode.

Differential Revision: https://reviews.llvm.org/D121254
2022-03-14 15:23:36 -07:00
Nico Weber a278250b0f Revert "Cleanup codegen includes"
This reverts commit 7f230feeea.
Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang,
and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10 07:59:22 -05:00
serge-sans-paille 7f230feeea Cleanup codegen includes
after:  1061034926
before: 1063332844

Differential Revision: https://reviews.llvm.org/D121169
2022-03-10 10:00:30 +01:00
Sebastian Neubauer 6527b2a4d5 [AMDGPU][NFC] Fix typos
Fix some typos in the amdgpu backend.

Differential Revision: https://reviews.llvm.org/D119235
2022-02-18 15:05:21 +01:00
Matt Arsenault aa88b65392 AMDGPU/GlobalISel: Fix assert on invalid cond code for llvm.amdgcn.icmp 2022-01-27 10:34:06 -05:00
Matt Arsenault 045be6ff36 AMDGPU/GlobalISel: Fold wave address into mubuf addressing modes 2022-01-26 15:25:26 -05:00
Sebastian Neubauer ae2f9c8be8 [AMDGPU] Remove lz and nomip combine from codegen
These combines have been moved into the IR combiner in D116042.

Differential Revision: https://reviews.llvm.org/D116116
2022-01-21 12:09:08 +01:00
Matt Arsenault 064cea9c9a AMDGPU/GlobalISel: Try to use s_and_b64 in ptrmask selection
Avoids a test diff with SDAG.
2022-01-20 12:56:53 -05:00
Matt Arsenault 7f26a1027f AMDGPU/GlobalISel: Introduce pseudo to copy sp in call sequences
Arbitrary stack pointers are accessed using MUBUF instructions with
the voffset field, which is interpreted as the swizzled address. We
want to fold fold into the MUBUF form to use the SP in the SGPR
offset, and previously we were special casing the interpretation of
the pointer value if the access memory operand said it was relative to
the stack pointer.

690f5b7a01 removed this check, and moved
the DAG path to special casing copies from SGPRs. This is not an
entirely sound approach, since it's still changing the interpretation
of pointer values based the context.

Introduce a new pseudo which corresponds to the wave-to-vector address
transform. This way the memory instruction has consistent semantics
where the incoming pointer is always interpreted as a vector address,
and we're not obligated to optimize into the MUBUF offset-only
addressing mode. The DAG should probably have an equivalent pseudo.

This should fix some correctness issues, and folding this into
addressing modes will be a future optimization patch.
2022-01-19 10:13:31 -05:00
Matt Arsenault 8e682086a0 AMDGPU/GlobalISel: Explicitly track d16 for image legalization
We were trying to guess at the original IR type for image intrinsics
after legalization to figure out if they were d16, but this didn't
work. Explicitly track if this is a d16 operation or not in the
opcode, as is done for the buffer intrinsics.

The OpenCL library is using f32 image writes with a dmask of 15 for
some reason, and this was incorrectly switching them to use d16. Fixes
image failures in the OpenCL conformance test. The equivalent dmask
for loads doesn't even select in either selector.
2022-01-10 14:25:14 -05:00
Matt Arsenault 0ba4e4b500 GlobalISel: Pass DebugLoc to getFunctionLiveInPhysReg
Fixes crash in assertion about dropping debug info.
2022-01-10 13:50:52 -05:00
Kazu Hirata 2aed08131d [llvm] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-07 00:39:14 -08:00
Kazu Hirata f3a344d212 [Target] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-06 22:01:44 -08:00
Kazu Hirata e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Kazu Hirata fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Abinav Puthan Purayil 078da26b1c [AMDGPU] Check for unneeded shift mask in shift PatFrags.
The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.

Differential Revision: https://reviews.llvm.org/D113448
2021-11-24 10:53:12 +05:30
Sebastian Neubauer fd1cfc9094 [AMDGPU][GlobalISel] Fix waterfall loops
- Move the `s_and exec` to its correct position before the content of
  the waterfall loop
- Use the SI_WATERFALL pseudo instruction, like for sdag, to benefit
  from optimizations
- Add support for indirect function calls

To support indirect calls, add a G_SI_CALL instruction without register
class restrictions and insert a waterfall loop when applying register
banks.

Differential Revision: https://reviews.llvm.org/D109052
2021-10-28 10:30:55 +02:00
Dávid Bolvanský fb84aa2a8f Fixed warnings in target/parser codes produced by -Wbitwise-instead-of-logicala 2021-10-03 15:04:01 +02:00
Jacob Lambert dc6e8dfdfe [AMDGPU][NFC] Correct typos in lib/Target/AMDGPU/AMDGPU*.cpp files. Test commit for new contributor. 2021-09-20 14:48:50 -07:00
Petar Avramovic d477a7c2e7 GlobalISel/Utils: Refactor integer/float constant match functions
Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT

Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.

Relevant matchers now return both DefVReg and APInt/APFloat.

Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant

In other places use integer only constant match.

Differential Revision: https://reviews.llvm.org/D104409
2021-09-17 11:22:13 +02:00
Daniil Fukalov 48958d02d2 [NFC][AMDGPU] Reduce includes dependencies.
1. Splitted out some parts of R600 target to separate modules/headers.
2. Reduced some include lists in headers.
3. Found and fixed issue with override `GCNTargetMachine::getSubtargetImpl()`
   and `R600TargetMachine::getSubtargetImpl()` had different return value type
   than base class.
4. Minor forward declarations cleanup.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D108596
2021-08-25 12:01:55 +03:00
Mirko Brkusanin 971f4173f8 [AMDGPU][GlobalISel] Insert an and with exec before s_cbranch_vccnz if necessary
While v_cmp will AND inactive lanes with 0, that is not the case for logical
operations.

This fixes a Vulkan CTS test that would hang otherwise.

Differential Revision: https://reviews.llvm.org/D105709
2021-07-29 11:20:49 +02:00
Brendon Cahoon f9f5d41545 [AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow both constant or register
values for the offset and width operands.

The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.

There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.

Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.

Differential Revision: https://reviews.llvm.org/D100149
2021-06-28 09:06:44 -04:00
Sander de Smalen d5e14ba88c [GlobalISel] NFC: Change LLT::vector to take ElementCount.
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector

The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
  same number of elements, then use LLT::vector(OtherTy.getElementCount())
  or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2)
  or operator*. That is because there is no reason to specifically restrict
  the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
  just use fixed_vector. This will need to be fixed up in the future when
  modifying the algorithm to also work for scalable vectors, and will need
  then need additional tests to confirm the behaviour works the same for
  scalable vectors.
* If the test used the '/*Scalable=*/true` flag of LLT::vector, then
  this is replaced by LLT::scalable_vector.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104451
2021-06-24 11:26:12 +01:00
Sebastian Neubauer 96e1fcb1e0 [AMDGPU] Use s_add_i32 for address additions
This allows to convert the add instruction to s_addk_i32 and
v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU
instruction.

Differential Revision: https://reviews.llvm.org/D103322
2021-06-07 16:09:48 +02:00
Stanislav Mekhanoshin 9e2e49328f [AMDGPU] All GWS instructions need aligned VGPR on gfx90a
Fixes: SWDEV-288006

Differential Revision: https://reviews.llvm.org/D103197
2021-06-01 17:08:03 -07:00
Sebastian Neubauer 690f5b7a01 [AMDGPU] Fix function calls with flat scratch
When flat scratch is used, the stack pointer needs to be added when
writing arguments to the stack.
For buffer instructions, this is done in SelectMUBUFScratchOffen
and SelectMUBUFScratchOffset.

Move that to call argument lowering, like it is done in GlobalISel.

Differential Revision: https://reviews.llvm.org/D103166
2021-05-28 11:22:13 +02:00
David Stuttard 31b62aa162 [AMDGPU] Fix codegen of image intrinsics for g16 and a16
For gfx10 gradient (g16) and address (a16) can be independent. Previous
implementation assumed that a16 implied g16.

There are some other changes that fix the verification (as well as asm/disasm)
that are required for the included test to pass - the XFAIL will be removed in
those changes.

This also includes required fixes for GlobalISel

Differential Revision: https://reviews.llvm.org/D102066

Change-Id: I7d171cc90994de05f41669b66a6d0ffa2ed05d09
2021-05-14 09:28:15 +01:00
Stanislav Mekhanoshin 909a5ccf3b [AMDGPU] Improve global SADDR selection
An address can be a uniform sum of two i64 bit values.
That regularly happens in a loop where index is an induction
variable promoted to 64 bit by the LSR. We can materialize
zero in a VGPR and still use SADDR form of the load.

Differential Revision: https://reviews.llvm.org/D101591
2021-05-05 14:44:21 -07:00
Petar Avramovic c34900e133 AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return
When atomic image intrinsic return value is unused, register class for
destination of a sub-register copy of return value ends up not being set.
This copy then hits 'Register class not set' assert later.
If return value has uses, register class is determined by use instruction.
Fix is to not create sub-register copy when image intrinsic destination has
no uses because it would be deleted by dead-mi-elimination later anyway.

Differential Revision: https://reviews.llvm.org/D101448
2021-04-29 20:56:03 +02:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
Jay Foad 3d07a6d891 [AMDGPU][GlobalISel] Add IMG init in selectImageIntrinsic
Doing this during instruction selection avoids the cost of running
SIAddIMGInit which is yet another pass over the MIR.

Differential Revision: https://reviews.llvm.org/D99670
2021-04-01 18:13:17 +01:00
Jay Foad 5d0e9ddfa5 [AMDGPU][GlobalISel] Add support for global atomicrmw fadd
This includes gfx908 which only has a no-return version of the
global_atomic_add_f32 instruction, using the same hack that was
previously implemented for selecting from the
llvm.amdgcn.global.atomic.fadd intrinsic.

Differential Revision: https://reviews.llvm.org/D97767
2021-03-31 11:13:00 +01:00
Stanislav Mekhanoshin 3bffb1cd0e [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

Additional advantage that parser will accept these flags in any order unlike
now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
Matt Arsenault 70cb57d7da AMDGPU/GlobalISel: Improve private addressing mode matching
This enables the look-through-copy to hack around not correctly
regbankselecting constants to match the use bank.
2021-03-11 10:23:35 -05:00
Piotr Sobczak 4672bac177 [AMDGPU] Introduce Strict WQM mode
* Add amdgcn_strict_wqm intrinsic.
* Add a corresponding STRICT_WQM machine instruction.
* The semantic is similar to amdgcn_strict_wwm with a notable difference that not all threads will be forcibly enabled during the computations of the intrinsic's argument, but only all threads in quads that have at least one thread active.
* The difference between amdgc_wqm and amdgcn_strict_wqm, is that in the strict mode an inactive lane will always be enabled irrespective of control flow decisions.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96258
2021-03-03 14:19:16 +01:00
Piotr Sobczak c3ce7bae80 [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm
* Introduce the new intrinsic amdgcn_strict_wwm
 * Deprecate the old intrinsic amdgcn_wwm

The change is done for consistency as the "strict"
prefix will become an important, distinguishing factor
between amdgcn_wqm and amdgcn_strictwqm in the future.

The "strict" prefix indicates that inactive lanes do not
take part in control flow, specifically an inactive lane
enabled by a strict mode will always be enabled irrespective
of control flow decisions.

The amdgcn_wwm will be removed, but doing so in two steps
gives users time to switch to the new name at their own pace.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D96257
2021-03-03 09:33:57 +01:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Matt Arsenault 78b6d73a93 AMDGPU: Add even aligned VGPR/AGPR register classes
gfx90a operations require even aligned registers, but this was
previously achieved by reserving registers inside the full class.

Ideally this would be captured in the static instruction definitions
for the operands, and we would have different instructions per
subtarget. The hackiest part of this is we need to manually reassign
AGPR register classes after instruction selection (we get away without
this for VGPRs since those types are actually registered for legal
types).
2021-02-24 14:49:37 -05:00
Stanislav Mekhanoshin a8d9d50762 [AMDGPU] gfx90a support
Differential Revision: https://reviews.llvm.org/D96906
2021-02-17 16:01:32 -08:00
Mirko Brkusanin 3c979ae9ec [AMDGPU][GlobalISel] Remove redundant cmp when copying constant to vcc
Differential Revision: https://reviews.llvm.org/D95540
2021-01-28 11:20:09 +01:00