llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	b5bc205d75	AMDGPU: Convert some bit operation tests to opaque pointers	2022-11-29 18:36:53 -05:00
Jay Foad	342642dc75	[AMDGPU][GISel] Smaller code for scalar 32 to 64-bit extensions Differential Revision: https://reviews.llvm.org/D107639	2022-11-16 06:57:21 +00:00
Carl Ritson	4c4db81630	[AMDGPU] Extend SILoadStoreOptimizer to s_load instructions Apply merging to s_load as is done for s_buffer_load. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D130742	2022-07-30 11:38:39 +09:00
Jay Foad	5cae88164e	[AMDGPU] Add GFX11 test coverage Add GFX11 test coverage to a bunch of tests where it was easy to do so, mostly because the checks are autogenerated and/or GFX11 can share the same checks as GFX10. Differential Revision: https://reviews.llvm.org/D129295	2022-07-08 09:13:59 +01:00
Jay Foad	3eb2281bc0	[AMDGPU] Aggressively fold immediates in SIFoldOperands Previously SIFoldOperands::foldInstOperand would only fold a non-inlinable immediate into a single user, so as not to increase code size by adding the same 32-bit literal operand to many instructions. This patch removes that restriction, so that a non-inlinable immediate will be folded into any number of users. The rationale is: - It reduces the number of registers used for holding constant values, which might increase occupancy. (On the other hand, many of these registers are SGPRs which no longer affect occupancy on GFX10+.) - It reduces ALU stalls between the instruction that loads a constant into a register, and the instruction that uses it. - The above benefits are expected to outweigh any increase in code size. Differential Revision: https://reviews.llvm.org/D114643	2022-05-18 10:19:35 +01:00
Austin Kerbow	da067ed569	[AMDGPU] Set most sched model resource's BufferSize to one Using a BufferSize of one for memory ProcResources will result in better ILP since it more accurately models the dependencies between memory ops and their consumers on an in-order processor. After this change, the scheduler will treat the data edges from loads as blocking so that stalls are guaranteed when waiting for data to be retreaved from memory. Since we don't actually track waitcnt here, this should do a better job at modeling their behavior. Practically, this means that the scheduler will trigger the 'STALL' heuristic more often. This type of change needs to be evaluated experimentally. Preliminary results are positive. Fixes: SWDEV-282962 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D114777	2021-12-01 22:31:28 -08:00
Konstantin Schwarz	d2e66d7fa4	[GlobalISel] Add a combine for and(load , mask) -> zextload This only handles simple masks, not shifted masks, for now. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D109357	2021-09-16 10:42:46 +02:00
alex-t	ed0f4415f0	[AMDGPU] Divergence-driven compare operations instruction selection Description: This change enables the compare operations to be selected to SALU/VALU form dependent of the SDNode divergence flag. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D106079	2021-08-25 18:30:49 +03:00
Jay Foad	83610d4eb0	[AMDGPU][GlobalISel] Better legalization of 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107474	2021-08-06 09:40:48 +01:00
Jay Foad	24b67a9024	[AMDGPU][GlobalISel] Improve regbankselect for 64-bit VGPR ctlz_zero_undef/cttz_zero_undef We can improve on the generic splitting by using ffbh/ffbl, which have a defined result when the input is zero. Differential Revision: https://reviews.llvm.org/D107442	2021-08-06 09:40:48 +01:00
Jay Foad	2b63933115	[AMDGPU][SDag] Better lowering for 32-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107566	2021-08-05 15:57:40 +01:00
Jay Foad	e6c364a624	[AMDGPU][SDag] Better lowering for 64-bit ctlz/cttz Differential Revision: https://reviews.llvm.org/D107546	2021-08-05 15:57:40 +01:00
Jay Foad	ba5c4ac600	[AMDGPU] Add cttz tests and globalisel checks for ctlz	2021-08-04 15:57:14 +01:00
Baptiste Saleil	caf1294d95	[AMDGPU] Experiments show that the GCNRegBankReassign pass significantly impacts the compilation time and there is no case for which we see any improvement in performance. This patch removes this pass and its associated test cases from the tree. Differential Revision: https://reviews.llvm.org/D101313 Change-Id: I0599169a7609c19a887f8d847a71e664030cc141	2021-04-26 17:21:49 -04:00
Dmitry Preobrazhensky	cd953434f2	[AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffices Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645. Differential Revision: https://reviews.llvm.org/D99413	2021-04-01 14:21:00 +03:00
Petar Avramovic	b082e6f88a	[AMDGPU] Extend gfx10 test coverage. NFC. Differential Revision: https://reviews.llvm.org/D99267	2021-03-29 11:13:55 +02:00
Austin Kerbow	2291bd137d	[AMDGPU] Update subtarget features for new target ID support Support for XNACK and SRAMECC is not static on some GPUs. We must be able to differentiate between different scenarios for these dynamic subtarget features. The possible settings are: - Unsupported: The GPU has no support for XNACK/SRAMECC. - Any: Preference is unspecified. Use conservative settings that can run anywhere. - Off: Request support for XNACK/SRAMECC Off - On: Request support for XNACK/SRAMECC On GCNSubtarget will track the four options based on the following criteria. If the subtarget does not support XNACK/SRAMECC we say the setting is "Unsupported". If no subtarget features for XNACK/SRAMECC are requested we must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the feature string when initializing the subtarget, the settings are "On/Off". The defaults are updated to be conservatively correct, meaning if no setting for XNACK or SRAMECC is explicitly requested, defaults will be used which generate code that can be run anywhere. This corresponds to the "Any" setting. Differential Revision: https://reviews.llvm.org/D85882	2021-01-26 11:25:51 -08:00
Mircea Trofin	1ebe86adf5	[NFC] Removed unused prefixes in test/CodeGen/AMDGPU More patches to follow. Differential Revision: https://reviews.llvm.org/D94121	2021-01-05 14:16:52 -08:00
Tony	1bc7bfffdb	[AMDGPU] Optimize waitcnt insertion for flat memory operations Change waitcnt insertion to check the memory operand tokens to see if flat memory operations access VMEM in the same way it does to check if accessing LDS. This avoids adding waitcnt for counters for address spaces that are not accessed. In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS. This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL. Differential Revision: https://reviews.llvm.org/D89618	2020-10-20 22:55:12 +00:00
Piotr Sobczak	62d8b8a225	Fix 64-bit copy to SCC Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (`0045786f14`), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207	2020-08-09 20:50:30 +02:00
Jay Foad	62fd7f767c	[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall. Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth is greater than the already scheduled latency, which causes a stall. The fix is to apply the heuristic if the depth of either candidate is greater than the already scheduled latency. All this also applies to the BotHeightReduce heuristic in the bottom zone. Differential Revision: https://reviews.llvm.org/D72392	2020-07-17 11:02:13 +01:00
Piotr Sobczak	0045786f14	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Re-commit D81925 with a bugfix D82370. Differential Revision: https://reviews.llvm.org/D81925 Differential Revision: https://reviews.llvm.org/D82370	2020-06-25 10:38:23 +02:00
Matt Arsenault	778351df77	Revert "[AMDGPU] Enable compare operations to be selected by divergence" This reverts commit `521ac0b5ce`. Reported to break thousands of piglit tests.	2020-06-24 11:21:30 -04:00
alex-t	521ac0b5ce	[AMDGPU] Enable compare operations to be selected by divergence Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82194	2020-06-24 11:50:40 +03:00
Piotr Sobczak	6d9565d6d5	Revert "[AMDGPU] Select s_cselect" This caused some failures detected by the buildbot with expensive checks enabled. This reverts commit `4067de569f`.	2020-06-19 16:41:04 +02:00
Piotr Sobczak	4067de569f	[AMDGPU] Select s_cselect Summary: Add patterns to select s_cselect in the isel. Handle more cases of implicit SCC accesses in si-fix-sgpr-copies to allow new patterns to work. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81925	2020-06-19 16:17:46 +02:00
Jay Foad	43830790d7	[AMDGPU] Remove dubious logic in bidirectional list scheduler Summary: pickNodeBidirectional tried to compare the best top candidate and the best bottom candidate by examining TopCand.Reason and BotCand.Reason. This is unsound because, after calling pickNodeFromQueue, Cand.Reason does not reflect the most important reason why Cand was chosen. Rather it reflects the most recent reason why it beat some other potential candidate, which could have been for some low priority tie breaker reason. I have seen this cause problems where TopCand is a good candidate, but because TopCand.Reason is ORDER (which is very low priority) it is repeatedly ignored in favour of a mediocre BotCand. This is not how bidirectional scheduling is supposed to work. To fix this I changed the code to always compare TopCand and BotCand directly, like the generic implementation of pickNodeBidirectional does. This removes some uncommented AMDGPU-specific logic; if this logic turns out to be important then perhaps it could be moved into an override of tryCandidate instead. Graphics shader benchmarking on gfx10 shows a lot more positive than negative effects from this change. Reviewers: arsenm, tstellar, rampitec, kzhuravl, vpykhtin, dstuttard, tpr, atrick, MatzeB Subscribers: jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68338	2020-02-28 21:35:34 +00:00
Matt Arsenault	7af7b96a9b	AMDGPU: Move R600 test compatability hack Instead of handling the r600 intrinsics on amdgcn, handle the amdgcn intrinsics on r600.	2020-02-10 10:02:06 -08:00
Simon Pilgrim	27d6ea9698	[AMDGPU] Regenerate CTLZ tests for an upcoming patch llvm-svn: 362229	2019-05-31 15:06:14 +00:00
Matt Arsenault	8c4a35237a	AMDGPU: Add pass to lower kernel arguments to loads This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650	2018-06-26 19:10:00 +00:00
Dmitry Preobrazhensky	a0342dc9eb	[AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev} See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765 Reviewers: tamazov, SamWot, arsenm, vpykhtin Differential Revision: https://reviews.llvm.org/D40088 llvm-svn: 318675	2017-11-20 18:24:21 +00:00
Matt Arsenault	6c29c5acfe	AMDGPU: Allow SIShrinkInstructions to work in non-SSA Immediates can be folded as long as the immediate is a vreg. Also undo commuting instructions if it didn't fold an immediate. llvm-svn: 307575	2017-07-10 19:53:57 +00:00
Matt Arsenault	9aa45f047f	AMDGPU: Add macro fusion schedule DAG mutation Try to increase opportunities to shrink vcc uses. llvm-svn: 307313	2017-07-06 20:57:05 +00:00
Alexander Timofeev	982aee6a38	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097	2017-07-04 17:32:00 +00:00
NAKAMURA Takumi	e4a741376b	Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default" It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054	2017-07-04 02:14:18 +00:00
Alexander Timofeev	ea7f08bee5	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026	2017-07-03 14:54:11 +00:00
Stanislav Mekhanoshin	5fa289f0d8	[AMDGPU] Narrow lshl from 64 to 32 bit if possible Turn expensive 64 bit shift into 32 bit if shift does not overflow int: shl (ext x) => zext (shl x) Differential Revision: https://reviews.llvm.org/D33367 llvm-svn: 303569	2017-05-22 16:58:10 +00:00
Sam Kolton	9fa169601f	[AMDGPU] Resubmit SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299654	2017-04-06 15:03:28 +00:00
Ivan Krasin	d4f70c70b9	Revert r299536. [AMDGPU] SDWA peephole: enable by default. Reason: breaks multiple bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173 Original Review URL: https://reviews.llvm.org/D31671 llvm-svn: 299583	2017-04-05 19:58:12 +00:00
Sam Kolton	34e29784fb	[AMDGPU] SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299536	2017-04-05 12:00:45 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Matt Arsenault	7aad8fd8f4	Enable FeatureFlatForGlobal on Volcanic Islands This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982	2017-01-24 22:02:15 +00:00
Matt Arsenault	0b26e47345	AMDGPU: Invert cmp + select with constant Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372	2016-12-22 21:40:08 +00:00
Tom Stellard	115a61560e	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 286464	2016-11-10 16:02:37 +00:00
Tom Stellard	2d2d33f1dc	Revert "AMDGPU: Add VI i16 support" This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995	2016-11-04 13:06:34 +00:00
Tom Stellard	2b3379cdff	AMDGPU: Add VI i16 support Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939	2016-11-03 17:13:50 +00:00
Konstantin Zhuravlyov	d971a1123f	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32 This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716	2016-11-01 17:49:33 +00:00
Konstantin Zhuravlyov	f74fc60a7d	[AMDGPU] Promote uniform (i1, i16] operations to i32 Differential Revision: https://reviews.llvm.org/D25302 llvm-svn: 283555	2016-10-07 14:22:58 +00:00
Matt Arsenault	5d8eb25e78	AMDGPU: Use unsigned compare for eq/ne For some reason there are both of these available, except for scalar 64-bit compares which only has u64. I'm not sure why there are both (I'm guessing it's for the one bit inputs we don't use), but for consistency always using the unsigned one. llvm-svn: 282832	2016-09-30 01:50:20 +00:00
Konstantin Zhuravlyov	e14df4b236	[AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions Differential Revision: https://reviews.llvm.org/D24125 llvm-svn: 282624	2016-09-28 20:05:39 +00:00

1 2

59 Commits