llvm-project

Commit Graph

Author	SHA1	Message	Date
Jonas Paulsson	122efef8ee	Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."" This reverts commit `17db0de330`. Some more bots got broken - need to investigate.	2022-12-05 00:52:00 +01:00
Jonas Paulsson	17db0de330	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang). RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.	2022-12-03 14:15:15 -06:00
Matt Arsenault	27387896cf	SROA: Simplify addrspacecasted allocas with volatile accesses If the alloca is accessed through an addrspacecasted pointer, allow the normal changes on the alloca. Cast back to the original use address space instead of the new alloca's natural address space.	2022-12-02 15:20:56 -05:00
Valery Pykhtin	5ce3273ebf	[AMDGPU] Scheduler: Don't revert the schedule if the register pressure isn't changed for a region This one-liner fix improves compilation time by about 40% on ASAN-enabled code. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D136069	2022-12-02 15:59:38 +01:00
Matt Arsenault	d85e849ff4	AMDGPU: Convert some assorted tests to opaque pointers	2022-12-01 21:40:30 -05:00
Matt Arsenault	0fb74d0ff8	AMDGPU: Fix broken attribute usage in test	2022-12-01 21:32:20 -05:00
Matt Arsenault	75a3f22a90	AMDGPU: Convert some stack tests to opaque pointers	2022-12-01 21:32:20 -05:00
Matt Arsenault	4998de4dcc	AMDGPU: Update some wait tests to opaque pointers The script mangled the constantexprs in waitcnt-looptest.ll, so fixed those manually.	2022-12-01 21:01:58 -05:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit `6d12599fd4`.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by the target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
Roman Lebedev	7850ab2112	[NFC] Port an assortment of tests that invoke SROA to new pass manager	2022-12-01 21:17:18 +03:00
Freddy Ye	89f36dd8f3	[X86] Add ExpandLargeFpConvert Pass and enable for X86 As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem, which expands ‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp conversions with more than 128 bits. The expanded nodes are based on the IR generated by `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, etc. Corner cases: 1. For fp16: as there are no related builtins in compiler-rt, I mainly utilized the fp32 <-> fp16 lib calls to implement it. 2. For fp80: as this pass is soft fp emulation and no fp80 instructions can help with this problem, I recommend users deprecate this usage. For now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/ext at top/end of the function. 3. For bf16: as clang FE currently doesn't support bf16 algorithm operations (convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for now. 4. For unsigned FPToI: since both default hardware behaviors and libgcc ignore the "returns 0 for negative input" spec, this pass follows this old way and ignores unsigned FPToI. See this example: https://gcc.godbolt.org/z/bnv3jqW1M The end-to-end tests are uploaded at https://reviews.llvm.org/D138261 Reviewed By: LuoYuanke, mgehre-amd Differential Revision: https://reviews.llvm.org/D137241	2022-12-01 13:47:43 +08:00
Marco Elver	b95646fe70	Revert "Use-after-return sanitizer binary metadata" This reverts commit `d3c851d3fc`. Some bots broke: - https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview - https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio	2022-11-30 23:35:50 +01:00
Jay Foad	3d9e226081	[AMDGPU] Use s_cmp instead of s_cmpk Don't bother pre-shrinking "s_cmp_lg_u32 reg, 0" to s_cmpk_lg_u32 because 0 is already an inline constant so the s_cmpk form is no smaller. This is just for consistency with the surrounding code and to simplify a downstream patch. Differential Revision: https://reviews.llvm.org/D138993	2022-11-30 18:02:39 +00:00
Dmitry Vyukov	d3c851d3fc	Use-after-return sanitizer binary metadata Currently per-function metadata consists of: (start-pc, size, features) This adds a new UAR feature and if it's set an additional element: (start-pc, size, features, stack-args-size) Reviewed By: melver Differential Revision: https://reviews.llvm.org/D136078	2022-11-30 14:50:22 +01:00
Pierre van Houtryve	a88deb4b65	[AMDGPU] Use aperture registers instead of S_GETREG Fixes a longstanding TODO in the codebase where we were using S_GETREG + shift to do something that could simply be done with an inline constant (register). Patch based on D31874 by @kzhuravl Depends on D137767 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137542	2022-11-30 12:25:10 +00:00
Nicolai Hähnle	4e79764f0a	AMDGPU: Fixup tests	2022-11-30 12:37:40 +01:00
Nicolai Hähnle	b7f44f7cf9	AMDGPU: Remove ImagePSV and move images to addrspace 7 Following up on the removal of BufferPSV in commit `43b86bf992` ("AMDGPU: Remove BufferPseudoSourceValue") It is unclear what exactly the right address space for images should be. They seem morally closest to buffers, so that's what I went with. In practical terms, address space 7 is better than address space 0 because it can't alias with LDS. Differential Revision: https://reviews.llvm.org/D138949	2022-11-30 11:32:34 +01:00
Matt Arsenault	6a91a5e826	AMDGPU: Convert some cast tests to opaque pointers	2022-11-29 19:04:58 -05:00
Matt Arsenault	b5bc205d75	AMDGPU: Convert some bit operation tests to opaque pointers	2022-11-29 18:36:53 -05:00
Matt Arsenault	c08d55623d	AMDGPU: Fix creating illegal f16 fp_class We were missing legality checks. The device library build was broken for targets without f16 support. Technically the first pattern isn't tested by this patch; it only triggers with the isBeforeLegalize check in performAndCombine removed. I'm not sure how to trick this into appearing post-legalization.	2022-11-29 18:24:30 -05:00
Matt Arsenault	fac34c01f6	AMDGPU: Bulk update some call tests to use opaque pointers	2022-11-29 18:10:38 -05:00
Matt Arsenault	fb1d166e1d	AMDGPU: Bulk update some generic intrinsic tests to opaque pointers Done purely with the script.	2022-11-29 18:09:59 -05:00
Matt Arsenault	a1ac8902e3	AMDGPU: Convert amdgpu-alias-analysis.ll to opaque pointers This one was slightly tricky. The AA debug printing usually, but not always, uses the old pointer syntax. Also, we need to stop folding out 0 index GEPs in a few of these cases.	2022-11-29 18:08:53 -05:00
Matt Arsenault	177ff42d8e	AMDGPU: Convert some fp op tests to opaque pointers fmax_legacy.ll had one test that produced "ptraddrspace(1)", since somehow "i1addrspace(1)*" used to parse.	2022-11-29 18:08:53 -05:00
Ron Lieberman	ca856fff1c	Revert "enable code-object-version=5" very sorry wrong repo. This reverts commit `d882ba7aea`.	2022-11-29 15:21:09 -06:00
Nicolai Hähnle	43b86bf992	AMDGPU: Remove BufferPseudoSourceValue The use of a PSV for buffer intrinsics is misleading because it may be misinterpreted as all buffer intrinsics accessing the same address in memory, which is clearly not true. Instead, build MachineMemOperands without a pointer value but with an address space, so that address space-based alias analysis can still work. There is a lot of test churn because previously address space 4 (constant address space) was used as an address space for buffer intrinsics. This doesn't make much sense and seems to have been an accident -- see the change in AMDGPUTargetMachine::getAddressSpaceForPseudoSourceKind. Differential Revision: https://reviews.llvm.org/D138711	2022-11-29 22:15:11 +01:00
Ron Lieberman	d882ba7aea	enable code-object-version=5	2022-11-29 15:11:57 -06:00
Brendon Cahoon	b32a5666a8	[AMDGPU] Unify uniform return and divergent unreachable blocks This patch fixes a "failed to annotate CFG" error in SIAnnotateControlFlow. The problem occurs when there are divergent and uniform unreachable/return blocks in the same region. In this case, AMDGPUUnifyDivergentExitNodes does not create a unified block so the region contains multiple exits. StructurizeCFG does not work properly when there are multiple exits, so the necessary CFG transformations do not occur along divergent control flow. Subsequently, SIAnnotateControlFlow processes the path to the divergent exit block, but may only partially process blocks along a uniform control flow path to another exit block. This patch fixes the bug by creating a single exit block when there is a divergent exit block in the function. Differential revision: https://reviews.llvm.org/D136892	2022-11-29 13:25:56 -06:00
Matt Arsenault	600e9d33a5	AMDGPU: Fix broken test From `ee29a846c6`	2022-11-29 12:33:44 -05:00
Matt Arsenault	ee29a846c6	DAG: Fix assert when alloca has inconsistent pointer size Take the type from the alloca, not the type to use for allocas. Fixes issue 59250.	2022-11-29 11:48:46 -05:00
Matt Arsenault	a74ea40cb6	AMDGPU: Remove unnecessary metadata from test The pass isn't doing anything with it, and the line wrapping is confusing update_test_checks.	2022-11-29 11:12:08 -05:00
Mateja Marjanovic	595a08847a	[AMDGPU] Add support for new LLVM vector types Add VReg, AReg and SReg on AMDGPU for bit widths: 288, 320, 352 and 384. Differential Revision: https://reviews.llvm.org/D138205	2022-11-29 17:02:04 +01:00
Simon Pilgrim	30eff7f29f	[DAG] Attempt to replace a mul node with an existing umul_lohi/smul_lohi node (PR59217) As discussed on Issue #59217, under certain circumstances the DAG can generate duplicate MUL and MUL_LOHI nodes, often during MULO legalization. This patch attempts to replace MUL nodes with additional uses of the LO result from the MUL_LOHI node Differential Revision: https://reviews.llvm.org/D138790	2022-11-29 12:51:30 +00:00
Thomas Symalla	5f77ee4011	[NFC][AMDGPU] Pre-commit tests for D136432	2022-11-29 12:57:59 +01:00
Stanislav Mekhanoshin	28eb9ed3bb	[AMDGPU] Fine tune LDS misaligned access speed Differential Revision: https://reviews.llvm.org/D124219	2022-11-28 16:12:02 -08:00
Janek van Oirschot	322966f8f8	[AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU Uses existing SelectionDAG lowering of the llvm.amdgcn.class intrinsic for llvm.is.fpclass	2022-11-28 16:00:36 -05:00
Matt Arsenault	ad386a886b	AMDGPU: Bulk update some intrinsic tests to opaque pointers Done entirely with the script.	2022-11-28 14:21:31 -05:00
Matt Arsenault	7a3fb6a6e3	AMDGPU: Convert some memcpy test to opaque pointers memcpy-scoped-aa.ll required manually updating the IR references in the MMOs	2022-11-28 14:11:56 -05:00
Matt Arsenault	8e0fadda10	AMDGPU: Bulk update all GlobalISel tests to use opaque pointers	2022-11-28 11:51:36 -05:00
Matt Arsenault	da0293e3cc	AMDGPU: Bulk update some r600 tests to opaque pointers r600.amdgpu-alias-analysis.ll has something strange going on where AliasAnalysisEvaluator's printing is reproducing the typed pointer syntax.	2022-11-28 11:25:44 -05:00
Matt Arsenault	50caf6936b	AMDGPU: Convert promote alloca tests to opaque pointers	2022-11-28 10:36:38 -05:00
Matt Arsenault	b3df889b71	AMDGPU: Convert test to generated checks These checks were too thin to begin with, and required slightly trickier updates for opaque pointers.	2022-11-28 10:35:29 -05:00
Matt Arsenault	8f071fecfe	AMDGPU: Use named values in a test As always, these were an obstacle to test updates.	2022-11-28 10:35:29 -05:00
Matt Arsenault	c1710e7779	AMDGPU: Use modern address spaces in some tests This was way out of date, still using 4 for generic and 0 for private.	2022-11-28 10:05:06 -05:00
Matt Arsenault	1ab9fa6f0d	AMDGPU/GlobalISel: Fix hardcoded virtual register numbers in test	2022-11-28 08:41:31 -05:00
David Stuttard	7940888c59	[AMDGPU] Intrinsic to expose s_wait_event for export ready Differential Revision: https://reviews.llvm.org/D138216	2022-11-28 11:26:15 +00:00
Roman Lebedev	25f01d593c	Revert "[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2)" TableGen is still getting miscompiled on PPC buildbots. Sent a mail with request for help. This reverts commit `3c4d2a0396`.	2022-11-27 00:00:06 +03:00
Roman Lebedev	3c4d2a0396	[SROA] `isVectorPromotionViable()`: memory intrinsics operate on vectors of bytes (take 2) This is a recommit of `cf624b23bc`, which was reverted in `5cfc22cafe`, because the cut-off on the number of vector elements was not low enough, and it triggered both SDAG SDNode operand number assertions, and caused compile time explosions in some cases. Let's try with something really REALLY conservative first, just to get somewhere, and try to bump it (to 64/128) later. FIXME: should this respect TTI reg width * num vec regs? Original commit message: Now, there's a big caveat here - these bytes are abstract bytes, not the i8 we have in LLVM, so strictly speaking this is not exactly legal, see e.g. https://github.com/AliveToolkit/alive2/issues/860 ^ the "bytes" "could" have been a pointer, and loading it as an integer inserts an implicit ptrtoint. But at the same time, InstCombine's `InstCombinerImpl::SimplifyAnyMemTransfer()` would expand a memtransfer of 1/2/4/8 bytes into integer-typed load+store, so this isn't exactly a new problem. Note that in memory, poison is byte-wise, so we really can't widen elements, but SROA seems to be inconsistent here. Fixes #59116.	2022-11-26 23:19:15 +03:00
Ivan Kosarev	ec8ede8177	[AMDGPU][CodeGen] Support raw format TFE buffer loads other than byte, short and d16 ones. Differential Revision: https://reviews.llvm.org/D138215	2022-11-24 10:50:26 +00:00

5956 Commits