llvm-project

Commit Graph

Author	SHA1	Message	Date
Zi Xuan Wu (Zeson)	f4d61cdf9c	[CSKY] Lower ISD::ConstantPool node to support getting the address of ConstantPool entry When there is not GRS or MOVIH/ORI instruction, we can not get the address of ConstantPool entry directly. So we need put the address into ConstantPool to leverage CSKY::LRW instruction.	2022-11-21 10:37:20 +08:00
gonglingqin	c2ec455f18	[LoongArch] Add intrinsics for ibar, break and syscall Diagnostics for intrinsic input parameters have also been added. Differential Revision: https://reviews.llvm.org/D138094	2022-11-21 09:31:26 +08:00
Simon Pilgrim	89365b159e	[X86] IceLakeServer - PACKS instructions take latency 3cy This appears to be a slow down vs Skylake (which the model was copied off) - confirmed with uops.info / instlatx64 Noticed as D138359 was reporting that many of the PACKS overrides were redundant, but were in fact incorrect	2022-11-20 19:28:35 +00:00
Kazu Hirata	7524db4d44	[llvm] Remove unused forward declarations (NFC)	2022-11-20 09:59:36 -08:00
Benjamin Kramer	e2bff1e489	[X86] Fix atomic rmw intrinsic expansion for non-opaque pointers This is a bit annoying, but there are still users out there that got broken by this (this time it was numba). We need to keep some barebones support around until non-opaque pointers are completely gone.	2022-11-20 15:39:30 +01:00
Simon Pilgrim	4695f5982a	[X86] Remove unnecessary SHLD32rri8/SHRD16rri8 instruction override from bdver2 model Reported by D138359 - the override matches the WriteSHDrri base sched def	2022-11-20 14:17:44 +00:00
Simon Pilgrim	dd9a900a50	[X86] Remove unnecessary XGETBV instruction overrides from znver1/znver2 models Reported by D138359 - znver models already treats all WriteSystem sched instructions as microcoded	2022-11-20 14:05:05 +00:00
Simon Pilgrim	a6686aae38	[X86] Remove unnecessary RDPMC/RDTSC instruction overrides from znver1/znver2 models Reported by D138359	2022-11-20 13:19:49 +00:00
Simon Pilgrim	9148aeac00	[X86] Remove unnecessary string instruction overrides from znver1/znver2 models Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake	2022-11-20 12:57:44 +00:00
Simon Pilgrim	611db1c78f	[X86] Remove unnecessary bit test instruction overrides from znver2 model Reported by D138359 and confirmed with AMD SoG - matches znver1 model	2022-11-20 12:22:11 +00:00
Simon Pilgrim	357f1c4ef1	[X86] Improve LOOP/LOOPE/LOOPNE schedule on SandyBridge model D138359 was reporting that this override was superfluous, but it had never been setup - I took the numbers from uops.info (I couldn't find an estimate in Intel docs).	2022-11-20 12:13:02 +00:00
Simon Pilgrim	94d240a44a	[X86] Remove unnecessary zmm shuffle instruction overrides from IceLake model Reported by D138359 and confirmed with Intel AoM, Agner + uops.info	2022-11-20 10:48:27 +00:00
Simon Pilgrim	13fd7373b6	[X86] znver2 - (V)EXTRACTPSrr takes 2 uops D138359 was reporting that the EXTRACTPSrr override was unnecessary, however the AMD SoG and Agner both confirm that both the rr and rm versions take 2uops (matching znver1)	2022-11-20 09:24:55 +00:00
Phoebe Wang	510e5fba16	[X86] Use lock or/and/xor for cases that we only care about the EFLAGS This is a follow up of D137711 to fix the reset of #58685. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D138294	2022-11-20 10:42:48 +08:00
David Green	201b7858f6	[AArch64] Disable aarch64-enable-gep-opt This option was enabled in D128582, and whilst it seems to be a net improvement in many cases, at least a couple of issues have been reported from D135957 and from the CSE added to the backend causing more instructions in executed blocks. Revert for the time being, until we can improve the precision.	2022-11-19 21:25:18 +00:00
Simon Pilgrim	8033141140	[X86] Remove unnecessary STC instruction overrides Reported by D138359	2022-11-19 18:15:38 +00:00
Simon Pilgrim	90702f47cf	[X86] Remove unnecessary STD + CLD instruction overrides Reported by D138359	2022-11-19 18:15:38 +00:00
Simon Pilgrim	e8ea92d24e	[X86] Remove some unnecessary cvt overrides All of these match the default WriteCvtI2PS class defs	2022-11-19 15:37:52 +00:00
wanglei	6971c1b370	[LoongArch] Add support for tail call optimization This patch adds tail call support to the LoongArch backend. When appropriate, use the `b` or `jr` instruction for tail calls (the `pcalau12i+jirl` instruction pair when use medium codemodel). This patch also modifies the inappropriate operand name: simm26_bl -> simm26_symbol This has been modeled after RISCV's tail call opt. Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D137889	2022-11-19 17:36:06 +08:00
wanglei	0e4378c55e	[LoongArch] Add emergency spill slot for CFR spill/reload When all registers have been allocated and CFR needs to be saved on the stack, an emergency spill slot is required. Because CFR's spill and reload require a general purpose register to transfer. The attached test case was bugpoint-reduced down from `MultiSource/Benchmarks/mafft/Lalignmm.c` in the test-suite. Without this patch, llc will crash and report the following errors: ``` LLVM ERROR: Error while trying to spill R4 from class GPR: Cannot scavenge register without an emergency spill slot! ``` Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D138007	2022-11-19 14:35:31 +08:00
Matt Arsenault	b0847b0095	AMDGPU/GlobalISel: Insert freeze when splitting vector G_SEXT_INREG This transform is broken for undef or poison inputs without a freeze. This is also broken in lots of other places where shifts are split into 32-bit pieces. Amt < 32 case: ; Broken: https://alive2.llvm.org/ce/z/7bb4vc ; Freezing the low half of the bits makes it correct ; Fixed: https://alive2.llvm.org/ce/z/zJAZFr define i64 @src(i64 %val) { %shl = shl i64 %val, 55 %shr = ashr i64 %shl, 55 ret i64 %shr } define i64 @tgt(i64 %val) { %lo32 = trunc i64 %val to i32 %shr.half = lshr i64 %val, 32 %hi32 = trunc i64 %shr.half to i32 %inreg.0 = shl i32 %lo32, 23 %new.lo = ashr i32 %inreg.0, 23 %new.hi = ashr i32 %new.lo, 31 %zext.lo = zext i32 %new.lo to i64 %zext.hi = zext i32 %new.hi to i64 %hi.ins = shl i64 %zext.hi, 32 %or = or i64 %hi.ins, %zext.lo ret i64 %or } Amt == 32 case: Broken: https://alive2.llvm.org/ce/z/5f4qwQ Fixed: https://alive2.llvm.org/ce/z/A2hWWF This one times out alive; works if argument is made noundef or scaled down to a smaller bitwidth. define i64 @src(i64 %val) { %shl = shl i64 %val, 32 %shr = ashr i64 %shl, 32 ret i64 %shr } define i64 @tgt(i64 %val) { %lo32 = trunc i64 %val to i32 %shr.half = lshr i64 %val, 32 %hi32 = trunc i64 %shr.half to i32 %new.hi = ashr i32 %lo32, 31 %zext.lo = zext i32 %lo32 to i64 %zext.hi = zext i32 %new.hi to i64 %hi.ins = shl i64 %zext.hi, 32 %or = or i64 %hi.ins, %zext.lo ret i64 %or } Amt > 32 case: ; Correct: https://alive2.llvm.org/ce/z/tvrhPf define i64 @src(i64 %val) { %shl = shl i64 %val, 9 %shr = ashr i64 %shl, 9 ret i64 %shr } define i64 @tgt(i64 %val) { %lo32 = trunc i64 %val to i32 %lshr = lshr i64 %val, 32 %hi32 = trunc i64 %lshr to i32 %inreg.0 = shl i32 %hi32, 9 %new.hi = ashr i32 %inreg.0, 9 %zext.lo = zext i32 %lo32 to i64 %zext.hi = zext i32 %new.hi to i64 %hi.ins = shl i64 %zext.hi, 32 %or = or i64 %hi.ins, %zext.lo ret i64 %or }	2022-11-18 16:05:19 -08:00
Philip Reames	06e2b44c46	[RISCV] Optimize scalable frame setup when VLEN is precisely known If we know the exact value of VLEN, the frame offset adjustment for scalable stack slots becomes a fixed constant. This avoids the need to read vlenb, and may allow the offset to be folded into the immediate field of an add/sub. We could go further here, and fold the offset into a single larger frame adjustment - instead of having a separate scalable adjustment step - but that requires a bit more code reorganization. I may (or may not) return to that in a future patch. Differential Revision: https://reviews.llvm.org/D137593	2022-11-18 15:30:39 -08:00
Matt Arsenault	1fe1299a93	GlobalISel: Legalize strict_fsub In the future should probably have a more convenient way to switch between building strict and non-strict ops.	2022-11-18 15:21:41 -08:00
Sami Tolvanen	7c96f61aaa	[X86][KCFI] Don't fold loads into indirect calls that need a KCFI check Avoid unnecessary folding as X86KCFIPass would have to unfold these anyway when emitting the KCFI_CHECK.	2022-11-18 21:55:41 +00:00
Krzysztof Parzyszek	c346cc3041	[Hexagon] Remove non-existent instructions Some instructions that don't actually exist in hardware were emitted by the generator script in error. Delete them from the .td files.	2022-11-18 13:53:34 -08:00
Michael Maitland	184fbfd712	[RISCV][CodeGen] Chapter of vector instruction type corresponds with chapters in RISCV vector specification. NFC The [vector spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc) is organized in chapters based on instruction type. The comments in the tablegen marked the incorrect chapters. This change updates the comments with the correct chapter numbers. Differential Revision: https://reviews.llvm.org/D138311	2022-11-18 10:30:08 -08:00
Matt Arsenault	fe56afc4d7	AMDGPU: Fix fcanonicalize constant folding not correctly handling -0.0	2022-11-18 10:03:29 -08:00
Philip Reames	18fda867f4	[RISCV] Optimize scalable frame offset calculation when VLEN is precisely known When we have a precisely known VLEN, we can replace runtime usage of VLENB with compile time constants. This converts offsets involving both fixed and scalable components into fixed offsets. The result is that we avoid the csr read of vlenb, and can often fold the multiply as well. Differential Revision: https://reviews.llvm.org/D137591	2022-11-18 09:56:55 -08:00
Michael Maitland	98e342dca2	[RISCV][llvm-mca] Use LMUL Instruments to provide more accurate reports on RISCV On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind of information impacts how long the instruction takes to execute and what dependencies this may cause. On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or vl, in addition with the instruction itself. But MCA does not track or use the data in these registers. This patch fixes this problem by introducing Instruments into MCA. * Replace `CodeRegions` with `AnalysisRegions` * Add `Instrument` and `InstrumentManager` * Add `InstrumentRegions` * Add RISCV Instrument and `InstrumentManager` * Parse `Instruments` in driver * Use instruments to override schedule class * RISCV use lmul instrument to override schedule class * Fix unit tests to pass empty instruments * Add -ignore-im clopt to disable this change A prior version of this patch was committed in `5e82ee5373`. `2323a4ee61` reverted that change because the unit test files caused build errors. The change with fixes were committed in `b88b8307bf` but reverted once again `e8e92c8313` due to more build errors. This commit adds the prior changes and fixes the build error. Differential Revision: https://reviews.llvm.org/D137440	2022-11-18 09:55:15 -08:00
Mirko Brkusanin	e58b116843	[AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11 Differential Revision: https://reviews.llvm.org/D133012	2022-11-18 18:19:27 +01:00
Petar Avramovic	0f3e72e86c	AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection When selectVOP3PMadMixModsImpl fails, it can still create new copy instr via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, new copy instr will remain dead but will not be automatically removed. InstructionSelect does not check if instructions created during selection are dead. Such dead copy doesn't have register class on dst operand and causes crash. Fix is to build copy when operands are being added to selected instruction. Differential Revision: https://reviews.llvm.org/D138044	2022-11-18 18:02:26 +01:00
Jay Foad	38302c60ef	[AMDGPU] Stop looking for implicit M0 uses on MOV instructions Before D114230, indirect moves used regular MOV opcodes and were identified by having an implicit use of M0. Since D114230 they use dedicated opcodes instead, so remove some old code that checks for implicit uses of M0. NFCI. Differential Revision: https://reviews.llvm.org/D138308	2022-11-18 16:57:55 +00:00
Matt Arsenault	08ec15e44b	AMDGPU/GlobalISel: Fix strictfp fmul	2022-11-18 08:53:49 -08:00
Dinar Temirbulatov	44e2c6a428	[AArch64][SVE] Use PTRUE instruction instead of WHILELO if the range is appropriate for predicator constant. While get_active_lane_mask lowering it uses WHILELO instruction, but for constant range suitable for PTRUE then we could issue PTRUE instruction instead. Differential Revision: https://reviews.llvm.org/D137547	2022-11-18 16:21:10 +00:00
Krzysztof Parzyszek	ea6693d4c8	[Hexagon] Add missing patterns for mulhs/mulhu	2022-11-18 08:13:57 -08:00
Alexander Timofeev	3ae96e9eb8	ARCRegisterInfo::eliminateFrameIndex updated to fix build error caused by `32bd75716c`	2022-11-18 16:16:10 +01:00
Alexander Timofeev	32bd75716c	PEI should be able to use backward walk in replaceFrameIndicesBackward. The backward register scavenger has correct register liveness information. PEI should leverage the backward register scavenger. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D137574	2022-11-18 15:57:34 +01:00
David Sherwood	2e02f007a2	[AArch64][SME2] Remove vector constraints from zip/uzp (2-vector) instruction classes The zip/uzp (2-vector) instruction classes have the incorrect register constraints and mark the destination as also being an input. However, the instructions are fully destructive so I've restructured the classes. Differential Revision: https://reviews.llvm.org/D138288	2022-11-18 14:30:48 +00:00
Phoebe Wang	d558255650	[X86] Use lock add/sub for cases that we only care about the EFLAGS This fixes #36373, #36905 and partial of #58685. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D137711	2022-11-18 21:43:47 +08:00
Hassnaa Hamdi	79e8bd1add	[AArch64][SME]: Generate streaming-compatible code for ISD::INSERT_VECTOR_ELT. 1- Enable custom lowering INSERT_VECTOR_ELT to generate code compatible to streaming mode. 2- Add testing file: insert-vector-elt.ll Differential Revision: https://reviews.llvm.org/D138222	2022-11-18 12:20:16 +00:00
Hassnaa Hamdi	d8306b8885	[AArch64][SME]: Use SVE mov instruction for FPR128 registers in streaming-compatible mode. 1- in streaming mode, use SVE OR/mov instruction instead of NEON OR, during copying phyReg -AArch64InstrInfo::copyPhysReg-. 2- add testing file: register-mov.ll Differential Revision: https://reviews.llvm.org/D138211	2022-11-18 11:18:30 +00:00
Dmitry Preobrazhensky	96155bf44b	[AMDGPU][GFX11][NFC] Refactor VOPD operands handling (part 2) Rename interface functions and operands to make code clearer. Differential Revision: https://reviews.llvm.org/D138133	2022-11-18 14:15:05 +03:00
Valery Pykhtin	a35ba2a256	[AMDGPU] Fix PreRARematStage::sinkTriviallyRematInsts region boundary update after sinking. First boundary of a region wasn't updated when a sinked instruction was added first into the region. Reviewed By: vangthao Differential Revision: https://reviews.llvm.org/D138256	2022-11-18 12:13:14 +01:00
wanglei	bfa3551dd3	[LoongArch] Implement assembler branches pseudo instructions These instructions always output the canonical mnemonic. The GNU tools emit the canonical mnemonic for the branch pseudo instructions as well (e.g. "bgt" will be recognised by the assembler but never printed by objdump). Reviewed By: xen0n Differential Revision: https://reviews.llvm.org/D138100	2022-11-18 16:54:20 +08:00
Chen Zheng	f034c98af0	[PowerPC] mark dead def for ctr be clobber. TLS pseudo ADDIStlsgdHA will have such def. This dead def should also prevent PPC from generating CTR loops.	2022-11-18 06:55:42 +00:00
Han-Kuan Chen	7e6dbfcd9d	[RISCV] Make lowerVECTOR_SHUFFLEAsVSlidedown follow source until not EXTRACT_SUBVECTOR. Current lowerVECTOR_SHUFFLEAsVSlidedown only seeks whether input are EXTRACT_SUBVECTOR and their source are same. The commit will make the function seek input and their source until they are not EXTRACT_SUBVECTOR. Differential Revision: https://reviews.llvm.org/D138025	2022-11-17 22:32:53 -08:00
Matt Arsenault	fe5b9a6a11	AMDGPU/GlobalISel: Make strict fadd, fmul and fma legal	2022-11-17 20:50:04 -08:00
Matt Arsenault	ae43420f39	AMDGPU/GlobalISel: Fix not selecting modifiers for f16 fma on gfx9 VOP3OpSel wasn't trying to match any modifiers. Just try to match the basic case, like the DAG does.	2022-11-17 18:51:45 -08:00
Alexander Shaposhnikov	7059a6c32c	[IR] Split out IR printing passes into IRPrinter This diff splits out (from LLVMCore) IR printing passes into IRPrinter. This structure is similar to what we already have for IRReader and enables us to avoid circular dependencies between LLVMCore and Analysis (this is a preparation for https://reviews.llvm.org/D137768). The legacy interface is left unchanged, once the legacy pass manager is removed (in the future) we will be able to clean it up further. The bazel build configuration has been updated as well. Test plan: 1/ Tested the following cmake configurations: static/dynamic linking * lld/gold * clang/gcc 2/ bazel build --config=generic_clang @llvm-project//... Differential revision: https://reviews.llvm.org/D138081	2022-11-18 01:47:56 +00:00
Krzysztof Parzyszek	a98fc08396	[Hexagon] Add instruction definitions for Hexagon v71, v71t, and v73 This includes instruction formats, definitions, encodings, scheduling classes, and builtins/intrinsics. New and improved version of 76536989ba, so much so that even clang builds with it.	2022-11-17 15:51:38 -08:00

69737 Commits
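
The message for commit b0847b0095 above describes splitting a 64-bit G_SEXT_INREG (written there as a `shl`/`ashr` pair) into 32-bit pieces. The following is a minimal Python sketch of the first case in that message (labelled `Amt < 32`), modelling the `@src` and `@tgt` functions on concrete integers. The helper names (`to_signed`, `shl`, `ashr`) are made up for this illustration and are not LLVM APIs. The sketch only covers fully defined inputs; it does not model undef or poison, which is the situation the commit's freeze addresses.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def shl(value, amt, bits):
    """Left shift that wraps at `bits` bits, like an iN shl on defined inputs."""
    return (value << amt) & ((1 << bits) - 1)


def ashr(value, amt, bits):
    """Arithmetic right shift on a `bits`-wide value, result kept unsigned."""
    return (to_signed(value, bits) >> amt) & ((1 << bits) - 1)


def src(val):
    # 64-bit form: sign-extend the low 9 bits via shl 55 / ashr 55.
    return ashr(shl(val, 55, 64), 55, 64)


def tgt(val):
    # Split form: do the in-register sign extension on the low 32 bits,
    # then fill the high 32 bits with copies of the resulting sign bit.
    lo32 = val & MASK32
    new_lo = ashr(shl(lo32, 23, 32), 23, 32)
    new_hi = ashr(new_lo, 31, 32)
    return (new_hi << 32) | new_lo


if __name__ == "__main__":
    for v in (0, 1, 0xFF, 0x100, 0x1FF, 0x123456789ABCDEF0, MASK64):
        print(hex(v), hex(src(v)), hex(tgt(v)), src(v) == tgt(v))
```

For defined inputs the two forms agree, which is consistent with the message's point that the problem only appears when the input may be undef or poison.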