69737 Commits

Author SHA1 Message Date
Zi Xuan Wu (Zeson) f4d61cdf9c [CSKY] Lower ISD::ConstantPool node to support getting the address of ConstantPool entry
When neither GRS nor the MOVIH/ORI instructions are available, we cannot get the address of a
ConstantPool entry directly. So we need to put the address into the ConstantPool to leverage the CSKY::LRW instruction.
2022-11-21 10:37:20 +08:00
gonglingqin c2ec455f18 [LoongArch] Add intrinsics for ibar, break and syscall
Diagnostics for intrinsic input parameters have also been added.

Differential Revision: https://reviews.llvm.org/D138094
2022-11-21 09:31:26 +08:00
Simon Pilgrim 89365b159e [X86] IceLakeServer - PACKS instructions take latency 3cy
This appears to be a slowdown vs Skylake (from which the model was copied) - confirmed with uops.info / instlatx64

Noticed as D138359 was reporting that many of the PACKS overrides were redundant, but were in fact incorrect
2022-11-20 19:28:35 +00:00
Kazu Hirata 7524db4d44 [llvm] Remove unused forward declarations (NFC) 2022-11-20 09:59:36 -08:00
Benjamin Kramer e2bff1e489 [X86] Fix atomic rmw intrinsic expansion for non-opaque pointers
This is a bit annoying, but there are still users out there that got
broken by this (this time it was numba). We need to keep some barebones
support around until non-opaque pointers are completely gone.
2022-11-20 15:39:30 +01:00
Simon Pilgrim 4695f5982a [X86] Remove unnecessary SHLD32rri8/SHRD16rri8 instruction override from bdver2 model
Reported by D138359 - the override matches the WriteSHDrri base sched def
2022-11-20 14:17:44 +00:00
Simon Pilgrim dd9a900a50 [X86] Remove unnecessary XGETBV instruction overrides from znver1/znver2 models
Reported by D138359 - znver models already treat all WriteSystem sched instructions as microcoded
2022-11-20 14:05:05 +00:00
Simon Pilgrim a6686aae38 [X86] Remove unnecessary RDPMC/RDTSC instruction overrides from znver1/znver2 models
Reported by D138359
2022-11-20 13:19:49 +00:00
Simon Pilgrim 9148aeac00 [X86] Remove unnecessary string instruction overrides from znver1/znver2 models
Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded

It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake
2022-11-20 12:57:44 +00:00
Simon Pilgrim 611db1c78f [X86] Remove unnecessary bit test instruction overrides from znver2 model
Reported by D138359 and confirmed with AMD SoG - matches znver1 model
2022-11-20 12:22:11 +00:00
Simon Pilgrim 357f1c4ef1 [X86] Improve LOOP/LOOPE/LOOPNE schedule on SandyBridge model
D138359 was reporting that this override was superfluous, but it had never been set up - I took the numbers from uops.info (I couldn't find an estimate in Intel docs).
2022-11-20 12:13:02 +00:00
Simon Pilgrim 94d240a44a [X86] Remove unnecessary zmm shuffle instruction overrides from IceLake model
Reported by D138359 and confirmed with Intel AoM, Agner + uops.info
2022-11-20 10:48:27 +00:00
Simon Pilgrim 13fd7373b6 [X86] znver2 - (V)EXTRACTPSrr takes 2 uops
D138359 was reporting that the EXTRACTPSrr override was unnecessary, however the AMD SoG and Agner both confirm that both the rr and rm versions take 2uops (matching znver1)
2022-11-20 09:24:55 +00:00
Phoebe Wang 510e5fba16 [X86] Use lock or/and/xor for cases that we only care about the EFLAGS
This is a follow-up to D137711 to fix the rest of #58685.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D138294
2022-11-20 10:42:48 +08:00
David Green 201b7858f6 [AArch64] Disable aarch64-enable-gep-opt
This option was enabled in D128582, and whilst it seems to be a net
improvement in many cases, at least a couple of issues have been
reported from D135957 and from the CSE added to the backend causing more
instructions in executed blocks. Revert for the time being, until we can
improve the precision.
2022-11-19 21:25:18 +00:00
Simon Pilgrim 8033141140 [X86] Remove unnecessary STC instruction overrides
Reported by D138359
2022-11-19 18:15:38 +00:00
Simon Pilgrim 90702f47cf [X86] Remove unnecessary STD + CLD instruction overrides
Reported by D138359
2022-11-19 18:15:38 +00:00
Simon Pilgrim e8ea92d24e [X86] Remove some unnecessary cvt overrides
All of these match the default WriteCvtI2PS class defs
2022-11-19 15:37:52 +00:00
wanglei 6971c1b370 [LoongArch] Add support for tail call optimization
This patch adds tail call support to the LoongArch backend.  When
appropriate, use the `b` or `jr` instruction for tail calls (the
`pcalau12i+jirl` instruction pair when using the medium code model).

This patch also modifies the inappropriate operand name:
simm26_bl -> simm26_symbol

This has been modeled after RISCV's tail call opt.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D137889
2022-11-19 17:36:06 +08:00
wanglei 0e4378c55e [LoongArch] Add emergency spill slot for CFR spill/reload
When all registers have been allocated and CFR needs to be saved on the
stack, an emergency spill slot is required, because CFR's spill and
reload require a general purpose register for the transfer.

The attached test case was bugpoint-reduced down from
`MultiSource/Benchmarks/mafft/Lalignmm.c` in the test-suite.
Without this patch, llc will crash and report the following errors:

```
LLVM ERROR: Error while trying to spill R4 from class GPR: Cannot scavenge register without an emergency spill slot!
```

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D138007
2022-11-19 14:35:31 +08:00
Matt Arsenault b0847b0095 AMDGPU/GlobalISel: Insert freeze when splitting vector G_SEXT_INREG
This transform is broken for undef or poison inputs without a freeze.
This is also broken in lots of other places where shifts are split
into 32-bit pieces.

Amt < 32 case:
; Broken: https://alive2.llvm.org/ce/z/7bb4vc
; Freezing the low half of the bits makes it correct
; Fixed: https://alive2.llvm.org/ce/z/zJAZFr
define i64 @src(i64 %val) {
  %shl = shl i64 %val, 55
  %shr = ashr i64 %shl, 55
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %shr.half = lshr i64 %val, 32
  %hi32 = trunc i64 %shr.half to i32
  %inreg.0 = shl i32 %lo32, 23
  %new.lo = ashr i32 %inreg.0, 23
  %new.hi = ashr i32 %new.lo, 31
  %zext.lo = zext i32 %new.lo to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}

Amt == 32 case:
Broken: https://alive2.llvm.org/ce/z/5f4qwQ
Fixed: https://alive2.llvm.org/ce/z/A2hWWF
This one times out alive; works if argument is made noundef or
scaled down to a smaller bitwidth.

define i64 @src(i64 %val) {
  %shl = shl i64 %val, 32
  %shr = ashr i64 %shl, 32
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %shr.half = lshr i64 %val, 32
  %hi32 = trunc i64 %shr.half to i32
  %new.hi = ashr i32 %lo32, 31
  %zext.lo = zext i32 %lo32 to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}

Amt > 32 case:
; Correct: https://alive2.llvm.org/ce/z/tvrhPf
define i64 @src(i64 %val) {
  %shl = shl i64 %val, 9
  %shr = ashr i64 %shl, 9
  ret i64 %shr
}

define i64 @tgt(i64 %val) {
  %lo32 = trunc i64 %val to i32
  %lshr = lshr i64 %val, 32
  %hi32 = trunc i64 %lshr to i32
  %inreg.0 = shl i32 %hi32, 9
  %new.hi = ashr i32 %inreg.0, 9
  %zext.lo = zext i32 %lo32 to i64
  %zext.hi = zext i32 %new.hi to i64
  %hi.ins = shl i64 %zext.hi, 32
  %or = or i64 %hi.ins, %zext.lo
  ret i64 %or
}
2022-11-18 16:05:19 -08:00
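The three cases in b0847b0095 above can be modelled on plain integers. The following is a minimal Python sketch, not the GlobalISel code: the function names are hypothetical, and it only covers fully defined inputs. It cannot model undef or poison, which is exactly where the commit says the unfrozen transform breaks (the commit reports that freezing the low half fixes the `Amt < 32` case).

```python
MASK32 = 0xFFFFFFFF

def sext(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value into a to_bits-wide unsigned int."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

def sext_inreg_64(val, width):
    """Reference: shl by (64 - width) followed by ashr by (64 - width)."""
    return sext(val, width, 64)

def sext_inreg_split(val, width):
    """Same operation computed on two 32-bit halves (hypothetical helper)."""
    lo = val & MASK32
    hi = (val >> 32) & MASK32
    if width < 32:
        # Sign-extend inside the low half, then replicate its sign bit.
        new_lo = sext(lo, width, 32)
        new_hi = MASK32 if new_lo >> 31 else 0
    elif width == 32:
        # Low half is unchanged; high half is the sign of the low half.
        new_lo = lo
        new_hi = MASK32 if lo >> 31 else 0
    else:
        # Low half is unchanged; sign-extend inside the high half.
        new_lo = lo
        new_hi = sext(hi, width - 32, 32)
    return (new_hi << 32) | new_lo
```

Here `width` is the number of bits being extended from, so the IR that shifts by 55 corresponds to `width == 9`, and the IR that shifts by 9 corresponds to `width == 55`.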
Philip Reames 06e2b44c46 [RISCV] Optimize scalable frame setup when VLEN is precisely known
If we know the exact value of VLEN, the frame offset adjustment for scalable stack slots becomes a fixed constant. This avoids the need to read vlenb, and may allow the offset to be folded into the immediate field of an add/sub.

We could go further here, and fold the offset into a single larger frame adjustment - instead of having a separate scalable adjustment step - but that requires a bit more code reorganization. I may (or may not) return to that in a future patch.

Differential Revision: https://reviews.llvm.org/D137593
2022-11-18 15:30:39 -08:00
Matt Arsenault 1fe1299a93 GlobalISel: Legalize strict_fsub
In the future we should probably have a more convenient
way to switch between building strict and non-strict ops.
2022-11-18 15:21:41 -08:00
Sami Tolvanen 7c96f61aaa [X86][KCFI] Don't fold loads into indirect calls that need a KCFI check
Avoid unnecessary folding as X86KCFIPass would have to unfold these
anyway when emitting the KCFI_CHECK.
2022-11-18 21:55:41 +00:00
Krzysztof Parzyszek c346cc3041 [Hexagon] Remove non-existent instructions
Some instructions that don't actually exist in hardware were emitted
by the generator script in error. Delete them from the .td files.
2022-11-18 13:53:34 -08:00
Michael Maitland 184fbfd712 [RISCV][CodeGen] Chapter of vector instruction type corresponds with chapters in RISCV vector specification. NFC
The [vector spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc) is organized in chapters
based on instruction type. The comments in the tablegen marked the incorrect chapters. This change
updates the comments with the correct chapter numbers.

Differential Revision: https://reviews.llvm.org/D138311
2022-11-18 10:30:08 -08:00
Matt Arsenault fe56afc4d7 AMDGPU: Fix fcanonicalize constant folding not correctly handling -0.0 2022-11-18 10:03:29 -08:00
Philip Reames 18fda867f4 [RISCV] Optimize scalable frame offset calculation when VLEN is precisely known
When we have a precisely known VLEN, we can replace runtime usage of VLENB with compile time constants. This converts offsets involving both fixed and scalable components into fixed offsets. The result is that we avoid the csr read of vlenb, and can often fold the multiply as well.

Differential Revision: https://reviews.llvm.org/D137591
2022-11-18 09:56:55 -08:00
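The folding described in 18fda867f4 and 06e2b44c46 amounts to simple arithmetic once VLEN is fixed. Below is a minimal Python sketch under stated assumptions: the helper name is hypothetical, and it assumes scalable sizes are expressed as multiples of `vscale` with `vscale = VLEN / 64`. It is not the backend's implementation.

```python
RVV_BITS_PER_BLOCK = 64  # assumption: vscale is counted in 64-bit blocks

def resolve_offset(fixed_bytes, scalable_bytes, vlen_bits=None):
    """Return (fixed, scalable) parts of a frame offset.

    With an exactly known VLEN the scalable part becomes a constant and is
    folded into the fixed part; otherwise both parts are returned unchanged
    and the scalable part still needs a runtime read of vlenb.
    """
    if vlen_bits is None:
        return fixed_bytes, scalable_bytes
    if vlen_bits % RVV_BITS_PER_BLOCK != 0:
        raise ValueError("VLEN must be a multiple of 64 in this sketch")
    vscale = vlen_bits // RVV_BITS_PER_BLOCK
    return fixed_bytes + scalable_bytes * vscale, 0
```

For example, with VLEN known to be 128, an offset of 16 fixed bytes plus 8 scalable bytes folds to a single constant, which is what makes it a candidate for an add/sub immediate.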
Michael Maitland 98e342dca2 [RISCV][llvm-mca] Use LMUL Instruments to provide more accurate reports on RISCV
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how long the instruction takes to execute and what dependencies this may cause.

On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition to the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.

* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change

A prior version of this patch was committed in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. The change with fixes
was committed in b88b8307bf but reverted once again in e8e92c8313 due to more
build errors.

This commit adds the prior changes and fixes the build error.

Differential Revision: https://reviews.llvm.org/D137440
2022-11-18 09:55:15 -08:00
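The idea in 98e342dca2 of letting an instrument override the schedule class can be illustrated with a small lookup. This is a conceptual Python sketch only: the class, table, function, and schedule-class names are all hypothetical and do not mirror the real `Instrument` or `InstrumentManager` interfaces.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instrument:
    kind: str  # hypothetical tag, e.g. "LMUL"
    data: str  # hypothetical payload, e.g. "M4"

# Hypothetical table: (opcode, LMUL) -> schedule class name.
LMUL_SCHED_CLASSES = {
    ("VADD_VV", "M1"): "sched_vadd_m1",
    ("VADD_VV", "M4"): "sched_vadd_m4",
}

def pick_sched_class(opcode, default_class, instruments, ignore_instruments=False):
    """Choose a schedule class, letting an active LMUL instrument override it."""
    if ignore_instruments:
        # Models an opt-out switch: behave as if no instruments were present.
        return default_class
    for inst in instruments:
        if inst.kind == "LMUL":
            return LMUL_SCHED_CLASSES.get((opcode, inst.data), default_class)
    return default_class
```

The point of the design is that the instruction alone does not determine its cost; the surrounding state (here, the active LMUL) selects which scheduling entry applies.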
Mirko Brkusanin e58b116843 [AMDGPU] Add subtarget feature for MAD_U64/I64 bug on GFX11
Differential Revision: https://reviews.llvm.org/D133012
2022-11-18 18:19:27 +01:00
Petar Avramovic 0f3e72e86c AMDGPU/GlobalISel: Fix crash after mad/fma_mix fails selection
When selectVOP3PMadMixModsImpl fails, it can still create a new copy instruction
via selectVOP3ModsImpl. When selectG_FMA_FMAD then gives up, the new copy
remains dead but is not automatically removed, because
InstructionSelect does not check whether instructions created during selection
are dead.
Such a dead copy has no register class on its dst operand and causes a crash.
The fix is to build the copy when operands are being added to the selected instruction.

Differential Revision: https://reviews.llvm.org/D138044
2022-11-18 18:02:26 +01:00
Jay Foad 38302c60ef [AMDGPU] Stop looking for implicit M0 uses on MOV instructions
Before D114230, indirect moves used regular MOV opcodes and were
identified by having an implicit use of M0. Since D114230 they use
dedicated opcodes instead, so remove some old code that checks for
implicit uses of M0. NFCI.

Differential Revision: https://reviews.llvm.org/D138308
2022-11-18 16:57:55 +00:00
Matt Arsenault 08ec15e44b AMDGPU/GlobalISel: Fix strictfp fmul 2022-11-18 08:53:49 -08:00
Dinar Temirbulatov 44e2c6a428 [AArch64][SVE] Use PTRUE instruction instead of WHILELO if the range is appropriate for predicator constant.
get_active_lane_mask is lowered to a WHILELO instruction,
but when the range is a constant suitable for PTRUE we can issue a PTRUE instruction
instead.

Differential Revision: https://reviews.llvm.org/D137547
2022-11-18 16:21:10 +00:00
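The decision in 44e2c6a428 can be sketched as a check on a constant range. The Python below is a rough illustration with hypothetical helper names; the exact conditions used by the patch may differ. It assumes the fixed-count PTRUE patterns (VL1 to VL8, VL16, VL32, VL64, VL128, VL256) and requires the count to fit in the minimum number of lanes the vector is guaranteed to have.

```python
# Element counts that have a fixed-count PTRUE pattern.
PTRUE_FIXED_COUNTS = {1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256}

def active_lane_mask(base, limit, num_lanes):
    """Reference semantics: lane i is active iff base + i < limit."""
    return [base + i < limit for i in range(num_lanes)]

def ptrue_pattern_for(base, limit, min_lanes):
    """Return a pattern name such as 'vl4' if PTRUE can replace WHILELO.

    min_lanes is the lane count guaranteed at the minimum vector length.
    Returns None when the constant range does not map to a fixed pattern.
    """
    if limit <= base:
        return None
    count = limit - base
    if count in PTRUE_FIXED_COUNTS and count <= min_lanes:
        return f"vl{count}"
    return None
```

The `count <= min_lanes` guard matters because a fixed-count pattern larger than the actual vector would not produce the same predicate as the lane-by-lane comparison.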
Krzysztof Parzyszek ea6693d4c8 [Hexagon] Add missing patterns for mulhs/mulhu 2022-11-18 08:13:57 -08:00
Alexander Timofeev 3ae96e9eb8 ARCRegisterInfo::eliminateFrameIndex updated to fix build error caused by 32bd75716c 2022-11-18 16:16:10 +01:00
Alexander Timofeev 32bd75716c PEI should be able to use backward walk in replaceFrameIndicesBackward.
The backward register scavenger has correct register
liveness information. PEI should leverage the backward register scavenger.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D137574
2022-11-18 15:57:34 +01:00
David Sherwood 2e02f007a2 [AArch64][SME2] Remove vector constraints from zip/uzp (2-vector) instruction classes
The zip/uzp (2-vector) instruction classes have the incorrect
register constraints and mark the destination as also being an
input. However, the instructions are fully destructive so I've
restructured the classes.

Differential Revision: https://reviews.llvm.org/D138288
2022-11-18 14:30:48 +00:00
Phoebe Wang d558255650 [X86] Use lock add/sub for cases that we only care about the EFLAGS
This fixes #36373, #36905 and part of #58685.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D137711
2022-11-18 21:43:47 +08:00
Hassnaa Hamdi 79e8bd1add [AArch64][SME]: Generate streaming-compatible code for ISD::INSERT_VECTOR_ELT.
1- Enable custom lowering of INSERT_VECTOR_ELT to generate code compatible
   with streaming mode.
2- Add testing file:
   insert-vector-elt.ll

Differential Revision: https://reviews.llvm.org/D138222
2022-11-18 12:20:16 +00:00
Hassnaa Hamdi d8306b8885 [AArch64][SME]: Use SVE mov instruction for FPR128 registers in streaming-compatible mode.
1- In streaming mode, use the SVE OR/mov instruction instead of the NEON OR
   when copying a physical register (AArch64InstrInfo::copyPhysReg).
2- add testing file:
   register-mov.ll

Differential Revision: https://reviews.llvm.org/D138211
2022-11-18 11:18:30 +00:00
Dmitry Preobrazhensky 96155bf44b [AMDGPU][GFX11][NFC] Refactor VOPD operands handling (part 2)
Rename interface functions and operands to make code clearer.

Differential Revision: https://reviews.llvm.org/D138133
2022-11-18 14:15:05 +03:00
Valery Pykhtin a35ba2a256 [AMDGPU] Fix PreRARematStage::sinkTriviallyRematInsts region boundary update after sinking.
The first boundary of a region wasn't updated when a sunk instruction was added first into the region.

Reviewed By: vangthao

Differential Revision: https://reviews.llvm.org/D138256
2022-11-18 12:13:14 +01:00
wanglei bfa3551dd3 [LoongArch] Implement assembler branches pseudo instructions
These instructions always output the canonical mnemonic. The GNU tools
emit the canonical mnemonic for the branch pseudo instructions as well
(e.g. "bgt" will be recognised by the assembler but never printed by
objdump).

Reviewed By: xen0n

Differential Revision: https://reviews.llvm.org/D138100
2022-11-18 16:54:20 +08:00
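The "accepted on input, never printed" behaviour described in bfa3551dd3 can be pictured as a rewrite to the canonical mnemonic. This Python sketch is illustrative only: the commit names just `bgt`, so the other three table entries and the operand order are assumptions based on the usual compare-and-swap equivalence (`a > b` is `b < a`).

```python
# Pseudo mnemonic -> canonical mnemonic with the two registers swapped.
# Only "bgt" is named in the commit message; the rest are assumed here.
SWAPPED_CANONICAL = {
    "bgt": "blt",
    "ble": "bge",
    "bgtu": "bltu",
    "bleu": "bgeu",
}

def canonicalize(mnemonic, reg_a, reg_b, target):
    """Rewrite a branch pseudo instruction to its canonical form.

    Canonical mnemonics pass through unchanged.
    """
    if mnemonic in SWAPPED_CANONICAL:
        return (SWAPPED_CANONICAL[mnemonic], reg_b, reg_a, target)
    return (mnemonic, reg_a, reg_b, target)
```

Because the rewrite happens when the instruction is parsed, a disassembler only ever sees the canonical encoding and has nothing from which to recover the pseudo spelling.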
Chen Zheng f034c98af0 [PowerPC] mark dead def for ctr be clobber.
TLS pseudo ADDIStlsgdHA will have such def. This dead def should
also prevent PPC from generating CTR loops.
2022-11-18 06:55:42 +00:00
Han-Kuan Chen 7e6dbfcd9d [RISCV] Make lowerVECTOR_SHUFFLEAsVSlidedown follow source until not EXTRACT_SUBVECTOR.
Currently lowerVECTOR_SHUFFLEAsVSlidedown only checks whether the inputs are
EXTRACT_SUBVECTOR nodes and whether their sources are the same. This commit makes the
function follow the inputs and their sources until they are no longer
EXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D138025
2022-11-17 22:32:53 -08:00
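Following a chain of EXTRACT_SUBVECTOR nodes back to its root, as 7e6dbfcd9d describes, can be sketched as a simple loop. The Python below uses a hypothetical `Node` type rather than SelectionDAG classes, and accumulating the element offset along the way is an assumption about what a caller would want, not something stated in the commit.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    opcode: str
    source: Optional["Node"] = None  # operand the subvector is taken from
    index: int = 0                   # starting element of the extract

def peel_extracts(node):
    """Walk through nested EXTRACT_SUBVECTOR nodes.

    Returns the first non-extract source and the accumulated element offset.
    """
    offset = 0
    while node.opcode == "EXTRACT_SUBVECTOR":
        offset += node.index
        node = node.source
    return node, offset
```

Two shuffle inputs can then be compared by their peeled roots instead of their immediate sources, which lets nested extracts of the same vector be recognised as sharing a source.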
Matt Arsenault fe5b9a6a11 AMDGPU/GlobalISel: Make strict fadd, fmul and fma legal 2022-11-17 20:50:04 -08:00
Matt Arsenault ae43420f39 AMDGPU/GlobalISel: Fix not selecting modifiers for f16 fma on gfx9
VOP3OpSel wasn't trying to match any modifiers. Just try to match the
basic case, like the DAG does.
2022-11-17 18:51:45 -08:00
Alexander Shaposhnikov 7059a6c32c [IR] Split out IR printing passes into IRPrinter
This diff splits out (from LLVMCore) IR printing passes into IRPrinter.
This structure is similar to what we already have for IRReader and
enables us to avoid circular dependencies between LLVMCore and Analysis
(this is a preparation for https://reviews.llvm.org/D137768).
The legacy interface is left unchanged; once the legacy pass manager
is removed (in the future) we will be able to clean it up further.
The bazel build configuration has been updated as well.

Test plan:
1/ Tested the following cmake configurations: static/dynamic linking * lld/gold * clang/gcc
2/ bazel build --config=generic_clang @llvm-project//...

Differential revision: https://reviews.llvm.org/D138081
2022-11-18 01:47:56 +00:00
Krzysztof Parzyszek a98fc08396 [Hexagon] Add instruction definitions for Hexagon v71, v71t, and v73
This includes instruction formats, definitions, encodings, scheduling
classes, and builtins/intrinsics.

New and improved version of 76536989ba, so much so that even clang
builds with it.
2022-11-17 15:51:38 -08:00