When there is not GRS or MOVIH/ORI instruction, we can not get the address of
ConstantPool entry directly. So we need put the address into ConstantPool to leverage CSKY::LRW instruction.
This appears to be a slow down vs Skylake (which the model was copied off) - confirmed with uops.info / instlatx64
Noticed as D138359 was reporting that many of the PACKS overrides were redundant, but were in fact incorrect
This is a bit annoying, but there are still users out there that got
broken by this (this time it was numba). We need to keep some barebones
support around until non-opaque pointers are completely gone.
Reported by D138359 - they were being overridden as WriteMicrocoded despite already being declared WriteMicrocoded
It also fixes a rather funny instregex mismatch that was matching the movsldup shuffle by mistake
D138359 was reporting that this override was superfluous, but it had never been setup - I took the numbers from uops.info (I couldn't find an estimate in Intel docs).
D138359 was reporting that the EXTRACTPSrr override was unnecessary, however the AMD SoG and Agner both confirm that both the rr and rm versions take 2uops (matching znver1)
This option was enabled in D128582, and whilst it seems to be a net
improvement in many cases, at least a couple of issues have been
reported from D135957 and from the CSE added to the backend causing more
instructions in executed blocks. Revert for the time being, until we can
improve the precision.
This patch adds tail call support to the LoongArch backend. When
appropriate, use the `b` or `jr` instruction for tail calls (the
`pcalau12i+jirl` instruction pair when use medium codemodel).
This patch also modifies the inappropriate operand name:
simm26_bl -> simm26_symbol
This has been modeled after RISCV's tail call opt.
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D137889
When all registers have been allocated and CFR needs to be saved on the
stack, an emergency spill slot is required. Because CFR's spill and
reload require a general purpose register to transfer.
The attached test case was bugpoint-reduced down from
`MultiSource/Benchmarks/mafft/Lalignmm.c` in the test-suite.
Without this patch, llc will crash and report the following errors:
```
LLVM ERROR: Error while trying to spill R4 from class GPR: Cannot scavenge register without an emergency spill slot!
```
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D138007
If we know the exact value of VLEN, the frame offset adjustment for scalable stack slots becomes a fixed constant. This avoids the need to read vlenb, and may allow the offset to be folded into the immediate field of an add/sub.
We could go further here, and fold the offset into a single larger frame adjustment - instead of having a separate scalable adjustment step - but that requires a bit more code reorganization. I may (or may not) return to that in a future patch.
Differential Revision: https://reviews.llvm.org/D137593
When we have a precisely known VLEN, we can replace runtime usage of VLENB with compile time constants. This converts offsets involving both fixed and scalable components into fixed offsets. The result is that we avoid the csr read of vlenb, and can often fold the multiply as well.
Differential Revision: https://reviews.llvm.org/D137591
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was commited in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. The change with fixes
were committed in b88b8307bf but reverted once again e8e92c8313 due to more
build errors.
This commit adds the prior changes and fixes the build error.
Differential Revision: https://reviews.llvm.org/D137440
When selectVOP3PMadMixModsImpl fails, it can still create new copy instr
via selectVOP3ModsImpl. When selectG_FMA_FMAD gives up, new copy instr
will remain dead but will not be automatically removed.
InstructionSelect does not check if instructions created during selection
are dead.
Such dead copy doesn't have register class on dst operand and causes crash.
Fix is to build copy when operands are being added to selected instruction.
Differential Revision: https://reviews.llvm.org/D138044
Before D114230, indirect moves used regular MOV opcodes and were
identified by having an implicit use of M0. Since D114230 they use
dedicated opcodes instead, so remove some old code that checks for
implicit uses of M0. NFCI.
Differential Revision: https://reviews.llvm.org/D138308
While get_active_lane_mask lowering it uses WHILELO instruction,
but forconstant range suitable for PTRUE then we could issue PTRUE instruction
instead.
Differential Revision: https://reviews.llvm.org/D137547
The zip/uzp (2-vector) instruction classes have the incorrect
register constraints and mark the destination as also being an
input. However, the instructions are fully destructive so I've
restructured the classes.
Differential Revision: https://reviews.llvm.org/D138288
1- in streaming mode, use SVE OR/mov instruction instead of NEON OR,
during copying phyReg -AArch64InstrInfo::copyPhysReg-.
2- add testing file:
register-mov.ll
Differential Revision: https://reviews.llvm.org/D138211
First boundary of a region wasn't updated when a sinked instruction was added first into the region.
Reviewed By: vangthao
Differential Revision: https://reviews.llvm.org/D138256
These instructions always output the canonical mnemonic. The GNU tools
emit the canonical mnemonic for the branch pseudo instructions as well
(e.g. "bgt" will be recognised by the assembler but never printed by
objdump).
Reviewed By: xen0n
Differential Revision: https://reviews.llvm.org/D138100
Current lowerVECTOR_SHUFFLEAsVSlidedown only seeks whether input are
EXTRACT_SUBVECTOR and their source are same. The commit will make the
function seek input and their source until they are not
EXTRACT_SUBVECTOR.
Differential Revision: https://reviews.llvm.org/D138025
This diff splits out (from LLVMCore) IR printing passes into IRPrinter.
This structure is similar to what we already have for IRReader and
enables us to avoid circular dependencies between LLVMCore and Analysis
(this is a preparation for https://reviews.llvm.org/D137768).
The legacy interface is left unchanged, once the legacy pass manager
is removed (in the future) we will be able to clean it up further.
The bazel build configuration has been updated as well.
Test plan:
1/ Tested the following cmake configurations: static/dynamic linking * lld/gold * clang/gcc
2/ bazel build --config=generic_clang @llvm-project//...
Differential revision: https://reviews.llvm.org/D138081
This includes instruction formats, definitions, encodings, scheduling
classes, and builtins/intrinsics.
New and improved version of 76536989ba, so much so that even clang
builds with it.