Fix replaceVariableLocationOp unconditionally replacing the first operand of a
dbg.assign.
Reviewed By: jryans
Differential Revision: https://reviews.llvm.org/D138561
I noticed these were missing, so this adds Host identifiers for
cortex-a55, cortex-a510, cortex-a710 and cortex-x2, taken from their
respective TRMs.
Differential Revision: https://reviews.llvm.org/D138497
This patch introudces the OpenMPIRBuilderConfig class which contains various
flags that are needed to lower OMP constructs to LLVM-IR. The purpose is to
keep the flags in one place so they do not have to be passed in every time.
The flags can be set optionally since some uses cases don't rely on functions
that depend on these flags.
Reviewed By: jdoerfert, tschuett
Differential Revision: https://reviews.llvm.org/D138220
This patch implements the 2022 Architecture General Data-Processing Instructions
They include:
Common Short Sequence Compression (CSSC) instructions
- scalar comparison instructions
SMAX, SMIN, UMAX, UMIN (32/64 bits) with or without immediate
- ABS (absolute), CNT (count non-zero bits), CTZ (count trailing zeroes)
- command-line options for CSSC
Associated with these instructions in the documentation is the Range Prefetch
Memory (RPRFM) instruction, which signals to the memory system that data memory
accesses from a specified range of addresses are likely to occur in the near
future. The instruction lies in hint space, and is made unconditional.
Specs for the individual instructions can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09/Base-Instructions/
contributors to this patch:
- Cullen Rhodes
- Son Tuan Vu
- Mark Murray
- Tomas Matheson
- Sam Elliott
- Ties Stuij
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D138488
This patch replaces None with a custom base class in
FormatVariadicTest.cpp.
As part of the migration from llvm::Optional to std::optional, I'd
like to define None as std::nullopt, but FormatVariadicTest.cpp blocks
that.
When you specialize indexed_accessor_range with the base class being
None, the template instantiation eventually generates code to compare
two instances of None. That's not guaranteed with std::nullopt.
Replacing None with a custom base class allows me to define None as
std::nullopt.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Differential Revision: https://reviews.llvm.org/D138381
This patch replaces NoneType() and NoneType::None with None in
preparation for migration from llvm::Optional to std::optional.
In the std::optional world, we are not guranteed to be able to
default-construct std::nullopt_t or peek what's inside it, so neither
NoneType() nor NoneType::None has a corresponding expression in the
std::optional world.
Once we consistently use None, we should even be able to replace the
contents of llvm/include/llvm/ADT/None.h with something like:
using NoneType = std::nullopt_t;
inline constexpr std::nullopt_t None = std::nullopt;
to ease the migration from llvm::Optional to std::optional.
Differential Revision: https://reviews.llvm.org/D138376
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was commited in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. The change with fixes
were committed in b88b8307bf but reverted once again e8e92c8313 due to more
build errors.
This commit adds the prior changes and fixes the build error.
Differential Revision: https://reviews.llvm.org/D137440
This is useful where tests previously encoded information in the name
names of ranges and points. Currently, this is pretty limited because
names consist of only alphanumeric characters and '_'.
With this patch, we can keep the names simple and attach optional
payloads to ranges and points instead.
The new syntax should be fully backwards compatible (if I haven't missed
anything). I tested this against clangd unit tests and everything still passes.
Differential Revision: https://reviews.llvm.org/D137909
The invalid case is now represented by an empty StringRef rather than
a nullptr.
Previously ARCH_FEATURE was build from SUB_ARCH by prepending "+".
This is now reverse, so that the "+arch-feature" is now visible in
the .def, which is a bit clearer. This meant converting one StringSwitch
into a loop.
Removed getters which are now mostly unnecessary.
Removed some old FIXMEs.
Differential Revision: https://reviews.llvm.org/D138026
After 32f1c5531b, getDefiningRecipe returns a VPRecipeBase* so there's
no need to cast to VPRecipeBase.
Suggested by @Ayal during review of D136068, thanks!
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually case to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.
Also rename it to getDefiningRecipe to make the name a bit clearer.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D136068
nearbyint has the property to execute without exception.
For not modifying fflags, the patch added new machine opcode
PseudoVFROUND_NOEXCEPT_V that expands vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137928
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was commited in. It was reverted in
5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. This commit adds the original changes
and the fixed test files.
Differential Revision: https://reviews.llvm.org/D137440
NVIDIA, ARM, and Intel recently introduced two new FP8 formats, as described in the paper: https://arxiv.org/abs/2209.05433. The first of the two FP8 dtypes, E5M2, was added in https://reviews.llvm.org/D133823. This change adds the second of the two: E4M3.
There is an RFC for adding the FP8 dtypes here: https://discourse.llvm.org/t/rfc-add-apfloat-and-mlir-type-support-for-fp8-e5m2/65279. I spoke with the RFC's author, Stella, and she gave me the go ahead to implement the E4M3 type. The name of the E4M3 type in APFloat is Float8E4M3FN, as discussed in the RFC. The "FN" means only Finite and NaN values are supported.
Unlike E5M2, E4M3 has different behavior from IEEE types in regards to Inf and NaN values. There are no Inf values, and NaN is represented when the exponent and mantissa bits are all 1s. To represent these differences in APFloat, I added an enum field, fltNonfiniteBehavior, to the fltSemantics struct. The possible enum values are IEEE754 and NanOnly. Only Float8E4M3FN has the NanOnly behavior.
After this change is submitted, I plan on adding the Float8E4M3FN type to MLIR, in the same way as E5M2 was added in https://reviews.llvm.org/D133823.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D137760
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.
On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
Differential Revision: https://reviews.llvm.org/D137440
Implements the ThinLTO summary support for memprof related metadata.
This includes support for the assembly format, and for building the
summary from IR during ModuleSummaryAnalysis.
To reduce space in both the bitcode format and the in memory index,
we do 2 things:
1. We keep a single vector of all uniq stack id hashes, and record the
index into this vector in the callsite and allocation memprof
summaries.
2. When building the combined index during the LTO link, the callsite
and allocation memprof summaries are only kept on the FunctionSummary
of the prevailing copy.
Differential Revision: https://reviews.llvm.org/D135714
AArch64TargetParser reuses data structures and some data from ARMTargetParser,
which causes more problems than it solves. This change separates them.
Code which is common to ARM and AArch64 is moved to ARMTargetParserCommon
which both ARMTargetParser and AArch64TargetParser use.
Some of the information in AArch64TargetParser.def was unused or nonsensical
(CPU_ATTR, ARCH_ATTR, ARCH_FPU) because it reused data strutures from
ARMTargetParser where some of these make sense. These are removed.
Differential Revision: https://reviews.llvm.org/D137924
This patch adds a new feature flag:
sme-f16f16 to represent FEAT_SME-F16F16
This patch add the following instructions:
SME2.1 stand alone instructions:
MOVAZ (array to vector, four registers): Move and zero four ZA single-vector groups to vector registers.
(array to vector, two registers): Move and zero two ZA single-vector groups to vector registers.
(tile to vector, four registers): Move and zero four ZA tile slices to vector registers.
(tile to vector, single): Move and zero ZA tile slice to vector register.
(tile to vector, two registers): Move and zero two ZA tile slices to vector registers.
LUTI2 (Strided four registers): Lookup table read with 2-bit indexes.
(Strided two registers): Lookup table read with 2-bit indexes.
LUTI4 (Strided four registers): Lookup table read with 4-bit indexes.
(Strided two registers): Lookup table read with 4-bit indexes.
ZERO (double-vector): Zero ZA double-vector groups.
(quad-vector): Zero ZA quad-vector groups.
(single-vector): Zero ZA single-vector groups.
SME2p1 and SME-F16F16:
All instructions are half precision elements:
FADD: Floating-point add multi-vector to ZA array vector accumulators.
FSUB: Floating-point subtract multi-vector from ZA array vector accumulators.
FMLA (multiple and indexed vector): Multi-vector floating-point fused multiply-add by indexed element.
(multiple and single vector): Multi-vector floating-point fused multiply-add by vector.
(multiple vectors): Multi-vector floating-point fused multiply-add.
FMLS (multiple and indexed vector): Multi-vector floating-point fused multiply-subtract by indexed element.
(multiple and single vector): Multi-vector floating-point fused multiply-subtract by vector.
(multiple vectors): Multi-vector floating-point fused multiply-subtract.
FCVT (widening): Multi-vector floating-point convert from half-precision to single-precision (in-order).
FCVTL: Multi-vector floating-point convert from half-precision to deinterleaved single-precision.
FMOPA (non-widening): Floating-point outer product and accumulate.
FMOPS (non-widening): Floating-point outer product and subtract.
SME2p1 and B16B16:
BFADD: BFloat16 floating-point add multi-vector to ZA array vector accumulators.
BFSUB: BFloat16 floating-point subtract multi-vector from ZA array vector accumulators.
BFCLAMP: Multi-vector BFloat16 floating-point clamp to minimum/maximum number.
BFMLA (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-add by indexed element.
(multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-add by vector.
(multiple vectors): Multi-vector BFloat16 floating-point fused multiply-add.
BFMLS (multiple and indexed vector): Multi-vector BFloat16 floating-point fused multiply-subtract by indexed element.
(multiple and single vector): Multi-vector BFloat16 floating-point fused multiply-subtract by vector.
(multiple vectors): Multi-vector BFloat16 floating-point fused multiply-subtract.
BFMAX (multiple and single vector): Multi-vector BFloat16 floating-point maximum by vector.
(multiple vectors): Multi-vector BFloat16 floating-point maximum.
BFMAXNM (multiple and single vector): Multi-vector BFloat16 floating-point maximum number by vector.
(multiple vectors): Multi-vector BFloat16 floating-point maximum number.
BFMIN (multiple and single vector): Multi-vector BFloat16 floating-point minimum by vector.
(multiple vectors): Multi-vector BFloat16 floating-point minimum.
BFMINNM (multiple and single vector): Multi-vector BFloat16 floating-point minimum number by vector.
(multiple vectors): Multi-vector BFloat16 floating-point minimum number.
BFMOPA (non-widening): BFloat16 floating-point outer product and accumulate.
BFMOPS (non-widening): BFloat16 floating-point outer product and subtract.
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2022-09
Differential Revision: https://reviews.llvm.org/D137571
Underlying data may have requirements/expectations/etc. about
the run-time alignment. WritableMemoryBuffer currently uses
a 16 byte alignment, which works for many situations but not all.
Allowing a desired alignment makes it easier to reuse WritableMemoryBuffer
in situations of special alignment, and also removes a problem when
opening files with special alignment constraints. Large files generally
get mmaped, which has ~page alignment, but small files go through
WritableMemoryBuffer which has the much smaller alignment guarantee.
Differential Revision: https://reviews.llvm.org/D137820
The canonical multiarch tuples for LoongArch are defined in [the
LoongArch toolchain conventions][1] document. As the musl port is still
WIP, only the GNU triples are added for now.
The spec mentions `loongarch64-linux-gnuf64`, which is functionally the
same as the existing `loongarch64-linux-gnu` triple, only with the
floating-point ABI part explicitly spelled out. Both forms are
supported, but normalization of one into another is not implemented in
this patch, to give the ecosystem some time to experiment and discuss.
[1]: https://loongson.github.io/LoongArch-Documentation/LoongArch-toolchain-conventions-EN.html
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D135751
Weak symbols can be overridden while they're in the NeverSearched state, but
should not be able to be overridden once they've been bound by some lookup.
Historically we guaranteed this by stripping the weak flag once a symbol as
bound, causing it to appear as if it were strong. In ffe2dda29f we changed
that behavior to retain weak flags on symbols (to facilitate tracking for
dynamic re-binding during dlopen). This test checks that we still fail as
required after ffe2dda29f.
Remove `std::forward` call for `iterator_range` iterator de-reference.
It fixes formatting usage for some tricky cases, like special ranges,
which de-reference to value type.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D94769
Remove `std::forward` call for `iterator_range` iterator de-reference.
It fixes formatting usage for some tricky cases, like special ranges,
which de-reference to value type.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D94769
This is very backend specific so either belongs in Toolchains/ARM or in
ARMTargetParser. Since it is used in lldb, ARMTargetParser made more sense.
This is part of an effort to move information about ARM/AArch64 architecture
versions, extensions and CPUs into their respective TargetParsers.
Differential Revision: https://reviews.llvm.org/D137564
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
A DIAssignID attachment is debug metadata, so don't drop it.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133292
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Add method:
Instruction::mergeDIAssignID(
ArrayRef<const Instruction* > SourceInstructions)
which merges the DIAssignID metadata attachments on `SourceInstructions` and
`this` and replaces uses of the original IDs with the new shared one.
This is used when stores are merged, for example sinking stores out of a
if-diamond CFG or vectorizing contiguous stores.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D133291
Previously reverted in 41f5a0004e. Fold in
D133576 previously reverted in d29d5ffb63.
---
The Assignment Tracking debug-info feature is outlined in this RFC:
https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir
Overview
It's possible to find intrinsics linked to an instruction by looking at the
MetadataAsValue uses of the attached DIAssignID. That covers instruction ->
intrinsic(s) lookup. Add a global DIAssignID -> instruction(s) map which gives
us the ability to perform intrinsic -> instruction(s) lookup. Add plumbing to
keep the map up to date through optimisations and add utility functions
including two that perform those lookups. Finally, add a unittest.
Details
In llvm/lib/IR/LLVMContextImpl.h add AssignmentIDToInstrs which maps DIAssignID
* attachments to Instruction *s. Because the DIAssignID * is the key we can't
use a TrackingMDNodeRef for it, and therefore cannot easily update the mapping
when a temporary DIAssignID is replaced.
Temporary DIAssignID's are only used in IR parsing to deal with metadata
forward references. Update llvm/lib/AsmParser/LLParser.cpp to avoid using
temporary DIAssignID's for attachments.
In llvm/lib/IR/Metadata.cpp add Instruction::updateDIAssignIDMapping which is
called to remove or add an entry (or both) to AssignmentIDToInstrs. Call this
from Instruction::setMetadata and add a call to setMetadata in Intruction's
dtor that explicitly unsets the DIAssignID so that the mappging gets updated.
In llvm/lib/IR/DebugInfo.cpp and DebugInfo.h add utility functions:
getAssignmentInsts(const DbgAssignIntrinsic *DAI)
getAssignmentMarkers(const Instruction *Inst)
RAUW(DIAssignID *Old, DIAssignID *New)
deleteAll(Function *F)
deleteAssignmentMarkers(const Instruction *Inst)
These core utils are tested in llvm/unittests/IR/DebugInfoTest.cpp.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D132224
This was done as a test for D137302 and it makes sense to push these changes
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D137493