This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
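A minimal sketch of the kind of mechanical rewrite this performs (the function and values below are hypothetical, not taken from the patch):

```cpp
#include <optional>

// Hypothetical example: after the migration, "no value" is spelled std::nullopt.
std::optional<int> findIndex(bool Found) {
  if (!Found)
    return std::nullopt; // previously written as: return None;
  return 42;
}
```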
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Reported by D138359 - the overrides matched the base class schedule definition (it's been flagged as WriteMicrocoded instead of WriteSystem, but the models define both the same)
Reported by D138359 - the overrides matched the base class schedule definition (it's been flagged as WriteMicrocoded instead of WriteSystem, but the models define both the same)
Reported by D138359 - the overrides matched the base class schedule definition (in the case of VPERMDYrr it was entirely replacing uses of WriteVarShuffle256, so that could be adjusted directly)
As stated in
https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528,
this implementation is very similar to ExpandLargeDivRem; it expands
‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions
with a bitwidth above a threshold into auto-generated functions. This is
useful for targets like x86_64 that cannot lower fp conversions with more
than 128 bits. The expanded code is based on the IR generated from
`compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`,
etc.
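For reference, here is a rough, self-contained C++ sketch of the fixdfti-style logic the expansion follows (it mirrors the general structure of the compiler-rt builtin, not the pass's actual output; the saturation threshold is chosen slightly conservatively to keep the sketch free of signed-overflow issues):

```cpp
#include <cstdint>
#include <cstring>

// Convert an IEEE-754 double to a signed 128-bit integer by decomposing the
// bit pattern into sign, exponent and significand, then shifting.
__int128 fixdfti_sketch(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));

  const int SigBits = 52; // explicit significand bits of a double
  const int Exponent = (int)((Bits >> SigBits) & 0x7FF) - 1023;
  const __int128 Sign = (Bits >> 63) ? -1 : 1;
  const uint64_t Significand =
      (Bits & ((UINT64_C(1) << SigBits) - 1)) | (UINT64_C(1) << SigBits);

  if (Exponent < 0)
    return 0; // |X| < 1 truncates to 0
  if (Exponent >= 127) {
    // Out of range (or NaN/inf): saturate.
    const __int128 Max = (__int128)(~(unsigned __int128)0 >> 1);
    return Sign > 0 ? Max : -Max - 1;
  }
  if (Exponent < SigBits)
    return Sign * (__int128)(Significand >> (SigBits - Exponent));
  return Sign * ((__int128)Significand << (Exponent - SigBits));
}
```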
Corner cases:
1. For fp16: since there are no related builtins in compiler-rt, the
implementation mainly relies on the fp32 <-> fp16 lib calls.
2. For fp80: since this pass is soft fp emulation and no fp80 instructions can
help here, I recommend that users avoid this usage. For now, the
implementation uses fp128 as the temporary conversion type and inserts
fptrunc/fpext at the top/end of the function.
3. For bf16: since the clang FE currently doesn't support bf16 arithmetic
operations (convert to int, float, +, -, *, ...), this patch doesn't consider
bf16 for now.
4. For unsigned FPToI: since both the default hardware behavior and libgcc
ignore the "returns 0 for negative input" spec, this pass follows the same old
behavior for unsigned FPToI. See this example:
https://gcc.godbolt.org/z/bnv3jqW1M
The end-to-end tests are uploaded at https://reviews.llvm.org/D138261
Reviewed By: LuoYuanke, mgehre-amd
Differential Revision: https://reviews.llvm.org/D137241
We now support the AMX-FP16 ISA (https://reviews.llvm.org/D135941).
The old intrinsic interface requires manually writing tile registers,
so we support its new intrinsic interface to enable register allocation of the tiles.
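A hedged sketch of the difference between the two interfaces; the helper names below are assumed to follow the existing AMX intrinsic naming (hard-coded tile numbers vs. `__tile1024i` values) and may not match the headers exactly:

```cpp
#include <immintrin.h>

void old_interface() {
  // Old interface: tile register numbers are written by hand, so the
  // compiler cannot perform register allocation on them.
  _tile_dpfp16ps(0, 1, 2); // assumed intrinsic name: tmm0 += tmm1 * tmm2
}

void new_interface(__tile1024i *C, __tile1024i A, __tile1024i B) {
  // New interface: tiles are ordinary values that the register allocator
  // can assign to physical tile registers.
  __tile_dpfp16ps(C, A, B); // assumed helper name
}
```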
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D138987
This was an old patch from when I was trying to improve pre-AVX scheduler support as part of D103695; we were missing port mappings entirely for these targets, although to be honest they don't map well to the SandyBridge model that they currently use.
This patch fixes crashes related to how PatchableFunction selects the instruction to make patchable:
- Ensure PatchableFunction skips all instructions that don't generate actual machine instructions.
- Handle the case where the first MachineBasicBlock is empty.
- Remove support for 16-bit x86 architectures.
Note: another issue related to PatchableFunction remains in the lowering part.
See https://github.com/llvm/llvm-project/issues/59039
Differential Revision: https://reviews.llvm.org/D137642
The probes are all inserted at the iterator passed into the functions, so
that's where any EFLAGS clobbering will happen and where we need it to be dead.
Fixes: https://github.com/llvm/llvm-project/issues/59121
The znver2 override already matched the WriteBlendY class exactly, and the znver1 override wasn't accounting for ymm double-pumping.
Found with the help of D138359
We're using lowerShuffleAsPermuteAndUnpack, which can probably be improved to handle 256/512-bit types pretty easily.
First step towards trying to address the poor vector-shuffle-sse4a.ll pre-SSSE3 codegen mentioned on D127115
If one of the AND operands is a setcc then we're implicitly zeroing the upper mask bits
Similar pattern to regressions identified in D127115 (masked comparisons)
This patch makes the code less readable, but it will be cleaned up once all functions are converted.
Differential Revision: https://reviews.llvm.org/D138665
This reduces diffs between znver1/znver2 and should marginally speed up tblgen build time (Issue #35303)
Found by adding a temporary check inside InstRegexOp::apply for single matches
I noticed an assertion error when building MIPS code that loaded from
NULL. Loading from NULL ends up being a load with maximum alignment, and
due to integer truncation the maximum alignment was interpreted as 0 and the
assertion in MipsDAGToDAGISel::Select() failed. This previously happened
to work, but the maximum alignment was increased in
df84c1fe78, so it no longer fits into a 32-bit
integer.
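A minimal standalone illustration of the truncation (the constants and names are illustrative, not the actual MIPS selection code):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // The maximum supported alignment was raised to 2^32 bytes, which no
  // longer fits in a 32-bit integer.
  uint64_t MaxAlignBytes = UINT64_C(1) << 32;
  uint32_t Truncated = static_cast<uint32_t>(MaxAlignBytes); // wraps to 0
  std::printf("align = %llu, truncated = %u\n",
              (unsigned long long)MaxAlignBytes, Truncated);
  return 0;
}
```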
Instead of just fixing the one MIPS case, this patch removes all uses of
the deprecated getAlignment() call and replaces them with getAlign().
Differential Revision: https://reviews.llvm.org/D138420
We don't provide __extendhfxf2, and only have the soft-float
__extendhfsf2 in compiler-rt. This only changed recently with
655ba9c8a1, so this patch reverts to the previous behavior.
However, the f80->f16 fptrunc is not easily implementable without
the compiler-rt __truncxfhf2, but that has always been true, and
isn't an immediate regression.
Patch by Ahmed Bougacha.
rdar://102194995
This patch is an alternative to D100091. It solves the problems in `f80` type lowering.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D137946
znver1/znver2 have barely any difference in behaviour between the AVX1/2 variants of these instructions - it looks like it was a copy+paste mistake to miss the AVX2 integer domain instructions in the overrides.
Having said that, the override numbers don't appear to match the numbers in the AMD 17h SoGs very well - for instance, vperm2f128/vperm2i128 might be microcoded in the AMD sense of >3 uops, but they don't have a 100cy latency. These will need to be addressed further.
Before this patch, the code enumerated `getCondFromBranch`, `getCondFromSETCC` and `getCondFromCMov` to get the condition code of a `MachineInstr`, and assigned the result to the variable `OldCC` when `MI || IsSwapped || ImmDelta != 0` was satisfied.
After this patch, the `if-else` structure is eliminated by using `getCondFromMI`. Since `OldCC` is only used when `MI || IsSwapped || ImmDelta != 0` is true, it is now initialized with `getCondFromMI` directly, outside the scope of the `if`.
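A self-contained sketch of the shape of the change, using simplified stand-ins rather than the real X86 helpers, just to show how a single accessor replaces the enumeration:

```cpp
enum CondCode { COND_INVALID, COND_E, COND_NE };

struct Instr {
  bool IsBranch = false, IsSetCC = false, IsCMov = false;
  CondCode CC = COND_INVALID;
};

// Stand-ins for the three per-kind accessors.
static CondCode getCondFromBranch(const Instr &I) { return I.IsBranch ? I.CC : COND_INVALID; }
static CondCode getCondFromSETCC(const Instr &I) { return I.IsSetCC ? I.CC : COND_INVALID; }
static CondCode getCondFromCMov(const Instr &I) { return I.IsCMov ? I.CC : COND_INVALID; }

// After the patch, callers use a single helper shaped like this one and no
// longer need the if-else chain guarded by `MI || IsSwapped || ImmDelta != 0`.
static CondCode getCondFromMI(const Instr &I) {
  if (I.IsBranch)
    return getCondFromBranch(I);
  if (I.IsSetCC)
    return getCondFromSETCC(I);
  if (I.IsCMov)
    return getCondFromCMov(I);
  return COND_INVALID;
}
```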
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D138349
This appears to be a slowdown vs. Skylake (which the model was copied from) - confirmed with uops.info / instlatx64
Noticed because D138359 was reporting that many of the PACKS overrides were redundant, when they were in fact incorrect