This adds a RISCVISD::ABSW to remember that we started with an i32
abs. Previously we used a DAG combine of (sext_inreg (abs)) to
delay emitting a freeze from type legalization in order to make
ComputeNumSignBits optimizations work on other promoted nodes.
This new approach always uses negw+max even if the result doesn't
need to be sign extended. This helps the RISCVSExtWRemoval pass
if the sext.w is in another basic block.
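For illustration, a hedged sketch (not taken from the patch or its tests) of the source-level pattern this affects:

```c
/* A 32-bit abs on RV64; per this change it is always lowered through
   RISCVISD::ABSW, i.e. a negw + max sequence, even when the result does
   not need to be sign extended. */
int abs32(int x) {
  return x < 0 ? -x : x;
}
```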
ISD::BSWAP nodes are no longer custom lowered in RISC-V, so this patch
removes the dead code for ISD::BSWAP in LowerOperation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137907
The static chain parameter is a special parameter that is not passed in the usual argument registers or stack space. For example, in the x64 System V ABI it is always passed in R10. Although the RISC-V ABI does not assign a register for this purpose, GCC has supported it on RISC-V for a long time: it is exposed via the `__builtin_call_with_static_chain` intrinsic and uses t2 for the static chain parameter. This patch also chooses t2 for compatibility.
In LLVM, static chain parameters are handled by the `nest` attribute of an argument to a function ([D6332](https://reviews.llvm.org/D6332)), so tests are added to ensure `nest` arguments are handled correctly.
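As a hedged usage sketch (function names are made up), the builtin takes the call expression and the chain pointer; with this patch the pointer travels in t2 on RISC-V:

```c
/* Hypothetical caller: 'chain' is passed outside the normal argument
   registers, in t2 on RISC-V per this patch. */
extern int callee(int);

int caller(int x, void *chain) {
  return __builtin_call_with_static_chain(callee(x), chain);
}
```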
Reviewed By: kito-cheng, MaskRay
Differential Revision: https://reviews.llvm.org/D129106
We can narrow one of the extends and keep the other original by
using a vwaddu.wv or vwadd.wv.
We were previously forgetting to keep the original operand and
instead took the source of its extend. This resulted in a type
mismatch that later failed with an impossible physical register copy.
To fix this, I've refactored some code to keep track for longer of whether
the source needs to be extended at all, so that the information is still
available in `materialize`.
Differential Revision: https://reviews.llvm.org/D137106
We can use tail undisturbed vslide1down to insert into the vector.
This should make D136640 unneeded.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136738
This avoids the call overhead as well as the save/restore of fflags and
the sNaN handling in the libm function.
The save/restore of fflags and the sNaN handling are needed for
correctness under -ftrapping-math; I think we can ignore them in the
default environment.
The inline sequence will generate an invalid exception for NaN
and an inexact exception if fractional bits are discarded.
I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.
We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.
Note the comparison constant is slightly different from the one glibc uses.
They use 1<<53 for double; I'm using 1<<52. I believe either is valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<52 and 1<<53 since
they all fit in an i64. We only have a problem if the double can't fit
in an i64.
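As a rough C sketch of the idea for the trunc case (hedged; names are made up, and the real lowering uses fcvt with a static rounding mode plus the custom-inserted control flow described above):

```c
#include <math.h>

double trunc_inline(double x) {
  if (!(fabs(x) < 0x1p52))           /* |x| >= 1<<52 (or NaN): return unchanged */
    return x;
  double r = (double)(long long)x;   /* float->int->float discards the fraction */
  return copysign(r, x);             /* the final fsgnj: restores the sign of zero */
}
```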
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136508
Instead of using vslide1up, use vslide1down and build in the other
direction. This avoids the early-clobber overlap constraint of
vslide1up.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136735
If Mask[0] is 0, then we're never going to match a slidedown. If
we get through the for loop, then it's an identity mask which should
have already been optimized out. Otherwise it's some non-contiguous
mask that will fail out of the loop. Might as well not bother entering
the loop.
sifive-7-series has macrofusion support to convert a branch over
a single instruction into a conditional instruction. This can be
an improvement if the branch is hard to predict.
This patch adds support for the most basic case, a branch over a
move instruction. This is implemented as a pseudo instruction so
we can hide the control flow until all code motion passes complete.
I've disabled a recent select optimization if this feature is enabled
in the subtarget.
Related gcc patch for the same optimization: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg211045.html
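For illustration, a hedged example (made-up function) of the shape this targets, a short forward branch over a single move:

```c
/* The 'if' becomes a branch over one register move; the patch selects this
   as a pseudo that hides the control flow until code motion passes complete. */
int pick(int cond, int a, int b) {
  int r = a;
  if (cond)
    r = b;
  return r;
}
```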
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135814
Same with (select C, X, -1), (select C, 0, X), and (select C, X, 0).
There's a DAGCombine after we turn the select into select_cc, but
that may introduce a setcc that didn't previously exist. We could
add more DAGCombines to remove the extra setcc, but this seemed lower
effort.
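In C terms (hedged; function names are made up), the newly covered shapes look roughly like:

```c
int sel_x_m1(int c, int x) { return c ? x : -1; }  /* (select C, X, -1) */
int sel_0_x(int c, int x)  { return c ? 0 : x; }   /* (select C, 0, X)  */
int sel_x_0(int c, int x)  { return c ? x : 0; }   /* (select C, X, 0)  */
```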
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D135833
Continuing the theme of adding branchless lowerings for simple selects, this time handling the 0 arm case. This is very common for various umin idioms, etc.
Differential Revision: https://reviews.llvm.org/D135600
The scalar counterpart of this is `llvm.trunc`. However, the name
ISD::VP_TRUNC is already taken by the LLVM IR `trunc` instruction, and
naming this `vp.ftrunc` would likely cause confusion with `vp.fptrunc`.
So this adds `vp.roundtozero`, which will look similar to `vp.roundeven`.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D135233
We can lower these as an or with the negative of the condition value. This appears to result in significantly less branch-y code on multiple common idioms (as seen in tests).
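A hedged scalar illustration of the trick, assuming a (select c, -1, x) shape and a condition already known to be 0 or 1 (neither is spelled out above):

```c
/* select(c, -1, x) == x | -c: -c is all ones when c is 1 and zero when c is 0. */
int sel_allones(_Bool c, int x) {
  return x | -(int)c;
}
```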
Differential Revision: https://reviews.llvm.org/D135316
This patch allows the combines that fold extensions into binary operations
to handle extensions with more than one use.
The approach here is pretty conservative: if all the users of an
extension can fold the extension, then the folding is done, otherwise we
don't fold.
This is the first step towards avoiding the one-use limitation.
As a result, we make a fold/don't-fold decision for a whole web of
instructions. An instruction is part of the web of instructions as soon
as it consumes an extension that needs to be folded for all its users.
Because of how SDISel works, a web of instructions can be visited over
and over. More precisely, if the folding happens, it happens for the
whole web and that's the end of it, but if the folding fails, the whole
web may be revisited when another member of the web is visited.
To avoid a compile time explosion in pathological cases, we bail out
earlier for webs that are bigger than a given threshold (arbitrarily set
at 18 for now). This size can be changed using
`--riscv-lower-ext-max-web-size=<maxWebSize>`.
At the current time, I didn't see a better scheme for that, assuming we
want to stick with doing this in SDISel.
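A hedged fixed-vector illustration using GNU C vector extensions (types and names are made up): one extension with two users, both of which must be able to fold it before the web is folded:

```c
typedef short v4i16 __attribute__((vector_size(8)));
typedef int   v4i32 __attribute__((vector_size(16)));

v4i32 two_users(v4i16 a, v4i16 b, v4i16 c) {
  v4i32 wa = __builtin_convertvector(a, v4i32);         /* one sext, two users */
  v4i32 add = __builtin_convertvector(b, v4i32) + wa;   /* widening-add candidate */
  v4i32 sub = wa - __builtin_convertvector(c, v4i32);   /* widening-sub candidate */
  return add + sub;
}
```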
Differential Revision: https://reviews.llvm.org/D133739
This patch centralizes all the combines of add|sub|mul with extended
operands in one "framework".
The rationale for this change is to offer a one-stop-shop for all these
transformations so that, in the future, it is easier to make combine
decisions for a web of instructions (i.e., instructions connected
through s|zext operands).
Technically this patch is not NFC because the new version is more
powerful than the previous version.
In particular, it diverges in two cases:
- VWMULSU can now also be produced from `mul(splat, zext)`, whereas
  previously only `mul(sext, splat)` was supported when `splat`s were
  involved. (As demonstrated in rvv/fixed-vectors-vwmulsu.ll; see the
  sketch after this list.)
- VWSUB(U) can now also be produced from `sub(splat, ext)`, whereas
  previously only `sub(ext, splat)` was supported when `splat`s were
  involved. (As demonstrated in rvv/fixed-vectors-vwsub.ll)
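For the first case, a hedged GNU C vector sketch of the `mul(splat, zext)` shape (types and names are made up; whether VWMULSU is actually formed depends on the usual legality checks):

```c
typedef unsigned short v4u16 __attribute__((vector_size(8)));
typedef int v4i32 __attribute__((vector_size(16)));

v4i32 splat_times_zext(v4u16 a, int s) {
  v4i32 splat = { s, s, s, s };                       /* splat of a scalar */
  return splat * __builtin_convertvector(a, v4i32);   /* mul(splat, zext)  */
}
```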
If we wanted, we could block these transformations to make this
patch really NFC. For instance, we could do something similar to
`AllowSplatInVW_W`, which prevents the combines from forming vw(add|sub)(u)_w
when the RHS is a splat.
Regarding the "framework" itself, the bulk of the patch is some
boilerplate code that abstracts away the actual extensions that are
present in the DAG. This allows us to handle `vwadd_w(ext a, b)` as if
it was a regular `add(ext a, ext b)`. Since the node `ext b` doesn't
actually exist in the DAG, we have a bunch of methods (all in the
NodeExtensionHelper class) that fake all that for us.
The other half of the change is around `CombineToTry` and
`CombineResult`. These helper structures respectively:
- Represent the kind of combines that can be applied to a node, and
- Store what needs to happen to do that combine.
This can be viewed as a two-step approach:
- First, check if a pattern applies, and
- Second, apply it.
The checks and the materialization of the combines are decoupled so that
in the future we can perform several checks and do all the related
applies in one go.
Differential Revision: https://reviews.llvm.org/D134703
This information is not preserved in MIR today, so this patch adds it to
RISCVMachineFunctionInfo when the vreg is created for the argument.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D134621
We want to emit a masked setcc that preserves zeros in all of the bits
where the original mask is zero. To do this we need to pass the original
mask as the passthru operand as well. Otherwise, we'll use the mask agnostic
policy and replace the zeros with 1s on some CPUs.
Differential Revision: https://reviews.llvm.org/D135122
These transforms were recently added (by me) in D134881. Looking at the code again, I realized we don't need the (and x, 0x1) portion of the pattern; we just need to know that the result of that sub-tree is either 0 or 1. Checking for this directly allows us to match slightly more broadly. The test changes are zext i1 arguments, but this could also kick in for e.g. shifts of high bits, or any other source of known bits.
Differential Revision: https://reviews.llvm.org/D135081
We have a very common pattern of dispatching between BUILD_VECTOR and SPLAT_VECTOR creation repeated in many places in the code. Common the pattern into a utility function.
VMV_V_X_VL nodes should always have a passthru, a splat, and a VL.
We were sometimes missing the VL.
This went unnoticed because these cases were all selected into the
following node to form a .vx or .vi instruction. The ComplexPattern
that does this doesn't check the VL operand. I've added an assert
to the ComplexPattern to catch if the operand is missing.
@qcolombet spotted some of these in D134703.
This code is directly ported from the X86 backend which applies the same rewrite (along with several others). Planning on looking more closely at the other branchless variants from x86 to see if any are worth porting in future changes.
Motivation here is the coremark crc8 routine from https://github.com/eembc/coremark/blob/main/core_util.c#L165. This patch significantly reduces the number of unpredictable branches in the workload.
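For context, a hedged and heavily simplified rendition of the kind of data-dependent branch in that routine (not the exact CoreMark source):

```c
/* The carry decides whether the top bit is set or cleared each iteration,
   which is the sort of unpredictable branch the message above refers to. */
unsigned short crc_step(unsigned short crc, unsigned carry) {
  crc >>= 1;
  if (carry)
    crc |= 0x8000;
  else
    crc &= 0x7fff;
  return crc;
}
```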
Differential Revision: https://reviews.llvm.org/D134881
Using this helper makes working with neutral elements easier. Although I only
found one case for now, I think it will see more use, since so many
combines involve neutral elements.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D133866
Previous commit 8b00b24f85 missed adding the `int_ceil` anchor for the
llvm.ceil.* section in LangRef.rst.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D134586
Add vp.maxnum and vp.minnum, which are the vector-predicated intrinsics for
llvm.maxnum and llvm.minnum.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D134639