Commit Graph

843 Commits

Author SHA1 Message Date
jacquesguan 0fe5f03eeb [RISCV][NFC] Use nested namespace definations.
Since we use C++17 now, we could use nested namespace definations to simplify code.

Differential Revision: https://reviews.llvm.org/D131751
2022-08-13 09:56:59 +08:00
Alex Bradbury 47b1f8362a [RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls
Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the calleer.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).

This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard float ABIs. Soft-float ABIs are left for a follow on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.

Differential Revision: https://reviews.llvm.org/D131087
2022-08-10 10:50:29 +01:00
Nikita Popov f5ed0cb217 [RISCV] Add target feature to force-enable atomics
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.

This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code using
+forced-atomics and -forced-atomics are not ABI compatible if atomic
variables cross the ABI boundary.

For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter requires a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.

This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.

Differential Revision: https://reviews.llvm.org/D130621
2022-08-09 16:04:46 +02:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
Kazu Hirata a2d4501718 [llvm] Fix comment typos (NFC) 2022-08-07 00:16:14 -07:00
Craig Topper 12a1ca9c42 [RISCV] Relax another one use restriction in performSRACombine.
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.

After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).

This pattern occurs if an sra with 32 is used as index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
2022-08-04 14:32:31 -07:00
Craig Topper a2de12c987 [RISCV] Relax a one use restriction performSRACombine
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
ignore the use count on the (shl X, 32).

The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
2022-08-04 11:25:08 -07:00
Craig Topper 53d560b22f [RISCV] Prevent infinite loop after D129980.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.

To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.

Fixes PR56905.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D131113
2022-08-03 15:19:07 -07:00
David Truby 9a976f3661 [llvm] Always use TargetConstant for FP_ROUND ISD Nodes
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.

This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D130370
2022-08-03 14:02:11 +01:00
Alex Bradbury 28f12a09ae [RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.

Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.

Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.

Differential Revision: https://reviews.llvm.org/D130191
2022-08-03 13:41:58 +01:00
Fraser Cormack 646e2f4803 [VP] Rename VP int<->float conversion ISD opcodes
These should be named like the non-VP versions for consistency.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130967
2022-08-03 10:04:38 +01:00
wanglian e208bab55f [RISCV][NFC] Use defined variable instead some code.
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D130687
2022-08-02 16:26:33 +08:00
Lorenzo Albano 71b7c03fd6 [RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D121113
2022-08-01 09:23:45 -07:00
Craig Topper d21b315360 [RISCV] Remove vmerges from vector ceil, floor, trunc lowering.
Use masked operations to suppress spurious exception bits being
set in fflags. Unfortunately, doing this adds extra copies.
2022-07-30 10:58:41 -07:00
Craig Topper a23f07fb1d [RISCV] Add merge operands to more RISCVISD::*_VL opcodes.
This adds a merge operand to all of the binary _VL nodes. Including
integer and widening. They all share multiclasses in tablegen
so doing them all at once was easiest.

I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.

This does reduce the isel table size by about 25k so that's nice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130816
2022-07-30 10:26:38 -07:00
Craig Topper 9bf305fe2b [RISCV] Swap the merge and mask operand order for VRGATHER*_VL and FCOPYSIGN_VL nodes.
Based on review feedback from D130816.
2022-07-30 09:57:05 -07:00
Craig Topper 2750873dfe [RISCV] Update lowerFROUND to use masked instructions.
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsic so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support I've extend our FCOPYSIGN_VL node to support a passthru
operand. Similar to what was done for VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Reviewed By: reames, frasercrmck

Differential Revision: https://reviews.llvm.org/D130659
2022-07-28 10:05:19 -07:00
Craig Topper 89173dee71 [RISCV] Remove duplicate code. NFC
The same operations are part of `FloatingPointVecReduceOps` a little
bit earlier.
2022-07-28 10:05:19 -07:00
Craig Topper 1d1d8d6025 [RISCV] Reorder code in lowerFROUND to make the diff in D130659 cleaner. NFC 2022-07-27 17:13:04 -07:00
Craig Topper 98647330bf [RISCV] Add merge operand to RISCVISD::FCOPYSIGN_VL.
Similar to what was done for VRGATHER*_VL recently.

This will be used in D130659.
2022-07-27 15:25:34 -07:00
LiaoChunyu bf4f9a468a [RISCV]Enable isIntDivCheap when attribute is minsize
Don't expand divisions by constants when attribute is minsize.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D130543
2022-07-27 18:22:51 +08:00
Craig Topper 45944e7cf4 [RISCV] Refactor translateSetCCForBranch to prepare for D130508. NFC.
D130508 handles more constants than just 1 or -1. We need to extract
the constant instead of relying isOneConstant or isAllOnesConstant.
2022-07-25 15:54:54 -07:00
jacquesguan d8800ead62 [RISCV] Scalarize binop followed by extractelement.
This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.

Differential Revision: https://reviews.llvm.org/D129545
2022-07-25 17:23:31 +08:00
Craig Topper 9adc00a9d0 [RISCV] Add a continue to reduce nesting. NFC 2022-07-23 17:36:12 -07:00
Kazu Hirata 1cc7f5bede Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2022-07-23 09:22:27 -07:00
Craig Topper add17fc8e4 [RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, fale)
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.

We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
2022-07-20 22:32:11 -07:00
Craig Topper 7dda6c71b1 [RISCV] Refactor the common combines for SELECT_CC and BR_CC into a helper function.
The only difference between the combines were the calls to getNode
that include the true/false values for SELECT_CC or the chain
and branch target for BR_CC.

Wrap the rest of the code into a helper that reads LHS, RHS, and
CC and outputs new values and a bool if a new node needs to be
created.
2022-07-20 21:18:07 -07:00
Craig Topper 8983db15a3 [RISCV] Optimize (brcond (seteq (and X, 1 << C), 0))
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.

I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.

I've tested bit 10, 11, 31, 32, 63 and a couple bits betwen 11 and 31
and between 32 and 63 for both i32 and i64 where applicable. Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130203
2022-07-20 18:40:49 -07:00
ksyx 3198364e6e [RISCV][Clang] Add support for Zmmul extension
This patch implements recently ratified extension Zmmul, a subextension
of M (Integer Multiplication and Division) consisting only
multiplication part of it.

Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
2022-07-18 20:26:08 -04:00
Craig Topper 0b02752899 [RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1)
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we
know the result is being used by a compare, we can use a sext_inreg
instead of an AND if we also modify C1 to have 33 sign bits instead
of 32 leading zeros. This can also improve the generated code for
materializing C1.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D129980
2022-07-18 10:54:45 -07:00
Simon Pilgrim 259c36e7c1 [DAG] Add asserts to isDesirableToCommuteWithShift overrides to ensure its being called from a shift. NFC. 2022-07-18 13:11:24 +01:00
jacquesguan 2b11174079 [RISCV][NFC] Use more Arrayref in TargetLowering functions.
This patch replaces some foreach with Arrayref, and abstract some same literal array with a variable.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125656
2022-07-18 03:33:45 +00:00
Fangrui Song d955497112 [RISCV] Simplify lowerGlobalAddress. NFC 2022-07-17 15:42:45 -07:00
Craig Topper decf385c27 [RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR.
We were only handling AND before, but SimplifyDemandedBits can
also call it for OR and XOR.
2022-07-17 12:36:33 -07:00
Craig Topper 257755530a [RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32).
The former pattern will select as slliw+sraiw while the latter
will select as slli+srai. This can enable the slli+srai to be
compressed.

Differential Revision: https://reviews.llvm.org/D129688
2022-07-13 14:34:17 -07:00
Philip Reames dde2a7fb6d [RISCV] Exploit fact that vscale is always power of two to replace urem sequence
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.

vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.)

It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.

Differential Revision: https://reviews.llvm.org/D129609
2022-07-13 10:54:47 -07:00
Craig Topper c5be6a8308 [RISCV] Use X0 in place of VLMaxSentinel in lowering.
I thought I had already fixed all of these, but I guess I missed one.
2022-07-11 23:29:04 -07:00
Craig Topper c3c17b1695 [RISCV] Use MVT for the argument to getMaskTypeFor. NFC
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
2022-07-11 15:14:44 -07:00
Craig Topper 1a2bd44b77 [RISCV] Make shouldConvertConstantLoadToIntImm return true unless enableUnalignedScalarMem is true.
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.

To fix this correctly, we need to consider alignment of the load
we'd be replacing, but that's not possible in the current interface.
2022-07-11 09:40:08 -07:00
LiaoChunyu 3f68f0f816 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcode:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-07-11 14:10:27 +08:00
Craig Topper 35ec8a423d [RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools.
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immmediate.

I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the commpiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.

Test file was copied from AArch64/ARM and has not been commited yet.
Will post separate review for that.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D129402
2022-07-10 14:10:17 -07:00
Lian Wang 9cfb28d672 [RISCV] Change VECTOR_SPLICE mask operation from expand to promote
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128717
2022-07-08 06:20:22 +00:00
Diego Caballero bf1758c3dc Revert "[RISCV] Optimize 2x SELECT for floating-point types"
This reverts commit 1178992c72.
2022-07-07 22:54:00 +00:00
Craig Topper 51d672946e [RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C)
Similar for a subtract with a constant left hand side.

(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine
for (sext (add (trunc X to i32), 32) to i32).

For RISCV, we should lower this as addiw which means turning it into
(sext_inreg (add X, C1)).

There is an existing DAG combine to convert back to (sext (add (trunc X
to i32), 32) to i32), but it requires isTruncateFree to return true
and for i32 to be a legal type as it used sign_extend and truncate
nodes. So that doesn't work for RISCV.

If the outer sra happens be used by a shl by constant, it will be
folded and the shift amount of the sra will be changed before we
can do our own DAG combine. This requires us to match the more
general pattern and restore the shl.

I had wanted to do this as a separate (add (shl X, 32), C1<<32) ->
(shl (add X, C1), 32) combine, but that hit an infinite loop for some
values of C1.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128869
2022-06-30 09:01:24 -07:00
Craig Topper 9ace5af049 [RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C).
The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.

This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128843
2022-06-30 09:01:24 -07:00
Philip Reames 860c62f53c [RISCV] Refine known bits for READ_VLENB
This implements known bits for READ_VALUE using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two.

The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too.

Differential Revision: https://reviews.llvm.org/D128758
2022-06-28 15:42:14 -07:00
Lian Wang 96ab083622 [RISCV] Support VECTOR_REVERSE mask operation.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128627
2022-06-28 07:48:51 +00:00
LiaoChunyu 1178992c72 [RISCV] Optimize 2x SELECT for floating-point types
Including the following opcode:
 Select_FPR16_Using_CC_GPR
 Select_FPR32_Using_CC_GPR
 Select_FPR64_Using_CC_GPR

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D127871
2022-06-28 12:02:05 +08:00
Craig Topper ea1b861278 [RISCV] Fix misleading formatting and remove a dead getNode call. NFC 2022-06-27 18:49:57 -07:00
Philip Reames 0533b6e2f6 [RISCV] Remove a use of getMinVLen in favor of getRealMinVLen
The later is possibly greater than the former, and thus the assert was overly strong when a wider VLEN was set at the command line.
2022-06-27 12:52:24 -07:00
Philip Reames a0443dd47c [RISCV] Simplify 16 bit index handling in lowerVECTOR_REVERSE [nfc]
getRealMaxVLen returns an upper bound on the value of VLEN.  We can use this upper bound (which unless explicitly set at command line is going to result in a e8 MaxVLMax of much greater than 256) instead of explicitly handling the unknown case separately from the bounded by number greater than 256 case.

Note as well that this code already implicitly depends on a capped value for VLEN.  If infinite VLEN were possible, than 16 bit indices wouldn't be enough.
2022-06-24 13:08:39 -07:00
Philip Reames f1e1c3ce77 [RISCV] Replace two calls to getMinRVVVectorSizeInBits in fixed length lowering [nfc]
Both of these are only reached if useRVVForFixedLengthVectors is true.  Given that, we know that getRealMinVLen() == getMinRVVVectorSizeInBits().
2022-06-24 13:00:57 -07:00
Craig Topper c579ab53bd [RISCV] Move vfma_vl+fneg_vl matching to DAG combine.
This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with
different portions negated. It also adds a DAG combine to peek
through FNEG_VL to create these new opcodes.

This is modeled after similar code from X86.

This makes the isel patterns more regular and reduces the size of
the isel table by ~37K.

The test changes look like regressions, but they point to a bug that
was already there. We aren't able to commute a masked FMA instruction
to improve register allocation because we always use a mask undisturbed
policy. Prior to this patch we matched two multiply operands in a
different order and hid this issue for these test cases, but a different
test still could have encountered it.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128310
2022-06-24 00:00:37 -07:00
Craig Topper 8b10ffabae [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D128286
2022-06-23 08:49:18 -07:00
Craig Topper f912d21e67 [RISCV] Add RISCVISD opcodes for the rest of get*Addr.
This adds RISCVISD opccodes for LA, LA_TLS_IE, and LA_TLS_GD to
remove creation of MachineSDNodes form get*Addr. This makes the
code consistent with the previous patches that added RISCVISD::HI,
ADD_LO, LLA, and TPREL_ADD.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D128325
2022-06-22 09:21:07 -07:00
Craig Topper 0efbf5bfbb [RISCV] Move the passthru operand for RISCVISD::VRGATHER*_VL nodes. NFC
Put it before the VL instead of as the first operand. I want to add
passthru to more operands, but the commutable ones like VADD_VL
require the commutable operands to be operand 0 and 1. So we can't
have the passthru as operand 0 for those.
2022-06-21 14:01:02 -07:00
Craig Topper e01353f816 [RISCV] Add RISCVISD opcode for PseudoAddTPRel.
Use it along with RISCVISD::HI and ADD_LO to avoid emitting
MachineSDNodes during lowering.
2022-06-20 20:56:52 -07:00
Kazu Hirata 0916d96d12 Don't use Optional::hasValue (NFC) 2022-06-20 20:17:57 -07:00
Craig Topper 16d3a82de5 [RISCV] Add merge operand to RISCVISD::VRGATHER*_VL nodes.
Use it in place of VSELECT_VL+VRGATHER*_VL.

This simplifies the isel patterns.

Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peepole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D128023
2022-06-20 18:58:24 -07:00
Kazu Hirata e0e687a615 [llvm] Don't use Optional::hasValue (NFC) 2022-06-20 10:38:12 -07:00
Craig Topper 545a71c0d6 [RISCV] Pre-promote v1i1/v2i1/v4i1->i1/i2/i4 bitcasts before type legalization
Type legalization will convert the bitcast into a vector store and
scalar load.

Instead this patch widens the vector to v8i1 with undef, and bitcasts
it to i8. v8i1->i8 has custom handling for type legalization already to
bitcast to a v1i8 vector and use an extract_element.

The code here was lifted from X86's avx512 support.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128099
2022-06-18 11:06:45 -07:00
Craig Topper cbf6737cc4 [RISCV] Use RVVBitsPerBlock instead of hardcoding multiples of 64. NFC 2022-06-17 14:10:39 -07:00
Craig Topper 9d7b01dc95 [RISCV] Implement RISCVTargetLowering::getTargetConstantFromLoad.
This allows computeKnownBits to see the constant being loaded.

This recovers the rv64zbp test case changes from D127520.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127679
2022-06-16 15:11:18 -07:00
Craig Topper 5afdceb82b [RISCV] Add RISCVISD opcode for PseudoLLA.
Rather than emitting a MachineSDNode from lowering. Let isel match it.

This is consistent with the RISCVISD::HI and ADD_LO nodes that were
also added. Having them both the same will make D127679 consistent.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127714
2022-06-16 15:11:03 -07:00
Craig Topper 4191de262f [RISCV] Don't emit LUI/ADDI MachineSDNodes from getAddr
Instead add RISCVISD opcodes that will be selected to LUI/ADDI
during isel.

I'm looking into maybe moving doPeepholeLoadStoreADDI into isel.
Having the ADDI as a RISCVISD node will make it visible to isel.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127713
2022-06-16 14:56:07 -07:00
Craig Topper e4062522d3 [RISCV] Disable matchSplatAsGather for i1 vectors to prevent creating illegal nodes.
We were incorrectly creating a VRGATHER node with i1 vector type. We
could support this by promoting the mask to i8 and truncating it, but
for now I want to prevent the crash.

Fixes PR56007.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127681
2022-06-13 13:41:39 -07:00
Craig Topper cef03e3dcd [RISCV] Move creation of constant pools from isel to lowering.
This simplifies the isel code by removing the manual load creation.
It also improves our ability to use 0 strided loads for vector splats.

There is an assumption here that Mask and ShiftedMask constants are
cheap enough that they don't become constant pool loads so that our
isel optimizations involving And still work. I believe those constants
are 3 instructions in the worst case.

The rv64zbp-intrinsic.ll changes is a regression caused by intrinsics
being expanded to RISCVISD also occuring during lowering. So the optimizations
were only happening during the last DAGCombine, which can't see through the
load. I believe we can fix this test by implementing
TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic
to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not
a ratified extension, I don't view these as blocking this patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D127520
2022-06-13 09:07:57 -07:00
Craig Topper e91051184c [RISCV] Mark FSIN and other math functions as Expand for scalable vectors.
This prevents them from being assumed legal by the cost model.

This matches what is done for AArch64 SVE.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123799
2022-06-10 08:40:07 -07:00
Shao-Ce SUN 862f30a428 [RISCV] Add ISD::EH_DWARF_CFA
Based on D24038.
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D126181
2022-06-08 22:03:30 +08:00
Craig Topper aeb27f133a [RISCV] Fix i64<->f64 and i32<->f32 bitcasts with VLS vectors enabled.
We enable a custom handler to optimize conversions between scalars
and fixed vectors. Unfortunately, the custom handler picks up scalar
to scalar conversions as well. If the scalar types are both legal,
we wouldn't match any of the fixed vector cases and would return SDValue()
causing the LegalizeDAG to expand the bitcast through memory.

This patch fixes this by checking if it's a scalar to scalar conversion
and returns `Op` if both types are legal.

Differential Revision: https://reviews.llvm.org/D126739
2022-06-01 08:13:49 -07:00
Craig Topper b09e54541a [RISCV] Use template version of SignExtend64 for constant extends. NFC
We were inconsistent about which one we used.
2022-05-27 13:11:15 -07:00
Craig Topper d0f65eaa85 [RISCV] Remove unused variables. NFC 2022-05-27 12:13:45 -07:00
Craig Topper aaad507546 [RISCV] Return false from isOffsetFoldingLegal instead of reversing the fold in lowering.
When lowering GlobalAddressNodes, we were removing a non-zero offset and
creating a separate ADD.

It already comes out of SelectionDAGBuilder with a separate ADD. The
ADD was being removed by DAGCombiner.

This patch disables the DAG combine so we don't have to reverse it.
Test changes all look to be instruction order changes. Probably due
to different DAG node ordering.

Differential Revision: https://reviews.llvm.org/D126558
2022-05-27 11:05:18 -07:00
Philip Reames 8a3b6ba756 [RISCV] Add a subtarget feature to enable unaligned scalar loads and stores
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.

There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to actually been standardized, and b) isn't quite what we want here anyway due to the perf comment.

Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.

Differential Revision: https://reviews.llvm.org/D126085
2022-05-26 15:25:47 -07:00
Craig Topper e9ac99b609 [RISCV] Simplfy creation of IndexVT in lowerMaskedGather/lowerMaskedScatter. NFC
The scalar element width is not a factor in how ContainerVT is
determined. We don't need to check the relative size of VT and
IndexVT.
2022-05-26 13:13:32 -07:00
jacquesguan b271488e8b [RISCV] Replace ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op.
This patch tries to solve the incoordination between the direct and intermediate  cast caused by D123975.
This patch replaces ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op in the lowering of FP scalable vector direct cast to unify with the intermediate cast.
And it also changes the FP widenning pattern with the VL op.

Differential Revision: https://reviews.llvm.org/D125364
2022-05-26 02:17:31 +00:00
Craig Topper 172149e98c [RISCV] Preserve fast math flags in lowerVPOp.
Update test to check MIR after finalize-isel instead of debug output.

This is of course not the only place we should preserve FMF, but
it's the most obvious one.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D126306
2022-05-25 09:16:07 -07:00
Paul Walker 258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
Paul Walker 7dd05ba9ed [SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes.
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:

* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)

Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.

I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicitly looks better.

Depends On D123347

Differential Revision: https://reviews.llvm.org/D123381
2022-05-16 20:47:52 +01:00
jacquesguan a8426ada49 [RISCV][NFC] Replace for-each with array argument call.
This patch replaces some for-each set with the new arrayref argument API, since it already used an array in defination, I think this change won't cause any ambiguity.

Differential Revision: https://reviews.llvm.org/D125455
2022-05-16 02:12:48 +00:00
Craig Topper 5a19fbad83 [RISCV] Remove unneeded check for ISD::VSCALE operand being a constant. NFC
ISD::VSCALE only allows constant operands.
2022-05-14 13:45:03 -07:00
Roger Ferrer Ibanez 189ca6958e [RISCV] Use the new chain when converting a fixed RVV load
When building the final merged node, we were using the original chain
rather than the output chain of the new operation. After some collapsing
of the chain this could cause the loads be incorrectly scheduled respect
to later stores.

This was uncovered by SingleSource/Regression/C/gcc-c-torture/execute/pr36038.c
of the llvm testsuite.

https://reviews.llvm.org/D125560
2022-05-13 22:21:08 +00:00
Zakk Chen 7dfc56c107 [RISCV] Add the passthru operand for RVV unmasked segment load IR intrinsics.
The goal is support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125323
2022-05-13 02:16:40 -07:00
Craig Topper 0ebb02b90a [RISCV] Override TargetLowering::shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd.
This hook determines if SimplifySetcc transforms (X & (C l>>/<< Y))
==/!= 0 into ((X <</l>> Y) & C) ==/!= 0. Where C is a constant and
X might be a constant.

The default implementation favors doing the transform if X is not
a constant. Otherwise the code is left alone. There is a provision
that if the target supports a bit test instruction then the transform
will favor ((1 << Y) & X) ==/!= 0. RISCV does not say it has a variable
bit test operation.

RISCV with Zbs does have a BEXT instruction that performs (X >> Y) & 1.
Without Zbs, (X >> Y) & 1 still looks preferable to ((1 << Y) & X) since
we can fold use ANDI instead of putting a 1 in a register for SLL.

This patch overrides this hook to favor bit extract patterns and
otherwise falls back to the "do the transform if X is not a constant"
heuristic.

I've added tests where both C and X are constants with both the shl form
and lshr form. I've also added a test for a switch statement that lowers
to a bit test. That was my original motivation for looking at this.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D124639
2022-05-11 11:13:17 -07:00
Craig Topper 0781742785 [RISCV] Add a DAG combine to pre-promote (i32 (and (srl X, Y), 1)) with Zbs on RV64.
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.

I don't think there is any precedent for type promotion checking
users to decide how to promote. Instead, I've added this DAG combine to
do it before type legalization.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D124109
2022-05-11 10:49:16 -07:00
Fraser Cormack c1d48b35d8 [SelectionDAG][VP] Rename VP sext/zext/trunc ISD opcodes
Rather than VP_SEXT/VP_ZEXT/VP_TRUNC, having
VP_SIGN_EXTEND/VP_ZERO_EXTEND/VP_TRUNCATE better matches their non-VP
counterparts.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D125298
2022-05-11 10:25:51 +01:00
jacquesguan 2509dcd58a [RISCV] Add rvv codegen support for vp.fpext.
This patch adds rvv codegen support for vp.fpext. The lowering of fp_round, vp.fptrunc, fp_extend and vp.fpext share most code so use a common lowering function to handle these four.
And this patch changes the intermediate cast from ISD::FP_EXTEND/ISD::FP_ROUND to the RVV VL version op RISCVISD::FP_EXTEND_VL and RISCVISD::FP_ROUND_VL for scalable vectors.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D123975
2022-05-11 03:28:25 +00:00
Fraser Cormack 0b2e7a7c72 [RISCV][NFC] Remove else after continue 2022-05-10 11:15:50 +01:00
Craig Topper 1d6430b9e2 [RISCV] Update isLegalAddressingMode for RVV.
RVV instructions only support base register addressing.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124820
2022-05-03 19:49:11 -07:00
Hsiangkai Wang eaaa31ff2c [RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C).
Follow-up to D122933.

Differential Revision: https://reviews.llvm.org/D124374
2022-05-03 03:51:36 +00:00
Yeting Kuo c069e37019 [RISCV] Add DAGCombine to fold base operation and reduction.
Transform (<bop> x, (reduce.<bop> vec, splat(neutral_element))) to
(reduce.<bop> vec, splat (x)).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122563
2022-04-30 14:07:05 +08:00
Philip Reames 3ea191ed03 [RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc] 2022-04-29 10:00:57 -07:00
Philip Reames f927be0df8 [RISCV] Extract getAllOnesMask helper [nfc] 2022-04-29 09:30:18 -07:00
Craig Topper ec11fbb1d6 [RISCV] Use default promotion for (i32 (shl 1, X)) on RV64 when Zbs is enabled.
This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instrcutions so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materializationg and
logic op, but might need a mask on the shift amount and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124096
2022-04-28 09:58:30 -07:00
Lian Wang dc0ae8ce18 [RISCV] Support VP_SETCC mask operations
Support VP_SETCC mask operations, turn it to logical operation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124438
2022-04-28 08:52:29 +00:00
Shao-Ce SUN c59473aacc [NFC][RISCV][CodeGen] Use ArrayRef in TargetLowering functions
Based on D123467.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D123653
2022-04-26 23:53:00 +08:00
Fraser Cormack 3e678cb772 [RISCV] Don't emit fractional VIDs with negative steps
We can't shift-right negative numbers to divide them, so avoid emitting
such sequences. Use negative numerators as a proxy for this situation, since
the indices are always non-negative.

An alternative strategy could be to add a compiler flag to emit division
instructions, which would at least allow us to test the VID sequence
matching itself.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123796
2022-04-21 07:00:34 +01:00
Craig Topper 6db0afb44e [RISCV] Fold (xor (sllw 1, x), -1) -> (rolw ~1, x).
There's an existing generic combine that does this for legal types.
This patch adds a RISCV specific combine for W instructions.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D123983
2022-04-19 15:03:43 -07:00
Fraser Cormack c5cac48549 [RISCV] Fix lowering of BUILD_VECTORs as VID sequences
This patch fixes a bug when lowering BUILD_VECTOR via VID sequences.
After adding support for fractional steps in D106533, elements with zero
steps may be skipped if no step has yet been computed. This allowed
certain sequences to slip through the cracks, being identified as VID
sequences when in fact they are not.

The fix for this is to perform a second loop over the BUILD_VECTOR to
validate the entire sequence once the step has been computed. This isn't
the most efficient, but on balance the code is more readable and
maintainable than doing back-validation during the first loop.

Fixes the tests introduced in D123785.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123786
2022-04-19 07:43:38 +01:00
jacquesguan 25445b94db [RISCV] Add rvv codegen support for vp.fptrunc.
This patch adds rvv codegen support for vp.fptrunc. The lowering of fp_round and vp.fptrunc share most code so use a common lowering function to handle those two, similar to vp.trunc.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D123841
2022-04-19 01:56:18 +00:00
jacquesguan 1aa4f0bb6c [RISCV][VP] Add RVV codegen for vp.trunc.
Differential Revision: https://reviews.llvm.org/D123579
2022-04-15 02:29:53 +00:00
Liqin Weng 8265679018 [RISCV][NFC] Refactor the type promotion of fsl/fsr/becompress/bdecompress/bfp
Reviewed By: asb, jrtc27, craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D123181
2022-04-13 08:52:04 +00:00
Craig Topper 2ce2562876 [RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64.
Materializing constants on RISCV is simpler if the constant is sign
extended from i32. By default i32 constant operands of phis are
zero extended.

This patch adds a hook to allow RISCV to override this for i32. We
have an existing isSExtCheaperThanZExt, but it operates on EVT which
we don't have at these places in the code.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122951
2022-04-11 14:38:39 -07:00
Craig Topper 76192182d0 [RISCV] Remove riscv-v-fixed-length-vector-elen-max command line option.
This was added before Zve extensions were defined. I think users
should use Zve32x or Zve32f now. Though we will lose support for limiting
ELEN to 16 or 8, but I hope no one was using that.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D123418
2022-04-11 10:14:48 -07:00
Craig Topper e13a44b460 [RISCV] Add lowering for vp.sext and vp.zext.
Including mask vector inputs.

Reviewed By: frasercrmck, rogfer01

Differential Revision: https://reviews.llvm.org/D123150
2022-04-06 09:59:49 -07:00
Fraser Cormack 6be5e875be [RISCV][VP] Add basic RVV codegen for vp.icmp
This patch adds the minimum required to successfully lower vp.icmp via
the new ISD::VP_SETCC node to RVV instructions.

Regular ISD::SETCC goes through a lot of canonicalization which targets
may rely on which has not hereto been ported to VP_SETCC. It also
supports expansion of individual condition codes and a non-boolean
return type. Support for all of that will follow in later patches.

In the case of RVV this largely isn't a problem as the vector integer
comparison instructions are plentiful enough that it can lower all
VP_SETCC nodes on legal integer vectors except for boolean vectors,
which regular SETCC folds away immediately into logical operations.

Floating-point VP_SETCC operations aren't as well supported in RVV and
the backend relies on condition code expansion, so support for those
operations will come in later patches.

Portions of this code were taken from the VP reference patches.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122743
2022-04-06 16:51:22 +01:00
Craig Topper 3c831c9b28 [RISCV] Add support for vp.fptosi where the result is a mask type.
We can do this conversion by converting the same sized integer type, then compare the result with 0. The conversion is undefined if the converted FP value doesn't fit in an i1.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122678
2022-04-05 09:48:04 -07:00
Craig Topper d970e96c53 [RISCV] Add lowering for vp.fptoui and vp.uitofp.
This is a straightforward extension of D122512 to unsigned integers.
2022-04-01 18:28:46 -07:00
Craig Topper fa630e7594 [RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1).
If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.

Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.

This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122933
2022-04-01 13:14:10 -07:00
Craig Topper 7417eb29ce [RISCV] Use getSplatBuildVector instead of getSplatVector for fixed vectors.
The splat_vector will be legalized to build_vector eventually
anyway. This patch makes it take fewer steps.

Unfortunately, this results in some codegen changes. It looks
like it comes down to how the nodes were ordered in the topological
sort for isel. Because the build_vector is created earlier we end up
with a different ordering of nodes.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D122185
2022-03-30 11:36:34 -07:00
Craig Topper 45e85feba6 [RISCV] Pull APInt/computeKnonwbits specifics out of computeGREVOrGORC. NFC
This function now takes a uint64_t instead of an APInt. The caller
is responsible for masking the shift amount, extracting and inserting
into the KnownBits APInts, and inverting to compute zeros.

This is less code and cleaner division of responsibilities.
2022-03-28 20:53:54 -07:00
Shao-Ce SUN 662b9fa02c [NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122557
2022-03-29 09:53:24 +08:00
Craig Topper 01203918d1 [RISCV] Add computeKnownBits support for RISCVISD::GORC.
Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D121575
2022-03-28 16:56:33 -07:00
Craig Topper e68257fcee [RISCV][SelectionDAG] Enable TargetLowering::hasBitTest for masks that fit in ANDI.
Modified DAGCombiner to pass the shift the bittest input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h

This is an alternative to D122454.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D122458
2022-03-28 12:46:36 -07:00
Craig Topper cfe533da05 [RISCV] Add lowering for vp.fptosi and vp.sitofp.
This as an alternative version of D120641. Starting from the code here
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/-/raw/EPI/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
but with some modifications to how the interim types are calculated,
and adding support for f16.

Still need to add fptosi for mask vectors.

Lots of masked isel patterns added so we can pass the mask through
the type changes.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D122512
2022-03-28 11:06:41 -07:00
Craig Topper 6c90a654bb [RISCV] Simplify some code in lowering vector int<->fp conversions. NFC
Don't call EltVT.getSizeInBits() or SrcEltVT.getSizeInBits() a second
time. They are already in EltSize or SrcEltSize variables.

Refactor some comparisons to use multiply instead of division.
2022-03-23 12:09:05 -07:00
Craig Topper 51940d69cb [RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32.
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.

We already special cased sign extended constants, this extends it
to any sign extended value.

I've only added tests for one case of vadd. Most intrinsics go
through the same check.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D122186
2022-03-22 10:29:06 -07:00
Craig Topper cc5b0868ff Revert "[RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32."
This reverts commit 8c4937b33f.

Committed by mistake.
2022-03-21 14:58:11 -07:00
Craig Topper d4aeb5000f [RISCV] Simplify some code. NFC 2022-03-21 14:50:56 -07:00
Craig Topper 19de2e8db6 [RISCV] Remove stray slash from comment. NFC 2022-03-21 14:50:56 -07:00
Craig Topper 8c4937b33f [RISCV] Special case sign extended scalars when type legalizing nxvXi64 .vx instrinsics on RV32.
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.

We already special cased sign extended constants, this extends it
to any sign extended value.

I've only added tests for one case of vadd. Most intrinsics go
through the same check. I can add more tests if we're concerned.

Differential Revision: https://reviews.llvm.org/D122186
2022-03-21 14:50:55 -07:00
Jessica Clarke 63ea7797dd [RISCV] Fix buildbot breakage by explicitly instantiating templates
RISCVISelDAGToDAG's selectImm uses RISCVTargetLowering::getAddr
(specifically the ConstantPoolSDNode) as of 41454ab256 ("[RISCV] Use
constant pool for large integers"), but nothing explicitly instantiates
any of the templates, the only reason they exist is because of the
various lowering methods in RISCVISelLowering.cpp that themselves use
the methods. However, with inlining, those can end up not existing as
real functions and thus not be exported, leading to link errors. Up
until now this hasn't happened, but for whatever reason D121654 has
triggered this on the sanitizer-ppc64be-linux buildbot, giving:

  ../../../../lib/libLLVMRISCVCodeGen.a(RISCVISelDAGToDAG.cpp.o): In function `selectImm(llvm::SelectionDAG*, llvm::SDLoc const&, llvm::MVT, long, llvm::RISCVSubtarget const&)':
  RISCVISelDAGToDAG.cpp:(.text._ZL9selectImmPN4llvm12SelectionDAGERKNS_5SDLocENS_3MVTElRKNS_14RISCVSubtargetE+0x3d8): undefined reference to `llvm::SDValue llvm::RISCVTargetLowering::getAddr<llvm::ConstantPoolSDNode>(llvm::ConstantPoolSDNode*, llvm::SelectionDAG&, bool) const'
  collect2: error: ld returned 1 exit status

Fix this by explicitly instantiating getAddr in its four different forms
so separate translation units can reliably use it.

Fixes: 41454ab256 ("[RISCV] Use constant pool for large integers")
2022-03-18 02:22:17 +00:00
Craig Topper 7e15303062 [RISCV] Simplify scalable vector case in lowerVectorMaskExt.
Since we have SPLAT_VECTOR_PARTS these days, I don't think we need
to go through extra lengths to avoid introducing an illegal scalar type.
We can just call getConstant using the scalable vector type and let
it create either a SPLAT_VECTOR or a SPLAT_VECTOR_PARTS.

Reviewed By: frasercrmck, rogfer01

Differential Revision: https://reviews.llvm.org/D121645
2022-03-17 09:43:13 -07:00
Jessica Clarke 659363c0cc [RISCV] Ensure PseudoLA* can be hoisted
Since we mark the pseudos as mayLoad but do not provide any MMOs,
isSafeToMove conservatively returns false, stopping MachineLICM from
hoisting the instructions. PseudoLA_TLS_GD does not actually expand to a
load, so stop marking that as mayLoad to allow it to be hoisted, and for
the others make sure to add MMOs during lowering to indicate they're GOT
loads and thus can be freely moved.

Fixes https://github.com/llvm/llvm-project/issues/54372

Reviewed By: MaskRay, arichardson

Differential Revision: https://reviews.llvm.org/D121654
2022-03-16 18:45:36 +00:00
Craig Topper 06c5d74090 [RISCV] Remove lowerSPLAT_VECTOR
This code handles fixed vector SPLAT_VECTOR, but is never called in
any tests.

We only form fixed vector splat vectors for vXi64 on RV32 as part
of DAGCombine. This will be type legalized to SPLAT_VECTOR_PARTS.
So the Custom handling for SPLAT_VECTOR is never needed.

This patch makes SPLAT_VECTOR for vXi64 'Legal' on RV32 so that
DAGCombine will create it, but there's no need for Custom handler.
It will still be type legalized to SPLAT_VECTOR_PARTS.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D121673
2022-03-15 08:22:13 -07:00
Craig Topper eeb3bfd74a [RISCV] Merge ReplaceNodeResults code for SHFL and GREV/GORC. NFC 2022-03-13 18:42:26 -07:00
Lehua Ding 1648852c98 [RISCV][RVV] Fix vslide1up/down intrinsics overflow bug for SEW=64 on RV32
Reviewed By: craig.topper, kito-cheng

Differential Revision: https://reviews.llvm.org/D120899
2022-03-13 18:06:09 +08:00
Craig Topper fd4d584d6b [RISCV] Add DAGCombine to fold (bitreverse (bswap X)) to brev8 with Zbkb.
If the type is less than XLenVT, type legalization will turn this
into (srl (bitreverse (bswap (srl (bswap X), C))), C). We can't
completely recover from these shifts. They introduce zeros into
the upper bits of the result and we can't easily tell if they are
needed. By doing a DAG combine early, we avoid introducing these
shifts.
2022-03-12 16:39:39 -08:00
Craig Topper 43f668b98e [RISCV] Move GORCIW/GREVIW formation to isel patterns.
Type legalize narrow RISCVISD::GREV/GORC with constant to a larger
type without switching to W. Detect sext_inreg+gorci/grevi with a
uimm5 immediate during isel to emit GREVIW/GORCIW.

This allows us to better propagate known bits information through
extended bits after type legalization. It will also simplify a
change I'm considering for BREV8 with Zbkb.

A future patch will add computeKnownBits support for GORC.

A further improvement here would be to use hasAllWUsers and
doPeepholeSExtW like we do for SLLIW, but I don't think we have
the test coverage for that yet.
2022-03-11 18:02:47 -08:00
Craig Topper 337d49da84 [RISCV] Fix typo in comment. NFC 2022-03-10 22:00:18 -08:00
Craig Topper 1f3a8d58a6 [RISCV] Use ZERO_EXTEND instead of ANY_EXTEND when promoting i32 RISCVISD::SHFL. NFC
We know the shift amount is a constant with bit 31 clear. anyext
of constant will be either zext or sext which will produce the
same result here. But we really shouldn't rely on that. It would
be valid to put a random number in the upper bits. Our isel patterns
expect the upper bits to be 0 so we should ask for it explicitly.
2022-03-10 20:57:04 -08:00
Craig Topper 9ce6b1ca86 [RISCV] Remove performANY_EXTENDCombine.
This doesn't appear to be needed any more. I did some inspecting
of the gcc torture suite and SPEC2006 with this removed and didn't
find any meaningful changes.

I think we're more aggressive about forming ADDIW now using
sign_extend_inreg during type legalization and hasAllWUsers in isel.
This probably helps catch the cases this helped with before.
2022-03-10 11:29:31 -08:00
Luke 0803dba7dd [RISCV] Add fixed-length vector instrinsics for segment load
Inspired by reviews.llvm.org/D107790.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D119834
2022-03-10 16:23:40 +08:00
Craig Topper d53707508a [RISCV] Remove RISCVISD::VLE_VL/VSE_VL. Use intrinsics instead.
Similar to what we do for other loads/stores, use the intrinsic
version that we already have custom isel for.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D121166
2022-03-09 22:44:28 -08:00
Craig Topper 845bfcede1 [RISCV] Rename 'SplatOperand' to 'ScalarOperand'. NFC
vslide1up/down have this flag set, but the value isn't a splat.
Rename for clarity.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D121037
2022-03-07 11:28:32 -08:00
Craig Topper bd5f124716 [RISCV] Add SimplifyDemandedBits support for FSR/FSL/FSRW/FSLW. 2022-03-05 21:26:51 -08:00
Craig Topper 232f57319d [RISCV] Move vslide1up/down intrinsics into lowerVectorIntrinsicSplats. NFC
Rename to lowerVectorIntrinsicScalars.

This allows us to share the code that checks if the scalar needs
to be type legalized.
2022-03-04 18:21:53 -08:00
Craig Topper 3d4e83f17d [RISCV] With Zbb, fold (sext_inreg (abs X)) -> (max X, (negw X))
With Zbb, abs is expanded to (max X, neg) by default. If X has 33 or
more sign bits, we can expand it a little early using negw instead of
neg to save a sext_inreg. If X started as a 32 bit value, type
legalization would have inserted a sext before the abs so X having
33 sign bits should always be true.

Note: I've used ISD::FREEZE here since we increase the number of uses.
Our default expansion for ABS doesn't do that, but I think that's a bug.

We can't do this with custom type legalization because ISD::FREEZE
doesn't propagate sign bits so later DAG combine won't expand be
able to see optmize it.

Alives2 https://alive2.llvm.org/ce/z/Gx3RNe

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D120597
2022-03-03 15:42:29 -08:00
Craig Topper 6cb42cd666 [RISCV] More correctly ignore Zfinx register classes in getRegForInlineAsmConstraint.
Until Zfinx is supported in CodeGen we need to convert all Zfinx
register classes to GPR.

Remove the zfinx-types.ll test which didn't test anything meaningful
since -mattr=zfinx isn't implemented completely in llc.

Follow up to D93298.
2022-03-02 11:22:46 -08:00
Craig Topper a1f8349d77 [RISCV] Don't combine ROTR ((GREV x, 24), 16)->(GREV x, 8) on RV64.
This miscompile was introduced in D119527.

This was a special pattern for rotate+bswap on RV32. It doesn't
work for RV64 since the rotate needs to be half the bitwidth. The
equivalent pattern for RV64 is ROTR ((GREV x, 56), 32) so match
that instead.

This could be generalized further as noted in the new FIXME.

Reviewed By: Chenbing.Zheng

Differential Revision: https://reviews.llvm.org/D120686
2022-03-02 09:47:06 -08:00
Shao-Ce SUN 0e38b29543 [RISCV] add the MC layer support of Zfinx extension
This patch added the MC layer support of Zfinx extension.

Authored-by: StephenFan
Co-Authored-by: Shao-Ce Sun

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D93298
2022-03-02 14:25:19 +08:00
Craig Topper b9d6e8c441 [RISCV] Lower VECTOR_SPLICE to RVV instructions.
This lowers VECTOR_SPLICE of scalable vectors to a slidedown follow by a slideup.
Fixed vectors are encouraged to use shufflevector instruction. The equivalent patch
for fixed vectors is D119039.

I've used a tail agnostic slidedown and limited the VL to only the
elements that will not be overwritten by the slideup. The slideup
uses VLMax for its VL. It unfortunately uses tail undisturbed policy
but it isn't required as there is no tail. We just need the merge
operand to carry the bits for the lower portion of the result.

Care was taken to ensure that either the slideup or slidedown will
be able to use a .vi instruction when the immediate is small. Which
one uses the immediate depends on the sign of the immediate.

Reviewed By: frasercrmck, ABataev

Differential Revision: https://reviews.llvm.org/D119303
2022-03-01 10:10:13 -08:00
Craig Topper e83db8c001 [RISCV] Only enable combineROTR_ROTL_RORW_ROLW with Zbp.
I think the immediate values we check for on the GREV nodes already
protect this, but better to be explicit.
2022-02-28 12:47:36 -08:00
Craig Topper b083157b7b [RISCV] Don't call combineROTR_ROTL_RORW_ROLW for SLLW/SRLW/SRAW nodes. NFC
I think the function does the correct thing internally, but it's
confusing to read.
2022-02-28 11:05:10 -08:00
Craig Topper f46890711f [RISCV] Custom type legalize i32 ISD::ABS on RV64 without Zbb.
Default type legalization will create sext_inreg+abs, but we may
not be able to remove the sext_inreg.

Instead this patch expands abs during type legalization to
Y = sraiw X, 31; subw(xor X, Y), Y) which doesn't require the input
to be sign extended.

This gives a big improvement for some neg-abs tests where the
abs is used more than the the neg. Previously the abs was expanded
a different way before and after type legalization. Now they are
expanded in a similar way enabling more CSE.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D120636
2022-02-28 09:30:27 -08:00
Chenbing Zheng b20e80aa59 [RISCV] DAG Combine vcpop and vfirst with VL=0 to li imm
vcpop and vfirst are still useful when VL=0.
vcpop equivalents to li 0 and vfirst equivalents to li -1,
since no mask elements are active.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D120302
2022-02-25 14:44:25 +08:00
Craig Topper a975ca97c3 [RISCV] Fold (sext_inreg (fmv_x_anyexth X), i16) -> (fmv_x_signexth X).
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.

For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D118974
2022-02-24 09:19:01 -08:00
Shao-Ce SUN 78b5f0fb05 [NFC][RISCV] Reuse ISD::NodeType in float extension
Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D120412
2022-02-24 19:57:55 +08:00
Craig Topper 5b7ac107b1 [RISCV] Use SelectionDAG::getFreeze to simplify some code. NFC 2022-02-23 21:13:01 -08:00