Prior to this patch, libcalls inserted by the SelectionDAG legalizer
could never be tailcalled. The eligibility of libcalls for tail calling
is partly determined by checking TargetLowering::isInTailCallPosition
and comparing the return type of the libcall and the caller.
isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly
(which always returns false if not implemented by the target).
This patch provides a minimal implementation of
TargetLowering::isUsedByReturnOnly - enough to support tail calling
libcalls on hard-float ABIs. Soft-float ABIs are left for a follow-on
patch. libcall-tail-calls.ll also shows missed opportunities to tail
call integer libcalls, but this is due to issues outside of
the isUsedByReturnOnly hook.
Differential Revision: https://reviews.llvm.org/D131087
This adds a +forced-atomics target feature with the same semantics
as +atomics-32 on ARM (D130480). For RISCV targets without the +a
extension, this forces LLVM to assume that lock-free atomics
(up to 32/64 bits for riscv32/64 respectively) are available.
This means that atomic load/store are lowered to a simple load/store
(and fence as necessary), as these are guaranteed to be atomic
(as long as they're aligned). Atomic RMW/CAS are lowered to __sync
(rather than __atomic) libcalls. Responsibility for providing the
__sync libcalls lies with the user (for privileged single-core code
they can be implemented by disabling interrupts). Code built with
+forced-atomics and code built with -forced-atomics are not ABI compatible
if atomic variables cross the ABI boundary.
For context, the difference between __sync and __atomic is that the
former are required to be lock-free, while the latter require a
shared global lock provided by a shared object library. See
https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed
discussion on the topic.
This target feature will be used by Rust's riscv32i target family
to support the use of atomic load/store without atomic RMW/CAS.
Differential Revision: https://reviews.llvm.org/D130621
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
it's possible that the add is used by multiple sras. We should
allow the combine if all the SRAs will eventually be updated.
After transforming all of the sras, the shls will share a single
(sext_inreg (add X, C1), i32).
This pattern occurs if an sra by 32 is used as an index in multiple
GEPs with different scales. The shl from the GEPs will be combined
with the sra before we get a chance to match the sra pattern.
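
As a sanity check on the fold itself, here is a minimal Python sketch that
models both forms on 64-bit values. It assumes the add constant has its low
32 bits clear, so it is written as K << 32, and the folded form adds K. The
helper names are made up for illustration and are not LLVM APIs.

```python
MASK64 = (1 << 64) - 1

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def sra64(value, amount):
    """64-bit arithmetic shift right, returned as an unsigned 64-bit pattern."""
    return (to_signed(value, 64) >> amount) & MASK64

def original_form(x, k, c):
    # (sra (add (shl X, 32), K << 32), 32 - C)
    shifted = (x << 32) & MASK64
    added = (shifted + (k << 32)) & MASK64
    return sra64(added, 32 - c)

def folded_form(x, k, c):
    # (shl (sext_inreg (add X, K), i32), C)
    added = (x + k) & MASK64
    sign_extended = to_signed(added, 32) & MASK64
    return (sign_extended << c) & MASK64

# Both forms agree for every shift amount C in [0, 31] on this sample.
for c in range(32):
    assert original_form(0x7FFFFFFF, 1, c) == folded_form(0x7FFFFFFF, 1, c)
```

Because the sext_inreg only depends on X and K, several sras with different
values of C end up sharing one sign-extended add in this model.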
When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C)
ignore the use count on the (shl X, 32).
The sext_inreg after the transform is free. So we're only making
2 new instructions, the add and the shl. So we only need to be
concerned with replacing the original sra+add. The original shl
can have other uses. This helps if there are multiple different
constants being added to the same shl.
D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into
(seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it
will be turned back into an 'and' by SimplifyDemandedBits which
can cause an infinite loop.
To prevent this, check if bit 31 is 0 with computeKnownBits before
doing the transformation.
Fixes PR56905.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D131113
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.
This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130370
An unnecessary sext.w is generated when masking the result of the
riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the
intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed.
Although this isn't a particularly important optimisation, removing the
sext.w simplifies implementation of an additional cmpxchg-related
optimisation in D130192.
Although I can't produce a test with different codegen for the other
atomics intrinsics, these are added as well for completeness.
Differential Revision: https://reviews.llvm.org/D130191
This adds a merge operand to all of the binary _VL nodes, including the
integer and widening ones. They all share multiclasses in tablegen
so doing them all at once was easiest.
I plan to use FADD_VL in an upcoming patch. The rest are just for
consistency to keep tablegen working.
This does reduce the isel table size by about 25k so that's nice.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130816
This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsics, so we technically don't have
to worry about fflags, but it doesn't cost much to support it.
To support this, I've extended our FCOPYSIGN_VL node to take a passthru
operand, similar to what was done for the VRGATHER*_VL nodes.
I plan to do a similar update for trunc, floor, and ceil.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D130659
This patch adds shouldScalarizeBinop to the RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation.
Differential Revision: https://reviews.llvm.org/D129545
(srl (and X, 1<<C), C) is the form we receive for testing bit C.
An earlier combine removed the setcc so it wasn't there to match
when we created the SELECT_CC. This doesn't happen for BR_CC because
generic DAG combine rebuilds the setcc if it is used by BRCOND.
We can shift X left by XLen-1-C to put the bit to be tested in the
MSB, and use a signed compare with 0 to test the MSB.
The only differences between the combines were the calls to getNode
that include the true/false values for SELECT_CC or the chain
and branch target for BR_CC.
Wrap the rest of the code into a helper that reads LHS, RHS, and
CC and outputs new values and a bool if a new node needs to be
created.
If C > 10, this will require a constant to be materialized for the
And. To avoid this, we can shift X left by XLen-1-C bits to put the
tested bit in the MSB, then we can do a signed compare with 0 to
determine if the MSB is 0 or 1. Thanks to @reames for the suggestion.
I've implemented this inside of translateSetCCForBranch which is
called when setcc+brcond or setcc+select is converted to br_cc or
select_cc during lowering. It doesn't make sense to do this for
general setcc since we lack a sgez instruction.
I've tested bit 10, 11, 31, 32, 63 and a couple bits between 11 and 31
and between 32 and 63 for both i32 and i64 where applicable. Select
has some deficiencies where we receive (and (srl X, C), 1) instead.
This doesn't happen for br_cc due to the call to rebuildSetCC in the
generic DAGCombiner for brcond. I'll explore improving select in a
future patch.
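
To illustrate the trick, here is a small Python sketch for XLen = 64. It
checks that testing bit C with an AND mask agrees with shifting the bit into
the MSB and doing a signed compare with 0, and that the mask 1 << C stops
fitting in a 12-bit signed immediate once C > 10. The function names are
invented for this example.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def fits_simm12(value):
    """True if value fits in a 12-bit signed immediate, as used by ANDI."""
    return -2048 <= value <= 2047

def bit_set_with_and(x, c):
    # (and X, 1 << C) != 0
    return (x & (1 << c)) != 0

def bit_set_with_shift(x, c):
    # (shl X, XLen - 1 - C) <s 0
    shifted = (x << (XLEN - 1 - c)) & MASK
    signed = shifted - (1 << XLEN) if shifted >> (XLEN - 1) else shifted
    return signed < 0

# Bit 10 still fits in the ANDI immediate; bit 11 does not.
assert fits_simm12(1 << 10) and not fits_simm12(1 << 11)
assert bit_set_with_and(1 << 11, 11) == bit_set_with_shift(1 << 11, 11)
```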
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130203
This patch implements the recently ratified Zmmul extension, a subextension
of M (Integer Multiplication and Division) consisting of only the
multiplication part of it.
Differential Revision: https://reviews.llvm.org/D103313
Reviewed By: craig.topper, jrtc27, asb
(and X, 0xffffffff) requires 2 shifts in the base ISA. Since we
know the result is being used by a compare, we can use a sext_inreg
instead of an AND if we also modify C1 to have 33 sign bits instead
of 32 leading zeros. This can also improve the generated code for
materializing C1.
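
A quick Python model of why this is safe, assuming X is a 64-bit value and
C1 has 32 leading zeros. Comparing the zero-extended low half against C1
gives the same answer as comparing the sign-extended low half against the
sign-extended constant. The helper names are hypothetical.

```python
def sext32(value):
    """Sign extend the low 32 bits of value (models sext_inreg ..., i32)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def compare_with_and(x, c1):
    # (seteq (and X, 0xffffffff), C1)
    return (x & 0xFFFFFFFF) == c1

def compare_with_sext_inreg(x, c1):
    # (seteq (sext_inreg X, i32), C1'), where C1' is C1 sign extended from
    # bit 31 so that it has 33 sign bits.
    return sext32(x) == sext32(c1)

x = 0xDEADBEEFFFFFFFFF
assert compare_with_and(x, 0xFFFFFFFF) == compare_with_sext_inreg(x, 0xFFFFFFFF)
# The adjusted constant for 0xffffffff is simply -1, which is cheap to build.
assert sext32(0xFFFFFFFF) == -1
```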
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D129980
This patch replaces some foreach loops with ArrayRef arguments, and factors repeated literal arrays out into variables.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125656
The former pattern will select as slliw+sraiw while the latter
will select as slli+srai. This can enable the slli+srai to be
compressed.
Differential Revision: https://reviews.llvm.org/D129688
When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.
vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)
We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (This holds for everything other than VLEN<=32, but that's already broken.)
It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.
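
For illustration, a small Python sketch of the argument, assuming a 64-bit
block size and VLEN values from 64 to 2^16. It shows that every resulting
vscale is a power of two, which is what lets a urem by vscale be computed
with a mask. The function names are made up for this example.

```python
RVV_BITS_PER_BLOCK = 64

def vscale_for(vlen):
    """Number of 64-bit blocks in one vector register."""
    return vlen // RVV_BITS_PER_BLOCK

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

def urem_by_vscale(n, vscale):
    """urem n, vscale computed as a mask; only valid for power-of-two vscale."""
    if not is_power_of_two(vscale):
        raise ValueError("vscale must be a power of two")
    return n & (vscale - 1)

# Every power-of-two VLEN from 64 up to 2^16 gives a power-of-two vscale.
assert all(is_power_of_two(vscale_for(1 << k)) for k in range(6, 17))
assert urem_by_vscale(1003, vscale_for(256)) == 1003 % 4
```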
Differential Revision: https://reviews.llvm.org/D129609
Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.
To fix this correctly, we need to consider alignment of the load
we'd be replacing, but that's not possible in the current interface.
Including the following opcode:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127871
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immediate.
I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the compiler to create a constant pool
you might still want to avoid materializing an integer that is
already available in a global variable.
Test file was copied from AArch64/ARM and has not been committed yet.
Will post separate review for that.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D129402
Similar for a subtract with a constant left hand side.
(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine
for (sext (add (trunc X to i32), C1) to i64).
For RISCV, we should lower this as addiw, which means turning it into
(sext_inreg (add X, C1), i32).
There is an existing DAG combine to convert back to (sext (add (trunc X
to i32), C1) to i64), but it requires isTruncateFree to return true
and i32 to be a legal type, as it uses sign_extend and truncate
nodes. So that doesn't work for RISCV.
If the outer sra happens to be used by a shl by constant, it will be
folded and the shift amount of the sra will be changed before we
can do our own DAG combine. This requires us to match the more
general pattern and restore the shl.
I had wanted to do this as a separate (add (shl X, 32), C1<<32) ->
(shl (add X, C1), 32) combine, but that hit an infinite loop for some
values of C1.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128869
The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.
This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128843
This implements known bits for READ_VALUE using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two.
The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too.
Differential Revision: https://reviews.llvm.org/D128758
getRealMaxVLen returns an upper bound on the value of VLEN. We can use this upper bound (which, unless explicitly set on the command line, results in an e8 MaxVLMax much greater than 256) instead of handling the unknown case separately from the case of a bound greater than 256.
Note as well that this code already implicitly depends on a capped value for VLEN. If infinite VLEN were possible, then 16-bit indices wouldn't be enough.
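
The arithmetic behind this can be sketched in Python, using
VLMAX = LMUL * VLEN / SEW from the vector spec. The choice of LMUL = 8 as
the worst case and the helper names are assumptions of this sketch rather
than something taken from the patch.

```python
def vlmax(vlen, sew, lmul):
    """VLMAX = LMUL * VLEN / SEW."""
    return (vlen * lmul) // sew

def needs_16bit_indices(max_vlen, sew=8, lmul=8):
    # 8-bit indices can only address elements 0..255, so anything with
    # more than 256 elements needs wider indices.
    return vlmax(max_vlen, sew, lmul) > 256

# With the architectural cap of VLEN = 2^16, the largest e8 index is 65535,
# which still fits in 16 bits.
assert vlmax(65536, 8, 8) - 1 <= 0xFFFF
assert needs_16bit_indices(65536)
assert not needs_16bit_indices(256)
```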
This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with
different portions negated. It also adds a DAG combine to peek
through FNEG_VL to create these new opcodes.
This is modeled after similar code from X86.
This makes the isel patterns more regular and reduces the size of
the isel table by ~37K.
The test changes look like regressions, but they point to a bug that
was already there. We aren't able to commute a masked FMA instruction
to improve register allocation because we always use a mask undisturbed
policy. Prior to this patch we matched two multiply operands in a
different order and hid this issue for these test cases, but a different
test still could have encountered it.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D128310
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarly, mf4 is not supported for i16/f16 or mf2 for i32/f32.
Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.
For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.
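
To make the arithmetic concrete, here is a rough Python sketch of the LMUL
formula above. The support check (for fractional LMUL, SEW must be at most
LMUL * ELEN) is my reading of the vector spec, and the function names are
invented for this example.

```python
from fractions import Fraction

RVV_BITS_PER_BLOCK = 64

def lmul_for(min_num_elements, element_size):
    """LMUL = (MinNumElements * ElementSize) / RVVBitsPerBlock."""
    return Fraction(min_num_elements * element_size, RVV_BITS_PER_BLOCK)

def is_supported(min_num_elements, element_size, elen):
    """Assumed rule: SEW <= ELEN, and for fractional LMUL, SEW <= LMUL * ELEN."""
    if element_size > elen:
        return False
    lmul = lmul_for(min_num_elements, element_size)
    return lmul >= 1 or element_size <= lmul * elen

# With ELEN=32, every MinNumElements==1 type is rejected in this model.
assert lmul_for(1, 8) == Fraction(1, 8)          # nxv1i8 would be mf8
assert not is_supported(1, 8, 32)
assert not is_supported(1, 16, 32)               # nxv1i16 would be mf4
assert not is_supported(1, 32, 32)               # nxv1i32 would be mf2
assert is_supported(2, 8, 32)                    # nxv2i8 is mf4, still fine
```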
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D128286
This adds RISCVISD opcodes for LA, LA_TLS_IE, and LA_TLS_GD to
remove creation of MachineSDNodes from get*Addr. This makes the
code consistent with the previous patches that added RISCVISD::HI,
ADD_LO, LLA, and TPREL_ADD.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128325
Put it before the VL instead of as the first operand. I want to add
passthru to more operands, but the commutable ones like VADD_VL
require the commutable operands to be operand 0 and 1. So we can't
have the passthru as operand 0 for those.
Use it in place of VSELECT_VL+VRGATHER*_VL.
This simplifies the isel patterns.
Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, pre-isel
peephole, or post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D128023
Type legalization will convert the bitcast into a vector store and
scalar load.
Instead this patch widens the vector to v8i1 with undef, and bitcasts
it to i8. v8i1->i8 has custom handling for type legalization already to
bitcast to a v1i8 vector and use an extract_element.
The code here was lifted from X86's avx512 support.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128099
This allows computeKnownBits to see the constant being loaded.
This recovers the rv64zbp test case changes from D127520.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127679
Rather than emitting a MachineSDNode from lowering, let isel match it.
This is consistent with the RISCVISD::HI and ADD_LO nodes that were
also added. Having them both the same will make D127679 consistent.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127714
Instead add RISCVISD opcodes that will be selected to LUI/ADDI
during isel.
I'm looking into maybe moving doPeepholeLoadStoreADDI into isel.
Having the ADDI as a RISCVISD node will make it visible to isel.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127713
We were incorrectly creating a VRGATHER node with i1 vector type. We
could support this by promoting the mask to i8 and truncating it, but
for now I want to prevent the crash.
Fixes PR56007.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127681
This simplifies the isel code by removing the manual load creation.
It also improves our ability to use 0 strided loads for vector splats.
There is an assumption here that Mask and ShiftedMask constants are
cheap enough that they don't become constant pool loads so that our
isel optimizations involving And still work. I believe those constants
are 3 instructions in the worst case.
The rv64zbp-intrinsic.ll changes are a regression caused by intrinsics
being expanded to RISCVISD also occurring during lowering. So the optimizations
were only happening during the last DAGCombine, which can't see through the
load. I believe we can fix this test by implementing
TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic
to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not
a ratified extension, I don't view these as blocking this patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127520
This prevents them from being assumed legal by the cost model.
This matches what is done for AArch64 SVE.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D123799
Based on D24038.
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D126181
We enable a custom handler to optimize conversions between scalars
and fixed vectors. Unfortunately, the custom handler picks up scalar
to scalar conversions as well. If the scalar types are both legal,
we wouldn't match any of the fixed vector cases and would return SDValue()
causing the LegalizeDAG to expand the bitcast through memory.
This patch fixes this by checking if it's a scalar to scalar conversion
and returns `Op` if both types are legal.
Differential Revision: https://reviews.llvm.org/D126739
When lowering GlobalAddressNodes, we were removing a non-zero offset and
creating a separate ADD.
It already comes out of SelectionDAGBuilder with a separate ADD. The
ADD was being removed by DAGCombiner.
This patch disables the DAG combine so we don't have to reverse it.
Test changes all look to be instruction order changes. Probably due
to different DAG node ordering.
Differential Revision: https://reviews.llvm.org/D126558
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.
There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to have actually been standardized, and b) it isn't quite what we want here anyway due to the perf comment.
Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.
Differential Revision: https://reviews.llvm.org/D126085
This patch tries to resolve the inconsistency between the direct and intermediate casts caused by D123975.
This patch replaces ISD::FP_EXTEND and ISD::FP_ROUND with the RVV VL ops in the lowering of direct FP scalable vector casts, to unify them with the intermediate cast.
It also changes the FP widening patterns to use the VL ops.
Differential Revision: https://reviews.llvm.org/D125364
Update test to check MIR after finalize-isel instead of debug output.
This is of course not the only place we should preserve FMF, but
it's the most obvious one.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D126306
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:
* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)
Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.
I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicit looks better.
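
As a toy illustration of the single remaining representation, here is a
Python sketch in which every index is scaled and an "unscaled" index is just
a scale of one. The function names are hypothetical and only model the
arithmetic, not the SDNode API.

```python
def byte_offsets(indices, scale=1):
    """byte_offsets = Index * Scale for each index."""
    return [index * scale for index in indices]

def is_scaled_index(scale):
    """Models answering the "isScaledIndex" question from the scale operand."""
    return scale != 1

# What the old element-size-based IndexType expressed for i32 elements is
# the same as a Scale operand of 4 bytes.
assert byte_offsets([0, 1, 2], scale=4) == [0, 4, 8]
# A scale of one leaves the indices as plain byte offsets.
assert byte_offsets([0, 1, 2]) == [0, 1, 2]
assert not is_scaled_index(1)
```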
Depends On D123347
Differential Revision: https://reviews.llvm.org/D123381
This patch replaces some for-each loops over sets with the new ArrayRef argument API. Since the definitions already used arrays, I think this change won't cause any ambiguity.
Differential Revision: https://reviews.llvm.org/D125455
When building the final merged node, we were using the original chain
rather than the output chain of the new operation. After some collapsing
of the chain this could cause the loads to be incorrectly scheduled with
respect to later stores.
This was uncovered by SingleSource/Regression/C/gcc-c-torture/execute/pr36038.c
of the llvm testsuite.
https://reviews.llvm.org/D125560
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125323
This hook determines if SimplifySetcc transforms (X & (C l>>/<< Y))
==/!= 0 into ((X <</l>> Y) & C) ==/!= 0, where C is a constant and
X might be a constant.
The default implementation favors doing the transform if X is not
a constant. Otherwise the code is left alone. There is a provision
that if the target supports a bit test instruction then the transform
will favor ((1 << Y) & X) ==/!= 0. RISCV does not say it has a variable
bit test operation.
RISCV with Zbs does have a BEXT instruction that performs (X >> Y) & 1.
Without Zbs, (X >> Y) & 1 still looks preferable to ((1 << Y) & X) since
we can use ANDI instead of putting a 1 in a register for SLL.
This patch overrides this hook to favor bit extract patterns and
otherwise falls back to the "do the transform if X is not a constant"
heuristic.
I've added tests where both C and X are constants with both the shl form
and lshr form. I've also added a test for a switch statement that lowers
to a bit test. That was my original motivation for looking at this.
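
For reference, a small Python model of the two forms the hook chooses
between, on 64-bit values. It checks that moving the shift from the constant
to X preserves the comparison, and that with C = 1 the shl form is the bit
extract pattern (X >> Y) & 1. The helper names are invented for this sketch.

```python
MASK64 = (1 << 64) - 1

def lshr_form(x, c, y):
    # (X & (C l>> Y)) != 0
    return (x & (c >> y)) != 0

def lshr_transformed(x, c, y):
    # ((X << Y) & C) != 0
    return (((x << y) & MASK64) & c) != 0

def shl_form(x, c, y):
    # (X & (C << Y)) != 0
    return (x & ((c << y) & MASK64)) != 0

def shl_transformed(x, c, y):
    # ((X l>> Y) & C) != 0
    return ((x >> y) & c) != 0

def bit_extract(x, y):
    # (X >> Y) & 1, which BEXT computes when Zbs is available.
    return (x >> y) & 1

x = 0x8000000000000401
for y in range(64):
    assert lshr_form(x, 0xFF00, y) == lshr_transformed(x, 0xFF00, y)
    assert shl_form(x, 1, y) == (bit_extract(x, y) == 1)
```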
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124639
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.
I don't think there is any precedent for type promotion checking
users to decide how to promote. Instead, I've added this DAG combine to
do it before type legalization.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124109
Rather than VP_SEXT/VP_ZEXT/VP_TRUNC, having
VP_SIGN_EXTEND/VP_ZERO_EXTEND/VP_TRUNCATE better matches their non-VP
counterparts.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125298
This patch adds rvv codegen support for vp.fpext. The lowering of fp_round, vp.fptrunc, fp_extend and vp.fpext share most code so use a common lowering function to handle these four.
And this patch changes the intermediate cast from ISD::FP_EXTEND/ISD::FP_ROUND to the RVV VL version op RISCVISD::FP_EXTEND_VL and RISCVISD::FP_ROUND_VL for scalable vectors.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123975
This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instructions so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materialization and
logic op, but might need a mask on the shift amount and sext.w.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D124096
We can't shift-right negative numbers to divide them, so avoid emitting
such sequences. Use negative numerators as a proxy for this situation, since
the indices are always non-negative.
An alternative strategy could be to add a compiler flag to emit division
instructions, which would at least allow us to test the VID sequence
matching itself.
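
A short Python sketch of why the shift is not a valid replacement for
negative values: an arithmetic shift right rounds toward negative infinity,
while signed division truncates toward zero. The sdiv helper is written for
this example to model truncating division.

```python
def sdiv(numerator, denominator):
    """Signed division that truncates toward zero."""
    quotient = abs(numerator) // abs(denominator)
    return quotient if (numerator < 0) == (denominator < 0) else -quotient

def sra(value, amount):
    """Arithmetic shift right; Python's >> already rounds toward -infinity."""
    return value >> amount

# Non-negative numerators agree.
assert sdiv(6, 2) == sra(6, 1) == 3
assert sdiv(7, 4) == sra(7, 2) == 1
# Negative numerators do not.
assert sdiv(-1, 2) == 0 and sra(-1, 1) == -1
assert sdiv(-3, 2) == -1 and sra(-3, 1) == -2
```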
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123796
There's an existing generic combine that does this for legal types.
This patch adds a RISCV specific combine for W instructions.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D123983
This patch fixes a bug when lowering BUILD_VECTOR via VID sequences.
After adding support for fractional steps in D106533, elements with zero
steps may be skipped if no step has yet been computed. This allowed
certain sequences to slip through the cracks, being identified as VID
sequences when in fact they are not.
The fix for this is to perform a second loop over the BUILD_VECTOR to
validate the entire sequence once the step has been computed. This isn't
the most efficient, but on balance the code is more readable and
maintainable than doing back-validation during the first loop.
Fixes the tests introduced in D123785.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123786
This patch adds rvv codegen support for vp.fptrunc. The lowering of fp_round and vp.fptrunc share most code so use a common lowering function to handle those two, similar to vp.trunc.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123841
Materializing constants on RISCV is simpler if the constant is sign
extended from i32. By default i32 constant operands of phis are
zero extended.
This patch adds a hook to allow RISCV to override this for i32. We
have an existing isSExtCheaperThanZExt, but it operates on EVT which
we don't have at these places in the code.
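
A rough Python sketch of the cost difference on RV64, assuming the usual
rules that ADDI takes a 12-bit signed immediate and that LUI+ADDIW can build
any sign-extended 32-bit value. The helper names are made up, and longer
sequences are deliberately not modelled.

```python
def sext32(value):
    """The i32 constant sign extended to 64 bits."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def zext32(value):
    """The i32 constant zero extended to 64 bits."""
    return value & 0xFFFFFFFF

def max_instructions(value):
    """Upper bound on the instructions needed, or None if not modelled."""
    if -2048 <= value <= 2047:
        return 1                      # single ADDI
    if -(1 << 31) <= value < (1 << 31):
        return 2                      # at most LUI + ADDIW
    return None                       # needs a longer sequence

# The i32 constant -1 is one ADDI when sign extended, but falls outside the
# LUI+ADDIW range when zero extended to 0xffffffff.
assert max_instructions(sext32(0xFFFFFFFF)) == 1
assert max_instructions(zext32(0xFFFFFFFF)) is None
```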
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122951
This was added before Zve extensions were defined. I think users
should use Zve32x or Zve32f now. Though we will lose support for limiting
ELEN to 16 or 8, but I hope no one was using that.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123418
This patch adds the minimum required to successfully lower vp.icmp via
the new ISD::VP_SETCC node to RVV instructions.
Regular ISD::SETCC goes through a lot of canonicalization which targets
may rely on and which has not hitherto been ported to VP_SETCC. It also
supports expansion of individual condition codes and a non-boolean
return type. Support for all of that will follow in later patches.
In the case of RVV this largely isn't a problem as the vector integer
comparison instructions are plentiful enough that it can lower all
VP_SETCC nodes on legal integer vectors except for boolean vectors,
which regular SETCC folds away immediately into logical operations.
Floating-point VP_SETCC operations aren't as well supported in RVV and
the backend relies on condition code expansion, so support for those
operations will come in later patches.
Portions of this code were taken from the VP reference patches.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122743
We can do this conversion by converting to the same-sized integer type and then comparing the result with 0. The conversion is undefined if the converted FP value doesn't fit in an i1.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D122678
If we expand (uaddo X, 1) we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.
Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.
This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.
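
The equivalence is easy to check with a small Python model, here using
32-bit unsigned arithmetic as an arbitrary choice. The function names are
invented for this example.

```python
BITS = 32
UINT_MAX = (1 << BITS) - 1

def overflow_general(x):
    # (X + 1) <u X, which keeps X live across the add.
    total = (x + 1) & UINT_MAX
    return total < x

def overflow_special(x):
    # (X + 1) == 0, which only needs the sum.
    total = (x + 1) & UINT_MAX
    return total == 0

for x in (0, 1, 12345, UINT_MAX - 1, UINT_MAX):
    assert overflow_general(x) == overflow_special(x)
# Overflow happens only when X was UINT_MAX.
assert overflow_special(UINT_MAX) and not overflow_special(UINT_MAX - 1)
```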
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122933
The splat_vector will be legalized to build_vector eventually
anyway. This patch makes it take fewer steps.
Unfortunately, this results in some codegen changes. It looks
like it comes down to how the nodes were ordered in the topological
sort for isel. Because the build_vector is created earlier we end up
with a different ordering of nodes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D122185
This function now takes a uint64_t instead of an APInt. The caller
is responsible for masking the shift amount, extracting and inserting
into the KnownBits APInts, and inverting to compute zeros.
This is less code and cleaner division of responsibilities.
Modified DAGCombiner to pass the bit test input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h.
This is an alternative to D122454.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D122458
Don't call EltVT.getSizeInBits() or SrcEltVT.getSizeInBits() a second
time. They are already in EltSize or SrcEltSize variables.
Refactor some comparisons to use multiply instead of division.
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.
We already special cased sign extended constants, this extends it
to any sign extended value.
I've only added tests for one case of vadd. Most intrinsics go
through the same check.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D122186
RISCVISelDAGToDAG's selectImm uses RISCVTargetLowering::getAddr
(specifically the ConstantPoolSDNode) as of 41454ab256 ("[RISCV] Use
constant pool for large integers"), but nothing explicitly instantiates
any of the templates; the only reason they exist is because of the
various lowering methods in RISCVISelLowering.cpp that themselves use
the methods. However, with inlining, those can end up not existing as
real functions and thus not be exported, leading to link errors. Up
until now this hasn't happened, but for whatever reason D121654 has
triggered this on the sanitizer-ppc64be-linux buildbot, giving:
../../../../lib/libLLVMRISCVCodeGen.a(RISCVISelDAGToDAG.cpp.o): In function `selectImm(llvm::SelectionDAG*, llvm::SDLoc const&, llvm::MVT, long, llvm::RISCVSubtarget const&)':
RISCVISelDAGToDAG.cpp:(.text._ZL9selectImmPN4llvm12SelectionDAGERKNS_5SDLocENS_3MVTElRKNS_14RISCVSubtargetE+0x3d8): undefined reference to `llvm::SDValue llvm::RISCVTargetLowering::getAddr<llvm::ConstantPoolSDNode>(llvm::ConstantPoolSDNode*, llvm::SelectionDAG&, bool) const'
collect2: error: ld returned 1 exit status
Fix this by explicitly instantiating getAddr in its four different forms
so separate translation units can reliably use it.
Fixes: 41454ab256 ("[RISCV] Use constant pool for large integers")
Since we have SPLAT_VECTOR_PARTS these days, I don't think we need
to go to extra lengths to avoid introducing an illegal scalar type.
We can just call getConstant using the scalable vector type and let
it create either a SPLAT_VECTOR or a SPLAT_VECTOR_PARTS.
Reviewed By: frasercrmck, rogfer01
Differential Revision: https://reviews.llvm.org/D121645
Since we mark the pseudos as mayLoad but do not provide any MMOs,
isSafeToMove conservatively returns false, stopping MachineLICM from
hoisting the instructions. PseudoLA_TLS_GD does not actually expand to a
load, so stop marking that as mayLoad to allow it to be hoisted, and for
the others make sure to add MMOs during lowering to indicate they're GOT
loads and thus can be freely moved.
Fixes https://github.com/llvm/llvm-project/issues/54372
Reviewed By: MaskRay, arichardson
Differential Revision: https://reviews.llvm.org/D121654
This code handles fixed vector SPLAT_VECTOR, but is never called in
any tests.
We only form fixed vector splat vectors for vXi64 on RV32 as part
of DAGCombine. This will be type legalized to SPLAT_VECTOR_PARTS.
So the Custom handling for SPLAT_VECTOR is never needed.
This patch makes SPLAT_VECTOR for vXi64 'Legal' on RV32 so that
DAGCombine will create it, but there's no need for a Custom handler.
It will still be type legalized to SPLAT_VECTOR_PARTS.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D121673
If the type is less than XLenVT, type legalization will turn this
into (srl (bitreverse (bswap (srl (bswap X), C))), C). We can't
completely recover from these shifts. They introduce zeros into
the upper bits of the result and we can't easily tell if they are
needed. By doing a DAG combine early, we avoid introducing these
shifts.
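
To make the "zeros in the upper bits" point concrete, here is a small Python
model of one inner piece of that expression, (srl (bswap X), C). It assumes an
i16 value promoted to a 64-bit XLenVT, so C = 48; both the widths and the
helper names are assumptions chosen for illustration.

```python
def bswap(x, width):
    """Reverse the byte order of the low `width` bits of x."""
    x &= (1 << width) - 1
    return int.from_bytes(x.to_bytes(width // 8, "little"), "big")

def promoted_bswap16(x64):
    """Model of a promoted i16 bswap: bswap at 64 bits, then shift right by 48."""
    return bswap(x64, 64) >> 48
```

The result matches a 16-bit bswap of the low half, and the upper 48 bits are
always zero, which is the information later combines cannot easily tell apart
from zeros the program actually needs.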
Type legalize narrow RISCVISD::GREV/GORC with constant to a larger
type without switching to W. Detect sext_inreg+gorci/grevi with a
uimm5 immediate during isel to emit GREVIW/GORCIW.
This allows us to better propagate known bits information through
extended bits after type legalization. It will also simplify a
change I'm considering for BREV8 with Zbkb.
A future patch will add computeKnownBits support for GORC.
A further improvement here would be to use hasAllWUsers and
doPeepholeSExtW like we do for SLLIW, but I don't think we have
the test coverage for that yet.
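
The reason a uimm5 immediate is enough can be sketched with a Python model of
GORC. With a control value below 32 no stage crosses the 32-bit boundary, so
sign extending the low half of the 64-bit result gives the same value as the W
form. The function names are hypothetical and the model follows the staged
definition from the bitmanip draft, not LLVM's code.

```python
def gorc(x, k, width):
    """Generalized OR-combine: each enabled stage ORs every bit with its partner."""
    full = (1 << width) - 1
    x &= full
    shift = 1
    while shift < width:
        if k & shift:
            # Bits whose index has the `shift` bit clear, and their partners.
            lo = sum(1 << i for i in range(width) if not (i & shift))
            hi = lo << shift
            x |= ((x & lo) << shift) | ((x & hi) >> shift)
        shift <<= 1
    return x & full

def sext32(x):
    """Interpret the low 32 bits of x as a signed value (sext_inreg from i32)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def gorciw(x, k):
    """Model of the W form: operate on the low 32 bits, then sign extend."""
    return sext32(gorc(x & 0xFFFFFFFF, k, 32))
```

For every control value in 0..31, `sext32(gorc(x, k, 64))` equals
`gorciw(x, k)` in this model.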
We know the shift amount is a constant with bit 31 clear. An anyext
of a constant will be either a zext or a sext, which produce the
same result here. But we really shouldn't rely on that. It would
be valid to put a random number in the upper bits. Our isel patterns
expect the upper bits to be 0, so we should ask for that explicitly.
This doesn't appear to be needed any more. I did some inspecting
of the gcc torture suite and SPEC2006 with this removed and didn't
find any meaningful changes.
I think we're more aggressive about forming ADDIW now using
sign_extend_inreg during type legalization and hasAllWUsers in isel.
This probably helps catch the cases this helped with before.
Similar to what we do for other loads/stores, use the intrinsic
version that we already have custom isel for.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D121166
vslide1up/down have this flag set, but the value isn't a splat.
Rename for clarity.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D121037
With Zbb, abs is expanded to (max X, (neg X)) by default. If X has 33 or
more sign bits, we can expand it a little early using negw instead of
neg to save a sext_inreg. If X started as a 32 bit value, type
legalization would have inserted a sext before the abs so X having
33 sign bits should always be true.
Note: I've used ISD::FREEZE here since we increase the number of uses.
Our default expansion for ABS doesn't do that, but I think that's a bug.
We can't do this with custom type legalization because ISD::FREEZE
doesn't propagate sign bits, so a later DAG combine won't be able to
optimize it.
Alive2: https://alive2.llvm.org/ce/z/Gx3RNe
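
Independently of the Alive2 proof, the arithmetic can be checked with a small
Python model. It assumes the surrounding code wants the sign-extended 32-bit
result, which is how "save a sext_inreg" is read here. Registers are modelled
as signed Python integers and all helper names are hypothetical.

```python
def to_signed(x, bits):
    """Wrap x to `bits` bits and interpret the result as signed."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def neg(x):
    return to_signed(-x, 64)

def negw(x):
    # 32-bit negate whose result is sign extended to 64 bits.
    return to_signed(-x, 32)

def abs_then_sext(x):
    """Default expansion followed by the sext_inreg we would like to drop."""
    return to_signed(max(x, neg(x)), 32)

def abs_with_negw(x):
    """Early expansion using negw; valid only if x has at least 33 sign bits."""
    return max(x, negw(x))
```

For inputs that are sign extended from 32 bits, including INT32_MIN, both
functions agree. For an input such as 2**32 + 1 they differ, which is why the
33 sign bit precondition matters.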
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D120597
Until Zfinx is supported in CodeGen we need to convert all Zfinx
register classes to GPR.
Remove the zfinx-types.ll test which didn't test anything meaningful
since -mattr=zfinx isn't implemented completely in llc.
Follow up to D93298.
This miscompile was introduced in D119527.
This was a special pattern for rotate+bswap on RV32. It doesn't
work for RV64 since the rotate needs to be half the bitwidth. The
equivalent pattern for RV64 is ROTR ((GREV x, 56), 32) so match
that instead.
This could be generalized further as noted in the new FIXME.
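
The half-bitwidth requirement can be seen in a short Python model of GREV, in
which the bit at index i moves to index i ^ k. The RV32 form used below,
ROTR ((GREV x, 24), 16), is an assumption inferred from the RV64 pattern, and
the comparison against a single GREV is only an illustration; it is not
claimed to be what the patch emits.

```python
def grev(x, k, width):
    """Generalized reverse: the bit at index i moves to index i ^ k."""
    return sum(((x >> i) & 1) << (i ^ k) for i in range(width))

def rotr(x, amount, width):
    """Rotate the low `width` bits of x right by `amount`."""
    mask = (1 << width) - 1
    x &= mask
    amount %= width
    return ((x >> amount) | (x << (width - amount))) & mask

# RV64: bswap (GREV 56) then rotate by 32 swaps bytes within each 32-bit word.
rv64 = rotr(grev(0x0102030405060708, 56, 64), 32, 64)

# RV32 (assumed form): bswap (GREV 24) then rotate by 16.
rv32 = rotr(grev(0x01020304, 24, 32), 16, 32)
```

A rotate by half the width flips only the top index bit, so it cancels the top
stage of the byte swap. That is why the rotate amount has to change from 16 to
32 when moving from RV32 to RV64.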
Reviewed By: Chenbing.Zheng
Differential Revision: https://reviews.llvm.org/D120686
This patch adds MC layer support for the Zfinx extension.
Authored-by: StephenFan
Co-Authored-by: Shao-Ce Sun
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D93298
This lowers VECTOR_SPLICE of scalable vectors to a slidedown followed by
a slideup. Fixed vectors are encouraged to use the shufflevector
instruction. The equivalent patch for fixed vectors is D119039.
I've used a tail agnostic slidedown and limited the VL to only the
elements that will not be overwritten by the slideup. The slideup
uses VLMax for its VL. It unfortunately uses a tail undisturbed policy,
although that isn't required since there is no tail. We just need the
merge operand to carry the bits for the lower portion of the result.
Care was taken to ensure that either the slideup or slidedown will
be able to use a .vi instruction when the immediate is small. Which
one uses the immediate depends on the sign of the immediate.
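
The lowering described above can be modelled in Python with lists standing in
for vector registers. This is a sketch under assumptions: VLMAX is fixed to the
list length, None marks tail-agnostic lanes, and the function names are
hypothetical rather than taken from the patch.

```python
def vslidedown(merge, src, offset, vl):
    """dest[i] = src[i + offset] for i < vl; lanes at or past vl keep merge."""
    out = list(merge)
    for i in range(vl):
        out[i] = src[i + offset] if i + offset < len(src) else 0
    return out

def vslideup(merge, src, offset, vl):
    """dest[i] = src[i - offset] for offset <= i < vl; lower lanes keep merge."""
    out = list(merge)
    for i in range(offset, vl):
        out[i] = src[i - offset]
    return out

def splice_reference(v1, v2, imm):
    """Reference semantics: a window of len(v1) elements over concat(v1, v2)."""
    n = len(v1)
    start = imm if imm >= 0 else n + imm
    return (v1 + v2)[start:start + n]

def splice_lowered(v1, v2, imm):
    n = len(v1)
    down = imm if imm >= 0 else n + imm  # leading elements of v1 to drop
    keep = n - down                      # elements of v1 that survive
    undef = [None] * n                   # tail-agnostic lanes
    low = vslidedown(undef, v1, down, keep)
    return vslideup(low, v2, keep, n)    # VL = VLMAX, merge carries the low part
```

For a non-negative immediate the slidedown offset is the immediate itself, and
for a negative immediate the slideup offset is its magnitude, which matches the
note about which instruction can use a .vi form.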
Reviewed By: frasercrmck, ABataev
Differential Revision: https://reviews.llvm.org/D119303
Default type legalization will create sext_inreg+abs, but we may
not be able to remove the sext_inreg.
Instead this patch expands abs during type legalization to
Y = (sraiw X, 31); (subw (xor X, Y), Y), which doesn't require the input
to be sign extended.
This gives a big improvement for some neg-abs tests where the
abs is used more than the neg. Previously the abs was expanded
a different way before and after type legalization. Now they are
expanded in a similar way, enabling more CSE.
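
A quick Python model of the expansion shows why garbage in the upper 32 bits
of X does not matter. The helpers model the W instructions as operating on the
low 32 bits and sign extending the result; their names mirror the mnemonics,
but the code is only an illustrative sketch.

```python
def sext32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def sraiw(x, shamt):
    # Python's >> on a negative int is an arithmetic shift.
    return sext32(x) >> shamt

def subw(a, b):
    return sext32(a - b)

def abs_expanded(x):
    y = sraiw(x, 31)        # 0 if the 32-bit value is non-negative, else -1
    return subw(x ^ y, y)   # (x ^ -1) - (-1) == -x when y == -1
```

Here `abs_expanded(garbage | (v & 0xFFFFFFFF))` gives the sign-extended 32-bit
absolute value of v regardless of the garbage bits, because only sraiw and subw
observe X and both ignore its upper half.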
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D120636
vcpop and vfirst are still useful when VL=0.
vcpop is equivalent to li 0 and vfirst is equivalent to li -1,
since no mask elements are active.
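
A minimal Python model of the two instructions makes the VL=0 results
explicit. The mask is a plain list of 0/1 values and the function names simply
reuse the mnemonics; masking by v0 is left out to keep the sketch short.

```python
def vcpop(mask_bits, vl):
    """Count set mask bits among the first vl elements."""
    return sum(1 for bit in mask_bits[:vl] if bit)

def vfirst(mask_bits, vl):
    """Index of the first set mask bit among the first vl elements, or -1."""
    for index, bit in enumerate(mask_bits[:vl]):
        if bit:
            return index
    return -1
```

With `vl == 0` the loops never run, so the results are 0 and -1 whatever the
mask register holds.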
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120302
Add a new ISD opcode to represent the sign extending behavior of
fmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.
For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.
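
For reference, the shift pair that a 16-bit sext_inreg becomes can be modelled
as below and compared with the sign-extending move. XLEN = 64 is an assumption
for the example, and the function names are hypothetical stand-ins for the
instruction sequence and the new opcode.

```python
XLEN = 64  # assumed register width for this sketch

def sext_inreg16_shift_pair(x):
    """Shift left by XLEN-16, then arithmetic shift right by XLEN-16."""
    shifted = (x << (XLEN - 16)) & ((1 << XLEN) - 1)
    signed = shifted - (1 << XLEN) if shifted >> (XLEN - 1) else shifted
    return signed >> (XLEN - 16)

def fmv_x_signexth(half_bits):
    """Model of moving a half to an integer register with sign extension."""
    half_bits &= 0xFFFF
    return half_bits - 0x10000 if half_bits & 0x8000 else half_bits
```

Both produce the same value for any 16-bit pattern, so representing the move
with a sign-extending opcode lets the explicit shift pair be dropped.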
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118974