Commit Graph

96 Commits

Author SHA1 Message Date
Yeting Kuo ed9638c44b [VP][RISCV] Add vp.nearbyint and RISC-V support.
nearbyint must execute without raising floating-point exceptions.
To avoid modifying fflags, the patch adds a new machine opcode,
PseudoVFROUND_NOEXCEPT_V, that expands to vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.
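
A scalar analogue of the frflags/fsflags bracket can be sketched in portable C++ with <cfenv>. This is only an illustration of the save/round/restore idea, not the code the patch generates; the function name is hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: round to an integer value in the current rounding
// mode, then put the exception flags back the way they were before the call.
inline double roundPreservingFlags(double X) {
  std::fexcept_t Saved;
  int Rc = std::fegetexceptflag(&Saved, FE_ALL_EXCEPT); // plays the role of frflags
  assert(Rc == 0 && "could not read exception flags");

  // The volatile accesses keep the compiler from folding the rounding or
  // moving it across the flag save/restore calls.
  volatile double In = X;
  volatile double Out = std::rint(In); // may raise FE_INEXACT

  Rc = std::fesetexceptflag(&Saved, FE_ALL_EXCEPT); // plays the role of fsflags
  assert(Rc == 0 && "could not restore exception flags");
  (void)Rc;
  return Out;
}
```

After the call, any flag that rint raised (typically inexact) has been discarded, while flags that were already set beforehand are still set.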

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137685
2022-11-16 14:05:35 +08:00
Yeting Kuo 5c3ca10b09 [VP][RISCV] Add vp.bswap and RISC-V support.
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D137928
2022-11-16 11:36:38 +08:00
Craig Topper b6ad7ab89e [RISCV] Prevent autovectorization using vscale with Zvl32b.
RVVBitsPerBlock is 64. If VLen==32, VLen/RVVBitsPerBlock is 0.
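
The arithmetic behind this can be sketched as follows. Only the constant 64 and the division come from the message above; the helper name is hypothetical.

```cpp
#include <cassert>

// Value stated in the commit message.
constexpr unsigned RVVBitsPerBlock = 64;

// Hypothetical helper: the vscale implied by a VLEN given in bits.
// Integer division truncates, so VLEN == 32 yields 0.
inline unsigned vscaleForVLen(unsigned VLenBits) {
  assert(VLenBits != 0 && (VLenBits & (VLenBits - 1)) == 0 &&
         "VLEN is expected to be a power of two");
  return VLenBits / RVVBitsPerBlock;
}
```

With VLEN of 128 this gives 2; with VLEN of 32 it gives 0, which cannot describe a usable scalable vector.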

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D137280
2022-11-02 13:55:21 -07:00
Philip Reames 73482b457e [RISCV] Fix cost of legal fixed length masked load and stores
We can cost them the same way as a scalable masked load/store. By hitting the default path, we were costing them as if they were being scalarized. This is a significant overestimate.

Differential Revision: https://reviews.llvm.org/D137218
2022-11-02 07:24:38 -07:00
Yeting Kuo 71e4e35581 [VP][RISCV] Add vp.rint and RISC-V support.
FRINT uses the dynamic rounding mode instead of a static rounding mode. The patch
renames VFCVT_X_F_VL to VFCVT_RM_X_F_VL for static rounding mode uses and adds a
new ISD node VFCVT_X_F_VL that is directly selected to PseudoVFCVT_X_F_V.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D136662
2022-11-01 14:52:47 +08:00
Craig Topper e94dc58dff [RISCV] Inline scalar ceil/floor/trunc/rint/round/roundeven.
This avoids the call overhead as well as the save/restore of
fflags and the snan handling in the libm function.

The save/restore of fflags and snan handling are needed to be
correct for -ftrapping-math. I think we can ignore them in the
default environment.

The inline sequence will generate an invalid exception for nan
and an inexact exception if fractional bits are discarded.

I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.

We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.

Note the comparison constant is slightly different than glibc uses.
They use 1<<53 for double, I'm using 1<<52. I believe either is valid.
Numbers >= 1<<52 can't have any fractional bits. It's ok to do the
float->int->float conversion on numbers between 1<<52 and 1<<53 since
they will all fit in 64 bits. We only have a problem if the double can't fit
in i64.
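
A minimal C++ sketch of the guarded float->int->float sequence for floor, using the 1<<52 threshold described above. The function name is hypothetical. The C++ cast truncates toward zero, so the sketch adjusts negative inputs by hand, whereas the RISC-V conversion can select the rounding mode directly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar model of the inline sequence for floor on double.
inline double floorViaInt(double X) {
  const double Limit = 4503599627370496.0; // 2^52: no fractional bits at or above this
  // Large magnitudes, infinities and NaN fail the comparison and are returned as-is.
  if (!(std::fabs(X) < Limit))
    return X;

  int64_t I = static_cast<int64_t>(X); // always fits, since |X| < 2^52
  double T = static_cast<double>(I);
  if (T > X)                           // truncation rounded a negative value up
    T -= 1.0;
  assert(T <= X && X - T < 1.0);

  // Copy the sign of the input back so that -0.0 stays -0.0 (the fsgnj step).
  return std::copysign(T, X);
}
```

For example, floorViaInt(-0.5) returns -1.0, floorViaInt(-0.0) keeps its sign bit, and an input such as 1e300 bypasses the conversion entirely.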

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136508
2022-10-26 14:36:49 -07:00
Craig Topper d4dc036e70 [RISCV] Move vector cost table lookup out of the switch in getIntrinsicInstrCost. NFC
This allows vectors to be looked up if the switch is used for the
scalar version of an intrinsic.

Extracted from D136508.
2022-10-24 20:32:22 -07:00
Craig Topper 020450211b [RISCV] Add missing vscale x 1 cost model entries and tests.
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D136411
2022-10-21 09:05:59 -07:00
Craig Topper 44f0b13494 [RISCV] Correct RISCVTTIImpl::getRegUsageForType for vectors of pointers.
getPrimitiveSizeInBits returns 0 for pointers, we need to query
the size via DataLayout instead.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135976
2022-10-14 11:34:12 -07:00
Craig Topper de0de294eb [RISCV] Update cost of vector roundeven to match round which uses the same sequence but a different FRM value.
Reviewed By: reames, eopXD

Differential Revision: https://reviews.llvm.org/D134978
2022-09-30 20:01:35 -07:00
Philip Reames 02bfe2de7c [RISCV] Adjust vector immediate store materialization cost
This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement.

Differential Revision: https://reviews.llvm.org/D134746
2022-09-29 07:37:13 -07:00
Philip Reames 77b202f974 [RISCV] Rename getVectorImmCost to getStoreImmCost [nfc]
My original intent had been to reuse this for arithmetic instructions as well, but due to the availability of an immediate splat encoding there, we will need different heuristics.  So specialize the existing code for the store case.
2022-09-27 08:22:13 -07:00
jacquesguan ecf327f154 [RISCV] Add cost model for vector insert/extract element.
This patch adds a cost model for vector insert/extract element instructions. In RVV, we can use the vector scalar move instructions to insert or extract the first element, and use vslide to move it into place. But for a mask vector, or an i64 vector on an i32 target, we need special instruction sequences.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133007
2022-09-14 11:10:18 +08:00
Haojian Wu 7ed68182d7 Fix a -Wswitch warning. 2022-09-13 08:57:43 +02:00
jacquesguan b98b4fae75 [RISCV] Add cost model for compare and select instructions.
This patch adds a cost model for vector compare and select instructions. For vector FP compare instructions, it only adds the comparisons supported natively.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132296
2022-09-13 14:44:46 +08:00
liqinweng 9b4e75ee76 [RISCV][COST] Add cost model for mask vector select instruction when its condition is a scalar type
Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D132992
2022-09-08 18:55:49 +08:00
Craig Topper 5d30565d80 [RISCV] Improve vector fround lowering by changing FRM.
This is a follow up to D133238 which did this for ceil/floor.

Reviewed By: arcbbb, frasercrmck

Differential Revision: https://reviews.llvm.org/D133335
2022-09-06 09:33:13 -07:00
Craig Topper f0332d12ae [RISCV] Improve vector fceil/ffloor lowering by changing FRM.
This adds new VFCVT pseudoinstructions that take a rounding mode operand. A custom inserter is used to insert additional instructions to change FRM around the
VFCVT.

Some of this is borrowed from D122860, but takes a somewhat different direction. We may migrate to that patch, but for now I was trying to keep this as independent from
RVV intrinsics as I could.

A followup patch will use this approach for FROUND too.

Still need to fix the cost model.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D133238
2022-09-05 19:03:44 -07:00
jacquesguan 45c1ce321d [RISCV] Add cost model for select and integer compare instructions.
This patch adds a cost model for vector select and integer compare instructions.
2022-08-31 11:32:58 +08:00
liqinweng 72c9f811d8 [RISCV][COST] Refactor for costs of integer saturating add/sub
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132822
2022-08-30 11:39:55 +08:00
liqinweng a42e21deb8 [RISCV] Refactor for costs of integer min/max
Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D132724
2022-08-29 10:13:50 +08:00
Philip Reames a310637132 [RISCV] Disable SLP vectorization by default due to unresolved profitability issues
This change implements a TTI query with the goal of disabling SLP vectorization on RISCV. The current default configuration disables SLP already, but it's currently tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP.

Differential Revision: https://reviews.llvm.org/D132680
2022-08-26 14:11:22 -07:00
Philip Reames 53f738ce7e [RISCV] Add empirical costs for integer min/max and saturating add/sub
All of these are lowered to a single instruction for all legal vector types.
2022-08-25 09:27:17 -07:00
Philip Reames 03798f268b [RISCV] Backout cttz/ctlz instruction costs
Craig points out correctly in post-commit review that these depend on the availability of floating point extensions.
2022-08-24 15:40:48 -07:00
Philip Reames d4d6e71ea2 [RISCV] Add empirical costs for bswap/bitreverse/ctpop/ctlz/cttz
If anyone is looking for a source of ideas on vector codegen improvements, the lowerings for several of these seem to include pretty obvious fixits.
2022-08-24 15:09:21 -07:00
Philip Reames 42af1a776a [RISCV] Add empirically measured vector sqrt intrinsic costs 2022-08-24 14:27:57 -07:00
Philip Reames 4d3134866f [RISCV] Add vector fabs intrinsic costs
We have a fabs vector instruction, and are using it for current lowering.
2022-08-24 14:09:51 -07:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changed getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames 478cf94378 [X86][AArch64][WebAsm][RISCV] Query operand properties instead of using enums directly [nfc]
This is part of an ongoing transition to use OperandValueInfo which combines OperandValueKind and OperandValueProperties.  This change adds some accessor methods and uses them to simplify backend code.  The primary motivation of doing so is removing uses of the parameters so that an upcoming api change is less error prone.
2022-08-22 13:37:59 -07:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Philip Reames 59960e8db9 [RISCV] Factor out getVectorImmCost cost after 0e7ed3 [nfc] 2022-08-19 12:53:54 -07:00
Philip Reames e7fda46300 [RISCV] Correct costs for vector ceil/floor/trunc/round
Add vector costs for ceil/floor/trunc/round. As can be seen in the tests, the prior default costs were a significant underestimate of the actual code generated.

These costs are computed by simply generating code with the current backend, and then counting the number of instructions. I discount one vsetvli, and ignore the return.

Differential Revision: https://reviews.llvm.org/D131967
2022-08-19 10:37:39 -07:00
Alexey Bataev 0e7ed32c71 [SLP]Cost for a constant buildvector.
In many cases constant buildvector results in a vector load from a
constant/data pool. Need to consider this cost too.

Differential Revision: https://reviews.llvm.org/D126885
2022-08-19 08:02:42 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Philip Reames 4d87591028 [RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL
On known hardware, reductions, gather, and scatter operations have execution latencies which correlate with the vector length (VL) of the operation. Most other operations (e.g. simple arithmetic) don't correlate in this way, and instead have essentially fixed cost as VL varies.

When I implemented initial scalable cost model support for reductions, gather, and scatter operations, I used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs de facto prevent vectorization using these constructs at all.

This patch reverses course, and ties the returned cost not to the maximum possible VL, but the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants.

This does introduce the possibility of the cost for these operations being a significant underestimate on platforms where actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. Probably not in LV itself (due to the cancellation mentioned above), but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist.

Previously, we had the problem of overestimating costs causing the same problem on machines much closer to default values for vscale for tuning. With this patch, we still have that problem potentially if vscale for tuning is set high (manually), and then the code is run on a narrow VLEN machine.
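
The difference between the two choices of VL can be sketched with a toy model. Everything here is hypothetical (names, and the assumption that the VL-dependent cost is simply proportional to the element count); it is not the actual TTI code.

```cpp
#include <cassert>

// Hypothetical: number of elements in <vscale x MinElts x T> for a given vscale.
inline unsigned elementsFor(unsigned MinElts, unsigned VScale) {
  assert(VScale > 0 && "vscale must be positive");
  return MinElts * VScale;
}

// Old approach in this toy model: assume the largest vscale the target allows.
inline unsigned upperBoundVL(unsigned MinElts, unsigned MaxVScale) {
  return elementsFor(MinElts, MaxVScale);
}

// New approach in this toy model: assume the vscale the vectorizer tunes for,
// so the factor cancels when the vectorizer normalizes loop costs by it.
inline unsigned tuningVL(unsigned MinElts, unsigned VScaleForTuning) {
  return elementsFor(MinElts, VScaleForTuning);
}
```

For a type with 4 minimum elements, an assumed maximum vscale of 1024 gives 4096 elements, while an assumed tuning vscale of 2 gives 8, which is why the old costs were so much higher.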

Differential Revision: https://reviews.llvm.org/D131519
2022-08-18 13:10:03 -07:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TargetLowering had the last two InstructionCost related members, `getTypeLegalizationCost()`
and `getScalingFactorCost()`, but all other costs are processed in TTI.

For example, it is awkward to use other TTI members from the target overrides
of these two functions.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
jacquesguan 21bf59c92a [RISCV] Add cost model for mask vector extend and truncate instruction.
As extending from or truncating to a mask vector does not use the same instructions as a normal cast, this patch changes the cost to 2, which is the number of instructions we use.

Differential Revision: https://reviews.llvm.org/D131552
2022-08-11 10:55:43 +08:00
jacquesguan b6b1c0d1c4 [RISCV] Add cost model for fp-mask cast op.
The cost of converting from or to a mask vector is different from other cases, so we cannot use PowDiff to calculate it. This patch sets it to 3, as we use 3 instructions for the conversion.

Differential Revision: https://reviews.llvm.org/D131149
2022-08-10 17:14:37 +08:00
Fangrui Song de9d80c1c5 [llvm] LLVM_FALLTHROUGH => [[fallthrough]]. NFC
With C++17 there is no Clang pedantic warning or MSVC C5051.
2022-08-08 11:24:15 -07:00
jacquesguan b61cfc91ea [RISCV] Add cost modelling for vector widening reduction.
In RVV, we use vwredsum.vs and vwredsumu.vs for vecreduce.add(ext(Ty A)) if the result type's width is twice the input vector's SEW. In this situation, the cost of the extended add reduction should be the same as a single-width add reduction. The same holds for the vector float widening reduction.

Differential Revision: https://reviews.llvm.org/D129994
2022-08-04 15:31:31 +08:00
Craig Topper 9b27d13204 [RISCV] Disable constant hoisting for multiply by negated power of 2.
A mul by a negated power of 2 is a slli followed by neg. This doesn't
require any constant materialization and may be lower latency than mul.
The neg may also be foldable into other arithmetic.
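
The identity behind this can be checked in a few lines of C++. Unsigned arithmetic is used so that wraparound is well defined; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: X * -(2^K) computed as a shift followed by a negate,
// mirroring the slli + neg sequence (modulo 2^64).
inline uint64_t mulByNegPow2(uint64_t X, unsigned K) {
  assert(K < 64 && "shift amount must be smaller than the register width");
  return 0 - (X << K);
}
```

For instance, mulByNegPow2(5, 3) yields the 64-bit two's complement encoding of -40, the same bits a multiply by -8 would produce, without materializing the constant.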

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D130047
2022-07-27 09:37:59 -07:00
Lian Wang dca821d80a [RISCV] Add cost model for vector.reverse mask operation
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128784
2022-07-15 06:58:57 +00:00
Craig Topper bc0d656558 [RISCV] Fix mistake in RISCVTTIImpl::getIntImmCostInst.
zext.w requires Zba not Zbb. The test was also wrong, but had the
correct comment.
2022-07-14 16:42:35 -07:00
Philip Reames aadc9d26a3 [RISCV] Cost model for scalable reductions
This extends the existing cost model for reductions for scalable vectors.

The existing cost model assumes that reductions are roughly logarithmic in cost for unordered variants and linear for ordered ones. This change keeps that same basic model, and extends it out to the maximum number of elements a scalable vector could possibly have.

This results in costs which aren't terribly high for unordered reductions, but are for ordered ones. This seems about right; we want to strongly bias away from using scalable ordered reductions if the cost might be linear in VL.
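
The shape of that model can be sketched as below. This is a toy illustration under the stated assumption (logarithmic for unordered, linear for ordered); the names are hypothetical and the real implementation also accounts for type legalization and other details.

```cpp
#include <cassert>

// Smallest B such that 2^B >= N.
inline unsigned log2Ceil(unsigned N) {
  assert(N > 0 && N <= (1u << 31) && "element count out of range for this sketch");
  unsigned B = 0;
  while ((1u << B) < N)
    ++B;
  return B;
}

// Hypothetical cost shape: MaxElts is the largest element count the
// scalable vector could have at runtime.
inline unsigned reductionCostShape(unsigned MaxElts, bool Ordered) {
  return Ordered ? MaxElts : log2Ceil(MaxElts);
}
```

With a maximum of 1024 elements, the unordered shape is 10 while the ordered shape is 1024, which matches the intent of strongly discouraging scalable ordered reductions.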

Differential Revision: https://reviews.llvm.org/D127447
2022-06-27 12:44:38 -07:00
Philip Reames 9803b0d1e7 [RISCV] Implement getVScaleForTuning and thus prefer scalable vectorization when enabled
LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN).

The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide impact change, as it changes the practical vectorization result from fixed to scalable when both are enabled.

As an aside, I think the vectorizer is a little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended.

Differential Revision: https://reviews.llvm.org/D128547
2022-06-25 11:25:23 -07:00
Philip Reames 4710e78974 [RISCV] Implement RISCVTTIImpl::getMaxVScale correctly
The comments in the existing code appear to pre-exist the standardization of the +v extension. In particular, the specification *does* provide a bound on the maximum value VLEN can take. From what I can tell, the LMUL comment was simply a misunderstanding of what this API returns.

This API returns the maximum value that vscale can take at runtime. This is used in the vectorizer to bound the largest scalable VF (e.g. LMUL in RISCV terms) which can be used without violating memory dependence.

Differential Revision: https://reviews.llvm.org/D128538
2022-06-24 16:51:53 -07:00
Philip Reames f1b1bcdbd4 [RISCV] Replace two calls to getMinRVVVectorSizeInBits with getRealMinVLen [nfc]
This doesn't change behavior, it just makes it slightly more obvious what's
going on.  Note that getRealMinVLen is always >= getMinRVVVectorSizeInBits.

The first case is a bit tricky, as you have to know that
getMinRVVVectorSizeInBits returns 0 when not set, and thus is equivalent
to the else value clause.  The new code structure makes it more obvious we
return 0 unless using RVV for fixed length vectors.
2022-06-24 12:07:33 -07:00
Philip Reames 0aebd1d875 [RISCV] Fix crash when costing scalable gather/scatter of pointer
This was a bug introduced in d764aa. A pointer type is not a primitive type, and thus we were ending up dividing by zero when computing VLMax.

Differential Revision: https://reviews.llvm.org/D128219
2022-06-20 12:50:42 -07:00
Philip Reames ea690e7019 [RISCV] Rename VTy param of RISCVTTIImpl::getArithmeticReductionCost [NFC]
Having it be consistent with getMinMaxReductionCost for ease of copy paste outweighs the minor clarity of calling it VTy instead of Ty.
2022-06-16 15:26:09 -07:00