we follow the following rules to insert join block and join instruction
1: Legalize all the return block
when there are one more return blocks in machine function, there must be
branches, we need to reduce return blocks number down to 1
1.1: If two return blocks have common nearest parent branch, this two blocks
need to be joined, and we add a hasBeenJoined marker for this parent
branch
1.2: after we complete 1.1 process, there maybe one more return blocks, we
need to further add join block, we recursively build dominator tree for
these return blocks, first we find the nearest common dominator branch for
two return blocks, and then get dominator tree path between dominator
and each return block, we need to check this path in which whether any
other branch blocks exists, ideally, the branch block in path should have
been joined and marked, if not, this path is illegal, these two block can
not be joined
2: Insert join instructions
2.1: we scan through the MachineBasic blocks and check what blocks to insert
join instruction, below MBB represents MachineBasic Block
2.2: The MBB must have one more predecessors and its nearest dominator must
be a VBranch
2.3: Then we analyze the the predecessor of MBB, if the predecessor
has single successor, we add a join instruction to the predecessor end,
other wise, we need to insert a join block between predecessor and MBB
Because ventus riscv is designed specially for OpenCL language, we originally add or remove some language features mainly for serving OpenCL, but we now need to add customized `printf` function which is expected to be written in C, so we need also to add support for C language features in current ventus
Signed-off-by: zhoujingya <jing.zhou@terapines.com>
As the VentusRegExtInsertion pass will break the def-use chain, so it should
only run without -verify-machineinstrs and should only be run at the very end
of codegen pass.
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).
RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
Although i32 type is illegal in the backend, RV64I has pretty good support for i32 types by using W instructions.
By adding n32 to the DataLayout string, middle end optimizations will consider i32 to be a native type. One known effect of this is enabling LoopStrengthReduce on loops with i32 induction variables. This can be beneficial because C/C++ code often has loops with i32 induction variables due to the use of `int` or `unsigned int`.
If this patch exposes performance issues, those are better addressed by tuning LSR or other passes.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D116735
Enable Machine Combiner for O1/O2/O3 optimization levels. It makes RISCV
consistent with other targets running Machine Combiner.
Originally it was enabled only for -O3, however I looked through time reports
and usually it takes 0.1%-0.4% of total time, and never takes more than 1.0%.
Differential Revision: https://reviews.llvm.org/D136339
Split out from D129178, this just adds the GlobalMerge tests (other than global-merge-minsize.ll which is testing a specific configuration of the pass when it's enabled) and exposes `-riscv-enable-global-merge` and //doesn't enable it by default//.
Note that the comment "// FIXME: Unify control over GlobalMerge." is copied from the Arm and AArch64 backends, which expose the same flag. Presumably the author is imagining some later refactoring that provides a target-independent flag.
Reviewed By: craig.topper, reames, hiraditya
Differential Revision: https://reviews.llvm.org/D130481
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.
Differential Revision: https://reviews.llvm.org/D123264
If X is known positive by a dominating condition, we can fill in
ones into the upper bits of C1 if that would allow it to become an
simm12 allowing the use of ANDI.
This pattern often occurs in unrolled loops where the induction
variable has been widened.
To get the best benefit from this, I had to move the pass above
ConstantHoisting which is in addIRPasses. Otherwise the AND constant
is often hoisted away from the AND.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129888
Initial optimization is to convert (i64 (zext (i32 X))) to
(i64 (sext (i32 X))) if the dominating condition for the basic block
guaranteed the sign bit of X is zero.
This frequently occurs in loop preheaders where a signed induction
variable that can never be negative has been widened. There will be
a dominating check that the 32-bit trip count isn't negative or zero.
The check here is not restricted to that specific case though.
A i32->i64 sext is cheaper than zext on RV64 without the Zba
extension. Later optimizations can often remove the sext from the
preheader basic block because the dominating block also needs a sext to
evaluate the greater than 0 check.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D129732
This adds the macrofusion plumbing and support fusing LUI+ADDI(W).
This is similar to D73643, but handles a different case. Other cases
can be added in the future.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128393
Originally, `OptLevel` isn't passed into the `MachineFunctionPass`.
This lets the default parameter of `SelectionDAGISel`, which is
`CodeGenOpt::Default`, be passed in. OptLevelChanger captures the
optimization level with the parameter, and rather not the value
within `TargetMachine`. This lets the optimization be
unintentionally overwriten if other value than `CodeGenOpt::Default`
passed.
This patch fixes this by passing the optimization level rather
than using the default value.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D126641
When optimizing for size, this pass searches for instructions that are
prevented from being compressed by one of the following:
1. The use of a single uncompressed register.
2. A base register + offset where the offset is too large to be
compressed and the base register may or may not already be compressed.
In the first case, if there is a compressed register available, then the
uncompressed register is copied to the compressed register and its uses
replaced. This is only done if there are enough uses that code size
would be improved.
In the second case, if a compressed register is available, then the
original base register is copied and adjusted such that:
new_base_register = base_register + adjustment
base_register + large_offset = new_base_register + small_offset
and the uses of the base register are replaced with the new base
register. Again this is only done if there are enough uses for code size
to be improved.
This pass was authored by Lewis Revill, with large offset optimization
added by Craig Blackmore.
Differential Revision: https://reviews.llvm.org/D92105
Enable default outlining when the function has the minsize attribute.
`addr-label.ll` crashed after enabling this, so a barrier is added before
instruction selection as a workaround.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D122213
RISCVMachineFunctionInfo has some fields like VarArgsFrameIndex and
VarArgsSaveSize are calculated at ISel lowering stage, those info are
not contained in MIR files, that cause test cases rely on those field
can't not reproduce correctly by MIR dump files.
This patch adding the MIR read/write for those fields.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123178
Using AArch64's original implementation for reference, this patch
implements a pass to remove unneeded copies of X0. This pass runs
after register allocation and looks to see if a register is implied
to be 0 by a branch in the predecessor basic block.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118160
Function calls and compare instructions tend to cause sext.w
instructions to be inserted. If we make good use of W instructions,
these operations can often end up being redundant. We don't always
detect these during SelectionDAG due to things like phis. There also
some cases caused by failure to turn extload into sextload in
SelectionDAG. extload selects to LW allowing later sext.ws to become
redundant.
This patch adds a pass that examines the input of sext.w instructions trying
to determine if it is already sign extended. Either by finding a
W instruction, other instructions that produce a sign extended result,
or looking through instructions that propagate sign bits. It uses
a worklist and visited set to search as far back as necessary.
Reviewed By: asb, kito-cheng
Differential Revision: https://reviews.llvm.org/D116397
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
For strided accesses the loop vectorizer seems to prefer creating a
vector induction variable with a start value of the form
<i32 0, i32 1, i32 2, ...>. This value will be incremented each
loop iteration by a splat constant equal to the length of the vector.
Within the loop, arithmetic using splat values will be done on this
vector induction variable to produce indices for a vector GEP.
This pass attempts to dig through the arithmetic back to the phi
to create a new scalar induction variable and a stride. We push
all of the arithmetic out of the loop by folding it into the start,
step, and stride values. Then we create a scalar GEP to use as the
base pointer for a strided load or store using the computed stride.
Loop strength reduce will run after this pass and can do some
cleanups to the scalar GEP and induction variable.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D107790
This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.
For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D102737
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.
Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.
Differential Revision: https://reviews.llvm.org/D97732
MCTargetDesc includes headers from Utils and Utils includes headers
from MCTargetDesc. So from a library layering perspective it makes sense
for them to be in the same library. I guess the other option might be to
move the tablegen includes from RISCVMCTargetDesc.h to RISCVBaseInfo.h
so that RISCVBaseInfo.h didn't need to include RISCVMCTargetDesc.h.
Everything else that depends on Utils also depends on MCTargetDesc so
having one library seemed simpler.
Differential Revision: https://reviews.llvm.org/D93168
To support OpenCL, which typically uses SPIR as an IR, non-zero address
spaces must be accounted for. This patch makes the RISC-V target assume
no-op address space casts across the board, which effectively removes
the need to support addrspacecast instructions in the backend.
For a RISC-V implementation with different configurations or specialized
address spaces where casts aren't no-ops, the function can be adjusted
as required.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D93536
Add simple pass for removing redundant vsetvli instructions within a basic block. This handles the case where the AVL register and VTYPE immediate are the same and no other instructions that change VTYPE or VL are between them.
There are going to be more opportunities for improvement in this space as we development more complex tests.
Differential Revision: https://reviews.llvm.org/D92679
Internally the pass skips any function with the optnone attribute. But that still requires checking each function. If the opt level is set to None we might as well just skip putting in the pipeline at all. This what is already done for many of the passes added by TargetPassConfig.
Differential Revision: https://reviews.llvm.org/D92511
- The goal of this patch is improve option compatible with RISCV-V GCC,
-mcpu support on GCC side will sent patch in next few days.
- -mtune only affect the pipeline model and non-arch/extension related
target feature, e.g. instruction fusion; in td file it called
TuneFeatures, which is introduced by X86 back-end[1].
- -mtune accept all valid option for -mcpu and extra alias processor
option, e.g. `generic`, `rocket` and `sifive-7-series`, the purpose is
option compatible with RISCV-V GCC.
- Processor alias for -mtune will resolve according the current target arch,
rv32 or rv64, e.g. `rocket` will resolve to `rocket-rv32` or `rocket-rv64`.
- Interaction between -mcpu and -mtune:
* -mtune has higher priority than -mcpu for pipeline model and
TuneFeatures.
[1] https://reviews.llvm.org/D85165
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D89025