Commit Graph

117 Commits

Author SHA1 Message Date
Aries e6b7935c89 [Ventus] ABI and stack adjustment.
Remove all SGPRs(except ra) from callee saved register set, as they are mainly used in kernel function.
Unify the stack to use TP only, we will emit customized instructions for SP use which should not be
considered as stack according to LLVM codegen infrastructure(only 1 stack is allowed).
By unifying the stack to TP based, it is much easiler for the backend codegen.
2023-06-21 13:08:02 +08:00
zhoujing 513412bb33 [VENTUS][RISCV][fix] Fix building libclc errors 2023-06-16 17:42:22 +08:00
zhoujing 6636793f64 Merge libclc-vector-support 2023-06-16 09:41:08 +08:00
zhoujing c30c837caa [VENTUS][RISCV][fix] Fix SP stack size calculation error 2023-06-15 18:12:34 +08:00
zhoujing c60810b243 [VENTUS][RISCV][feat] Modify SP stack size calculation
Add initial SP stack size calculation support, still remains many issues
2023-06-12 13:27:55 +08:00
zhoujing faf6a0bcd9 [VENTUS][RISCV][fix] Add initial Tp stack size calculation
Cause there are two stacks in Ventus, we need to seperate TP stack and SP stack,
this commit just add very initial support for TP stack size calculation
2023-06-11 12:18:39 +08:00
zhoujing 033505de1d [VENTUS][RISCV][fix] Modify calling convention 2023-06-05 17:11:25 +08:00
zhoujing 967cb725c8 [VENTUS][RISCV][feat] Set ventus kernel for OpenCL kernel functions 2023-06-05 13:10:35 +08:00
zhoujingya 9d9283fa7b [VENTUS][RISCV][fix] Fix ventus abi and calling convention
Kernel functions use sp as GPRs spill stack slots
Non-kernel functions use tp as VGPRs spill stack slots
2023-04-20 15:27:52 +08:00
zhoujingya f28e6c5e38 [VENTUS][RISCV][feat] Add vararg backend support in ventus
We adjust the stack growing direction early months for OpenCL, in order to be
compatible with current architecture, we need to do some modification to
support vararg
2023-04-18 10:03:53 +08:00
Aries 438f1c92c4 Fix some build warnings 2023-01-19 09:45:27 +08:00
Aries a173844ae5 Grow Ventus GPGPU stack upwards instead of downwards 2023-01-04 10:29:53 +08:00
Aries 9925e4e511 Define callee saved registers for Ventus GPGPU.
Initially implemented 2 stacks support for sGPR spill/restore stack and per-thread stack,
but stack size calculation is computed as a sum of 2 stacks(this works but wastes lot of
spaces).
Now TP register is used as per-thread stack pointer, SP register is used for sGPR spill/restore.
Clean up RVV related stack frame code etc.
2022-12-28 16:37:38 +08:00
Aries 424ea45e4f Update Ventus GPGPU ABI: X4 as stack pointer, V0-V31 as arguments registers etc 2022-12-28 13:11:22 +08:00
Aries 228be521e5 Add initial different stack frame support for sALU and vALU.
FIXME: The stack pointer RISCV::X4 for vALU is not yet correctly used, but related infrastructure
should work(MFI.isEntryFunction() is used to check RISCV::X2 or RISCV::X4 to be used as stack pointer).
2022-12-27 18:28:51 +08:00
Aries 8c531048c2 Initially add vector load/store instruction and related codegen 2022-12-21 16:27:39 +08:00
Philip Reames 14d993435b [RISCV] Inline RISCVFrameLowering::adjustReg out of existance [nfc]
This was requested by a reviewer in D138926.
2022-11-30 11:07:45 -08:00
Philip Reames c0692c08ee [RISCV] Adjust code to fallthrough to a single adjustReg callsite [nfc]
Note that we have to now pass alignment to that callsite because the wrapper previously did that for us for fixed offsets.
2022-11-30 10:45:55 -08:00
Philip Reames 1f04ac54f9 [RISCV] Merge two versions of adjustReg on TRI [nfc]
After ac1ec9e, the version with the StackOffset param has a strict superset of behavior.  As a result, we can switch callers to use it, and then inline the other version into the now-single caller.
2022-11-30 10:12:40 -08:00
Philip Reames 80fcf992b7 [RISCV] Reuse and generalize adjustReg from another spot in frame lowering [nfc]
Differential Revision: https://reviews.llvm.org/D138926
2022-11-30 09:43:14 -08:00
Philip Reames ac1ec9e290 [RISCV] Share code for fixed offsets adjustRegs (thus materializing fewer constants)
This reuses the existing optimized implementation of adjustReg, and commons up code. This has the effect of enabling two code changes for the new caller. First, we enable the "split andi" lowering (with no alignment requirement), and second we use a sub with smaller constant in register instead of a add with negative constant in register.

Differential Revision: https://reviews.llvm.org/D132839
2022-11-30 09:28:29 -08:00
Philip Reames 1a5be5265c [RISCV] Move implementation of adjustReg from frame lowering to register info [nfc]
Putting both variants of this function in the same place, in advance of code resuse.  Note that I tweaked the API slightly in advance of additional callers without the alignment requirement.  Some of the existing callers may also be okay with weaker alignment requirements, but that should be it's own set of changes.
2022-11-28 12:41:00 -08:00
Philip Reames 06e2b44c46 [RISCV] Optimize scalable frame setup when VLEN is precisely known
If we know the exact value of VLEN, the frame offset adjustment for scalable stack slots becomes a fixed constant. This avoids the need to read vlenb, and may allow the offset to be folded into the immediate field of an add/sub.

We could go further here, and fold the offset into a single larger frame adjustment - instead of having a separate scalable adjustment step - but that requires a bit more code reorganization. I may (or may not) return to that in a future patch.

Differential Revision: https://reviews.llvm.org/D137593
2022-11-18 15:30:39 -08:00
Craig Topper 2c82080f09 [MachineFrameInfo][RISCV] Call ensureStackAlignment for objects created with scalable vector stack id.
This is an alternative to fix PR57939 for RISC-V. It definitely
can be argued that the stack temporaries for RISC-V are being created
with an unnecessarily large alignment. But ignoring the alignment
in MachineFrameInfo also seems bad.

Looking at the test update that go with the current ID==0 check,
it was intending to exclude things like the NoAlloc stackid. So I'm
not sure if scalable vectors are intentionally being excluded.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135913
2022-10-20 14:05:46 -07:00
Craig Topper 31bca38ad1 [RISCV] Pass the destination register to getVLENFactoredAmount instead of returning it. NFC
This is a refactor for another patch. For now we move the vreg
creation to the caller.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D135008
2022-10-03 10:59:35 -07:00
ZHU Zijia 9c85382ade [RISCV] Handle register spill in branch relaxation
In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed
to `jump` pseudo-instructions.

This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.

.mbb:
        foo
        j       .dest_bb
        bar
        bar
        bar
.dest_bb:
        baz

The above code will be relaxed to the following code.

.mbb:
        foo
        sd      s11, 0(sp)
        jump    .restore_bb, s11
        bar
        bar
        bar
        j       .dest_bb
.restore_bb:
        ld      s11, 0(sp)
.dest_bb:
        baz

Depends on D129999.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D130560
2022-08-24 13:27:56 +08:00
Kazu Hirata f5a68feab3 Use llvm::none_of (NFC) 2022-08-14 16:25:39 -07:00
Alex Bradbury 5ad59c9e59 [RISCV][NFCI] Set TransientStackAlignment and rely on it rather than RVV-specific logic on RVV-less functions
* TargetFrameLowering has a TransientStackAlignment field that "returns
  the number of bytes to which the stack pointer must be aligned at all
  times, even between calls.
  * As explained in the [RISC-V calling
    convention](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc),
    the stack pointer must remain fully aligned throughout execution for
    compliant code. This is important for embedded targets that might avoid
    realigning the stack pointer for interrupt service routines. Systems
    running full OSes may always realign the stack anyway.
* TransientStackAlignment is used in estimateStackSize in
  MachineFrameInfo and in PEI::calculateFrameObjectOffsets.
  * estimateStackSize is only used in the RISC-V backend for scavenging
    slots. It may be possible to craft a function where the difference
    is observable, but it wouldn't be a meaningful test.
  * calculateFrameObjectOffsets makes use of TransientStackAlignment,
    but then sets the stack alignment to the max of that alignment and
    MaxAlign, which is unconditionally set to 16 in
    RISCVFrameLowering::processFunctionBeforeFrameFinalized
  * I've changed this logic to only set MaxAlign if there are RVV frame
    objects. There should be no functional change here for either RVV
    targets (MaxAlign is set as before) or non-RVV targets
    (TransientStackAlign is now 16 anyway).

Differential Revision: https://reviews.llvm.org/D130068
2022-08-02 09:46:06 +01:00
Fraser Cormack b336cf856e [RISCV] Add early-exit to RVV stack computation. NFCI.
This patch was split off from D126465, where an early-exit is necessary
as it checks the VLEN and that asserts that V instructions are present.

Since this makes logical sense on its own, I think it's worth landing
regardless of D126465.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D129617
2022-07-13 08:50:08 +01:00
luxufan 0f45eaf0da [RISCV] Add a scavenge spill slot when use ADDI to compute scalable stack offset
Computing scalable offset needs up to two scrach registers. We add
scavenge spill slots according to the result of `RISCV::isRVVSpill`
and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`,
PEI doesn't add scavenge spill slots for scrach registers when using
ADDI to get scalable stack offsets.

The ADDI instruction has a destination register which can be used as
a scrach register. So one scavenge spil slot is sufficient for
computing scalable stack offsets.

Differential Revision: https://reviews.llvm.org/D128188
2022-07-03 20:18:13 +08:00
Yeting Kuo 5744b9cb79 [RISCV] Restore "Enable shrink wrap by default"
This reverts commit 7af3d4ab3d.

RISC-V reverted the shrink wrap patch for bug 53662. Since the bug is fixed
by D123679, the commit re-enable it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D128965
2022-07-02 11:13:13 +08:00
Craig Topper d63b66840f [RISCV] Move some methods out of RISCVInstrInfo and into RISCV namespace.
These methods don't access any state from RISCVInstrInfo. Make them
free functions in the RISCV namespace.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D127583
2022-06-12 10:47:21 -07:00
Kito Cheng 4b11f90903 [RISCV] Fix missing stack pointer recover
In order to make sure the stack point is right through the EH region,
we also need to restore stack pointer from the frame pointer if we
don't preserve stack space within prologue/epilogue for outgoing variables,
normally it's just checking the variable sized object is present or not
is enough, but we also don't preserve that at prologue/epilogue when
have vector objects in stack.

Example to show what happened:
```
try {
  sp adjust for outgoing args. // 1. Sp changed.
  func_call  // 2. Exception raised
  sp restore // Oh, not restored
} catch {
  // 3. And now we are here.
}

// 4. Prepare to return!, restore return address from stack, but...sp is wrong.
// 5. Screw up!
```

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D126861
2022-06-09 23:38:50 +08:00
Craig Topper 1b2de79ff4 [RISCV] Use two ADDIs to do some stack pointer adjustments.
If the adjustment doesn't fit in 12 bits, try to break it into
two 12 bit values before falling back to movImm+add/sub.

This is based on a similar idea from isel.

Reviewed By: luismarques, reames

Differential Revision: https://reviews.llvm.org/D126392
2022-05-31 10:25:28 -07:00
Philip Reames d58cc0839e [RISCV] reorganize getFrameIndexReference to reduce code duplication [nfc]
This change reorganizes the majority of frame index resolution into a two strep process.

    Step 1 - Select which base register we're going to use.
    Step 2 - Compute the offset from that base register.

The key point is that this allows us to share the step 2 logic for the SP case. This reduces the code duplication, and (I think) makes the code much easier to follow.

I also went ahead and added assertions into phase 2 to catch errors where we select an illegal base pointer. In general, we can't index from a base register to a stack location if that requires crossing a variable and unknown region. In practice, we have two such cases: dynamic stack realign and var sized objects. Note that crossing the scalable region is fine since while variable, it's a known variability which can be expressed in the offset.

Differential Revision: https://reviews.llvm.org/D126403
2022-05-26 09:44:58 -07:00
Philip Reames dd336b6891 [RISCV] Restructure comment and add clarifying assert to getFrameIndexReference [NFC]
Differential Revision: https://reviews.llvm.org/D126088
2022-05-25 07:59:27 -07:00
Fraser Cormack fd93736657 [RISCV] Replace untested code with assert
We found untested code where negative frame indices were ostensibly
handled despite it being in a block guarded by !MFI.isFixedObjectIndex.

While the implementation of MachineFrameInfo::isFixedObjectIndex
suggests this is possible (i.e., if a frame index was more negative - less than the
number of fixed objects), I couldn't find any test in tree -- for any
target -- where a negative frame index wasn't also a fixed object
offset. I couldn't find a way of creating such a object with the
public MachineFrameInfo creation APIs. Even
MachineFrameInfo::getObjectIndexBegin starts counting at the negative
number of fixed objects, so such frame indices wouldn't be covered by
loops using the provided begin/end methods.

Given all this, an assert that any object encountered in the block is
non-negative seems reasonable.

Reviewed By: StephenFan, kito-cheng

Differential Revision: https://reviews.llvm.org/D126278
2022-05-25 05:03:53 +01:00
Fraser Cormack 08c9fb8447 [RISCV] Ensure the entire stack is aligned to the RVV stack alignment
This patch fixes another bug in the RVV frame lowering. While some frame
objects with non-default stack IDs (such scalable-vector alloca
instructions) are considered in the target-independent max alignment
calculations, others (for example, during calling-convention lowering)
are not. This means we'd occasionally align the base of the stack to
only 16 bytes, with no way to ensure that the RVV section contained
within that is aligned to anything higher.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D125973
2022-05-24 06:58:51 +01:00
Fraser Cormack cb8681a2b3 [RISCV] Fix RVV stack frame alignment bugs
This patch addresses several alignment issues in the stack frame when
RVV objects are taken into account.

One bug is that the RVV stack was never guaranteed to keep the alignment
of the stack *as a whole*. We must maintain a 16-byte aligned stack at
all times, especially when calling other functions. With the standard V
extension, this is conveniently happening since VLEN is at least 128 and
always 16-byte aligned. However, we support Zvl64b which does not
guarantee this. To fix this, the RVV stack size is rounded up to be
aligned to 16 bytes. This in practice generally makes us allocate a
stack sized at least 2*VLEN in size, and a multiple of 2.

    |------------------------------| -- <-- FP
    | 8-byte callee-save           | |      |
    |------------------------------| |      |
    | one VLENB-sized RVV object   | |      |
    |------------------------------| |      |
    | 8-byte local variable        | |      |
    |------------------------------| -- <-- SP (must be aligned to 16)

In the example above, with Zvl64b we are decrementing SP by 12 bytes
which does not leave SP correctly aligned. We therefore introduce an
extra VLENB-sized amount used for alignment. This would therefore ensure
the total stack size was 16 bytes (48 for Zvl128b, 80 for Zvl256b, etc):

    |------------------------------| -- <-- FP
    | 8-byte callee-save           | |      |
    |------------------------------| |      |
    | one VLENB-sized padding obj  | |      |
    | one VLENB-sized RVV object   | |      |
    |------------------------------| |      |
    | 8-byte local variable        | |      |
    |------------------------------| -- <-- SP

A new RVV invariant has been introduced in this patch, which is that the
base of the RVV stack itself is now always aligned to 16 bytes, not 8 as
before. This keeps us more in line with the scalar stack and should be
easier to reason about. The calculation of the RVV padding has thus
changed to be the amount required to align the scalar local variable
section to the RVV section's alignment. This amount is further rounded
up when setting up the initial stack to keep everything aligned:

    |------------------------------| -- <-- FP
    | 8-byte callee-save           |
    |------------------------------|
    |                              |
    | RVV objects                  |
    | (aligned to at least 16)     |
    |                              |
    |------------------------------|
    | RVV padding of 8 bytes       |
    |------------------------------|
    | 8-byte local variable        |
    |------------------------------| -- <-- SP

In the example above, it's clear that we need 8 bytes of padding to keep
the RVV section aligned to 16 when using SP. But to keep SP *itself*
aligned to 16 we can't decrement the initial stack pointer by 24 - we
have to round up to 32.

With the RVV section correctly aligned, the second bug fixed by
this patch is that RVV objects themselves are now correctly aligned. We
were previously only guaranteeing an alignment of 8 bytes, even if they
required a higher alignment. This is relatively simple and in practice
we see more rounding up of VLEN amounts to account for alignment in
between objects:

    |------------------------------|
    | RVV object (aligned to 16)   |
    |------------------------------|
    | no padding necessary         |
    |------------------------------|
    | 2*VLENB RVV object (align 16)|
    |------------------------------|
    | VLENB alignment padding      |
    |------------------------------|
    | RVV object (align 32)        |
    |------------------------------|
    | 3*VLENB alignment padding    |
    |------------------------------|
    | VLENB RVV object (align 32)  |
    |------------------------------| -- <-- base of RVV section

Note that a lot of the regressions in codegen owing to the new alignment
rules are correct but actually only strictly necessary for Zvl64b (and
Zvl32b but that's not really supported). I plan a follow-up patch to
take the known VLEN into account when padding for alignment.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D125787
2022-05-24 06:53:51 +01:00
Fraser Cormack d60ae47f9d [RISCV] Fix logic for determining RVV stack padding
We must add padding when using SP or BP to access stack objects.
Checking whether we're missing FP is not sufficient as stack realignment
uses SP too. The test in D125962 explains the specific issue in more
detail.

Split from D125787.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D125964
2022-05-20 13:18:52 +01:00
Fraser Cormack f00f894d5d [RISCV][NFC] Reword split SP adjustment comments 2022-05-17 10:03:21 +01:00
Fraser Cormack 05ad4d4f38 [RISCV][NFC] Fix comment typos in split SP adjustment 2022-05-17 09:56:54 +01:00
Fraser Cormack 27c7e922fe [RISCV][NFC] Rename variable to appease code style 2022-05-11 12:41:25 +01:00
Fraser Cormack 874b802a6d [RISCV][NFC] Move variable down closer to its first use 2022-05-11 12:33:01 +01:00
luxufan e098281c27 [RISCV] Don't getDebugLoc for the end node of MBB iterator
Because of shrink wrapping, the block to insert epilog may don't have
instructions (Only debug instructions). And the position to insert may
point to MBB.end() that don't have a DebugLoc. This patch fix this
problem.

The test program was copied from the issue:https://github.com/llvm/llvm-project/issues/53662

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D123679
2022-04-30 16:00:20 +08:00
Kito Cheng 9c5aedfbf5 [RISCV] Fixing stack offset for RVV object with vararg in stack.
We found LLVM generate wrong stack offset for RVV object when stack
having variable argument, that cause by we didn't count vaarg part during
calculate RVV stack objects.

Also update the stack layout diagram for including vaarg in the diagram.

Stack layout ref:
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L3941

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D123180
2022-04-08 12:01:16 +08:00
Dimitry Andric 7af3d4ab3d Revert "[RISCV] Enable shrink wrap by default"
This reverts commit 5ebdb07e7e.

Enabling shrink wrap by default can cause assertions or crashes, and
these should first be investigated and fixed. For now, reverting the
change so it can be cherry-picked into 14.0.0 is the safest choice.
2022-02-12 19:04:12 +01:00
Hsiangkai Wang ad06e65dc4 [RISCV] Fix the bug in the register allocator caused by reserved BP.
Originally, hasRVVFrameObject() will scan all the stack objects to check
whether if there is any scalable vector object on the stack or not.
However, it causes errors in the register allocator. In issue 53016, it
returns false before RA because there is no RVV stack objects. After RA,
it returns true because there are spilling slots for RVV values during RA.
The compiler will not reserve BP during register allocation and generate BP
access in the PEI pass due to the inconsistent behavior of the function.

The function is changed to use hasStdExtV() as the return value. It is
not precise, but it can make the register allocation correct.

Refer to https://github.com/llvm/llvm-project/issues/53016.

Differential Revision: https://reviews.llvm.org/D117663
2022-01-21 01:23:01 +00:00
Hsiangkai Wang 9a88566537 [RISCV] Fix a bug in RISCVFrameLowering.
When we have out-going arguments passing through stack and we do not
reserve the stack space in the prologue. Use BP to access stack objects
after adjusting the stack pointer before function calls.

callseq_start  ->  sp = sp - reserved_space
//
// Use FP to access fixed stack objects.
// Use BP to access non-fixed stack objects.
//
call @foo
callseq_end    ->  sp = sp + reserved_space

Differential Revision: https://reviews.llvm.org/D114246
2021-11-30 10:39:35 +08:00
Hsiangkai Wang 137d3474ca [RISCV] Reverse the order of loading/storing callee-saved registers.
Currently, we restore the return address register as the last restoring
instruction in the epilog. The next instruction is `ret` usually. It is
a use of return address register. In some microarchitectures, there is
load-to-use data hazard. To avoid the load-to-use data hazard, we could
separate the load instruction from its use as far as possible. In this
patch, we reverse the order of restoring callee-saved registers to
increase the distance of `load ra` and `ret` in the epilog.

Differential Revision: https://reviews.llvm.org/D113967
2021-11-22 23:02:11 +08:00