Commit Graph

137 Commits

Author SHA1 Message Date
zhoujing 87fe5f3ce8 [VENTUS][fix] Put local variables declared in kernel function into shared memory 2024-03-05 16:32:59 +08:00
qinfan c42c00f67e [VENTUS][fix] Modified the resource statistics interface
1. The original interface is not called at the -O0 optimization level.
2. New interfaces are added to the epilogue pass.
2024-03-04 15:44:30 +08:00
zhoujingya 49c039a902
Merge pull request #89 from THU-DSP-LAB/eliminate_call_frame
[VENTUS][fix] Fix framelowering and calculation method of stack offset
2024-02-01 14:54:42 +08:00
qinfan 93c99240db [VENTUS][fix] Fix the calculation of stack size
Fix the calculation of stack size.
2023-12-25 13:26:25 +08:00
qinfan d809d3a2bd [VENTUS][fix] Fix the Offset of private variable offset on stack
Fix the offset of private variables on the stack.
2023-12-22 16:47:19 +08:00
qinfan 755797e27c [VENTUS][fix] Fix framelowering and calculation method of stack offset
1. Add VMV_V_X in emitEpilogue.
2. Change all the positive numbers added to TP to negative numbers (in LowerCall).
3. Fix the LowerCall function to generate the correct store instructions for passing function parameters.
4. Fix hasReservedCallFrame function to return false.
5. Align the convention between caller and callee in the case of passing parameters by stack.
6. Change the stack offset calculation method of TP.
7. Unify the calculation of TP stack and SP stack offset.
8. Note that the calculation of the sp offset in workitem.S must be modified manually. Since the growth direction of the stack is different from that of traditional RISC-V, it is now stipulated that, for both the SP stack and the TP stack, the data is stored where the stack pointer is not offset.
9. There is an SPAdj check in the eliminateFrameIndex function, but we don't need this value at all, so a getSPAdjust function that returns zero is added.
10. V33 holds a wrong value when parameters are pushed to the TP stack, so there must be an MV instruction to refresh V33 after ADJCALLSTACKDOWN.
2023-12-20 17:03:01 +08:00
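The conventions in items 2, 6, 7 and 8 of commit 755797e27c can be pictured with a small model of an upward-growing stack. This is only a sketch under stated assumptions, not the Ventus backend code: the `UpwardStack` type and its members are hypothetical, and the model assumes that allocating a frame raises the pointer and that frame objects are then addressed at negative offsets from it.

```cpp
#include <cstdint>

// Hypothetical model of a stack that grows toward higher addresses.
// It is not taken from the Ventus sources; it only illustrates the idea.
struct UpwardStack {
  uint32_t pointer; // current stack pointer (SP or TP), in bytes

  // Allocating a frame raises the pointer by the frame size.
  void allocate(uint32_t frameSize) { pointer += frameSize; }

  // Releasing the frame lowers the pointer by the same amount.
  void release(uint32_t frameSize) { pointer -= frameSize; }

  // Frame objects sit below the raised pointer, so they are reached
  // with a negative offset: pointer - bytesBelowPointer.
  uint32_t addressOf(uint32_t bytesBelowPointer) const {
    return pointer - bytesBelowPointer;
  }
};
```

With this model, a frame of 16 bytes allocated at 0x1000 moves the pointer to 0x1010, and the object 4 bytes below the pointer lives at 0x100C.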
qinfan e35b2e4fed [VENTUS][fix] Distinguish the resource usage of each kernel function
Distinguish the resource usage of each kernel function in the same source file.
2023-12-14 17:18:20 +08:00
qinfan 304a2c1284 [VENTUS][fix] Fix the mechanism of statistical register resources
1. Fix the bug of repeated calculation of register resources.
2. Add resource calculation with stack register.
2023-11-28 11:29:13 +08:00
zhoujingya f9a20984b5 [VENTUS][fix] Comment out illegal fmv.w.x instruction and change vmv instructions' format
https://github.com/THU-DSP-LAB/llvm-project/issues/30
2023-10-09 14:04:55 +08:00
zhoujingya dc3ffe70cf [VENTUS][fix] Fix getStackSize calculation bugs 2023-09-14 15:52:08 +08:00
zhoujing 6491bdfb02 Revert "[VENTUS][fix] No need to spill/restore callee saved registers for kernel function"
This reverts commit 85df9000bb.
2023-08-27 15:47:26 +08:00
zhoujing 85df9000bb [VENTUS][fix] No need to spill/restore callee saved registers for kernel function 2023-08-21 14:44:45 +08:00
zhoujing 826c4cb599 Revert "[VENTUS][fix] Insert barrier instruction for function calling"
This reverts commit 7e4b7a6ae1.
2023-08-16 14:50:42 +08:00
zhoujing 50b23dc21a [VENTUS][fix] Deprecating vmv.s.x and use vmv.v.x instead
As required, the vmv.s.x instruction may later be deprecated
2023-08-01 13:25:24 +08:00
zhoujing 7e4b7a6ae1 [VENTUS][fix] Insert barrier instruction for function calling
Stack space is shared between different warps. If two warps are executing
different functions, their accesses to the return address will conflict,
so the warp that executes faster cannot find its return address.
We therefore add a barrier instruction after the lw and before the ret
to ensure that the warps have the same scope of the sp pointer.
2023-07-31 11:01:14 +08:00
zhoujing 98474922a4 [VENTUS][fix] Add LDS/PDS calculation
The local data declaration calculation still needs to be fixed later
2023-07-27 11:58:31 +08:00
zhoujing 623ca8b4ba [VENTUS][RISCV][fix] Fix stack size calculation bug 2023-07-21 18:02:33 +08:00
zhoujing 24dbcd9b0e [VENTUS][RISCV][NFC] Define interfaces for VENTUS
Our previous design has two stacks, TP and SP, but we only need to store ra to sp
and restore it from sp, which makes it inconvenient to calculate stack offsets for
two stack frames. Here we only define the interfaces without really implementing
them. If they are needed, we must remove the callee-saved registers and modify
the related overridden functions.
2023-06-28 11:19:29 +08:00
zhoujing 7b8402802a [VENTUS][RISCV][fix] Fix calling convention 2023-06-25 22:03:04 +08:00
zhoujing f494e20d44 [VENTUS][RISCV][fix] Fix private memory access instructions' codegen errors
We changed the private memory access encoding in commit `6da666856b`;
this commit fixes the codegen bugs introduced by that commit
2023-06-25 10:59:21 +08:00
Aries e6b7935c89 [Ventus] ABI and stack adjustment.
Remove all SGPRs (except ra) from the callee saved register set, as they are mainly used in kernel functions.
Unify the stack to use TP only; we will emit customized instructions for SP use, which should not be
considered a stack according to the LLVM codegen infrastructure (only 1 stack is allowed).
With the stack unified on TP, backend codegen is much easier.
2023-06-21 13:08:02 +08:00
zhoujing 513412bb33 [VENTUS][RISCV][fix] Fix building libclc errors 2023-06-16 17:42:22 +08:00
zhoujing 6636793f64 Merge libclc-vector-support 2023-06-16 09:41:08 +08:00
zhoujing c30c837caa [VENTUS][RISCV][fix] Fix SP stack size calculation error 2023-06-15 18:12:34 +08:00
zhoujing c60810b243 [VENTUS][RISCV][feat] Modify SP stack size calculation
Add initial SP stack size calculation support; many issues still remain
2023-06-12 13:27:55 +08:00
zhoujing faf6a0bcd9 [VENTUS][RISCV][fix] Add initial Tp stack size calculation
Because there are two stacks in Ventus, we need to separate the TP stack and the SP stack.
This commit just adds very initial support for TP stack size calculation
2023-06-11 12:18:39 +08:00
zhoujing 033505de1d [VENTUS][RISCV][fix] Modify calling convention 2023-06-05 17:11:25 +08:00
zhoujing 967cb725c8 [VENTUS][RISCV][feat] Set ventus kernel for OpenCL kernel functions 2023-06-05 13:10:35 +08:00
zhoujingya 9d9283fa7b [VENTUS][RISCV][fix] Fix ventus abi and calling convention
Kernel functions use sp as GPRs spill stack slots
Non-kernel functions use tp as VGPRs spill stack slots
2023-04-20 15:27:52 +08:00
zhoujingya f28e6c5e38 [VENTUS][RISCV][feat] Add vararg backend support in ventus
We adjusted the stack growth direction some months ago for OpenCL. To stay
compatible with the current architecture, we need to make some modifications to
support vararg
2023-04-18 10:03:53 +08:00
Aries 438f1c92c4 Fix some build warnings 2023-01-19 09:45:27 +08:00
Aries a173844ae5 Grow Ventus GPGPU stack upwards instead of downwards 2023-01-04 10:29:53 +08:00
Aries 9925e4e511 Define callee saved registers for Ventus GPGPU.
Initially implemented support for 2 stacks, an sGPR spill/restore stack and a per-thread stack,
but the stack size was computed as the sum of the 2 stacks (this works but wastes a lot of
space).
Now TP register is used as per-thread stack pointer, SP register is used for sGPR spill/restore.
Clean up RVV related stack frame code etc.
2022-12-28 16:37:38 +08:00
Aries 424ea45e4f Update Ventus GPGPU ABI: X4 as stack pointer, V0-V31 as arguments registers etc 2022-12-28 13:11:22 +08:00
Aries 228be521e5 Add initial different stack frame support for sALU and vALU.
FIXME: The stack pointer RISCV::X4 for vALU is not yet correctly used, but the related infrastructure
should work (MFI.isEntryFunction() is used to check whether RISCV::X2 or RISCV::X4 is used as the stack pointer).
2022-12-27 18:28:51 +08:00
Aries 8c531048c2 Initially add vector load/store instruction and related codegen 2022-12-21 16:27:39 +08:00
Philip Reames 14d993435b [RISCV] Inline RISCVFrameLowering::adjustReg out of existence [nfc]
This was requested by a reviewer in D138926.
2022-11-30 11:07:45 -08:00
Philip Reames c0692c08ee [RISCV] Adjust code to fallthrough to a single adjustReg callsite [nfc]
Note that we have to now pass alignment to that callsite because the wrapper previously did that for us for fixed offsets.
2022-11-30 10:45:55 -08:00
Philip Reames 1f04ac54f9 [RISCV] Merge two versions of adjustReg on TRI [nfc]
After ac1ec9e, the version with the StackOffset param has a strict superset of behavior.  As a result, we can switch callers to use it, and then inline the other version into the now-single caller.
2022-11-30 10:12:40 -08:00
Philip Reames 80fcf992b7 [RISCV] Reuse and generalize adjustReg from another spot in frame lowering [nfc]
Differential Revision: https://reviews.llvm.org/D138926
2022-11-30 09:43:14 -08:00
Philip Reames ac1ec9e290 [RISCV] Share code for fixed offsets adjustRegs (thus materializing fewer constants)
This reuses the existing optimized implementation of adjustReg, and commons up code. This has the effect of enabling two code changes for the new caller. First, we enable the "split andi" lowering (with no alignment requirement), and second we use a sub with a smaller constant in a register instead of an add with a negative constant in a register.

Differential Revision: https://reviews.llvm.org/D132839
2022-11-30 09:28:29 -08:00
Philip Reames 1a5be5265c [RISCV] Move implementation of adjustReg from frame lowering to register info [nfc]
Putting both variants of this function in the same place, in advance of code reuse.  Note that I tweaked the API slightly in advance of additional callers without the alignment requirement.  Some of the existing callers may also be okay with weaker alignment requirements, but that should be its own set of changes.
2022-11-28 12:41:00 -08:00
Philip Reames 06e2b44c46 [RISCV] Optimize scalable frame setup when VLEN is precisely known
If we know the exact value of VLEN, the frame offset adjustment for scalable stack slots becomes a fixed constant. This avoids the need to read vlenb, and may allow the offset to be folded into the immediate field of an add/sub.

We could go further here, and fold the offset into a single larger frame adjustment - instead of having a separate scalable adjustment step - but that requires a bit more code reorganization. I may (or may not) return to that in a future patch.

Differential Revision: https://reviews.llvm.org/D137593
2022-11-18 15:30:39 -08:00
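The idea in commit 06e2b44c46 can be sketched in a few lines. This is a simplified illustration, not the LLVM implementation: `scalableBytes` and `fitsInSImm12` are hypothetical helper names, and the sketch assumes the scalable amount is expressed in units where 8 corresponds to one vector register of VLEN/8 bytes.

```cpp
#include <cstdint>

// Hypothetical helper: turn a scalable stack amount into a byte count
// once VLEN (in bits) is known exactly. Assumes `scalableAmount` is a
// multiple of 8, with 8 standing for one vector register (vlenb bytes).
constexpr int64_t scalableBytes(int64_t scalableAmount, int64_t vlenBits) {
  const int64_t vlenb = vlenBits / 8;          // bytes per vector register
  const int64_t numRegs = scalableAmount / 8;  // whole vector registers
  return numRegs * vlenb;                      // now a compile-time constant
}

// True when the value fits the 12-bit signed immediate of addi.
constexpr bool fitsInSImm12(int64_t value) {
  return value >= -2048 && value <= 2047;
}
```

For example, with VLEN = 128 a scalable amount of 16 becomes 32 bytes, which fits the immediate of a single add/sub, so no read of vlenb is needed.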
Craig Topper 2c82080f09 [MachineFrameInfo][RISCV] Call ensureStackAlignment for objects created with scalable vector stack id.
This is an alternative to fix PR57939 for RISC-V. It definitely
can be argued that the stack temporaries for RISC-V are being created
with an unnecessarily large alignment. But ignoring the alignment
in MachineFrameInfo also seems bad.

Looking at the test updates that go with the current ID==0 check,
it was intending to exclude things like the NoAlloc stackid. So I'm
not sure if scalable vectors are intentionally being excluded.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135913
2022-10-20 14:05:46 -07:00
Craig Topper 31bca38ad1 [RISCV] Pass the destination register to getVLENFactoredAmount instead of returning it. NFC
This is a refactor for another patch. For now we move the vreg
creation to the caller.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D135008
2022-10-03 10:59:35 -07:00
ZHU Zijia 9c85382ade [RISCV] Handle register spill in branch relaxation
In branch relaxation pass, `j`'s with offset over 1MiB will be relaxed
to `jump` pseudo-instructions.

This patch allocates a stack slot for functions with a size greater than
1MiB. If the register scavenger cannot find a scratch register for
`jump`, spill a register to the slot before the jump and restore it
after the jump.

.mbb:
        foo
        j       .dest_bb
        bar
        bar
        bar
.dest_bb:
        baz

The above code will be relaxed to the following code.

.mbb:
        foo
        sd      s11, 0(sp)
        jump    .restore_bb, s11
        bar
        bar
        bar
        j       .dest_bb
.restore_bb:
        ld      s11, 0(sp)
.dest_bb:
        baz

Depends on D129999.

Reviewed By: StephenFan

Differential Revision: https://reviews.llvm.org/D130560
2022-08-24 13:27:56 +08:00
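The decision described in commit 9c85382ade can be sketched as a range check. This is a hedged illustration rather than the code of the branch relaxation pass: `fitsInJ`, `selectForm` and `needsSpillSlot` are hypothetical names, the range used is the 21-bit signed, 2-byte-aligned offset of the RISC-V `jal` encoding, and the 1 MiB function-size threshold is taken directly from the commit message.

```cpp
#include <cstdint>

constexpr int64_t OneMiB = int64_t{1} << 20;

// True when a byte offset is reachable by `j` (jal x0): a signed,
// even offset in the range [-1 MiB, 1 MiB).
constexpr bool fitsInJ(int64_t offset) {
  return offset >= -OneMiB && offset < OneMiB && offset % 2 == 0;
}

enum class BranchForm { ShortJ, LongJump };

// Pick the short `j` when the target is in range, otherwise the
// `jump` pseudo-instruction, which needs a scratch register.
constexpr BranchForm selectForm(int64_t offset) {
  return fitsInJ(offset) ? BranchForm::ShortJ : BranchForm::LongJump;
}

// Per the commit message, a spill slot is reserved only for functions
// larger than 1 MiB, since only those can contain out-of-range jumps.
constexpr bool needsSpillSlot(int64_t functionSizeBytes) {
  return functionSizeBytes > OneMiB;
}
```

The slot is only used when the register scavenger finds no free scratch register for `jump`, as in the `s11` example above.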
Kazu Hirata f5a68feab3 Use llvm::none_of (NFC) 2022-08-14 16:25:39 -07:00
Alex Bradbury 5ad59c9e59 [RISCV][NFCI] Set TransientStackAlignment and rely on it rather than RVV-specific logic on RVV-less functions
* TargetFrameLowering has a TransientStackAlignment field that "returns
  the number of bytes to which the stack pointer must be aligned at all
  times, even between calls."
  * As explained in the [RISC-V calling
    convention](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc),
    the stack pointer must remain fully aligned throughout execution for
    compliant code. This is important for embedded targets that might avoid
    realigning the stack pointer for interrupt service routines. Systems
    running full OSes may always realign the stack anyway.
* TransientStackAlignment is used in estimateStackSize in
  MachineFrameInfo and in PEI::calculateFrameObjectOffsets.
  * estimateStackSize is only used in the RISC-V backend for scavenging
    slots. It may be possible to craft a function where the difference
    is observable, but it wouldn't be a meaningful test.
  * calculateFrameObjectOffsets makes use of TransientStackAlignment,
    but then sets the stack alignment to the max of that alignment and
    MaxAlign, which is unconditionally set to 16 in
    RISCVFrameLowering::processFunctionBeforeFrameFinalized
  * I've changed this logic to only set MaxAlign if there are RVV frame
    objects. There should be no functional change here for either RVV
    targets (MaxAlign is set as before) or non-RVV targets
    (TransientStackAlign is now 16 anyway).

Differential Revision: https://reviews.llvm.org/D130068
2022-08-02 09:46:06 +01:00
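The alignment logic discussed in commit 5ad59c9e59 can be sketched as follows. This is an assumption-laden illustration and not the PEI code: `alignTo` here is a local helper defined for the example, `effectiveAlign` is a hypothetical name, and alignments are assumed to be powers of two.

```cpp
#include <algorithm>
#include <cstdint>

// Round `size` up to the next multiple of `align` (a power of two).
constexpr uint64_t alignTo(uint64_t size, uint64_t align) {
  return (size + align - 1) & ~(align - 1);
}

// Hypothetical helper: the stack alignment actually used is the larger
// of the transient alignment and the maximum alignment of frame objects.
constexpr uint64_t effectiveAlign(uint64_t transientAlign,
                                  uint64_t maxObjectAlign) {
  return std::max(transientAlign, maxObjectAlign);
}
```

With a transient alignment of 16, a 20-byte frame rounds up to 32 bytes whether or not MaxAlign is also forced to 16, which is why the change is expected to have no functional effect.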
Fraser Cormack b336cf856e [RISCV] Add early-exit to RVV stack computation. NFCI.
This patch was split off from D126465, where an early-exit is necessary
as it checks the VLEN and that asserts that V instructions are present.

Since this makes logical sense on its own, I think it's worth landing
regardless of D126465.

Reviewed By: kito-cheng

Differential Revision: https://reviews.llvm.org/D129617
2022-07-13 08:50:08 +01:00
luxufan 0f45eaf0da [RISCV] Add a scavenge spill slot when use ADDI to compute scalable stack offset
Computing a scalable offset needs up to two scratch registers. We add
scavenge spill slots according to the result of `RISCV::isRVVSpill`
and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`,
PEI doesn't add scavenge spill slots for scratch registers when using
ADDI to get scalable stack offsets.

The ADDI instruction has a destination register which can be used as
a scratch register. So one scavenge spill slot is sufficient for
computing scalable stack offsets.

Differential Revision: https://reviews.llvm.org/D128188
2022-07-03 20:18:13 +08:00