* TargetFrameLowering has a TransientStackAlignment field that "returns
the number of bytes to which the stack pointer must be aligned at all
times, even between calls.
* As explained in the [RISC-V calling
convention](https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc),
the stack pointer must remain fully aligned throughout execution for
compliant code. This is important for embedded targets that might avoid
realigning the stack pointer for interrupt service routines. Systems
running full OSes may always realign the stack anyway.
* TransientStackAlignment is used in estimateStackSize in
MachineFrameInfo and in PEI::calculateFrameObjectOffsets.
* estimateStackSize is only used in the RISC-V backend for scavenging
slots. It may be possible to craft a function where the difference
is observable, but it wouldn't be a meaningful test.
* calculateFrameObjectOffsets makes use of TransientStackAlignment,
but then sets the stack alignment to the max of that alignment and
MaxAlign, which is unconditionally set to 16 in
RISCVFrameLowering::processFunctionBeforeFrameFinalized
* I've changed this logic to only set MaxAlign if there are RVV frame
objects. There should be no functional change here for either RVV
targets (MaxAlign is set as before) or non-RVV targets
(TransientStackAlign is now 16 anyway).
Differential Revision: https://reviews.llvm.org/D130068
This patch was split off from D126465, where an early-exit is necessary
as it checks the VLEN and that asserts that V instructions are present.
Since this makes logical sense on its own, I think it's worth landing
regardless of D126465.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D129617
Computing scalable offset needs up to two scrach registers. We add
scavenge spill slots according to the result of `RISCV::isRVVSpill`
and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`,
PEI doesn't add scavenge spill slots for scrach registers when using
ADDI to get scalable stack offsets.
The ADDI instruction has a destination register which can be used as
a scrach register. So one scavenge spil slot is sufficient for
computing scalable stack offsets.
Differential Revision: https://reviews.llvm.org/D128188
This reverts commit 7af3d4ab3d.
RISC-V reverted the shrink wrap patch for bug 53662. Since the bug is fixed
by D123679, the commit re-enable it.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128965
These methods don't access any state from RISCVInstrInfo. Make them
free functions in the RISCV namespace.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D127583
In order to make sure the stack point is right through the EH region,
we also need to restore stack pointer from the frame pointer if we
don't preserve stack space within prologue/epilogue for outgoing variables,
normally it's just checking the variable sized object is present or not
is enough, but we also don't preserve that at prologue/epilogue when
have vector objects in stack.
Example to show what happened:
```
try {
sp adjust for outgoing args. // 1. Sp changed.
func_call // 2. Exception raised
sp restore // Oh, not restored
} catch {
// 3. And now we are here.
}
// 4. Prepare to return!, restore return address from stack, but...sp is wrong.
// 5. Screw up!
```
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D126861
If the adjustment doesn't fit in 12 bits, try to break it into
two 12 bit values before falling back to movImm+add/sub.
This is based on a similar idea from isel.
Reviewed By: luismarques, reames
Differential Revision: https://reviews.llvm.org/D126392
This change reorganizes the majority of frame index resolution into a two strep process.
Step 1 - Select which base register we're going to use.
Step 2 - Compute the offset from that base register.
The key point is that this allows us to share the step 2 logic for the SP case. This reduces the code duplication, and (I think) makes the code much easier to follow.
I also went ahead and added assertions into phase 2 to catch errors where we select an illegal base pointer. In general, we can't index from a base register to a stack location if that requires crossing a variable and unknown region. In practice, we have two such cases: dynamic stack realign and var sized objects. Note that crossing the scalable region is fine since while variable, it's a known variability which can be expressed in the offset.
Differential Revision: https://reviews.llvm.org/D126403
We found untested code where negative frame indices were ostensibly
handled despite it being in a block guarded by !MFI.isFixedObjectIndex.
While the implementation of MachineFrameInfo::isFixedObjectIndex
suggests this is possible (i.e., if a frame index was more negative - less than the
number of fixed objects), I couldn't find any test in tree -- for any
target -- where a negative frame index wasn't also a fixed object
offset. I couldn't find a way of creating such a object with the
public MachineFrameInfo creation APIs. Even
MachineFrameInfo::getObjectIndexBegin starts counting at the negative
number of fixed objects, so such frame indices wouldn't be covered by
loops using the provided begin/end methods.
Given all this, an assert that any object encountered in the block is
non-negative seems reasonable.
Reviewed By: StephenFan, kito-cheng
Differential Revision: https://reviews.llvm.org/D126278
This patch fixes another bug in the RVV frame lowering. While some frame
objects with non-default stack IDs (such scalable-vector alloca
instructions) are considered in the target-independent max alignment
calculations, others (for example, during calling-convention lowering)
are not. This means we'd occasionally align the base of the stack to
only 16 bytes, with no way to ensure that the RVV section contained
within that is aligned to anything higher.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D125973
This patch addresses several alignment issues in the stack frame when
RVV objects are taken into account.
One bug is that the RVV stack was never guaranteed to keep the alignment
of the stack *as a whole*. We must maintain a 16-byte aligned stack at
all times, especially when calling other functions. With the standard V
extension, this is conveniently happening since VLEN is at least 128 and
always 16-byte aligned. However, we support Zvl64b which does not
guarantee this. To fix this, the RVV stack size is rounded up to be
aligned to 16 bytes. This in practice generally makes us allocate a
stack sized at least 2*VLEN in size, and a multiple of 2.
|------------------------------| -- <-- FP
| 8-byte callee-save | | |
|------------------------------| | |
| one VLENB-sized RVV object | | |
|------------------------------| | |
| 8-byte local variable | | |
|------------------------------| -- <-- SP (must be aligned to 16)
In the example above, with Zvl64b we are decrementing SP by 12 bytes
which does not leave SP correctly aligned. We therefore introduce an
extra VLENB-sized amount used for alignment. This would therefore ensure
the total stack size was 16 bytes (48 for Zvl128b, 80 for Zvl256b, etc):
|------------------------------| -- <-- FP
| 8-byte callee-save | | |
|------------------------------| | |
| one VLENB-sized padding obj | | |
| one VLENB-sized RVV object | | |
|------------------------------| | |
| 8-byte local variable | | |
|------------------------------| -- <-- SP
A new RVV invariant has been introduced in this patch, which is that the
base of the RVV stack itself is now always aligned to 16 bytes, not 8 as
before. This keeps us more in line with the scalar stack and should be
easier to reason about. The calculation of the RVV padding has thus
changed to be the amount required to align the scalar local variable
section to the RVV section's alignment. This amount is further rounded
up when setting up the initial stack to keep everything aligned:
|------------------------------| -- <-- FP
| 8-byte callee-save |
|------------------------------|
| |
| RVV objects |
| (aligned to at least 16) |
| |
|------------------------------|
| RVV padding of 8 bytes |
|------------------------------|
| 8-byte local variable |
|------------------------------| -- <-- SP
In the example above, it's clear that we need 8 bytes of padding to keep
the RVV section aligned to 16 when using SP. But to keep SP *itself*
aligned to 16 we can't decrement the initial stack pointer by 24 - we
have to round up to 32.
With the RVV section correctly aligned, the second bug fixed by
this patch is that RVV objects themselves are now correctly aligned. We
were previously only guaranteeing an alignment of 8 bytes, even if they
required a higher alignment. This is relatively simple and in practice
we see more rounding up of VLEN amounts to account for alignment in
between objects:
|------------------------------|
| RVV object (aligned to 16) |
|------------------------------|
| no padding necessary |
|------------------------------|
| 2*VLENB RVV object (align 16)|
|------------------------------|
| VLENB alignment padding |
|------------------------------|
| RVV object (align 32) |
|------------------------------|
| 3*VLENB alignment padding |
|------------------------------|
| VLENB RVV object (align 32) |
|------------------------------| -- <-- base of RVV section
Note that a lot of the regressions in codegen owing to the new alignment
rules are correct but actually only strictly necessary for Zvl64b (and
Zvl32b but that's not really supported). I plan a follow-up patch to
take the known VLEN into account when padding for alignment.
Reviewed By: StephenFan
Differential Revision: https://reviews.llvm.org/D125787
We must add padding when using SP or BP to access stack objects.
Checking whether we're missing FP is not sufficient as stack realignment
uses SP too. The test in D125962 explains the specific issue in more
detail.
Split from D125787.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D125964
Because of shrink wrapping, the block to insert epilog may don't have
instructions (Only debug instructions). And the position to insert may
point to MBB.end() that don't have a DebugLoc. This patch fix this
problem.
The test program was copied from the issue:https://github.com/llvm/llvm-project/issues/53662
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D123679
This reverts commit 5ebdb07e7e.
Enabling shrink wrap by default can cause assertions or crashes, and
these should first be investigated and fixed. For now, reverting the
change so it can be cherry-picked into 14.0.0 is the safest choice.
Originally, hasRVVFrameObject() will scan all the stack objects to check
whether if there is any scalable vector object on the stack or not.
However, it causes errors in the register allocator. In issue 53016, it
returns false before RA because there is no RVV stack objects. After RA,
it returns true because there are spilling slots for RVV values during RA.
The compiler will not reserve BP during register allocation and generate BP
access in the PEI pass due to the inconsistent behavior of the function.
The function is changed to use hasStdExtV() as the return value. It is
not precise, but it can make the register allocation correct.
Refer to https://github.com/llvm/llvm-project/issues/53016.
Differential Revision: https://reviews.llvm.org/D117663
When we have out-going arguments passing through stack and we do not
reserve the stack space in the prologue. Use BP to access stack objects
after adjusting the stack pointer before function calls.
callseq_start -> sp = sp - reserved_space
//
// Use FP to access fixed stack objects.
// Use BP to access non-fixed stack objects.
//
call @foo
callseq_end -> sp = sp + reserved_space
Differential Revision: https://reviews.llvm.org/D114246
Currently, we restore the return address register as the last restoring
instruction in the epilog. The next instruction is `ret` usually. It is
a use of return address register. In some microarchitectures, there is
load-to-use data hazard. To avoid the load-to-use data hazard, we could
separate the load instruction from its use as far as possible. In this
patch, we reverse the order of restoring callee-saved registers to
increase the distance of `load ra` and `ret` in the epilog.
Differential Revision: https://reviews.llvm.org/D113967
Add new hasVInstructions() which is currently equivalent.
Replace vector uses of hasStdExtZfh/F/D with new vector specific
versions. The vector spec no longer requires that the vectors implement the
same types as scalar. It only requires that the scalar type is
the maximum size the vectors can support. This is currently
implemented using the scalar rule we were using before.
Add new hasVInstructionsI64() begin using to qualify code that
requires i64 vector elements.
This is all NFC for now, but we can start using this to better
implement D112408 which introduces the Zve extensions.
Reviewed By: frasercrmck, eopXD
Differential Revision: https://reviews.llvm.org/D112496
When using FP to access stack objects, the scalable stack objects will
be put at the lower end of the frame. It looks like
```
|-------------------| <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|
| RVV local vars |
|-------------------| <-- SP
```
If there are scalar arguments that need to pass through memory and there
are vector objects on the stack using FP to access. The outgoing scalar
arguments will overwrite the vector objects. It looks like
```
|-------------------| <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------| |-------------------|
| RVV local vars | | outgoing args | <- outgoing arguments
|-------------------| <-- SP |-------------------| overwrite from here.
```
In this patch, we reserve the stack for the outgoing arguments before
function calls if using FP to access and there are scalable vector frame
objects. It looks like
```
|-------------------| <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|
| RVV local vars |
|-------------------|
| outgoing args |
|-------------------| <-- SP
```
Differential Revision: https://reviews.llvm.org/D103622
This patch addresses an issue in which fixed-length (VLS) vector RVV
code could fail to reserve an emergency spill slot for their frame index
elimination. This is because we were previously only reserving a spill
slot when there were `scalable-vector` frame indices being used.
However, fixed-length codegen uses regular-type frame indices if it
needs to spill.
This patch does the fairly brute-force method of checking ahead of time
whether the function contains any RVV spill instructions, in which case
it reserves one slot. Note that the second RVV slot is still only
reserved for `scalable-vector` frame indices.
This unfortunately causes quite a bit of churn in existing tests, where
we chop and change stack offsets for spill slots.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103269
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
Since ca5f07f8c4 already reverted
the cause for this warning, this commit now causes warnings about
a default label in a switch that covers the enum.
This reverts commit cf2eeb114c.
The MachineBasicBlock::iterator is continuously changing during
generating the frame handling instructions. We should use the DebugLoc
from the caller, instead of getting it from the changing iterator.
If the prologue instructions located in a basic block without any other
instructions after these prologue instructions, the iterator will be
updated to the boundary of the basic block and it is invalid to use the
iterator to access DebugLoc. This patch also fixes the crash when
accessing DebugLoc using the iterator.
Differential Revision: https://reviews.llvm.org/D102386
When rvv vector objects existed, using sp to access the fixed stack object will pass the rvv vector objects field. So the StackOffset needs add a scalable offset of the size of rvv vector objects field
Differential Revision: https://reviews.llvm.org/D100286
This patch adds an additional emergency spill slot to RVV code. This is
required as RVV stack offsets may require an additional register to compute.
This patch includes an optimization by @HsiangKai <kai.wang@sifive.com>
to reduce the number of registers required for the computation of stack
offsets from 3 to 2. Otherwise we'd need two additional emergency spill
slots.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D100574
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.
This patch attempts to clarify the situation by separating them and introducing
new names:
- shouldRealignStack - true if there is any reason the stack should be
realigned
- canRealignStack - true if we are still able to realign the stack (e.g. we
can still reserve/have reserved a frame pointer)
- hasStackRealignment = shouldRealignStack && canRealignStack (not target
customisable)
Targets can now override shouldRealignStack to indicate that stack realignment
is required.
This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).
Differential Revision: https://reviews.llvm.org/D98716
Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
In D97111 we changed the RVV frame layout when using sp or bp to address
the stack slots so we could address the emergency stack slot. The idea
is to put the RVV objects as far as possible (in offset terms) from the
frame reference register (sp / fp / bp).
When using fp this happens naturally because the RVV objects are already
the top of the stack and due to the constraints of RVV (VLENB being a
power of two >= 128) the stack remains aligned. The rest of this summary
does not apply to this case.
When using sp / bp we need to skip the non-RVV stack slots. The size of
the the non-RVV objects is computed subtracting the callee saved
register size (whose computation is added in D97111 itself) to the total
size of the stack (which does not account for RVV stack slots). However,
when doing so we round to 16 bytes when computing that size and we end
emitting a smaller offset that may belong to a scalar stack slot (see
D98801). So this change removes that rounding.
Also, because we want the RVV objects be between the non-RVV stack slots
and the callee-saved register slots, we need to make sure the RVV
objects are properly aligned to 8 bytes. Adding a padding of 8 would
render the stack unaligned. So when allocating space for RVV (only when
we don't use fp) we need to have extra padding that preserves the stack
alignment. This way we can round to 8 bytes the offset that skips the
non-RVV objects and we do not misalign the whole stack in the way. In
some circumstances this means that the RVV objects may have padding
before (=lower offsets from sp/bp) and after (before the CSR stack
slots).
Differential Revision: https://reviews.llvm.org/D98802
This patch change the rvv frame layout that proposed in D94465. In patch D94465, In the eliminateFrameIndex function,
to eliminate the rvv frame index, create temp virtual register is needed. This virtual register should be scavenged by class
RegsiterScavenger. If the machine function has other unused registers, there is no problem. But if there isn't unused registers,
we need a emergency spill slot. Because of the emergency spill slot belongs to the scalar local variables field, to access emergency
spill slot, we need a temp virtual register again. This makes the compiler report the "Incomplete scavenging after 2nd pass" error.
So I change the rvv frame layout as follows:
```
|--------------------------------------|
| arguments passed on the stack |
|--------------------------------------|<--- fp
| callee saved registers |
|--------------------------------------|
| rvv vector objects(local variables |
| and outgoing arguments |
|--------------------------------------|
| realignment field |
|--------------------------------------|
| scalar local variable(also contains|
| emergency spill slot) |
|--------------------------------------|<--- bp
| variable-sized local variables |
|--------------------------------------|<--- sp
```
Differential Revision: https://reviews.llvm.org/D97111
This patch proposes how to deal with RISC-V vector frame objects. The
layout of RISC-V vector frame will look like
|---------------------------------|
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| scalar outgoing arguments |
|---------------------------------|
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------| <- end of frame (sp)
If there is realignment or variable length array in the stack, we will use
frame pointer to access fixed objects and stack pointer to access
non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| ///// realignment ///// |
|---------------------------------|
| scalar outgoing arguments |
|---------------------------------|
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------| <- end of frame (sp)
If there are both realignment and variable length array in the stack, we
will use frame pointer to access fixed objects and base pointer to access
non-fixed objects.
|---------------------------------| <- frame pointer (fp)
| scalar callee-saved registers |
|---------------------------------|
| scalar local variables |
|---------------------------------|
| ///// realignment ///// |
|---------------------------------| <- base pointer (bp)
| RVV local variables && |
| RVV outgoing arguments |
|---------------------------------|
| /////////////////////////////// |
| variable length array |
| /////////////////////////////// |
|---------------------------------| <- end of frame (sp)
| scalar outgoing arguments |
|---------------------------------|
In this version, we do not save the addresses of RVV objects in the
stack. We access them directly through the polynomial expression
(a x VLENB + b). We do not reserve frame pointer when there is any RVV
object in the stack. So, we also access the scalar frame objects through the
polynomial expression (a x VLENB + b) if the access across RVV stack
area.
Differential Revision: https://reviews.llvm.org/D94465
This is a first change needed to fix a crash in which the emergency
spill splot ends being out of reach. This happens when we run the
register scavenger after we have eliminated the frame indexes. The fix
for the actual crash will come in a later change.
This change removes an extra stack size increase we do in
RISCVFrameLowering::determineFrameLayout.
We don't have to change the size of the stack here as
PEI::calculateFrameObjectOffsets is already doing this with the right
size accounting the extra alignment.
Differential Revision: https://reviews.llvm.org/D89237
This is a special calling convention to be used by the GHC compiler.
Patch by Andreas Schwab (schwab)
Differential Revision: https://reviews.llvm.org/D89788