[MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining
New test to check that we only outline CFI instruction if all CFI
Instructions in the function would be captured by the outlining
adding x86 tests analagous to AARCH64 cfi tests
Revision: https://reviews.llvm.org/D77852
CFI instructions can only safely be outlined when the outlined call is a tail
call, or when the outlined frame is fixed up.
For the sake of correctness, disable outlining from CFI instructions.
Add machine-outliner-cfi.mir to test this.
Add support for DestructiveBinaryComm DestructiveInstType, as well as the lowering code to expand the new Pseudos into the final movprfx+instruction pairs.
Differential Revision: https://reviews.llvm.org/D73711
Summary:
Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo,
has the effect that all places where this information is used
(notably, TargetInstrInfo::getMemOperandWithOffset) will need
to consider Scale - and derived, Offset - possibly being scalable.
This patch adds a new operand `bool &OffsetIsScalable` to
TargetInstrInfo::getMemOperandWithOffset and fixes up all
the places where this function is used, to consider the
offset possibly being scalable.
In most cases, this means bailing out because the algorithm does not
(or cannot) support scalable offsets in places where it does some
form of alias checking for example.
Reviewers: rovka, efriedma, kristof.beyls
Reviewed By: efriedma
Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72758
Scheduler sends NumLoads argument into shouldClusterMemOps()
one less the actual cluster length. So for 2 instructions
it will pass just 1. Correct this number.
This is NFC for in tree targets.
Differential Revision: https://reviews.llvm.org/D73292
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).
Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71215
The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.
The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.
The only requirements for the base operands returned by
getMemOperandsWithOffset are:
- they can be sorted by MemOpInfo::Compare, such that clusterable ops
get sorted next to each other, and
- shouldClusterMemOps knows what they mean.
One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.
Differential Revision: https://reviews.llvm.org/D71655
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.
Differential Revision: https://reviews.llvm.org/D72961
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as a FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and base address as a FI operand when
possible. This is needed to simplify recognizing an STGloop instruction
as operating on a stack slot post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
Conservatively always save + restore LR in noreturn functions.
These functions do not end in a RET, and so they aren't guaranteed to have an
instruction which uses LR in any way. So, as a result, you can end up in
unfortunate situations where you can't backtrace out of these functions in a
debugger.
Remove the old noreturn test, and add a new one which is more descriptive.
Remove the restriction that we can't outline from noreturn functions as well
since we now do the right thing.
Summary:
This never really occurs in the current codegen, so only a MIR test is
possible.
Reviewers: ostannard, pcc
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72123
Summary:
r347747 added support for clustering mem ops with FI base operands
including support for fixed stack objects in shouldClusterFI, but
apparently this was never tested.
This patch fixes shouldClusterFI to work with scaled as well as
unscaled load/store instructions, and fixes the ordering of memory ops
in MemOpInfo::operator< to ensure that memory addresses always
increase, regardless of which direction the stack grows.
Subscribers: MatzeB, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71334
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.
Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.
The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.
This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.
Differential Revision: https://reviews.llvm.org/D71359
Summary:
Reland after fixing a bug that allowed outlining of SP modifying instructions
that invalidated return address signing.
During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.
This patch introduces the following changes:
1. If not all functions that potentially participate in function outlining agree
on their return address signing scope and their return address signing key,
outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
on their support for v8.3A features, outlining is disabled for these
functions.
3. If an outlining candidate would outline instructions that modify sp in a way
that invalidates return address signing, outlining is disabled for that
particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
support for v8.3 features, the outlined function behaves as if it had the
same scope and key attributes and as if it would provide the same v8.3A
support as the original functions.
Reviewers: ostannard, paquette
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70635
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.
Reviewers: sdesmalen, efriedma, fhahn, aemerson
Reviewed By: efriedma
Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70496
Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: ormris, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D70431
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
Summary:
When MUL is the first operand to SUB, we can't use MLS because the accumulator
should be negated. Emit a NEG of the accumulator and an MLA instead, similar to
what we do for FMUL / FSUB fusing.
Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf, mstorsjo, asbirlea
Reviewed By: asbirlea
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71067
Summary:
Reland after fixing an ASan failure by stopping outlining early if the
constraints for return address signing removed too many outlining candidates.
During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.
This patch introduces the following changes:
1. If not all functions that potentially participate in function outlining agree
on their return address signing scope and their return address signing key,
outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
on their support for v8.3A features, outlining is disabled for these
functions.
3. If an outlining candidate would outline instructions that modify sp in a way
that invalidates return address signing, outlining is disabled for that
particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
support for v8.3 features, the outlined function behaves as if it had the
same scope and key attributes and as if it would provide the same v8.3A
support as the original functions.
Reviewers: ostannard, paquette
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70635
Summary:
Reland after fixing a bug that allowed outlining of SP modifying instructions
that invalidated return address signing.
During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.
This patch introduces the following changes:
1. If not all functions that potentially participate in function outlining agree
on their return address signing scope and their return address signing key,
outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
on their support for v8.3A features, outlining is disabled for these
functions.
3. If an outlining candidate would outline instructions that modify sp in a way
that invalidates return address signing, outlining is disabled for that
particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
support for v8.3 features, the outlined function behaves as if it had the
same scope and key attributes and as if it would provide the same v8.3A
support as the original functions.
Reviewers: ostannard, paquette
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70635
This patch allows the register allocator to spill SVE registers to the stack.
Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70082
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D69622
This is causing faults when an instruction which modifies SP is
outlined, causing the PAC and AUT instructions to not match.
This reverts commits 70caa1fc30 and
55314d3237.
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D69622
Summary:
During AArch64 frame lowering instructions to enable return address
signing are inserted into function if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.
This patch introduces the following changes:
1. If not all functions that potentially participate in function outlining
agree on their return address signing scope and their return address
signing key, outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining
agree on their support for v8.3A features, outlining is disabled for
these functions.
2. If all candidate functions agree on the signing scope, signing key and
and their support for v8.3 features, the outlined function behaves as
if it had the same scope and key attributes and as if it would provide
the same v8.3A support as the original functions.
Reviewers: olista01, paquette, t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69097
Extend the describeLoadedValue() with support for target specific ARM and
AArch64 instructions interpretation. The patch provides specialization for
ADD and SUB operations that include a register and an immediate/offset
operand. Some of the instructions can operate with global string addresses
or constant pool indexes but such cases are omitted since we currently lack
flexible support for processing such operands at DWARF production stage.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D67556
r374772 changed Offset to be an int64_t but left NewOffset as an int.
Scale is unsigned, so in the calculation `Offset - NewOffset * Scale`,
`NewOffset * Scale` was promoted to unsigned and was then zero-extended
to 64 bits, leading to an incorrect computation which manifested as an
out-of-memory when building the Swift standard library for Android
aarch64. Promote NewOffset to int64_t to fix this, and promote
EmittableOffset as well, since its one user passes it to a function
which takes an int64_t anyway.
Test case based on a suggestion by Sander de Smalen!
Differential Revision: https://reviews.llvm.org/D69018
llvm-svn: 375043
Materialize accesses to SVE frame objects from SP or FP, whichever is
available and beneficial.
This patch still assumes the objects are pre-allocated. The automatic
layout of SVE objects within the stackframe will be added in a separate
patch.
Reviewers: greened, cameron.mcinally, efriedma, rengolin, thegameg, rovka
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D67749
llvm-svn: 374772
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:
> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
> %mul = fmul fast <8 x half> %c, %b
> %sub = fsub fast <8 x half> %mul, %a
> ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.
This patch generates an fmla and negates the value of the operand2 of the fsub.
Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.
Tested on aarch64-linux with make check-all.
Differential Revision: https://reviews.llvm.org/D67990
llvm-svn: 374044
Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects.
The focus of this patch is purely to allow the stack frame to
allocate/deallocate space for scalable SVE objects. More dynamic
allocation (at compile-time, i.e. determining placement of SVE objects
on the stack), or resolving frame-index references that include
scalable-sized offsets, are left for subsequent patches.
SVE objects are allocated in the stack frame as a separate region below
the callee-save area, and above the alignment gap. This is done so that
the SVE objects can be accessed directly from the FP at (runtime)
VL-based offsets to benefit from using the VL-scaled addressing modes.
The layout looks as follows:
+-------------+
| stack arg |
+-------------+
| Callee Saves|
| X29, X30 | (if available)
|-------------| <- FP (if available)
| : |
| SVE area |
| : |
+-------------+
|/////////////| alignment gap.
| : |
| Stack objs |
| : |
+-------------+ <- SP after call and frame-setup
SVE and non-SVE stack objects are distinguished using different
StackIDs. The offsets for objects with TargetStackID::SVEVector should be
interpreted as purely scalable offsets within their respective SVE region.
Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61437
llvm-svn: 373585
Summary:
Adds the following inline asm constraints for SVE:
- Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
- Upa: SVE predicate register with full range, P0 to P15
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin
Reviewed By: rovka
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66524
llvm-svn: 371967
Follow-up of rL371321 that added some more FP16 FMA patterns, and an attempt to
reduce the copy-pasting and make this more readable.
Differential Revision: https://reviews.llvm.org/D67403
llvm-svn: 371818
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
This patch enables generation of fused multiply add/sub for instructions operating on fp16.
Tested on aarch64-linux.
Differential Revision: https://reviews.llvm.org/D67297
llvm-svn: 371321