Summary:
Adds the following inline asm constraints for SVE:
- w: SVE vector register with full range, Z0 to Z31
- x: Restricted to registers Z0 to Z15 inclusive.
- y: Restricted to registers Z0 to Z7 inclusive.
This change also adds the "z" modifier to interpret a register as an SVE register.
Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness.
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened
Reviewed By: sdesmalen
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66302
llvm-svn: 370673
Recommit with fixes for mac builders.
Summary:
AArch64InstrInfo::getInstSizeInBytes is incorrectly treating meta
instructions (e.g. CFI_INSTRUCTION) as normal instructions and
giving them a size of 4.
This results in branch relaxation calculating block sizes wrong.
Branch relaxation also considers alignment and thus a single
mistake can result in later blocks being incorrectly sized even
when they themselves do not contain meta instructions.
The net result is we might not relax a branch whose destination is
not within range.
Reviewers: nickdesaulniers, peter.smith
Reviewed By: peter.smith
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66337
> llvm-svn: 369111
llvm-svn: 369133
Summary:
AArch64InstrInfo::getInstSizeInBytes is incorrectly treating meta
instructions (e.g. CFI_INSTRUCTION) as normal instructions and
giving them a size of 4.
This results in branch relaxation calculating block sizes wrong.
Branch relaxation also considers alignment and thus a single
mistake can result in later blocks being incorrectly sized even
when they themselves do not contain meta instructions.
The net result is we might not relax a branch whose destination is
not within range.
Reviewers: nickdesaulniers, peter.smith
Reviewed By: peter.smith
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66337
llvm-svn: 369111
Prevent the LoadStoreOptimizer from pairing any load/store instructions with
instructions from the prologue/epilogue if the CFI information has encoded the
operations as separate instructions. This would otherwise lead to a mismatch
of the actual prologue size from the size as recorded in the Windows CFI.
Reviewers: efriedma, mstorsjo, ssijaric
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D65817
llvm-svn: 368164
Refactor emitFrameOffset to accept a StackOffset struct as its offset argument.
This method currently only supports byte offsets (MVT::i8) but will be extended
in a later patch to support scalable offsets (MVT::nxv1i8) as well.
Reviewers: thegameg, rovka, t.p.northover, efriedma, greened
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61436
llvm-svn: 368049
To support spilling/filling of scalable vectors we need a more generic
representation of a stack offset than simply 'int'.
For this we introduce the StackOffset struct, which comprises multiple
offsets sized by their respective MVTs. Byte-offsets will thus be a simple
tuple such as { offset, MVT::i8 }. Adding two byte-offsets will result in a
byte offset { offsetA + offsetB, MVT::i8 }. When two offsets have different
types, we can canonicalise them to use the same MVT, as long as their
runtime sizes are guaranteed to have the same size-ratio as they would have
at compile-time.
When we have both scalable- and fixed-size objects on the stack, we can
create an offset that is:
({ offset_fixed, MVT::i8 } + { offset_scalable, MVT::nxv1i8 })
The struct also contains a getForFrameOffset() method that is specific to
AArch64 and decomposes the frame-offset to be used directly in instructions
that operate on the stack or index into the stack.
Note: This patch adds StackOffset as an AArch64-only concept, but we would
like to make this a generic concept/struct that is supported by all
interfaces that take or return stack offsets (currently as 'int'). Since
that would be a bigger change that is currently pending on D32530 landing,
we thought it makes sense to first show/prove the concept in the AArch64
target before proposing to roll this out further.
Reviewers: thegameg, rovka, t.p.northover, efriedma, greened
Reviewed By: rovka, greened
Differential Revision: https://reviews.llvm.org/D61435
llvm-svn: 368024
This feature instructs the backend to allow locally defined global variable
addresses to contain a pointer tag in bits 56-63 that will be ignored by
the hardware (i.e. TBI), but may be used by an instrumentation pass such
as HWASAN. It works by adding a MOVK instruction to the regular ADRP/ADD
sequence that sets bits 48-63 to the corresponding bits of the global, with
the linker bounds check disabled on the ADRP instruction to prevent the tag
from causing a link failure.
This implementation of the feature omits the MOVK when loading from or storing
to a global, which is sufficient for TBI. If the same approach is extended
to MTE, assuming that 0 is not configured as a catch-all tag, we will most
likely also need the MOVK in this case in order to avoid a tag mismatch.
Differential Revision: https://reviews.llvm.org/D65364
llvm-svn: 367475
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().
Differential Revision: https://reviews.llvm.org/D65465
llvm-svn: 367474
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
Summary:
It looks like since INLINEASM_BR was created off of INLINEASM, a few
checks for INLINEASM needed to be updated to check for either case.
pr/41999
Reviewers: t.p.northover, peter.smith
Reviewed By: peter.smith
Subscribers: craig.topper, javed.absar, kristof.beyls, hiraditya, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62402
llvm-svn: 361661
This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
Specifically, Simple Register Coalescing creates following conversion :
undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;
It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.
llvm-svn: 361417
Summary:
Otherwise, we emit directives for CFI without any actual CFI opcodes to
go with them, which causes tools to malfunction. The technique is
similar to what the x86 backend already does.
Fixes https://bugs.llvm.org/show_bug.cgi?id=40876
Patch by: froydnj (Nathan Froyd)
Reviewers: mstorsjo, eli.friedman, rnk, mgrang, ssijaric
Reviewed By: rnk
Subscribers: javed.absar, kristof.beyls, llvm-commits, dmajor
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61960
llvm-svn: 360816
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486
llvm-svn: 358963
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).
The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60856
llvm-svn: 358744
Cleanup isAArch64FrameOffsetLegal by:
- Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo().
- Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction
has an unscaled variant.
- Simplifying the logic that calculates the offset to fit the immediate.
Reviewers: paquette, evandro, eli.friedman, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59636
llvm-svn: 357064
We can't outline BTI instructions, because they need to be the very first
instruction executed after an indirect call or branch. If we outline them, then
an indirect call might go to the branch to the outlined function, which will
fault.
Differential revision: https://reviews.llvm.org/D57753
llvm-svn: 353190
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.
Reviewers: rnk, mstorsjo, efriedma
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D56037
llvm-svn: 350798
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.
Differential Revision: https://reviews.llvm.org/D55929
llvm-svn: 350729
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.
At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
in registers.
- using intrinsics to mask of data in registers as indicated by the
programmer (see https://lwn.net/Articles/759423/).
For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
relative abundance of registers, one register is reserved (X16) to be
the taint register. X16 is expected to not clash with other register
reservation mechanisms with very high probability because:
. The AArch64 ABI doesn't guarantee X16 to be retained across any call.
. The only way to request X16 to be used as a programmer is through
inline assembly. In the rare case a function explicitly demands to
use X16/W16, this pass falls back to hardening against speculation
by inserting a DSB SYS/ISB barrier pair which will prevent control
flow speculation.
- It is easy to insert mask operations at this late stage as we have
mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
and contains all-zeros when miss-speculation is detected. Therefore, when
masking, an AND instruction (which only changes the register to be masked,
no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
select instruction (CSEL) to evaluate the flags that were also used to
make conditional branch direction decisions. Speculation of the CSEL
instruction can be limited with a CSDB instruction - so the combination of
CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
aren't speculated. When conditional branch direction gets miss-speculated,
the semantics of the inserted CSEL instruction is such that the taint
register will contain all zero bits.
One key requirement for this to work is that the conditional branch is
followed by an execution of the CSEL instruction, where the CSEL
instruction needs to use the same flags status as the conditional branch.
This means that the conditional branches must not be implemented as one
of the AArch64 conditional branches that do not use the flags as input
(CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
selectors to not produce these instructions when speculation hardening
is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
the taint register X16 to be encoded in the SP register as value 0.
Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
x86-slh-lfence option. This may be more useful for the intrinsics-based
approach than for the SLH approach to masking.
Note that this pass already inserts the full speculation barriers if the
function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
could be done for some indirect branches, such as switch jump tables.
Differential Revision: https://reviews.llvm.org/D54896
llvm-svn: 349456
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, for the Exynos processors.
Differential revision: https://reviews.llvm.org/D55345
llvm-svn: 348774
This moves the stack check logic into a lambda within getOutliningCandidateInfo.
This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.
So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.
This gives some code size savings as a result.
llvm-svn: 348220
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.
llvm-svn: 348219
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.
This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.
Also add tests for each possible new behaviour.
- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.
- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case
- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.
llvm-svn: 348168
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.
This makes it so that we check for that condition before mapping instructions.
This allows us to outline more, since we don't pessimise as many instructions.
Also update some tests, since we outline more.
llvm-svn: 348081
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.
This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.
This mostly offers some serious simplification.
Also update the test to show the added functionality.
llvm-svn: 348036
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.
llvm-svn: 347906
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:
```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
unsigned long long args[8];
args[0] = key;
args[1] = index;
use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
dst[0] = a;
dst[1] = b;
}
```
The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.
This adds support for this.
Differential Revision: https://reviews.llvm.org/D54847
llvm-svn: 347747
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.
This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.
The goal of this patch is to refactor all this to return a base
operand instead of a base register.
Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.
Differential Revision: https://reviews.llvm.org/D54846
llvm-svn: 347746
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasExtendedReg()`.
Differential revision: https://reviews.llvm.org/D54822
llvm-svn: 347599
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasShiftedReg()`.
Differential revision: https://reviews.llvm.org/D54820
llvm-svn: 347598
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::isScaledAddr()`
Differential revision: https://reviews.llvm.org/D54777
llvm-svn: 347597
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.
If this holds for all MBBs, then we can avoid checking for liveness on
that candidate. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.
llvm-svn: 346901
If we keep track of if the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.
This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example.)
llvm-svn: 346816
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.
The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.
llvm-svn: 346809
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there's less than that.
llvm-svn: 346803
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.
llvm-svn: 346721
Instead of returning Flags, return true if the MBB is safe to outline from.
This lets us check for unsafe situations, like say, in AArch64, X17 is live
across a MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.
llvm-svn: 346718
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.
Patch by Sanjin Sijaric, with some fixes by me.
Differential Revision: https://reviews.llvm.org/D51524
llvm-svn: 346568
Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue. These are used by the MCLayer to
populate the .xdata section.
Differential Revision: https://reviews.llvm.org/D50288
llvm-svn: 345701
Prevents the post-RA scheduler from modifying the prologue sequences
emitting by frame lowering. This is roughly similar to what we do for
other targets: TargetInstrInfo::isSchedulingBoundary checks
isPosition(), which checks for CFI_INSTRUCTION.
isSEHInstruction is taken from D50288; it'll land with whatever patch
lands first.
Differential Revision: https://reviews.llvm.org/D53851
llvm-svn: 345634
When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.
Differential revision: https://reviews.llvm.org/D52869
llvm-svn: 343969
The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.
The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indiret tail-calls from using caller-saved registers. The
indirect calls transformed by the outliner could use caller-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.
Differential revision: https://reviews.llvm.org/D52829
llvm-svn: 343959
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).
Differential Revision: https://reviews.llvm.org/D52125
llvm-svn: 343895
- Fix spill/reloads of XSeqPairs failing with vregs (only physregs
worked correctly)
- Add missing spill/reload code for WSeqPairs class
Differential Revision: https://reviews.llvm.org/D52761
llvm-svn: 343799
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).
Differential Revision: https://reviews.llvm.org/D52125
llvm-svn: 343520
Split the `zcz` feature into specific ones got GP and FP registers, `zcz-gp`
and `zcz-fp`, respectively, while retaining the original feature option to
mean both.
Differential revision: https://reviews.llvm.org/D52621
llvm-svn: 343354
The runtime pseudo relocations can't handle the AArch64 format PC
relative addressing in adrp+add/ldr pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.
Differential Revision: https://reviews.llvm.org/D51452
llvm-svn: 341401
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.
This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.
Differential Revision: https://reviews.llvm.org/D49673
llvm-svn: 340397
This teaches the outliner to save LR to a register rather than the stack when
possible. This allows us to avoid bumping the stack in outlined functions in
some cases. By doing this, in a later patch, we can teach the outliner to do
something like this:
f1:
...
bl OUTLINED_FUNCTION
...
f2:
...
move LR's contents to a register
bl OUTLINED_FUNCTION
move the register's contents back
instead of falling back to saving LR in both cases.
llvm-svn: 338278
Fixed the ASAN failure from before in r338148, so recommiting.
This patch enables the MachineOutliner by default in AArch64 under -Oz.
The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.
We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!
llvm-svn: 338160
There was a missing check for if a candidate list was entirely deleted. This
adds that check.
This fixes an asan failure caused by running test/CodeGen/AArch64/addsub_ext.ll
with the MachineOutliner enabled.
llvm-svn: 338148
This patch enables the MachineOutliner by default in AArch64 under -Oz.
The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.
We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!
llvm-svn: 338133
Just some gardening here.
Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.
Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.
llvm-svn: 337848
Before this, TCI contained all the call information for each Candidate.
This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.
A follow-up patch for the AArch64 outliner will demonstrate this.
llvm-svn: 337840
The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).
Differential Revision: https://reviews.llvm.org/D49023
llvm-svn: 336481
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.
This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.
https://bugs.llvm.org/show_bug.cgi?id=37573https://reviews.llvm.org/D47655
llvm-svn: 335758
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.
This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.
llvm-svn: 335076
This is setting up to fix bug 37573 cleanly.
This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.
Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.
Also, as a side-effect, it makes the outliner a bit cleaner.
https://bugs.llvm.org/show_bug.cgi?id=37573
llvm-svn: 333952
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.
In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.
To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.
Differential Revision: https://reviews.llvm.org/D47173
llvm-svn: 333015
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.
Differential Revision: https://reviews.llvm.org/D46921
llvm-svn: 332685
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.
Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.
Differential Revision: https://reviews.llvm.org/D46920
llvm-svn: 332529
The cost computation assumes we do this correctly, but the actual
lowering was wrong.
Differential Revision: https://reviews.llvm.org/D46923
llvm-svn: 332514
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.
This patch has no new test case. I have run regression test and there is
no difference in regression test.
Differential Revision: https://reviews.llvm.org/D45342
Patch by Hsiangkai Wang.
llvm-svn: 331844
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.
Outlined calls shouldn't break liveness assumptions that someone might make.
This also un-XFAILs the noredzone test, and updates the calls test.
llvm-svn: 331095
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.
I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.
Differential Revision: https://reviews.llvm.org/D46091
llvm-svn: 331007
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.
This also adds a test that ensures that those sorts of ADRPs won't be outlined.
llvm-svn: 330783
First off, this is more correct than having the B. Second off, this was making
a bot upset. This fixes that.
Update the test to include -verify-machineinstrs as well to prevent stuff like
this slipping by non debug/assert builds in the future.
llvm-svn: 330459
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.
This removes the requirement to pass -mno-red-zone when outlining for AArch64.
https://reviews.llvm.org/D45189
llvm-svn: 329120
This commit simplifies the call outlining logic by removing references to the
Function associated with the callee. To do this, it requires that valid
callee save info is available to the outliner.
llvm-svn: 328719
If an ADRP appears with, say, a CPI operand, we shouldn't outline it.
This moves the check for unsafe operands so that it occurs before the special-case
for ADRPs. Also add a test for outlining ADRPs.
llvm-svn: 328674
When outlining calls, the outliner needs to update CFI to ensure that, say,
exception handling works. This commit adds that functionality and adds a test
just for call outlining.
Call outlining stuff in machine-outliner.mir should be moved into
machine-outliner-calls.mir in a later commit.
llvm-svn: 327917
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.
llvm-svn: 327760