Despite the fact that the localizer's original motivation was to fix horrendous
constant spilling at -O0, shortening live ranges still has net benefits even
with optimizations enabled.
On an -Os build of CTMark, doing this improves code size by 0.5% geomean.
There are a few regressions, bullet increasing in size by 0.5%. One example from
bullet where code size increased slightly was due to GlobalISel actually now
generating the same code as SelectionDAG. So we actually have an opportunity
in future to implement better heuristics for localization and therefore be
*better* than SDAG in some cases. In relation to other optimizations though that
one is relatively minor.
Differential Revision: https://reviews.llvm.org/D67303
llvm-svn: 371266
Summary:
MTE allows memory access to bypass tag check iff the address argument
is [SP, #imm]. This change takes advantage of this to demote uses of
tagged addresses to regular FrameIndex operands, reducing register
pressure in large functions.
MO_TAGGED target flag is used to signal that the FrameIndex operand
refers to memory that might be tagged, and needs to be handled with
care. Such operand must be lowered to [SP, #imm] directly, without a
scratch register.
The transformation pass attempts to predict when the offset will be
out of range and disable the optimization.
AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
this prediction has been wrong, but it is quite inefficient and should
be avoided.
Reviewers: pcc, vitalybuka, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66457
llvm-svn: 370490
Summary:
MTE provides instructions to update memory tags and data at the same
time. This change makes use of those to generate more compact code for
stack variable tagging + initialization.
We collect memory store and memset instructions following an alloca or a
lifetime.start call, and replace them with the corresponding MTE
intrinsics. Since the intrinsics work on 16-byte aligned chunks, the
stored values are combined as necessary.
Reviewers: pcc, vitalybuka, ostannard
Subscribers: srhines, javed.absar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66167
llvm-svn: 369297
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Extern global merging is good for code-size. There's definitely potential for
performance too, but there's one regression in a benchmark that needs
investigating, so that's why we enable it only when we optimise for size for
now.
Patch by Ramakota Reddy and Sjoerd Meijer.
Differential Revision: https://reviews.llvm.org/D61947
llvm-svn: 363130
This reverts r362990 (git commit 374571301d)
This was causing linker warnings on Darwin:
ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.
llvm-svn: 363028
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
llvm-svn: 362990
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.
llvm-svn: 360709
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.
This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.
llvm-svn: 358368
Windows ARM64 has PIC relocation model and uses jump table kind
EK_LabelDifference32. This produces jump table entry as
".word LBB123 - LJTI1_2" which represents the distance between the block
and jump table.
A new relocation type (IMAGE_REL_ARM64_REL32) is needed to do the fixup
correctly if they are in different COFF section.
This change saves the jump table to the same COFF section as the
associated code. An ideal fix could be utilizing IMAGE_REL_ARM64_REL32
relocation type.
Patch by Tom Tan!
Differential Revision: https://reviews.llvm.org/D57277
llvm-svn: 352465
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.
At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
in registers.
- using intrinsics to mask of data in registers as indicated by the
programmer (see https://lwn.net/Articles/759423/).
For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
relative abundance of registers, one register is reserved (X16) to be
the taint register. X16 is expected to not clash with other register
reservation mechanisms with very high probability because:
. The AArch64 ABI doesn't guarantee X16 to be retained across any call.
. The only way to request X16 to be used as a programmer is through
inline assembly. In the rare case a function explicitly demands to
use X16/W16, this pass falls back to hardening against speculation
by inserting a DSB SYS/ISB barrier pair which will prevent control
flow speculation.
- It is easy to insert mask operations at this late stage as we have
mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
and contains all-zeros when miss-speculation is detected. Therefore, when
masking, an AND instruction (which only changes the register to be masked,
no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
select instruction (CSEL) to evaluate the flags that were also used to
make conditional branch direction decisions. Speculation of the CSEL
instruction can be limited with a CSDB instruction - so the combination of
CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
aren't speculated. When conditional branch direction gets miss-speculated,
the semantics of the inserted CSEL instruction is such that the taint
register will contain all zero bits.
One key requirement for this to work is that the conditional branch is
followed by an execution of the CSEL instruction, where the CSEL
instruction needs to use the same flags status as the conditional branch.
This means that the conditional branches must not be implemented as one
of the AArch64 conditional branches that do not use the flags as input
(CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
selectors to not produce these instructions when speculation hardening
is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
the taint register X16 to be encoded in the SP register as value 0.
Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
x86-slh-lfence option. This may be more useful for the intrinsics-based
approach than for the SLH approach to masking.
Note that this pass already inserts the full speculation barriers if the
function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
could be done for some indirect branches, such as switch jump tables.
Differential Revision: https://reviews.llvm.org/D54896
llvm-svn: 349456
The Load/Store Optimizer runs before Machine Block Placement. At O3 the
Tail Duplication Threshold is set to 4 instructions and this can create
new opportunities for the Load/Store Optimizer. It seems worthwhile to
run it once again.
llvm-svn: 349338
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.
The EnableGlobalISel flag had previouly only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.
Differential Revision: https://reviews.llvm.org/D54518
llvm-svn: 347861
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347208
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.
llvm-svn: 346366
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.
Differential Revision: https://reviews.llvm.org/D54129
llvm-svn: 346358
The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.
There are 3 variants of the BTI instruction, which are valid targets for
different kinds or branches:
- BTI C can be targeted by call instructions, and is inteneded to be
used at function entry points. These are the BLR instruction, as well
as BR with x16 or x17. These BR instructions are allowed for use in
PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
tables.
- BTI JC acts ab both a BTI C and a BTI J instruction, and can be
targeted by any BLR or BR instruction.
Note that RET instructions are not restricted by branch target
identification, the reason for this is that return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).
This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
called (either because it is address-taken, or externally visible so
could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
indirectly branched to. This could be either done by a jump table, or
by taking the address of the block (e.g. the using GCC label values
extension).
We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.
Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.
Differential revision: https://reviews.llvm.org/D52867
llvm-svn: 343967
Summary: Depends on D45541
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45543
The previous commit failed portions of the test-suite on GreenDragon due to
duplicate COPY instructions and iterator invalidation. Both issues have now
been fixed. To assist with this, a helper (cloneVirtualRegister) has been added
to MachineRegisterInfo that can be used to get another register that has the same
type and class/bank as an existing one.
llvm-svn: 343654
There's a strange assertion on two of the Green Dragon bots that goes away when
this is reverted. The assertion is in RegBankAlloc and if it is this commit then
-verify-machine-instrs should have caught it earlier in the pipeline.
llvm-svn: 343546
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.
This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.
Differential Revision: https://reviews.llvm.org/D49673
llvm-svn: 340397
Fixed the ASAN failure from before in r338148, so recommiting.
This patch enables the MachineOutliner by default in AArch64 under -Oz.
The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.
We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!
llvm-svn: 338160
This patch enables the MachineOutliner by default in AArch64 under -Oz.
The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.
We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!
llvm-svn: 338133
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.
After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.
https://reviews.llvm.org/D48683
llvm-svn: 335887
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.
Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.
rdar://41530228
DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.
llvm-svn: 331846
Summary:
This patch adds a new target option in order to control GlobalISel.
This will allow the users to enable/disable GlobalISel prior to the
backend by calling `TargetMachine::setGlobalISel(bool Enable)`.
No test case as there is already a test to check GlobalISel
command line options.
See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.
Reviewers: qcolombet, aemerson, ab, dsanders
Reviewed By: qcolombet
Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42137
llvm-svn: 322773
Tests updated to explicitly use fast-isel at -O0 instead of implicitly.
This change also allows an explicit -fast-isel option to override an
implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0.
Differential Revision: https://reviews.llvm.org/D41362
llvm-svn: 321655
Re-land r321234. It had to be reverted because it broke the shared
library build. The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).
Original commit message:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321375
Summary:
This makes the TargetMachine interface a bit simpler. We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.
See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.
Reviewers: echristo, MatzeB, hfinkel
Reviewed By: hfinkel
Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D41464
llvm-svn: 321234
This should solve:
https://bugs.llvm.org/show_bug.cgi?id=34603
...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run.
It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the
sinking transform later in the optimization pipeline.
Differential Revision: https://reviews.llvm.org/D38566
llvm-svn: 320749
Replace interleaved store instructions by equivalent and more efficient instructions based on latency cost model.
Https://reviews.llvm.org/D38196
llvm-svn: 320123
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
Reverting to investigate layering effects of MCJIT not linking
libCodeGen but using TargetMachine::getNameWithPrefix() breaking the
lldb bots.
This reverts commit r315633.
llvm-svn: 315637
Merge LLVMTargetMachine into TargetMachine.
- There is no in-tree target anymore that just implements TargetMachine
but not LLVMTargetMachine.
- It should still be possible to stub out all the various functions in
case a target does not want to use lib/CodeGen
- This simplifies the code and avoids methods ending up in the wrong
interface.
Differential Revision: https://reviews.llvm.org/D38489
llvm-svn: 315633
Summary:
After r308422 we defer optimizations that can destroy loop canonical forms to
LateSimplifyCFG. Running LateSimplifyCFG after expanding atomic operations
can exploit more control-flow opportunities.
Reviewers: mcrosier, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38262
llvm-svn: 314857
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.
llvm-svn: 309990
IMHO it is an antipattern to have a enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
llvm-svn: 309911
Summary:
This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor. It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.
A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.
Reviewers: mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34963
llvm-svn: 308059
Summary:
isFusion returns true if the subtarget supports any kind of instruction
fusion, similar to ARMSubtarget::isFusion. This was suggested in D34142.
This changes the current behavior slightly, because the macro fusion mutation
is now added to the PostRA MachineScheduler in case the subtarget supports
any kind of fusion. I think that makes sense because if the PostRA
MachineScheduler is run, there is potential that instructions scheduled back to
back are re-scheduled.
Reviewers: evandro, t.p.northover, joelkevinjones, joel_k_jones, steleman
Reviewed By: joelkevinjones
Subscribers: joel_k_jones, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34958
llvm-svn: 307842
Summary:
This is the llvm part of the initial implementation to support Windows ARM64 COFF format.
I will gradually add more functionality in subsequent patches.
Reviewers: ruiu, rnk, t.p.northover, compnerd
Reviewed By: ruiu, compnerd
Subscribers: aemerson, mgorny, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D34705
llvm-svn: 306490
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.
A few examples:
add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS.
cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed.
add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses.
cbz w8, .LBB1_2 -> b.eq .LBB1_2
sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses.
tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2
In looking at all current sub-target machine descriptions, this transformation
appears to be either positive or neutral.
Differential Revision: https://reviews.llvm.org/D34220.
llvm-svn: 306144
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.
While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.
llvm-svn: 304247
Summary:
This patch makes instruction fusion more aggressive by
* adding artificial edges between the successors of FirstSU and
SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
* updating PostGenericScheduler::tryCandidate to keep clusters together,
similar to GenericScheduler::tryCandidate.
This change increases the number of AES instruction pairs generated on
Cortex-A57 and Cortex-A72. This doesn't change code at all in
most benchmarks or general code, but we've seen improvement on kernels
using AESE/AESMC and AESD/AESIMC.
Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB
Reviewed By: evandro
Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33230
llvm-svn: 303618
Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.
Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
Reviewed By: qcolombet
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32861
llvm-svn: 303418
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handling a null TargetMachine call
`getAnalysisIfAvailable<TargetPassConfig>`.
* Passes not handling a null TargetMachine
`addRequired<TargetPassConfig>` and call
`getAnalysis<TargetPassConfig>`.
* MachineFunctionPasses now use MF.getTarget().
* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.
This fixes a crash when running `llc -start-before prologepilog`.
PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.
Related to PR30324.
Differential Revision: https://reviews.llvm.org/D33222
llvm-svn: 303360
Summary:
Remove the AArch64AddressTypePromotion pass as we migrated all transformations
done in this pass into CGP in r299379.
Reviewers: qcolombet, jmolloy, javed.absar, mcrosier
Reviewed By: qcolombet
Subscribers: aemerson, rengolin, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D31623
llvm-svn: 302245
Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().
Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.
Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32491
llvm-svn: 301750
Summary: This resolves the issue of tablegen-erated includes in the headers for non-GlobalISel builds in a simpler way than before.
Reviewers: qcolombet, ab
Reviewed By: ab
Subscribers: igorb, ab, mgorny, dberris, rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30998
llvm-svn: 299637
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.
Reviewers: jmolloy, mcrosier, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, aemerson, rengolin, mcrosier
Differential Revision: https://reviews.llvm.org/D28680
llvm-svn: 299379
Until now, we've had to use -global-isel to enable GISel. But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.
This first step adds an override for the target to explicitly define its
level of support. For AArch64, do that using
a new command-line option (I know..):
-aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.
Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!
While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is
such a cold path that in practice that shouldn't matter at all.
llvm-svn: 296710
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, sections 4.14 and 4.15.
Differential revision: https://reviews.llvm.org/D28698
llvm-svn: 293739
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.
In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.
Differential revision: https://reviews.llvm.org/D28489
llvm-svn: 293737
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.
Changes since first commit attempt:
* Added missing guards
* Added more missing guards
* Found and fixed a use-after-free bug involving Twine locals
Reviewers: t.p.northover, ab, rovka, qcolombet
Reviewed By: qcolombet
Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D27338
llvm-svn: 292478
MachineLegalizer used to be the name of both the class and the member,
causing GCC errors. r276522 fixed that by renaming the member to just
'Legalizer'. The 'class' workaround isn't necessary anymore; drop it.
llvm-svn: 289848
This makes the createGenericSchedLive() function that constructs the
default scheduler available for the public API. This should help when
you want to get a scheduler and the default list of DAG mutations.
This also shrinks the list of default DAG mutations:
{Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer
added by default. Targets can easily add them if they need them. It also
makes it easier for targets to add alternative/custom macrofusion or
clustering mutations while staying with the default
createGenericSchedLive(). It also saves the callback back and forth in
TargetInstrInfo::enableClusterLoads()/enableClusterStores().
Differential Revision: https://reviews.llvm.org/D26986
llvm-svn: 288057
Doing this before register allocation reduces register pressure as we do
not even have to allocate a register for those dead definitions.
Differential Revision: https://reviews.llvm.org/D26111
llvm-svn: 287076
Summary:
Add relocations for AArch64 ILP32. Includes:
- Addition of definitions for R_AARCH32_*
- Definition of new -target-abi: ilp32
- Definition of data layout string
- Tests for added relocations. Not comprehensive, but matches
existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64".
- Tests for llvm-readobj
Reviewers: zatrazz, peter.smith, echristo, t.p.northover
Subscribers: aemerson, rengolin, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25159
llvm-svn: 284973
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
llvm-svn: 284287
Avoid generating indexed vector instructions for Exynos. This is needed for
fmla/fmls/fmul/fmulx. For example, the instruction
fmla v0.4s, v1.4s, v2.s[1]
is less efficient than the instructions
dup v2.4s, v2.s[1]
fmla v0.4s, v1.4s, v2.4s
Patch written by Abderrazek Zaafrani.
Differential Revision: https://reviews.llvm.org/D21571
llvm-svn: 283663
This reverts commit b7d42b0048f65346e9fa37fb65defeea7ce8c337 per request by
Eric Christopher <echristo@gmail.com> (v. http://bit.ly/2cmz6kW).
llvm-svn: 282000
Initialize all AArch64-specific passes in the TargetMachine so they can be run
by llc. This can lead to conflicts in opt with some command line options that
share the same name as the pass, so I took this opportunity to do some cleanups:
* rename all relevant command line options from "aarch64-blah" to
"aarch64-enable-blah" and update the tests accordingly
* run clang-format on their declarations
* move all these declarations to a common place (the TargetMachine) as opposed
to having them scattered around (AArch64BranchRelaxation and
AArch64AddressTypePromotion were the only offenders)
llvm-svn: 277322
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.
llvm-svn: 276461
The commit reinstates r273279, which was informally approved.
Original Review: http://reviews.llvm.org/D21414
This reverts commit ca632c91aaa7cafc50942f890c49f727a046ace1.
llvm-svn: 274790
Testing for specific CPUs has a number of problems, better use subtarget
features:
- When some tweak is added for a specific CPU it is often desirable for
the next version of that CPU as well, yet we often forget to add it.
- It is hard to keep track of checks scattered around the target code;
Declaring all target specifics together with the CPU in the tablegen
file is a clear representation.
- Subtarget features can be tweaked from the command line.
To discourage people from using CPU checks in the future I removed the
isCortexXX(), isCyclone(), ... functions. I added an getProcFamily()
function for exceptional circumstances but made it clear in the comment
that usage is discouraged.
Reformat feature list in AArch64.td to have 1 feature per line in
alphabetical order to simplify merging and sorting for out of tree
tweaks.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D20762
llvm-svn: 271555
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
This patch adds support for estimating the square root, its reciprocal and
division or reciprocal using the combiner generic reciprocal machinery.
llvm-svn: 268539