Commit Graph

47 Commits

Author SHA1 Message Date
Dávid Bolvanský 49de6070a2 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit 435785214f. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.
2021-08-15 11:44:13 +02:00
Anshil Gandhi 435785214f [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating
a compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-14 23:37:23 -06:00
Djordje Todorovic df686842bc [RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs
This new MIR pass removes redundant DBG_VALUEs.

After the register allocator is done, more precisely, after
the Virtual Register Rewriter, we end up having duplicated
DBG_VALUEs, since some virtual registers are being rewritten
into the same physical register as some of existing DBG_VALUEs.
Each DBG_VALUE should indicate (at least before the LiveDebugValues)
variables assignment, but it is being clobbered for function
parameters during the SelectionDAG since it generates new DBG_VALUEs
after COPY instructions, even though the parameter has no assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, and a COPY and after the COPY,
DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets
rewritten into $regX, we'd end up having redundant DBG_VALUE.

This breaks the definition of the DBG_VALUE since some analysis passes
might be built on top of that premise..., and this patch tries to fix
the MIR with the respect to that.

This first patch performs bacward scan, by trying to detect a sequence of
consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one
variable but the last one:

For example:

(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
 ...

in this case, we can remove (1).

By combining the forward scan that will be introduced in the next patch
(from this stack), by inspecting the statistics, the RemoveRedundantDebugValues
removes 15032 instructions by using gdb-7.11 as a testbed.

Differential Revision: https://reviews.llvm.org/D105279
2021-07-14 04:29:42 -07:00
Arthur Eubanks 8815ce03e8 Remove "Rewrite Symbols" from codegen pipeline
It breaks up the function pass manager in the codegen pipeline.

With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.

Add a check that we only have one function pass manager in the codegen
pipeline.

Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.

addr-label.ll crashes on ARM due to this change. This is because a
ARMConstantPoolConstant containing a BasicBlock to represent a
blockaddress may hold an invalid pointer to a BasicBlock if the
blockaddress is invalidated by its BasicBlock getting removed. In that
case all referencing blockaddresses are RAUW a constant int. Making
ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure
that's the right fix. As a workaround, create a barrier right before
ISel so that IR optimizations can't happen while a
ARMConstantPoolConstant has been created.

Reviewed By: rnk, MaskRay, compnerd

Differential Revision: https://reviews.llvm.org/D99707
2021-05-31 08:32:36 -07:00
Amara Emerson 5b158093e2 [AArch64][GlobalISel] Create a new minimal combiner pass just for -O0.
We never bothered to have a separate set of combines for -O0 in the prelegalizer
before. This results in some minor performance hits for a mode where performance
isn't a concern (although not regressing code size significantly is still preferable).

This also removes the CSE option since we don't need it for -O0.

Through experiments, I've arrived at a set of combines that gets the most code
size improvement at -O0, while reducing the amount of time spent in the combiner
by around 35% give or take.

Differential Revision: https://reviews.llvm.org/D102038
2021-05-07 17:01:27 -07:00
Simon Moll 1db4dbba24 Recommit "[VP,Integer,#2] ExpandVectorPredication pass"
This reverts the revert 02c5ba8679

Fix:

Pass was registered as DUMMY_FUNCTION_PASS causing the newpm-pass
functions to be doubly defined. Triggered in -DLLVM_ENABLE_MODULE=1
builds.

Original commit:

This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D78203
2021-05-04 11:47:52 +02:00
Adrian Prantl 02c5ba8679 Revert "[VP,Integer,#2] ExpandVectorPredication pass"
This reverts commit 43bc584dc0.

The commit broke the -DLLVM_ENABLE_MODULES=1 builds.

http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/31603/consoleFull#2136199809a1ca8a51-895e-46c6-af87-ce24fa4cd561
2021-04-30 17:02:28 -07:00
Simon Moll 43bc584dc0 [VP,Integer,#2] ExpandVectorPredication pass
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).

VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ..).

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D78203
2021-04-30 15:47:28 +02:00
Arthur Eubanks c88b87f9ce Revert "Remove "Rewrite Symbols" from codegen pipeline"
This reverts commit 6210261ecb.

addr-label.ll crashes on armv7.
2021-04-10 23:28:16 -07:00
Arthur Eubanks 6210261ecb Remove "Rewrite Symbols" from codegen pipeline
It breaks up the function pass manager in the codegen pipeline.

With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.

Add a check that we only have one function pass manager in the codegen
pipeline.

This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c:
"[AsmPrinter] Delete dead takeDeletedSymbsForFunction()".
This was not NFC as initially thought. By coalescing two function
psas managers, this exposed the reverted code as necessary.
addr-label.ll was crashing due to an emitted blockaddress's block being
removed but the label not emitted.

Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99707
2021-04-10 22:38:44 -07:00
Amara Emerson 8a316045ed [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Arthur Eubanks 040c1b49d7 Move EntryExitInstrumentation pass location
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.

Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.

Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D97608
2021-03-01 10:08:10 -08:00
Florian Hahn ca23b2c8ed
[AArch64] Move machine bundle unpacking to PreEmit2 phase.
This patch adjusts the placement of the bundle unpacking to just before
code emission. In particular, this means bundle unpacking happens AFTER
the machine outliner. With the previous position, the machine outliner
may outline parts of a bundle, which breaks them up.

This is an issue for BLR_RVMARKER handling, as illustrated by the
rvmarker-pseudo-expansion-and-outlining.mir test case. The machine
outliner should not break up the bundles created during pseudo
expansion.

This should fix PR49082.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D96294
2021-02-15 16:10:43 +00:00
Amara Emerson 12b9b778d9 [AArch64][GlobalISel] Enable CSE for the prelegalizer combiner.
Differential Revision: https://reviews.llvm.org/D95647
2021-01-28 16:38:49 -08:00
Jessica Paquette 147b9497e7 [AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)

It still makes sense to select these instructions at -O0.

Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.

This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.

Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.

And also add a 'r' which somehow got dropped from a bunch of function names.

https://reviews.llvm.org/D89820
2020-10-22 14:43:25 -07:00
Kristof Beyls c35ed40f4f [AArch64] Extend AArch64SLSHardeningPass to harden BLR instructions.
To make sure that no barrier gets placed on the architectural execution
path, each
  BLR x<N>
instruction gets transformed to a
  BL __llvm_slsblr_thunk_x<N>
instruction, with __llvm_slsblr_thunk_x<N> a thunk that contains
__llvm_slsblr_thunk_x<N>:
  BR x<N>
  <speculation barrier>

Therefore, the BLR instruction gets split into 2; one BL and one BR.
This transformation results in not inserting a speculation barrier on
the architectural execution path.

The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.

As a linker is allowed to clobber X16 and X17 on function calls, the
above code transformation would not be correct in case a linker does so
when N=16 or N=17. Therefore, when the mitigation is enabled, generation
of BLR x16 or BLR x17 is avoided.

As BLRA* indirect calls are not produced by LLVM currently, this does
not aim to implement support for those.

Differential Revision:  https://reviews.llvm.org/D81402
2020-06-12 07:34:33 +01:00
Kristof Beyls 0ee176edc8 [AArch64] Introduce AArch64SLSHardeningPass, implementing hardening of RET and BR instructions.
Some processors may speculatively execute the instructions immediately
following RET (returns) and BR (indirect jumps), even though
control flow should change unconditionally at these instructions.
To avoid a potential miss-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every RET and BR instruction.

Since these barriers are never on the correct, architectural execution
path, performance overhead of this is expected to be low.

On targets that implement that Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by a ISB is emitted to act as a
speculation barrier.

These speculation barriers are implemented as pseudo instructions to
avoid later passes to analyze them and potentially remove them.

Even though currently LLVM does not produce BRAA/BRAB/BRAAZ/BRABZ
instructions, these are also mitigated by the pass and tested through a
MIR test.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

Differential Revision:  https://reviews.llvm.org/D81400
2020-06-11 07:51:17 +01:00
Nikita Popov e0e5c64460 [SDAG] Don't require LazyBlockFrequencyInfo at optnone
While LazyBlockFrequencyInfo itself is lazy, the dominator tree
and loop info analyses it requires are not. Drop the dependency
on this pass in SelectionDAGIsel at O0.
This makes for a ~0.6% O0 compile-time improvement.

Differential Revision: https://reviews.llvm.org/D80387
2020-05-28 18:48:33 +02:00
Nikita Popov 2833c46f75 [DwarfEHPrepare] Don't prune unreachable resumes at optnone
Disable pruning of unreachable resumes in the DwarfEHPrepare pass
at optnone. While I expect the pruning itself to be essentially free,
this does require a dominator tree calculation, that is not used for
anything else. Saving this DT construction makes for a 0.4% O0
compile-time improvement.

Differential Revision: https://reviews.llvm.org/D80400
2020-05-23 20:58:01 +02:00
Nikita Popov 0c6bba71e3 [TargetPassConfig] Don't add alias analysis at optnone
When performing codegen at optnone, don't add alias analysis to
the pipeline. We don't need it, but it causes an unnecessary
dominator tree calculation.

I've also moved the module verifier call to the top so that a bunch
of disabled-at-optnone passes group more nicely.

Differential Revision: https://reviews.llvm.org/D80378
2020-05-23 10:35:03 +02:00
Daniel Sanders f71350f05a Add -debugify-and-strip-all to add debug info before a pass and remove it after
Summary:
This allows us to test each backend pass under the presence
of debug info using pre-existing tests. The tests should not
fail as a result of this so long as it's true that debug info
does not affect CodeGen.

In practice, a few tests are sensitive to this:
* Tests that check the pass structure (e.g. O0-pipeline.ll)
* Tests that check --debug output. Specifically instruction
  dumps containing MMO's (e.g. prelegalizercombiner-extends.ll)
* Tests that contain debugify metadata as mir-strip-debug will
  remove it (e.g. fastisel-debugvalue-undef.ll)
* Tests with partial debug info (e.g.
  patchable-function-entry-empty.mir had debug info but no
  !llvm.dbg.cu)
* Tests that check optimization remarks overly strictly (e.g.
  prologue-epilogue-remarks.mir)
* Tests that would inject the pass in an unsafe region (e.g.
  seqpairspill.mir would inject between register alloc and
  virt reg rewriter)
In all cases, the checks can either be updated or
--debugify-and-strip-all-safe=0 can be used to avoid being
affected by something like llvm-lit -Dllc='llc --debugify-and-strip-all-safe'

I tested this without the lost debug locations verifier to
confirm that AArch64 behaviour is unaffected (with the fixes
in this patch) and with it to confirm it finds the problems
without the additional RUN lines we had before.

Depends on D77886, D77887, D77747

Reviewers: aprantl, vsk, bogner

Subscribers: qcolombet, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77888
2020-04-10 16:36:07 -07:00
Serguei Katkov 4275eb1331 Re-land [Codegen/Statepoint] Allow usage of registers for non gc deopt values.
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.

The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.

Reviewers: reames, danstrushin
Reviewed By: dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77797
2020-04-10 10:13:39 +07:00
Cameron McInally a5b22b768f [AArch64][SVE] Add support for DestructiveBinary and DestructiveBinaryComm DestructiveInstTypes
Add support for DestructiveBinaryComm DestructiveInstType, as well as the lowering code to expand the new Pseudos into the final movprfx+instruction pairs.

Differential Revision: https://reviews.llvm.org/D73711
2020-02-21 15:19:54 -06:00
Fangrui Song 9a24488cb6 [CodeGen] Move fentry-insert, xray-instrumentation and patchable-function before addPreEmitPass()
This intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).

This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.

Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
2020-01-19 00:09:46 -08:00
Hiroshi Yamauchi d9ae493937 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after reverted D71072.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Hiroshi Yamauchi 2eb30fafa5 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e1407.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00
Hiroshi Yamauchi 9a0b5e1407 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072
2019-12-06 10:43:39 -08:00
Momchil Velikov d91ea7fc6f [AArch64] Move the branch relaxation pass after BTI insertion
Summary:
Inserting BTI instructions can push branch destinations out of range.

The branch relaxation pass itself cannot insert indirect branches since `TargetInstrInfo::insertIndirecrtBranch` is not implemented for AArch64 (guess +/-128 MB direct branch range is more than enough in practice).

Testing this is a bit tricky.

The original test case we have is 155kloc/6.1M. I've generated a test case using this program:
```

int main() {
  std::cout << R"src(int test();
void g0(), g1(), g2(), g3(), g4(), e();

void f(int v) {
  if ((test() & 2) == 0) {
  switch (v) {
  case 0:
    g0();
  case 1:
    g1();
  case 2:
    g2();
  case 3:
    g3();
  }
)src";

  const int N = 8176;

  for (int i = 0; i < N; ++i)
    std::cout << "    void h" << i << "();\n";
  for (int i = 0; i < N; ++i)
    std::cout << "    h" << i << "();\n";

  std::cout << R"src(
  } else {
    e();
  }
}
)src";
}
```
which is still a bit too much to commit as a regression test, IMHO.

Reviewers: t.p.northover, ostannard

Reviewed By: ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69118

Change-Id: Ide5c922bcde08ff4cf635da5e52365525a997a0a
2019-11-06 12:46:50 +00:00
Joerg Sonnenberger 9681ea9560 Reapply r374743 with a fix for the ocaml binding
Add a pass to lower is.constant and objectsize intrinsics

This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374784
2019-10-14 16:15:14 +00:00
Dmitri Gribenko 1a21f98ac3 Revert "Add a pass to lower is.constant and objectsize intrinsics"
This reverts commit r374743. It broke the build with Ocaml enabled:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218

llvm-svn: 374768
2019-10-14 12:22:48 +00:00
Joerg Sonnenberger e4300c392d Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374743
2019-10-13 23:00:15 +00:00
Matt Arsenault caff0a88dd GlobalISel: Add known bits to InstructionSelector
AMDGPU uses this for some addressing mode selection patterns. The
analysis run itself doesn't do anything so it seems easier to just
always require this than adding a way to opt in.

llvm-svn: 370388
2019-08-29 17:24:32 +00:00
Aditya Nandakumar 6bbfde5c48 [GISel]: Fix trivial build breakage
llvm-svn: 368067
2019-08-06 17:53:04 +00:00
Evgeniy Stepanov 851339fb29 Basic MTE stack tagging instrumentation.
Summary:
Use MTE intrinsics to tag stack variables in functions with
sanitize_memtag attribute.

Reviewers: pcc, vitalybuka, hctim, ostannard

Subscribers: srhines, mgorny, javed.absar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64173

llvm-svn: 366361
2019-07-17 19:24:12 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Aditya Nandakumar 500e3ead9f [GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803

This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.

llvm-svn: 351283
2019-01-16 00:40:37 +00:00
Kristof Beyls e66bc1f756 Introduce control flow speculation tracking pass for AArch64
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
  in registers.
- using intrinsics to mask of data in registers as indicated by the
  programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
  relative abundance of registers, one register is reserved (X16) to be
  the taint register. X16 is expected to not clash with other register
  reservation mechanisms with very high probability because:
  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  . The only way to request X16 to be used as a programmer is through
    inline assembly. In the rare case a function explicitly demands to
    use X16/W16, this pass falls back to hardening against speculation
    by inserting a DSB SYS/ISB barrier pair which will prevent control
    flow speculation.
- It is easy to insert mask operations at this late stage as we have
  mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
  and contains all-zeros when miss-speculation is detected. Therefore, when
  masking, an AND instruction (which only changes the register to be masked,
  no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
  select instruction (CSEL) to evaluate the flags that were also used to
  make conditional branch direction decisions. Speculation of the CSEL
  instruction can be limited with a CSDB instruction - so the combination of
  CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
  aren't speculated. When conditional branch direction gets miss-speculated,
  the semantics of the inserted CSEL instruction is such that the taint
  register will contain all zero bits.
  One key requirement for this to work is that the conditional branch is
  followed by an execution of the CSEL instruction, where the CSEL
  instruction needs to use the same flags status as the conditional branch.
  This means that the conditional branches must not be implemented as one
  of the AArch64 conditional branches that do not use the flags as input
  (CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
  selectors to not produce these instructions when speculation hardening
  is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
  the taint register X16 to be encoded in the SP register as value 0.

Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
  x86-slh-lfence option. This may be more useful for the intrinsics-based
  approach than for the SLH approach to masking.
  Note that this pass already inserts the full speculation barriers if the
  function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
  could be done for some indirect branches, such as switch jump tables.

Differential Revision: https://reviews.llvm.org/D54896

llvm-svn: 349456
2018-12-18 08:50:02 +00:00
Oliver Stannard 250e5a5b65 [AArch64][v8.5A] Branch Target Identification code-generation pass
The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds or branches:
- BTI C can be targeted by call instructions, and is inteneded to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts ab both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.

Note that RET instructions are not restricted by branch target
identification, the reason for this is that return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. the using GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867

llvm-svn: 343967
2018-10-08 14:04:24 +00:00
Daniel Sanders c973ad1878 Re-commit: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

The previous commit failed portions of the test-suite on GreenDragon due to
duplicate COPY instructions and iterator invalidation. Both issues have now
been fixed. To assist with this, a helper (cloneVirtualRegister) has been added
to MachineRegisterInfo that can be used to get another register that has the same
type and class/bank as an existing one.

llvm-svn: 343654
2018-10-03 02:12:17 +00:00
Daniel Sanders 33f42f97af Revert: r343521 and r343541: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
There's a strange assertion on two of the Green Dragon bots that goes away when
this is reverted. The assertion is in RegBankAlloc and if it is this commit then
-verify-machine-instrs should have caught it earlier in the pipeline.

llvm-svn: 343546
2018-10-01 22:32:08 +00:00
Daniel Sanders 9659bfda5a [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 343521
2018-10-01 18:56:47 +00:00
Daniel Sanders 618437459c Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Daniel Sanders d24dcdd1f7 [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Michael Zolotukhin 8d052a0dd2 Remove MachineLoopInfo dependency from AsmPrinter.
Summary:
Currently MachineLoopInfo is used in only two places:
1) for computing IsBasicBlockInsideInnermostLoop field of MCCodePaddingContext, and it is never used.
2) in emitBasicBlockLoopComments, which is called only if `isVerbose()` is true.
Despite that, we currently have a dependency on MachineLoopInfo, which makes
pass manager to compute it and MachineDominator Tree. This patch removes the
use (1) and makes the use (2) lazy, thus avoiding some redundant
recomputations.

Reviewers: opaparo, gadi.haber, rafael, craig.topper, zvi

Subscribers: rengolin, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D44812

llvm-svn: 329542
2018-04-09 00:54:47 +00:00
Michael Zolotukhin 3520331f93 Reapply "[test] Add tests for llc passes pipelines." with a fix for bots with expensive checks on.
llvm-svn: 328267
2018-03-22 23:02:48 +00:00
Jonas Devlieghere 7e69dd02bb Revert "[test] Add tests for llc passes pipelines."
This reverts r328159 because the two AArch64 tests fail on GreenDragon:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/11030/

llvm-svn: 328188
2018-03-22 10:34:06 +00:00
Michael Zolotukhin 7e6fa1d6ae [test] Add tests for llc passes pipelines.
This is basically an extension of existing test
test/CodeGen/X86/O0-pipeline.ll introduced in r302608.

llvm-svn: 328159
2018-03-21 22:17:13 +00:00