Commit Graph

11202 Commits

Author SHA1 Message Date
David Penry a9f14cdc62 [ARM] Add bank conflict hazarding
Adds ARMBankConflictHazardRecognizer. This hazard recognizer
looks for a few situations where the same base pointer is used and
then checks whether the offsets lead to a bank conflict. Two
parameters are also added to permit overriding of the target
assumptions:

arm-data-bank-mask=<int> - Mask of bits which are to be checked for
conflicts.  If all these bits are equal in the offsets, there is a
conflict.
arm-assume-itcm-bankconflict=<bool> - Assume that there will be bank
conflicts on any loads to a constant pool.

This hazard recognizer is enabled for Cortex-M7, where the Technical
Reference Manual states that there are two DTCM banks, selected by bit
2 of the address, and one ITCM bank.
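
As a rough illustration (a hypothetical example assuming the bit-2 DTCM
banking described above, not taken from the patch's tests), two loads
off the same base whose offsets agree in bit 2 would hit the same bank,
while an offset that differs in bit 2 would not:

  ldr r1, [r0]        @ offset 0, bit 2 = 0
  ldr r2, [r0, #8]    @ offset 8, bit 2 = 0 -> same bank, potential conflict
  ldr r3, [r0, #4]    @ offset 4, bit 2 = 1 -> different bank, no conflict

The hazard recognizer reports the hazard so that the scheduler can try
to keep pairs like the first two apart.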

Differential Revision: https://reviews.llvm.org/D93054
2020-12-23 14:00:59 +00:00
Fangrui Song d9a0c40bce [MC] Split MCContext::createTempSymbol, default AlwaysAddSuffix to true, and add comments
CanBeUnnamed is rarely false. Splitting to a createNamedTempSymbol makes the
intention clearer and matches the direction of reverted r240130 (to drop the
unneeded parameters).

No behavior change.
2020-12-21 14:04:13 -08:00
Fangrui Song d33abc337c Migrate MCContext::createTempSymbol call sites to AlwaysAddSuffix=true
Most call sites set AlwaysAddSuffix to true. The two use cases that pass
false do not really need it and can be made more consistent with other
temporary symbol usage.
2020-12-21 14:04:13 -08:00
Fangrui Song 7b9890e17e [MC][ELF] Remove unneeded MCSymbol::setExternal calls
ELF code uses symbol bindings and does not call isExternal().
2020-12-20 21:26:36 -08:00
Kazu Hirata 966f1431de [Target] Use llvm::erase_if (NFC) 2020-12-20 17:43:22 -08:00
Kristof Beyls df8ed39283 [ARM] harden-sls-blr: avoid r12 and lr in indirect calls.
As a linker is allowed to clobber r12 on function calls, the code
transformation that hardens indirect calls is not correct if a linker
does so.  Similarly, the transformation is not correct when
register lr is used.

This patch makes sure that r12 or lr are not used for indirect calls
when harden-sls-blr is enabled.

Differential Revision: https://reviews.llvm.org/D92469
2020-12-19 12:39:59 +00:00
Kristof Beyls a4c1f5160e [ARM] Harden indirect calls against SLS
To make sure that no barrier gets placed on the architectural execution
path, each indirect call that calls the function in register rN gets
transformed into a direct call to __llvm_slsblr_thunk_mode_rN, where
mode is either arm or thumb, depending on the mode in which the indirect
call happens.

The __llvm_slsblr_thunk_mode_rN thunk contains:

bx rN
<speculation barrier>

Therefore, the indirect call gets split into two: one direct call and
one indirect jump.
As a result, no speculation barrier ends up on the architectural
execution path.
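
As a sketch (arm mode and register r4 chosen arbitrarily for
illustration), a call such as

  blx r4

becomes a direct call to the corresponding thunk:

  bl __llvm_slsblr_thunk_arm_r4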

The mitigation is off by default and can be enabled by the
harden-sls-blr subtarget feature.

As a linker is allowed to clobber r12 on function calls, the
above code transformation is not correct if a linker does so.
Similarly, the transformation is not correct when register lr is used.
Avoiding the use of r12/lr is done in a follow-on patch to make
reviewing this code easier.

Differential Revision: https://reviews.llvm.org/D92468
2020-12-19 12:33:42 +00:00
Kristof Beyls 320fd3314e [ARM] Implement harden-sls-retbr for Thumb mode
The only non-trivial consideration in this patch is that the formation
of TBB/TBH instructions, which is done in the constant island pass, does
not understand the speculation barriers inserted by the SLSHardening
pass. As such, when harden-sls-retbr is enabled for a function, the
formation of TBB/TBH instructions in the constant island pass is
disabled.

Differential Revision: https://reviews.llvm.org/D92396
2020-12-19 12:32:47 +00:00
Kristof Beyls 195f44278c [ARM] Implement harden-sls-retbr for ARM mode
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.

To prevent a potential gadget that is mis-speculatively executed after
these instructions from leaking secrets through side channels, this
pass places a speculation barrier immediately after every indirect
control flow instruction where control flow does not return to the next
instruction, i.e. returns and indirect jumps, but not indirect function
calls.

Hardening of indirect function calls will be done in a later,
independent patch.

This patch implements the same functionality as the AArch64
counterpart, implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change.  I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.

This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.

The inserted barriers are never on the correct, architectural execution
path, and therefore the performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.

On targets that implement the Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted to act as a speculation barrier.
On other targets, a DSB SYS followed by an ISB is emitted to act as a
speculation barrier.
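
For illustration (a sketch, not taken from the patch's tests), a
hardened return would be followed by one of the two barrier sequences:

  bx lr
  dsb sys    @ on targets without the SB extension
  isb

or, on targets with the SB extension:

  bx lr
  sb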

These speculation barriers are implemented as pseudo instructions to
prevent later passes from analyzing them and potentially removing them.

The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.

Differential Revision: https://reviews.llvm.org/D92395
2020-12-19 11:42:39 +00:00
David Green e1c1adf9dc [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting them from
vector insert elements. Because the insert_elements are known to be
canonicalized to ascending order, there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

The top one is the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.
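
For the common "3 2 1 0" case this means that, roughly (a sketch, with
r0-r3 feeding lanes 0-3), four i32 lane inserts that would otherwise
each select a single-lane move

  vmov.32 q0[0], r0
  vmov.32 q0[1], r1
  vmov.32 q0[2], r2
  vmov.32 q0[3], r3

instead select the two dual-lane moves shown at the top of this message.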

This is a recommit of 6cc3d80a84 after
fixing the backward instruction definitions.
2020-12-18 16:13:08 +00:00
David Green 6e913e4451 Revert "[ARM] Match dual lane vmovs from insert_vector_elt"
This one needed more testing.
2020-12-18 13:33:40 +00:00
Yvan Roux 923ca0b411 [ARM][MachineOutliner] Fix cost model.
Fix the allocation of cost models for candidate calls and prepare the
handling of stack fixups.

Differential Revision: https://reviews.llvm.org/D92933
2020-12-17 16:08:23 +01:00
Lucas Prates c5046ebdf6 [ARM] Adding v8.7-A command-line support for the ARM target
This extends the command-line support for the 'armv8.7-a' architecture
name to the ARM target.

Based on a patch written by Momchil Velikov.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D93231
2020-12-17 13:48:54 +00:00
Lucas Prates 42b92b31b8 [ARM][AArch64] Adding basic support for the v8.7-A architecture
This introduces support for the v8.7-A architecture through a new
subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT"
instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI
instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new
HCRX_EL2 system register.

Based on patches written by Simon Tatham and Victor Campos.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91772
2020-12-17 13:45:08 +00:00
David Green 6cc3d80a84 [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting them from
vector insert elements. Because the insert_elements are known to be
canonicalized to ascending order, there are several patterns that we need
to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

The top one is the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.

Differential Revision: https://reviews.llvm.org/D92553
2020-12-15 15:58:52 +00:00
David Green 1de3e7fd62 [ARM] Improve handling of empty VPT blocks in tail predicated loops
A VPT block that just contains either a VPST;VCTP or a VPT;VCTP becomes
invalid once the VCTP is removed. This fixes the first case by removing
the now-empty block, and bails out for the second, as we have no simple
way of converting a VPT to a VCMP.

Differential Revision: https://reviews.llvm.org/D92369
2020-12-14 11:17:01 +00:00
David Green a4823377fd [ARM] Add basic masked load/store costs
This adds some basic MVE masked load/store costs, notably changing the
cost of legal loads/stores to the MVECostFactor and the cost of
scalarized instructions to 8*NumElts.

Differential Revision: https://reviews.llvm.org/D86538
2020-12-12 15:26:32 +00:00
David Green 3f571be1c0 [ARM] Make t2DoLoopStartTP a terminator
Although this was something that I was hoping we would not have to do,
this patch makes t2DoLoopStartTP a terminator in order to keep it at the
end of its block, not allowing extra MVE instructions between it and
the end. With t2DoLoopStartTPs also starting tail predication regions,
it also marks them as having side effects. The t2DoLoopStart is still
not a terminator, giving it the extra scheduling freedom that can be
helpful, but now that we have a TP version they can be treated
differently.

Differential Revision: https://reviews.llvm.org/D91887
2020-12-11 09:23:57 +00:00
Haojian Wu 2fc4afda0f Fix a -Wunused-variable warning in release build. 2020-12-10 14:52:45 +01:00
David Green 0447f3508f [ARM][RegAlloc] Add t2LoopEndDec
We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
register allocation to make sure this does not fail.

Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break its ability to use LR. For that, this adds an
isUnspillableTerminator hook to opt in to the new behaviour.

There is a chance that this could cause problems, and so I have added an
escape option in case. But I have not seen any problems in the testing
that I've tried, and not reverting low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the
pre-RA code to create them.
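
The reverted form is essentially (a sketch; lr holds the remaining
iteration count and .Lloop is a placeholder label):

  subs lr, lr, #1    @ decrement the counter
  bne .Lloop         @ loop back while non-zero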

Differential Revision: https://reviews.llvm.org/D91358
2020-12-10 12:14:23 +00:00
David Green b0ce615b2d [ARM] Remove copies from low overhead phi inductions.
The phi created in a low overhead loop gets created with a default
register class, it seems. There are then copies inserted between the low
overhead loop pseudo instructions (which produce/consume GPRlr
registers) and the phi holding the induction. This patch removes
those as a step towards attempting to make t2LoopDec and t2LoopEnd a
single instruction, and appears useful in its own right as shown in the
tests.

Differential Revision: https://reviews.llvm.org/D91267
2020-12-10 10:30:31 +00:00
David Green 384383e15c [ARM] Common inverse constant predicates to VPNOT
This scans through blocks looking for constants used as predicates in
MVE instructions. When two constants are found which are the inverse of
one another, the second can be replaced by a VPNOT of the first,
potentially allowing that VPNOT to be folded away into an else predicate
of a vpt block.
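
Schematically (a hedged sketch with arbitrary illustrative constants,
assuming the predicate constants are materialized into p0 via VMSR),
the second materialization in

  movw r2, #0x00ff
  vmsr p0, r2        @ first predicate constant
  ...
  movw r3, #0xff00
  vmsr p0, r3        @ inverse of the first constant

can instead become

  vpnot              @ invert the previous predicate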

Differential Revision: https://reviews.llvm.org/D92470
2020-12-09 07:56:44 +00:00
David Green 03e675fd12 [ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1)
This folds a not (an xor -1) through a predicate_cast, so that it can be
turned into a VPNOT and potentially be folded away as an else predicate
inside a VPT block.

Differential Revision: https://reviews.llvm.org/D92235
2020-12-08 15:22:46 +00:00
David Green 91fb9eac0b [ARM] Remove dead instructions before creating VPT block bundles
We remove VPNOT instructions in VPT blocks as we create them, turning
them into else predicates. We don't remove the dead instructions until
after the block has been created though. Because the VPNOT will have
killed the vpr register it used, this makes finalizeBundle add internal
flags to the vpr uses of any instructions after the VPNOT. These
incorrect flags can then confuse what is alive and what is not, leading
to machine verifier problems.

This patch removes them earlier instead, before the bundle is finalized
so that kill flags remain valid.

Differential Revision: https://reviews.llvm.org/D92227
2020-12-08 14:05:07 +00:00
David Green d9bf6245bf [ARM] Revert low overhead loops with calls before register allocation.
This adds code to revert low overhead loops with calls in them before
register allocation. Ideally we would not create low overhead loops with
calls in them to begin with, but that can be difficult to always get
correct. If we want to try and glue together t2LoopDec and t2LoopEnd
into a single instruction, we need to ensure that no instructions use LR
in the loop. (Technically the final code can be better too, as it
doesn't need to use the same registers but that has not been optimized
for here, as reverting loops with calls is expected to be very rare).

It also adds an MVETailPredUtils.h header to share the revert code
between different passes, and provides a place to expand upon, with
RevertLoopWithCall becoming a place to perform other low overhead loop
alterations like removing copies or combining LoopDec and End into a
single instruction.

Differential Revision: https://reviews.llvm.org/D91273
2020-12-07 15:44:40 +00:00
Evgeny Leviant 993eaf2d69 Recommit [TableGen][SchedModels] Fix read/write variant substitution
Original commit rG112b3cb6ba49 introduced non-determinism in the
subtarget generator due to iteration over a DenseMap. The new patch
fixes this by changing ProcModelMapTy from DenseMap to std::map.
2020-12-04 21:50:34 +03:00
Fangrui Song 86fa896363 Revert D90844 "[TableGen][SchedModels] Fix read/write variant substitution"
This reverts commit 112b3cb6ba.

D90844 made lib/Target/AArch64/AArch64GenSubtargetInfo.inc non-deterministic.
2020-12-03 14:24:29 -08:00
dfukalov 2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Mircea Trofin bab72dd5d5 [NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister.
Typing the API appropriately.

Differential Revision: https://reviews.llvm.org/D92341
2020-12-02 15:46:38 -08:00
David Green eedf0ed63e [ARM] Mark select and selectcc of MVE vector operations as expand.
We already expand select and select_cc in codegenprepare, but they can
still be generated in some situations. Explicitly mark them as expand
to ensure they are not produced, which would otherwise lead to a
failure to select the nodes.

Differential Revision: https://reviews.llvm.org/D92373
2020-12-01 15:05:55 +00:00
David Green 7923d71b4a [ARM] PREDICATE_CAST demanded bits
The PREDICATE_CAST node is used to model moves between MVE predicate
registers and GPRs, and eventually becomes a VMSR p0, rn. When moving to
a predicate, only the bottom 16 bits of the source register are
demanded. This adds a simple fold for that, allowing it to potentially
remove instructions like uxth.
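
As a sketch of the effect (a hypothetical before/after):

  uxth r0, r0        @ only the bottom 16 bits are demanded anyway
  vmsr p0, r0

can drop the extension and become just

  vmsr p0, r0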

Differential Revision: https://reviews.llvm.org/D92213
2020-12-01 10:32:24 +00:00
Evgeny Leviant 112b3cb6ba [TableGen][SchedModels] Fix read/write variant substitution
Patch fixes multiple issues related to expansion of variant sched reads and
writes.

Differential revision: https://reviews.llvm.org/D90844
2020-11-30 11:55:55 +03:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
David Green 0e49a40d75 [ARM] Cleanup for the MVETailPrediction pass
This strips out a lot of the code that should no longer be needed from
the MVETailPredictionPass, leaving the important part - find active lane
mask instructions and convert them to VCTP operations.

Differential Revision: https://reviews.llvm.org/D91866
2020-11-26 15:10:44 +00:00
Mark Murray 2b6691894a [ARM][AArch64] Adding Neoverse N2 CPU support
Add support for the Neoverse N2 CPU to the ARM and AArch64 backends.

Differential Revision: https://reviews.llvm.org/D91695
2020-11-25 11:42:54 +00:00
Evgeny Leviant a6a6d11c7b [MC][ARM] Fix number of operands of tMOVSr
Differential revision: https://reviews.llvm.org/D92029
2020-11-24 18:13:10 +03:00
Evgeny Leviant 50bd686695 Add support for branch forms of ALU instructions to Cortex-A57 model
This patch fixes the scheduling of ALU instructions which modify the pc
register. It also fixes the computation of mutually exclusive predicates
so that sequences of variants are properly expanded.

Differential revision: https://reviews.llvm.org/D91266
2020-11-24 11:43:51 +03:00
Craig Topper 4252f7773a [SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets.
X86 was already specially marking fma as commutable, which allowed
tablegen to autogenerate commuted patterns. This moves it to the target
independent definition and fixes up the targets to remove the now
unneeded patterns.

Unfortunately, the tests change because the commuted versions of
the patterns generate operands in a different order than the
explicit patterns.

Differential Revision: https://reviews.llvm.org/D91842
2020-11-23 10:09:20 -08:00
David Green c8c3a411c5 [ARM] Ensure MVE_TwoOpPattern is used inside Predicate's 2020-11-22 21:38:00 +00:00
David Green f08c37da7b [ARM] Disable WLSTP loops
This checks to see if the loop will likely become a tail predicated loop
and disables wls loop generation if so, as the likelihood for reverting
is currently too high. These should be fairly rare situations anyway due
to the way iterations and element counts are used during lowering. Just
not trying can alter how SCEVs are materialized, however, leading to
different codegen.

It also adds an option to disable all while low overhead loops, for
debugging.

Differential Revision: https://reviews.llvm.org/D91663
2020-11-20 13:30:44 +00:00
Sam Tebbs 8ecb015ed5 [ARM][LowOverheadLoops] Convert intermediate vpr use assertion to condition
This converts the intermediate VPR use assertion to a condition in the if-statement to protect against assertion failures in case behaviour is changed.

This is a follow-up to https://reviews.llvm.org/D90935 and implements the post-approval comments.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D91790
2020-11-19 17:15:45 +00:00
David Green 006b3bdedd [ARM] Deliberately prevent inline asm in low overhead loops. NFC
This was already something that was handled by one of the "else"
branches in maybeLoweredToCall, so this patch is an NFC but makes it
explicit and adds a test. We may in the future want to support this
under certain situations, but for the moment just don't try to create
low overhead loops with inline asm in them.

Differential Revision: https://reviews.llvm.org/D91257
2020-11-19 13:28:21 +00:00
Duncan P. N. Exon Smith 5abf76fbe3 ADT: Add assertions to SmallVector::insert, etc., for reference invalidation
2c196bbc6b asserted that
`SmallVector::push_back` doesn't invalidate the parameter when it needs
to grow. Do the same for `resize`, `append`, `assign`, `insert`, and
`emplace_back`.

Differential Revision: https://reviews.llvm.org/D91744
2020-11-18 17:36:28 -08:00
Mikhail Goncharov f45c052c9e Fix unused variables in release build
Differential Revision: https://reviews.llvm.org/D91705
2020-11-18 15:18:31 +01:00
Sam Tebbs da2e4728c7 [ARM][LowOverheadLoops] Merge VCMP and VPST across VPT blocks
This patch adds support for combining a VPST with a dangling VCMP from a
previous VPT block.

Differential Revision: https://reviews.llvm.org/D90935
2020-11-18 12:54:16 +00:00
Florian Hahn b2f4c5fddc [AsmWriter] Factor out mnemonic generation to accessible getMnemonic.
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).

Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D90039
2020-11-17 09:47:38 +00:00
David Penry 48b43c9d4f [ARM] Cortex-M7 schedule
This patch adds the SchedMachineModel for Cortex-M7. It
also adds test cases for the scheduling information.

Details of the pipeline and descriptions are in comments
in file ARMScheduleM7.td included in this patch.

Differential Revision: https://reviews.llvm.org/D91355
2020-11-16 10:16:07 +00:00
serge-sans-paille 9218ff50f9 llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_component_library` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These functions store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
David Green 11dee2eae2 [ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP
Of course there was something missing, in this case a check that the def
of the count register we are adding to a t2DoLoopStartTP would dominate
the insertion point.

In the future, when we remove some of these COPYs in between, the
t2DoLoopStartTP will always become the last instruction in the block,
preventing this from happening. In the meantime we need to check they
are created in a sensible order.

Differential Revision: https://reviews.llvm.org/D91287
2020-11-12 13:47:46 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........
2020-11-11 12:15:54 +00:00