Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM
SDNodeXForms in AArch64InstrFormats.td.
- Add select-logical-imm.mir, which contains tests for each imported pattern.
- Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now
select instructions of this form.
Differential Revision: https://reviews.llvm.org/D66162
llvm-svn: 369465
This adds GlobalISel equivalents for the following from AArch64InstrFormats:
- arith_shifted_reg32
- arith_shifted_reg64
And partial support for
- logical_shifted_reg32
- logical_shifted_reg32
The only thing missing for the logical cases is support for rotates. Other than
the missing support, the transformation is identical for the arithmetic shifted
register and the logical shifted register.
Lots of tests here:
- Add select-arith-shifted-reg.mir to show that we correctly select add and
sub instructions which use this pattern.
- Add select-logical-shifted-reg.mir to cover patterns which are not shared
between the arithmetic and logical cases.
- Update addsub-shifted.ll to show that we correctly fold shifts into
adds/subs.
- Update eon.ll to show that we can select the eon instruction by folding xors.
Differential Revision: https://reviews.llvm.org/D66163
llvm-svn: 369460
Implement this single atomic load instruction so that we can compile stack
protector code.
Differential Revision: https://reviews.llvm.org/D66245
llvm-svn: 368923
Currently shufflemasks get emitted as any other constant, and you end
up with a bunch of virtual registers of G_CONSTANT with a
G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
doesn't fit this pattern. This isn't an ideal representation, and
should avoid legalization and have fewer opportunities for a
representational error.
Rather than invent a new shuffle mask operand type, similar to what
ShuffleVectorSDNode does, just track the original IR Constant mask
operand. I don't completely like the idea of adding another link to
the IR, but MIR is already quite dependent on IR constants already,
and this will allow sharing the shuffle mask utility functions with
the IR.
llvm-svn: 368704
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.
AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.
Differential Revision: https://reviews.llvm.org/D65984
llvm-svn: 368652
All TLS access on Darwin is in the "general dynamic" form where we call
a function to resolve the address, so implementation is pretty simple.
llvm-svn: 368418
Summary:
This is tested by D61289 but has been pulled into a separate patch at
a reviewers request.
Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm, rovka
Reviewed By: arsenm
Subscribers: javed.absar, hiraditya, wdng, kristof.beyls, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61321
llvm-svn: 368063
These cases can come up when the extending loads combiner doesn't combine a
zext(load) to a zextload op, due to some other operation being in between, which
then gets simplified at a later stage.
Differential Revision: https://reviews.llvm.org/D65360
llvm-svn: 367723
Add an equivalent ComplexRendererFns function for SelectNegArithImmed. This
allows us to select immediate adds of -1 by turning them into subtracts.
Update select-binop.mir to show that the pattern works.
Differential Revision: https://reviews.llvm.org/D65460
llvm-svn: 367700
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().
Differential Revision: https://reviews.llvm.org/D65465
llvm-svn: 367474
Add partial instruction selection for intrinsics like this:
```
declare i32 @llvm.aarch64.stlxr(i64, i32*)
```
(This only handles the case where a G_ZEXT is feeding the intrinsic.)
Also make sure that the added store instruction actually has the memory op from
the original G_STORE.
Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D65355
llvm-svn: 367163
Before, we weren't able to select things like this for G_GEP:
add x0, x8, #8
And instead we'd materialize the 8.
This teaches GISel to do that. It gives some considerable code size savings
on 252.eon-- about 4%!
Differential Revision: https://reviews.llvm.org/D65248
llvm-svn: 366959
If we have a G_MUL, and either the LHS or the RHS of that mul is the legal
shift value for a load addressing mode, we can fold it into the load.
This gives some code size savings on some SPEC tests. The best are around 2%
on 300.twolf and 3% on 254.gap.
Differential Revision: https://reviews.llvm.org/D65173
llvm-svn: 366954
Fix an off-by-one error which made us not look at the last element of the
zero vector. This caused a miscompile in 188.ammp.
Differential Revision: https://reviews.llvm.org/D65168
llvm-svn: 366930
We need to be able to load and store s128 for memcpy inlining, where we want to
generate Q register mem ops. Making these legal also requires that we add some
support in other instructions. Regbankselect should also know about these since
they have no GPR register class that can hold them, so need special handling to
live on the FPR bank.
Differential Revision: https://reviews.llvm.org/D65166
llvm-svn: 366857
When we select the XRO variants of loads, we can pull in very specific shifts
(of the size of an element). E.g.
```
ldr x1, [x2, x3, lsl #3]
```
This teaches GISel to handle these when they're coming from shifts
specifically.
This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg`
which recognizes this pattern.
This also packs this up with `selectAddrModeRegisterOffset` into
`selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO`
in AArch64ISelDAGtoDAG.
Also update load-addressing-modes to show that all of the cases here work.
Differential Revision: https://reviews.llvm.org/D65119
llvm-svn: 366819
Sometimes, you can end up with cross-bank copies between same-sized GPRs and
FPRs, which feed into G_STOREs. When these copies feed only into stores, they
aren't necessary; we can just store using the original register bank.
This provides some minor code size savings for some floating point SPEC
benchmarks. (Around 0.2% for 453.povray and 450.soplex)
This issue doesn't seem to show up due to regbankselect or anything similar. So,
this patch introduces an early select function, `contractCrossBankCopyIntoStore`
which performs the contraction when possible. The selector then continues
normally and selects the correct store opcode, eliminating needless copies
along the way.
Differential Revision: https://reviews.llvm.org/D65024
llvm-svn: 366625
Add support for folding G_GEPs into loads of the form
```
ldr reg, [base, off]
```
when possible. This can save an add before the load. Currently, this is only
supported for loads of 64 bits into 64 bit registers.
Add a new addressing mode function, `selectAddrModeRegisterOffset` which
performs this folding when it is profitable.
Also add a test for addressing modes for G_LOAD.
Differential Revision: https://reviews.llvm.org/D64944
llvm-svn: 366503
Since we have distinct types for pointers and scalars, G_INTTOPTRs can sometimes
obstruct attempts to find constant source values. These usually come about when
try to do some kind of null pointer check. Teaching getConstantVRegValWithLookThrough
about this operation allows the CBZ/CBNZ optimization to catch more cases.
This change also improves the case where we can't find a constant source at all.
Previously we would emit a cmp, cset and tbnz for that. Now we try to just emit
a cmp and conditional branch, saving an instruction.
The cumulative code size improvement of this change plus D64354 is 5.5% geomean
on arm64 CTMark -O0.
Differential Revision: https://reviews.llvm.org/D64377
llvm-svn: 365690
Some minor cleanup.
This function in Utils does the same thing as `findMIFromReg`. It also looks
through copies, which `findMIFromReg` didn't.
Delete `findMIFromReg` and use `getOpcodeDef` instead. This only happens in
`tryOptVectorDup` right now.
Update opt-shuffle-splat to show that we can look through the copies now, too.
Differential Revision: https://reviews.llvm.org/D64520
llvm-svn: 365684
There are a few places where we walk over copies throughout
AArch64InstructionSelector.cpp. In Utils, there's a function that does exactly
this which we can use instead.
Note that the utility function works with the case where we run into a COPY
from a physical register. We've run into bugs with this a couple times, so using
it should defend us from similar future bugs.
Also update opt-fold-compare.mir to show that we still handle physical registers
properly.
Differential Revision: https://reviews.llvm.org/D64513
llvm-svn: 365683
Porting over the part of `emitComparison` in AArch64ISelLowering where we use
TST to represent a compare.
- Rename `tryOptCMN` to `tryFoldIntegerCompare`, since it now also emits TSTs
when possible.
- Add a utility function for emitting a TST with register operands.
- Rename opt-fold-cmn.mir to opt-fold-compare.mir, since it now also tests the
TST fold as well.
Differential Revision: https://reviews.llvm.org/D64371
llvm-svn: 365404
Precedence was wrong in an assert added in r364961. Add braces around the
assertion condition to make it right.
See: https://reviews.llvm.org/D64084
llvm-svn: 365069
Instead of just stopping to see if we have a G_CONSTANT, instead, look through
G_TRUNCs, G_SEXTs, and G_ZEXTs.
This gives an average ~1.3% code size improvement on CINT2000 at -O3.
Differential Revision: https://reviews.llvm.org/D64108
llvm-svn: 365063
There are two main issues preventing us from generating immediate form shifts:
1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift
immediate forms, but they currently don't work because the amount type is
expected to be an s64 constant, but we only legalize them to have homogenous
types.
To deal with this, first we introduce a custom legalizer to *only* custom legalize
s32 shifts which have a constant operand into a s64.
There is also an additional artifact combiner to fold zexts(g_constant) to a
larger G_CONSTANT if it's legal, a counterpart to the anyext version committed
in an earlier patch.
2) For G_SHL the importer can't cope with the pattern. For this I introduced an
early selection phase in the arm64 selector to select these forms manually
before the tablegen selector pessimizes it to a register-register variant.
Differential Revision: https://reviews.llvm.org/D63910
llvm-svn: 364994
This teaches `tryOptSelect` to handle folding G_ICMP, and removes the
requirement that the G_SELECT we're dealing with is floating point.
Some refactoring to make this work nicely as well:
- Factor out the scalar case from the selection code for G_ICMP into
`emitIntegerCompare`.
- Make `tryOptCMN` return a MachineInstr* instead of a bool.
- Make `tryOptCMN` not modify the instruction being selected.
- Factor out the CMN emission into `emitCMN` for readability.
By doing this this way, we can get all of the compare selection optimizations
in select emission.
Differential Revision: https://reviews.llvm.org/D64084
llvm-svn: 364961
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
On Windows ARM64, intrinsic __debugbreak is compiled into brk #0xF000 which is
mapped to llvm.debugtrap in Clang. Instruction brk #F000 is the defined break
point instruction on ARM64 which is recognized by Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as the default implementation.
Differential Revision: https://reviews.llvm.org/D63635
llvm-svn: 364115
With this we can now fully code generate jump tables, which is important for code size.
Differential Revision: https://reviews.llvm.org/D63223
llvm-svn: 364086
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).
On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.
There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.
Differential Revision: https://reviews.llvm.org/D63587
llvm-svn: 364075
Basically porting over the behaviour in AArch64ISelLowering to GISel. See
emitComparison for reference.
When we have something like this:
```
lhs = G_SUB 0, y
...
G_ICMP lhs, rhs
```
We can fold away the G_SUB and produce a cmn instead, given that we produce
the same value in NZCV.
Add a test showing that the transformation works, and also showing that we
don't perform the transformation when it's unsafe.
Also factor out the CSet emission into emitCSetForICMP.
Differential Revision: https://reviews.llvm.org/D63163
llvm-svn: 363596
We already get support for G_ZEXTLOAD to s32 from the importer, but it can't
deal with the SUBREG_TO_REG in the pattern. Tweaking the existing manual
selection code for G_LOAD to handle an additional SUBREG_TO_REG when dealing
with G_ZEXTLOAD isn't much work.
Also add tests to check the imported pattern selections to s32 work.
llvm-svn: 362681
When looking through copies, make sure to not try to find the vreg def of a physreg.
Normally getVRegDef will return nullptr in this case, but if there happens to be
multiple defs then it will assert.
This fixes PR42129.
llvm-svn: 362666
Instead of emitting all of the test stuff for a compare when it's only used by
a select, instead, just emit the compare + select. The select will use the
value of NZCV correctly, so we don't need to emit all of the test instructions
etc.
For now, only support fp selects which use G_FCMP. Also only support condition
codes which will only require one select to represent.
Also add a test.
Differential Revision: https://reviews.llvm.org/D62695
llvm-svn: 362446
Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and
factor out opcode selection for G_FCMP into its own function.
Add a test to show that we don't do this with other immediates.
Differential Revision: https://reviews.llvm.org/D62539
llvm-svn: 361888
This saves us some unnecessary copies.
If the inputs to a G_SELECT are floating point, we should use fcsel rather than
csel.
Changes here are...
- Teach selectCopy about s1-to-s1 copies across register banks.
- AArch64RegisterBankInfo about G_SELECT in general.
- Teach the instruction selector about the FCSEL instructions.
Also add two tests:
- select-select.mir to show that we get the expected FCSEL
- regbank-select.mir (unfortunately named) to show the register banks on
G_SELECT are properly preserved
And update fast-isel-select.ll to show that we do the same thing as other
instruction selectors in these cases.
llvm-svn: 359940
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.
Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.
llvm-svn: 359734
This was falling back and gives us a reason to create a selectIntrinsic function
which we would need eventually anyway. Update arm64-crypto.ll to show that we
correctly select it.
Also factor out the code for finding an intrinsic ID.
llvm-svn: 359501
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.
Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.
llvm-svn: 359351
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.
Update select-bswap.mir and arm64-rev.ll to reflect the changes.
llvm-svn: 359331
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.
Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)
Differential revision: https://reviews.llvm.org/D61124
llvm-svn: 359287
Add selection support for G_INTRINSIC_ROUND, add a selection test, and add
check lines to arm64-vfloatintrinsics.ll and f16-instructions.ll.
llvm-svn: 359046
Apparently FileCheck wasn't actually matching the fallback check lines in
arm64-vfloatintrinsics.ll properly. So, there were selection fallbacks for
G_INTRINSIC_TRUNC there.
Actually hook it up into AArch64InstructionSelector.cpp and write a proper
selection test.
I guess I'll figure out the FileCheck magic to make the fallback checks work
properly in arm64-vfloatintrinsics.ll.
llvm-svn: 359030
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.
llvm-svn: 358312
Loads and store of values with type like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.
This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.
Differential Revision: https://reviews.llvm.org/D60534
llvm-svn: 358221
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally in future we will have a GISel native selection
patterns that we can write in tablegen to improve on this.
For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.
Differential Revision: https://reviews.llvm.org/D60436
llvm-svn: 358035
This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.
Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D60100
llvm-svn: 357518
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.
Differential Revision: https://reviews.llvm.org/D59910
llvm-svn: 357318
This adds pattern matching for the insert+shufflevector sequence so we can
generate dup instructions instead of the current TBL sequence.
Differential Revision: https://reviews.llvm.org/D59558
llvm-svn: 356526
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().
For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.
This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.
llvm-svn: 356396
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().
Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.
Differential Revision: https://reviews.llvm.org/D59434
llvm-svn: 356304
This adds instruction selection support for G_UADDO on s32s and s64s.
Also
- Add an instruction selection test
- Update the arm64-xaluo.ll test to show that we generate the correct assembly
Differential Revision: https://reviews.llvm.org/D58734
llvm-svn: 356214
This re-uses the previous support for extract vector elt to extract the
subvectors.
Differential Revision: https://reviews.llvm.org/D59390
llvm-svn: 356213
This adds support for inserting elements into packed vectors. It also adds
two tests: one for selection, and one for regbank select.
Unpacked vectors will come in a follow-up.
Differential Revision: https://reviews.llvm.org/D59325
llvm-svn: 356182
NFC. Some more preliminary factoring for G_INSERT_VECTOR_ELT.
Also better code-reuse, etc., etc.
Differential Revision: https://reviews.llvm.org/D59323
llvm-svn: 356107
Factor out the vector insert code in `selectBuildVector`. Replace part of it
with `emitScalarToVector`, since it was pretty much equivalent.
This will make implementing G_INSERT_VECTOR_ELT easier.
Differential Revision: https://reviews.llvm.org/D59322
llvm-svn: 356106
Some more refactoring for G_INSERT_VECTOR_ELT.
Factor out the code used to find a lane index from `selectExtractElt`. Put it
into a more general-purpose `getConstantValueForReg` function.
This will be shared with the code for G_INSERT_VECTOR_ELT.
Differential Revision: https://reviews.llvm.org/D59324
llvm-svn: 356101
After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without
running into any problematic intrinsics.
Also add a fix for lane copies, which don't support index 0.
llvm-svn: 355871
This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases
where the index is defined by a G_CONSTANT.
It also factos out the lane copy opcode selection part into its own function,
`getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and
`selectExtractElt`.
Differential Revision: https://reviews.llvm.org/D58469
llvm-svn: 355344
The code to materialize a mask from a constant pool load tried to use a 128 bit
LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted
in a link failure in the NEON tests in the test suite since the LDR address was
unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is
64 bit, before converting back to a 128 bit register for the TBL.
llvm-svn: 355326
1) GCC complains that KnownValid is set but not used.
2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral
and non-enumeral type in conditional expression". Solve this by casting
to unsigned which is the final type anyway.
Differential Revision: https://reviews.llvm.org/D58834
llvm-svn: 355304
This extends the existing support for shufflevector to handle cases like
<2 x float>, which we can implement by concating the vectors and using a TBL1.
Differential Revision: https://reviews.llvm.org/D58684
llvm-svn: 355104
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.
Differential Revision: https://reviews.llvm.org/D58528
llvm-svn: 354807
This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and
implements them with a very pessimistic TBL2 instruction in the selector.
For TBL2, support is also needed to generate constant pool entries and load from
them in order to materialize the mask register.
Currently supports <2 x s64> and <4 x s32> result types.
Differential Revision: https://reviews.llvm.org/D58466
llvm-svn: 354521
https://reviews.llvm.org/D57608
It's a common pattern in GISel to have a MachineInstrBuilder from which we get various regs
(commonly MIB->getOperand(0).getReg()). This adds a helper method and the above can be
replaced with MIB.getReg(0).
llvm-svn: 353223
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.
To do this, this patch...
- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors
It also adds
- A legalizer test for the 16-bit vector ceil which verifies that we create a
G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
full fp16 is supported
- A test for selecting G_UNMERGE_VALUES
And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.
https://reviews.llvm.org/D56682
llvm-svn: 352113
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.
Until then, add selection support anyway which is a significant proportion of
our current fallbacks on CTMark.
rdar://46491420
llvm-svn: 349712
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating against SpectreV1-style vulnarabilities.
At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask of data loaded
in registers.
- using intrinsics to mask of data in registers as indicated by the
programmer (see https://lwn.net/Articles/759423/).
For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
relative abundance of registers, one register is reserved (X16) to be
the taint register. X16 is expected to not clash with other register
reservation mechanisms with very high probability because:
. The AArch64 ABI doesn't guarantee X16 to be retained across any call.
. The only way to request X16 to be used as a programmer is through
inline assembly. In the rare case a function explicitly demands to
use X16/W16, this pass falls back to hardening against speculation
by inserting a DSB SYS/ISB barrier pair which will prevent control
flow speculation.
- It is easy to insert mask operations at this late stage as we have
mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
and contains all-zeros when miss-speculation is detected. Therefore, when
masking, an AND instruction (which only changes the register to be masked,
no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
select instruction (CSEL) to evaluate the flags that were also used to
make conditional branch direction decisions. Speculation of the CSEL
instruction can be limited with a CSDB instruction - so the combination of
CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
aren't speculated. When conditional branch direction gets miss-speculated,
the semantics of the inserted CSEL instruction is such that the taint
register will contain all zero bits.
One key requirement for this to work is that the conditional branch is
followed by an execution of the CSEL instruction, where the CSEL
instruction needs to use the same flags status as the conditional branch.
This means that the conditional branches must not be implemented as one
of the AArch64 conditional branches that do not use the flags as input
(CB(N)Z and TB(N)Z). This is implemented by ensuring in the instruction
selectors to not produce these instructions when speculation hardening
is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
the taint register X16 to be encoded in the SP register as value 0.
Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
x86-slh-lfence option. This may be more useful for the intrinsics-based
approach than for the SLH approach to masking.
Note that this pass already inserts the full speculation barriers if the
function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
could be done for some indirect branches, such as switch jump tables.
Differential Revision: https://reviews.llvm.org/D54896
llvm-svn: 349456
https://reviews.llvm.org/D55294
Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally passing in other optional parameters in the end (such as
flags) is not possible trivially. Also a trivial call such as
B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to
buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add itself to
MachineInstrBuilder.
After this patch, most invocations would look like
B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual and other
builders now only have to override buildInstr (for say constant
folding/cseing) is straightforward.
Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.
llvm-svn: 348815
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.
This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.
Differential Revisions: https://reviews.llvm.org/D53629
llvm-svn: 348788
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.
Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and that the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.
The old test case should still work as expected.
llvm-svn: 348320