Commit Graph

2148 Commits

Author SHA1 Message Date
David Green 11b34e78c1 [ARM] Define CPSR on MEMCPY pseudos
These pseudos are converted post-isel into t2WhileLoopStart and
t2LoopEnd/LoopDec instructions, which themselves are defined to clobber
CPSR. Doing the same with the MEMCPY nodes will make sure they are
scheduled correctly to not end up with incorrect uses.
2021-05-14 15:06:59 +01:00
Tomas Matheson 34c098b780 [ARM] Prevent spilling between ldrex/strex pairs
Based on the same for AArch64: 4751cadcca

At -O0, the fast register allocator may insert spills between the ldrex and
strex instructions inserted by AtomicExpandPass when expanding atomicrmw
instructions in LL/SC loops. To avoid this, expand to cmpxchg loops and
therefore expand the cmpxchg pseudos after register allocation.

Required a tweak to ARMExpandPseudo::ExpandCMP_SWAP to use the 4-byte encoding
of UXT, since the pseudo instruction can be allocated a high register (R8-R15)
which the 2-byte encoding doesn't support. However, the 4-byte encodings
are not present for ARM v8-M Baseline. To enable this, two new pseudos,
tCMP_SWAP_8 and tCMP_SWAP_16, are added for Thumb; they are only valid for
v8mbase.

The previously committed attempt in D101164 had to be reverted due to runtime
failures in the test suites. Rather than spending time fixing that
implementation (adding another implementation of atomic operations and more
divergence between backends) I have chosen to follow the approach taken in
D101163.

Differential Revision: https://reviews.llvm.org/D101898

Depends on D101912
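
As a rough illustration (a hypothetical example, not from the patch), this is
the shape of IR whose LL/SC expansion was vulnerable to spills at -O0, and
which is now expanded from a pseudo after register allocation:

  define i32 @fetch_add(i32* %p) {
    ; expands to a ldrex/strex (or cmpxchg) loop on Arm
    %old = atomicrmw add i32* %p, i32 1 seq_cst
    ret i32 %old
  }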
2021-05-12 09:43:21 +01:00
David Green 76786037c6 [ARM] Fix postinc of vst1xN
These nodes are not handled correctly by CombineBaseUpdate. For the
moment, similar to 5f1cad4d29, mark them as unsupported.
2021-05-09 21:57:55 +01:00
Malhar Jajoo dfe3ffaa4a [ARM] Transforming memset to Tail predicated Loop
This patch converts llvm.memset intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

The llvm.memset is converted to a TP loop for both
constant and non-constant input sizes (of llvm.memset).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100435
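
A minimal IR sketch (hypothetical function, non-constant size) of the kind of
call that now becomes a tail predicated loop:

  declare void @llvm.memset.p0i8.i32(i8*, i8, i32, i1)

  define void @fill(i8* %dst, i8 %val, i32 %n) {
    call void @llvm.memset.p0i8.i32(i8* align 4 %dst, i8 %val, i32 %n, i1 false)
    ret void
  }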
2021-05-07 13:35:53 +01:00
Malhar Jajoo 9ff38e2d9d [ARM] Transforming memcpy to Tail predicated Loop
This patch converts llvm.memcpy intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

From an implementation point of view, the patch

- adds an ARM-specific SDAG node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel)
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter,
  on matching the above node.
- adds a custom inserter function that expands the pseudo instruction into MIR suitable
  to be lowered (by later passes) into a WLSTP loop.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99723
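
A minimal IR sketch of the input (hypothetical example; the intrinsic call is
what gets lowered to the new SDAG node):

  declare void @llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i1)

  define void @copy(i8* %dst, i8* %src, i32 %n) {
    call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dst, i8* align 4 %src, i32 %n, i1 false)
    ret void
  }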
2021-05-06 23:21:28 +01:00
Malhar Jajoo fc690777fc Revert "[ARM] Transforming memcpy to Tail predicated Loop"
Reverting the commit since it causes a failure (10462).
This reverts commit b856f4a232.
2021-05-06 12:39:08 +01:00
Malhar Jajoo b856f4a232 [ARM] Transforming memcpy to Tail predicated Loop
This patch converts llvm.memcpy intrinsic into Tail Predicated
Hardware loops for a target that supports the Arm M-profile
Vector Extension (MVE).

From an implementation point of view, the patch

- adds an ARM-specific SDAG node (to which the llvm.memcpy intrinsic is lowered during the first phase of ISel)
- adds a corresponding TableGen entry to generate a pseudo instruction, with a custom inserter,
  on matching the above node.
- adds a custom inserter function that expands the pseudo instruction into MIR suitable
  to be lowered (by later passes) into a WLSTP loop.

Note: A cli option is used to control the conversion of memcpy to TP
loop and this option is currently disabled by default. It may be enabled
in the future after further downstream testing.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99723
2021-05-06 09:34:09 +01:00
Tomas Matheson 9d86095ff8 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 753185031d.
2021-05-03 21:48:20 +01:00
Tomas Matheson 753185031d [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164

Originally submitted as 3338290c18.
Reverted in c7df6b1223.
2021-05-03 20:25:15 +01:00
David Green d1bbe61d1c [ARM] Memory operands for MVE gathers/scatters
Similarly to D101096, this makes sure that MMO operands get propagated
through from MVE gathers/scatters to the Machine Instructions. This
allows extra scheduling freedom, not forcing the instructions to act as
scheduling barriers. We create MMO's with an unknown size, specifying
that they can load from anywhere in memory, similar to the masked_gather
or X86 intrinsics.

Differential Revision: https://reviews.llvm.org/D101219
2021-05-03 11:24:59 +01:00
Tomas Matheson c7df6b1223 Revert "[CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0"
This reverts commit 3338290c18.

Broke expensive checks on debian.
2021-04-30 16:53:14 +01:00
Tomas Matheson 3338290c18 [CodeGen][ARM] Implement atomicrmw as pseudo operations at -O0
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.

To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.

Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)

Differential Revision: https://reviews.llvm.org/D101164
2021-04-30 16:40:33 +01:00
David Green e11420ca23 [ARM] Ensure CSINC has one use in CSINV combine
Otherwise the CMP glue may be used in multiple nodes, needing to be
emitted multiple times. Currently this either increases instruction
count or fails as it attempts to insert the same node multiple times.
2021-04-29 10:59:14 +01:00
David Green 8de7d8b2c2 [ARM] Recognize VIDUP from BUILDVECTORs of additions
This adds a pattern to recognize VIDUP from BUILD_VECTOR of incrementing
adds. This can come up from either geps or adds, and came up recently in
D100550. We are just looking for a BUILD_VECTOR where each lane is an
add of the first lane with N*i, where i is the lane and N is one of 1,
2, 4, or 8, supported by the VIDUP instruction.

Differential Revision: https://reviews.llvm.org/D101263
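
A hedged IR sketch of such a BUILD_VECTOR (hypothetical example with N=4, so
the lanes are x, x+4, x+8, x+12):

  define <4 x i32> @vidup_n4(i32 %x) {
    %a1 = add i32 %x, 4
    %a2 = add i32 %x, 8
    %a3 = add i32 %x, 12
    %v0 = insertelement <4 x i32> undef, i32 %x, i32 0
    %v1 = insertelement <4 x i32> %v0, i32 %a1, i32 1
    %v2 = insertelement <4 x i32> %v1, i32 %a2, i32 2
    %v3 = insertelement <4 x i32> %v2, i32 %a3, i32 3
    ret <4 x i32> %v3
  }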
2021-04-27 19:33:24 +01:00
David Green 94c7bd7eb2 [ARM] Expand VMOVRRD simplification pattern
This expands the VMOVRRD(extract(..(build_vector(a, b, c, d)))) pattern,
to also handle insert_vectors. Providing we can find the correct insert,
this helps further simplify patterns by removing the redundant VMOVRRD.

Differential Revision: https://reviews.llvm.org/D100245
2021-04-26 12:27:38 +01:00
David Green 7255d1f54f [ARM] Format ARMISD node definitions. NFC
This clang-formats the list of ARMISD nodes. Usually this is something I
would avoid, but these cause problems with formatting every time new
nodes are added.

The list in getTargetNodeName also makes use of MAKE_CASE macros, as
other backends do.
2021-04-24 14:50:32 +01:00
Sander de Smalen 43ace8b5ce [TTI] NFC: Change getScalingFactorCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D100564
2021-04-23 16:06:36 +01:00
David Green 21a8b9d9e9 [ARM] Limit PerformExtractEltToVMOVRRD to when f64 is legal.
The generic SoftFloatVectorExtract.ll test was failing when run on arm
machines, as it tries to create an f64 under soft float. Limit the
transform to when f64 is legal.

Also add a missing override, as reported in D100244.
2021-04-20 16:24:36 +01:00
David Green 48cef1fa8e [ARM] Create VMOVRRD from adjacent vector extracts
This adds a combine for extract(x, n); extract(x, n+1)  ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.

This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly.  Otherwise the machine verifier is unhappy.

Differential Revision: https://reviews.llvm.org/D100244
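
A hedged sketch of the pattern at the IR level (hypothetical example; lanes 2
and 3 are adjacent, so both can be fetched with a single VMOVRRD of the
second D register):

  define void @adjacent_extracts(<4 x i32> %x, i32* %p, i32* %q) {
    %e0 = extractelement <4 x i32> %x, i32 2
    %e1 = extractelement <4 x i32> %x, i32 3
    store i32 %e0, i32* %p
    store i32 %e1, i32* %q
    ret void
  }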
2021-04-20 15:15:43 +01:00
David Green 00a6045473 [ARM] Combine sub 0, csinc X, Y, CC -> csinv -X, Y, CC
Combine sub 0, csinc X, Y, CC to csinv -X, Y, CC providing that the
negation of X is cheap, currently just handling constants. This comes up
during the splat of an i1 to a predicate, where we now generate csetm,
as opposed to cset; rsb.

Differential Revision: https://reviews.llvm.org/D99940
2021-04-16 11:52:31 +01:00
Simon Pilgrim ddbb58736a [KnownBits] Rename KnownBits::computeForMul to KnownBits::mul. NFCI.
As promised in D98866
2021-04-06 10:11:41 +01:00
David Green 35e0567d58 [ARM] Add VREV MVE shuffle costs
This uses the shuffle mask cost from D98206 to give a better cost of MVE
VREV instructions. This helps especially in VectorCombine, where the cost
of shuffles is used to reorder bitcasts, and helps keep the phase
ordering test for fp16 reductions producing optimal code. The isVREVMask
has been moved to a header file to allow it to be used across target
transform and isel lowering.

Differential Revision: https://reviews.llvm.org/D98210
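
For reference, a shuffle that isVREVMask would treat as a VREV32.16 (a
hypothetical example: 16-bit lanes swapped within each 32-bit chunk):

  define <8 x i16> @vrev32_16(<8 x i16> %x) {
    %s = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>
    ret <8 x i16> %s
  }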
2021-03-17 21:21:43 +00:00
David Green bd516d24c1 [ARM] Move t2DoLoopStart reg alloc hint
This adjusts the place that the t2DoLoopStart reg allocation hint is
inserted, adding it in the ARMTPAndVPTOptimizationPass in a similar place
as other tail predicated loop optimizations. This removes the need for
doing so in a custom inserter, and should make the hint more accurate,
only adding it where we expect to create a DLS (not DLSTP or WLS).
2021-03-11 17:56:19 +00:00
David Green fad70c3068 [ARM] Improve WLS lowering
Recently we improved the lowering of low overhead loops and tail
predicated loops, but concentrated first on the DLS do style loops. This
extends those improvements over to the WLS while loops, improving the
chance of lowering them successfully. To do this the lowering has to
change a little as the instructions are terminators that produce a value
- something that needs to be treated carefully.

Lowering starts at the Hardware Loop pass, inserting a new
llvm.test.start.loop.iterations that produces both an i1 to control the
loop entry and an i32 similar to the llvm.start.loop.iterations
intrinsic added for do loops. This feeds into the loop phi, properly
gluing the values together:

  %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
  %wls0 = extractvalue { i32, i1 } %wls, 0
  %wls1 = extractvalue { i32, i1 } %wls, 1
  br i1 %wls1, label %loop.ph, label %loop.exit
...
loop:
  %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
  ..
  %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
  %cmp = icmp ne i32 %iv.next, 0
  br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations intrinsic needs to be lowered through ISel
lowering as a pair of WLS and WLSSETUP nodes, which each get converted
to t2WhileLoopSetup and t2WhileLoopStart Pseudos. This helps prevent
t2WhileLoopStart from being a terminator that produces a value,
something difficult to control at that stage in the pipeline. Instead
the t2WhileLoopSetup produces the value of LR (essentially acting as a
lr = subs rn, 0), t2WhileLoopStart consumes that lr value (the Bcc).

These are then converted into a single t2WhileLoopStartLR at the same
point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop
to prevent them from progressing further in the pipeline. The
t2WhileLoopStartLR is a single instruction that takes a GPR and produces
LR, similar to the WLS instruction.

  %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
  t2B %bb.1
...
bb.2.loop:
  %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
  ...
  %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
  t2B %bb.3

The t2WhileLoopStartLR can then be treated similar to the other low
overhead loop pseudos, eventually being lowered to a WLS providing the
branches are within range.

Differential Revision: https://reviews.llvm.org/D97729
2021-03-11 17:56:19 +00:00
David Green a968e7b82e [ARM] KnownBits for CSINC/CSNEG/CSINV
This adds some simple known bits handling for the three CSINC/NEG/INV
instructions. From the operands known bits we can compute the common
bits of the first operand and incremented/negated/inverted second
operand. The first, especially CSINC ZR, ZR, comes up a fair amount in the
tests. The others are more rare so a unit test for them is added.

Differential Revision: https://reviews.llvm.org/D97788
2021-03-04 08:40:20 +00:00
David Green 438c98515c [ARM] Use 0, not ZR during ISel for CSINC/INV/NEG
Instead of converting the 0 into a ZR reg during lowering, do that with
tablegen by matching the zero immediate. This, when combined with other
optimizations, is more likely to use ZR and helps keep the DAG more
easily optimizable. It should not otherwise affect code generation.
2021-03-02 19:01:14 +00:00
David Green 91ebc4e864 [ARM] VMOVN undef folding
If we insert undef using a VMOVN, we can just use the original value in
three out of the four possible combinations. Using VMOVT into an undef
vector will still require the lanes to be moved, but otherwise the
non-undef value can be used.
2021-02-28 14:44:45 +00:00
David Green 0fe64812d8 [ARM] VECTOR_REG_CAST undef -> undef
Propagate undef through VECTOR_REG_CAST nodes, allowing extra
simplification in some patterns.
2021-02-28 11:13:49 +00:00
Leonard Chan c77659e549 [llvm][IR] Do not place constants with static relocations in a mergeable section
This patch provides two major changes:

1. Add getRelocationInfo to check if a constant will have static, dynamic, or
   no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.)
2. Only allow a constant with no relocations (static or dynamic) to be placed
   in a mergeable section.

This will allow unused symbols that contain static relocations and happen to
fit in mergeable constant sections (.rodata.cstN) to instead be placed in
unique-named sections if -fdata-sections is used and subsequently garbage collected
by --gc-sections.

See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html.

Differential Revision: https://reviews.llvm.org/D95960
2021-02-18 15:39:00 -08:00
Serge Pavlov 816053bc71 [FPEnv][ARM] Implement lowering of llvm.set.rounding
Differential Revision: https://reviews.llvm.org/D96501
2021-02-13 11:16:29 +07:00
David Green 875f0cbcc6 [ARM] Optimize fp store of extract to integer store if already available.
Given a floating point store from an extracted vector, with an integer
VGETLANE that already exists, storing the existing VGETLANEu directly
can be better for performance. As the value is known to already be in an
integer register, this can help reduce fp register pressure, removes
the need for the fp extract and allows use of more integer post-inc
stores not available with vstr.

This can be a bit narrow in scope, but helps with certain biquad kernels
that store shuffled vector elements.

Differential Revision: https://reviews.llvm.org/D96159
2021-02-12 18:34:58 +00:00
David Green 541828e35d [ARM] Single source VMOVNT
Our current lowering of VMOVNT goes via a shuffle vector of the form
<0, N, 2, N+2, 4, N+4, ..>. That can of course also be a single input
shuffle of the form <0, 0, 2, 2, 4, 4, ..>, where we use a VMOVNT to
insert a vector into the top lanes of itself. This adds lowering of that
case, re-using the existing isVMOVNMask.

Differential Revision: https://reviews.llvm.org/D96065
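
A hedged IR sketch of the single input form (hypothetical example; the mask
<0, 0, 2, 2, 4, 4, 6, 6> inserts the even lanes into the top lanes of the
same vector):

  define <8 x i16> @vmovnt_single(<8 x i16> %x) {
    %s = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 0, i32 0, i32 2, i32 2, i32 4, i32 4, i32 6, i32 6>
    ret <8 x i16> %s
  }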
2021-02-12 14:28:57 +00:00
David Green 1db7b9ceaa [ARM] Make a BE predicate bitcast consistent with the rest of llvm
We were storing predicate registers, such as a <8 x i1>, in the opposite
order to how the rest of llvm expects. This actually turns out to be
correct for the one place that usually uses it - the
ScalarizeMaskedMemIntrin pass - but only because that pass was itself
incorrect. This flips the stored bit order so that bitcasts work as
expected, allowing the Scalarization pass to be fixed, as in
https://reviews.llvm.org/D94765.

Differential Revision: https://reviews.llvm.org/D94867
2021-02-11 08:59:52 +00:00
David Green 0c7e044a7f [ARM] One-off identity shuffle
A One-Off Identity mask is a shuffle that is mostly an identity mask
from a single source but contains a single element out-of-place, either
from a different vector or from another position in the same vector. As
opposed to lowering this via an ARMISD::BUILD_VECTOR we can generate an
extract/insert pair directly. Under ARM with individually accessible
lane elements this often becomes a simple lane move.

This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not
v4i32), a more natural type for lane moves.

Differential Revision: https://reviews.llvm.org/D95551
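
A hedged IR sketch of a one-off identity mask (hypothetical example; lanes
0-2 are an identity from %x, while lane 3 pulls in lane 0 of %y):

  define <4 x i32> @one_off_identity(<4 x i32> %x, <4 x i32> %y) {
    %s = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
    ret <4 x i32> %s
  }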
2021-02-08 21:24:32 +00:00
David Green 11e415dc90 [ARM] Make v2f64 scalar_to_vector legal
Because we mark all operations as expand for v2f64, scalar_to_vector
would end up lowering through a stack store/reload. But it is pretty
simple to implement, only inserting a D reg into an undef vector. This
helps clear up some inefficient codegen from soft calling conventions.

Differential Revision: https://reviews.llvm.org/D96153
2021-02-08 11:34:55 +00:00
Craig Topper 11ef356d9e [TargetLowering] Use Align in allowsMisalignedMemoryAccesses.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96097
2021-02-04 19:22:06 -08:00
Ayke van Laethem aecdf15cc7 [ARM] Do not emit ldrexd/strexd on Cortex-M chips
The ldrexd/strexd instructions are not supported on M-class chips, see
for example
https://developer.arm.com/documentation/dui0489/e/arm-and-thumb-instructions/memory-access-instructions/ldrex-and-strex
which says:

> All these 32-bit Thumb instructions are available in ARMv6T2 and
> above, except that LDREXD and STREXD are not available in the ARMv7-M
> architecture.

Looking at the ARMv8-M architecture, it appears that these instructions
aren't supported either. The Architecture Reference Manual lists
ldrex/strex but not ldrexd/strexd:
https://developer.arm.com/documentation/ddi0553/bn/

Godbolt example on LLVM 11.0.0, which incorrectly emits ldrexd/strexd
instructions: https://llvm.godbolt.org/z/5qqPnE

Differential Revision: https://reviews.llvm.org/D95891
2021-02-04 21:55:34 +01:00
David Green 649a3d00df [ARM] Handle f16 in GeneratePerfectShuffle
This new f16 shuffle under Neon would hit an assert in
GeneratePerfectShuffle as it would try to treat an f16 vector as an i8.
Add f16 handling, treating them like an i16.

Differential Revision: https://reviews.llvm.org/D95446
2021-02-04 11:14:52 +00:00
David Green 5b2626ea87 [ARM] Flatten identity shuffles through vqdmulh nodes
Given a shuffle(vqdmulh(shuffle, shuffle)), we can flatten the shuffles
out if they become an identity mask. This can come up during lane
interleaving, when we do that better.

Differential Revision: https://reviews.llvm.org/D94034
2021-02-01 19:14:20 +00:00
David Green 5805521207 [ARM] Simplify VMOVRRD from extracts of buildvectors
Under the softfp calling convention, we are often left with
VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value
of the function. These can be simplified to a,b or c,d directly,
depending on the value of the extract.

Big endian is a little different because the bitcast switches the lanes
around, meaning we end up with b,a or d,c.

Differential Revision: https://reviews.llvm.org/D94989
2021-02-01 16:09:25 +00:00
David Green ad12e6ee95 [ARM] Turn sext_inreg(VGetLaneu) into VGetLaneu
This adds a DAG combine for converting sext_inreg of VGetLaneu into
VGetLanes, providing the types match correctly.

Differential Revision: https://reviews.llvm.org/D95073
2021-02-01 11:10:35 +00:00
David Green 6ab792b68d [ARM] Simplify extract of VMOVDRR
Under SoftFP calling conventions, we can be left with
extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can
simplify to a or b, depending on the extract lane.

Differential Revision: https://reviews.llvm.org/D94990
2021-02-01 10:24:57 +00:00
Kazu Hirata 7925aa091d [llvm] Populate SmallVector at construction time (NFC) 2021-01-28 22:21:14 -08:00
David Green 40f46cb0e4 [ARM] Add alignment checks for MVE VLDn
The MVE VLD2/4 and VST2/4 instructions require the pointer to be aligned
to at least the size of the element type. This adds a check for that
into the ARM lowerInterleavedStore and lowerInterleavedLoad functions,
not creating the intrinsics if they are invalid for the alignment of
the load/store.

Unfortunately this is one of those bug fixes that does affect some
useful codegen, as we were able to sometimes do some nice lowering of
q15 types. But they can cause problems with low-aligned pointers.

Differential Revision: https://reviews.llvm.org/D95319
2021-01-28 13:10:08 +00:00
Kazu Hirata 054444177b [Target] Use llvm::append_range (NFC) 2021-01-24 12:18:56 -08:00
Kazu Hirata e4847a7fcf Revert "[Target] Use llvm::append_range (NFC)"
This reverts commit cc7a238286.

The X86WinEHState.cpp hunk seems to break certain builds.
2021-01-23 11:25:27 -08:00
Kazu Hirata cc7a238286 [Target] Use llvm::append_range (NFC) 2021-01-23 10:56:31 -08:00
David Green af03324984 [ARM] Disable sign extended SSAT pattern recognition.
I may have given bad advice, and skipping sext_inreg when matching SSAT
patterns is not valid on its own. It at least needs to sext_inreg the
input again, but as far as I can tell is still only valid based on
demanded bits. For the moment disable that part of the combine,
hopefully reimplementing it in the future more correctly.
2021-01-22 14:07:48 +00:00
David Green 9ae73cdbc1 [ARM] Adjust isSaturatingConditional to return a new SDValue. NFC
This replaces the isSaturatingConditional function with
LowerSaturatingConditional that directly returns a new SSAT or
USAT SDValue, instead of returning true and the components of it.
2021-01-22 11:11:36 +00:00
David Green 6a563eef13 [ARM] Expand vXi1 VSELECT's
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.

Differential Revision: https://reviews.llvm.org/D94946
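
A minimal sketch of such a select (hypothetical example); after expansion it
becomes roughly (c & a) | (~c & b) on the predicates:

  define <4 x i1> @vsel_i1(<4 x i1> %c, <4 x i1> %a, <4 x i1> %b) {
    %r = select <4 x i1> %c, <4 x i1> %a, <4 x i1> %b
    ret <4 x i1> %r
  }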
2021-01-19 17:56:50 +00:00
David Green c29ca8551a [ARM] Update isVMOVNOriginalMask to handle single input shuffle vectors
The isVMOVNOriginalMask was previously only checking for two input
shuffles that could be better expanded as vmovn nodes. This expands that
to single input shuffles that will later be legalized to multiple
vectors.

Differential Revision: https://reviews.llvm.org/D94189
2021-01-13 08:51:28 +00:00
David Green 024af42c60 [ARM] Custom lower i1 vector truncates
The ISel patterns we have for truncating to i1's under MVE do not seem
to be correct. Instead custom lower to icmp(ne, and(x, 1), 0).

Differential Revision: https://reviews.llvm.org/D94226
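
A hedged sketch of the truncate and its equivalent custom lowering
(hypothetical functions):

  define <4 x i1> @trunc_to_i1(<4 x i32> %x) {
    %t = trunc <4 x i32> %x to <4 x i1>
    ret <4 x i1> %t
  }

  define <4 x i1> @trunc_to_i1_lowered(<4 x i32> %x) {
    %a = and <4 x i32> %x, <i32 1, i32 1, i32 1, i32 1>
    %c = icmp ne <4 x i32> %a, zeroinitializer
    ret <4 x i1> %c
  }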
2021-01-08 18:21:00 +00:00
David Green ddb82fc76c [ARM] Handle any extend whilst lowering mull
Similar to 78d8a821e2 but for ARM, this handles any_extend whilst
creating MULL nodes, treating them as zextends.

Differential Revision: https://reviews.llvm.org/D93834
2021-01-06 10:51:12 +00:00
David Green 901cc9b6f3 [ARM] Extend lowering for i64 reductions
The lowering of a <4 x i16> or <4 x i8> vecreduce.add into an i64 would
previously be expanded, due to the i64 not being legal. This patch
adjusts our reduction matchers, making them produce a VADDLV(sext A to
v4i32) instead.

Differential Revision: https://reviews.llvm.org/D93622
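
A minimal sketch of such a reduction (hypothetical example; the intrinsic
spelling follows the current llvm.vector.reduce naming):

  declare i64 @llvm.vector.reduce.add.v4i64(<4 x i64>)

  define i64 @reduce_v4i16_to_i64(<4 x i16> %x) {
    %e = sext <4 x i16> %x to <4 x i64>
    %r = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> %e)
    ret i64 %r
  }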
2021-01-04 12:44:43 +00:00
Kazu Hirata 0e219b6443 [Target] Construct SmallVector with iterator ranges (NFC) 2021-01-03 09:57:45 -08:00
Kristof Beyls df8ed39283 [ARM] harden-sls-blr: avoid r12 and lr in indirect calls.
As a linker is allowed to clobber r12 on function calls, the code
transformation that hardens indirect calls is not correct in case a
linker does so.  Similarly, the transformation is not correct when
register lr is used.

This patch makes sure that r12 or lr are not used for indirect calls
when harden-sls-blr is enabled.

Differential Revision: https://reviews.llvm.org/D92469
2020-12-19 12:39:59 +00:00
David Green 03e675fd12 [ARM] Turn pred_cast(xor(x, -1)) into xor(pred_cast(x), -1)
This folds a not (an xor -1) through a predicate_cast, so that it can be
turned into a VPNOT and potentially be folded away as an else predicate
inside a VPT block.

Differential Revision: https://reviews.llvm.org/D92235
2020-12-08 15:22:46 +00:00
David Green eedf0ed63e [ARM] Mark select and selectcc of MVE vector operations as expand.
We already expand select and select_cc in codegenprepare, but they can
still be generated in some situations. Explicitly mark them as expand
to ensure they are not produced, leading to a failure to select the
nodes.

Differential Revision: https://reviews.llvm.org/D92373
2020-12-01 15:05:55 +00:00
David Green 7923d71b4a [ARM] PREDICATE_CAST demanded bits
The PREDICATE_CAST node is used to model moves between MVE predicate
registers and gpr's, and eventually become a VMSR p0, rn. When moving to
a predicate, only the bottom 16 bits of the source register are
demanded. This adds a simple fold for that, allowing it to potentially
remove instructions like uxth.

Differential Revision: https://reviews.llvm.org/D92213
2020-12-01 10:32:24 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them.
2020-11-11 12:15:54 +00:00
David Green 73a6cd4b6b [ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
be removed entirely.

The hint is added after others (from COPY's etc) which still take
precedence. It needed to find a place to add the hint, which currently
uses the post isel custom inserter.

Differential Revision: https://reviews.llvm.org/D89883
2020-11-10 16:28:57 +00:00
David Green d14db8c8dc [ARM] Match MVE vqdmulh
This adds ISel matching for a form of VQDMULH. There are several IR
patterns that we could match to that instruction; this one is for:

min(ashr(mul(sext(a), sext(b)), 7), 127)

Which is what llvm will optimize to once it has removed the max that
usually makes up the min/max saturate pattern, as in this case the
compare will always be false. The additional complication to match i32
patterns (which extend into an i64) is that the min will be a
vselect/setcc, as vmin is not supported for i64 vectors. Tablegen
patterns have also been updated to attempt to reuse the MVE_TwoOpPattern
patterns.

Differential Revision: https://reviews.llvm.org/D90096
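
A hedged IR sketch of the analogous 16-bit lane pattern (hypothetical
example, with a shift of 15 and a clamp at 32767; the min is written out as
the icmp/select it takes in IR):

  define <8 x i16> @vqdmulh_s16(<8 x i16> %a, <8 x i16> %b) {
    %ae = sext <8 x i16> %a to <8 x i32>
    %be = sext <8 x i16> %b to <8 x i32>
    %m = mul <8 x i32> %ae, %be
    %s = ashr <8 x i32> %m, <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
    %c = icmp slt <8 x i32> %s, <i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767>
    %min = select <8 x i1> %c, <8 x i32> %s, <8 x i32> <i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767, i32 32767>
    %t = trunc <8 x i32> %min to <8 x i16>
    ret <8 x i16> %t
  }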
2020-10-30 13:34:27 +00:00
David Sherwood 47f2dc7e5f [SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.

Differential Revision: https://reviews.llvm.org/D89101
2020-10-15 09:01:21 +01:00
Sam Tebbs 68e002e181 [ARM] Fold select_cc(vecreduce_[u|s][min|max], x) into VMINV or VMAXV
This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV.

Differential Revision: https://reviews.llvm.org/D87836
2020-10-06 14:44:58 +01:00
Amara Emerson c9f5cdd453 Revert "[ARM]Fold select_cc(vecreduce_[u|s][min|max], x) into VMINV or VMAXV"
This reverts commit 2573cf3c3d.

These seem to break some lit tests.
2020-10-05 10:52:43 -07:00
Sam Tebbs 2573cf3c3d [ARM]Fold select_cc(vecreduce_[u|s][min|max], x) into VMINV or VMAXV
This folds a select_cc or select(set_cc) of a max or min vector reduction with a scalar value into a VMAXV or VMINV.

Differential Revision: https://reviews.llvm.org/D87836
2020-10-05 15:51:28 +01:00
David Green 7feafa0286 [ARM] Fix pointer offset when splitting stores from VMOVDRR
We were not accounting for the pointer offset when splitting a store from
a VMOVDRR node, which could lead to incorrect aliasing info. In this
case it is the fneg via integer arithmetic that gives us a store->load
pair that we started getting wrong.

Differential Revision: https://reviews.llvm.org/D88653
2020-10-03 16:47:50 +01:00
David Sherwood e077367a28 [SVE] Make EVT::getScalarSizeInBits and others consistent with Type::getScalarSizeInBits
An existing function Type::getScalarSizeInBits returns a uint64_t
instead of a TypeSize class because the caller is requesting a
scalar size, which cannot be scalable. This patch makes other
similar functions requesting a scalar size consistent with that,
thereby eliminating more than 1000 implicit TypeSize -> uint64_t
casts.

Differential revision: https://reviews.llvm.org/D87889
2020-09-23 09:20:08 +01:00
Momchil Velikov 742250bf62 [ARM][CMSE] Issue an error if passing arguments through memory across
security boundary

It was never supported and that part was accidentally omitted when
upstreaming D76518.

Differential Revision: https://reviews.llvm.org/D86478

Change-Id: If6ba9506eb0431c87a1d42a38aa60e47ce263039
2020-09-21 17:26:10 +01:00
David Green f4c5cadbcb [ARM] Select f32 constants with vmov.f16
This adds lowering for f32 values using the vmov.f16, which zeroes the
top bits whilst setting the lower bits to a pattern. This range of
values does not often come up, except where a f16 constant value has
been converted to a f32.

Differential Revision: https://reviews.llvm.org/D87790
2020-09-21 11:10:47 +01:00
David Green 29bd8ea110 [ARM] Constant fold VMOVrh
This adds simple constant folding for VMOVrh, to constant fold fp16
constants to integer values. It can help especially with soft calling
conventions, but some of the results are not optimal as we end up
loading using a vldr. This will be improved in a follow up patch.

Differential Revision: https://reviews.llvm.org/D87789
2020-09-20 21:32:51 +01:00
David Green 34b27b9441 [ARM] Sink splats to MVE intrinsics
The predicated MVE intrinsics are generated as, for example,
llvm.arm.mve.add.predicated(x, splat(y), p). We need to sink the splat
value back into the loop, like we do for other instructions, so we can
re-select qr variants.

Differential Revision: https://reviews.llvm.org/D87693
2020-09-17 16:00:51 +01:00
Meera Nakrani 1119bf95be [ARM] Corrected condition in isSaturatingConditional
Fixed a small error in an if condition to prevent usat/ssat being generated if (upper constant + 1) is not a
power of 2.
2020-09-15 10:14:30 +00:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Meera Nakrani dd519bf0b0 [ARM] Selects SSAT/USAT from correct LLVM IR
LLVM will canonicalize conditional selects to a different pattern than the one the old code matched.
This updates the function to match the new expected patterns and select SSAT or USAT when successful.
Tests have also been updated to use the new patterns.

Differential Review: https://reviews.llvm.org/D87379
2020-09-14 10:58:21 +00:00
David Green 6cfd38d03d [ARM] Fixup single source mla reductions.
This fixes a complication on top of D87276. If we are sign extending
around a mul with the two operands that are the same, instcombine will
helpfully convert one of the sext to a zext. Reverse that so that we
again generate a reduction.

Differential Revision: https://reviews.llvm.org/D87287
2020-09-12 14:31:26 +01:00
David Green c437446d90 [ARM] Recognize "double extend" reduction patterns
We can sometimes get code that does:
  xe = zext i16 x to i32
  ye = zext i16 y to i32
  m = mul i32 xe, ye
  me = zext i32 m to i64
  r = vecreduce.add(me)
This "double extend" can trip up the reduction identification, but
should give identical results.

This extends the pattern matching to handle them.

Differential Revision: https://reviews.llvm.org/D87276
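
The quoted pattern as a complete, hedged IR example (hypothetical lane
widths, using the current llvm.vector.reduce intrinsic spelling):

  declare i64 @llvm.vector.reduce.add.v8i64(<8 x i64>)

  define i64 @double_extend(<8 x i16> %x, <8 x i16> %y) {
    %xe = zext <8 x i16> %x to <8 x i32>
    %ye = zext <8 x i16> %y to <8 x i32>
    %m = mul <8 x i32> %xe, %ye
    %me = zext <8 x i32> %m to <8 x i64>
    %r = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %me)
    ret i64 %r
  }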
2020-09-12 13:51:42 +01:00
Sjoerd Meijer 5f1cad4d29 [ARM] Skip combining base updates for vld1x NEON intrinsics
Skip this for now, to avoid a backend crash in:

  UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412

This should fix PR45824.

Differential Revision: https://reviews.llvm.org/D86784
2020-08-28 20:29:15 +01:00
Anna Welker 8048068c3e [ARM][MVE] Allow tail predication for strides !=1 with gather/scatters
If gather/scatters are enabled, ARMTargetTransformInfo now allows
tail predication for loops with a much wider range of strides, up
to anything that is loop invariant.

Differential Revision: https://reviews.llvm.org/D85410
2020-08-24 13:54:47 +01:00
Kerry McLaughlin 85c7e89f3b [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
David Green 186a7f81e8 [ARM] Add VADDV and VMLAV patterns for v16i16
This adds patterns for v16i16's vecreduce, using all the existing code
to go via an i32 VADDV/VMLAV and truncating the result.

Differential Revision: https://reviews.llvm.org/D85452
2020-08-09 11:09:49 +01:00
David Green b37e92201c [ARM] Add predicated mla reduction patterns
Similar to 8fa824d7a3 but this time for MLA patterns, this selects
predicated vmlav/vmlava/vmlalv/vmlava instructions from
vecreduce.add(select(p, mul(x, y), 0)) nodes.

Differential Revision: https://reviews.llvm.org/D84102
2020-07-23 21:47:59 +01:00
David Green 8fa824d7a3 [ARM] Add predicated add reduction patterns
Given a vecreduce.add(select(p, x, 0)), we can convert that to a
predicated vaddv, as the else value for the select is the identity
value, a zero. That is what this patch does for the vaddv, vaddva,
vaddlv and vaddlva instructions, copying the existing patterns to also
handle predication through a select.

Differential Revision: https://reviews.llvm.org/D84101
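
A minimal sketch of the base pattern (hypothetical example, with the current
llvm.vector.reduce intrinsic spelling):

  declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

  define i32 @pred_vaddv(<4 x i32> %x, <4 x i1> %p) {
    %s = select <4 x i1> %p, <4 x i32> %x, <4 x i32> zeroinitializer
    %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %s)
    ret i32 %r
  }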
2020-07-22 17:30:02 +01:00
Pavel Iliin b9a6fb6428 [ARM] VBIT/VBIF support added.
Vector bitwise selects are matched by the pseudo VBSP instruction
and expanded to VBSL/VBIT/VBIF after register allocation, depending
on the operand registers, to minimize extra copies.
2020-07-16 11:25:53 +01:00
Guillaume Chatelet 87e2751cf0 [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D83082
2020-07-03 08:06:43 +00:00
Guillaume Chatelet d3085c2501 [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82956
2020-07-01 14:31:56 +00:00
Guillaume Chatelet 28de229bc6 [Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82894
2020-07-01 07:28:11 +00:00
Guillaume Chatelet 4f5133a4dc [Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82749
2020-06-30 07:56:17 +00:00
David Green deb72ce298 [ARM] Better reductions
MVE has native reductions for integer add and min/max. The others need
to be expanded to a series of extracts and scalar operators to reduce
the vector into a single scalar. The default codegen for that expands
the reduction into a series of in-order operations.

This modifies that to something more suitable for MVE. The basic idea is
to use vector operations until there are 4 remaining items then switch
to pairwise operations. For example a v8f16 fadd reduction would become:
Y = VREV X
Z = ADD(X, Y)
z0 = Z[0] + Z[1]
z1 = Z[2] + Z[3]
return z0 + z1

The awkwardness (there is always some) comes in from something like a
v4f16, which is first legalized by adding identity values to the extra
lanes of the reduction, and which then cannot be optimized away through
the vrev/fadd combo, so the inserts remain. I've made sure they custom
lower so that we can produce the pairwise additions before the extra
values are added.

Differential Revision: https://reviews.llvm.org/D81397
2020-06-29 16:04:13 +01:00
Guillaume Chatelet 368a5e3a66 [Alignment][NFC] migrate DataLayout::getPreferredAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82752
2020-06-29 11:24:36 +00:00
Simon Pilgrim 973685fc78 [TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
2020-06-29 11:46:58 +01:00
David Green d79b57b8bb [ARM] Split FPExt loads
This extends PerformSplittingToWideningLoad to also handle FP_Ext, as
well as sign and zero extends. It uses an integer extending load
followed by a VCVTL on the bottom lanes to efficiently perform an fpext
on a smaller than legal type.

The existing code had to be rewritten a little to not just split the
node in two and let legalization handle it from there, but to actually
split into legal chunks.

Differential Revision: https://reviews.llvm.org/D81340
2020-06-25 21:55:13 +01:00
David Green 8532b2ee89 [ARM] MVE VCVT lowering for f16->f32 extends
This adds code to lower f16 to f32 fp_exts using MVE VCVT
instructions, similar to a recent patch for fp_trunc. Again it
goes through the lowering of a BUILD_VECTOR, but is slightly simpler
only having to deal with interleaved indices. It adds a VCVTL node to
lower to, similar to VCVTN.

Differential Revision: https://reviews.llvm.org/D81339
2020-06-25 20:54:26 +01:00
David Green 0bfb4c2506 [ARM] Add FP_ROUND handling to splitting MVE stores
This splits MVE vector stores of a fp_trunc in the same way that we do
for standard trunc's. It extends PerformSplittingToNarrowingStores to
handle fp_round, splitting the store into pieces and adding a VCVTNb to
perform the actual fp_round. The actual store is then converted to an
integer store so that it can truncate bottom lanes of the result.

Differential Revision: https://reviews.llvm.org/D81141
2020-06-25 19:37:15 +01:00
David Green b044a82270 [ARM] Fixup for signed comparison warning. NFC 2020-06-25 16:29:44 +01:00
David Green 3cb2190b0b [ARM] MVE VCVT lowering for f32->f16 truncs
This adds code to lower f32 to f16 fp_trunc's using a pair of MVE VCVT
instructions. Due to v4f16 not being legal, fp_round are often split up
fairly early. So this reconstructs the vcvt's from a buildvector of
fp_rounds from two vector inputs. Something like:

BUILDVECTOR(FP_ROUND(EXTRACT_ELT(X, 0)),
            FP_ROUND(EXTRACT_ELT(Y, 0)),
            FP_ROUND(EXTRACT_ELT(X, 1)),
            FP_ROUND(EXTRACT_ELT(Y, 1)), ...)

It adds a VCVTN node to handle this, which like VMOVN or VQMOVN lowers
into the top/bottom lanes of an MVE instruction.

Differential Revision: https://reviews.llvm.org/D81139
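
A hedged source-level sketch (hypothetical example) that produces the
buildvector-of-fp_rounds shape above once the fp_rounds are split:

  define <8 x half> @vcvtn(<4 x float> %x, <4 x float> %y) {
    %xt = fptrunc <4 x float> %x to <4 x half>
    %yt = fptrunc <4 x float> %y to <4 x half>
    %s = shufflevector <4 x half> %xt, <4 x half> %yt, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
    ret <8 x half> %s
  }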
2020-06-25 15:59:36 +01:00
Simon Tatham b769eb02b5 [ARM][BFloat] Legalize bf16 type even without fullfp16.
Summary:
This change permits scalar bfloats to be loaded, stored, moved and
used as function call arguments and return values, whenever the bf16
feature is supported by the subtarget.

Previously that was only supported in the presence of the fullfp16
feature, because the code generation strategy depended on instructions
from that extension. This change adds alternative code generation
strategies so that those operations can be done even without fullfp16.

The strategy for loads and stores is to replace VLDRH/VSTRH with
integer LDRH/STRH plus a move between register classes. I've written
isel patterns for those, conditional on //not// having the fullfp16
feature (so that in the fullfp16 case, the existing patterns will
still be used).

For function arguments and returns, instead of writing isel patterns
to match `VMOVhr` and `VMOVrh`, I've avoided generating those SDNodes
in the first place, by factoring out the code that constructs them
into helper functions `MoveToHPR` and `MoveFromHPR` which have a
fallback for non-fullfp16 subtargets.

The current output code is not especially pretty: in the new test file
you can see unnecessary store/load pairs implementing no-op bitcasts,
and lots of pointless moves back and forth between FP registers and
GPRs. But it at least works, which is an improvement on the previous
situation.

Reviewers: dmgreen, SjoerdMeijer, stuij, chill, miyuki, labrinea

Reviewed By: dmgreen, labrinea

Subscribers: labrinea, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82372
2020-06-24 09:36:26 +01:00
Eli Friedman e9d4e34ab8 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
Alexandros Lamprineas ecdf48f15b [ARM] Basic bfloat support
This patch adds basic support for BFloat in the Arm backend.
For now the code generation relies on fullfp16 being present.

Briefly:
* adds the bfloat scalar and vector types in the necessary register classes,
* adjusts the calling convention to cope with bfloat argument passing and return,
* adds codegen patterns for moves, loads and stores.

It's tested mostly by the intrinsic patches that depend on it (load/store, convert/copy).

The following people contributed to this patch:

 * Alexandros Lamprineas
 * Ties Stuij

Differential Revision: https://reviews.llvm.org/D81373
2020-06-18 17:26:24 +01:00
Lucas Prates 92ad6d57c2 [ARM] Moving CMSE handling of half arguments and return to the backend
Summary:
As half-precision floating point arguments and returns were previously
coerced to either float or int32 by clang's codegen, the CMSE handling
of those was also performed in clang's side by zeroing the unused MSBs
of the coercer values.

This patch moves this handling to the backend's calling convention
lowering, making sure the high bits of the registers used by
half-precision arguments and returns are zeroed.

Reviewers: chill, rjmccall, ostannard

Reviewed By: ostannard

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81428
2020-06-18 13:16:29 +01:00
Lucas Prates a255931c40 [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.

Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, such as
causing arguments to be stored in the wrong bits on big-endian
architectures and missing overflow detections in the return of certain
functions.

This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.

Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer

Reviewed By: ostannard

Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75169
2020-06-18 13:15:13 +01:00
David Green 158e734af1 [ARM] Adjust AND/OR combines to not call isConstantSplat on i1 vectors. NFC.
This rearranges PerformANDCombine and PerformORCombine to try and make
sure we don't call isConstantSplat on any i1 vectors. As pointed out in
D81860 it may not be very well defined in those cases.
2020-06-18 08:25:44 +01:00
David Green f269bb7da0 [ARM] Fix crash trying to generate i1 immediates
These code patterns attempt to call isVMOVModifiedImm on a splat of i1
values, leading to an unreachable being hit. I've guarded the call on a
more specific set of sizes, as i1 vectors are legal under MVE.

Differential Revision: https://reviews.llvm.org/D81860
2020-06-16 12:27:24 +01:00
Eli Friedman 7e58d0ded0 Revert "[arm][darwin] Don't generate libcalls for wide shifts on Darwin"
This reverts commit 2ba016cd5c.

This is causing a failure on the clang-cmake-armv7-full bot, and there
are outstanding review comments.
2020-06-08 16:37:29 -07:00
Guillaume Chatelet 94b0c32a0b [Alignment][NFC] Migrate HandleByVal to Align
Summary: Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::HandleByVal` without marking it `override`.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81365
2020-06-08 10:50:27 +00:00
Simon Wallis 7432fb2c78 [ARM][XO] Execute-only miscompiles double literals for big-endian
Summary:
With -mbig-endian -mexecute-only and targeting an fpu,
an incorrect sequence of movw/movt was generated to construct a double literal.
The test suite was hardwired to check these wrong values.

The fault was caused by the explicit word swap in LowerConstantFP().

With -mbig-endian -mexecute-only -mfpu=none, a correct sequence of
movw/movt is generated to construct a double literal.
The test suite did not test this no fpu case.

The test suite expected values have been corrected.
The test file is updated to add testing of the fpu=none case.

Reviewers: christof, llvm-commits, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81259

Change-Id: Ia3737df243218c89c82f02b7f9f4032ecd5a3917
2020-06-08 08:13:08 +01:00
Alex Lorenz 2ba016cd5c [arm][darwin] Don't generate libcalls for wide shifts on Darwin
Similar to ceb801612a.

Darwin doesn't always use compiler-rt, and so we can't assume that these
functions are available on arm.
2020-06-05 15:41:23 -07:00
Fangrui Song d2bd075e8d Fix -Wunused-variable after D80515 2020-06-05 11:46:50 -07:00
David Green e73bb45c2b [ARM] VQMOVN demand bits analysis
Similar to VMOVN, a VQMOVN will only demand the top/bottom lanes of its
first input. However unlike VMOVN it will need access to the entire
second argument, as that value is saturated not just moved in place.

Differential Revision: https://reviews.llvm.org/D80515
2020-06-05 18:41:02 +01:00
Zequan Wu 80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognize the nomerge attribute and disable branch folding when the attribute is given.

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Victor Campos c010d4d195 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
a restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.

Differential Revision: https://reviews.llvm.org/D70072
2020-05-28 10:52:43 +01:00
Sanjay Patel 7eed772a27 [PatternMatch] abbreviate vector inst matchers; NFC
Readability is not reduced with these opcodes/match lines,
so reduce odds of awkward wrapping from 80-col limit.
2020-05-24 09:19:47 -04:00
Victor Campos 872ee78f65 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 8a12553223.

A bug has been found when generating code for Thumb2. In some very
specific cases, the prologue/epilogue emitter generates erroneous stack
offsets for the new LDRD instructions that access the stack.

This bug does not seem to be caused by the reverted patch though. Likely
the latter has made an undiscovered issue emerge in the
prologue/epilogue emission pass. Nevertheless, this reversion is
necessary since it is blocking users of the ARM backend.
2020-05-22 11:01:57 +01:00
David Green 72f1fb2edf [ARM] Combines for VMOVN
This adds two combines for VMOVN, one to fold
VMOVN[tb](c, VQMOVNb(a, b)) => VQMOVN[tb](c, b)
The other to perform demand bits analysis on the lanes of a VMOVN. We
know that only the bottom lanes of the second operand and the top or
bottom lanes of the Qd operand are needed in the result, depending on if
the VMOVN is bottom or top.

Differential Revision: https://reviews.llvm.org/D77718
2020-05-16 15:13:16 +01:00
David Green 2e1fbf85b6 [ARM] MVE saturating truncates
This adds some custom lowering for VQMOVN, an instruction that can be
used to perform saturating truncates from a pair of min(max(X, -0x8000),
0x7fff), providing those constants are correct. This leaves a VQMOVNBs
which saturates the value and inserts that into the bottom lanes of an
existing vector. We then need to do something with the other lanes,
extending the value using a vmovlb.

Ideally, as will often be the case, only the bottom lane of what remains
will be demanded, allowing the vmovlb to be removed. Which should mean
the instruction is either equal or a win most of the time, and allows
some extra follow-up folding to happen.

Differential Revision: https://reviews.llvm.org/D77590
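
A hedged IR sketch of the saturating truncate input (hypothetical example;
the constants must be exactly -0x8000/0x7fff for the fold to apply):

  define <4 x i16> @sat_trunc(<4 x i32> %x) {
    %c1 = icmp sgt <4 x i32> %x, <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
    %max = select <4 x i1> %c1, <4 x i32> %x, <4 x i32> <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
    %c2 = icmp slt <4 x i32> %max, <i32 32767, i32 32767, i32 32767, i32 32767>
    %min = select <4 x i1> %c2, <4 x i32> %max, <4 x i32> <i32 32767, i32 32767, i32 32767, i32 32767>
    %t = trunc <4 x i32> %min to <4 x i16>
    ret <4 x i16> %t
  }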
2020-05-16 15:10:20 +01:00
Christopher Tetreault 245679b62e [SVE] Remove usages of VectorType::getNumElements() from ARM
Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen

Reviewed By: dmgreen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79816
2020-05-15 12:55:27 -07:00
Momchil Velikov bc2e572f51 Re-commit: [ARM] CMSE code generation
This patch implements the final bits of CMSE code generation:

* emit special linker symbols

* restrict parameter passing to not use memory

* emit BXNS and BLXNS instructions for returns from non-secure entry
  functions, and non-secure function calls, respectively

* emit code to save/restore secure floating-point state around calls
  to non-secure functions

* emit code to save/restore non-secure floating-point state upon
  entry to non-secure entry function, and return to non-secure state

* emit code to clobber registers not used for arguments and returns
  when switching to non-secure state

Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green,
possibly others.

Differential Revision: https://reviews.llvm.org/D76518
2020-05-14 16:46:16 +01:00
David Green fa15255d8a [ARM] Convert floating point splats to integer
Under MVE a vdup will always take a gpr register, not a floating point
value. During DAG combine we convert the types to a bitcast to an
integer in an attempt to fold the bitcast into other instructions. This
is OK, but only works inside the same basic block. To do the same trick
across a basic block boundary we need to convert the type in
codegenprepare, before the splat is sunk into the loop.

This adds a convertSplatType function to codegenprepare to do that,
putting bitcasts around the splat to force the type to an integer. There
is then some adjustment to the code in shouldSinkOperands to handle the
extra bitcasts.

Differential Revision: https://reviews.llvm.org/D78728
2020-05-13 15:24:16 +01:00
David Green 87c56594dd [ARM] Sink splats to fma intrinsics
Similar to fmul/fadd, we can sink a splat into a loop containing an fma
in order to use more register instruction variants. For that there are
also adjustments to the sinking code to handle more than 2 arguments.

Differential Revision: https://reviews.llvm.org/D78386
2020-05-13 14:58:30 +01:00
Craig Topper 8c72b0271b [CodeGen] Use Align in MachineConstantPool. 2020-05-12 10:06:40 -07:00
David Green 6eee2d9b5b [ARM] Convert VDUPLANE to VDUP under MVE
Unlike Neon, MVE does not have a way of duplicating from a vector lane,
so a VDUPLANE currently selects to a VDUP(move_from_lane(..)). This
forces that to be done earlier as a dag combine to allow other folds to
happen.

It converts to a VDUP(EXTRACT). On FP16 this is then folded to a
VGETLANEu to prevent it from creating a vmovx;vmovhr pair, using a
single move_from_reg instead.

Differential Revision: https://reviews.llvm.org/D79606
2020-05-09 18:58:13 +01:00
Craig Topper d1119980e5 [SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode.
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.

Removing getAlignment() will be done as a follow up.

Differential Revision: https://reviews.llvm.org/D79436
2020-05-08 16:04:11 -07:00
David Green f5f83cf4df [ARM] VMOVhr load -> vldr
Much like the similar combine added recently for VMOVrh load, this
adds a fold for VMOVhr load turning it into a vldr.f16 as opposed to a
vldrh and vmov.f16.

Differential Revision: https://reviews.llvm.org/D78714
2020-05-06 15:45:56 +01:00
David Green d05f8a38c5 [ARM] VMOVrh of VMOVhr
A VMOVhr of a VMOVrh can be simply folded to the original HPR value.

Differential Revision: https://reviews.llvm.org/D78710
2020-05-06 15:10:01 +01:00
David Green a349949f8a [ARM] Extract from a VDUP
If we get into the situation where we are extracting from a VDUP, the
extracted value is just the original scalar, so long as the types match or we can
bitcast between the two.

Differential Revision: https://reviews.llvm.org/D78708
2020-05-06 14:51:25 +01:00
David Green ed7db68c35 [ARM] Convert a bitcast VDUP to a VDUP
The idea, under MVE, is to introduce more bitcasts around VDUPs in an
attempt to get the type correct across basic block boundaries. In order
to do that without other regressions we need a few fixups, of which this
is the first. If the code is a bitcast of a VDUP, we can convert that
straight into a VDUP of the new type, so long as they have the same
size.

Differential Revision: https://reviews.llvm.org/D78706
2020-05-06 14:14:21 +01:00
Momchil Velikov fb18dffaeb Revert "[ARM] CMSE code generation"
This reverts commit 7cbbf89d23.

The regression tests fail with the expensive checks.
2020-05-05 19:05:40 +01:00
Momchil Velikov 7cbbf89d23 [ARM] CMSE code generation
This patch implements the final bits of CMSE code generation:

* emit special linker symbols

* restrict parameter passing to not use memory

* emit BXNS and BLXNS instructions for returns from non-secure entry
  functions, and non-secure function calls, respectively

* emit code to save/restore secure floating-point state around calls
  to non-secure functions

* emit code to save/restore non-secure floating-point state upon
  entry to a non-secure entry function, and return to non-secure state

* emit code to clobber registers not used for arguments and returns
  when switching to non-secure state

Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green,
possibly others.

Differential Revision: https://reviews.llvm.org/D76518
2020-05-05 18:23:28 +01:00
David Green f85acb1915 [ARM] Correct the type on a predicate cast
A PREDICATE_CAST(PREDICATE_CAST(X)) can be converted to a
PREDICATE_CAST(X) as the operation can convert between any forms of
predicates (v4i1/v8i1/v16i1/i32). Unfortunately I got the type wrong on
one of the rarer conversions, which would lead to invalid nodes during
isel. This fixes it up to use the correct type.
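
A toy sketch of the fold (illustrative types, not the actual DAG code),
showing that the collapsed cast must take the outer cast's result type:

```cpp
#include <memory>
#include <string>

struct Node {
  std::string type;              // e.g. "v4i1", "v8i1", "v16i1" or "i32"
  std::shared_ptr<Node> operand; // null for a leaf value
};

std::shared_ptr<Node> foldPredicateCast(const std::shared_ptr<Node> &cast) {
  auto inner = cast->operand;
  if (inner && inner->operand) // matched PREDICATE_CAST(PREDICATE_CAST(X))
    // The result keeps the outer type; using the inner type here is
    // exactly the bug being fixed.
    return std::make_shared<Node>(Node{cast->type, inner->operand});
  return cast;
}
```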

Differential Revision: https://reviews.llvm.org/D79402
2020-05-05 13:15:10 +01:00
Pierre-vh d5eb7ffa33 [Target][ARM] Fold or(A, B) more aggressively for I1 vectors
This patch makes the folding of or(A, B) into not(and(not(A), not(B)))
more aggressive for I1 vectors. This only affects Thumb2 MVE and improves
codegen, because it removes a lot of msr/mrs instructions on VPR.P0.

This patch also adds a xor(vcmp) -> !vcmp fold for MVE.
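
A quick standalone check of the underlying identity (De Morgan) on i1
vectors modelled as 4-bit mask words:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  // or(A, B) == not(and(not(A), not(B))) for every pair of 4-lane masks.
  for (uint16_t a = 0; a < 16; ++a)
    for (uint16_t b = 0; b < 16; ++b)
      assert(((a | b) & 0xF) == (~(~a & ~b) & 0xF));
  return 0;
}
```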

Differential Revision: https://reviews.llvm.org/D77202
2020-05-05 10:03:02 +01:00
Pierre-vh ffdda495f7 [Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops
This patch adds an implementation of PerformVSELECTCombine in the
ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into
vselect(cond, rhs, lhs).

Normally, this should be done by the target-independent DAG Combiner,
but it doesn't handle the kind of constants that we generate, so we
have to reimplement it here.
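
A per-lane model of the transform, assuming nothing more than select
semantics:

```cpp
#include <cstdint>

// Selecting with a negated condition is the same as selecting with the
// operands swapped, so vselect(not(c), lhs, rhs) -> vselect(c, rhs, lhs)
// and the not() disappears.
int32_t selectLane(bool c, int32_t lhs, int32_t rhs) { return c ? lhs : rhs; }
int32_t selectNotLane(bool c, int32_t lhs, int32_t rhs) {
  return selectLane(!c, lhs, rhs); // == selectLane(c, rhs, lhs)
}
```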

Differential Revision: https://reviews.llvm.org/D77712
2020-05-05 10:03:02 +01:00
Eli Friedman 1eb160fe8d [ARM] Fix tail call validity checking for varargs calls.
If a varargs function is calling a non-varargs function, or vice versa,
make sure we use the correct "varargs" bit for each.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45234

Differential Revision: https://reviews.llvm.org/D79199
2020-05-04 12:34:14 -07:00
David Green 1084b32339 [ARM] Always replace FP16 bitcasts with VMOVhr or VMOVrh
This changes the logic with lowering fp16 bitcasts to always produce
either a VMOVhr or a VMOVrh, instead of only trying to do it with
certain surrounding nodes. To perform the same optimisations, demanded bits
and known bits information has been added for them.

Differential Revision: https://reviews.llvm.org/D78587
2020-04-28 16:12:53 +01:00
Craig Topper a58b62b4a2 [IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand().
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().

I also made a few cleanups in here. For example, I removed uses
of getElementType on a pointer where we could just use getFunctionType
from the call.

Differential Revision: https://reviews.llvm.org/D78882
2020-04-27 22:17:03 -07:00
David Green 8807139026 [ARM] Only produce qadd8b under hasV6Ops
When compiling for an arm5te CPU from clang, the +dsp attribute is set.
This meant we could try and generate qadd8 instructions where we would
end up having no pattern. I've changed the condition here to be hasV6Ops
&& hasDSP, which is what other parts of ARMISelLowering seem to use for
similar instructions.

Fixed PR45677.

Differential Revision: https://reviews.llvm.org/D78877
2020-04-27 10:13:29 +01:00
Benjamin Kramer 166467e822 [VectorUtils] Create shufflevector masks as int vectors instead of Constants
No functionality change intended.
2020-04-17 15:28:00 +02:00
Christopher Tetreault 0badd8f613 [SVE] Remove calls to getBitWidth from ARM
Reviewers: efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77904
2020-04-14 10:56:38 -07:00
Craig Topper 113f37a1f9 [CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase
Differential Revision: https://reviews.llvm.org/D77995
2020-04-13 13:50:15 -07:00
Christopher Tetreault e1e131ea5e Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: grosbach, efriedma, sdesmalen

Reviewed By: efriedma

Subscribers: hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77271
2020-04-09 12:52:44 -07:00
Matt Arsenault 84aa58cbe2 CodeGen: Use Register in TargetLowering 2020-04-08 12:10:58 -04:00
Oliver Stannard a294d9eb21 Revert "[IPRA][ARM] Spill extra registers at -Oz"
Reverting because this is causing failures on bots with expensive checks
enabled.

This reverts commit 73cea83a6f.
2020-04-06 10:34:59 +01:00
John Brawn 4ad9ca0f9e [ARM] Fix incorrect handling of big-endian vmov.i64
Currently, when the target is big-endian, vmov.i64 reverses the order of the
two words of the vector. This is correct only when the underlying element type
is 32-bit; what it should actually be doing is treating the value as a vector
of the underlying element type and reversing the elements of that.
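
A standalone sketch of the distinction (the byte-buffer view is a
simplification, not the backend code):

```cpp
#include <algorithm>
#include <cstdint>

// Big-endian adjustment for a 64-bit immediate viewed as a vector with the
// given element size. elemBytes == 4 reproduces the old behaviour of
// swapping the two words; other element sizes need a different reversal.
void reverseElements(uint8_t val[8], unsigned elemBytes) {
  unsigned n = 8 / elemBytes;
  for (unsigned i = 0; i < n / 2; ++i)
    for (unsigned b = 0; b < elemBytes; ++b)
      std::swap(val[i * elemBytes + b], val[(n - 1 - i) * elemBytes + b]);
}
```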

Differential Revision: https://reviews.llvm.org/D76515
2020-04-03 17:36:50 +01:00
John Brawn cd58fb6325 [ARM] Avoid pointless vrev of element-wise vmov
If we have an element-wise vmov immediate instruction followed by a vrev with
width greater than or equal to the vmov element width, then that vrev won't do
anything. Add a DAG combine to convert bitcasts that would become such vrevs
into vector_reg_casts instead.

Differential Revision: https://reviews.llvm.org/D76514
2020-04-03 17:36:50 +01:00
David Green fbd53ffc3a [ARM] MVE VMULL patterns
This adds MVE vmull patterns, which are conceptually the same as
mul(vmovl, vmovl), and so the tablegen patterns follow the same
structure.

For i8 and i16 this is simple enough, but in the i32 version the
multiply (in 64bits) is illegal, meaning we need to catch the pattern
earlier in a dag fold. Because bitcasts are involved in the zext
versions, the patterns are a little different between little- and
big-endian; I have only added little-endian support in this patch.

Differential Revision: https://reviews.llvm.org/D76740
2020-04-02 10:57:40 +01:00
Guillaume Chatelet 189d2e215f [Alignment][NFC] Use more Align versions of various functions
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: MatzeB, qcolombet, arsenm, sdardis, jvesely, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77291
2020-04-02 09:00:53 +00:00
Guillaume Chatelet 3a78f44daf [Alignment][NFC] Convert SelectionDAG::InferPtrAlignment to MaybeAlign
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77212
2020-04-01 13:22:11 +00:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
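
A minimal model of the new representation, with the mask held as plain
integer data rather than a Constant operand:

```cpp
#include <cstdint>
#include <vector>

// Each mask index selects a lane from the concatenation of the two inputs;
// -1 stands for an undef lane (modelled as 0 here).
std::vector<int32_t> applyShuffle(const std::vector<int32_t> &a,
                                  const std::vector<int32_t> &b,
                                  const std::vector<int> &mask) {
  std::vector<int32_t> out;
  for (int idx : mask) {
    if (idx < 0) { out.push_back(0); continue; } // undef lane
    out.push_back(idx < (int)a.size() ? a[idx] : b[idx - (int)a.size()]);
  }
  return out;
}
```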

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Guillaume Chatelet b9810988b2 [Alignment][NFC] Transitioning more getMachineMemOperand call sites
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77127
2020-03-31 11:04:10 +00:00
Guillaume Chatelet c9d5c19597 [Alignment][NFC] Transitioning more getMachineMemOperand call sites
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77121
2020-03-31 08:36:18 +00:00
Guillaume Chatelet b91535f6c7 [Alignment][NFC] Return Align for SelectionDAGNodes::getOriginalAlignment/getAlignment
Summary:
Also deprecate getOriginalAlignment; getAlignment will take much more time as it is pervasive throughout the codebase (including TableGened files).

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76933
2020-03-30 07:26:48 +00:00
David Green c9eaed5149 [ARM] MVE VMOV.i64
In the original batch of MVE VMOVimm code generation VMOV.i64 was left
out due to the way it was done downstream. It turns out that it's fairly
simple though. This adds the codegen for it, similar to NEON.

Big-endian is technically incorrect in this version, which John is fixing
in a Neon patch.
2020-03-30 07:44:23 +01:00
David Green 37b9cc8f29 [ARM] Sink splats to vector float instructions
Some MVE floating point instructions have gpr register variants that take
the scalar gpr value and splat it to all lanes. In order to accept
them in loops, the shuffle_vector and insert need to be sunk down into
the loop, next to the instruction so that ISel can see the whole
pattern.

This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for
mul are slightly more constrained as there are no fms variants taking
register arguments.

Differential Revision: https://reviews.llvm.org/D76023
2020-03-26 09:02:18 +00:00
David Green f8c79b94af [ARM] Fold VMOVrh VLDR to LDRH
This adds a simple fold to combine a VMOVrh load into an integer load.
It is similar to what is already performed for BITCAST, but needs to account
for the types being of different sizes, creating a zero-extending load.
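
A scalar model of the resulting code shape (illustrative, not the
combine itself):

```cpp
#include <cstdint>
#include <cstring>

// Load the 16 bits of an f16 directly into an i32 with a zero-extending
// integer load (an ldrh), instead of a vldrh followed by a vmov.
uint32_t loadHalfBitsZExt(const void *p) {
  uint16_t bits;
  std::memcpy(&bits, p, sizeof bits); // the 16-bit load
  return bits;                        // zero-extends into the i32 result
}
```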

Differential Revision: https://reviews.llvm.org/D76485
2020-03-24 15:51:03 +00:00
David Green 1232cfa385 [ARM] Don't split trunc stores that can be better handled as VMOVN
We deliberately split stores of the form
store(truncate(larger-than-legal-type)) into two stores, allowing each
store to perform part of the truncate for free.

There are times however where it makes more sense to use VMOVN to
de-interlace the results back into a single vector, and store that in
one go. This adds a check for that situation, not splitting the store if
it looks like a VMOVN can be more useful.

Differential Revision: https://reviews.llvm.org/D76511
2020-03-24 08:48:52 +00:00
Simon Tatham 1adfa4c991 [ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family.
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.

The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.

Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.

For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.
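
A scalar model of the two machine forms (a sketch of the semantics, not
the lowering code):

```cpp
#include <cstdint>

// VADDV sums all lanes into an i32 result.
int32_t vaddv_s16(const int16_t v[8]) {
  int32_t sum = 0;
  for (int i = 0; i < 8; ++i)
    sum += v[i];
  return sum;
}

// The accumulating VADDVA form is just VADDV plus an ordinary add, which
// is why "x += vaddvq(y)" can be combined into a single instruction.
int32_t vaddva_s16(int32_t acc, const int16_t v[8]) {
  return acc + vaddv_s16(v);
}
```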

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76491
2020-03-20 15:42:33 +00:00
Simon Tatham 45a9945b9e [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family.
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.

We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the whole lot is consistent.

In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.
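
A scalar model of the reduction shape, showing the extra scalar
parameter that @llvm.experimental.vector.reduce.min lacks:

```cpp
#include <algorithm>
#include <cstdint>

int16_t vminv_s16(int16_t init, const int16_t v[8]) {
  int16_t m = init; // the extra scalar participates in the reduction
  for (int i = 0; i < 8; ++i)
    m = std::min(m, v[i]);
  return m;
}
```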

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76490
2020-03-20 15:42:33 +00:00
David Green b3499f572d [ARM] Change VDUP type to i32 for MVE
The MVE VDUP instruction takes a GPR and splats it into every lane of a
vector register. Unlike NEON we do not have a VDUPLANE equivalent
instruction, doing the same splat from a fp register. Previously a VDUP
to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which
would mean the instruction pattern needs to add a COPY_TO_REGCLASS to
the GPR.

Instead this now converts that earlier during an ISel DAG combine,
converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction
selection to tell that the input needs to be an i32, which in one of the
testcases allows it to use ldr (or specifically ldm) over (vldr;vmov).

Whilst this is simple enough for floats, as the type sizes are the same,
there is no BITCAST equivalent for getting a half into an i32. This uses
a VMOVrh ARMISD node, which doesn't know the same tricks yet.

Differential Revision: https://reviews.llvm.org/D76292
2020-03-20 09:48:45 +00:00
Eli Friedman e24e95fe90 Remove CompositeType class.
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.

Differential Revision: https://reviews.llvm.org/D75660
2020-03-18 13:53:17 -07:00
Oliver Stannard 73cea83a6f [IPRA][ARM] Spill extra registers at -Oz
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.

We already have an optimisation which tries to fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.

Differential revision: https://reviews.llvm.org/D69936
2020-03-18 13:51:16 +00:00
Simon Tatham 928776de92 [ARM,MVE] Add intrinsics for the VQDMLAH family.
Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.
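
A rough scalar model of one 16-bit lane of the rounding variant, following
the shape of the Arm pseudocode (treat the exact semantics as an
assumption here; the point is that no standard IR operation expresses it):

```cpp
#include <cstdint>

int16_t vqrdmlah_lane_s16(int16_t d, int16_t n, int16_t m) {
  int64_t acc = ((int64_t)d << 16)          // addend, pre-scaled
              + 2 * (int64_t)n * (int64_t)m // doubled double-width product
              + (1 << 15);                  // rounding constant
  int64_t r = acc >> 16;                    // take the high half
  if (r > INT16_MAX) r = INT16_MAX;         // the extra saturation
  if (r < INT16_MIN) r = INT16_MIN;
  return (int16_t)r;
}
```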

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76123
2020-03-18 10:55:04 +00:00
Simon Tatham 28c5d97bee [ARM,MVE] Add intrinsics and isel for MVE integer VMLA.
Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)

I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.

When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76122
2020-03-18 10:55:04 +00:00
David Green 2c6c169dbd [ARM] Optimise ASRL/LSRL to smaller shifts using demand bits.
The ASRL/LSRL long shifts are generated from 64bit shifts. Once we have
them, it might turn out that enough of the 64bit result is unused that we
can use a smaller shift to produce the same result. As the smaller shift
can in general be folded in more ways, such as into add instructions in
one of the test cases here, we use demanded-bits analysis to prefer the
smaller shifts where we can.

Differential Revision: https://reviews.llvm.org/D75371
2020-03-13 10:09:03 +00:00
David Green f67d93dc23 [ARM] Constant long shift combines
This changes the way that asrl and lsrl intrinsics are lowered, going
via the ISel ASRL and LSLL nodes instead of straight to machine nodes.
On top of that, it adds some constant folds for long shifts, in case it
turns out that the shift amount was either constant or 0.
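
A sketch of the constant folds on a (lo, hi) register pair (names and the
pair representation are illustrative):

```cpp
#include <cstdint>
#include <utility>

// lsrl flavour: amount 0 folds away entirely; any other constant amount
// (assumed < 64) can be computed at compile time.
std::pair<uint32_t, uint32_t> foldLSRLConstant(uint32_t lo, uint32_t hi,
                                               unsigned amt) {
  if (amt == 0)
    return {lo, hi};
  uint64_t v = (((uint64_t)hi << 32) | lo) >> amt;
  return {(uint32_t)v, (uint32_t)(v >> 32)};
}
```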

Differential Revision: https://reviews.llvm.org/D75553
2020-03-13 08:54:59 +00:00
Victor Campos 8a12553223 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
a restriction seems to be non-trivial to implement in LLVM, so it is
left as a to-do.
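
A minimal example of the affected source pattern, assuming an ARMv5TE or
Thumb-2 target:

```cpp
#include <cstdint>

volatile int64_t shared;

int64_t readShared() { return shared; }     // now one LDRD, not two LDRs
void writeShared(int64_t v) { shared = v; } // now one STRD, not two STRs
```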

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-03-11 10:19:27 +00:00
James Greenhalgh f0de8d0940 [Arm] Do not lower vmax/vmin to Neon instructions
On some Arm cores there is a performance penalty when forwarding from an
S register to a D register.  Calculating VMAX in a D register creates
false forwarding hazards, so don't do that unless we're on a core which
specifically asks for it.

Patch by James Greenhalgh

Differential Revision: https://reviews.llvm.org/D75248
2020-03-10 10:48:48 +00:00
Djordje Todorovic c15c68abdc [CallSiteInfo] Enable the call site info only for -g + optimizations
Emit call site info only when '-g' is combined with an optimization level above 'O0'.

Differential Revision: https://reviews.llvm.org/D75175
2020-03-09 12:12:44 +01:00
David Green 13f2a5883f [ARM] Fixup FP16 bitcasts
Under fp16 we optimise the bitcast between a VMOVhr and a CopyToReg via
custom lowering. This rewrites that to be a DAG combine instead, which
helps produce better code in the cases where the bitcast is actually
legal.

Differential Revision: https://reviews.llvm.org/D72753
2020-02-27 12:19:31 +00:00
Craig Topper 735d27dc40 [SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output to ISD::FLT_ROUNDS_
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.

This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code.

Differential Revision: https://reviews.llvm.org/D75132
2020-02-25 16:58:23 -08:00
Hans Wennborg decd021fac Don't generate libcalls for wide shift on Windows ARM (PR42711)
The previous patch (cff90f07cb) didn't
cover ARM.
2020-02-25 11:54:07 +01:00
Sam Parker 03756a4197 [ARM][MVE] Combine more extending masked loads
For MVE, don't look at the users of the extending loads, so that more
are marked as desirable for folding.

Differential Revision: https://reviews.llvm.org/D74958
2020-02-24 07:50:15 +00:00
David Green 83012cb217 [ARM] Correct Formatting. NFC
Also removed an unnecessary TODO that I don't believe is relevant for
the instruction in question.
2020-02-21 16:08:56 +00:00
Djordje Todorovic 2f215cf36a Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure was found on an ARM 2-stage buildbot;
investigation is needed.
2020-02-20 14:41:39 +01:00
David Green 33aa5dfe9c [ARM] VMLAVA reduction patterns
Similar to VADDV and VADDLV that have been added recently, this adds
lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They
perform the same roles as the adds, just folding a mul into the same
instruction (and so taking two inputs). As such, they need to be lowered
in the same way as the types are often not legal.

Differential Revision: https://reviews.llvm.org/D74390
2020-02-19 12:39:58 +00:00
David Green fceb3e3b4a [ARM] MVE VADDLV lowering
Following on from the extra VADDV lowering, this extends things to
handle VADDLV which allows summing values into a pair of i32 registers,
together treated as a i64. This needs to be done in DAGCombine too as
the types are otherwise illegal, which is a fairly simple addition on
top of the existing code.

There is also a VADDLVA instruction handled here, that adds the incoming
values from the two general purpose registers. As opposed to the
non-long version where we could just add patterns for add(x, VADDV), the
long version needs to handle this early before the i64 has being split
into too many pieces.

Differential Revision: https://reviews.llvm.org/D74224
2020-02-19 11:07:20 +00:00
Djordje Todorovic faff707db8 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
David Green 51c6e9445c [ARM] Extra MVE VADDV reduction patterns
We already make use of the VADDV vector reduction instruction for cases
where the input and the output start out at the same type. The MVE
instruction however will sum into an i32, so if we are summing a v16i8
into an i32, we can still use the same instructions. In terms of IR,
this looks like a sext of a legal type (v16i8) into a very illegal type
(v16i32) and a vecreduce.add of that into the result. This means we have
to catch the pattern early in a DAG combine, producing a target VADDVs/u
node, where the signedness is now important.

This is the first part, handling VADDV and VADDVA. There are also
VADDVL/VADDVLA instructions, which are interesting because they sum into
a 64bit value. And VMLAV and VMLALV, which are interesting because they
also do a multiply of two values. It may look a little odd in places as
a result.

On its own this will probably not do very much, as the vectorizer will
not produce this IR yet.

Differential Revision: https://reviews.llvm.org/D74218
2020-02-19 09:45:35 +00:00
Djordje Todorovic 2bf44d11cb Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic a82d3e8a6e Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
John Brawn 594a89f727 [FPEnv][ARM] Don't call mutateStrictFPToFP when lowering
mutateStrictFPToFP can delete the node and replace it with another with the same
value which can later cause problems, and returning the result of
mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the
returned value has the same number of results as the original. Instead handle
things by doing the mutation manually.

Differential Revision: https://reviews.llvm.org/D74726
2020-02-17 18:19:25 +00:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
John Brawn 0ec5797296 [ARM] Fix infinite loop when lowering STRICT_FP_EXTEND
If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND
and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the
current implementation will cause an infinite loop for STRICT_FP_EXTEND due to
emitting a merge_values of the original node which after replacement becomes a
merge_values of itself.

Fix this by not doing anything for f32 to f64 extend when we have FP64, though
for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that
doesn't happen automatically for opcodes with custom lowering.

Differential Revision: https://reviews.llvm.org/D74559
2020-02-13 16:12:50 +00:00
David Green 9d4c597541 [ARM] Fix ReconstructShuffle for bigendian
Simon pointed out that this function is doing a bitcast, which can be
incorrect for big endian. That makes the lowering of VMOVN in MVE
wrong, but the function is shared between Neon and MVE so both can
be incorrect.

This attempts to fix things by using the newly added VECTOR_REG_CAST
instead of the BITCAST. As it may now be used on Neon, I've added the
relevant patterns for it there too. I've also added a quick dag combine
for it to remove them where possible.

Differential Revision: https://reviews.llvm.org/D74485
2020-02-13 09:56:46 +00:00
Jay Foad 32aac25637 [KnownBits] Introduce anyext instead of passing a flag into zext
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.

I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.

NFC.
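
A minimal sketch (nothing like the full KnownBits class) of why the
separate name is clearer:

```cpp
#include <cstdint>

struct Known8to16 {
  uint16_t Zero = 0, One = 0; // bit set => known 0 / known 1

  // Zero extension: the new high bits are known zero.
  static Known8to16 zext(uint8_t knownZero, uint8_t knownOne) {
    return {uint16_t(0xFF00u | knownZero), knownOne};
  }
  // Any extension: the new high bits are known in neither set.
  static Known8to16 anyext(uint8_t knownZero, uint8_t knownOne) {
    return {knownZero, knownOne};
  }
};
```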

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74482
2020-02-12 19:06:53 +00:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Victor Campos af2a384581 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 60e0120c91.
2020-02-08 13:18:45 +00:00
Guillaume Chatelet f85d3408e6 [NFC] Introduce an API for MemOp
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73964
2020-02-07 11:32:27 +01:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introduce additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
John Brawn b37d59353f [FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.

The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.

Differential Revision: https://reviews.llvm.org/D73194
2020-02-03 12:59:12 +00:00
Simon Tatham 961530fdc9 [ARM,MVE] Fix vreinterpretq in big-endian mode.
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.

So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?

The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.

To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.

In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.

For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73786
2020-02-03 11:20:06 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Simon Tatham 4321c6af28 [ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics.
Summary:
Immediate vmvnq is code-generated as a simple vector constant in IR,
and left to the backend to recognize that it can be created with an
MVE VMVN instruction. The predicated version is represented as a
select between the input and the same constant, and I've added a
Tablegen isel rule to turn that into a predicated VMVN. (That should
be better than the previous VMVN + VPSEL: it's the same number of
instructions but now it can fold into an adjacent VPT block.)

The unpredicated forms of VBIC and VORR are done by enabling the same
isel lowering as for NEON, recognizing appropriate immediates and
rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I
then instruction-select into the right MVE instructions (now that I've
also reworked those instructions to use the same MC operand encoding).
In order to do that, I had to promote the Tablegen SDNode instance
`NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and
similarly for `NEONvbicImm`.

The predicated forms of VBIC and VORR are represented as a vector
select between the original input vector and the output of the
unpredicated operation. The main convenience of this is that it still
lets me use the existing isel lowering for VBICIMM/VORRIMM, and not
have to write another copy of the operand encoding translation code.

This intrinsic family is the first to use the `imm_simd` system I put
into the MveEmitter tablegen backend. So, naturally, it showed up a
bug or two (emitting bogus range checks and the like). Fixed those,
and added a full set of tests for the permissible immediates in the
existing Sema test.

Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching
because lowering started turning its input into a VBICIMM. Now it
recognizes the VBICIMM instead.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72934
2020-01-23 11:53:52 +00:00
David Green ff2e67a4f7 [ARM] MVE VLDn postinc
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into an ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.

It also fills in some getTgtMemIntrinsic handling for the arm.mve.vld2/4
intrinsics, which allows the nodes to have MMOs, calculated as the full
length of the memory being loaded/stored.

Differential Revision: https://reviews.llvm.org/D71194
2020-01-20 06:57:07 +00:00
Craig Topper bb2553175a [TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RuntimeLibcalls.def and all associated usages
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.

This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.

Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn

Reviewed By: efriedma

Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72536
2020-01-10 19:30:08 -08:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP and integer. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
Victor Campos 60e0120c91 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-01-07 13:16:18 +00:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00