Commit Graph

2052 Commits

Author SHA1 Message Date
Craig Topper bb2553175a [TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RunttimeLibcalls.def and all associated usages
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.

This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.

Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn

Reviewed By: efriedma

Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72536
2020-01-10 19:30:08 -08:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
Victor Campos 60e0120c91 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers

Reviewed By: efriedma, nickdesaulniers

Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2020-01-07 13:16:18 +00:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
David Green fb8c9a339a [ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing
the target independent code to handle more folds in more situations (for
example if the fast math flags are present, but the global
AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx
into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately
if needed.

Differential Revision: https://reviews.llvm.org/D72139
2020-01-05 11:24:04 +00:00
Reid Kleckner 9c2b72821b Move tail call disabling code to target independent code
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never
implemented this check.

This LLVM attribute was originally added in
d9699bc7bd (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118
2020-01-03 11:27:41 -08:00
QingShan Zhang 2133d3c558 [DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal'
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.

Differential Revision: https://reviews.llvm.org/D70000
2020-01-03 03:26:41 +00:00
David Green b4abe7afbf [ARM] Sink splat to ICmp
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.

Differential Revision: https://reviews.llvm.org/D70997
2019-12-30 12:58:14 +00:00
Martin Storsjö b774aa1011 [ARM] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Instead check the shouldAssumeDSOLocal method and load the address
from a COFF stub.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71720
2019-12-23 12:13:49 +02:00
Victor Campos 2ff5a596cb Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit bbcf1c3496.
2019-12-20 18:08:24 +00:00
Victor Campos bbcf1c3496 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2019-12-19 11:23:01 +00:00
John Brawn 01ba201abc [ARM] Add custom strict fp conversion lowering when non-strict is custom
We have custom lowering for operations converting to/from floating-point types
when we don't have hardware support for those types, and this doesn't interact
well with the target-independent legalization of the strict versions of these
operations. Fix this by adding similar custom lowering of the strict versions.

This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics
test, with the remaining failures due to poor instruction selection.

Differential Revision: https://reviews.llvm.org/D71127
2019-12-13 13:00:00 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sam Parker 1274ac3dc2 [ARM][MVE] Sink vector shift operand
Recommit e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00
Sam Parker f8ff3bf55b Revert "[ARM][MVE] Sink vector shift operand"
This reverts commit e0b966643f.

Instruction selection is failing with expensive checks.
2019-12-12 07:52:57 +00:00
Sam Parker e0b966643f [ARM][MVE] Sink vector shift operand
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 07:35:21 +00:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
David Green 792fab343b [ARM] Attempt to use whole register vmovs for MVE shuffles.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509
2019-12-08 10:53:54 +00:00
David Green 3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00
David Green 57d96ab593 [ARM] Add some VCMP folding and canonicalisation
The VCMP instructions in MVE can accept a register or ZR, but only as
the right hand operator. Most of the time this will already be correct
because the icmp will have been canonicalised that way already. There
are some cases in the lowering of float conditions that this will not
apply to though. This code should fix up those cases.

Differential Revision: https://reviews.llvm.org/D70822
2019-12-02 19:57:12 +00:00
Carey Williams 76fd58d0fe Revert "[ARM] Allocatable Global Register Variables for ARM"
This reverts commit 2d739f98d8.
2019-11-29 17:01:05 +00:00
David Green 9f15fcc271 [ARM] Replace arm_neon_vqadds with sadd_sat
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
This helps generate vqadds from standard IR nodes, which might be
produced from the vectoriser. The old variants are removed in the
process.

Differential Revision: https://reviews.llvm.org/D69350
2019-11-27 13:32:29 +00:00
David Green b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
David Green 882f23caea [ARM] MVE interleaving load and stores.
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.

We may need some other future adjustments, such as VLD4 take up half the
available registers so should maybe cost more. This patch should get the
basics in though.

Differential Revision: https://reviews.llvm.org/D69392
2019-11-19 18:37:30 +00:00
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Anna Welker 2d739f98d8 [ARM] Allocatable Global Register Variables for ARM
Provides support for using r6-r11 as globally scoped
      register variables. This requires a -ffixed-rN flag
      in order to reserve rN against general allocation.

      If for a given GRV declaration the corresponding flag
      is not found, or the the register in question is the
      target's FP, we fail with a diagnostic.

      Differential Revision: https://reviews.llvm.org/D68862
2019-11-18 10:07:37 +00:00
Simon Tatham 5b9e4daef0 [ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement.
MVE includes instructions that extract an 8- or 16-bit lane from a
vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td`
already included isel patterns to select those instructions in
response to the `ARMISD::VGETLANEs` selection-DAG node type. But
`ARMISD::VGETLANEs` was never actually generated, because the code
that creates it was conditioned on NEON only.

It's an easy fix to enable the same code for integer MVE, and now IR
that sign-extends the result of an extractelement (whether explicitly
or as part of the function call ABI) will use `vmov.s8` instead of
`vmov.u8` followed by `sxtb`.

Reviewers: SjoerdMeijer, dmgreen, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70132
2019-11-13 09:08:41 +00:00
Eli Friedman 5df3a87224 [AArch64][X86] Don't assume __powidf2 is available on Windows.
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.

(I think this started showing up recently because we added an
optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013
2019-11-08 12:43:21 -08:00
David Green 91b0cad813 [ARM] Use isFMAFasterThanFMulAndFAdd for MVE
The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd,
where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually
available for scalar code. For MVE we don't have the non-fused version
though. It makes more sense for isFMAFasterThanFMulAndFAdd to return
true, allowing us to simplify some of the existing ISel patterns.

The tests here are that non of the existing tests failed, and so we are
still selecting VFMA and VFMS. The one test that changed shows we can
now select from fast math flags, as opposed to just relying on the
isFMADLegalForFAddFSub option.

Differential Revision: https://reviews.llvm.org/D69115
2019-11-04 15:05:41 +00:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
vhscampos f6e11a36c4 [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE
Summary:
Writing support for three ACLE functions:
  unsigned int __cls(uint32_t x)
  unsigned int __clsl(unsigned long x)
  unsigned int __clsll(uint64_t x)

CLS stands for "Count number of leading sign bits".

In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69250
2019-10-28 11:06:58 +00:00
Simon Tatham 1b45297e01 [ARM] Begin adding IR intrinsics for MVE instructions.
This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).

When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.

(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new `MVEVectorVTInfo` records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)

The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67158
2019-10-24 16:33:13 +01:00
David Green d7b77f2203 [ARM] Add qadd lowering from a sadd_sat
This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the
same time.

The qadd instruction sets the q flag, but we already have many cases where we
do not model this in llvm.

Differential Revision: https://reviews.llvm.org/D68976

llvm-svn: 375411
2019-10-21 12:33:46 +00:00
Guillaume Chatelet bac5f6bd21 [Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69243

llvm-svn: 375407
2019-10-21 11:01:55 +00:00
David Green fba831e791 [ARM] Lower sadd_sat to qadd8 and qadd16
Lower the target independent signed saturating intrinsics to qadd8 and qadd16.
This custom lowers them from a sadd_sat, catching the node early before it is
promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a
qadd8/qadd16, so that we can call demand bits on it to show that it does not
use the upper bits.

Also handles QSUB8 and QSUB16.

Differential Revision: https://reviews.llvm.org/D68974

llvm-svn: 375402
2019-10-21 09:53:38 +00:00
Sam Parker 39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
David Green 543236232c [ARM] Selection for MVE VMOVN
The adds both VMOVNt and VMOVNb instruction selection from the appropriate
shuffles. We detect shuffle masks of the form:
0, N, 2, N+2, 4, N+4, ...
or
0, N+1, 2, N+3, 4, N+5, ...
ISel will also try the opposite patterns, with inputs reversed. These are
selected to VMOVNt and VMOVNb respectively.

Differential Revision: https://reviews.llvm.org/D68283

llvm-svn: 374781
2019-10-14 15:19:33 +00:00
David Green 8628bb0491 [ARM] VQSUB instruction
Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics.

Differential Revision: https://reviews.llvm.org/D68567

llvm-svn: 374377
2019-10-10 16:34:30 +00:00
David Green 39596ec2fe [ARM] VQADD instructions
This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat
intrinsics.

Differential Revision: https://reviews.llvm.org/D68566

llvm-svn: 374336
2019-10-10 13:05:04 +00:00
Nikola Prica 02682498b8 [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033
2019-10-08 09:43:05 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these test, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
Benjamin Kramer 12e915b3fc [ARM] Make helpers static. NFC.
llvm-svn: 373503
2019-10-02 18:20:24 +00:00
David Green c9b5ab8b1c [ARM] Identity shuffles are legal
Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE
(they essentially just become bitcasts). We were not catching that in the
existing set of what we considered legal though. On NEON, they would be covered
by vext's, but that is not generally available in MVE.

This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here
but does what we want and prevents us from just rewriting what is the same
function.

Differential Revision: https://reviews.llvm.org/D68241

llvm-svn: 373446
2019-10-02 11:40:51 +00:00
Matt Arsenault f24ac13aaa TLI: Remove DAG argument from getRegisterByName
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.

The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.

llvm-svn: 373292
2019-10-01 01:44:39 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
David Green 10d10102a4 [ARM] Ensure we do not attempt to create lsll #0
During legalisation we can end up with some pretty strange nodes, like shifts
of 0. We need to make sure we don't try to make long shifts of these, ending up
with invalid assembly instructions. A long shift with a zero immediate actually
encodes a shift by 32.

Differential Revision: https://reviews.llvm.org/D67664

llvm-svn: 372839
2019-09-25 10:16:48 +00:00
David Green 2fb41fc70c [ARM] Split large widening MVE loads
Similar to rL372717, we can force the splitting of extends of vector loads in
MVE, in order to use the better widening loads as opposed to going through
expensive extends. This adds a combine to early-on detect extends of loads and
split the load in two, from where normal legalisation will kick in and we get a
series of widening loads.

Differential Revision: https://reviews.llvm.org/D67909

llvm-svn: 372721
2019-09-24 10:53:09 +00:00
David Green 49d851f403 [ARM] Split large truncating MVE stores
MVE does not have a simple sign extend instruction that can move elements
across lanes. We currently often end up moving each lane into and out of a GPR,
in order to get elements into the correct places. When we have a store of a
trunc (or a extend of a load), we can instead just split the store/load in two,
using the narrowing/widening load/store instructions from each half of the
vector.

This does that for stores. It happens very early in a store combine, so as to
easily detect the truncates. (It would be possible to do this later, but that
would involve looking through a buildvector of extract elements. Not impossible
but this way seemed simpler).

By enabling store combines we also get a vmovdrr combine for free, helping some
other tests.

Differential Revision: https://reviews.llvm.org/D67828

llvm-svn: 372717
2019-09-24 10:10:41 +00:00
Guillaume Chatelet c281b40814 [Alignment] Get DataLayout::StackAlignment as Align
Summary:
Internally it is needed to know if StackAlignment is set but we can
expose it as llvm::Align.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67852

llvm-svn: 372585
2019-09-23 12:01:32 +00:00
Oliver Cruickshank c84722ff27 [ARM] Fix CTTZ not generating correct instructions MVE
CTTZ intrinsic should have been set to Custom, not Expand

llvm-svn: 372401
2019-09-20 15:03:44 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
David Green 0cfb78e52a [ARM] MVE i1 splat
We needn't BFI each lane individually into a predicate register when each lane
in the same. A simple sign extend and a vmsr will do.

Differential Revision: https://reviews.llvm.org/D67653

llvm-svn: 372313
2019-09-19 12:17:41 +00:00
Matt Arsenault d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Graham Hunter 1a9195d817 [SVE][MVT] Fixed-length vector MVT ranges
* Reordered MVT simple types to group scalable vector types
    together.
  * New range functions in MachineValueType.h to only iterate over
    the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from
    iterating over scalable types.

Reviewers: sdesmalen, greened

Reviewed By: greened

Differential Revision: https://reviews.llvm.org/D66339

llvm-svn: 372099
2019-09-17 10:19:23 +00:00
David Green 8d21460dc5 [ARM] A predicate cast of a predicate cast is a predicate cast
The adds some very basic folding of PREDICATE_CASTS, removing cases when they
are chained together. These would already be removed eventually, as these are
lowered to copies. This just allows it to happen earlier, which can help other
simplifications.

Differential Revision: https://reviews.llvm.org/D67591

llvm-svn: 372012
2019-09-16 17:29:07 +00:00
Oliver Cruickshank ee6fbebbaf [ARM] Add patterns for BSWAP intrinsic on MVE
BSWAP can use the VREV instruction on MVE to produce better results than
expanding.

llvm-svn: 372002
2019-09-16 15:20:10 +00:00
Oliver Cruickshank e9510a6cad [ARM] Add patterns for bitreverse intrinsic on MVE
BITREVERSE can use the VBRSR which will reverse and right shift.
Shifting right by 0 will just reverse the bits.

llvm-svn: 372001
2019-09-16 15:20:03 +00:00
Oliver Cruickshank 5f799ef162 [ARM] Lower CTTZ on MVE
Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and
count the leading zeros, equivalent to a count trailing zeros (CTTZ).

llvm-svn: 372000
2019-09-16 15:19:56 +00:00
Oliver Cruickshank cd1a0b9271 [ARM] Add patterns for CTLZ on MVE
CTLZ intrinsic can use the VCLS instruction on MVE, which produces
better results than expanding.

llvm-svn: 371999
2019-09-16 15:19:49 +00:00
David Green b325c05732 [ARM] Masked loads and stores
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc, and so is
currently behind an option.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.

Differential Revision: https://reviews.llvm.org/D67186

llvm-svn: 371932
2019-09-15 14:14:47 +00:00
Sam Tebbs 1572b68509 [ARM] Add support for MVE vmaxv and vminv
This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv.

Differential Revision: https://reviews.llvm.org/D66413

llvm-svn: 371827
2019-09-13 09:11:46 +00:00
Guillaume Chatelet b6722af068 [Alignment] Use Align for TargetLowering::MinStackArgumentAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67288

llvm-svn: 371498
2019-09-10 09:01:18 +00:00
David Green 2b7089949e [ARM] Fix loads and stores for predicate vectors
These predicate vectors can usually be loaded and stored with a single
instruction, a VSTR_P0. However this instruction will store the entire P0
predicate, 16 bits, zeroextended to 32bits. Each lane of the the
v4i1/v8i1/v16i1 representing 4/2/1 bits.

As far as I understand, when llvm says "store this v4i1", it really does need
to store 4 bits (or 8, that being the size of a byte, with this bottom 4 as the
interesting bits). For example a bitcast from a v8i1 to a i8 is defined as a
store followed by a load, which is how the code is expanded.

So this instead lowers the v4i1/v8i1 load/store through some shuffles to get
the bits into the correct positions. This, as you might imagine, is not as
efficient as a single instruction. But I believe it is needed for correctness.
v16i1 equally should not load/store 32bits, only storing the 16bits of data.
Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not
changing). This is fine as they are self-consistent, it is only "externally
observable loads/stores" (from our point of view) that need to be corrected.

Differential revision: https://reviews.llvm.org/D67085

llvm-svn: 371419
2019-09-09 16:35:49 +00:00
Simon Pilgrim d7d8bb937a Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI.
llvm-svn: 371302
2019-09-07 11:04:04 +00:00
Sam Tebbs f1cdd95a2f [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection
This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together.

This is useful for various MVE instructions, such as vmla and others that take R registers.

Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.

Differential revision: https://reviews.llvm.org/D66295

llvm-svn: 371218
2019-09-06 16:01:32 +00:00
Guillaume Chatelet 9fcf066d0c [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67278

llvm-svn: 371210
2019-09-06 14:51:15 +00:00
Guillaume Chatelet 4fc3ad9e13 [Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67229

llvm-svn: 371200
2019-09-06 12:48:34 +00:00
David Candler a59bffb576 [ARM] Add support for the s,j,x,N,O inline asm constraints
A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang:

Target independent constraints:

s: An integer constant, but allowing only relocatable values

ARM specific constraints:

j: An immediate integer between 0 and 65535 (valid for MOVW)
x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3
N: An immediate integer between 0 and 31 (Thumb1 only)
O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only)

This patch adds support to Clang for the missing constraints along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but just a couple of minor corrections to checks (V8M Baseline includes MOVW so should work with 'j', 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation.

Differential Revision: https://reviews.llvm.org/D65863

Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a
llvm-svn: 371079
2019-09-05 15:17:25 +00:00
Guillaume Chatelet aff45e4b23 [LLVM][Alignment] Make functions using log of alignment explicit
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:

 - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
 - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
 - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,

Reviewers: lattner, thegameg, courbet

Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65945

llvm-svn: 371045
2019-09-05 10:00:22 +00:00
David Green 61973d978b [ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise
This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can
also be used in ISel Lowering, adding codesize values to the computed costs, to
be able to compare either approximate instruction counts or codesize costs.

It also adds a HasLowerConstantMaterializationCost, which compares the
ConstantMaterializationCost of two values, returning true if the first is
smaller either in instruction count/codesize, or falling back to the other in
the case that they are equal.

This is used in constant CSEL lowering to invert the predicate if the opposite
is easier to materialise.

Differential revision: https://reviews.llvm.org/D66701

llvm-svn: 370741
2019-09-03 11:06:24 +00:00
David Green 57cc65ff47 [ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions.
Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value.

This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used.

Code by Ranjeet Singh and Simon Tatham, with some modifications from me.

Differential revision: https://reviews.llvm.org/D66483

llvm-svn: 370739
2019-09-03 10:53:07 +00:00
David Green a95ec59fa5 [ARM] Use MQPR not QPR for MVE registers
We should be using MQPR, and if we don't we can get COPYs and PHIs created for
QPR. These get folded into instructions, failing verification checks.

Differential revision: https://reviews.llvm.org/D66214

llvm-svn: 370676
2019-09-02 17:18:23 +00:00
David Green 8469a39af3 [ARM] Remove MVE masked loads/stores
These were never enabled correctly and are causing other problems. Taking them
out for the moment, whilst we work on the issues.

This reverts r370329.

llvm-svn: 370607
2019-09-01 10:11:40 +00:00
David Green 942c2e3795 [ARM] MVE Masked loads and stores
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc.

The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.

We also need to do something with unaligned loads/stores. Currently this uses a
similar method used in big endian, using an VLDRB.8 (and potentially a VREV in
BE). This does mean that the predicate mask is converted from, for example, a
v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of
the relevant mask lane, so this could potentially load different results if the
predicate is little odd. As the input is a v4i1 however, I believe this is OK
and all the bits required should be set in the predicate, making the VLDRB.8
load the same data.

Differential Revision: https://reviews.llvm.org/D66534

llvm-svn: 370329
2019-08-29 10:54:35 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Amaury Sechet 4f4387dd12 [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

llvm-svn: 370190
2019-08-28 12:00:06 +00:00
Sam Tebbs a69d9d6156 Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the
changes from the above patch.

This reverts commit cd53ff6, reapplying 7c6b229.

llvm-svn: 369638
2019-08-22 10:29:20 +00:00
Hans Wennborg cd53ff6c0d Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32"
It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/

> This patch fixes shifts by a 128/256 bit shift amount. It also fixes
> codegen for shifts of 32 by delegating to LLVM's default optimisation
> instead of emitting a long shift.
>
> Tests that used to generate long shifts of 32 are updated to check for the
> more optimised codegen.
>
> Differential revision: https://reviews.llvm.org/D66519
>
> llvm-svn: 369626

llvm-svn: 369636
2019-08-22 09:16:53 +00:00
Sam Tebbs 7c6b229204 [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32
This patch fixes shifts by a 128/256 bit shift amount. It also fixes
codegen for shifts of 32 by delegating to LLVM's default optimisation
instead of emitting a long shift.

Tests that used to generate long shifts of 32 are updated to check for the
more optimised codegen.

Differential revision: https://reviews.llvm.org/D66519

llvm-svn: 369626
2019-08-22 08:12:06 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contain argument flags which will pass to makeLibCall function.
The patch should not has any functionality changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Sam Tebbs f312c1ecf4 [ARM] Add support for MVE vaddv
This patch adds vecreduce_add and the relevant instruction selection for
vaddv.

Differential revision: https://reviews.llvm.org/D66085

llvm-svn: 369245
2019-08-19 09:38:28 +00:00
Jian Cai 16fa8b0970 Reland "[ARM] push LR before __gnu_mcount_nc"
This relands r369147 with fixes to unit tests.

https://reviews.llvm.org/D65019

llvm-svn: 369173
2019-08-16 23:30:16 +00:00
Jian Cai 2d957cfe02 Revert "[ARM] push LR before __gnu_mcount_nc"
This reverts commit f4cf3b9593.

llvm-svn: 369149
2019-08-16 20:40:21 +00:00
Jian Cai f4cf3b9593 [ARM] push LR before __gnu_mcount_nc
Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of
the stack on ARM32.

Differential Revision: https://reviews.llvm.org/D65019

llvm-svn: 369147
2019-08-16 20:21:08 +00:00
David Green 8c2c5f5045 [ARM] Don't pretend we know how to generate MVE VLDn
We don't yet know how to generate these instructions for MVE. And in the case
of VLD3, we don't even have the instruction. For the moment don't tell the
vectoriser that we have VLD4, just to end up serialising the results.

Differential Revision: https://reviews.llvm.org/D66009

llvm-svn: 369101
2019-08-16 13:06:49 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
David Green 27ca82f32a [ARM] Add support for MVE pre and post inc loads and stores
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7bit range, multiplied by the size of the
element.

Differential Revision: https://reviews.llvm.org/D63840

llvm-svn: 368305
2019-08-08 15:27:58 +00:00
David Green 824ffd8b12 [ARM] MVE big endian loads/stores
This adds some missing patterns for big endian loads/stores, allowing unaligned
loads/stores to also be selected with an extra VREV, which produces better code
than aligning through a stack. Also moves VLDR_P0 to not be LE only, and
adjusts some of the tests to show all that working.

Differential Revision: https://reviews.llvm.org/D65583

llvm-svn: 368304
2019-08-08 15:15:19 +00:00
David Green 1becefd3f7 [ARM] Tighten up VLDRH.32 with low alignments
VLDRH needs to have an alignment of at least 2, including the
widening/narrowing versions. This tightens up the ISel patterns for it and
alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded
through the stack. It also fixed some incorrect shift amounts, which seemed to
be passing a multiple not a shift.

Differential Revision: https://reviews.llvm.org/D65580

llvm-svn: 368256
2019-08-08 06:22:03 +00:00
Oliver Cruickshank 4d4eefda6c [ARM] Expand CTPOP intrinsic for MVE
llvm-svn: 368180
2019-08-07 15:47:45 +00:00
Matt Arsenault 3922392969 AMDGPU: Correct behavior of f16 buffer loads
Don't assume format loads for f16. Also fixes support for targets
without i16.

llvm-svn: 367879
2019-08-05 15:59:07 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Daniel Sanders 2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Eli Friedman 89b80f1239 [ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1.
This is extremely specific, but saves three instructions when it's
legal.  I don't think the code can be usefully generalized.

Differential Revision: https://reviews.llvm.org/D65351

llvm-svn: 367492
2019-07-31 23:19:21 +00:00
Eli Friedman 2f45ec1c39 [ARM] Transform compare of masked value to shift on Thumb1.
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.

It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.

Differential Revision: https://reviews.llvm.org/D65175

llvm-svn: 367491
2019-07-31 23:17:34 +00:00
David Green 9cf344e739 [ARM] Better patterns for fp <> predicate vectors
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.

Differential Revision: https://reviews.llvm.org/D65103

llvm-svn: 367191
2019-07-28 13:53:39 +00:00
David Green cd7a6fa314 [ARM] Rewrite how VCMP are lowered, using a single node
This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and
VCMPZ with an extra operand as the condition code. I believe this will make
some combines simpler, allowing us to just look at these codes and not the
operands. It also helps fill in a missing VCGTUZ MVE selection without adding
extra nodes for it.

Differential Revision: https://reviews.llvm.org/D65072

llvm-svn: 366934
2019-07-24 17:36:47 +00:00
David Green 047a0b6575 [ARM] Disable MVE fptosi and friends
The prevents us from trying to convert an i1 predicate vector to a float, or
vice-versa. Better patterns are possible, which will follow in a subsequent
commit. For now we just expand them.

Differential Revision: https://reviews.llvm.org/D65066

llvm-svn: 366931
2019-07-24 17:26:26 +00:00
David Green bab4d8ac5a [ARM] Better OR's for MVE compares
This adds a DeMorgan combine for OR's of compares to turn them into AND's,
helping prevent them from going into and out of gpr registers. It also fills in
the VCLE and VCLT nodes that MVE can select, allowing it to invert more
compares.

Differential Revision: https://reviews.llvm.org/D65059

llvm-svn: 366920
2019-07-24 16:42:09 +00:00
David Green 4fc78c496e [ARM] MVE floating point compares and selects
Much like integers, this adds MVE floating point compares and select. It
requires a lot more buildvector/shuffle code because we may need to expand the
compares without mve.fp, and requires support for and/or because of the way we
lower llvm condition codes.

Some original code by David Sherwood

Differential Revision: https://reviews.llvm.org/D65054

llvm-svn: 366909
2019-07-24 14:28:22 +00:00
David Green c7e55d4f52 [ARM] MVE predicate register support
This adds support code for building and shuffling i1 predicate registers. It
generally uses two basic principles, either converting the predicate into an
scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or
by converting the register to an full vector register and back.

Some of the code here is a not super efficient but will hopefully cover most
cases of moving i1 vectors around and can be improved in subsequent patches.

Some code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65052

llvm-svn: 366890
2019-07-24 11:51:36 +00:00
David Green b9d96ceca0 [ARM] MVE integer compares and selects
This adds the very basics for MVE vector predication, adding integer VCMP and
VSEL instruction support. This is done through predicate registers (MVT::v16i1,
MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom
lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc).

An extra VCNE was added, as this can be handled sensibly by MVE's expanded
number of VCMP condition codes. (There are also VCLE and VCLT which are added
later).

VPSEL is also added here, simply selecting on the vselect.

Original code by David Sherwood.

Differential Revision: https://reviews.llvm.org/D65051

llvm-svn: 366885
2019-07-24 11:08:14 +00:00
Sam Parker 57e87dd81b [ARM][LowOverheadLoops] Fix branch target codegen
While lowering test.set.loop.iterations, it wasn't checked how the
brcond was using the result and so the wls could branch to the loop
preheader instead of not entering it. The same was true for
loop.decrement.reg.
    
So brcond and br_cc and now lowered manually when using the hwloop
intrinsics. During this we now check whether the result has been
negated and whether we're using SETEQ or SETNE and 0 or 1. We can
then figure out which basic block the WLS and LE should be targeting.

Differential Revision: https://reviews.llvm.org/D64616

llvm-svn: 366809
2019-07-23 14:08:46 +00:00
David Green fdedf240f8 [ARM] Rename NEONModImm to VMOVModImm. NFC
Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE.

llvm-svn: 366790
2019-07-23 09:19:24 +00:00
Oliver Stannard 6771a89fa0 [IPRA][ARM] Make use of the "returned" parameter attribute
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.

Differential revision: https://reviews.llvm.org/D64986

llvm-svn: 366669
2019-07-22 08:44:36 +00:00
Diogo N. Sampaio 11512e742b [ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine
Summary:
PerformVMOVRRDCombine ommits adding a offset
of 4 to the PointerInfo, when converting a
f64 = load[M]
to
{i32, i32} = {load[M], load[M + 4]}

Which would allow the machine scheduller
to break dependencies with the second load.

 - pr42638

Reviewers: eli.friedman, dmgreen, ostannard

Reviewed By: ostannard

Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64870

llvm-svn: 366423
2019-07-18 10:05:56 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
David Green dc56995c57 [ARM] MVE vector for 64bit types
We need to make sure that we are sensibly dealing with vectors of types v2i64
and v2f64, even if most of the time we cannot generate native operations for
them. This mostly adds a lot of testing, plus fixes up a couple of the issues
found. And, or and xor can be legal for v2i64, and shifts combining needs a
slight fixup.

Differential Revision: https://reviews.llvm.org/D64316

llvm-svn: 366106
2019-07-15 18:42:54 +00:00
David Green 6e89887642 [ARM] MVE Vector Shifts
This adds basic lowering for MVE shifts. There are many shifts in MVE, but the
instructions handled here are:
 VSHL (imm)
 VSHRu (imm)
 VSHRs (imm)
 VSHL (vector)
 VSHL (register)

MVE, like NEON before it, doesn't have shift right by a vector (or register).
We instead have to negate the amount and shift in the opposite direction. This
means we have to convert any SHR's into a form of SHL (that is still signed or
unsigned) with a negated condition and selecting from there. MVE still does
have shifting by an immediate for SHL, ASR and LSR.

This adds lowering for these and for register forms, which work well for shift
lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially
work optimally for right shifts.

Differential Revision: https://reviews.llvm.org/D64212

llvm-svn: 366056
2019-07-15 11:35:39 +00:00
David Green da750b1688 [ARM] Adjust how NEON shifts are lowered
This adjusts the way that we lower NEON shifts to use a DAG target node, not
via a neon intrinsic. This is useful for handling MVE shifts operations in the
same the way. It also renames some of the immediate shift nodes for
consistency, and moves some of the processing of immediate shifts into
LowerShift allowing it to capture more cases.

Differential Revision: https://reviews.llvm.org/D64426

llvm-svn: 366051
2019-07-15 10:44:50 +00:00
David Green 458a720ec1 [ARM] Add sign and zero extend patterns for MVE
The vmovlb instructions can be uses to sign or zero extend vector registers
between types. This adds some patterns for them and relevant testing. The
VBICIMM generation is also put behind a hasNEON check (as is already done for
VORRIMM).

Code originally by David Sherwood.

Differential Revision: https://reviews.llvm.org/D64069

llvm-svn: 366008
2019-07-13 15:43:00 +00:00
David Green 4ce648b5e8 [ARM] MVE integer abs
Similar to floating point abs, we also have instructions for integers.

Differential Revision: https://reviews.llvm.org/D64027

llvm-svn: 366005
2019-07-13 14:58:32 +00:00
David Green 701bf714db [ARM] MVE integer min and max
This simply makes the MVE integer min and max instructions legal and adds the
relevant patterns for them.

Differential Revision: https://reviews.llvm.org/D64026

llvm-svn: 366004
2019-07-13 14:48:54 +00:00
David Green ac5bcbeb9f [ARM] MVE VRINT support
This adds support for the floor/ceil/trunc/... series of instructions,
converting to various forms of VRINT. They use the same suffixes as their
floating point counterparts. There is not VTINTR, so nearbyint is expanded.

Also added a copysign test, to show it is expanded.

Differential Revision: https://reviews.llvm.org/D63985

llvm-svn: 366003
2019-07-13 14:38:53 +00:00
David Green ec8af0db6c [ARM] MVE minnm and maxnm instructions
This adds the patterns for minnm and maxnm from the fminnum and fmaxnum nodes,
similar to scalar types.

Original patch by Simon Tatham

Differential Revision: https://reviews.llvm.org/D63870

llvm-svn: 366002
2019-07-13 14:29:02 +00:00
Sam Parker 08b4a8da07 [ARM][LowOverheadLoops] Correct offset checking
This patch addresses a couple of problems:
1) The maximum supported offset of LE is -4094.
2) The offset of WLS also needs to be checked, this uses a
   maximum positive offset of 4094.
    
The use of BasicBlockUtils has been changed because the block offsets
weren't being initialised, but the isBBInRange checks both positive
and negative offsets.
    
ARMISelLowering has been tweaked because the test case presented
another pattern that we weren't supporting.

llvm-svn: 365749
2019-07-11 09:56:15 +00:00
Fangrui Song 7d63be09b6 [ARM] Fix null pointer dereference in CodeGen/ARM/Windows/stack-protector-msvc.ll.test after D64292/r365283
CLI.CS may not be set.

llvm-svn: 365299
2019-07-08 08:43:31 +00:00
Martin Storsjo 8d9d290d4c [ARM] Add support for MSVC stack cookie checking
Heavily based on the same for AArch64, from SVN r346469.

Differential Revision: https://reviews.llvm.org/D64292

llvm-svn: 365283
2019-07-07 18:57:31 +00:00
David Green 25cf705097 [ARM] MVE VMOV immediate handling
This adds some handling for VMOVimm, using the same method that NEON uses. We
create VMOVIMM/VMVNIMM/VMOVFPIMM nodes based on the immediate, and select them
using the now renamed ARMvmovImm/etc. There is also an extra 64bit immediate
mode that I have not yet added here.

Code by David Sherwood

Differential Revision: https://reviews.llvm.org/D63884

llvm-svn: 365178
2019-07-05 10:02:43 +00:00
David Green bb7e97d783 [ARM] MVE fp to int conversions
This adds the patterns needed for fptosi and sitofp.

Differential Revision: https://reviews.llvm.org/D63729

llvm-svn: 365176
2019-07-05 09:34:30 +00:00
David Green 2b20ee4110 [ARM] Favour PL/MI over GE/LT when possible
The arm condition codes for GE is N==V (and for LT is N!=V). If the source of
flags cannot set V (overflow), such as a cmp against #0, then we can use the
simpler PL and MI conditions that only check N. As these PL/MI conditions are
simpler than GE/LT, other passes like the peephole optimiser can have a better
time optimising away the redundant CMPs.

The exception is the VSEL instruction, which cannot take the PL code, so there
the transform favours GE.

Differential Revision: https://reviews.llvm.org/D64160

llvm-svn: 365117
2019-07-04 08:58:58 +00:00
Sam Parker 6005681ac6 [ARM] Fix for NDEBUG builds
Fix unused variable warning as well as a nonsense assert.

Differential Revision: https://reviews.llvm.org/D63816

llvm-svn: 365046
2019-07-03 14:39:23 +00:00
Roman Lebedev c4b83a6054 [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.

Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.

Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.

Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel

Reviewed By: efriedma

Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64090

llvm-svn: 365010
2019-07-03 09:41:35 +00:00
Simon Tatham bffd099d15 [ARM] MVE: allow soft-float ABI to pass vector types.
Passing a vector type over the soft-float ABI involves it being split
into four GPRs, so the first thing that has to happen at the start of
the function is to recombine those into a vector register. The ABI
types all vectors as v2f64, so we need to support BUILD_VECTOR for
that type, which I do in this patch by allowing it to be expanded in
terms of INSERT_VECTOR_ELT, and writing an ISel pattern for that in
turn. Similarly, I provide a rule for EXTRACT_VECTOR_ELT so that a
returned vector can be marshalled back into GPRs.

While I'm here, I've also added ISD::UNDEF to the list of operations
we turn back on in `setAllExpand`, because I noticed that otherwise it
gets expanded into a BUILD_VECTOR with explicit zero inputs, leading
to pointless machine instructions to zero out a vector register that's
about to have every lane overwritten of in any case.

Reviewers: dmgreen, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63937

llvm-svn: 364910
2019-07-02 11:26:11 +00:00
Simon Tatham 7b63a9533c [ARM] Stop using scalar FP instructions in integer-only MVE mode.
If you compile with `-mattr=+mve` (enabling integer MVE instructions
but not floating-point ones), then the scalar FP //registers// exist
and it's legal to move things in and out of them, load and store them,
but it's not legal to do arithmetic on them.

In D60708, the calls to `addRegisterClass` in ARMISelLowering that
enable use of the scalar FP registers became conditionalised on
`Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so
that loads, stores and moves of those registers would work. But I
didn't realise that that would also enable all the operations on those
types by default.

Now, if the target doesn't have basic VFP, we follow up those
`addRegisterClass` calls by turning back off all the nontrivial
operations you can perform on f32 and f64. That causes several
knock-on failures, which are fixed by allowing the `VMOVDcc` and
`VMOVScc` instructions to be selected even if all you have is
`HasFPRegs`, and adjusting several checks for 'is this a double in a
single-precision-only world?' to the more general 'is this any FP type
we can't do arithmetic on?'. Between those, the whole of the
`float-ops.ll` and `fp16-instructions.ll` tests can now run in
MVE-without-FP mode and generate correct-looking code.

One odd side effect is that I had to relax the check lines in that
test so that they permit test functions like `add_f` to be generated
as tailcalls to software FP library functions, instead of ordinary
calls. Doing that is entirely legal, but the mystery is why this is
the first RUN line that's needed the relaxation: on the usual kind of
non-FP target, no tailcalls ever seem to be generated. Going by the
llc messages, I think `SoftenFloatResult` must be perturbing the code
generation in some way, but that's as much as I can guess.

Reviewers: dmgreen, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63938

llvm-svn: 364909
2019-07-02 11:26:00 +00:00
Mikhail Maltsev 4a9e3f15bb [ARM] MVE: support QQPRRegClass and QQQQPRRegClass
Summary:
QQPRRegClass and QQQQPRRegClass are used by the
interleaving/deinterleaving loads/stores to represent sequences of
consecutive SIMD registers.

Reviewers: ostannard, simon_tatham, dmgreen

Reviewed By: simon_tatham

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64009

llvm-svn: 364794
2019-07-01 16:05:23 +00:00
Sam Parker 98722691b0 [ARM] WLS/LE Code Generation
Backend changes to enable WLS/LE low-overhead loops for armv8.1-m:
1) Use TTI to communicate to the HardwareLoop pass that we should try
   to generate intrinsics that guard the loop entry, as well as setting
   the loop trip count.
2) Lower the BRCOND that uses said intrinsic to an Arm specific node:
   ARMWLS.
3) ISelDAGToDAG the node to a new pseudo instruction:
   t2WhileLoopStart.
4) Add support in ArmLowOverheadLoops to handle the new pseudo
   instruction.

Differential Revision: https://reviews.llvm.org/D63816

llvm-svn: 364733
2019-07-01 08:21:28 +00:00
Sam Tebbs e39e958da3 [ARM] Add support for the MVE long shift instructions
MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers.

The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl.

test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions.

Differential Revision: https://reviews.llvm.org/D63430

llvm-svn: 364654
2019-06-28 15:43:31 +00:00
David Green 2883944035 [ARM] Mark math routines as non-legal for MVE
This adds handling and tests for a number of floating point math routines,
which have no MVE instructions.

Differential Revision: https://reviews.llvm.org/D63725

llvm-svn: 364641
2019-06-28 11:17:38 +00:00
David Green eb7080ac6e [ARM] Widening loads and narrowing stores
MVE has instructions to widen as it loads, and narrow as it stores. This adds
the required patterns and legalisation to make them work including specifying
that they are legal, patterns to select them and test changes.

Patch by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63839

llvm-svn: 364636
2019-06-28 09:47:55 +00:00
David Green 07e53fee14 [ARM] MVE loads and stores
This fills in the gaps for basic MVE loads and stores, allowing unaligned
access and adding far too many tests. These will become important as
narrowing/expanding and pre/post inc are added. Big endian might still not be
handled very well, because we have not yet added bitcasts (and I'm not sure how
we want it to work yet). I've included the alignment code anyway which maps
with our current patterns. We plan to return to that later.

Code written by Simon Tatham, with additional tests from Me and Mikhail Maltsev.

Differential Revision: https://reviews.llvm.org/D63838

llvm-svn: 364633
2019-06-28 08:41:40 +00:00
David Green fc4102417b [ARM] Mark div and rem as expand for MVE
We don't have vector operations for these, so they need to be expanded for both
integer and float.

Differential Revision: https://reviews.llvm.org/D63595

llvm-svn: 364631
2019-06-28 08:18:55 +00:00
David Green 8be372b190 [ARM] MVE vector shuffles
This patch adds necessary shuffle vector and buildvector support for ARM MVE.
It essentially adds support for VDUP, VREVs and some VMOVs, which are often
required by other code (like upcoming patches).

This mostly uses the same code from Neon that already generated
NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and
moved to ARMInstrInfo as they are common to both architectures. Most of the
selection code seems to be applicable to both, but NEON does have some more
instructions making some parts specific.

Most code originally by David Sherwood.

Differential Revision: https://reviews.llvm.org/D63567

llvm-svn: 364626
2019-06-28 07:08:42 +00:00
Sam Tebbs 8747c5f482 [ARM] Fix formatting issue in ARMISelLowering.cpp
Fix a formatting error in ARMISelLowering.cpp::Expand64BitShift. My test
commit after receiving write access.

llvm-svn: 364560
2019-06-27 16:28:28 +00:00
Eli Friedman ab1d73ee32 [ARM] Don't reserve R12 on Thumb1 as an emergency spill slot.
The current implementation of ThumbRegisterInfo::saveScavengerRegister
is bad for two reasons: one, it's buggy, and two, it blocks using R12
for other optimizations.  So this patch gets rid of it, and adds the
necessary support for using an ordinary emergency spill slot on Thumb1.

(Specifically, I think saveScavengerRegister was broken by r305625, and
nobody noticed for two years because the codepath is almost never used.
The new code will also probably not be used much, but it now has better
tests, and if we fail to emit a necessary emergency spill slot we get a
reasonable error message instead of a miscompile.)

A rough outline of the changes in the patch:

1. Gets rid of ThumbRegisterInfo::saveScavengerRegister.
2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an
emergency spill slot for Thumb1.
3. Implements useFPForScavengingIndex, so the emergency spill slot isn't
placed at a negative offset from FP on Thumb1.
4. Modifies the heuristics for allocating an emergency spill slot to
support Thumb1.  This includes fixing ExtraCSSpill so we don't try to
use "lr" as a substitute for allocating an emergency spill slot.
5. Allocates a base pointer in more cases, so the emergency spill slot
is always accessible.
6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the
right offset in the new cases where we're forcing a base pointer.
7. Ensures we never generate a load or store with an offset outside of
its frame object.  This makes the heuristics more straightforward.
8. Changes Thumb1 prologue and epilogue emission so it never uses
register scavenging.

Some of the changes to the emergency spill slot heuristics in
determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow
the compiler to avoid allocating an emergency spill slot in cases
where it isn't necessary. The rest of the changes should only affect
Thumb1.

Differential Revision: https://reviews.llvm.org/D63677

llvm-svn: 364490
2019-06-26 23:46:51 +00:00
Fangrui Song 6a4c68e187 [ARM] Fix -Wimplicit-fallthrough after D60709/r364331
llvm-svn: 364376
2019-06-26 02:34:10 +00:00
Simon Tatham e8de8ba6a6 [ARM] Support inline assembler constraints for MVE.
"To" selects an odd-numbered GPR, and "Te" an even one. There are some
8.1-M instructions that have one too few bits in their register fields
and require registers of particular parity, without necessarily using
a consecutive even/odd pair.

Also, the constraint letter "t" should select an MVE q-register, when
MVE is present. This didn't need any source changes, but some extra
tests have been added.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60709

llvm-svn: 364331
2019-06-25 16:49:32 +00:00
Simon Tatham a4b415a683 [ARM] Code-generation infrastructure for MVE.
This provides the low-level support to start using MVE vector types in
LLVM IR, loading and storing them, passing them to __asm__ statements
containing hand-written MVE vector instructions, and *if* you have the
hard-float ABI turned on, using them as function parameters.

(In the soft-float ABI, vector types are passed in integer registers,
and combining all those 32-bit integers into a q-reg requires support
for selection DAG nodes like insert_vector_elt and build_vector which
aren't implemented yet for MVE. In fact I've also had to add
`arm_aapcs_vfpcc` to a couple of existing tests to avoid that
problem.)

Specifically, this commit adds support for:

 * spills, reloads and register moves for MVE vector registers

 * ditto for the VPT predication mask that lives in VPR.P0

 * make all the MVE vector types legal in ISel, and provide selection
   DAG patterns for BITCAST, LOAD and STORE

 * make loads and stores of scalar FP types conditional on
   `hasFPRegs()` rather than `hasVFP2Base()`. As a result a few
   existing tests needed their llc command lines updating to use
   `-mattr=-fpregs` as their method of turning off all hardware FP
   support.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60708

llvm-svn: 364329
2019-06-25 16:48:46 +00:00
Fangrui Song 807d2f442a [ARM] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after D60692
llvm-svn: 364312
2019-06-25 13:28:44 +00:00
Simon Tatham 4cf18c2849 [ARM] Explicit lowering of half <-> double conversions.
If an FP_EXTEND or FP_ROUND isel dag node converts directly between
f16 and f32 when the target CPU has no instruction to do it in one go,
it has to be done in two steps instead, going via f32.

Previously, this was done implicitly, because all such CPUs had the
storage-only implementation of f16 (i.e. the only thing you can do
with one at all is to convert it to/from f32). So isel would legalize
the f16 into an f32 as soon as it saw it, by inserting an fp16_to_fp
node (or vice versa), and then the fp_extend would already be f32->f64
rather than f16->f64.

But that technique can't support a target CPU which has full f16
support but _not_ f64, such as some variants of Arm v8.1-M. So now we
provide custom lowering for FP_EXTEND and FP_ROUND, which checks
support for f16 and f64 and decides on the best thing to do given the
combination of flags it gets back.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60692

llvm-svn: 364294
2019-06-25 11:24:50 +00:00
Simon Pilgrim 4e0648a541 [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.

This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.

If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.

Differential Revision: https://reviews.llvm.org/D63075

llvm-svn: 363179
2019-06-12 17:14:03 +00:00
David Green c5471c2a57 [ARM] Adjust isLegalT1AddressImmediate for non-legal types
Types such as float and i64's do not have legal loads in Thumb1, but will still
be loaded with a LDR (or potentially multiple LDR's). As such we can treat the
cost of addressing mode calculations the same as an i32 and get some optimisation
benefits.

Differential Revision: https://reviews.llvm.org/D62968

llvm-svn: 362874
2019-06-08 10:32:53 +00:00
David Green 342d1b81a3 [ARM] Add MVE addressing to isLegalT2AddressImmediate
Now with MVE being added, we can add the vector addressing mode costs for it.
These are generally imm7 multiplied by the size of the type being loaded /
stored.

Differential Revision: https://reviews.llvm.org/D62967

llvm-svn: 362873
2019-06-08 10:18:23 +00:00
David Green 4ecce205d5 [ARM] Add fp16 addressing to isLegalT2AddressImmediate
The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs
to account for those, and adds extra testing. It is dependant upon hasFPRegs16
as this is what the load/store instructions require.

Differential Revision: https://reviews.llvm.org/D62966

llvm-svn: 362872
2019-06-08 10:09:02 +00:00
Diogo N. Sampaio df92f84110 [ARM][FIX] Ran out of registers due tail recursion
Summary:
- pr42062
When compiling for MinSize,
ARMTargetLowering::LowerCall decides to indirect
multiple calls to a same function. However,
it disconsiders the limitation that thumb1
indirect calls require the callee to be in a
register from r0 to r3 (llvm limiation).
If all those registers are used by arguments, the
compiler dies with "error: run out of registers
during register allocation".
This patch tells the function
IsEligibleForTailCallOptimization if we intend to
perform indirect calls, as to avoid tail call
optimization.

Reviewers: dmgreen, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62683

llvm-svn: 362366
2019-06-03 08:58:05 +00:00
Simon Pilgrim 5359bb4d31 [ARM] LowerVECTOR_SHUFFLE - fix uninitialized variable warnings. NFCI.
llvm-svn: 362094
2019-05-30 14:01:24 +00:00
Simon Tatham 760df47b77 [ARM] Replace fp-only-sp and d16 with fp64 and d32.
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.

Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.

A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.

Reviewers: dmgreen, samparker, SjoerdMeijer

Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60691

llvm-svn: 361845
2019-05-28 16:13:20 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
David Green 21542cd6f4 [ARM] Select a number of fp16 rounding functions
This add patterns for fp16 round and ceil etc. Same as the float and double
patterns.

Differential Revision: https://reviews.llvm.org/D62326

llvm-svn: 361718
2019-05-26 11:13:00 +00:00
David Green c9f4b7d201 [ARM] Promote various fp16 math intrinsics
Promote a number of fp16 math intrinsics to float, so that the relevant float
math routines can be used. Copysign is expanded so as to be handled in-place.

Differential Revision: https://reviews.llvm.org/D62325

llvm-svn: 361717
2019-05-26 10:59:21 +00:00
David Green caf8a11b65 [ARM] Promote fp16 frem
Promote fp16 frem operations on ARM to floats so they call fmodf.

Differential Revision: https://reviews.llvm.org/D62321

llvm-svn: 361713
2019-05-26 10:30:22 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
David Green 0582b22f10 [ARM] Don't use the Machine Scheduler for cortex-m at minsize
The new cortex-m schedule in rL360768 helps performance, but can increase the
amount of high-registers used. This, on average, ends up increasing the
codesize by a fair amount (because less instructions are converted from T2 to
T1). On cortex-m at -Oz, where we are quite size-paranoid, it is better to use
the existing DAG scheduler with the RegPressure scheduling preference (at least
until the issues around T2 vs T1 instructions can be improved).

I have also made sure that the Sched::RegPressure dag scheduler is always
chosen for MinSize.

The test shows one case where we increase the number of registers used.

Differential Revision: https://reviews.llvm.org/D61882

llvm-svn: 360769
2019-05-15 12:58:02 +00:00
Eli Friedman 2ea088173d [ARM] Glue register copies to tail calls.
This generally follows what other targets do. I don't completely
understand why the special case for tail calls existed in the first
place; even when the code was committed in r105413, call lowering didn't
work in the way described in the comments.

Stack protector lowering breaks if the register copies are not glued to
a tail call: we have to insert the stack protector check before the tail
call, and we choose the location based on the assumption that all
physical register dependencies of a tail call are adjacent to the tail
call. (See FindSplitPointForStackProtector.) This is sort of fragile,
but I don't see any reason to break that assumption.

I'm guessing nobody has seen this before just because it's hard to
convince the scheduler to actually schedule the code in a way that
breaks; even without the glue, the only computation that could actually
be scheduled after the register copies is the computation of the call
address, and the scheduler usually prefers to schedule that before the
copies anyway.

Fixes https://bugs.llvm.org/show_bug.cgi?id=41417

Differential Revision: https://reviews.llvm.org/D60427

llvm-svn: 360099
2019-05-06 23:21:59 +00:00
Sjoerd Meijer 180f1ae57c [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
David Green 0d741507f7 [ARM] Rewrite isLegalT2AddressImmediate
This does two main things, firstly adding some at least basic addressing modes
for i64 types, and secondly treats floats and doubles sensibly when there is no
fpu. The floating point change can help codesize in some cases, especially with
D60294.

Most backends seems to not consider the exact VT in isLegalAddressingMode,
instead switching on type size. That is now what this does when the target does
not have an fpu (as the float data will be loaded using LDR's). i64's currently
use the address range of an LDRD (even though they may be legalised and loaded
with an LDR). This is at least better than marking them all as illegal
addressing modes.

I have not attempted to do much with vectors yet. That will need changing once
MVE is added.

Differential Revision: https://reviews.llvm.org/D60677

llvm-svn: 358845
2019-04-21 09:54:29 +00:00
Simon Pilgrim e5573f4f4e [TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359)
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.

shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask

preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair

llvm-svn: 358526
2019-04-16 20:57:28 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Eli Friedman 3813fe0bda [ARM] Optimize expressions like "return x != 0;" for Thumb1.
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.

While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.

Differential Revision: https://reviews.llvm.org/D59616

llvm-svn: 357437
2019-04-02 00:01:23 +00:00
Eli Friedman 1e5d569c8c [ARM] Add missing memory operands to a bunch of instructions.
This should hopefully lead to minor improvements in code generation, and
more accurate spill/reload comments in assembly.

Also fix isLoadFromStackSlotPostFE/isStoreToStackSlotPostFE so they
don't lead to misleading assembly comments for merged memory operands;
this is technically orthogonal, but in practice the relevant memory
operand lists don't show up without this change.

Differential Revision: https://reviews.llvm.org/D59713

llvm-svn: 356963
2019-03-25 22:42:30 +00:00
Simon Pilgrim 87d261bfd3 [Thumb] Fix infinite loop in ABS expansion (PR41160)
Don't expand ISD::ABS node if its legal.

llvm-svn: 356661
2019-03-21 12:41:18 +00:00
Simon Pilgrim 7a8e5051f4 Fix unused variable warning. NFCI.
llvm-svn: 356474
2019-03-19 16:49:59 +00:00
Simon Pilgrim a56f2822d0 [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect
These changes are related to PR37743 and include:

    SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.

    Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.

    Add promoting the integer ABS node in the LegalizeIntegerType.

    Expand-based legalization of integer result for the ABS nodes.

    Expand-based legalization of ABS vector operations.

    Add some integer abs testcases for different typesizes for Thumb arch

    Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
        tmp = (SRA, Hi, 31)
        Lo = (UADDO tmp, Lo)
        Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
        Lo = (XOR tmp, Lo)

    The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
        (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

    Change integer abs testcases for codegen with the ABS node support for AArch64.
        Indicate that the ABS is legal for the i64 type when the NEON is supported.
        Change the integer abs testcases to show changing of codegen.

    Add combine and legalization of ABS nodes for Thumb arch.

    Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D49837

llvm-svn: 356468
2019-03-19 16:24:55 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Tim Renouf d1477e989c [ARM] Fixed an assumption of power-of-2 vector MVT
I am about to introduce some non-power-of-2 width vector MVTs. This
commit fixes a power-of-2 assumption that my forthcoming change would
otherwise break, as shown by test/CodeGen/ARM/vcvt_combine.ll and
vdiv_combine.ll.

Differential Revision: https://reviews.llvm.org/D58927

Change-Id: I56a282e365d3874ab0621e5bdef98a612f702317
llvm-svn: 356341
2019-03-17 20:48:54 +00:00
Florian Hahn 13bbcb3264 [ARM] Sink zext/sext operands for add and sub to enable vsubl generation.
This uses the infrastructure added in rL353152 to sink zext and sexts to
sub/add users, to enable vsubl/vaddl generation when NEON is available.

See https://bugs.llvm.org/show_bug.cgi?id=40025.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D58063

llvm-svn: 355460
2019-03-06 00:10:03 +00:00
Oliver Stannard 4a9086b537 [ARM] Fix select_cc lowering for fp16
When lowering a select_cc node where the true and false values are of type f16,
we can't use a general conditional move because the FP16 instructions do not
support conditional execution. Instead, we must ensure that the condition code
is one of the four supported by the VSEL instruction.

Differential revision: https://reviews.llvm.org/D58813

llvm-svn: 355385
2019-03-05 10:42:34 +00:00
Oliver Stannard e019e6223b [ARM] Consider undefined-on-NaN conditions in checkVSELConstraints
This function was not checking for the condition code variants which are
undefined if either input is NaN, so we were missing selection of the VSEL
instruction in some cases when using -fno-honor-nans or -ffast-math.

Differential revision: https://reviews.llvm.org/D58812

llvm-svn: 355199
2019-03-01 13:58:25 +00:00
Kadir Cetinkaya 1b1b1a6135 [Target][ARM] Add a usage for SrcSz to unbreak build-bots without assertions
llvm-svn: 355101
2019-02-28 15:55:11 +00:00
Bjorn Pettersson d30f308a9f Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

llvm-svn: 355099
2019-02-28 15:45:29 +00:00
Sam Parker 5b09834bc3 [ARM] Add OptMinSize to ARMSubtarget
In many places in the backend, we like to know whether we're
optimising for code size and this is performed by checking the
current machine function attributes. A subtarget is created on a
per-function basis, so it's possible to know when we're compiling for
code size on construction so record this in the new object.

Differential Revision: https://reviews.llvm.org/D57812

llvm-svn: 353501
2019-02-08 07:57:42 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
Reid Kleckner 27fd307b83 [ARM] Deduplicate table generated CC analysis code
Create ARMCallingConv.cpp and emit code for calling convention analysis
from there.

llvm-svn: 352431
2019-01-28 21:28:43 +00:00
Matt Arsenault 39508331ef Reapply "IR: Add fp operations to atomicrmw"
This reapplies commits r351778 and r351782 with
RISCV test fixes.

llvm-svn: 351850
2019-01-22 18:18:02 +00:00
Chandler Carruth 285fe716c5 Revert r351778: IR: Add fp operations to atomicrmw
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving there no clear test pattern to use. Generally it seems bad for
hte IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!) and nothing I did simply would
fix it so I'm reverting back to green.

This also required reverting the RISCV build fix in r351782.

llvm-svn: 351796
2019-01-22 10:29:58 +00:00
Matt Arsenault bfdba5e4fc IR: Add fp operations to atomicrmw
Add just fadd/fsub for now.

llvm-svn: 351778
2019-01-22 03:32:36 +00:00
Eli Friedman 1eaa04d682 [ARM] Combine ands+lsls to lsls+lsrs for Thumb1.
This patch may seem familiar... but my previous patch handled the
equivalent lsls+and, not this case.  Usually instcombine puts the
"and" after the shift, so this case doesn't come up. However, if the
shift comes out of a GEP, it won't get canonicalized by instcombine,
and DAGCombine doesn't have an equivalent transform.

This also modifies isDesirableToCommuteWithShift to suppress DAGCombine
transforms which would make the overall code worse.

I'm not really happy adding a bunch of code to handle this, but it would
probably be tricky to substantially improve the behavior of DAGCombine
here.

Differential Revision: https://reviews.llvm.org/D56032

llvm-svn: 351776
2019-01-22 01:51:37 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Diogo N. Sampaio f192cdb5c9 [ARM] ComputeKnownBits to handle extract vectors
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.

Differential revision: https://reviews.llvm.org/D56098

llvm-svn: 350553
2019-01-07 19:01:47 +00:00
Simon Pilgrim d800ee4861 [ARM] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version.

llvm-svn: 349909
2018-12-21 15:15:38 +00:00
Eli Friedman b1bbd5dca3 [ARM] Complete the Thumb1 shift+and->shift+shift transforms.
This saves materializing the immediate.  The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.

I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.

Differential Revision: https://reviews.llvm.org/D55630

llvm-svn: 349857
2018-12-20 23:39:54 +00:00
Tim Northover ae3b66b7b0 ARM: use acquire/release instruction variants when available.
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.

llvm-svn: 349355
2018-12-17 15:05:32 +00:00
Tim Northover 5745b6ac3b ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

llvm-svn: 348122
2018-12-03 11:16:21 +00:00
Sjoerd Meijer ecc7dcb879 [ARM] Don't expand sdiv when optimising for minsize
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:

sdiv %1, i32 4

gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.

Differential Revision: https://reviews.llvm.org/D54546

llvm-svn: 347965
2018-11-30 08:14:28 +00:00
Eli Friedman 7d7d41debc [ARM] Fix CPSR liveness in tMOVCCr_pseudo lowering.
The lowering was missing live-ins in certain cases, like a sequence of
multiple tMOVCCr_pseudo instructions.  This would lead to a verifier
failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered.

For reasons I don't completely understand, it's hard to get a sequence
of multiple tMOVCCr_pseudo instructions; the issue only seems to show up
with 64-bit comparisons where the result is zero-extended. I added some
extra testcases in case that changes in the future. Probably some
optimization opportunities here if anyone is interested. (@test_slt_not
is the case that was getting miscompiled.)

The code to check the liveness of CPSR was stolen from
X86ISelLowering.cpp; maybe it could be refactored into common helper,
but I have no idea where to put it.

Differential Revision: https://reviews.llvm.org/D54192

llvm-svn: 346355
2018-11-07 21:08:13 +00:00
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Peter Collingbourne 4bb928c110 ARM: Use BKPT instead of TRAP to implement llvm.debugtrap.
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.

Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.

Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.

Differential Revision: https://reviews.llvm.org/D53614

llvm-svn: 345171
2018-10-24 18:10:38 +00:00
Saleem Abdulrasool 4005f9a860 ARM: handle checking aliases with out-of-bounds GEPs
A global alias may use indices which are not considered in bounds.  In
such a case, accessing the base object will fail as it only peers
through inbounds accesses.  This pattern is used by the swift compiler
to create references to preceeding members in the type metadata.  This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format.  Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.

llvm-svn: 345107
2018-10-24 00:00:52 +00:00
Simon Pilgrim 5abb607ebe [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......

Differential Revision: https://reviews.llvm.org/D53257

llvm-svn: 344512
2018-10-15 13:20:41 +00:00
Eli Friedman 5ab09a684f [ARM] Fix correctness checks in promoteToConstantPool.
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.

This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51472

llvm-svn: 343361
2018-09-28 20:27:31 +00:00
Eli Friedman bb993be56b [ARM] Use preferred alignment for constants in promoteToConstantPool.
This mostly affects IR generated by non-clang frontends because clang
generally sets the alignment of globals explicitly.

Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51469

llvm-svn: 343359
2018-09-28 20:21:51 +00:00
Nirav Dave 0a0c2e6dd9 [ARM] Share predecessor bookkeeping in CombineBaseUpdate. NFCI.
llvm-svn: 342987
2018-09-25 15:30:47 +00:00
Alex Bradbury 79518b02cd [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have 
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they 
will see no functional change. Previously this hook returned bool, but it now 
returns AtomicExpansionKind.

This hook allows targets to select how a given cmpxchg is to be expanded. 
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

See my associated RFC for more info on the motivation for this change 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D48130

llvm-svn: 342550
2018-09-19 14:51:42 +00:00
Tim Northover c15d47bb01 ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.

This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.

llvm-svn: 342127
2018-09-13 10:28:05 +00:00
Martin Storsjo 68df812cce [MinGW] Move code for indicating "potentially not DSO local" into shouldAssumeDSOLocal. NFC.
On Windows, if shouldAssumeDSOLocal returns false, it's either a
dllimport reference, or a reference that we should treat as non-local
and create a stub for.

Clean up AArch64Subtarget::ClassifyGlobalReference a little while
touching the flag handling relating to dllimport.

Differential Revision: https://reviews.llvm.org/D51590

llvm-svn: 341402
2018-09-04 20:56:28 +00:00
Martin Storsjo 2dcaa41e1e [MinGW] [ARM] Add stubs for potential automatic dllimported variables
The runtime pseudo relocations can't handle the ARM format embedded
addresses in movw/movt pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.

Differential Revision: https://reviews.llvm.org/D51450

llvm-svn: 341176
2018-08-31 08:00:25 +00:00
Eli Friedman 96e3cd85bd [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
The inline sequence is very long (about 70 bytes on Thumb1), so it's
not really a good idea to inline it, especially when optimizing for
size.

Differential Revision: https://reviews.llvm.org/D47917

llvm-svn: 340458
2018-08-22 21:47:14 +00:00
Eli Friedman c11e2b9470 [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant.
This avoids a potential infinite loop setting and unsetting bits in the
mask.

Reduced from a failure on the polly-aosp bot.

Differential Revision: https://reviews.llvm.org/D51066

llvm-svn: 340446
2018-08-22 20:13:45 +00:00
David Green 9dd1d451d9 [AArch64] Add Tiny Code Model for AArch64
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.

This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.

Differential Revision: https://reviews.llvm.org/D49673

llvm-svn: 340397
2018-08-22 11:31:39 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Eli Friedman 0d12e90bf5 [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.

To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)

Differential Revision: https://reviews.llvm.org/D50667

llvm-svn: 339734
2018-08-14 22:10:25 +00:00
Eli Friedman 6b84a48953 Fix unused lambda capture warning from r339472.
llvm-svn: 339479
2018-08-10 22:03:25 +00:00
Eli Friedman e1687a89e8 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030

llvm-svn: 339472
2018-08-10 21:21:53 +00:00
Sjoerd Meijer b33a4c02cc [ARM] FP16: support vector INT_TO_FP and FP_TO_INT
This adds codegen support for the different vcvt_f16 variants.

Differential Revision: https://reviews.llvm.org/D50393

llvm-svn: 339227
2018-08-08 09:45:34 +00:00
Sjoerd Meijer b264944ed5 [ARM] FP16: support the vector vmin and vmax variants
Differential Revision: https://reviews.llvm.org/D50238

llvm-svn: 339221
2018-08-08 07:20:15 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Eli Friedman 733f4ed1bb [ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.
Saves materializing the immediate for the "ands".

Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.

Now implemented as a DAGCombine.

Differential Revision: https://reviews.llvm.org/D49585

llvm-svn: 337945
2018-07-25 18:22:22 +00:00
Tim Northover e00cf4fc68 ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.
Since the triple's default is hard float, the libcalls will already use VFP
registers.

llvm-svn: 337386
2018-07-18 12:37:43 +00:00
Eli Friedman d2c739230c [ARM] Treat cmn immediates as legal in isLegalICmpImmediate.
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions.  Fix the type
conversions, and perform the correct check for negative immediates.

This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.

Differential Revision: https://reviews.llvm.org/D48907

llvm-svn: 336742
2018-07-10 23:44:37 +00:00
Ivan A. Kosarev 466037900c [NEON] Fix combining of vldx_dup intrinsics with updating of base addresses
Resolves:
Unsupported ARM Neon intrinsics in Target-specific DAG combine
function for VLDDUP
https://bugs.llvm.org/show_bug.cgi?id=38031

Related diff: D48439

Differential Revision: https://reviews.llvm.org/D48920

llvm-svn: 336325
2018-07-05 08:59:49 +00:00
Vadzim Dambrouski fd10286e04 [ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.
Reviewers: efriedma, rogfer01, javed.absar

Reviewed By: efriedma, rogfer01

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D48846

llvm-svn: 336144
2018-07-02 21:05:26 +00:00
Ivan A. Kosarev 7231598fce [NEON] Support vldNq intrinsics in AArch32 (LLVM part)
This patch adds support for the q versions of the dup
(load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for
example.

Currently, non-q versions of the dup intrinsics are implemented
in clang by generating IR that first loads the elements of the
structure into the first lane with the lane (to-single-lane)
intrinsics, and then propagating it other lanes. There are at
least two problems with this approach. First, there are no
double-spaced to-single-lane byte-element instructions. For
example, there is no such instruction as 'vld2.8 { d0[0], d2[0]
}, [r0]'. That means we cannot rely on the to-single-lane
intrinsics and instructions to implement the q versions of the
dup intrinsics. Note that to-all-lanes instructions do support
all sizes of data items, including bytes.

The second problem with the current approach is that we need a
separate vdup instruction to propagate the structure to each
lane. So for vld4q_dup_f16() we would need four vdup instructions
in addition to the initial vld instruction.

This patch introduces dup LLVM intrinsics and reworks handling of
the currently supported (non-q) NEON dup intrinsics to expand
them into those LLVM intrinsics, thus eliminating the need for
using to-single-lane intrinsics and instructions.

Additionally, this patch adds support for u64 and s64 dup NEON
intrinsics. These are marked as Arch64-only in the ARM NEON
Reference, but it seems there are no reasons to not support them
in AArch32 mode. Please correct, if that is wrong.

That's what we generate with this patch applied:

vld2q_dup_f16:
  vld2.16 {d0[], d2[]}, [r0]
  vld2.16 {d1[], d3[]}, [r0]

vld3q_dup_f16:
  vld3.16 {d0[], d2[], d4[]}, [r0]
  vld3.16 {d1[], d3[], d5[]}, [r0]

vld4q_dup_f16:
  vld4.16 {d0[], d2[], d4[], d6[]}, [r0]
  vld4.16 {d1[], d3[], d5[], d7[]}, [r0]

Differential Revision: https://reviews.llvm.org/D48439

llvm-svn: 335733
2018-06-27 13:57:52 +00:00
Ivan A. Kosarev 847daa11f8 [NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47447

llvm-svn: 334361
2018-06-10 09:27:27 +00:00
Ivan A. Kosarev 60a991ed1a [NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47120

llvm-svn: 333825
2018-06-02 16:40:03 +00:00
Ivan A. Kosarev 73c5337a64 Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"
The LLVM part was committed instead of the Clang part.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333824
2018-06-02 16:38:38 +00:00
Ivan A. Kosarev 51f19b9ee1 [NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.

Differential Revision: https://reviews.llvm.org/D47121

llvm-svn: 333819
2018-06-02 16:26:42 +00:00
Amaury Sechet f47d9f30b0 [ARM] Remove code handling ADDC/ADDE/SUBC/SUBE
Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY .

Reviewers: rogfer01, efriedma, rengolin, javed.absar

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D47413

llvm-svn: 333544
2018-05-30 13:45:43 +00:00
Eli Friedman 63fead0f43 [ARM] Enable SETCCCARRY lowering for Thumb1.
We've had Thumb1 support for ARMISD::SUBE for a while now, so this just
works.  Reduces codesize a bit for 64-bit integer comparisons.

Differential Revision: https://reviews.llvm.org/D47387

llvm-svn: 333445
2018-05-29 18:17:16 +00:00
Tim Northover 4e3eec39fa ARM: be conservative when asked load/store alignment of weird type.
Chances are we'll be asked again after type legalization, but before that point
it's better to claim misaligned accesses aren't allowed than to assert.

llvm-svn: 332840
2018-05-21 12:43:54 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Amaury Sechet 4f729f6a67 [ARM] Add support for SETCCCARRY instead of SETCCE
Summary: As per title. SETCCE is deprecated and will eventually be removed.

Reviewers: rogfer01, efriedma, rengolin, javed.absar

Subscribers: kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D46512

llvm-svn: 331929
2018-05-09 22:15:51 +00:00
Amaury Sechet f91b6a8cf7 [ARM] Select result 1 from ConvertBooleanCarryToCarryFlag's result automatically. NFC
The old behavior return the value 0, which is error prone.

llvm-svn: 331614
2018-05-07 01:43:42 +00:00
Tim Northover 28e0a6f7dd ARM: don't try to over-align large vectors as arguments.
By default LLVM thinks very large vectors get aligned to their size when
passed across functions. Unfortunately no-one told the ARM backend so it
doesn't trigger stack realignment and so accesses can cause the usual
misalignment issues (e.g. a data abort).

This changes the ABI alignment to the stack alignment, which in practice
(and as a bonus) also coincides with the alignment "natural" vectors get.

llvm-svn: 331451
2018-05-03 12:54:25 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Sjoerd Meijer 834f7dc7ab [ARM] FP16 vmaxnm/vminnm scalar instructions
This adds code generation support for the FP16 vmaxnm/vminnm scalar
instructions.

Differential Revision: https://reviews.llvm.org/D44675

llvm-svn: 330034
2018-04-13 15:34:26 +00:00
Sjoerd Meijer ac96d7c4b3 [ARM] FP16 VSEL codegen
This is a follow up of rL327695 to instruction select more variants of VSELGT
and VSELGE, for which it is necessary to custom lower SELECT.

More work is required in this area, which will be addressed soon:
- more variants need to be regression tested, but this depends on the next point.
- first LowerConstantFP need to be adjusted for fp16 values.

Differential Revision: https://reviews.llvm.org/D45205

llvm-svn: 329788
2018-04-11 09:28:04 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
Christof Douma a1e77c0e02 [ARM] Support float literals under XO
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.

Differential Revision: https://reviews.llvm.org/D44941

llvm-svn: 328691
2018-03-28 10:02:26 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Christof Douma 4a025cc79d [ARM] Support float literals under XO
When targeting execute-only and fp-armv8, float constants in a compare
resulted in instruction selection failures. This is now fixed by using
vmov.f32 where possible, otherwise the floating point constant is
lowered into a integer constant that is moved into a floating point
register.

This patch also restores using fpcmp with immediate 0 under fp-armv8.

Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443
llvm-svn: 328313
2018-03-23 13:02:03 +00:00
Martin Storsjo 9a55c1b0dc [ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes
This extends the use of this attribute on ARM and AArch64 from
SVN r325900 (where it was only checked for fixed stack
allocations on ARM/AArch64, but for all stack allocations on X86).

This also adds a testcase for the existing use of disabling the
fixed stack probe with the attribute on ARM and AArch64.

Differential Revision: https://reviews.llvm.org/D44291

llvm-svn: 327897
2018-03-19 20:06:50 +00:00
Sjoerd Meijer d16037d9bb [ARM] Support for v4f16 and v8f16 vectors
This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which
uses v4f16 and v8f16 vector operands and return values. All the moving parts
are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16
intrinsic. In a follow-up patch the rest of the intrinsics and tests will be
added.

Differential Revision: https://reviews.llvm.org/D44538

llvm-svn: 327839
2018-03-19 13:35:25 +00:00
Sjoerd Meijer d391a1a985 [ARM] FP16 codegen support for VSEL
This implements lowering of SELECT_CC for f16s, which enables
codegen of VSEL with f16 types.

Differential Revision: https://reviews.llvm.org/D44518

llvm-svn: 327695
2018-03-16 08:06:25 +00:00
Sjoerd Meijer af30f06d5c [ARM] Fix for PR36577
Don't PerformSHLSimplify if the given node is used by a node that also uses a
constant because we may get stuck in an infinite combine loop.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577

Patch by Sam Parker.

Differential Revision: https://reviews.llvm.org/D44097

llvm-svn: 326882
2018-03-07 09:10:44 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Pablo Barrio 512f7ee315 [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Martin Svanfeldt

Reviewers: fhahn, pbarrio, rogfer01

Reviewed By: pbarrio, rogfer01

Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 326333
2018-02-28 17:13:07 +00:00
Geoff Berry f8bf2ec0a8 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Sjoerd Meijer 4d5c40492a [ARM] Lower BR_CC for f16
This case wasn't handled yet.

Differential Revision: https://reviews.llvm.org/D43508

llvm-svn: 325616
2018-02-20 19:28:05 +00:00
Roger Ferrer Ibanez d41059a9f6 [ARM] Materialise some boolean values to avoid a branch
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form

  a != b ? x : 0
  a == b ? 0 : x

and that currently (e.g. in Thumb1) are emitted as branches.

Differential Revision: https://reviews.llvm.org/D34515

llvm-svn: 325323
2018-02-16 09:23:59 +00:00
Pablo Barrio e28cb8399a [ARM] Allow 64- and 128-bit types with 't' inline asm constraint
Summary:
In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the s0-s31 registers and its qX and dX variants depending on
additional operand modifiers (q/P).

For example, the following C code:

#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))

results in the following assembly if compiled with GCC:

vadd.f32 s0, s0, s1

whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.

This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:

vadd.f32 q0, q0, q1

This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:

asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))

Since this is only an extension of functionality, existing code should not
be affected by this change. Note that operand modifiers q/P are already
supported by LLVM, so this patch should suffice to support inline
assembly with constraint 't' originally built for GCC.

Reviewers: grosbach, rengolin

Reviewed By: rengolin

Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42962

llvm-svn: 325244
2018-02-15 14:44:22 +00:00
Sjoerd Meijer 89ea2648bb [ARM] Armv8.2-A FP16 code generation (part 3/3)
This adds most of the FP16 codegen support, but these areas need further work:

- FP16 literals and immediates are not properly supported yet (e.g. literal
  pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
  added.

This will be addressed in follow-up patches.

Differential Revision: https://reviews.llvm.org/D42849

llvm-svn: 324321
2018-02-06 08:43:56 +00:00
Sjoerd Meijer 9d9a86535e [ARM] FullFP16 LowerReturn Fix
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).

Differential Revision: https://reviews.llvm.org/D42743

llvm-svn: 323968
2018-02-01 13:48:40 +00:00
Evgeniy Stepanov 7746899f48 Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations"
Miscompiles code. Testcase pending.

This reverts commit r323869.

llvm-svn: 323929
2018-01-31 22:55:19 +00:00
Pablo Barrio 2e442a7831 [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Marten Svanfeldt.

Reviewers: fhahn, pbarrio

Reviewed By: pbarrio

Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 323869
2018-01-31 13:20:10 +00:00
Sjoerd Meijer 98d5359ea2 [ARM] Armv8.2-A FP16 code generation (part 2/3)
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Differential Revision: https://reviews.llvm.org/D42580

llvm-svn: 323861
2018-01-31 10:18:29 +00:00
Sjoerd Meijer 011de9c0ca [ARM] Armv8.2-A FP16 code generation (part 1/3)
This is the groundwork for Armv8.2-A FP16 code generation .

Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:

_Float16 sub(_Float16 a, _Float16 b) {
  return a + b;
}

gets lowered to this:

define float @sub(float %a.coerce, float %b.coerce) {
entry:
  %0 = bitcast float %a.coerce to i32
  %tmp.0.extract.trunc = trunc i32 %0 to i16
  %1 = bitcast i16 %tmp.0.extract.trunc to half
  <SNIP>
  %add = fadd half %1, %3
  <SNIP>
}

When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.

When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.

As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.

I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.

This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.


Thanks to Sam Parker and Oliver Stannard for their help and reviews!


Differential Revision: https://reviews.llvm.org/D38315

llvm-svn: 323512
2018-01-26 09:26:40 +00:00
Weiming Zhao 665784f170 [ARM] Expand long shifts for Thumb1 to __aeabi_ calls
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.

Reviewers: samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42401

llvm-svn: 323354
2018-01-24 18:00:57 +00:00
Martin Storsjo 4ed94a06ac [ARM] Call __chkstk for dynamic stack allocation in all windows environments
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.

On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" function - so
within LLVM they're not distinguishable from each other.

Differential Revision: https://reviews.llvm.org/D42292

llvm-svn: 323308
2018-01-24 06:40:11 +00:00
Joel Galenson bbcaf4ac5c [ARM] Optimize {s,u}mul.with.overflow.
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.

Differential revision: https://reviews.llvm.org/D40922

llvm-svn: 322738
2018-01-17 19:19:05 +00:00
Andre Vieira 5627c218e1 [ARM] Add codegen for SMMULR, SMMLAR and SMMLSR
This patch teaches the Arm back-end to generate the SMMULR, SMMLAR and SMMLSR
instructions from equivalent IR patterns.

Differential Revision: https://reviews.llvm.org/D41775

llvm-svn: 322361
2018-01-12 09:24:41 +00:00
Joel Galenson 6f4e827e4c [ARM] Optimize {s,u}{add,sub}.with.overflow.
The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG.  This commit ports that code to the ARM backend.

Differential revision: https://reviews.llvm.org/D35635

llvm-svn: 321224
2017-12-20 22:25:39 +00:00
Florian Hahn c3aa6d83fd [ARM] Lower unsigned saturation to USAT
Summary:
Implement lower of unsigned saturation on an interval [0, k] where k + 1 is a power of two using USAT instruction in a similar way to how [~k, k] is lowered using SSAT on ARM models that supports it.

Patch by Marten Svanfeldt

Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41348

llvm-svn: 321164
2017-12-20 11:13:57 +00:00
Adrian Prantl 0e6694d111 Silence a bunch of implicit fallthrough warnings
llvm-svn: 321114
2017-12-19 22:05:25 +00:00
Matthias Braun a4852d2c19 X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI
Note:
- X86ISelLowering: setLibcallName(SINCOS) was superfluous as
  InitLibcalls() already does it.
- ARMISelLowering: Setting libcallnames for sincos/sincosf seemed
  superfluous as in the darwin case it wouldn't be used while for all
  other cases InitLibcalls already does it.

llvm-svn: 321036
2017-12-18 23:19:42 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Matt Arsenault 7d7adf4f2e TLI: Allow using PSV for intrinsic mem operands
llvm-svn: 320756
2017-12-14 22:34:10 +00:00
Matt Arsenault 1117133687 DAG: Expose all MMO flags in getTgtMemIntrinsic
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.

On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.

llvm-svn: 320746
2017-12-14 21:39:51 +00:00
Geoff Berry 60c431022e [MachineOperand][MIR] Add isRenamable to MachineOperand.
Summary:
Add isRenamable() predicate to MachineOperand.  This predicate can be
used by machine passes after register allocation to determine whether it
is safe to rename a given register operand.  Register operands that
aren't marked as renamable may be required to be assigned their current
register to satisfy constraints that are not captured by the machine
IR (e.g. ABI or ISA constraints).

Reviewers: qcolombet, MatzeB, hfinkel

Subscribers: nemanjai, mcrosier, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D39400

llvm-svn: 320503
2017-12-12 17:53:59 +00:00
Roger Ferrer Ibanez 5ea0f2501f [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045
 - fixes PR34564
 - fixes PR35103

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 320355
2017-12-11 12:13:45 +00:00
Francis Visoiu Mistrih a8a83d150f [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.

For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.

Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).

https://reviews.llvm.org/D40836

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'

llvm-svn: 320022
2017-12-07 10:40:31 +00:00
Pablo Barrio 2b4385846c Fix function pointer tail calls in armv8-M.base
Summary:
The compiler fails with the following error message:

fatal error: error in backend: ran out of registers during
register allocation

Tail call optimization for Armv8-M.base fails to meet all the required
constraints when handling calls to function pointers where the
arguments take up r0-r3. This is because the pointer to the
function to be called can only be stored in r0-r3, but these are
all occupied by arguments. This patch makes sure that tail call
optimization does not try to handle this type of calls.

Reviewers: chill, MatzeB, olista01, rengolin, efriedma

Reviewed By: olista01, efriedma

Subscribers: efriedma, aemerson, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D40706

llvm-svn: 319664
2017-12-04 16:55:49 +00:00
Martin Storsjo c85cc41801 [ARM] Allow using emulated tls on platforms other than ELF
This matches how it is done on X86.

This allows using emulated tls on windows; in MinGW environments,
native tls isn't supported at the moment.

Differential Revision: https://reviews.llvm.org/D40769

llvm-svn: 319643
2017-12-04 09:08:55 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Yi Kong 39bcd4ed3e [ARM] 't' asm constraint should accept i32
't' constraint normally only accepts f32 operands, but for VCVT the
operands can be i32. LLVM is overly restrictive and rejects asm like:

  float foo() {
    float result;
    __asm__ __volatile__(
      "vcvt.f32.s32 %[result], %[arg1]\n"
      : [result]"=t"(result)
      : [arg1]"t"(0x01020304) );
    return result;
  }

Relax the value type for 't' constraint to either f32 or i32.

Differential Revision: https://reviews.llvm.org/D40137

llvm-svn: 318472
2017-11-16 23:38:17 +00:00
Martin Storsjo 4629f52312 [ARM, AArch64] Fix an assert message, Darwin isn't the only target supporting TLS. NFC.
llvm-svn: 318184
2017-11-14 19:57:59 +00:00
Evgeniy Stepanov 76d5ac4906 [arm] Fix Unnecessary reloads from GOT.
Summary:
This fixes PR35221.
Use pseudo-instructions to let MachineCSE hoist global address computation.

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39871

llvm-svn: 318081
2017-11-13 20:45:38 +00:00
Momchil Velikov 842aa90192 [ARM] Place jump table as the first operand in additions
When generating table jump code for switch statements, place the jump
table label as the first operand in the various addition instructions
in order to enable addressing mode selectors to better match index
computation and possibly fold them into the addressing mode of the
table entry load instruction.

Differential revision: https://reviews.llvm.org/D39752

llvm-svn: 318033
2017-11-13 11:56:48 +00:00
David Blaikie 3f833edc7c Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Sam Parker 242052c6b4 [ARM] and, or, xor and add with shl combine
The generic dag combiner will fold:

(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
(shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)

This can create constants which are too large to use as an immediate.
Many ALU operations are also able of performing the shl, so we can
unfold the transformation to prevent a mov imm instruction from being
generated.

Other patterns, such as b + ((a << 1) | 510), can also be simplified
in the same manner.

Differential Revision: https://reviews.llvm.org/D38084

llvm-svn: 317197
2017-11-02 10:43:10 +00:00
Roger Ferrer Ibanez 9dfbc10522 Revert r313618 "[ARM] Use ADDCARRY / SUBCARRY"
That change causes PR35103, so reverting until I figure it out.

llvm-svn: 317092
2017-11-01 14:06:57 +00:00
Eli Friedman d5dfb62de7 [ARM] Honor -mfloat-abi for libcall calling convention
As far as I can tell, this matches gcc: -mfloat-abi determines the
calling convention for all functions except those explicitly defined as
soft-float in the ARM RTABI.

This change only affects cases where the user specifies -mfloat-abi to
override the default calling convention derived from the target triple.

Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.

Differential Revision: https://reviews.llvm.org/D38299

llvm-svn: 316708
2017-10-26 21:42:32 +00:00
Sam Parker 1f742117bd [ARM] OrCombineToBFI function
Extract the functionality to combine OR to BFI into its own function.

Differential Revision: https://reviews.llvm.org/D39001

llvm-svn: 316563
2017-10-25 08:37:33 +00:00
Sam Parker ccb209bb97 [ARM] Swap cmp operands for automatic shifts
Swap the compare operands if the lhs is a shift and the rhs isn't,
as in arm and T2 the shift can be performed by the compare for its
second operand.

Differential Revision: https://reviews.llvm.org/D39004

llvm-svn: 316562
2017-10-25 08:33:06 +00:00
Sjoerd Meijer 7a22a4948f ISel type legalization: add debug messages. NFCI.
This adds some more debug messages to the type legalizer and functions
like PromoteNode, ExpandNode, ExpandLibCall in an attempt to make
the debug messages a little bit more informative and useful.

Differential Revision: https://reviews.llvm.org/D38450

llvm-svn: 314773
2017-10-03 08:54:15 +00:00
Sam Parker 211f47aa37 [ARM] isTruncateFree fix
I implemented isTruncateFree in rL313533, this patch fixes the logic
to match my comment, as the previous logic was too general. Now the
only truncates that are free are i64 -> i32.

Differential Revision: https://reviews.llvm.org/D38234

llvm-svn: 314280
2017-09-27 08:30:45 +00:00
Eugene Zelenko 076468c0d0 [ARM] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 313823
2017-09-20 21:35:51 +00:00
Roger Ferrer Ibanez 8d0180c955 [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045
 - fixes PR34564

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 313618
2017-09-19 09:05:39 +00:00
Sam Parker 71efbe4c68 [ARM] Implement isTruncateFree
Implement the isTruncateFree hooks, lifted from AArch64, that are
used by TargetTransformInfo. This allows simplifycfg to reduce the
test case into a single basic block.

Differential Revision: https://reviews.llvm.org/D37516

llvm-svn: 313533
2017-09-18 14:28:51 +00:00
Hans Wennborg 8c1eb106bd Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY"
This was causing PR34045 to fire again.

> This is a preparatory step for D34515 and also is being recommitted as its
> first version caused PR34045.
>
> This change:
>  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
>  - lowering is done by first converting the boolean value into the carry flag
>    using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
>    using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
>    operations does the actual addition.
>  - for subtraction, given that ISD::SUBCARRY second result is actually a
>    borrow, we need to invert the value of the second operand and result before
>    and after using ARMISD::SUBE. We need to invert the carry result of
>    ARMISD::SUBE to preserve the semantics.
>  - given that the generic combiner may lower ISD::ADDCARRY and
>    ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
>    as well otherwise i64 operations now would require branches. This implies
>    updating the corresponding test for unsigned.
>  - add new combiner to remove the redundant conversions from/to carry flags
>    to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
>  - fixes PR34045
>
> Differential Revision: https://reviews.llvm.org/D35192

Also revert follow-up r313010:

> [ARM] Fix typo when creating ISD::SUB nodes
>
> In D35192, I accidentally introduced a typo when creating ISD::SUB nodes,
> giving them two values instead of one.
>
> This fails when the merge_values combiner finds one of these nodes.
>
> This change fixes PR34564.
>
> Differential Revision: https://reviews.llvm.org/D37690

llvm-svn: 313044
2017-09-12 16:24:17 +00:00
Roger Ferrer Ibanez 9df2527b0b [ARM] Fix typo when creating ISD::SUB nodes
In D35192, I accidentally introduced a typo when creating ISD::SUB nodes,
giving them two values instead of one.

This fails when the merge_values combiner finds one of these nodes.

This change fixes PR34564.

Differential Revision: https://reviews.llvm.org/D37690

llvm-svn: 313010
2017-09-12 07:42:28 +00:00
Roger Ferrer Ibanez 4f92b4162f [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 313009
2017-09-12 07:40:09 +00:00
Hans Wennborg 075e5a2e2b Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY"
It caused PR34564.

> This is a preparatory step for D34515 and also is being recommitted as its
> first version caused PR34045.
>
> This change:
>  - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
>  - lowering is done by first converting the boolean value into the carry flag
>    using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
>    using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
>    operations does the actual addition.
>  - for subtraction, given that ISD::SUBCARRY second result is actually a
>    borrow, we need to invert the value of the second operand and result before
>    and after using ARMISD::SUBE. We need to invert the carry result of
>    ARMISD::SUBE to preserve the semantics.
>  - given that the generic combiner may lower ISD::ADDCARRY and
>    ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
>    as well otherwise i64 operations now would require branches. This implies
>    updating the corresponding test for unsigned.
>  - add new combiner to remove the redundant conversions from/to carry flags
>    to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
>  - fixes PR34045
>
> Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312980
2017-09-11 23:52:02 +00:00
Roger Ferrer Ibanez 12b20f2307 [ARM] Use ADDCARRY / SUBCARRY
This is a preparatory step for D34515 and also is being recommitted as its
first version caused PR34045.

This change:
 - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
 - lowering is done by first converting the boolean value into the carry flag
   using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value
   using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
   operations does the actual addition.
 - for subtraction, given that ISD::SUBCARRY second result is actually a
   borrow, we need to invert the value of the second operand and result before
   and after using ARMISD::SUBE. We need to invert the carry result of
   ARMISD::SUBE to preserve the semantics.
 - given that the generic combiner may lower ISD::ADDCARRY and
   ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering
   as well otherwise i64 operations now would require branches. This implies
   updating the corresponding test for unsigned.
 - add new combiner to remove the redundant conversions from/to carry flags
   to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C
 - fixes PR34045

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 312898
2017-09-11 07:38:05 +00:00
Diana Picus 1d101d76c0 Move static helper into ARMTargetLowering. NFC
This exposes the isReadOnly(GlobalValue *) in the ARMTargetLowering so
we can make use of it in GlobalISel as well.

llvm-svn: 312320
2017-09-01 10:44:48 +00:00
Evgeny Astigeevich 540a39adf7 [ARM, Thumb1] Prevent ARMTargetLowering::isLegalAddressingMode from accepting illegal modes
ARMTargetLowering::isLegalAddressingMode can accept illegal addressing modes
for the Thumb1 target. This causes generation of redundant code and affects
performance.

This fixes PR34106: https://bugs.llvm.org/show_bug.cgi?id=34106

Differential Revision: https://reviews.llvm.org/D36467

llvm-svn: 311649
2017-08-24 10:00:25 +00:00
Renato Golin f63d701669 [ARM] Call setBooleanContents(ZeroOrOneBooleanContent)
The ARM backend should call setBooleanContents so that it can
use known bits to make some optimizations.

Review: D35821

Patch by Joel Galenson <jgalenson@google.com>

llvm-svn: 311446
2017-08-22 11:02:37 +00:00
Chandler Carruth b866178067 Fix a typo in r311435.
llvm-svn: 311437
2017-08-22 09:20:52 +00:00
Alex Bradbury 080f6976c0 Use report_fatal_error for unsupported calling conventions
The calling convention can be specified by the user in IR. Failing to support 
a particular calling convention isn't a programming error, and so relying on 
llvm_unreachable to catch and report an unsupported calling convention is not 
appropriate.

Differential Revision: https://reviews.llvm.org/D36830

llvm-svn: 311435
2017-08-22 09:11:41 +00:00
Martin Storsjo d606d2a643 [ARM] Factorize the calculation of WhichResult in isV*Mask. NFC.
Differential Revision: https://reviews.llvm.org/D36930

llvm-svn: 311260
2017-08-19 20:26:51 +00:00
Martin Storsjo 91522ffa12 [ARM] Check the right order for halves of VZIP/VUZP if both parts are used
This is the exact same fix as in SVN r247254. In that commit, the fix was
applied only for isVTRNMask and isVTRN_v_undef_Mask, but the same issue
is present for VZIP/VUZP as well.

This fixes PR33921.

Differential Revision: https://reviews.llvm.org/D36899

llvm-svn: 311258
2017-08-19 19:47:48 +00:00
Saleem Abdulrasool dd8c16b58e ARM: mark CPSR as clobbered for Windows VLAs
When lowering a VLA, we emit a __chstk call.  However, this call can
internally clobber CPSR.  We did not mark this register as an ImpDef,
which could potentially allow a comparison to be hoisted above the call
to `__chkstk`.  In such a case, the CPSR could be clobbered, and the
check invalidated.  When the support was initially added, it seemed that
the call would take care of preventing CPSR from being clobbered, but
this is not the case.  Mark the register as clobbered to fix a possible
state corruption.

llvm-svn: 311061
2017-08-17 02:42:24 +00:00
Craig Topper 2251ef95a3 [X86][ARM][TargetLowering] Add SrcVT to isExtractSubvectorCheap
Summary:
Without the SrcVT its hard to know what is really being asked for. For example if your target has 128, 256, and 512 bit vectors. Maybe extracting 128 from 256 is cheap, but maybe extracting 128 from 512 is not.

For x86 we do support extracting a quarter of a 512-bit register. But for i1 vectors we don't have isel patterns for extracting arbitrary pieces. So we need this to have a correct implementation of isExtractSubvectorCheap for mask vectors.

Reviewers: RKSimon, zvi, efriedma

Reviewed By: RKSimon

Subscribers: aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D36649

llvm-svn: 310793
2017-08-13 17:29:07 +00:00
Eli Friedman e9499dd4c7 [ARM] Clarify legal addressing modes for ARM and Thumb2. NFC
The existing code is very clever, but not clear, which seems
like the wrong tradeoff here.

Differential Revision: https://reviews.llvm.org/D36559

llvm-svn: 310653
2017-08-10 19:31:27 +00:00
Nico Weber bfde70b097 Revert r309923, it caused PR34045.
llvm-svn: 309950
2017-08-03 15:41:26 +00:00
Roger Ferrer Ibanez 854980341b [ARM] Use ADDCARRY / SUBCARRY
This patch:

- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
  using (_, C) <- (ARMISD::ADDC R, -1) and converted back to an integer value
  using (R, _) <- (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
  operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY second result is actually a
  borrow, we need to invert the value of the second operand and result before
  and after using ARMISD::SUBE. We need to invert the carry result of
  ARMISD::SUBE to preserve the semantics.
- given that the generic combiner may lower ISD::ADDCARRY and
  ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering
  as well otherwise i64 operations now would require branches. This implies
  updating the corresponding test for unsigned.
- add new combiner to remove the redundant conversions from/to carry flags
  to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) -> C

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 309923
2017-08-03 07:45:10 +00:00