Commit Graph

3643 Commits

Author SHA1 Message Date
Sanjin Sijaric dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Jessica Paquette 2f5833ecd9 [MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.

llvm-svn: 348219
2018-12-04 00:31:47 +00:00
Jessica Paquette 2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Pablo Barrio a17f855698 [AArch64] Add command-line option for SSBS
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.

Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html

Reviewers: olista01, samparker, aemerson

Reviewed By: samparker

Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D54629

llvm-svn: 348137
2018-12-03 14:00:47 +00:00
Diogo N. Sampaio 3c7d062b6b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

llvm-svn: 348121
2018-12-03 11:08:13 +00:00
Jessica Paquette 9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Jessica Paquette 1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne 35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Francis Visoiu Mistrih 0b8dd4488e [MachineScheduler] Order FI-based memops based on stack direction
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.

llvm-svn: 347906
2018-11-29 20:03:19 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Petr Pavlu e6406d568c [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.

The EnableGlobalISel flag had previouly only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347861
2018-11-29 12:56:32 +00:00
Francis Visoiu Mistrih 879087ce5b [MachineScheduler] Add support for clustering mem ops with FI base operands
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:

```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
  unsigned long long args[8];
  args[0] = key;
  args[1] = index;
  use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
  dst[0] = a;
  dst[1] = b;
}
```

The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.

This adds support for this.

Differential Revision: https://reviews.llvm.org/D54847

llvm-svn: 347747
2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih d7eebd6d83 [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

llvm-svn: 347746
2018-11-28 12:00:20 +00:00
Evandro Menezes 9ef79c884a [TableGen] Refactor macro names (NFC)
Make the names for the macros for `TargetInstrInfo` uniform.

llvm-svn: 347706
2018-11-27 20:58:27 +00:00
David Blaikie 1fecbec5fa AArch64ISelLowering: Remove a return-of-assignment to allow NRVO
Patch by Arthur O'Dwyer!

llvm-svn: 347609
2018-11-26 22:57:18 +00:00
Evandro Menezes 6a38a5effe [AArch64] Refactor the scheduling predicates (3/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasExtendedReg()`.

Differential revision: https://reviews.llvm.org/D54822

llvm-svn: 347599
2018-11-26 21:47:46 +00:00
Evandro Menezes 56368c6fa5 [AArch64] Refactor the scheduling predicates (2/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasShiftedReg()`.

Differential revision: https://reviews.llvm.org/D54820

llvm-svn: 347598
2018-11-26 21:47:41 +00:00
Evandro Menezes b02ac8bd21 [AArch64] Refactor the scheduling predicates (1/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::isScaledAddr()`

Differential revision: https://reviews.llvm.org/D54777

llvm-svn: 347597
2018-11-26 21:47:28 +00:00
Than McIntosh 30c804bbb1 [CodeGen] Support custom format of stack maps
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.

For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.

This patch authored by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, apilipenko, reames

Reviewed By: reames

Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53892

llvm-svn: 347584
2018-11-26 18:43:48 +00:00
Luke Cheeseman 6db3a6a4a7 Revert r347490 as it breaks address sanitizer builds
llvm-svn: 347499
2018-11-23 17:13:06 +00:00
Luke Cheeseman d6dbd64104 Revert r343341
- Cannot reproduce the build failure locally and the build logs have
  been deleted.

llvm-svn: 347490
2018-11-23 11:01:47 +00:00
John Brawn d6e0ebea10 [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.

llvm-svn: 347456
2018-11-22 11:45:23 +00:00
Simon Pilgrim c9cc6cca42 Fix MSVC 'truncation of constant value' warning. NFCI.
llvm-svn: 347308
2018-11-20 14:29:40 +00:00
Martin Elshuber fef3036d37 Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads.
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.

The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).

Differential Revision: https://reviews.llvm.org/D52653

llvm-svn: 347208
2018-11-19 14:26:10 +00:00
Peter Collingbourne 527024469a AArch64: Emit a call frame instruction for the shadow call stack register.
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.

Differential Revision: https://reviews.llvm.org/D54609

llvm-svn: 347089
2018-11-16 20:08:54 +00:00
Jessica Paquette 27e1754fc9 [MachineOutliner][NFC] Don't compute liveness if X16/X17/NZCV are unused
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.

If this holds for all MBBs, then we can avoid checking for liveness on
that candidate. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.

llvm-svn: 346901
2018-11-14 22:23:38 +00:00
Jessica Paquette 4e97ec94d9 [MachineOutliner][NFC] Use flags set in all candidates to check for calls
If we keep track of if the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.

This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example.)

llvm-svn: 346816
2018-11-13 23:41:31 +00:00
Jessica Paquette cad864d49e [MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.

The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.

llvm-svn: 346809
2018-11-13 23:01:34 +00:00
Jessica Paquette b2d53c5d7d [MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there's less than that.

llvm-svn: 346803
2018-11-13 22:16:27 +00:00
Jessica Paquette 106946329d [MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.

llvm-svn: 346721
2018-11-13 00:32:09 +00:00
Jessica Paquette 82d9c0a3fa [MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom
Instead of returning Flags, return true if the MBB is safe to outline from.

This lets us check for unsafe situations, like say, in AArch64, X17 is live
across a MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.

llvm-svn: 346718
2018-11-12 23:51:32 +00:00
Sanjay Patel 0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00
Eli Friedman ad1151cf6a [ARM64] [Windows] Handle funclets
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.

Patch by Sanjin Sijaric, with some fixes by me.

Differential Revision: https://reviews.llvm.org/D51524

llvm-svn: 346568
2018-11-09 23:33:30 +00:00
Bryan Chan 123553921f [AArch64] Support HiSilicon's TSV110 processor
Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls

Reviewed By: kristof.beyls

Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D53908

llvm-svn: 346546
2018-11-09 19:32:08 +00:00
Clement Courbet eee2e06e2a [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target.
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.

Reviewers: gchatelet, john.brawn, jsji

Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54297

llvm-svn: 346489
2018-11-09 13:15:32 +00:00
Mandeep Singh Grang 397765bc51 [COFF, ARM64] Add support for MSVC buffer security check
Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D54248

llvm-svn: 346469
2018-11-09 02:48:36 +00:00
Eli Friedman 0917d0c80c [AArch64] [Windows] Address post-commit review comment on r346358.
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.

llvm-svn: 346366
2018-11-07 22:30:56 +00:00
Eli Friedman d00fb2e0a8 [AArch64] [Windows] Trap after noreturn calls.
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.

Differential Revision: https://reviews.llvm.org/D54129

llvm-svn: 346358
2018-11-07 21:31:14 +00:00
Evandro Menezes f1a0d93b1d [PATCH] [AArch64] Refactor helper functions (NFC)
Refactor helper functions in AArch64InstrInfo to be static methods.

llvm-svn: 346273
2018-11-06 22:17:14 +00:00
Matthias Braun 96d12513a1 AArch64: Cleanup CCMP code; NFC
Cleanup CCMP pattern matching code in preparation for review/bugfix:
- Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()`
  (it won't accept arbitrary disjunctions and is really about whether we
   can transform the subtree into a conjunction that we can emit).
- Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()`

llvm-svn: 346203
2018-11-06 03:15:22 +00:00
Craig Topper 0b5f8169b0 [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.

llvm-svn: 346180
2018-11-05 23:26:13 +00:00
Mandeep Singh Grang 547a0d765a [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.

Patch by: Yin Ma (yinma@codeaurora.org)

Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53996

llvm-svn: 345909
2018-11-01 23:22:25 +00:00
Mandeep Singh Grang df19e57a1c [COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic
Reviewers: rnk, mstorsjo, efriedma, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53962

llvm-svn: 345892
2018-11-01 21:23:47 +00:00
Volkan Keles 0a8dc9eb0f [GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.

Reviewers: dsanders, aemerson, aditya_nandakumar

Reviewed By: dsanders

Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53734

llvm-svn: 345875
2018-11-01 19:01:53 +00:00
Reid Kleckner eb56894a4b [AArch64] Fix unintended fallthrough and strengthen cast
This was added in r330630. GCC's -Wimplicit-fallthrough seems to not
fire when the previous case contains a switch itself.

This fallthrough was bening because the helper function implementing the
case used dyn_cast to re-check the type of the node in question. After
fixing the fallthrough, we can strengthen the cast.

llvm-svn: 345864
2018-11-01 18:02:27 +00:00
Mandeep Singh Grang b0cdf56dd7 Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64"
This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2.

llvm-svn: 345863
2018-11-01 17:53:57 +00:00
Chad Rosier 1546efd4a7 [AArch64] Add support for ARMv8.4 in Saphira.
llvm-svn: 345827
2018-11-01 13:45:16 +00:00
Mandeep Singh Grang 88ad9ac720 [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.

Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53673

llvm-svn: 345791
2018-10-31 23:16:20 +00:00
Evandro Menezes 3a06c46470 [AArch64] Sort switch cases (NFC)
llvm-svn: 345786
2018-10-31 21:56:49 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Sanjin Sijaric fadebc8aae [ARM64] [Windows] Exception handling support in frame lowering
Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue.  These are used by the MCLayer to
populate the .xdata section.

Differential Revision: https://reviews.llvm.org/D50288

llvm-svn: 345701
2018-10-31 09:27:01 +00:00
Martin Storsjo 315357faca [AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstk
This is similar to SVN r311061 for ARM.

Differential Revision: https://reviews.llvm.org/D53878

llvm-svn: 345698
2018-10-31 08:14:09 +00:00
Mandeep Singh Grang 71e0cc2a0b [COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg
Summary:
    Thunk functions in Windows are varag functions that call a musttail function
    to pass the arguments after the fixup is done.  We need to make sure that we
    forward the arguments from the caller vararg to the callee vararg function.
    This is the same mechanism that is used for Windows on X86.

Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53843

llvm-svn: 345641
2018-10-30 20:46:10 +00:00
Eli Friedman 93d0129b78 [AArch64] [Windows] SEH opcodes should be scheduling boundaries.
Prevents the post-RA scheduler from modifying the prologue sequences
emitting by frame lowering. This is roughly similar to what we do for
other targets: TargetInstrInfo::isSchedulingBoundary checks
isPosition(), which checks for CFI_INSTRUCTION.

isSEHInstruction is taken from D50288; it'll land with whatever patch
lands first.

Differential Revision: https://reviews.llvm.org/D53851

llvm-svn: 345634
2018-10-30 19:24:51 +00:00
David Greene 3e89fa8e08 [AArch64] Create proper memoperand for multi-vector stores
Re-apply r345315 with testcase fixes.

Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345631
2018-10-30 19:17:51 +00:00
Diogo N. Sampaio a3783b2ac2 [AArch64] Add support for UDF instruction
Summary: Add support for AArch64 UDF instruction.
UDF - Permanently Undefined generates an Undefined
Instruction exception (ESR_ELx.EC = 0b000000).

Reviewers: DavidSpickett, javed.absar, t.p.northover 

Reviewed By: javed.absar

Subscribers: nhaehnle, kristof.beyls

Differential Revision: https://reviews.llvm.org/D53319

llvm-svn: 345581
2018-10-30 11:06:50 +00:00
Reid Kleckner 23c9efc071 Remove unneeded friend declarations that clang-cl warns on
llvm-svn: 345549
2018-10-29 22:38:13 +00:00
Bryan Chan bfd32d4377 [AArch64] Rename FP16FML instruction format (NFC)
Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow
usual naming convention, and add some comments in the .td files.

llvm-svn: 345515
2018-10-29 17:27:34 +00:00
Luke Cheeseman 71c989ae1f [AArch64] Return address signing B key support
- Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return
  address signing
- The key used to sign the function is controlled by the function attribute
  "sign-return-address-key"

Differential Revision: https://reviews.llvm.org/D51427

llvm-svn: 345511
2018-10-29 16:26:58 +00:00
Sanjin Sijaric 96f2ea3dd4 [ARM64][Windows] MCLayer support for exception handling
Add ARM64 unwind codes to MCLayer, as well SEH directives that will be emitted
by the frame lowering patch to follow.  We only emit unwind codes into object
object files for now.

Differential Revision: https://reviews.llvm.org/D50166

llvm-svn: 345450
2018-10-27 06:13:06 +00:00
Vlad Tsyrklevich 21beeb29ea Revert "[AArch64] Create proper memoperand for multi-vector stores"
This reverts commit r345315, it was causing test failures on
sanitizer-x86_64-linux-fast.

llvm-svn: 345356
2018-10-26 02:00:14 +00:00
Bryan Chan f0923f16f8 [AArch64] Implement FP16FML intrinsics
Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a
DAG pattern to define the indexed-form intrinsics in terms of the vector-form
ones, similarly to how the Dot Product intrinsics were implemented.

Based on a patch by Gao Yiling.

Differential Revision: https://reviews.llvm.org/D53632

llvm-svn: 345337
2018-10-25 23:36:41 +00:00
David Greene 53e869da7d [AArch64] Create proper memoperand for multi-vector stores
Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345315
2018-10-25 21:10:39 +00:00
Volkan Keles f87473fe1c [GISel] LegalizerInfo: Rename MemDesc::Size to SizeInBits to make the value clearer
Requested in D53679.

llvm-svn: 345288
2018-10-25 17:37:07 +00:00
Volkan Keles 3a103b1d25 [AArch64][GlobalISel] Fix the LegalityPredicate for lowerIf for G_LOAD/G_STORE
Summary:
Currently, Legalizer is trying to lower G_LOAD with a vector type
that has more than two elements due to the incorrect LegalityPredicate.

This patch fixes the issue by removing the multiplication by 8
as `MemDesc.Size` already contains the size in bits.

Reviewers: dsanders, aemerson

Reviewed By: dsanders

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53679

llvm-svn: 345282
2018-10-25 17:23:25 +00:00
Evandro Menezes b53cf99388 [AArch64] Refactor Exynos feature sets (NFC)
llvm-svn: 345279
2018-10-25 16:45:46 +00:00
John Brawn 958865202d [AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector
If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit
vector then in some cases we can eliminate an extract_subvector by converting
to a 128-bit EXT of the 128-bit vector.

Differential Revision: https://reviews.llvm.org/D53582

llvm-svn: 345275
2018-10-25 15:31:51 +00:00
John Brawn b8e7887f33 [AArch64] Refactor definition of EXT patterns to use a multiclass
Using a multiclass reduces duplication, and makes it easier to add new patterns
later. This refactoring does add some new patterns, but as far as I can tell
there's no IR that will end up triggering them so this is effectively NFC.

Differential Revision: https://reviews.llvm.org/D53580

llvm-svn: 345271
2018-10-25 15:00:10 +00:00
John Brawn 49e61d90ca [AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move
Currently a vector move of 0 or -1 will use different instructions depending on
the size of the vector. Using a single instruction (the 128-bit one) for both
gives more opportunity for Machine CSE to eliminate instructions.

Differential Revision: https://reviews.llvm.org/D53579

llvm-svn: 345270
2018-10-25 14:56:48 +00:00
Amara Emerson cbd86d8429 [GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index.
Allows for better imported pattern re-use.

llvm-svn: 345265
2018-10-25 14:04:54 +00:00
Simon Pilgrim 071e82218f [TTI] Add generic SK_Broadcast shuffle costs
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles.

This patch adds SK_BROADCAST handling, but exposes ARM/AARCH64 lack of handling of this type, which I've added a fix for at the same time.

Differential Revision: https://reviews.llvm.org/D53570

llvm-svn: 345253
2018-10-25 10:52:36 +00:00
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Evandro Menezes 096e2497b5 [AArch64] Refactor Exynos machine model
Effectively, NFC.

llvm-svn: 345201
2018-10-24 21:40:43 +00:00
Tim Northover 1c353419ab AArch64: add a pass to compress jump-table entries when possible.
llvm-svn: 345188
2018-10-24 20:19:09 +00:00
Evandro Menezes 769d4cebad [AArch64] Refactor Exynos machine model (NFC)
llvm-svn: 345187
2018-10-24 20:03:24 +00:00
Evandro Menezes 80bc136732 [AArch64] Fix overlapping instructions
Fix overlapping instruction descriptions in the machine model for Exynos M3.
Effectively, NFC.

llvm-svn: 345186
2018-10-24 20:03:20 +00:00
Fangrui Song 2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Evandro Menezes c98decf864 [PATCH] [NFC][AArch64] Fix refactoring of macro fusion
Fix compiler error.

llvm-svn: 344632
2018-10-16 17:41:45 +00:00
Evandro Menezes de655c6d3a [NFC][AArch64] Refactor macro fusion
Simplify API of checking functions.

llvm-svn: 344624
2018-10-16 17:19:28 +00:00
Simon Pilgrim 095a7fe635 [AARCH64] Improve vector popcnt lowering with ADDLP
AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors.

This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258.

Differential Revision: https://reviews.llvm.org/D53259

llvm-svn: 344554
2018-10-15 21:15:58 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Arnaud A. de Grandmaison 162435e7b5 [AArch64] Swap comparison operands if that enables some folding.
Summary:
AArch64 can fold some shift+extend operations on the RHS operand of
comparisons, so swap the operands if that makes sense.

This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751

Reviewers: efriedma, t.p.northover, javed.absar

Subscribers: mcrosier, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53067

llvm-svn: 344439
2018-10-13 07:43:56 +00:00
Diogo N. Sampaio 352a2fa1e7 [AARCH64][FIX] Emit data symbol for constant pool data
The ARM64 elf emitter would omit printing data
symbol for zero filled constant data. This patch
overrides the emitFill method as to enforce that
the symbol is correctly printed.

Differential revision: https://reviews.llvm.org/D53132

llvm-svn: 344248
2018-10-11 14:10:32 +00:00
Oliver Stannard 367b4741f4 [AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled
When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.

Differential revision: https://reviews.llvm.org/D52869

llvm-svn: 343969
2018-10-08 14:12:08 +00:00
Oliver Stannard c922116a51 [AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI
When branch target identification is enabled, all indirectly-callable
functions start with a BTI C instruction. this instruction can only be
the target of certain indirect branches (direct branches and
fall-through are not affected):
- A BLR instruction, in either a protected or unprotected page.
- A BR instruction in a protected page, using x16 or x17.
- A BR instruction in an unprotected page, using any register.

Without BTI, we can use any non call-preserved register to hold the
address for an indirect tail call. However, when BTI is enabled, then
the code being compiled might be loaded into a BTI-protected page, where
only x16 and x17 can be used for indirect tail calls.

Legacy code withiout this restriction can still indirectly tail-call
BTI-protected functions, because they will be loaded into an unprotected
page, so any register is allowed.

Differential revision: https://reviews.llvm.org/D52868

llvm-svn: 343968
2018-10-08 14:09:15 +00:00
Oliver Stannard 250e5a5b65 [AArch64][v8.5A] Branch Target Identification code-generation pass
The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds or branches:
- BTI C can be targeted by call instructions, and is inteneded to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts ab both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.

Note that RET instructions are not restricted by branch target
identification, the reason for this is that return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. the using GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867

llvm-svn: 343967
2018-10-08 14:04:24 +00:00
Oliver Stannard 9ecdac8ee0 [AArch64] Fix verifier error when outlining indirect calls
The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.

The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indiret tail-calls from using caller-saved registers. The
indirect calls transformed by the outliner could use caller-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.

Differential revision: https://reviews.llvm.org/D52829

llvm-svn: 343959
2018-10-08 09:18:48 +00:00
Matthias Braun 81578e9f77 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895
2018-10-05 22:00:13 +00:00
Jonas Paulsson faad1b3056 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Matthias Braun 0c67a4e958 AArch64: Fix XSeqPairs/WSeqPairs problems
- Fix spill/reloads of XSeqPairs failing with vregs (only physregs
  worked correctly)
- Add missing spill/reload code for WSeqPairs class

Differential Revision: https://reviews.llvm.org/D52761

llvm-svn: 343799
2018-10-04 17:02:53 +00:00
Daniel Sanders 34eac35a60 Add the missing new files from r343654
llvm-svn: 343655
2018-10-03 02:21:30 +00:00
Daniel Sanders c973ad1878 Re-commit: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

The previous commit failed portions of the test-suite on GreenDragon due to
duplicate COPY instructions and iterator invalidation. Both issues have now
been fixed. To assist with this, a helper (cloneVirtualRegister) has been added
to MachineRegisterInfo that can be used to get another register that has the same
type and class/bank as an existing one.

llvm-svn: 343654
2018-10-03 02:12:17 +00:00
Matt Morehouse 4b1ec17fb0 Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions"
This reverts r343520 due to breakage of HWASan tests on Android.

llvm-svn: 343616
2018-10-02 18:35:44 +00:00
Oliver Stannard c41902807e [AArch64][v8.5A] Add Memory Tagging instructions
This adds new instructions to manipluate tagged pointers, and to load
and store the tags associated with memory.

Patch by Pablo Barrio, David Spickett and Oliver Stannard!

Differential revision: https://reviews.llvm.org/D52490

llvm-svn: 343572
2018-10-02 10:04:39 +00:00
Oliver Stannard 2a5fcba94b [AArch64][v8.5A] Add Memory Tagging system registers
This adds new system registers introduced by the Memory Tagging
extension.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52488

llvm-svn: 343571
2018-10-02 09:54:35 +00:00
Oliver Stannard 4493f421ac [AArch64][v8.5A] Add MTE system instructions
The Memory Tagging Extension adds system instructions for data cache
maintenance, implemented as new operands to the DC instruction.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52487

llvm-svn: 343570
2018-10-02 09:48:43 +00:00
Oliver Stannard 85de54090e [AArch64][v8.5A] Add MTE as an optional AArch64 extension
This adds the memory tagging extension, which is an optional extension
introduced in v8.5A. The new instructions and registers will be added by
subsequent patches.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52486

llvm-svn: 343563
2018-10-02 09:36:28 +00:00
Daniel Sanders 33f42f97af Revert: r343521 and r343541: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
There's a strange assertion on two of the Green Dragon bots that goes away when
this is reverted. The assertion is in RegBankAlloc and if it is this commit then
-verify-machine-instrs should have caught it earlier in the pipeline.

llvm-svn: 343546
2018-10-01 22:32:08 +00:00
Daniel Sanders 9659bfda5a [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 343521
2018-10-01 18:56:47 +00:00
Matthias Braun 3e081703c3 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343520
2018-10-01 18:56:39 +00:00
Evandro Menezes 55b9a5395b [AArch64] Refactor cheap cost model
Refactor the order in `TII::isAsCheapAsAMove()` to ease future development
and maintenance.  Practically NFC.

llvm-svn: 343489
2018-10-01 16:11:19 +00:00
Evandro Menezes fc1852ff1c [AArch64] Split zero cycle feature more granularly
Split the `zcz` feature into specific ones got GP and FP registers, `zcz-gp`
and `zcz-fp`, respectively, while retaining the original feature option to
mean both.

Differential revision: https://reviews.llvm.org/D52621

llvm-svn: 343354
2018-09-28 19:05:09 +00:00
Luke Cheeseman 10981cc884 Revert r343317
- asan buildbots are breaking and I need to investigate the issue

llvm-svn: 343341
2018-09-28 17:01:50 +00:00
Luke Cheeseman 21f2955bb2 Reapply changes reverted by r343235
- Add fix so that all code paths that create DWARFContext
  with an ObjectFile initialise the target architecture in the context
- Add an assert that the Arch is known in the Dwarf CallFrameString method

llvm-svn: 343317
2018-09-28 13:37:27 +00:00
David Spickett a799fe40dc Remove extra whitespace. NFC. (test commit)
llvm-svn: 343301
2018-09-28 08:45:28 +00:00
Luke Cheeseman 8e5676b1aa Revert r343192 as an ubsan build is currently failing
llvm-svn: 343235
2018-09-27 16:47:30 +00:00
Oliver Stannard 2721e6f0ed [AArch64] Refactor immediate details out of add/sub tblgen class (NFCI)
Bits [23-22] are used in Add and Sub to specify the shift. The value of the
shift field must be 0x; values of 1x are unallocated. MTE adds some instructions
that use such encodings, and this patch refactors the Add/Sub class so that
another class could derive from this one to implement other encodings and other
formats of bitfields.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52489

llvm-svn: 343231
2018-09-27 16:19:04 +00:00
Oliver Stannard a4f68bf4ad [AArch64][v8.5A] Add speculation barriers SSBB and PSSBB
This adds two new barrier instructions which can be used to restrict
speculative execution of load instructions.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52483

llvm-svn: 343229
2018-09-27 16:09:05 +00:00
Oliver Stannard a9a5eee169 [AArch64][v8.5A] Add Branch Target Identification instructions
This adds new instructions used by the Branch Target Identification
feature. When this is enabled, these are the only instructions which can
be targeted by indirect branch instructions.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52485

llvm-svn: 343225
2018-09-27 14:54:33 +00:00
Oliver Stannard 8459d34e82 [AArch64][v8.5A] Add speculation restriction system registers
This adds some new system registers which can be used to restrict
certain types of speculative execution.

Patch by Pablo Barrio and David Spickett!

Differential revision: https://reviews.llvm.org/D52482

llvm-svn: 343218
2018-09-27 14:05:46 +00:00
Oliver Stannard dc837e3f1f [AArch64][v8.5A] Add Armv8.5-A random number instructions
This adds two new system registers, used to generate random numbers.

This is an optional extension to v8.5-A, and will be controlled by the
"+rng" modifier of the -march= and -mcpu= options.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52481

llvm-svn: 343217
2018-09-27 14:01:40 +00:00
Oliver Stannard 6930b12d53 [AArch64][v8.5A] Add Armv8.5-A "DC CVADP" instruction
This adds a new variant of the DC system instruction for persistent
memory.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52480

llvm-svn: 343216
2018-09-27 13:53:35 +00:00
Oliver Stannard 224428c06a [AArch64][v8.5A] Add prediction invalidation instructions to AArch64
This adds new system instructions which act as barriers to speculative
execution based on earlier execution within a particular execution
context.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52479

llvm-svn: 343214
2018-09-27 13:47:40 +00:00
Oliver Stannard e481f1d95a [AArch64][v8.5A] Add speculation barrier to AArch64 instruction set
This is a new barrier which limits speculative execution of the
instructions following it.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52476

llvm-svn: 343211
2018-09-27 13:39:06 +00:00
Oliver Stannard ddb7d46aa5 [AArch64][v8.5A] Add FRINT[32,64][Z,X] instructions
These are some new variants of the "Floating-point Round to Integral"
family of instructions, which round to the nearest floating-point value
which fits in a 32- or 64-bit integer.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52475

llvm-svn: 343209
2018-09-27 13:32:06 +00:00
Luke Cheeseman f6844b307a Reapply changes reverted in r343114, lldb patch to follow shortly
llvm-svn: 343192
2018-09-27 10:39:20 +00:00
Oliver Stannard 31af178f4a [AArch64][v8.5A] Add PSTATE manipulation instructions XAFlag and AXFlag
These new instructions manipluate the NZCV bits, to convert between the
regular Arm floating-point comare format and an alternative format.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52473

llvm-svn: 343187
2018-09-27 09:11:27 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Oliver Stannard 2905937435 [AArch64] Extend single-operand FP insns to match Arm ARM (NFCI)
The Armv8.3-A reference manual defines floating-point data-processing
instructions with one source operand to have an opcode of 6 bits
[20:15]. The current class in tablegen, BaseSingleOperandFPData, only
allows [18:15]. This was ok because [20:19] could only be '00', with
other encodings unallocated. Armv8.5-A brings in the FRINT group of
instructions which use other values for these bits.

This patch refactors the existing class a bit to allow using the full 6
bits of the opcode, as defined in the Arm ARM.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52474

llvm-svn: 343120
2018-09-26 15:42:47 +00:00
Luke Cheeseman 77aaa22081 Revert r343112 as CallFrameString API change has broken lldb builds
llvm-svn: 343114
2018-09-26 14:48:03 +00:00
Oliver Stannard c5d192b611 [AArch64] Refactor instructions that write PSTATE (NFCI)
Reuse some code in preparation for the v8.5A XAFlag/AXFlag instructions,
which shares part of the encoding of the MSR-immediate.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52472

llvm-svn: 343113
2018-09-26 14:42:59 +00:00
Luke Cheeseman 03ad8812f5 [AArch64] - Return address signing dwarf support
- Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll

llvm-svn: 343112
2018-09-26 14:30:29 +00:00
Oliver Stannard 89b1604935 [AArch64][AsmParser] Show name of missing feature for system instructions
Parsing of the system instructions (IC, DC, AT and TLBI) uses this
function to show the required architecture when the operand is valid,
but the architecture is not enabled. Armv8.5A adds a few different
system instructions as part of optional features, so we need to extend
it to show individual features, not just base architectures.

This is NFC for now, but will be used by three different features added
in v8.5A, and will be tested by them.

Patch by David Spickett!

Differential revision: https://reviews.llvm.org/D52478

llvm-svn: 343109
2018-09-26 13:52:27 +00:00
Hans Wennborg 00b88bbcaf Revert r343089 "[AArch64] - Return address signing dwarf support"
This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.

> Functions that have signed return addresses need additional dwarf support:
> - After signing the LR, and before authenticating it, the LR register is in a
>   state the is unusable by a debugger or unwinder
> - To account for this a new directive, .cfi_negate_ra_state, is added
> - This directive says the signed state of the LR register has now changed,
>   i.e. unsigned -> signed or signed -> unsigned
> - This directive has the same CFA code as the SPARC directive GNU_window_save
>   (0x2d), adding a macro to account for multiply defined codes
> - This patch matches the gcc implementation of this support:
>   https://patchwork.ozlabs.org/patch/800271/
>
> Differential Revision: https://reviews.llvm.org/D50136

llvm-svn: 343103
2018-09-26 12:57:45 +00:00
Oliver Stannard 7c3c4baa3f [ARM/AArch64][v8.5A] Add Armv8.5-A target
This patch allows targeting Armv8.5-A, adding the architecture to
tablegen and setting the options to be identical to Armv8.4-A for the
time being. Subsequent patches will add support for the different
features included in the Armv8.5-A Reference Manual.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52470

llvm-svn: 343102
2018-09-26 12:48:21 +00:00
Luke Cheeseman f755e687fc [AArch64] - Return address signing dwarf support
Functions that have signed return addresses need additional dwarf support:
- After signing the LR, and before authenticating it, the LR register is in a
  state the is unusable by a debugger or unwinder
- To account for this a new directive, .cfi_negate_ra_state, is added
- This directive says the signed state of the LR register has now changed,
  i.e. unsigned -> signed or signed -> unsigned
- This directive has the same CFA code as the SPARC directive GNU_window_save
  (0x2d), adding a macro to account for multiply defined codes
- This patch matches the gcc implementation of this support:
  https://patchwork.ozlabs.org/patch/800271/

Differential Revision: https://reviews.llvm.org/D50136

llvm-svn: 343089
2018-09-26 10:14:15 +00:00
Nirav Dave e40e2bbd37 [AArch64] Share search bookkeeping in combines. NFCI.
Share predecessor search bookkeeping in both perform PostLD1Combine
and performNEONPostLDSTCombine. This should be approximately a 4x and
2x performance improvement.

llvm-svn: 342986
2018-09-25 15:30:22 +00:00
Benjamin Kramer b3478fcf0e [Aarch64] Fix memcpy that was copying 4x too many bytes
Found by asan.

llvm-svn: 342845
2018-09-23 18:43:28 +00:00
Tri Vo 6c47c62588 [AArch64] Support adding X[8-15,18] registers as CSRs.
Summary:
Specifying X[8-15,18] registers as callee-saved is used to support
CONFIG_ARM64_LSE_ATOMICS in Linux kernel. As part of this patch we:
- use custom CSR list/mask when user specifies custom CSRs
- update Machine Register Info's list of CSRs with additional custom CSRs in
LowerCall and LowerFormalArguments.

Reviewers: srhines, nickdesaulniers, efriedma, javed.absar

Reviewed By: nickdesaulniers

Subscribers: kristof.beyls, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52216

llvm-svn: 342824
2018-09-22 22:17:50 +00:00
Matthias Braun c0ef786004 AArch64FastISel: Abort if we failed to select operand of intrinsic
rdar://44642447

Differential Revision: https://reviews.llvm.org/D52335

llvm-svn: 342742
2018-09-21 15:47:41 +00:00
Matthias Braun 28d6a4ac9a AArch64: Add FuseCryptoEOR fusion rules
There's some additional rules available on newer apple CPUs.

rdar://41235346

llvm-svn: 342590
2018-09-19 20:50:51 +00:00
Alex Bradbury 79518b02cd [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have 
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they 
will see no functional change. Previously this hook returned bool, but it now 
returns AtomicExpansionKind.

This hook allows targets to select how a given cmpxchg is to be expanded. 
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

See my associated RFC for more info on the motivation for this change 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D48130

llvm-svn: 342550
2018-09-19 14:51:42 +00:00
Calixte Denizet 7413a43886 Verify commit access in fixing typo
llvm-svn: 342538
2018-09-19 11:26:20 +00:00
Matthias Braun 934be5fecf AArch64MacroFusion: Factor out some opcode handling code; NFC
llvm-svn: 342521
2018-09-19 00:23:37 +00:00
David Green 85d6a55995 [AArch64] Attempt to parse more operands as expressions
This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef
to parse more complex expressions as relocatable operands. It is hopefully better than
the existing code which only handles Symbol +- Constant.

This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also
loosens the requirements on parsing addends in ld/st and mov's and adds a number of
tests.

Differential Revision: https://reviews.llvm.org/D51792

llvm-svn: 342455
2018-09-18 09:44:53 +00:00
Sander de Smalen 2d77e788f2 [AArch64] Implement aarch64_vector_pcs codegen support.
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51479

llvm-svn: 342049
2018-09-12 12:10:22 +00:00
Sander de Smalen 7140363cd0 [AArch64] NFC: Refactoring to prepare for vector PCS.
This patch refactors several parts of AArch64FrameLowering
so that it can be easily extended to support saving/restoring
of FPR128 (Q) registers.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51478

llvm-svn: 342038
2018-09-12 09:44:46 +00:00
Sander de Smalen 4dbc512676 [AArch64] Add parsing of aarch64_vector_pcs attribute.
This patch adds parsing support for the 'aarch64_vector_pcs'
calling convention attribute to calls and function declarations.

More information describing the vector ABI and procedure call standard
can be found here:

  https://developer.arm.com/products/software-development-tools/\
                            hpc/arm-compiler-for-hpc/vector-function-abi

Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D51477

llvm-svn: 342030
2018-09-12 08:54:06 +00:00
JF Bastien 49ddd5aca1 NFC: use bit_cast more in AArch64AddressingModes
The was previously committed as r341749 then reverted as r341750 because
bit_cast needed to do its own thing to check is_trivially_copyable on GCC 4.x.
This is now done and std;:array should now get accepted.

llvm-svn: 341897
2018-09-11 04:08:05 +00:00
Benjamin Kramer 27c769d28a [Target] Untangle disassemblers
Disassemblers cannot depend on main target headers. The same is true for
MCTargetDesc, but there's a lot more cleanup needed for that.

llvm-svn: 341822
2018-09-10 12:53:46 +00:00
JF Bastien 6d010103db Revert "NFC: use bit_cast more in AArch64AddressingModes"
It seems some bots think std::array is either not trivially-copyable, or isn't
the right size.

llvm-svn: 341750
2018-09-08 16:50:56 +00:00
JF Bastien 1825d10d3c NFC: use bit_cast more in AArch64AddressingModes
llvm-svn: 341749
2018-09-08 16:43:49 +00:00
JF Bastien c4986cef12 ADT: add <bit> header, implement C++20 bit_cast, use
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.

This was originally committed as r341728 and reverted in r341730.

Reviewers: javed.absar, steven_wu, srhines

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51693

llvm-svn: 341741
2018-09-08 03:55:25 +00:00
JF Bastien 05430cc6e5 Revert "ADT: add <bit> header, implement C++20 bit_cast, use"
Bots sad. Looks like missing std::is_trivially_copyable.

llvm-svn: 341730
2018-09-07 23:23:47 +00:00
JF Bastien 28655081a4 ADT: add <bit> header, implement C++20 bit_cast, use
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.

Reviewers: javed.absar

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51693

llvm-svn: 341728
2018-09-07 23:08:26 +00:00
Nick Desaulniers 287a3be379 [AArch64] Support reserving x1-7 registers.
Summary:
Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7.

Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma

Reviewed By: nickdesaulniers, efriedma

Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D48580

llvm-svn: 341706
2018-09-07 20:58:57 +00:00
JF Bastien 2920061105 ARM64: improve non-zero memset isel by ~2x
Summary:
I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated
where the generated code was bad. This patch fixes the majority of the issues by
requesting that a 2xi64 vector be used for memset of 32 bytes and above.

The patch leaves the former request for f128 unchanged, despite f128
materialization being suboptimal: doing otherwise runs into other asserts in
isel and makes this patch too broad.

This patch hides the issue that was present in bzero_40_stack and bzero_72_stack
because the code now generates in a better order which doesn't have the store
offset issue. I'm not aware of that issue appearing elsewhere at the moment.

<rdar://problem/44157755>

Reviewers: t.p.northover, MatzeB, javed.absar

Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51706

llvm-svn: 341558
2018-09-06 16:03:32 +00:00
JF Bastien da33900b95 NFC: improve ARM64 isFPImmLegal debug print
Forking this change from D51706. This just made it easier to understand llc
output with -debug.

llvm-svn: 341504
2018-09-05 23:38:11 +00:00
Martin Storsjo 68df812cce [MinGW] Move code for indicating "potentially not DSO local" into shouldAssumeDSOLocal. NFC.
On Windows, if shouldAssumeDSOLocal returns false, it's either a
dllimport reference, or a reference that we should treat as non-local
and create a stub for.

Clean up AArch64Subtarget::ClassifyGlobalReference a little while
touching the flag handling relating to dllimport.

Differential Revision: https://reviews.llvm.org/D51590

llvm-svn: 341402
2018-09-04 20:56:28 +00:00
Martin Storsjo fed420d6b6 [MinGW] [AArch64] Add stubs for potential automatic dllimported variables
The runtime pseudo relocations can't handle the AArch64 format PC
relative addressing in adrp+add/ldr pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.

Differential Revision: https://reviews.llvm.org/D51452

llvm-svn: 341401
2018-09-04 20:56:21 +00:00
Martin Storsjo 5c984fb16d [AArch64] Simplify code in LowerGlobalAddress. NFCI.
When initial support for dllimport was added for aarch64 in
SVN r316555, ClassifyGlobalReference didn't set the MO_DLLIMPORT
flag - that was only completed in SVN r323810. Reuse the return
value from ClassifyGlobalReference for this purpose as well.

llvm-svn: 341310
2018-09-03 11:59:23 +00:00
Martin Storsjo 9e4d5f9b7b [AArch64] Hook up the missed machine operand flag name for MO_DLLIMPORT
llvm-svn: 341178
2018-08-31 08:00:34 +00:00
Ties Stuij 9c16d809d2 [CodeGen] emit inline asm clobber list warnings for reserved (cont)
Summary:
This is a continuation of https://reviews.llvm.org/D49727
Below the original text, current changes in the comments:

Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.

For example:

  extern int bar(int[]);
  
  int foo(int i) {
    int a[i]; // VLA
    asm volatile(
        "mov r7, #1"
      :
      :
      : "r7"
    );
  
    return 1 + bar(a);
  }

Compiled for thumb, this gives:

  $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
  ...
  foo:
          .fnstart
  @ %bb.0:                                @ %entry
          .save   {r4, r5, r6, r7, lr}
          push    {r4, r5, r6, r7, lr}
          .setfp  r7, sp, #12
          add     r7, sp, #12
          .pad    #4
          sub     sp, #4
          movs    r1, #7
          add.w   r0, r1, r0, lsl #2
          bic     r0, r0, #7
          sub.w   r0, sp, r0
          mov     sp, r0
          @APP
          mov.w   r7, #1
          @NO_APP
          bl      bar
          adds    r0, #1
          sub.w   r4, r7, #12
          mov     sp, r4
          pop     {r4, r5, r6, r7, pc}
  ...

r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.

This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.

The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.

If we find a reserved register, we print a warning:

  repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
        "mov r7, #1"
        ^

Reviewers: efriedma, olista01, javed.absar

Reviewed By: efriedma

Subscribers: eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D51165

llvm-svn: 341062
2018-08-30 12:52:35 +00:00
David Green 1f203bcd75 [AArch64] Optimise load(adr address) to ldr address
Providing that the load is known to be 4 byte aligned, we can optimise a
ldr(adr address) to just ldr address.

Differential Revision: https://reviews.llvm.org/D51030

llvm-svn: 341058
2018-08-30 11:55:16 +00:00
Eli Friedman 071203bbf2 [AArch64] Reject inline asm with FP registers when FP is disabled.
Otherwise, we would crash trying to deal with an illegal input.

Differential Revision: https://reviews.llvm.org/D51202

llvm-svn: 340637
2018-08-24 19:12:13 +00:00
Joel Galenson d36fb48a27 Find PLT entries for x86, x86_64, and AArch64.
This adds a new method to ELFObjectFileBase that returns the symbols and addresses of PLT entries.

This design was suggested by pcc and eugenis in https://reviews.llvm.org/D49383.

Differential Revision: https://reviews.llvm.org/D50203

llvm-svn: 340610
2018-08-24 15:21:56 +00:00
David Green 9dd1d451d9 [AArch64] Add Tiny Code Model for AArch64
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.

This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.

Differential Revision: https://reviews.llvm.org/D49673

llvm-svn: 340397
2018-08-22 11:31:39 +00:00
Daniel Sanders 6a943fb16a [aarch64][mc] Don't lookup symbols when there is no symbol lookup callback
Summary: When run under llvm-mc-disassemble-fuzzer, there is no symbol lookup callback so tryAddingSymbolicOperand() must fail gracefully instead of crashing

Reviewers: aemerson, javed.absar

Reviewed By: aemerson

Subscribers: lhames, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D51005

llvm-svn: 340287
2018-08-21 15:47:25 +00:00
Sander de Smalen 07db432265 [AArch64][SVE] Asm: Add SVE System registers
This patch adds system registers for controlling aspects of SVE:
- ZCR_EL1  (r/w)   visible at EL1 and EL0.
- ZCR_EL2  (r/w)   visible at EL2 and Non-secure EL1 and EL0.
- ZCR_EL3  (r/w)   visible at all exception levels.

and a system register identifying SVE:
- ID_AA64ZFR0_EL1  (r)  SVE Feature identifier.

Reviewers: SjoerdMeijer, samparker, pbarrio, fhahn, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D50885

llvm-svn: 340158
2018-08-20 09:16:59 +00:00
Luke Cheeseman 64dcdec60c [AArch64] - Generate pointer authentication instructions
- Generate pointer authentication instructions
- The functions instrumented depend on function attribtues:
  all (all functions instrumentent)
  non-leaf (only those that spill LR)
  none
- Function epilogues sign the LR before spilling to the stack and authenticate
  the LR once restored
- If the target is v8.3a or greater than can use the combined authenticate and
  return instruction

Differential revision: https://reviews.llvm.org/D49793

llvm-svn: 340018
2018-08-17 12:53:22 +00:00
Bernard Ogden b828bb2a15 [ARM/AArch64] Support FP16 +fp16fml instructions
Add +fp16fml feature for new FP16 instructions, which are a
mandatory part of FP16 from v8.4-A and an optional part of FP16
from v8.2-A. It doesn't seem to be possible to model this in
LLVM, but the relationship between the options is handled by
the related clang patch.

In keeping with what I think is the usual practice, the fp16fml
extension is accepted regardless of base architecture version.

Builds on/replaces Sjoerd Meijer's patch to add these instructions at
https://reviews.llvm.org/D49839.

Differential Revision: https://reviews.llvm.org/D50228

llvm-svn: 340013
2018-08-17 11:29:49 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Chandler Carruth 66654b72c9 [SDAG] Remove the reliance on MI's allocation strategy for
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.

Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.

This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.

This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.

As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]

Differential Revision: https://reviews.llvm.org/D50680

llvm-svn: 339740
2018-08-14 23:30:32 +00:00
Eli Friedman 0d12e90bf5 [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly.
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.

To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.

Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)

Differential Revision: https://reviews.llvm.org/D50667

llvm-svn: 339734
2018-08-14 22:10:25 +00:00
Bryan Chan e023706471 [AArch64] Fix assertion failure on widened f16 BUILD_VECTOR
Summary:
Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the
same type. This fixes an assertion failure in VerifySDNode.

Reviewers: SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50202

llvm-svn: 339013
2018-08-06 14:14:41 +00:00
Alexander Ivchenko 49168f6778 [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value
This is logical continuation of https://reviews.llvm.org/D46018 (r332449)

Differential Revision: https://reviews.llvm.org/D49660

llvm-svn: 338685
2018-08-02 08:33:31 +00:00
David Green ea60446c6d [AArch64] Add support for got relocated LDR's
As a part of adding the tiny codemodel, we need to support ldr's with :got:
relocations on them. This seems to be mostly already done, just needs the
relocation type support.

Differential Revision: https://reviews.llvm.org/D50137

llvm-svn: 338673
2018-08-02 06:24:40 +00:00
Lei Liu 8e422b8403 [AArch64] DWARF: do not generate AT_location for thread local
AArch64 ELF ABI does not define a static relocation type for TLS offset within
a module, which makes it impossible for compiler to generate a valid
DW_AT_location content for thread local variables. Currently LLVM generates an
invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS
variable. That causes trouble for linker because thread local variable does
not have an absolute address at link time. AArch64 GCC solves the problem by
not generating DW_AT_location for thread local variables. We should do the
same in LLVM.

Differential Revision: https://reviews.llvm.org/D43860

llvm-svn: 338655
2018-08-01 23:46:49 +00:00
Bryan Chan 67106b5e08 [AArch64] Fix FCCMP with FP16 operands
Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection.

Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar

Reviewed By: SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D50115

llvm-svn: 338554
2018-08-01 13:50:29 +00:00
Martin Storsjo d4590c38ab [AArch64] Disallow the MachO specific .loh directive for windows
Also add a test for it being unsupported for linux.

Differential Revision: https://reviews.llvm.org/D49929

llvm-svn: 338493
2018-08-01 06:50:18 +00:00
Martin Storsjo 3e3d39d07e [AArch64] Support the .inst directive for MachO and COFF targets
Contrary to ELF, we don't add any markers that distinguish data generated
with .long from normal instructions, so the .inst directive only adds
compatibility with assembly that uses it.

Differential Revision: https://reviews.llvm.org/D49935

llvm-svn: 338355
2018-07-31 09:26:52 +00:00
Amara Emerson 1e8c164c63 [AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR.
Also refactors some existing code to materialize addresses for the large code
model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR.

This implements PR36390.

Differential Revision: https://reviews.llvm.org/D49903

llvm-svn: 338337
2018-07-31 00:09:02 +00:00
Amara Emerson 0e86c07077 [AArch64][GlobalISel] Make G_BLOCK_ADDR legal.
Differential Revision: https://reviews.llvm.org/D49902

llvm-svn: 338336
2018-07-31 00:08:56 +00:00
Craig Topper 2f60ef2c78 [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc.
The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation.

llvm-svn: 338329
2018-07-30 23:22:00 +00:00
Craig Topper a568a27dfa [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2.
llvm-svn: 338303
2018-07-30 21:04:34 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Jessica Paquette fa3bee4756 [MachineOutliner][AArch64] Add support for saving LR to a register
This teaches the outliner to save LR to a register rather than the stack when
possible. This allows us to avoid bumping the stack in outlined functions in
some cases. By doing this, in a later patch, we can teach the outliner to do
something like this:

f1:
  ...
  bl OUTLINED_FUNCTION
  ...

f2:
  ...
  move LR's contents to a register
  bl OUTLINED_FUNCTION
  move the register's contents back

instead of falling back to saving LR in both cases.

llvm-svn: 338278
2018-07-30 17:45:28 +00:00
Sander de Smalen e64206a02c [AArch64][SVE] Asm: Enable instructions to be prefixed.
This patch enables instructions that are destructive on their
destination- and first source operand, to be prefixed with a
MOVPRFX instruction.

This patch also adds a variety of tests:

- positive tests for all instructions and forms that accept a
  movprfx for either or both predicated and unpredicated forms.

- negative tests for all instructions and forms that do not accept
  an unpredicated or predicated movprfx.

- negative tests for the diagnostics that get emitted when a MOVPRFX
  instruction is used incorrectly.

This is patch [2/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593

Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D49593

llvm-svn: 338261
2018-07-30 16:05:45 +00:00
Sander de Smalen 9b33309c87 [AArch64][SVE] Asm: Add MOVPRFX instructions.
This patch adds predicated and unpredicated MOVPRFX instructions, which
can be prepended to SVE instructions that are destructive on their first
source operand, to make them a constructive operation, e.g.

  add z1.s, p0/m, z1.s, z2.s        <=> z1 = z1 + z2

can be made constructive:

  movprfx z0, z1
  add z0.s, p0/m, z0.s, z2.s        <=> z0 = z1 + z2

The predicated MOVPRFX instruction can additionally be used to zero
inactive elements, e.g.

  movprfx z0.s, p0/z, z1.s
  add z0.s, p0/m, z0.s, z2.s

Not all instructions can be prefixed with the MOVPRFX instruction
which is why this patch also adds a mechanism to validate prefixed
instructions. The exact rules when a MOVPRFX applies is detailed in
the SVE supplement of the Architectural Reference Manual.

This is patch [1/2] in a series to add MOVPRFX instructions:
- Patch [1/2]: https://reviews.llvm.org/D49592
- Patch [2/2]: https://reviews.llvm.org/D49593

Reviewers: rengolin, SjoerdMeijer, samparker, fhahn, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D49592

llvm-svn: 338258
2018-07-30 15:42:46 +00:00
Sander de Smalen ad88a99956 [AArch64][SVE] Asm: Support for WHILE(LE|LO|LS|LT) instructions.
The WHILE instructions generate a predicate that is true while the 
comparison of the first scalar operand (incremented for each predicate
element) with the second scalar operand is true and false thereafter.

  WHILELE  While incrementing signed scalar less than or equal to scalar
  WHILELO  While incrementing unsigned scalar lower than scalar
  WHILELS  While incrementing unsigned scalar lower than or same as scalar
  WHILELT  While incrementing signed scalar less than scalar

e.g.

  whilele  p0.s, x0, x1

  generates predicate p0 (for 32bit elements) by incrementing
  (signed) x0 and comparing that vector to splat(x1).

llvm-svn: 338211
2018-07-29 08:51:08 +00:00
Sander de Smalen e70ed3187c [AArch64][SVE] Asm: Instructions to perform serialized operations.
The instructions added in this patch permit active elements within
a vector to be processed sequentially without unpacking the vector.

  PFIRST      Set the first active element to true.
  PNEXT       Find next active element in predicate.
  CTERMEQ     Compare and terminate loop when equal.
  CTERMNE     Compare and terminate loop when not equal.

llvm-svn: 338210
2018-07-29 08:00:16 +00:00
Sander de Smalen 5b3a289424 [AArch64][SVE] Asm: Support for PFALSE and PTEST instructions.
This patch adds PFALSE (unconditionally sets all elements of
the predicate to false) and PTEST (set the status flags for the
predicate).

llvm-svn: 338198
2018-07-28 14:18:11 +00:00
Sander de Smalen 3878bf83dd [AArch64][SVE] Asm: Data-dependent loop predicate partitioning instructions.
This patch adds support for instructions that partition a predicate
based on data-dependent termination conditions in a loop.

  BRKA      Break after the first true condition
  BRKAS     Break after the first true condition, setting condition flags
  BRKB      Break before the first true condition
  BRKBS     Break before the first true condition, setting condition flags

  BRKPA     Break after the first true condition, propagating from the 
            previous partition
  BRKPAS    Break after the first true condition, propagating from the 
            previous partition, setting condition flags
  BRKPB     Break before the first true condition, propagating from the 
            previous partition
  BRKPBS    Break before the first true condition, propagating from the 
            previous partition, setting condition flags

  BRKN      Propagate break to next partition
  BKRNS     Propagate break to next partition, setting condition flags

llvm-svn: 338196
2018-07-28 14:04:52 +00:00
Matt Arsenault 81920b0a25 DAG: Add calling convention argument to calling convention funcs
This seems like a pretty glaring omission, and AMDGPU
wants to treat kernels differently from other calling
conventions.

llvm-svn: 338194
2018-07-28 13:25:19 +00:00
Jessica Paquette f90edbe3d6 Recommit "Enable MachineOutliner by default under -Oz for AArch64"
Fixed the ASAN failure from before in r338148, so recommiting.

This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338160
2018-07-27 20:18:27 +00:00
Jessica Paquette 9d93c6026a [MachineOutliner] Exit getOutliningCandidateInfo when we erase all candidates
There was a missing check for if a candidate list was entirely deleted. This
adds that check.

This fixes an asan failure caused by running test/CodeGen/AArch64/addsub_ext.ll
with the MachineOutliner enabled.

llvm-svn: 338148
2018-07-27 18:21:57 +00:00
Jessica Paquette faea2d3130 Revert "Enable MachineOutliner by default under -Oz for AArch64"
It failed an Asan test on a bot:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio

Fixing that before recommitting.

llvm-svn: 338136
2018-07-27 17:25:38 +00:00
Jessica Paquette d4229b985c Enable MachineOutliner by default under -Oz for AArch64
This patch enables the MachineOutliner by default in AArch64 under -Oz.

The MachineOutliner offers around a 4.5% improvement on the current -Oz code
size improvements.

We have done work into improving the debuggability of outlined code, so that
users of -Oz won't be surprised by the optimization. We have also been executing
the LLVM test suite and common external tests such as the SPEC suites
continuously with no issue. The outliner has a low compile-time overhead of
roughly 1%. At this point, the outliner would be a really good addition to the
-Oz pass pipeline!

llvm-svn: 338133
2018-07-27 16:44:42 +00:00
Sander de Smalen a703b8dc71 [AArch64][SVE] Asm: Predicated integer reductions.
This patch adds support for various integer reduction operations:

  SADDV    signed add reduction to scalar
  UADDV    unsigned add reduction to scalar

  SMAXV    signed maximum reduction to scalar
  SMINV    signed minimum reduction to scalar
  UMAXV    unsigned maximum reduction to scalar
  UMINV    unsigned minimum reduction to scalar

  ANDV     logical AND reduction to scalar
  ORV      logical OR reduction to scalar
  EORV     logical EOR reduction to scalar

The reduction is predicated, e.g.
  smaxv s0, p0, z1.s

performs a signed maximum reduction on active elements in z1,
and stores the (signed max value) result in s0.

llvm-svn: 338126
2018-07-27 14:24:55 +00:00
Sander de Smalen fcb636d222 [AArch64][SVE] Asm: Predicated floating point reductions.
This patch adds support for various floating-point
reduction operations:

  FADDA    strictly-ordered add reduction, accumulating in scalar
  FADDV    recursive add reduction to scalar
  FMAXV    recursive max reduction to scalar
  FMINV    recursive min reduction to scalar
  FMAXNMV  recursive max number reduction to scalar
  FMINNMV  recursive min number reduction to scalar

The reduction is predicated, e.g.

  fadda d0, p0, d0, z1.d

performs the add-reduction in strict order on active elements
in z1, accumulating into d0.

  faddv d0, p0, z1.d

performs the add-reduction (not in strict order)
on active elements in z1, storing the result in d0.

llvm-svn: 338123
2018-07-27 13:58:48 +00:00
Sander de Smalen 88e154ff90 [AArch64][SVE] Asm: Support for FEXPA and FTSSEL.
This patch adds support for transcendental acceleration
instructions 'FEXPA' (exponential accelerator) and 'FTSSEL'
(trigonometric select coefficient).

llvm-svn: 338121
2018-07-27 12:40:09 +00:00
Sander de Smalen 71929e7cad [AArch64][SVE] Asm: Support for FRECPE and FRSQRTE.
Support for floating-point instructions for reciprocal
estimate (FRECPE) and reciprocal square root estimate (FRSQRTE).

llvm-svn: 338120
2018-07-27 12:26:24 +00:00
Luke Cheeseman 66b5e7da4c Enable some pointer authentication instructions for aarch64 v8a targets
- Some of the v8.3 pointer authentication instruction inhabit the Hint space
- These instructions can be assembled to hint instructions which act as NOP instructions prior to v8.3
- This patch permits using the hint instructions for all v8a targets
- Also, correct the RETA{A,B} instructions to match the instruction attributes of RET (set isTerminator and isBarrier)

Differential Revision: https://reviews.llvm.org/D49786

llvm-svn: 338029
2018-07-26 14:00:50 +00:00
Sjoerd Meijer dc198344ce [AArch64] Armv8.2-A: add the crypto extensions
This adds MC support for the crypto instructions that were made optional
extensions in Armv8.2-A (AArch64 only).

Differential Revision: https://reviews.llvm.org/D49370

llvm-svn: 338010
2018-07-26 07:13:59 +00:00
Martin Storsjo d2662c32fb [COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.

With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:

fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4

Differential Revision: https://reviews.llvm.org/D49644

llvm-svn: 337950
2018-07-25 18:35:31 +00:00
Jessica Paquette 69f517df27 [MachineOutliner][NFC] Move target frame info into OutlinedFunction
Just some gardening here.

Similar to how we moved call information into Candidates, this moves outlined
frame information into OutlinedFunction. This allows us to remove
TargetCostInfo entirely.

Anywhere where we returned a TargetCostInfo struct, we now return an
OutlinedFunction. This establishes OutlinedFunctions as more of a general
repeated sequence, and Candidates as occurrences of those repeated sequences.

llvm-svn: 337848
2018-07-24 20:13:10 +00:00
Jessica Paquette fca55129b1 [MachineOutliner][NFC] Make Candidates own their call information
Before this, TCI contained all the call information for each Candidate.

This moves that information onto the Candidates. As a result, each Candidate
can now supply how it ought to be called. Thus, Candidates will be able to,
say, call the same function in cheaper ways when possible. This also removes
that information from TCI, since it's no longer used there.

A follow-up patch for the AArch64 outliner will demonstrate this.

llvm-svn: 337840
2018-07-24 17:42:11 +00:00
Martin Storsjo c2b701408e [AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes
This matches the structure used on X86 and ARM. This requires
a little bit of duplication of the parts that are equal in both
AArch64 COFF variants though.

Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF
didn't, but now they do.

This makes AArch64 match X86 in how comdat is used for float constants
for MinGW.

Differential Revision: https://reviews.llvm.org/D49637

llvm-svn: 337755
2018-07-23 22:15:14 +00:00
Sander de Smalen 33f588acb9 [AArch64][SVE] Asm: Support for bit/byte reverse operations.
This patch adds the following instructions:

  RBIT      reverse bits within each active elemnt (predicated), e.g.
                rbit z0.d, p0/m, z1.d

            for 8, 16, 32 and 64 bit elements.

  REV       reverse order of elements in data/predicate vector
            (unpredicated), e.g.
                rev z0.d, z1.d
                rev p0.d, p1.d

            for 8, 16, 32 and 64 bit elements.

  REVB      reverse order of bytes within each active element, e.g.
                revb z0.d, p0/m, z1.d

            for 16, 32 and 64 bit elements.

  REVH      reverse order of 16-bit half-words within each active
            element, e.g.
                revh z0.d, p0/m, z1.d

            for 32 and 64 bit elements.

  REVW      reverse order of 32-bit words within each active element,
            e.g.
                revw z0.d, p0/m, z1.d

            for 64 bit elements.

llvm-svn: 337534
2018-07-20 09:00:44 +00:00
Sander de Smalen 3ed7f81ce1 [AArch64][SVE] Asm: Support for FTMAD instruction.
Floating-point trigonometric multiply-add coefficient,
e.g.

  ftmad z0.h, z0.h, z1.h, #7

with variants for 16, 32 and 64-bit elements.

llvm-svn: 337533
2018-07-20 08:47:26 +00:00
Sander de Smalen 330d887d72 [AArch64][SVE] Asm: Support for unpredicated FP operations.
This patch adds support for the following unpredicated
floating-point instructions:

  FADD      Floating point add
  FSUB      Floating point subtract
  FMUL      Floating point multiplication
  FTSMUL    Floating point trigonometric starting value
  FRECPS    Floating point reciprocal step
  FRSQRTS   Floating point reciprocal square root step

The instructions have the following assembly format:
  fadd z0.h, z1.h, z2.h
and have variants for 16, 32 and 64-bit FP elements.

llvm-svn: 337383
2018-07-18 11:59:12 +00:00
Sander de Smalen ccdc7ebc1d [AArch64][SVE] Asm: Support for UDOT/SDOT instructions.
The signed/unsigned DOT instructions perform a dot-product on
quadtuplets from two source vectors and accumulate the result in
the destination register. The instructions come in two forms:

Vector form, e.g.
  sdot  z0.s, z1.b, z2.b     - signed dot product on four 8-bit quad-tuplets,
                               accumulating results in 32-bit elements.

  udot  z0.d, z1.h, z2.h     - unsigned dot product on four 16-bit quad-tuplets,
                               accumulating results in 64-bit elements.

Indexed form, e.g.
  sdot  z0.s, z1.b, z2.b[3]  - signed dot product on four 8-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 32-bit
                               elements.
  udot  z0.d, z1.h, z2.h[1]  - dot product on four 16-bit quad-tuplets
                               with specified quadtuplet from second
                               source vector, accumulating results in 64-bit
                               elements.

llvm-svn: 337372
2018-07-18 09:37:51 +00:00
Sander de Smalen 889fe81ce5 [AArch64][SVE] Asm: Integer divide instructions.
This patch adds the following predicated instructions:

  UDIV    Unsigned divide active elements
  UDIVR   Unsigned divide active elements, reverse form.
  SDIV    Signed divide active elements
  SDIVR   Signed divide active elements, reverse form.

e.g.
  udiv  z0.s, p0/m, z0.s, z1.s
    (unsigned divide active elements in z0 by z1, store result in z0)

  sdivr z0.s, p0/m, z0.s, z1.s
    (signed divide active elements in z1 by z0, store result in z0)

llvm-svn: 337369
2018-07-18 09:17:29 +00:00
Sander de Smalen ac0cb5bf75 [AArch64][SVE] Asm: Support for integer MUL instructions.
This patch adds the following instructions:
  MUL   - multiply vectors, e.g.
    mul z0.h, p0/m, z0.h, z1.h
        - multiply with immediate, e.g.
    mul z0.h, z0.h, #127

  SMULH - signed multiply returning high half, e.g.
    smulh z0.h, p0/m, z0.h, z1.h

  UMULH - unsigned multiply returning high half, e.g.
    umulh z0.h, p0/m, z0.h, z1.h

llvm-svn: 337358
2018-07-18 08:10:03 +00:00
Sander de Smalen 7b33faaf38 [AArch64][SVE]: Integer multiply-add/subtract instructions.
This patch adds support for the following instructions:
  MLA  mul-add, writing addend       (Zda = Zda +   Zn * Zm)
  MLS  mul-sub, writing addend       (Zda = Zda +  -Zn * Zm)
  MAD  mul-add, writing multiplicant (Zdn =  Za +  Zdn * Zm)
  MSB  mul-sub, writing multiplicant (Zdn =  Za + -Zdn * Zm)

llvm-svn: 337293
2018-07-17 15:41:58 +00:00
Sander de Smalen f41dd122d6 [AArch64][SVE] Asm: FP fused multiply-add/subtract instructions.
This patch adds support for the following instructions:

  FMLA    mul-add, writing addend                (Zda =  Zda +   Zn * Zm)
  FNMLA   negated mul-add, writing addend        (Zda = -Zda +  -Zn * Zm)

  FMLS    mul-sub, writing addend                (Zda =  Zda +  -Zn * Zm)
  FNMLS   negated mul-sub, writing addend        (Zda = -Zda +   Zn * Zm)

  FMAD    mul-add, writing multiplicant          (Zdn =  Za  +  Zdn * Zm)
  FNMAD   negated mul-add, writing multiplicant  (Zdn = -Za  + -Zdn * Zm)

  FMSB    mul-sub, writing multiplicant          (Zdn =  Za  + -Zdn * Zm)
  FNMSB   negated mul-sub, writing multiplicant  (Zdn = -Za  +  Zdn * Zm)

llvm-svn: 337282
2018-07-17 13:58:46 +00:00
Sander de Smalen 5dabcf887b [AArch64][SVE] Asm: Support for predicated FP operations (FP immediate)
This patch completes support for the following floating point
instructions that take FP immediates:
  FADD*  (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FMUL*  (multiplication)
  FMAX*  (maximum)
  FMAXNM (maximum number)
  FMIN   (maximum)
  FMINNM (maximum number)

All operations are predicated and take a FP immediate operand,
e.g.

  fadd z0.h, p0/m, z0.h, #0.5
  fmin z0.s, p0/m, z0.s, #1.0
        ^___________^ (tied)

* Instructions added in a previous patch.

llvm-svn: 337272
2018-07-17 12:36:08 +00:00
Sander de Smalen 3b9e342ae1 [AArch64][SVE] Asm: Support for predicated FP operations.
This patch adds support for the following floating point
instructions:
  FABD   (absolute difference)
  FADD   (addition)
  FSUB   (subtract)
  FSUBR  (subtract reverse form)
  FDIV   (divide)
  FDIVR  (divide reverse form)
  FMAX   (maximum)
  FMAXNM (maximum number)
  FMIN   (minimum)
  FMINNM (minimum number)
  FSCALE (adjust exponent)
  FMULX  (multiply extended)

All operations are predicated and binary form, e.g.

  fadd z0.h, p0/m, z0.h, z1.h
        ^___________^ (tied)

Supporting 16, 32 and 64-bit FP elements.

llvm-svn: 337259
2018-07-17 09:48:57 +00:00
Sander de Smalen ec229abb9b [AArch64][SVE] Asm: Support for SPLICE instruction.
The SPLICE instruction splices two vectors into one vector using a
predicate. It copies the active elements from the first vector, and
then fills the remaining elements with the low-numbered elements from
the second vector.

The instruction has the following form, e.g.

  splice z0.b, p0, z0.b, z1.b

for 8-bit elements. It also supports 16, 32 and
64-bit elements.

llvm-svn: 337253
2018-07-17 08:52:45 +00:00
Sander de Smalen 78fe30a662 [AArch64][SVE] Asm: Support for EXT instruction.
This patch adds an instruction that allows extracting
a vector from a pair of vectors, given an immediate index
that describes the element position to extract from.

The instruction has the following assembly:
  ext z0.b, z0.b, z1.b, #imm

where #imm is an immediate between 0 and 255.

llvm-svn: 337251
2018-07-17 08:39:48 +00:00
Roman Lebedev de506632aa [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern
Summary:

[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]

As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.

But the IR-optimal patter does not lower efficiently, so we want to undo it..

This handles the simple pattern.
There is a second pattern with predicate and constants inverted.

NOTE: we do not check uses here. we always do the transform.

Reviewers: spatel, craig.topper, RKSimon, javed.absar

Reviewed By: spatel

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49266

llvm-svn: 337166
2018-07-16 12:44:10 +00:00
Sjoerd Meijer ceabd50a5c [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions (cont'd)
Follow up of rL336913: fix base class description. Thanks to Ahmed Bougacha
for pointing this out.

Differential Revision: https://reviews.llvm.org/D49284

llvm-svn: 337009
2018-07-13 15:25:42 +00:00
Joel Galenson 06e7e5798f [cfi-verify] Support AArch64.
This patch adds support for AArch64 to cfi-verify.

This required three changes to cfi-verify.  First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64).  Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction).  Third, we needed to ensure that return instructions are not counted as indirect branches.  Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.

In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not).

Differential Revision: https://reviews.llvm.org/D48836

llvm-svn: 337007
2018-07-13 15:19:33 +00:00
Sander de Smalen 7c3c0f24a3 [AArch64][SVE] Asm: Vector Unpack Low/High instructions.
This patch adds support for the following unpack instructions:
  
- PUNPKLO, PUNPKHI   Unpack elements from low/high half and
                     place into elements of twice their size.

  e.g. punpklo p0.h, p0.b

- UUNPKLO, UUNPKHI   Unpack elements from low/high half and 
  SUNPKLO, SUNPKHI   place into elements of twice their size
                     after zero- or sign-extending the values.

  e.g. uunpklo z0.h, z0.b

llvm-svn: 336982
2018-07-13 09:25:43 +00:00
Sander de Smalen faee91a52b [AArch64][SVE] Asm: Support for insert element (INSR) instructions.
Insert general purpose register into shifted vector, e.g.
  insr    z0.s, w0
  insr    z0.d, x0

Insert SIMD&FP scalar register into shifted vector, e.g.
  insr    z0.b, b0
  insr    z0.h, h0
  insr    z0.s, s0
  insr    z0.d, d0

llvm-svn: 336979
2018-07-13 08:51:57 +00:00
Sjoerd Meijer 83a2a62fb4 [AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions
These instructions are added to AArch64 only.

llvm-svn: 336913
2018-07-12 14:57:59 +00:00
Sander de Smalen ea45a89e5c [AArch64][SVE] Asm: Support for COMPACT instruction.
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero. 

e.g.
  compact z0.s, p0, z1.s

llvm-svn: 336789
2018-07-11 11:22:26 +00:00
Sander de Smalen a90530f7c1 [AArch64][SVE] Asm: Support for LAST(A|B) and CLAST(A|B) instructions.
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.

The added variants are:

  Scalar:
  last(a|b)  w0, p0, z0.b
  last(a|b)  w0, p0, z0.h
  last(a|b)  w0, p0, z0.s
  last(a|b)  x0, p0, z0.d

  SIMD & FP Scalar:
  last(a|b)  b0, p0, z0.b
  last(a|b)  h0, p0, z0.h
  last(a|b)  s0, p0, z0.s
  last(a|b)  d0, p0, z0.d

The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.

The added variants are:

  Scalar:
  clast(a|b)  w0, p0, w0, z0.b
  clast(a|b)  w0, p0, w0, z0.h
  clast(a|b)  w0, p0, w0, z0.s
  clast(a|b)  x0, p0, x0, z0.d

  SIMD & FP Scalar:
  clast(a|b)  b0, p0, b0, z0.b
  clast(a|b)  h0, p0, h0, z0.h
  clast(a|b)  s0, p0, s0, z0.s
  clast(a|b)  d0, p0, d0, z0.d

  Vector:
  clast(a|b)  z0.b, p0, z0.b, z1.b
  clast(a|b)  z0.h, p0, z0.h, z1.h
  clast(a|b)  z0.s, p0, z0.s, z1.s
  clast(a|b)  z0.d, p0, z0.d, z1.d

Please refer to the architecture specification for more details on
the semantics of the added instructions.

llvm-svn: 336783
2018-07-11 10:08:00 +00:00
Sander de Smalen 53108d48f7 [AArch64][SVE] Asm: Support for predicated unary operations.
This patch adds support for the following instructions:
  CLS  (Count Leading Sign bits)
  CLZ  (Count Leading Zeros)
  CNT  (Count non-zero bits)
  CNOT (Logically invert boolean condition in vector)
  NOT  (Bitwise invert vector)
  FABS (Floating-point absolute value)
  FNEG (Floating-point negate)

All operations are predicated and unary, e.g.
  clz  z0.s, p0/m, z1.s

- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
  and 64 bit elements.

- FABS and FNEG have variants for 16, 32 and 64 bit elements.

llvm-svn: 336677
2018-07-10 14:05:55 +00:00
Sander de Smalen d3efb59f29 [AArch64][SVE] Asm: Support for CNT(B|H|W|D) and CNTP instructions.
This patch adds support for the following instructions:

  CNTB CNTH - Determine the number of active elements implied by
  CNTW CNTD   the named predicate constant, multiplied by an
              immediate, e.g.

                cnth x0, vl8, #16

  CNTP      - Count active predicate elements, e.g.
                cntp  x0, p0, p1.b

              counts the number of active elements in p1, predicated
              by p0, and stores the result in x0.

llvm-svn: 336552
2018-07-09 15:22:08 +00:00
Sander de Smalen 813b21e33a [AArch64][SVE] Asm: Support for remaining shift instructions.
This patch completes support for shifts, which include:
- LSL   - Logical Shift Left
- LSLR  - Logical Shift Left, Reversed form
- LSR   - Logical Shift Right
- LSRR  - Logical Shift Right, Reversed form
- ASR   - Arithmetic Shift Right
- ASRR  - Arithmetic Shift Right, Reversed form
- ASRD  - Arithmetic Shift Right for Divide

In the following variants:

- Predicated shift by immediate - ASR, LSL, LSR, ASRD
  e.g.
    asr z0.h, p0/m, z0.h, #1

  (active lanes of z0 shifted by #1)

- Unpredicated shift by immediate - ASR, LSL*, LSR*
  e.g.
    asr z0.h, z1.h, #1

  (all lanes of z1 shifted by #1, stored in z0)

- Predicated shift by vector - ASR, LSL*, LSR*
  e.g.
    asr z0.h, p0/m, z0.h, z1.h

  (active lanes of z0 shifted by z1, stored in z0)

- Predicated shift by vector, reversed form - ASRR, LSLR, LSRR
  e.g.
    lslr z0.h, p0/m, z0.h, z1.h

  (active lanes of z1 shifted by z0, stored in z0)

- Predicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, p0/m, z0.h, z1.d

  (active lanes of z0 shifted by wide elements of vector z1)

- Unpredicated shift left/right by wide vector - ASR, LSL, LSR
  e.g.
    lsl z0.h, z1.h, z2.d

  (all lanes of z1 shifted by wide elements of z2, stored in z0)

*Variants added in previous patches.

llvm-svn: 336547
2018-07-09 13:23:41 +00:00
Sander de Smalen 54077dcfcb [AArch64][SVE] Asm: Support for TBL instruction.
Support for SVE's TBL instruction for programmable table
lookup/permute using vector of element indices, e.g.

  tbl  z0.d, { z1.d }, z2.d

stores elements from z1, indexed by elements from z2, into z0.

llvm-svn: 336544
2018-07-09 12:32:56 +00:00
Sander de Smalen c69944c6b0 [AArch64][SVE] Asm: Support for ADR instruction.
Supporting various addressing modes:
- adr z0.s, [z0.s, z0.s]
- adr z0.s, [z0.s, z0.s, lsl #<shift>]
- adr z0.d, [z0.d, z0.d]
- adr z0.d, [z0.d, z0.d, lsl #<shift>]
- adr z0.d, [z0.d, z0.d, uxtw #<shift>]
- adr z0.d, [z0.d, z0.d, sxtw #<shift>]

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D48870

llvm-svn: 336533
2018-07-09 09:58:24 +00:00
Sander de Smalen bd513b42a1 [AArch64][SVE] Asm: Support for UZP and TRN instructions.
This patch adds support for:
  UZP1  Concatenate even elements from two vectors
  UZP2  Concatenate  odd elements from two vectors
  TRN1  Interleave  even elements from two vectors
  TRN2  Interleave   odd elements from two vectors

With variants for both data and predicate vectors, e.g.
  uzp1    z0.b, z1.b, z2.b
  trn2    p0.s, p1.s, p2.s

llvm-svn: 336531
2018-07-09 09:12:17 +00:00
Yvan Roux a96a04558d [MachineOutliner] Assert that Liveness tracking is accurate (NFC)
The checking is done deeper inside MachineBasicBlock, but this will
hopefully help to find issues when porting the machine outliner to a
target where Liveness tracking is broken (like ARM).

Differential Revision: https://reviews.llvm.org/D49023

llvm-svn: 336481
2018-07-07 08:02:19 +00:00
Sjoerd Meijer 35bd8f5d1e [AArch64] Armv8.4-A: TLB support
This adds:
- outer shareable TLB Maintenance instructions, and
- TLB range maintenance instructions.

llvm-svn: 336434
2018-07-06 13:00:16 +00:00
Sjoerd Meijer a3dad801b7 Recommit: [AArch64] Armv8.4-A: Flag manipulation instructions
Now with the asm operand definition included.

llvm-svn: 336432
2018-07-06 12:32:33 +00:00
Sjoerd Meijer 8203177e5e Revert [AArch64] Armv8.4-A: Flag manipulation instructions
It's causing build errors.

llvm-svn: 336422
2018-07-06 08:39:43 +00:00
Sjoerd Meijer 6f5f6d5b2e [AArch64] Armv8.4-A: Flag manipulation instructions
These instructions are added to AArch64 only.

Differential Revision: https://reviews.llvm.org/D48926

llvm-svn: 336421
2018-07-06 08:12:20 +00:00
Sjoerd Meijer 2a57b357a3 [AArch64][ARM] Armv8.4-A: Trace synchronization barrier instruction
This adds the Armv8.4-A Trace synchronization barrier (TSB) instruction.

Differential Revision: https://reviews.llvm.org/D48918

llvm-svn: 336418
2018-07-06 08:03:12 +00:00
Sander de Smalen e2c10f8f47 This is a recommit of r336322, previously reverted in r336324 due to
a deficiency in TableGen that has been addressed in r336334.

[AArch64][SVE] Asm: Support for predicated FP rounding instructions.

This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336387
2018-07-05 20:21:21 +00:00
Simon Pilgrim 6dc45e6ca0 Try to fix -Wimplicit-fallthrough warning. NFCI.
llvm-svn: 336331
2018-07-05 09:48:01 +00:00
Sander de Smalen 097ab704c9 Reverting r336322 for now, as it causes an assert failure
in TableGen, for which there is already a patch in Phabricator
(D48937) that needs to be committed first.

llvm-svn: 336324
2018-07-05 08:52:03 +00:00
Sander de Smalen ef44226c4f [AArch64][SVE] Asm: Support for predicated FP rounding instructions.
This patch also adds instructions for predicated FP square-root and
reciprocal exponent.

The added instructions are:
- FRINTI  Round to integral value (current FPCR rounding mode)
- FRINTX  Round to integral value (current FPCR rounding mode, signalling inexact)
- FRINTA  Round to integral value (to nearest, with ties away from zero)
- FRINTN  Round to integral value (to nearest, with ties to even)
- FRINTZ  Round to integral value (toward zero)
- FRINTM  Round to integral value (toward minus Infinity)
- FRINTP  Round to integral value (toward plus Infinity)
- FSQRT   Floating-point square root
- FRECPX  Floating-point reciprocal exponent

llvm-svn: 336322
2018-07-05 08:38:30 +00:00
Sander de Smalen 592718f906 [AArch64][SVE] Asm: Support for signed/unsigned MIN/MAX/ABD
This patch implements the following varieties:

- Unpredicated signed max,   e.g. smax z0.h, z1.h, #-128
- Unpredicated signed min,   e.g. smin z0.h, z1.h, #-128

- Unpredicated unsigned max, e.g. umax z0.h, z1.h, #255
- Unpredicated unsigned min, e.g. umin z0.h, z1.h, #255

- Predicated signed max,     e.g. smax z0.h, p0/m, z0.h, z1.h
- Predicated signed min,     e.g. smin z0.h, p0/m, z0.h, z1.h
- Predicated signed abd,     e.g. sabd z0.h, p0/m, z0.h, z1.h

- Predicated unsigned max,   e.g. umax z0.h, p0/m, z0.h, z1.h
- Predicated unsigned min,   e.g. umin z0.h, p0/m, z0.h, z1.h
- Predicated unsigned abd,   e.g. uabd z0.h, p0/m, z0.h, z1.h

llvm-svn: 336317
2018-07-05 07:54:10 +00:00
Yvan Roux eaececf5e0 [MachineOutliner] Fix typo in getOutliningCandidateInfo function name
getOutlininingCandidateInfo -> getOutliningCandidateInfo

Differential Revision: https://reviews.llvm.org/D48867

llvm-svn: 336285
2018-07-04 15:37:08 +00:00
Sander de Smalen 1e4dc2e97d [AArch64][SVE] Asm: Support for reversed subtract (SUBR) instruction.
This patch adds both a vector and an immediate form, e.g.                  
                                                                           
- Vector form:                                                             
                                                                           
    subr z0.h, p0/m, z0.h, z1.h                                            
                                                                           
  subtract active elements of z0 from z1, and store the result in z0.      
                                                                           
- Immediate form:                                                          
                                                                           
    subr z0.h, z0.h, #255                                                  
                                                                           
  subtract elements of z0, and store the result in z0.

llvm-svn: 336274
2018-07-04 14:05:33 +00:00
Sander de Smalen ab2b0530d9 [AArch64][SVE] Asm: Support for instructions to set/read FFR.
Includes instructions to read the First-Faulting Register (FFR):
- RDFFR (unpredicated)
    rdffr   p0.b
- RDFFR (predicated)
    rdffr   p0.b, p0/z
- RDFFRS (predicated, sets condition flags)
    rdffr   p0.b, p0/z

Includes instructions to set/write the FFR:
- SETFFR (no arguments, sets the FFR to all true)
    setffr
- WRFFR  (unpredicated)
    wrffr   p0.b

llvm-svn: 336267
2018-07-04 12:58:46 +00:00
Sander de Smalen 80283b2af4 [AArch64][SVE] Asm: Support for FP conversion instructions.
The variants added are:

- fcvt   (FP convert precision)
- scvtf  (signed int -> FP) 
- ucvtf  (unsigned int -> FP) 
- fcvtzs (FP -> signed int (round to zero))
- fcvtzu (FP -> unsigned int (round to zero))

For example:
  fcvt   z0.h, p0/m, z0.s  (single- to half-precision FP) 
  scvtf  z0.h, p0/m, z0.s  (32-bit int to half-precision FP) 
  ucvtf  z0.h, p0/m, z0.s  (32-bit unsigned int to half-precision FP) 
  fcvtzs z0.s, p0/m, z0.h  (half-precision FP to 32-bit int)
  fcvtzu z0.s, p0/m, z0.h  (half-precision FP to 32-bit unsigned int)

llvm-svn: 336265
2018-07-04 12:13:17 +00:00
Sander de Smalen e31e6d46dd [AArch64][SVE] Asm: Support for SVE condition code aliases
SVE overloads the AArch64 PSTATE condition flags and introduces
a set of condition code aliases for the assembler. The 
details are described in section 2.2 of the architecture
reference manual supplement for SVE.

In short:

  SVE alias =>  AArch64 name
  --------------------------
  NONE      => EQ
  ANY       => NE
  NLAST     => HS
  LAST      => LO
  FIRST     => MI
  NFRST     => PL
  PMORE     => HI
  PLAST     => LS
  TCONT     => GE
  TSTOP     => LT

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48869

llvm-svn: 336245
2018-07-04 08:50:49 +00:00
Fangrui Song bc5c7f2ef0 [AArch64] Make function parameter names in declarations match those of definitions
llvm-svn: 336222
2018-07-03 19:07:53 +00:00
Sander de Smalen 128fdfa23f [AArch64][SVE] Asm: Support for FP Complex ADD/MLA.
The variants added in this patch are:

- Predicated Complex floating point ADD with rotate, e.g.

   fcadd   z0.h, p0/m, z0.h, z1.h, #90

- Predicated Complex floating point MLA with rotate, e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h, #180

- Unpredicated Complex floating point MLA with rotate (indexed operand), e.g.

   fcmla   z0.h, p0/m, z1.h, z2.h[0], #180

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48824

llvm-svn: 336210
2018-07-03 16:01:27 +00:00
Amara Emerson d912ffaba5 [AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores.
r336120 resulted in falling back to SelectionDAG more often due to the G_STORE
MMOs not matching the vreg size. This fixes that by explicitly any-extending the
value.

llvm-svn: 336209
2018-07-03 15:59:26 +00:00
Sander de Smalen 8cd1f53334 [AArch64][SVE] Asm: Support for FMUL (indexed)
Unpredicated FP-multiply of SVE vector with a vector-element given by
vector[index], for example:

  fmul z0.s, z1.s, z2.s[0]

which performs an unpredicated FP-multiply of all 32-bit elements in
'z1' with the first element from 'z2'.

This patch adds restricted register classes for SVE vectors:
  ZPR_3b (only z0..z7 are allowed)  - for indexed vector of 16/32-bit elements.
  ZPR_4b (only z0..z15 are allowed) - for indexed vector of 64-bit elements.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48823

llvm-svn: 336205
2018-07-03 15:31:04 +00:00
Sander de Smalen cbd224941f [AArch64][SVE] Asm: Support for predicated unary operations.
The patch includes support for the following instructions:

       ABS z0.h, p0/m, z0.h
       NEG z0.h, p0/m, z0.h

  (S|U)XTB z0.h, p0/m, z0.h
  (S|U)XTB z0.s, p0/m, z0.s
  (S|U)XTB z0.d, p0/m, z0.d

  (S|U)XTH z0.s, p0/m, z0.s
  (S|U)XTH z0.d, p0/m, z0.d

  (S|U)XTW z0.d, p0/m, z0.d

llvm-svn: 336204
2018-07-03 14:57:48 +00:00
Sjoerd Meijer 173b7f0ec7 [AArch64] Armv8.4-A: system registers
This adds the following system registers:
- RAS registers,
- MPAM registers,
- Activitiy monitor registers,
- Trace Extension registers,
- Timing insensitivity of data processing instructions,
- Enhanced Support for Nested Virtualization.

Differential Revision: https://reviews.llvm.org/D48871

llvm-svn: 336193
2018-07-03 12:09:20 +00:00
Sander de Smalen 7fc8543208 [AArch64][SVE] Asm: Support for saturing ADD/SUB instructions.
The variants added are:
    signed Saturating ADD/SUB (immediate)  e.g. sqadd z0.h, z0.h, #42
  unsigned Saturating ADD/SUB (immediate)  e.g. uqadd z0.h, z0.h, #42
    signed Saturating ADD/SUB (vectors)    e.g. sqadd z0.h, z0.h, z1.h
  unsigned Saturating ADD/SUB (vectors)    e.g. uqadd z0.h, z0.h, z1.h

llvm-svn: 336186
2018-07-03 09:48:22 +00:00
Sander de Smalen 8fcc3f5feb [AArch64][SVE] Asm: Support for vector element FP compare.
Contains the following variants:

- Compare with (elements from) other vector
  instructions: fcmeq, fcmgt, fcmge, fcmne, fcmuo.
  aliases: fcmle, fcmlt.

  e.g. fcmle   p0.h, p0/z, z0.h, z1.h => fcmge p0.h, p0/z, z1.h, z0.h

- Compare absolute values with (absolute values from) other vector.
  instructions: facge, facgt.
  aliases: facle, faclt.

  e.g. facle   p0.h, p0/z, z0.h, z1.h => facge   p0.h, p0/z, z1.h, z0.h

- Compare vector elements with #0.0
  instructions: fcmeq, fcmgt, fcmge, fcmle, fcmlt, fcmne.

  e.g. fcmle   p0.h, p0/z, z0.h, #0.0

llvm-svn: 336182
2018-07-03 09:07:23 +00:00
Amara Emerson 846f2436e8 [AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin.
We currently don't any-extend vararg parameters before storing them to the stack
locations on Darwin. However, SelectionDAG however does this, and so user code
is in the wild which inadvertently relies on this extension. This can manifest
in cases where the value stored is (int)0, but the actual parameter is interpreted
by va_arg as a pointer, and so not extending to 64 bits causes the callee to
load additional undefined bits.

llvm-svn: 336120
2018-07-02 16:39:09 +00:00
Sander de Smalen 8d4c01a702 [AArch64][SVE] Asm: Support for (SQ)INCP/DECP (scalar, vector)
Increments/decrements the result with the number of active bits
from the predicate.

The inc/dec variants added are:
- incp   x0, p0.h     (scalar)
- incp   z0.h, p0     (vector)

The unsigned saturating inc/dec variants added are:
- uqincp x0, p0.h     (scalar)
- uqincp w0, p0.h     (scalar, 32bit)
- uqincp z0.h, p0     (vector)

The signed saturating inc/dec variants added are:
- sqincp x0, p0.h     (scalar)
- sqincp x0, p0.h, w0 (scalar, 32bit)
- sqincp z0.h, p0     (vector)

llvm-svn: 336091
2018-07-02 10:08:36 +00:00
Sander de Smalen c504101781 [AArch64][SVE] Asm: Support for (saturating) vector INC/DEC instructions.
Increment/decrement vector by multiple of predicate constraint
element count.

The variants added by this patch are:
 - INCH, INCW, INC 

and (saturating):
 - SQINCH, SQINCW, SQINCD
 - UQINCH, UQINCW, UQINCW
 - SQDECH, SQINCW, SQINCD
 - UQDECH, UQINCW, UQINCW

For example:
  incw z0.s, all, mul #4

llvm-svn: 336090
2018-07-02 09:31:11 +00:00
Sander de Smalen 8eea4f1c7d [AArch64][SVE] Asm: Support for vector element compares (immediate).
Compare vector elements with a signed/unsigned immediate, e.g.
  cmpgt   p0.s, p0/z, z0.s, #-16
  cmphi   p0.s, p0/z, z0.s, #127

llvm-svn: 336081
2018-07-02 08:20:59 +00:00
Sander de Smalen 0325e304b9 Reapply r334980 and r334983.
These patches were previously reverted as they led to 
buildbot time-outs caused by large switch statement in
printAliasInstr when using UBSan and O3.  The issue has
been addressed with a workaround (r335525).

llvm-svn: 336079
2018-07-02 07:34:52 +00:00
Sjoerd Meijer 3b599d75d5 [AArch64] Armv8.4-A: Virtualization system registers
This adds the Secure EL2 extension.

Differential Revision: https://reviews.llvm.org/D48711

llvm-svn: 335962
2018-06-29 11:03:15 +00:00
Sjoerd Meijer 195e904002 [ARM][AArch64] Armv8.4-A Enablement
Initial patch adding assembly support for Armv8.4-A.

Besides adding v8.4 as a supported architecture to the usual places, this also
adds target features for the different crypto algorithms. Armv8.4-A introduced
new crypto algorithms, made them optional, and allows different combinations:

- none of the v8.4 crypto functions are supported, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0
  SHA1 and SHA2 instructions must also be implemented.
- the v8.4 SM3 and SM4 support is implemented, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1
  and SHA2 instructions must also be implemented.

The v8.4 crypto instructions are added to AArch64 only, and not AArch32,
and are made optional extensions to Armv8.2-A.

The user-facing Clang options will map on these new target features, their
naming will be compatible with GCC and added in follow-up patches.

The Armv8.4-A instruction sets can be downloaded here:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Differential Revision: https://reviews.llvm.org/D48625

llvm-svn: 335953
2018-06-29 08:43:19 +00:00
Jessica Paquette dafa198c96 [MachineOutliner] Define MachineOutliner support in TargetOptions
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.

After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.

https://reviews.llvm.org/D48683

llvm-svn: 335887
2018-06-28 17:45:43 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Daniel Sanders bdeb880d14 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Jessica Paquette f472f6159a [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.

This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Luke Geeson 316327150b [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Luke Geeson 68cb233c0f [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Sirish Pande b60acb9e48 Revert "[AArch64] Coalesce Copy Zero during instruction selection"
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessmizes MachineSinking by introducing a
copy, such that mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only mov instruction. This works well
if we have only one fall through branch as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

llvm-svn: 335251
2018-06-21 16:05:24 +00:00
Tim Northover 70666e7765 [AArch64] Implement FLT_ROUNDS macro.
Very similar to ARM implementation, just maps to an MRS.

Should fix PR25191.

Patch by Michael Brase.

llvm-svn: 335118
2018-06-20 12:09:01 +00:00
Vlad Tsyrklevich 98724e582e Revert r334980 and 334983
This reverts commits r334980 and r334983 because they were causing build
timeouts on the x86_64-linux-ubsan bot.

llvm-svn: 335085
2018-06-20 00:02:32 +00:00
Jessica Paquette 32de26d432 [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.

This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

llvm-svn: 335076
2018-06-19 21:14:48 +00:00
Sander de Smalen 067eee1c13 [AArch64][SVE] Asm: Fix predicate pattern diagnostics.
This patch uses the DiagnosticPredicate for SVE predicate patterns
to improve their diagnostics, now giving a 'invalid operand' diagnostic
if the type is not an immediate or one of the expected pattern
labels.

Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48220

llvm-svn: 334983
2018-06-18 21:03:02 +00:00
Sander de Smalen 7ac9e193ec [AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions.
The variants added by this patch are:
- SQINC     signed increment, e.g. sqinc x0, w0, all, mul #4
- SQDEC     signed decrement, e.g. sqdec x0, w0, all, mul #4
- UQINC   unsigned increment, e.g. uqinc w0, all, mul #4
- UQDEC   unsigned decrement, e.g. uqdec w0, all, mul #4
 
This patch includes asmparser changes to parse a GPR64 as a GPR32 in
order to satisfy the constraint check:
  x0 == GPR64(w0)
in:
  sqinc x0, w0, all, mul #4
         ^___^ (must match)

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47716

llvm-svn: 334980
2018-06-18 20:50:33 +00:00
Sander de Smalen 13684d8400 [AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions.
Summary:
The variants added by this patch are:
- SQINC  (signed increment)
- UQINC  (unsigned increment)
- SQDEC  (signed decrement)
- UQDEC  (unsigned decrement)

For example:
  uqincw  x0, all, mul #4

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Differential Revision: https://reviews.llvm.org/D47715

llvm-svn: 334948
2018-06-18 14:47:52 +00:00
Sander de Smalen d521c4353e [AArch64][SVE] Asm: Support for vector element compares.
This patch adds instructions for comparing elements from two vectors, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.s

and also adds support for comparing to a 64-bit wide element vector, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.d

The patch also contains aliases for certain comparisons, e.g.:
  cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
  cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
  cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
  cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s

llvm-svn: 334931
2018-06-18 10:59:19 +00:00
Sander de Smalen 279b7e74e7 [AArch64][SVE] Asm: Support for bitwise operations on predicate vectors.
This patch adds support for instructions performing bitwise operations
on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and
their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS.

This patch also adds several aliases:

  orr  p0.b, p1/z, p1.b, p1.b  => mov  p0.b, p1.b
  orrs p0.b, p1/z, p1.b, p1.b  => movs p0.b, p1.b

  and  p0.b, p1/z, p2.b, p2.b  => mov  p0.b, p1/z, p2.b
  ands p0.b, p1/z, p2.b, p2.b  => movs p0.b, p1/z, p2.b

  eor  p0.b, p1/z, p2.b, p1.b  => not  p0.b, p1/z, p2.b
  eors p0.b, p1/z, p2.b, p1.b  => nots p0.b, p1/z, p2.b

llvm-svn: 334906
2018-06-17 10:48:21 +00:00
Sander de Smalen 2c25b4cd36 [AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions.
Support for SVE's predicated select instructions to select elements
from either vector, both in a data-vector and a predicate-vector
variant.

llvm-svn: 334905
2018-06-17 10:11:04 +00:00
Sander de Smalen a6edca72ba [AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions.
Predicated splat/copy of SIMD/FP register or general purpose
register to SVE vector, along with MOV-aliases.

llvm-svn: 334842
2018-06-15 16:39:46 +00:00
Sander de Smalen 18ac8f9f25 [AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions.
Increment/decrement scalar register by (scaled) element count given by
predicate pattern, e.g. 'incw x0, all, mul #4'.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47713

llvm-svn: 334838
2018-06-15 15:47:44 +00:00
Sander de Smalen 5eb51d7495 [AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47712

llvm-svn: 334831
2018-06-15 13:57:51 +00:00
Sander de Smalen 3cbf171479 [AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates.
Some instructions require of a limited set of FP immediates as operands,
for example '#0.5 or #1.0' for SVE's FADD instruction.

This patch adds support for parsing and printing such FP immediates as
exact values (e.g. #0.499999 is not accepted for #0.5).

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47711

llvm-svn: 334826
2018-06-15 13:11:49 +00:00
Clement Courbet 5eeed77f87 [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles.
Summary:
For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files.
For more obvious cases, I've ventured a fix.

Some notes:
 - Exynos is especially fishy.
 - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice.
 - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit.

Also see PR37310.

Reviewers: RKSimon, craig.topper, javed.absar

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 334586
2018-06-13 09:41:49 +00:00
Petr Hosek 7250908016 [AArch64] Support reserving x20 register
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.

Differential Revision: https://reviews.llvm.org/D46552

llvm-svn: 334531
2018-06-12 20:00:50 +00:00
Luke Geeson dc82aa44e6 [AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns
llvm-svn: 334488
2018-06-12 09:35:20 +00:00
Clement Courbet f4f6899cdf [ExynosM1][Sched] Fix resource usage in scheduling model.
This is part of https://reviews.llvm.org/D46356.

llvm-svn: 334391
2018-06-11 07:33:08 +00:00
Evandro Menezes b2c8244715 [AArch64, ARM] Add support for Samsung Exynos M4
Create a separate feature set for Exynos M4 and add test cases.

llvm-svn: 334115
2018-06-06 18:56:00 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Jessica Paquette aa087327ce [MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h
This is setting up to fix bug 37573 cleanly.

This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.

Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.

Also, as a side-effect, it makes the outliner a bit cleaner.

https://bugs.llvm.org/show_bug.cgi?id=37573

llvm-svn: 333952
2018-06-04 21:14:16 +00:00
Nicolai Haehnle 01d261f18d TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
Luke Geeson 43e4367961 [AArch64] Audit on rL333634 to fix FP16 Disasm BitPatterns
llvm-svn: 333879
2018-06-04 09:41:32 +00:00
Sander de Smalen d0a6f6a502 [AArch64][SVE] Fix range for DUP immediates (16bit elts)
For immediates used in DUP instructions that have the range
-128 to 127, or a multiple of 256 in the range -32768 to 32512,
one could argue that when the result element size is 16bits (.h),
the value can be considered both signed and unsigned.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47619

llvm-svn: 333873
2018-06-04 07:24:23 +00:00
Sander de Smalen fd54a781f6 [AArch64][SVE] Asm: Print indexed element 0 as FPR.
Print the first indexed element as a FP register, for example:

  mov z0.d, z1.d[0]

Is now printed as:

  mov z0.d, d1

Next to printing, this patch also adds aliases to parse 'mov z0.d, d1'.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47571

llvm-svn: 333872
2018-06-04 07:07:35 +00:00
Sander de Smalen c33d668ab7 [AArch64][SVE] Asm: Support for indexed DUP instructions.
Unpredicated copy of indexed SVE element to SVE vector,
along with MOV-aliases.

For example:

  dup     z0.h, z1.h[0]

duplicates the first 16-bit element from z1 to all elements in
the result vector z0.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47570

llvm-svn: 333871
2018-06-04 06:40:55 +00:00
Sander de Smalen 367a53b059 [AArch64][SVE] Asm: Support for FCPY immediate instructions.
Predicated copy of floating-point immediate value to SVE vector,
along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47518

llvm-svn: 333869
2018-06-04 05:58:06 +00:00
Sander de Smalen 512d57f1a5 [AArch64][SVE] Asm: Support for CPY immediate instructions
Predicated copy of possibly shifted immediate value into SVE
vector, along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47517

llvm-svn: 333868
2018-06-04 05:40:46 +00:00
Amara Emerson 5a3bb68e12 [AArch64][GlobalISel] Zero-extend s1 values when returning.
Before we were relying on the any extend of the s1 to s32, but
for AAPCS we need to zero-extend it to at least s8.

Fixes PR36719

Differential Revision: https://reviews.llvm.org/D47425

llvm-svn: 333747
2018-06-01 13:20:32 +00:00
Sander de Smalen f95ea047e5 [AArch64][SVE] Asm: Support for FDUP_ZI (copy fp immediate) instruction.
Unpredicated copy of floating-point immediate value into SVE vector,
along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47482

llvm-svn: 333744
2018-06-01 12:54:46 +00:00
Sander de Smalen 97ca6b9e09 [AArch64][SVE] Asm: Support for DUPM (masked immediate) instruction.
Unpredicated copy of repeating immediate pattern to SVE vector, along
with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47328

llvm-svn: 333731
2018-06-01 07:25:46 +00:00
Francis Visoiu Mistrih 90aba024c5 [MC] Fallback on DWARF when generating compact unwind on AArch64
Instead of asserting when using the def_cfa directive with a register
different from fp, fallback on DWARF.

Easily triggered with:

.cfi_def_cfa x1, 32;

rdar://40249694

Differential Revision: https://reviews.llvm.org/D47593

llvm-svn: 333667
2018-05-31 16:33:26 +00:00
Luke Geeson 2e09995d42 [AArch64] Reverted rL333427 fixing Clang UnitTest Failure
llvm-svn: 333634
2018-05-31 08:27:53 +00:00
Roman Tereshin 5a65eb75c7 [GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...)
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333618
2018-05-31 01:56:05 +00:00
Roman Tereshin 8f1753e994 [GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch and test its output

Reviewers: aemerson, qcolombet

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46338

llvm-svn: 333597
2018-05-30 22:10:04 +00:00
Tim Northover d8949f5002 AArch64: print correct annotation for ADRP addresses.
The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the
actual PC-offset that will be calculated.

llvm-svn: 333525
2018-05-30 09:54:59 +00:00
Sander de Smalen bdf09fe7a2 [AArch64][AsmParser] Fix segfault on illegal fpimm.
Floating point immediate combining a negative sign and
a hexadecimal number, e.g. #-0x0  caused the compiler to crash.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47483

llvm-svn: 333524
2018-05-30 09:54:19 +00:00
Evandro Menezes f8425340e4 [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429
2018-05-29 15:58:50 +00:00
Amara Emerson d5a9e7bbc9 Revert "[AArch64] added FP16 vcvth intrinsic support"
This reverts commit r333410 due to bot failures.

llvm-svn: 333427
2018-05-29 15:34:22 +00:00
Sander de Smalen 8704b03c4d [AArch64][SVE] Asm: Support for predicated LSL/LSR (vectors)
Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47365

llvm-svn: 333422
2018-05-29 14:40:24 +00:00
Sander de Smalen 26b9b2a8c3 [AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions.
This patch addresses the following variants:
  - bitmask immediate,         e.g. 'and z0.d, z0.d, #0x6'.
  - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'.
  - predicated data vectors,   e.g. 'and z0.d, p0/m, z0.d, z1.d'.

And also several aliases, such as: 
  - ORN, alias of ORR.
  - EON, alias of EOR.
  - BIC, alias of AND (immediate variant)
  - MOV, alias of ORR (if unpredicated and source register operands are the same)

Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47363

llvm-svn: 333414
2018-05-29 13:08:43 +00:00
Luke Geeson 16092ab3c5 [AArch64] added FP16 vcvth intrinsic support
Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705

Reviewers: javed.absar, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: llvm-commits, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46311

llvm-svn: 333410
2018-05-29 11:40:33 +00:00
Sander de Smalen 98686c6b15 [AArch64][SVE] Asm: Support for ADD (immediate) instructions.
This patch adds addsub_imm8_opt_lsl_(i8|i16|i32|i64) operands
that are unsigned values in the range 0 to 255. For element widths of
16 bits or higher it may also be a signed multiple of 256 in the
range 0 to 65280.

Note: This also does some refactoring to reuse convenience function
getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be
accepted to be mapped to SUB #4096.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47310

llvm-svn: 333408
2018-05-29 10:39:49 +00:00
Sander de Smalen 6e2a5b4cf0 Fix ubsan errors introduced by r333263 re. left-shifting negative values.
llvm-svn: 333270
2018-05-25 11:41:04 +00:00
Sander de Smalen 62770795a5 [AArch64][SVE] Asm: Support for DUP (immediate) instructions.
Unpredicated copy of optionally-shifted immediate to SVE vector,
along with MOV-aliases.

This patch contains parsing and printing support for
cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in
the range -128 to +127. For element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512.
For element-width of 8 bits a range of -128 to 255 is accepted, since a copy
of a byte can be considered either signed/unsigned.

Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift()
and moves the behaviour of trying to shift a plain immediate by an allowed
shift-value to its addImmWithOptionalShiftOperands() method, so that the
parsing itself is generic and allows immediates from multiple shifted operands.
This is done because an immediate can be divisible by both shifted operands.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47309

llvm-svn: 333263
2018-05-25 09:47:52 +00:00
Eli Friedman 9e177882aa [AArch64] Improve orr+movk sequences for MOVi64imm.
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK.  The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.

Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.

Differential Revision: https://reviews.llvm.org/D47176

llvm-svn: 333218
2018-05-24 19:38:23 +00:00
Geoff Berry 98150e3a62 [AArch64] Take advantage of variable shift/rotate amount implicit mod operation.
Summary:
Optimize code generated for variable shifts/rotates by taking advantage
of the implicit and/mod done on the variable shift amount register.

Resolves bug 27582 and bug 37421.

Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46844

llvm-svn: 333214
2018-05-24 18:29:42 +00:00
Chad Rosier 3f66363139 [CodeGen][AArch64] Use RegUnits to track register aliases. (NFC)
Use RegUnits to track register aliases in AArch64RedundantCopyElimination.

Differential Revision: https://reviews.llvm.org/D47269

llvm-svn: 333107
2018-05-23 17:49:38 +00:00
Alex Bradbury 0a59f18951 [AArch64] Use addAliasForDirective to support data directives
The AArch64 asm parser currently has custom parsing logic for .hword, .word, 
and .xword. Rather than use this custom logic, we can just use 
addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue.

Differential Revision: https://reviews.llvm.org/D47000

llvm-svn: 333077
2018-05-23 11:17:20 +00:00
Eli Friedman 785acce51d Delete unused variable from r333015.
(The assertion suppressed the unused variable warning on
Release+Asserts builds, so I didn't notice.)

llvm-svn: 333018
2018-05-22 19:38:07 +00:00
Eli Friedman 042dc9e092 [MachineOutliner] Add "thunk" outlining for AArch64.
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.

In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.

To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.

Differential Revision: https://reviews.llvm.org/D47173

llvm-svn: 333015
2018-05-22 19:11:06 +00:00
Roman Lebedev 7772de25d0 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Peter Collingbourne dcd7d6c331 MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Peter Collingbourne 571a3301ae MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Peter Collingbourne e3f652973e Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Peter Collingbourne f7b81db715 MC: Change the streamer ctors to take an object writer instead of a stream. NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47050

llvm-svn: 332749
2018-05-18 18:26:45 +00:00
Clement Courbet 8892c7db08 [ExynosM3] Fix scheduling info.
Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 332713
2018-05-18 13:10:41 +00:00
Eli Friedman 4081a57af7 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Sander de Smalen 75cfa34156 [AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+scalar) store instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46680

llvm-svn: 332584
2018-05-17 09:05:41 +00:00
Eli Friedman ddbf6d6514 [MachineOutliner] Don't outline instructions that modify SP.
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.

Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.

Differential Revision: https://reviews.llvm.org/D46920

llvm-svn: 332529
2018-05-16 21:20:16 +00:00
Eli Friedman 02709bcb78 [MachineOutliner] Don't save/restore LR for tail calls.
The cost computation assumes we do this correctly, but the actual
lowering was wrong.

Differential Revision: https://reviews.llvm.org/D46923

llvm-svn: 332514
2018-05-16 19:49:01 +00:00
Sander de Smalen 22176a2242 [AArch64][SVE] Improve diagnostics for vectors with incorrect element-size.
For regular SVE vector operands, this patch introduces a more
sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b).

For example:
  add z0.s, z1.s, z2.b      -> invalid element width
               ^_____^
               mismatch

For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes
a slightly different approach and instead returns a 'invalid operand'
if the element size is not as expected. This is because the diagnostics
are more specificied to suggest using the right shift/extend suffix. This
is a trade-off not to introduce more operand classes and still provide
useful diagnostics for LD1 and PRF instructions.

For example:
  ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw)'
  ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand
          ^________________^
               mismatch

For gather prefetches, both 'z0.s' and 'z0.d' would be allowed:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw) #2'
  prfw #0, p0, [x0, z0.d]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Without this change, the diagnostic would unnecessarily suggest a
different element size:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46688

llvm-svn: 332483
2018-05-16 15:45:17 +00:00
Sirish Pande cabe50a308 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

llvm-svn: 332482
2018-05-16 15:36:52 +00:00
Sander de Smalen bbc4e9a4e3 [AArch64][SVE] Asm: Support for gather PRF prefetch instructions
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46686

llvm-svn: 332472
2018-05-16 14:16:01 +00:00
Amara Emerson 0d6a26dffc [GlobalISel][IRTranslator] Split aggregates during IR translation.
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.

This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.

A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.

Patch is based on the original work of Tim Northover.

Differential Revision: https://reviews.llvm.org/D46018

llvm-svn: 332449
2018-05-16 10:32:02 +00:00
Peter Smith c811758da6 [AArch64] Support "S" inline assembler constraint
This patch re-introduces the "S" inline assembler constraint. This matches
an absolute symbolic address or a label reference. The primary use case is

asm("adrp %0, %1\n\t"
    "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var));

I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.

Fixes PR37180

Differential Revision: https://reviews.llvm.org/D46745

llvm-svn: 332444
2018-05-16 09:33:25 +00:00
Sander de Smalen a680f558be [AArch64][SVE] Asm: Support for structured LD2, LD3 and LD4 (scalar+scalar) load instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46679

llvm-svn: 332442
2018-05-16 09:16:20 +00:00
Sander de Smalen 67f9154964 [AArch64][SVE] Asm: Support for contiguous PRF prefetch instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46682

llvm-svn: 332433
2018-05-16 07:50:09 +00:00
Evandro Menezes 8d522d811a [AArch64] Improve single vector lane unscaled stores
When storing the 0th lane of a vector, use a simpler and usually more
efficient scalar store instead.  In this case, also using the unscaled
offset.

Differential revision: https://reviews.llvm.org/D46762

llvm-svn: 332394
2018-05-15 20:41:12 +00:00
Evandro Menezes 14fa2e4fa5 [AArch64] Improve single vector lane stores
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Differential revision: https://reviews.llvm.org/D46655

llvm-svn: 332251
2018-05-14 15:26:35 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Sander de Smalen 93380371bb [AArch64][SVE] Extend parsing of Prefetch operation for SVE.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46681

llvm-svn: 332234
2018-05-14 11:54:41 +00:00
Geoff Berry 60460268c0 [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Haicheng Wu 0aae2bc260 [CGP] Split large data structres to sink more GEPs
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.

This patch tries to split a large data struct starting from %base like the
following.

Before:
BB0:
  %base     =

BB1:
  %gep0     = gep %base, off0
  %gep1     = gep %base, off1
  %gep2     = gep %base, off2

BB2:
  %load1    = load %gep0
  %load2    = load %gep1
  %load3    = load %gep2

After:
BB0:
  %base     =
  %new_base = gep %base, off0

BB1:
  %new_gep0 = %new_base
  %new_gep1 = gep %new_base, off1 - off0
  %new_gep2 = gep %new_base, off2 - off0

BB2:
  %load1    = load i32, i32* %new_gep0
  %load2    = load i32, i32* %new_gep1
  %load3    = load i32, i32* %new_gep2

In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.

The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.

Differential Revision: https://reviews.llvm.org/D42759

llvm-svn: 332015
2018-05-10 18:27:36 +00:00
Adhemerval Zanella f384bc7166 [AArch64] Improve cost of vector division by constant
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32).  The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.

Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.

llvm-svn: 331873
2018-05-09 12:48:22 +00:00
Daniel Sanders 618437459c Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Shiva Chen 801bf7ebbe [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Daniel Sanders d24dcdd1f7 [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Sander de Smalen d8e76494fc [AArch64][SVE] Asm: Support for LD1R load-and-replicate scalar instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46251

llvm-svn: 331758
2018-05-08 10:46:55 +00:00
Sander de Smalen 20eede7093 [AArch64] Disallow vector operand if FPR128 Q register is required.
Patch https://reviews.llvm.org/D41445 changed the behaviour of 'isReg()'
to also return 'true' if the parsed register operand is a vector
register. Code in the AsmMatcher checks if a register is a subclass of the
expected register class. However, even though both parsed registers map
to the same physical register, the 'v' register is of kind 'NeonVector',
where 'q' is of type Scalar, where isSubclass() does not distinguish
between the two cases.

The solution is to use an AsmOperand instead of the register directly,
and use the PredicateMethod to distinguish the two operands.

This fixes for example:
  ldr v0, [x0]    // 'v0' is an invalid operand for this instruction
  ldr q0, [x0]    // valid

Reviewers: aemerson, Gerolf, SjoerdMeijer, javed.absar

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46310

llvm-svn: 331755
2018-05-08 10:01:04 +00:00
Daniel Sanders f84bc3793e [globalisel] Update GlobalISel emitter to match new representation of extending loads
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601
2018-05-05 20:53:24 +00:00
Craig Topper 781aa181ab Fix a bunch of places where operator-> was used directly on the return from dyn_cast.
Inspired by r331508, I did a grep and found these.

Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.

llvm-svn: 331577
2018-05-05 01:57:00 +00:00
Michael Berg 7acc81b744 Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

llvm-svn: 331547
2018-05-04 18:48:20 +00:00
Adhemerval Zanella a57ef17ab6 [AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32
This patch adds a custom lowering for ISD::MULH{S,U} used on divide by
constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV).

New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL
can be correctly lowered to smull2 and umull2.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46009

llvm-svn: 331522
2018-05-04 14:33:55 +00:00
Martin Storsjo d0b5034b8a [COFF, ARM64] Hook up a few remaining relocations
Differential Revision: https://reviews.llvm.org/D46355

llvm-svn: 331384
2018-05-02 18:24:37 +00:00
Sander de Smalen 659a48cd38 [AArch64][SVE] Asm: Support for LDR/STR fill and spill instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46270

llvm-svn: 331352
2018-05-02 13:32:39 +00:00
Sander de Smalen 57da042e32 [AArch64][SVE] Asm: Support for scatter ST1 store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46248

llvm-svn: 331349
2018-05-02 13:00:30 +00:00
Sander de Smalen 414d2358a4 [AArch64][SVE] Asm: Support for non-temporal, contiguous LDNT1/STNT1 load/store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46269

llvm-svn: 331343
2018-05-02 11:48:49 +00:00
Sander de Smalen c1e44bdfc7 [AArch64][SVE] Asm: Support for LD1RQ load-and-replicate quad-word vector instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46250

llvm-svn: 331339
2018-05-02 08:49:08 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Sander de Smalen 788dc70c78 [AArch64][SVE] Asm: Support for contiguous ST1 (scalar+scalar) store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46121

llvm-svn: 331260
2018-05-01 13:36:03 +00:00
Daniel Sanders 2de9d4ad5d Fix infinite loop after r331115
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.

llvm-svn: 331201
2018-04-30 17:20:01 +00:00
Sander de Smalen 5861c263e0 [AArch64][SVE] Asm: Improve diagnostics for gather loads.
This patch extends the 'isSVEVectorRegWithShiftExtend' function to 
improve diagnostics for SVE's gather load (scalar + vector) addressing 
modes. Instead of always suggesting the 'unscaled' addressing mode, 
the use of DiagnosticPredicate enables a more specific error message
in the context where the scaling is incorrect. For example:

  ld1h z0.d, p0/z, [x0, z0.d, lsl #2]
                                   ^ 
           shift amount should be '1'

Instead of suggesting the packed, unscaled addressing mode:
  expected 'z[0..31].d, (uxtw|sxtw)'

the assembler now suggests using the proper scaling:
  expected 'z[0..31].d, (lsl|uxtw|sxtw) #1'

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46124

llvm-svn: 331162
2018-04-30 07:24:38 +00:00
Sander de Smalen afe1ee2180 [AArch64][AsmParser] NFC: Cleanup of addOperands functions
Most of the add<operandname>Operands() functions are the same
and can be replaced by using a single 'RenderMethod' in
the AArch64InstrFormats.td file. Since many of the scaled
immediates (with different scaling/bits) are the same, most of
these can reuse the same AsmOperandClass.

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46122

llvm-svn: 331146
2018-04-29 18:18:21 +00:00
Sander de Smalen 50ded90072 [AArch64][SVE] Asm: Support for gather LD1/LDFF1 (vector + imm) load instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46120

llvm-svn: 331145
2018-04-29 17:33:38 +00:00
Daniel Sanders 5eb9f581b6 [globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.

Depends on D45466

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45540

llvm-svn: 331115
2018-04-28 18:14:50 +00:00
Jessica Paquette 0b6724917a [MachineOutliner] Add defs to calls + don't track liveness on outlined functions
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.

Outlined calls shouldn't break liveness assumptions that someone might make.

This also un-XFAILs the noredzone test, and updates the calls test.

llvm-svn: 331095
2018-04-27 23:36:35 +00:00
Daniel Sanders 27fe8a5011 [globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.

narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45466

llvm-svn: 331071
2018-04-27 19:48:53 +00:00
Jun Bum Lim 47aece1344 [CodeGen] Use RegUnits to track register aliases (NFC)
Summary: Use RegUnits to track register aliases in PostRASink and AArch64LoadStoreOptimizer.

Reviewers: thegameg, mcrosier, gberry, qcolombet, sebpop, MatzeB, t.p.northover, javed.absar

Reviewed By: thegameg, sebpop

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45695

llvm-svn: 331066
2018-04-27 18:44:37 +00:00
Francis Visoiu Mistrih c855e92ca9 [AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.

This didn't work in case no frame was needed and resulted in code size
regressions.

llvm-svn: 331044
2018-04-27 15:30:54 +00:00
Oliver Stannard 76088a5929 [AArch64] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.

Differential revisioon: https://reviews.llvm.org/D46107

llvm-svn: 331036
2018-04-27 13:45:32 +00:00
Eli Friedman da018e5687 [MachineOutliner] Don't outline from functions with a section marking.
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.

I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.

Differential Revision: https://reviews.llvm.org/D46091

llvm-svn: 331007
2018-04-27 00:21:34 +00:00
Geoff Berry 08ab8c9544 [AArch64] Fix scavenged spill slot base when stack realignment required.
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.

Based on problem and patch reported by @paulwalker-arm

This is an alternative to solution proposed in D45770

Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar

Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46063

llvm-svn: 330976
2018-04-26 18:50:45 +00:00
Matthew Simpson b4096ebe26 [TTI, AArch64] Add transpose shuffle kind
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.

Differential Revision: https://reviews.llvm.org/D45982

llvm-svn: 330941
2018-04-26 13:48:33 +00:00
Sander de Smalen fe17a78b86 [AArch64][SVE] Enable DiagnosticPredicates for SVE LD1 instructions.
This patch extends the PredicateMethod of AsmOperands used in SVE's 
LD1 instructions with a DiagnosticPredicate. This makes them 'context
sensitive' to the operand that has been parsed and tells the user to
use the right register (with expected shift/extend), rather than telling 
the immediate is out of range when it actually parsed a register.

Patch [2/2] in a series to improve assembler diagnostics for SVE:
-  Patch [1/2]: https://reviews.llvm.org/D45879
-  Patch [2/2]: https://reviews.llvm.org/D45880

Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45880

llvm-svn: 330934
2018-04-26 12:54:42 +00:00
Sander de Smalen 74f9e6720b [AArch64][SVE] Asm: Support for gather LD1/LDFF1 (scalar + vector) load instructions.
Patch [2/3] in series to add support for SVE's gather load instructions
that use scalar+vector addressing modes:
- Patch [1/3]: https://reviews.llvm.org/D45951
- Patch [2/3]: https://reviews.llvm.org/D46023
- Patch [3/3]: https://reviews.llvm.org/D45958

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46023

llvm-svn: 330928
2018-04-26 08:19:53 +00:00
Amara Emerson 1f5d994119 [AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic.
rdar://38674040

llvm-svn: 330831
2018-04-25 14:43:59 +00:00
Sander de Smalen eb896b148b [AArch64][SVE] Asm: Add AsmOperand classes for SVE gather/scatter addressing modes.
This patch adds parsing support for 'vector + shift/extend' and
corresponding asm operand classes, needed for implementing SVE's
gather/scatter addressing modes.

The added combinations of vector (ZPR) and Shift/Extend are:

Unscaled:
  ZPR64ExtLSL8:           signed 64-bit offsets  (z0.d)
  ZPR32ExtUXTW8:        unsigned 32-bit offsets  (z0.s, uxtw)
  ZPR32ExtSXTW8:          signed 32-bit offsets  (z0.s, sxtw)

Unpacked and unscaled:
  ZPR64ExtUXTW8:        unsigned 32-bit offsets  (z0.d, uxtw)
  ZPR64ExtSXTW8:          signed 32-bit offsets  (z0.d, sxtw)

Unpacked and scaled:
  ZPR64ExtUXTW<scale>:  unsigned 32-bit offsets  (z0.d, uxtw #<shift>)
  ZPR64ExtSXTW<scale>:    signed 32-bit offsets  (z0.d, sxtw #<shift>)

Scaled:
  ZPR32ExtUXTW<scale>:  unsigned 32-bit offsets  (z0.s, uxtw #<shift>)
  ZPR32ExtSXTW<scale>:    signed 32-bit offsets  (z0.s, sxtw #<shift>)
  ZPR64ExtLSL<scale>:   unsigned 64-bit offsets  (z0.d,  lsl #<shift>)
  ZPR64ExtLSL<scale>:     signed 64-bit offsets  (z0.d,  lsl #<shift>)


Patch [1/3] in series to add support for SVE's gather load instructions
that use scalar+vector addressing modes:
- Patch [1/3]: https://reviews.llvm.org/D45951
- Patch [2/3]: https://reviews.llvm.org/D46023
- Patch [3/3]: https://reviews.llvm.org/D45958

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45951

llvm-svn: 330805
2018-04-25 09:26:47 +00:00
Jessica Paquette 4f56428de1 [MachineOutliner] Check for explicit uses of LR/W30 in MI operands
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.

This also adds a test that ensures that those sorts of ADRPs won't be outlined.

llvm-svn: 330783
2018-04-24 22:38:15 +00:00
Sander de Smalen eb1053f9d3 [AArch64][SVE] Asm: Support for contiguous, first-faulting LDFF1 (scalar+scalar) load instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45946

llvm-svn: 330697
2018-04-24 08:59:08 +00:00
Peter Collingbourne 5ab4a4793e Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure.
This reland includes a check to prevent the DAG combiner from folding an
offset that is smaller than the existing one. This can cause oscillations
between two possible DAGs, which was the cause of the hang and later assertion
failure observed on the lnt-ctmark-aarch64-O3-flto bot.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

Original commit message:
> This is a code size win in code that takes offseted addresses
> frequently, such as C++ constructors that typically need to compute
> an offseted address of a vtable. This reduces the size of Chromium
> for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 330630
2018-04-23 19:09:34 +00:00
Nico Weber 5d53aed419 Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt
llvm-svn: 330584
2018-04-23 12:49:34 +00:00
Sander de Smalen 7893f722b2 [AArch64][SVE] Asm: Support for contiguous, non-faulting LDNF1 (scalar+imm) load instructions
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45684

llvm-svn: 330583
2018-04-23 12:43:19 +00:00
Sander de Smalen 1b6d374422 [AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+imm) store instructions.
Reviewers: fhahn, rengolin, javed.absar, SjoerdMeijer, t.p.northover, echristo, evandro, huntergr

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45681

llvm-svn: 330565
2018-04-23 07:50:35 +00:00
Jessica Paquette 2e5ada5c81 [MachineOutliner] Change B instruction for tail calls to TCRETURNdi
First off, this is more correct than having the B. Second off, this was making
a bot upset. This fixes that.

Update the test to include -verify-machineinstrs as well to prevent stuff like
this slipping by non debug/assert builds in the future.

llvm-svn: 330459
2018-04-20 18:03:21 +00:00
Sander de Smalen 30f9f11d51 [AArch64][SVE] Asm: Support for contiguous LD1 (scalar+scalar) load instructions.
This is patch [4/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45690

llvm-svn: 330423
2018-04-20 12:52:01 +00:00
Sander de Smalen 137efb231e [AArch64][SVE] Fix diagnostic for SVE LD4 instructions:
Diagnostic:
  'index must be multiple of 3 in range [-32, 28]'

Must be:
  'index must be multiple of 4 in range [-32, 28]'

llvm-svn: 330407
2018-04-20 09:45:50 +00:00
Sander de Smalen 367694b093 [AArch64][SVE] Added GPR64shifted and GPR64NoXZRshifted register classes.
Summary:
This is patch [3/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: SjoerdMeijer

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45689

llvm-svn: 330406
2018-04-20 08:54:49 +00:00
Sander de Smalen 149916d29a [AArch64][AsmParser] Extend RegOp with integrated 'shift/extend'.
Summary:
In some cases the shift/extend needs to be explicitly parsed together
with the register, rather than as a separate operand. This is needed
for addressing modes where the instruction as a whole dictates the
scaling/extend, rather than specific bits in the instruction.
By parsing them as a single operand, we avoid the need to pass an
extra operand in all CodeGen patterns (because all operands need to
have an associated value), and we avoid the need to update TableGen to
accept operands that have no associated bits in the instruction.

An added benefit of parsing them together is that the assembler
can give a sensible diagnostic if the scaling is not correct.

This is patch [2/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn, SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45688

llvm-svn: 330394
2018-04-20 07:24:20 +00:00
Sander de Smalen 50d8702f26 [AArch64][AsmParser] NFC: Cleanup parsing of scalar registers.
Summary:
- Renamed tryParseRegister to tryParseScalarRegister, which
  now returns an OperandMatchResultTy.
- Moved matching of certain aliases into matchRegisterNameAlias.
- Changed type of most 'Reg' variables to 'unsigned'.

This is patch [1/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro, samparker

Reviewed By: samparker

Subscribers: samparker, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45687

llvm-svn: 330311
2018-04-19 07:35:08 +00:00
Amara Emerson 9de072f8ae [AArch64] Add isel pattern for v8i8->v2f32 NVCASTs.
rdar://39454635

llvm-svn: 330276
2018-04-18 17:10:19 +00:00
Sander de Smalen 7a210db81e [AArch64][SVE] Asm: Support for structured LD4 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45624

llvm-svn: 330120
2018-04-16 10:46:18 +00:00
Sander de Smalen d239eb3ce3 [AArch64][SVE] Asm: Support for structured LD3 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45623

llvm-svn: 330116
2018-04-16 10:10:48 +00:00
Sander de Smalen f836af869d [AArch64][SVE] Asm: Support for structured LD2 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45622

llvm-svn: 330108
2018-04-16 07:09:29 +00:00
Tim Northover 271d3d2771 MachO: trap unreachable instructions
Debugability is more important than saving 4 bytes to let us to fall
through to nonense.

llvm-svn: 330073
2018-04-13 22:25:20 +00:00
Peter Collingbourne ab40e44ba1 Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses."
Caused a hang and eventually an assertion failure in LTO builds
of 7zip-benchmark on aarch64 iOS targets.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

llvm-svn: 330063
2018-04-13 20:21:00 +00:00
Sander de Smalen 5b12db5d23 [AArch64][SVE] Asm: Support for contiguous LD1 (scalar+imm) load instructions
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45618

llvm-svn: 330024
2018-04-13 14:41:36 +00:00
Sander de Smalen 5c62598b0d [AArch64][SVE] Asm: Support for contiguous ST1 (scalar+imm) store instructions.
Summary:
Added instructions for contiguous stores, ST1, with scalar+imm addressing
modes and corresponding tests. The patch also adds parsing of
'mul vl' as needed for the VL-scaled immediate.

This is patch [6/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45432

llvm-svn: 330014
2018-04-13 12:56:14 +00:00
Sander de Smalen ea626e37ba [AArch64][SVE] Asm: Add support for parsing and printing SVE vector lists.
Summary:
Added Z_(b|h|s|d) vector list RegisterOperands along with support to
add/print the vector lists.

This is patch [5/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45431

llvm-svn: 330000
2018-04-13 09:11:53 +00:00
Peter Collingbourne 00db326b0d AArch64: Introduce a DAG combine for folding offsets into addresses.
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. This reduces the size of Chromium
for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329956
2018-04-12 21:23:55 +00:00
Jessica Paquette 8aa6cd5cb9 [AArch64] Move AFI->setRedZone(false) to top of emitPrologue
AFI->setRedZone(false) was put in the wrong place before, and so it only fired
on functions that didn't have stack frames. This moves that to the top of
emitPrologue to make sure that every function without a redzone has it set
correctly.

This also adds a function representing one of the early exit cases (GHC calling
convention) to the MachineOutliner noredzone test to ensure that we can outline
from functions like these, where we never use a redzone.

llvm-svn: 329922
2018-04-12 16:16:18 +00:00
Sander de Smalen 525e3225c2 [AArch64][AsmParser] Unify 'addVectorListOperands' functions.
Summary:
Merged 'addVectorList64Operands' and 'addVectorList128Operands' into a
generic 'addVectorListOperands', which can be easily extended to work
for SVE vectors.

This is patch [4/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45430

llvm-svn: 329909
2018-04-12 13:19:32 +00:00
Sander de Smalen 650234ba36 [AArch64][AsmParser] Make parse function for VectorLists generic to other vector types.
Summary:
Added 'RegisterKind' to the VectorListOp structure, so that this operand 
type can be reused for SVE vector lists in a later patch. It also
refactors the 'tryParseVectorList' function so it can be used directly
in the ParserMethod of an operand. The parsing can now parse multiple 
kinds of vectors and recover if there is no match.

This is patch [3/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45429

llvm-svn: 329900
2018-04-12 11:40:52 +00:00
Sander de Smalen c88f9a1a57 [AArch64][AsmParser] Split index parsing from vector list.
Summary:
Place parsing of a vector index into a separate function to reduce
duplication, since the code is duplicated in both the parsing of a
Neon vector register operand and a Neon vector list.

This is patch [2/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45428

llvm-svn: 329809
2018-04-11 14:10:37 +00:00
Francis Visoiu Mistrih 6463922e3a [AArch64] Fix regression after r329691
In r329691, we would choose FP even if the offset wouldn't fit, just
because the offset is smaller than the one from BP. This made many
accesses through FP need to scavenge a register, which resulted in
slower and bigger code for no good reason.

This patch now always picks the offset that fits first, even if FP is
preferred.

llvm-svn: 329797
2018-04-11 12:36:55 +00:00
Sander de Smalen 73937b7c9d [AArch64][AsmParser] Unify code for parsing Neon/SVE vectors.
Summary:
Merged 'tryMatchVectorRegister' (specific to Neon) and
'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and
created a generic 'parseVectorKind()' function that returns the #Elements
and ElementWidth of a vector suffix. This reduces the duplication of
this functionality between two the vector implementations.

This is patch [1/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45427

llvm-svn: 329782
2018-04-11 07:36:10 +00:00
Geoff Berry 5696e075c3 [AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.

Reviewers: mcrosier

Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45502

llvm-svn: 329761
2018-04-10 21:43:03 +00:00
Jessica Paquette a450ed2352 Recommit r329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation"
This commit fixes the bot failures that were coming up before with r329716.

The fix was to move the check for "isInSection()" inside of the if condition
and emit the error there instead of waiting to get past the unreachable statement.

This should work in debug and release builds now.

llvm-svn: 329746
2018-04-10 19:46:43 +00:00
Amara Emerson e27d5016ef [AArch64] Fix isel failure when BUILD_PAIR nodes are left over.
rdar://39175175

llvm-svn: 329743
2018-04-10 19:01:58 +00:00
Jessica Paquette c140bbddaf Revert 329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation"
This broke a bunch of bots so I'm reverting while I figure it out.

llvm-svn: 329728
2018-04-10 17:53:41 +00:00
Peter Collingbourne a7d936f0c0 Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF."
Caused a build failure in check-tsan.

llvm-svn: 329718
2018-04-10 16:19:30 +00:00
Jessica Paquette e4b90d82a0 Add missing nullptr check to AArch64MachObjectWriter::recordRelocation
There was missing nullptr check before a call to getSection() in
recordRelocation. This would result in a segfault in code like the attached
test.

This adds the missing check and a test which makes sure we get the expected 
error output.

llvm-svn: 329716
2018-04-10 15:53:28 +00:00
Chad Rosier af7519e9af Fix spelling. NFC.
llvm-svn: 329709
2018-04-10 14:57:13 +00:00
Francis Visoiu Mistrih f2c22050e8 [AArch64] Use FP to access the emergency spill slot
In the presence of variable-sized stack objects, we always picked the
base pointer when resolving frame indices if it was available.

This makes us hit an assert where we can't reach the emergency spill
slot if it's too far away from the base pointer. Since on AArch64 we
decide to place the emergency spill slot at the top of the frame, it
makes more sense to use FP to access it.

The changes here don't affect only emergency spill slots but all the
frame indices. The goal here is to try to choose between FP, BP and SP
so that we minimize the offset and avoid scavenging, or worse, asserting
when trying to access a slot allocated by the scavenger.

Previously discussed here: https://reviews.llvm.org/D40876.

Differential Revision: https://reviews.llvm.org/D45358

llvm-svn: 329691
2018-04-10 11:29:40 +00:00
Tim Northover 6a1c51bf6b AArch64: diagnose unpredictable store-exclusive instructions
Much like any written register in load/store instructions, the status register
is not allowed to overlap with any others. So diagnose it like we already do
with the other cases.

llvm-svn: 329687
2018-04-10 11:04:29 +00:00
Sander de Smalen f974e255fe [AArch64][SVE] Asm: Add support for unpredicated LSL/LSR (shift by immediate) instructions.
Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro

Reviewed By: rengolin, fhahn

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45371

llvm-svn: 329681
2018-04-10 10:03:13 +00:00
Sander de Smalen 30fda45c18 [AArch64][SVE] Asm: Add support for SVE INDEX instructions.
Reviewers: rengolin, fhahn, javed.absar, SjoerdMeijer, huntergr, t.p.northover, echristo, evandro

Reviewed By: rengolin, fhahn

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45370

llvm-svn: 329674
2018-04-10 07:01:53 +00:00
Daniel Sanders 5281b02e84 [globalisel][legalizerinfo] Add support for the Lower action in getActionDefinitionsBuilder() and use it in AArch64.
Lower is slightly odd. It often doesn't change the type but the lowerings
do use the new type to decide what code to create. Treat it like a mutation
but provide convenience functions that re-use the existing type.

Re-uses the existing tests:
test/CodeGen/AArch64/GlobalISel/legalize-rem.mir
test/CodeGen/AArch64/GlobalISel//legalize-mul.mir
test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir

llvm-svn: 329623
2018-04-09 21:10:09 +00:00
Peter Collingbourne 5cff2409ae AArch64: Allow offsets to be folded into addresses with ELF.
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. It reduces the size of Chromium for
Android's .text section by 46KB, or 56KB with ThinLTO (which exposes
more opportunities to use a direct access rather than a GOT access).

Because the addend range is limited in COFF and Mach-O, this is
enabled for ELF only.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329611
2018-04-09 19:59:57 +00:00
Hiroshi Inoue 9ff2380ea6 [NFC] fix trivial typos in comments and error message
"is is" -> "is", "are are" -> "are"

llvm-svn: 329546
2018-04-09 04:37:53 +00:00
Sanjay Patel 0d7df36c66 [TargetSchedule] shrink interface for init(); NFCI
The TargetSchedModel is always initialized using the TargetSubtargetInfo's 
MCSchedModel and TargetInstrInfo, so we don't need to extract those and 
pass 3 parameters to init().

Differential Revision: https://reviews.llvm.org/D44789

llvm-svn: 329540
2018-04-08 19:56:04 +00:00
Peter Collingbourne f11eb3ebe7 AArch64: Implement support for the shadowcallstack attribute.
The implementation of shadow call stack on aarch64 is quite different to
the implementation on x86_64. Instead of reserving a segment register for
the shadow call stack, we reserve the platform register, x18. Any function
that spills lr to sp also spills it to the shadow call stack, a pointer to
which is stored in x18.

Differential Revision: https://reviews.llvm.org/D45239

llvm-svn: 329236
2018-04-04 21:55:44 +00:00
Jessica Paquette bccd18b816 [MachineOutliner] Add `useMachineOutliner` target hook
The MachineOutliner has a bunch of target hooks that will call llvm_unreachable
if the target doesn't implement them. Therefore, if you enable the outliner on
such a target, it'll just crash. It'd be much better if it'd just *not* run
the outliner at all in this case.

This commit adds a hook to TargetInstrInfo that returns false by default.
Targets that implement the hook make it return true. The outliner checks the
return value of this hook to decide whether or not to continue.

llvm-svn: 329220
2018-04-04 19:13:31 +00:00
Mandeep Singh Grang 93ab79d205 [AArch64] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches.

Reviewers: t.p.northover, jmolloy, RKSimon, rengolin

Reviewed By: rengolin

Subscribers: dexonsmith, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D44853

llvm-svn: 329216
2018-04-04 18:20:28 +00:00
Nico Weber 1cbd096914 Sort targetgen calls in lib/Target/*/CMakeLists.
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.

Also remove some stale-looking comments from the Nios2 target cmakefile.

No intended behavior change.

llvm-svn: 329181
2018-04-04 12:37:44 +00:00
John Brawn 21d9b33d62 [AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y)
Differential Revision: https://reviews.llvm.org/D44573

llvm-svn: 329163
2018-04-04 10:12:53 +00:00
Evandro Menezes 6b8d8f4010 [AArch64] Adjust the cost model for Exynos M3
Fix typo and simplify matching expression.

llvm-svn: 329130
2018-04-03 22:57:17 +00:00
Jessica Paquette 642f6c61a3 [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.

This removes the requirement to pass -mno-red-zone when outlining for AArch64.

https://reviews.llvm.org/D45189

llvm-svn: 329120
2018-04-03 21:56:10 +00:00
Petr Hosek 934e5d5436 [AArch64] Reserve x18 register on Fuchsia
This register is reserved as a platform register on Fuchsia.

Differential Revision: https://reviews.llvm.org/D45105

llvm-svn: 328950
2018-04-01 23:44:04 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie 8ad9a97310 Plumb useAA through TargetTransformInfo to remove Transforms->CodeGen header dependency
Thanks to echristo for the pointers on direction.

llvm-svn: 328737
2018-03-28 22:28:50 +00:00
Jessica Paquette 4aa14dbcc2 [MachineOutliner] Simplify call outlining + require valid callee save info for call outlining
This commit simplifies the call outlining logic by removing references to the
Function associated with the callee. To do this, it requires that valid
callee save info is available to the outliner.

llvm-svn: 328719
2018-03-28 17:52:31 +00:00
Jessica Paquette 2519ee7081 [MachineOutliner] AArch64: Don't outline ADRPs with un-outlinable operands
If an ADRP appears with, say, a CPI operand, we shouldn't outline it.

This moves the check for unsafe operands so that it occurs before the special-case
for ADRPs. Also add a test for outlining ADRPs.

llvm-svn: 328674
2018-03-27 22:23:48 +00:00
Rafael Auler d058b882be [AArch64] Decorate AArch64 instrs with OPERAND_PCREL
Summary:
This is a canonical way to teach objdump to print the target
symbols for branches when disassembling AArch64 code.

Reviewers: evandro, t.p.northover, espindola

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D44851

llvm-svn: 328638
2018-03-27 16:58:01 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
David Blaikie 6054e650ff Move TargetLoweringObjectFile from CodeGen to Target to fix layering
It's implemented in Target & include from other Target headers, so the
header should be in Target.

llvm-svn: 328392
2018-03-23 23:58:19 +00:00
John Brawn e3b44f9de6 [AArch64] Don't reduce the width of loads if it prevents combining a shift
Loads and stores can only shift the offset register by the size of the value
being loaded, but currently the DAGCombiner will reduce the width of the load
if it's followed by a trunc making it impossible to later combine the shift.

Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and
make it prevent the width reduction if this is what would happen, though do
allow it if reducing the load width will let us eliminate a later sign or zero
extend.

Differential Revision: https://reviews.llvm.org/D44794

llvm-svn: 328321
2018-03-23 14:47:07 +00:00
Florian Hahn 588e640ea1 [AArch64] Clean-up a few over-eager regexps in models.
Patch by Simon Pilgrim <llvm-dev@redking.me.uk>

That is a slightly modified version of the AArch64 changes from
Simon's D44687 .

llvm-svn: 328303
2018-03-23 11:00:42 +00:00
Michael Zolotukhin fab7a676c2 State that CFG is preserved in 'Falkor HW Prefetch Fix Late Phase'.
That removes some redundant recomputations from the passes pipeline.

llvm-svn: 328272
2018-03-22 23:44:40 +00:00
Jun Bum Lim 2ecb7ba4c6 [CodeGen] Add a new pass for PostRA sink
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated).  This avoids executing the
the copy on paths where their results aren't needed.  This also exposes
additional opportunites for dead copy elimination and shrink wrapping.

These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..

For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.

```
   %bb.0:
      %wzr = SUBSWri %w1, 1
      %w19 = COPY %w0
      Bcc 11, %bb.2
    %bb.1:
      Live Ins: %w19
      BL @fun
      %w0 = ADDWrr %w0, %w19
      RET %w0
    %bb.2:
      %w0 = COPY %wzr
      RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.

With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted  in spec2000/2006/2017 on AArch64.

Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz

Reviewed By: sebpop

Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41463

llvm-svn: 328237
2018-03-22 20:06:47 +00:00
Evandro Menezes 36afbee1d8 [AArch64] Adjust the cost model for Exynos M3
Fix typo in the number of integer dividers.

llvm-svn: 328027
2018-03-20 20:00:29 +00:00
Geoff Berry 0b64402adb [AArch64][Falkor] Correct load/store increment scheduling details
llvm-svn: 327982
2018-03-20 13:46:35 +00:00
Jessica Paquette 563548d8f3 [MachineOutliner] AArch64: Emit CFI instructions when outlining calls
When outlining calls, the outliner needs to update CFI to ensure that, say,
exception handling works. This commit adds that functionality and adds a test
just for call outlining.

Call outlining stuff in machine-outliner.mir should be moved into
machine-outliner-calls.mir in a later commit.

llvm-svn: 327917
2018-03-19 22:48:40 +00:00
Martin Storsjo 9a55c1b0dc [ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes
This extends the use of this attribute on ARM and AArch64 from
SVN r325900 (where it was only checked for fixed stack
allocations on ARM/AArch64, but for all stack allocations on X86).

This also adds a testcase for the existing use of disabling the
fixed stack probe with the attribute on ARM and AArch64.

Differential Revision: https://reviews.llvm.org/D44291

llvm-svn: 327897
2018-03-19 20:06:50 +00:00
Nicolai Haehnle 4186cc7c08 TableGen: Check the dynamic type of !cast<Rec>(string)
Summary:
The docs already claim that this happens, but so far it hasn't. As a
consequence, existing TableGen files get this wrong a lot, but luckily
the fixes are all reasonably straightforward.

To make this work with all the existing forms of self-references (since
the true type of a record is only built up over time), the lookup of
self-references in !cast is delayed until the final resolving step.

Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164

Reviewers: arsenm, craig.topper, tra, MartinO

Subscribers: wdng, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D44475

llvm-svn: 327849
2018-03-19 14:14:20 +00:00
Craig Topper 75aeb62eb4 [AArch64] Fix a few InstRWs in the A53 scheduler model and enable FullInstRWOverlapCheck.
This fixes the errors found by the new check added in r327808.

llvm-svn: 327812
2018-03-18 22:16:53 +00:00
Craig Topper e1d6a4df1c [TableGen] When trying to reuse a scheduler class for instructions from an InstRW, make sure we haven't already seen another InstRW containing this instruction on this CPU.
This is similar to the check later when we remap some of the instructions from one class to a new one. But if we reuse the class we don't get to do that check.

So many CPUs have violations of this check that I had to add a flag to the SchedMachineModel to allow it to be disabled. Hopefully we can get those cleaned up quickly and remove this flag.

A lot of the violations are due to overlapping regular expressions, but that's not the only kind of issue it found.

llvm-svn: 327808
2018-03-18 19:56:15 +00:00
Martin Storsjo 36d6419cc5 [AArch64] Skip an unnecessary getCopyToReg in DYNAMIC_STACKALLOC
Differential Revision: https://reviews.llvm.org/D44586

llvm-svn: 327779
2018-03-17 20:08:48 +00:00
Jessica Paquette b3e7dc9144 [MachineOutliner] Make KILLs invisible
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.

llvm-svn: 327760
2018-03-16 22:53:34 +00:00
Matthew Simpson eacfefd056 [AArch64] Implement getArithmeticReductionCost
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.

Differential Revision: https://reviews.llvm.org/D44490

llvm-svn: 327702
2018-03-16 11:34:15 +00:00
Evandro Menezes d4254ac1b9 [AArch64] Adjust the cost model for Exynos M3
Fix typo.

llvm-svn: 327663
2018-03-15 20:37:32 +00:00
Evandro Menezes 5303f897d4 [AArch64] Adjust the cost model for Exynos M3
Add special case for rotate right.

llvm-svn: 327662
2018-03-15 20:31:25 +00:00
Evandro Menezes 1515e859c6 [AArch64] Adjust the cost model for Exynos M3
Increase the number of cheap as move cases of register reset.

llvm-svn: 327661
2018-03-15 20:31:13 +00:00
Francis Visoiu Mistrih 164560bd74 [AArch64] Emit CSR loads in the same order as stores
Optionally allow the order of restoring the callee-saved registers in the
epilogue to be reversed.

The flag -reverse-csr-restore-seq generates the following code:

```
stp     x26, x25, [sp, #-64]!
stp     x24, x23, [sp, #16]
stp     x22, x21, [sp, #32]
stp     x20, x19, [sp, #48]

; [..]

ldp     x24, x23, [sp, #16]
ldp     x22, x21, [sp, #32]
ldp     x20, x19, [sp, #48]
ldp     x26, x25, [sp], #64
ret
```

Note how the CSRs are restored in the same order as they are saved.

One exception to this rule is the last `ldp`, which allows us to merge
the stack adjustment and the ldp into a post-index ldp. This is done by
first generating:

ldp x26, x27, [sp]
add sp, sp, #64

which gets merged by the arm64 load store optimizer into

ldp x26, x25, [sp], #64

The flag is disabled by default.

llvm-svn: 327569
2018-03-14 20:34:03 +00:00
Francis Visoiu Mistrih 084e7d8770 [AArch64] Keep track of MIFlags in the LoadStoreOptimizer
Merging:

* $x26, $x25 = frame-setup LDPXi $sp, 0
* $sp = frame-destroy ADDXri $sp, 64, 0

into an LDPXpost should preserve the flags from both instructions as
following:

* frame-setup frame-destroy LDPXpost

Differential Revision: https://reviews.llvm.org/D44446

llvm-svn: 327533
2018-03-14 17:10:58 +00:00
Martin Storsjo bde677289a [AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC
Support for this relocation is missing in both LLD and GNU binutils
at the moment.

This reverts the ELF parts of SVN r327316.

llvm-svn: 327503
2018-03-14 13:09:10 +00:00
Martin Storsjo 7bc64bd889 [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
Differential Revision: https://reviews.llvm.org/D44355

llvm-svn: 327316
2018-03-12 18:47:43 +00:00
Martin Storsjo cc24096d4d [AArch64] Implement native TLS for Windows
Differential Revision: https://reviews.llvm.org/D43971

llvm-svn: 327220
2018-03-10 19:05:21 +00:00
Weiming Zhao a4259cd3a6 [AArch64] Fix UB about shift amount exceeds data bit-width
Summary:
Fixes an UB caught by sanitizer. The shift amount might be larger than 32 so the operand should be 1ULL.
In this patch,  we replace the original expression with  existing API with uint64_t type.

Reviewers: eli.friedman, rengolin

Reviewed By: rengolin

Subscribers: rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D44234

llvm-svn: 326969
2018-03-08 00:28:25 +00:00
Rafael Espindola 06c064824e Delete code that is probably dead since r249303.
With r249303 the expression evaluation should expand variables that
are not in sections (and so don't have an atom).

llvm-svn: 326966
2018-03-08 00:17:13 +00:00
Evandro Menezes f9bd871d32 [AArch64] Adjust the cost of integer vector division
Since there is no instruction for integer vector division, factor in the
cost of singling out each element to be used with the scalar division
instruction.

Differential revision: https://reviews.llvm.org/D43974

llvm-svn: 326955
2018-03-07 22:35:32 +00:00
Sebastian Pop 33bdb3e0e6 [AArch64] add missing pattern for insert_subvector undef
The attached testcase started failing after the patch to define
isExtractSubvectorCheap with the following pattern mismatch:

ISEL: Starting pattern match
  Initial Opcode index to 85068
    Match failed at index 85076
    LLVM ERROR: Cannot select: t47: v8i16 = insert_subvector undef:v8i16, t43, Constant:i64<0>

The code generated from llvm/lib/Target/AArch64/AArch64InstrInfo.td

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i32 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

is in ninja/lib/Target/AArch64/AArch64GenDAGISel.inc
At the location of the error it is:
/* 85076*/    OPC_CheckChild2Type, MVT::i32,

And it failed to match the type of operand 2.
Adding another def-pat for i64 fixes the failed def-pat error:

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i64 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

llvm-svn: 326949
2018-03-07 22:07:13 +00:00
Sebastian Pop 41073e8046 [AArch64] define isExtractSubvectorCheap
Following the ARM-neon backend, define isExtractSubvectorCheap to return true
when extracting low and high part of a neon register.

The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This
testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive"
all DAG transforms until ISelLowering. The testcase is supposed to check that
AArch64TargetLowering::ReconstructShuffle() works, and for that we need a
BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into
an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to
ISelLowering. As there is no way to disable the combiner to only exercise the
code in ISelLowering, the patch disables the testcase.

Differential revision: https://reviews.llvm.org/D43973

llvm-svn: 326811
2018-03-06 16:54:55 +00:00
Sebastian Pop ac0bfb5938 fix PR36582
The error occurs when reading i16 elements (as in the testcase) from a v8i8
with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the
operation is not a VUZP. The patch stops the pattern recognition of VUZP when
EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR.

llvm-svn: 326722
2018-03-05 17:35:49 +00:00
Evandro Menezes cd855f70c5 [AArch64] Improve code generation of constant vectors
Use the whole gammut of constant immediates available to set up a vector.
Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which
transfers between register files, use the more efficient `movi v0.4s, #-1`
instead.  Not limited to just a few values, but any immediate value that can
be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating
the need to there be patterns to optimize special cases.

Differential revision: https://reviews.llvm.org/D42133

llvm-svn: 326718
2018-03-05 17:02:47 +00:00
Evandro Menezes 2bbb4a7c93 [AArch64] Clean up code (NFC)
Clean up a couple of functions in `AArch64TargetLowering` by removing
redundant statements.

llvm-svn: 326486
2018-03-01 21:17:36 +00:00
Martin Storsjo c61ff3bef1 [AArch64] Add support for secrel add/load/store relocations for COFF
Differential Revision: https://reviews.llvm.org/D43288

llvm-svn: 326480
2018-03-01 20:42:28 +00:00
Sebastian Pop c33af715d7 [AArch64] generate vuzp instead of mov
when a BUILD_VECTOR is created out of a sequence of EXTRACT_VECTOR_ELT with a
specific pattern sequence, either <0, 2, 4, ...> or <1, 3, 5, ...>, replace the
BUILD_VECTOR with either vuzp1 or vuzp2.

With this patch LLVM generates the following code for the first function fun1 in the testcase:
adrp x8, .LCPI0_0
ldr  q0, [x8, :lo12:.LCPI0_0]
tbl  v0.16b, { v0.16b }, v0.16b
ext  v1.16b, v0.16b, v0.16b, #8
uzp1 v0.8b, v0.8b, v1.8b
str  d0, [x8]
ret

Without this patch LLVM currently generates this code:
adrp    x8, .LCPI0_0
ldr     q0, [x8, :lo12:.LCPI0_0]
tbl     v0.16b, { v0.16b }, v0.16b
mov     v1.16b, v0.16b
mov     v1.b[1], v0.b[2]
mov     v1.b[2], v0.b[4]
mov     v1.b[3], v0.b[6]
mov     v1.b[4], v0.b[8]
mov     v1.b[5], v0.b[10]
mov     v1.b[6], v0.b[12]
mov     v1.b[7], v0.b[14]
str     d1, [x8]
ret

llvm-svn: 326443
2018-03-01 15:47:39 +00:00
Chih-Hung Hsieh 9f9e4681ac [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00
Aditya Nandakumar 599990530e [GISel]: Don't assert when constraining RegisterOperands which are uses.
Currently we assert that only non target specific opcodes can have
missing RegisterClass constraints in the MCDesc. The backend can have
instructions with register operands but don't have RegisterClass
constraints (say using unknown_class) in which case the instruction
defining the register will constrain it.
Change the assert to only fire if a def has no regclass.

https://reviews.llvm.org/D43409

llvm-svn: 326142
2018-02-26 22:56:21 +00:00
Evandro Menezes 1afffac05b [PATCH] [AArch64] Add new target feature to fuse conditional select
This feature enables the fusion of the comparison and the conditional select
instructions together.

Differential revision: https://reviews.llvm.org/D42392

llvm-svn: 325939
2018-02-23 19:27:43 +00:00
Geoff Berry f8bf2ec0a8 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Hans Wennborg 89c35fc44d Support for the mno-stack-arg-probe flag
Adds support for this flag. There is also another piece for clang
(separate review). More info:
https://bugs.llvm.org/show_bug.cgi?id=36221

By Ruslan Nikolaev!

Differential Revision: https://reviews.llvm.org/D43107

llvm-svn: 325900
2018-02-23 13:46:25 +00:00
Evandro Menezes 5c986b010b [AArch64] Refactor macro fusion (NFC)
Move checks for each fusion case into separate functions for better
legibility and maintainability.

Differential revision: https://reviews.llvm.org/D43649

llvm-svn: 325844
2018-02-23 00:14:39 +00:00
Evandro Menezes 72f3983633 [AArch64] Refactor instructions using SIMD immediates
Get rid of icky goto loops and make the code easier to maintain.  Otherwise,
NFC.

Restore r324903 and fix PR36369.

Differentail revision: https://reviews.llvm.org/D43364

llvm-svn: 325621
2018-02-20 20:31:45 +00:00
Amara Emerson db211892ed [AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.
This is a follow on commit to r[x] where we fix the other direction of copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.

https://reviews.llvm.org/D43444

llvm-svn: 325550
2018-02-20 05:11:57 +00:00
Francis Visoiu Mistrih 68ced40a23 Revert "[CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print"
This reverts commit r324681.

llvm-svn: 325505
2018-02-19 15:08:49 +00:00
Amara Emerson 242efdb54b Fix unused assertion variable warning.
llvm-svn: 325464
2018-02-18 17:28:34 +00:00
Amara Emerson 7e9f348b2d [AArch64][GlobalISel] Fix an assert fail/miscompile when fp16 types are copied
to gpr register banks.

PR36345.

rdar://36478867

Differential Revision: https://reviews.llvm.org/D43310

llvm-svn: 325463
2018-02-18 17:10:49 +00:00
Amara Emerson bc03baef77 [AArch64][GlobalISel] Support G_INSERT/G_EXTRACT of types < s32 bits.
These are needed for operations on fp16 types in a later patch.

llvm-svn: 325462
2018-02-18 17:03:02 +00:00
Haicheng Wu aed6e52b3c [AArch64] Coalesce Copy Zero during instruction selection
Add special case for copy of zero to avoid a double copy.

Differential Revision: https://reviews.llvm.org/D36104

llvm-svn: 325459
2018-02-18 13:51:33 +00:00
Martin Storsjo a63a5b993e [AArch64] Implement dynamic stack probing for windows
This makes sure that alloca() function calls properly probe the
stack as needed.

Differential Revision: https://reviews.llvm.org/D42356

llvm-svn: 325433
2018-02-17 14:26:32 +00:00
Simon Pilgrim 63db669013 Fix unused variable warning. NFCI.
We were casting to AArch64InstrInfo but only using it for static methods which some compilers complain about.

llvm-svn: 325432
2018-02-17 13:48:23 +00:00
Evandro Menezes 10ae20d80c [AArch64] Fix BITCAST lowering crash
The data type is assumed to be a vector, but sometimes it is not, leading
to an assertion.

Add simple test-case to verify this.

Differential revision: https://reviews.llvm.org/D42599

llvm-svn: 325378
2018-02-16 20:00:57 +00:00
Daniel Sanders 7fc87360e9 [globalisel][legalizerinfo] Follow up on post-commit review comments after r323681
* Document most API's
* Delete a useless function call
* Fix a discrepancy between the single and multi-opcode variants of
  getActionDefinitions().
  The multi-opcode variant now requires that more than one opcode is requested.
  Previously it acted much like the single-opcode form but unnecessarily
  enforced the requirements of the multi-opcode form.

llvm-svn: 325067
2018-02-13 23:02:44 +00:00
Hans Wennborg f381e94ac8 Revert r324903 "[AArch64] Refactor identification of SIMD immediates"
It caused "Cannot select: t33: f64 = AArch64ISD::FMOV Constant:i32<0>"
in Chromium builds. See PR36369.

> Get rid of icky goto loops and make the code easier to maintain (NFC).
>
> Differential revision: https://reviews.llvm.org/D42723

llvm-svn: 325034
2018-02-13 18:14:38 +00:00
Abderrazek Zaafrani e72d99261f [AArch64] Fixes for ARMv8.2-A FP16 scalar intrinsic - llvm portion
https://reviews.llvm.org/D42993

llvm-svn: 324912
2018-02-12 17:35:42 +00:00
Oliver Stannard 02f08c9d1f [AArch64] Improve v8.1-A code-gen for atomic load-and
Armv8.1-A added an atomic load-clear instruction (which performs bitwise
and with the complement of it's operand), but not a load-and
instruction. Our current code-generation for atomic load-and always
inserts an MVN instruction to invert its argument, even if it could be
folded into a constant or another instruction.

This adds lowering early in selection DAG to convert a load-and
operation into an xor with -1 and a load-clear, allowing the normal DAG
optimisations to work on it.

To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't
see any easy way to do this with an AArch64-specific ISD node, because
the code-generation for atomic operations assumes the SDNodes are of
type AtomicSDNode.

I've left the old tablegen patterns in because they are still needed for
global isel.

Differential revision: https://reviews.llvm.org/D42478

llvm-svn: 324908
2018-02-12 17:03:11 +00:00
Evandro Menezes 7dc0f1ec45 [AArch64] Refactor identification of SIMD immediates
Get rid of icky goto loops and make the code easier to maintain (NFC).

Differential revision: https://reviews.llvm.org/D42723

llvm-svn: 324903
2018-02-12 16:41:41 +00:00
Oliver Stannard 4269917304 [AArch64] Improve v8.1-A code-gen for atomic load-subtract
Armv8.1-A added an atomic load-add instruction, but not a load-subtract
instruction. Our current code-generation for atomic load-subtract always
inserts a NEG instruction to negate it's argument, even if it could be
folded into a constant or another instruction.

This adds lowering early in selection DAG to convert a load-subtract
operation into a subtract and a load-add, allowing the normal DAG
optimisations to work on it.

I've left the old tablegen patterns in because they are still needed for
global isel.

Some of the tests in this patch are copied from D35375 by Chad Rosier (which
was abandoned).

Differential revision: https://reviews.llvm.org/D42477

llvm-svn: 324892
2018-02-12 14:22:03 +00:00
Daniel Neilson 4a58b4b52c [AArch64FastISel] Replace deprecated calls to MemoryIntrinsic::getAlignment() (NFCI)
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes
AArch64FastISel to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting
source & dest specific alignments through the new API.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 324773
2018-02-09 21:49:29 +00:00
Evandro Menezes 205c0e085e [AArch64] Adjust the cost model for Exynos M3
Fix the modeling of transfers between a generic register and a partial ASIMD
one.

llvm-svn: 324766
2018-02-09 19:26:11 +00:00
Evandro Menezes b5f12090fc [AArch64] Refactor stand alone methods (NFC)
Make stand alone methods in AArch64InstrInfo static.

llvm-svn: 324745
2018-02-09 16:14:41 +00:00
Jonas Paulsson 7850601fa3 [AArch64] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Martin Storsjö
llvm-svn: 324720
2018-02-09 09:22:20 +00:00
Francis Visoiu Mistrih d65438d0ca [CodeGen] Move printing '\n' from MachineInstr::print to MachineBasicBlock::print
MBB.print wasn't printing it, but the MIRPrinter is printing it. The
goal is to unify that as much as possible.

llvm-svn: 324681
2018-02-08 23:42:27 +00:00
Sjoerd Meijer 5ea465ded7 [AArch64] Don't materialize 0 with "fmov h0, .." when FullFP16 is not supported
We were generating "fmov h0, wzr" instructions when FullFP16 is not enabled.
I've not added any tests, because the problem was visible in:
test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll,
which I had to change: I don't think Cyclone has FullFP16 enabled
by default, so it shouldn't be using this v8.2a instruction.

I've also removed these rdar tags, please shout if there are any objections.

Differential Revision: https://reviews.llvm.org/D43020

llvm-svn: 324581
2018-02-08 08:39:05 +00:00
Evandro Menezes cb7959fd78 [AArch64] Adjust the cost model for Exynos M3
Fix the modeling of long division and SIMD conversion from integer and
horizontal minimum and maximum.

llvm-svn: 324417
2018-02-06 22:35:47 +00:00
Sander de Smalen 81fcf865be [AArch64][SVE] Asm: Add AND_ZI instructions and aliases
Summary: Adds support for the SVE AND instruction with vector and logical-immediate operands, and their corresponding aliases.

Reviewers: fhahn, rengolin, samparker, echristo, aadg, kristof.beyls

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D42295

llvm-svn: 324343
2018-02-06 13:13:21 +00:00
Oliver Stannard 6df8f43c4d [AArch64] Fix spelling of ICH_ELRSR_EL2 system register
This register was mis-spelled as ICH_ELSR_EL2, but has the correct encoding for
ICH_ELRSR_EL2.

llvm-svn: 324325
2018-02-06 09:39:04 +00:00
Oliver Stannard ee0ac39305 [ARM][AArch64] Add CSDB speculation barrier instruction
This adds the CSDB instruction, which is a new barrier instruction
described by the whitepaper at [1].

This is in encoding space which was previously executed as a NOP, so it is
available for all targets that have the relevant NOP encoding space. This
matches the binutils behaviour for these instructions [2][3].

[1] https://developer.arm.com/support/security-update
[2] https://sourceware.org/ml/binutils/2018-01/msg00116.html
[3] https://sourceware.org/ml/binutils/2018-01/msg00120.html

llvm-svn: 324324
2018-02-06 09:24:47 +00:00
Amara Emerson 3838ed0370 [AArch64][GlobalISel] Use getRegClassForTypeOnBank() in selectCopy.
Differential Revision: https://reviews.llvm.org/D42832

llvm-svn: 324110
2018-02-02 18:03:30 +00:00
Amara Emerson 58aea52bc4 [GlobalISel] Constrain the dest reg of IMPLICT_DEF.
This fixes a crash where the user is a COPY, which deliberately does not
constrain its source operands, resulting in a vreg without a reg class escaping
selection.

Differential Revision: https://reviews.llvm.org/D42697

llvm-svn: 324047
2018-02-02 01:44:43 +00:00
Sanjay Patel d7bed12192 [AArch64] remove bogus comment; NFC
I added this comment with D42323, but as discussed in D42806, the architecture
does the right thing for denorms. We don't even need the select on 0.0 here?

llvm-svn: 323996
2018-02-01 19:59:33 +00:00
Sanjay Patel 657e5d8d41 [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994)
As shown in the example in PR34994:
https://bugs.llvm.org/show_bug.cgi?id=34994
...we can return a very wrong answer (inf instead of 0.0) for square root when 
using a reciprocal square root estimate instruction.

Here, I've conditionalized the filtering out of denorms based on the function 
having "denormal-fp-math"="ieee" in its attributes. The other options for this 
attribute are 'preserve-sign' and 'positive-zero'.

So we don't generate this extra code by default with just '-ffast-math' (because 
then there's no denormal attribute string at all), but it works if you specify 
'-ffast-math -fdenormal-fp-math=ieee' from clang. 

As noted in the review, there may be other problems in clang that affect the 
results depending on platform (Linux x86 at least), but this should allow 
creating the desired codegen.

Differential Revision: https://reviews.llvm.org/D42323

llvm-svn: 323981
2018-02-01 16:57:18 +00:00
Clement Courbet ea8d07eb76 [AArch64][NFC] Make all ProcResource definitions include their SchedModel.
This makes targets ExynosM1,ExynosM3,ThunderX2T99 consistent with all
other targets.

llvm-svn: 323955
2018-02-01 12:12:01 +00:00
Martin Storsjo 708498a164 [AArch64] Properly handle dllimport of variables when using fast-isel
Differential Revision: https://reviews.llvm.org/D42567

llvm-svn: 323810
2018-01-30 19:50:51 +00:00
Evandro Menezes f1d01645a7 [AArch64] Add new target feature to fuse address generation with load or store
This feature enables the fusion of the address generation and a
corresponding load or store together.

Differential revision: https://reviews.llvm.org/D42393

llvm-svn: 323782
2018-01-30 16:28:01 +00:00
Evandro Menezes 07c78eeeef [AArch64] Add new target feature to handle cheap as move for Exynos
This feature enables special handling of cheap as move in the existing
custom handling specifically for Exynos processors.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323774
2018-01-30 15:40:22 +00:00
Evandro Menezes 9f9daa1f14 [AArch64] Add pipeline model for Exynos M3
Add the scheduling and cost model for Exynos M3.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323773
2018-01-30 15:40:16 +00:00
Evandro Menezes 1589d6e6a3 [AArch64] Change the filename of the Exynos M1 scheduling defs
After request by Matthias Braun in https://reviews.llvm.org/D42387.

llvm-svn: 323686
2018-01-29 20:22:24 +00:00
Jun Bum Lim fc7d56d949 Revert "AArch64: Omit callframe setup/destroy when not necessary"
This reverts commit r322917 due to multiple performance regressions in spec2006
and spec2017. XFAILed llvm/test/CodeGen/AArch64/big-callframe.ll which initially
motivated this change.

llvm-svn: 323683
2018-01-29 19:56:42 +00:00
Daniel Sanders 79cb839fcd [globalisel][legalizer] Adapt LegalizerInfo to support inter-type dependencies and other things.
Summary:
As discussed in D42244, we have difficulty describing the legality of some
operations. We're not able to specify relationships between types.
For example, declaring the following
  setAction({..., 0, s32}, Legal)
  setAction({..., 0, s64}, Legal)
  setAction({..., 1, s32}, Legal)
  setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
  {s32, s32}
  {s64, s32}
  {s32, s64}
  {s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES have relationships between the types that are currently
described incorrectly.
    
Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).

It's also difficult to control the legalization strategy. We've added support
for legalizing non-power of 2 types but there's still some hardcoded assumptions
about the strategy. The main one I've noticed is that type0 is always legalized
before type1 which is not a good strategy for `type0 = G_EXTRACT type1, ...` if
you need to widen the container. It will converge on the same result eventually
but it will take a much longer route when legalizing type0 than if you legalize
type1 first.

Lastly, the definition of legality and the legalization strategy is kept
separate which is not ideal. It's helpful to be able to look at a one piece of
code and see both what is legal and the method the legalizer will use to make
illegal MIR more legal.

This patch adds a layer onto the LegalizerInfo (to be removed when all targets
have been migrated) which resolves all these issues.

Here are the rules for shift and division:
  for (unsigned BinOp : {G_LSHR, G_ASHR, G_SDIV, G_UDIV})
    getActionDefinitions(BinOp)
        .legalFor({s32, s64})     // If type0 is s32/s64 then it's Legal
        .clampScalar(0, s32, s64) // If type0 is <s32 then WidenScalar to s32
                                  // If type0 is >s64 then NarrowScalar to s64
        .widenScalarToPow2(0)     // Round type0 scalars up to powers of 2
        .unsupported();           // Otherwise, it's unsupported
This describes everything needed to both define legality and describe how to
make illegal things legal.

Here's an example of a complex rule:
  getActionDefinitions(G_INSERT)
      .unsupportedIf([=](const LegalityQuery &Query) {
        // If type0 is smaller than type1 then it's unsupported
        return Query.Types[0].getSizeInBits() <= Query.Types[1].getSizeInBits();
      })
      .legalIf([=](const LegalityQuery &Query) {
        // If type0 is s32/s64/p0 and type1 is a power of 2 other than 2 or 4 then it's legal
        // We don't need to worry about large type1's because unsupportedIf caught that.
        const LLT &Ty0 = Query.Types[0];
        const LLT &Ty1 = Query.Types[1];
        if (Ty0 != s32 && Ty0 != s64 && Ty0 != p0)
          return false;
        return isPowerOf2_32(Ty1.getSizeInBits()) &&
               (Ty1.getSizeInBits() == 1 || Ty1.getSizeInBits() >= 8);
      })
      .clampScalar(0, s32, s64)
      .widenScalarToPow2(0)
      .maxScalarIf(typeInSet(0, {s32}), 1, s16) // If type0 is s32 and type1 is bigger than s16 then NarrowScalar type1 to s16
      .maxScalarIf(typeInSet(0, {s64}), 1, s32) // If type0 is s64 and type1 is bigger than s32 then NarrowScalar type1 to s32
      .widenScalarToPow2(1)                     // Round type1 scalars up to powers of 2
      .unsupported();
This uses a lambda to say that G_INSERT is unsupported when type0 is bigger than
type1 (in practice, this would be a default rule for G_INSERT). It also uses one
to describe the legal cases. This particular predicate is equivalent to:
  .legalFor({{s32, s1}, {s32, s8}, {s32, s16}, {s64, s1}, {s64, s8}, {s64, s16}, {s64, s32}})

In terms of performance, I saw a slight (~6%) performance improvement when
AArch64 was around 30% ported but it's pretty much break even right now.
I'm going to take a look at constexpr as a means to reduce the initialization
cost.

Future work:
* Make it possible for opcodes to share rulesets. There's no need for
  G_LSHR/G_ASHR/G_SDIV/G_UDIV to have separate rule and ruleset objects. There's
  no technical barrier to this, it just hasn't been done yet.
* Replace the type-index numbers with an enum to get .clampScalar(Type0, s32, s64)
* Better names for things like .maxScalarIf() (clampMaxScalar?) and the vector rules.
* Improve initialization cost using constexpr

Possible future work:
* It's possible to make these rulesets change the MIR directly instead of
  returning a description of how to change the MIR. This should remove a little
  overhead caused by parsing the description and routing to the right code, but
  the real motivation is that it removes the need for LegalizeAction::Custom.
  With Custom removed, there's no longer a requirement that Custom legalization
  change the opcode to something that's considered legal.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner

Reviewed By: bogner

Subscribers: hintonda, bogner, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42251

llvm-svn: 323681
2018-01-29 19:54:49 +00:00
Daniel Sanders 9ade5592d9 [globalisel] Make LegalizerInfo::LegalizeAction available outside of LegalizerInfo. NFC
Summary:
The improvements to the LegalizerInfo discussed in D42244 require that
LegalizerInfo::LegalizeAction be available for use in other classes. As such,
it needs to be moved out of LegalizerInfo. This has been done separately to the
next patch to minimize the noise in that patch.

llvm-svn: 323669
2018-01-29 17:37:29 +00:00
Sander de Smalen a1c259c22c [AArch64][AsmParser] NFC: Generalize LogicalImm[Not](32|64) code
Summary:
All variants of isLogicalImm[Not](32|64) can be combined into a single templated function, same for printLogicalImm(32|64).
By making it use a template instead, further SVE patches can use it for other data types as well (e.g. 8, 16 bits).

Reviewers: fhahn, rengolin, aadg, echristo, kristof.beyls, samparker

Reviewed By: samparker

Subscribers: aemerson, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42294

llvm-svn: 323646
2018-01-29 13:05:38 +00:00
Oliver Stannard a9d2e004d2 [AArch64] Generate the CASP instruction for 128-bit cmpxchg
The Large System Extension added an atomic compare-and-swap instruction
that operates on a pair of 64-bit registers, which we can use to
implement a 128-bit cmpxchg.

Because i128 is not a legal type for AArch64 we have to do all of the
instruction selection in C++, and the instruction requires even/odd
register pairs, so we have to wrap it in REG_SEQUENCE and EXTRACT_SUBREG
nodes. This is very similar to what we do for 64-bit cmpxchg in the ARM
backend.

Differential revision: https://reviews.llvm.org/D42104

llvm-svn: 323634
2018-01-29 09:18:37 +00:00
Craig Topper 8f324bb1a4 [SelectionDAGISel] Add a debug print before call to Select. Adjust where blank lines are printed during isel process to make things more sensibly grouped.
Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table.

It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search.

There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line.

llvm-svn: 323551
2018-01-26 19:34:20 +00:00
Joel Jones 0715092c65 [AArch64] Enable aggressive FMA on T99 and provide AArch64 options for others.
This patch enables aggressive FMA by default on T99, and provides a -mllvm
option to enable the same on other AArch64 micro-arch's (-mllvm
-aarch64-enable-aggressive-fma).

Test case demonstrating the effects on T99 is included.

Patch by: steleman (Stefan Teleman)

Differential Revision: https://reviews.llvm.org/D40696

llvm-svn: 323474
2018-01-25 21:55:39 +00:00
Amara Emerson 4f84f8862b [AArch64][GlobalISel] Fall back during AArch64 isel if we have a volatile load.
The tablegen imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in future.

There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.

Fixes/works around PR36018.

llvm-svn: 323371
2018-01-24 20:35:37 +00:00
Pablo Barrio 9b3d4c01a0 [AArch64] Avoid unnecessary vector byte-swapping in big-endian
Summary:
Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but same vector size. This is not a
problem in little-endian but, when in big-endian, it requires
additional byte reversals required to preserve the lane ordering
while keeping the right endianness of the data inside each lane.
For example:

%1 = load <4 x half>, <4 x half>* %p

results in the following assembly:

ld1 { v0.2s }, [x1]
rev32 v0.4h, v0.4h

This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:

ld1 { v0.4h }, [x1]

Reviewers: olista01, SjoerdMeijer, efriedma

Reviewed By: efriedma

Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D42235

llvm-svn: 323325
2018-01-24 14:13:47 +00:00
Matthias Braun 70fd374d1e AArch64: Cyclone: Remove SlowMisaligned128Store tuning flag
Remove FeatureSlowMisaligned128Store from cyclone flags.
This flag causes splitting of 16 byte wide stores into 2 stored of 8
bytes. This was useful on older apple CPUs which were slow for 16byte
stores that were not aligned on 16byte. As the compiler often cannot
predict the actual alignment, the splitting was choosen.

This has been a topic for a lot of debate as the splitting also
decreases performance for some benchmarks. Measuring the effects on
newer apple chips (rdar://35525421) shows that it harms more cases than
it helps. So it is time to retire this workaround.

llvm-svn: 323289
2018-01-24 00:39:53 +00:00
Tim Northover f9b560aa8e AArch64: get type from correct result when forming BFX
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one. Hopefully that's all of them.

llvm-svn: 323205
2018-01-23 15:11:27 +00:00
Tim Northover 9f3003d08f AArch64: get type from correct result when forming BFI/BFM
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one.

llvm-svn: 323202
2018-01-23 14:37:03 +00:00
Evandro Menezes 312443fd83 [AArch64] Create a separate feature set for Exynos M3
Distinguish the features from Exynos M2.

llvm-svn: 323139
2018-01-22 19:03:26 +00:00
Sander de Smalen 7ab96f534c [AArch64][SVE] Asm: PTRUE and PTRUES instructions
Summary: These instructions initialize a predicate vector from a pattern/immediate.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover, samparker, olista01

Reviewed By: samparker

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41819

llvm-svn: 323124
2018-01-22 15:29:19 +00:00
Carey Williams da15b5b116 [AArch64] optimise v4f16 fcmps to utilise vector instructions
Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported.
Generating FCTVL(s) rather than a longer series of FCVTs.

Differential Revision: https://reviews.llvm.org/D41772

llvm-svn: 323118
2018-01-22 14:16:11 +00:00
Sander de Smalen 245e0e67f3 [AArch64][SVE] Asm: Predicate patterns
Summary:
This patch adds support for parsing/printing of named or unnamed
patterns that are used in SVE's PTRUE instruction, amongst others.

The pattern can be specified as a named pattern to initialize the predicate
vector or it can be specified as an immediate in the range 0-31.

Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41818

llvm-svn: 323098
2018-01-22 10:46:00 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Carey Williams 22c49c6470 Test commit
llvm-svn: 322958
2018-01-19 16:55:23 +00:00
Sander de Smalen 909cf956a1 [AArch64][SVE] Asm: Add support for RDVL/ADDVL/ADDPL instructions
Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: SjoerdMeijer, aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41900

llvm-svn: 322951
2018-01-19 15:22:00 +00:00
Matthias Braun 5c290dc206 AArch64: Fix emergency spillslot being out of reach for large callframes
Re-commit of r322200: The testcase shouldn't hit machineverifiers
anymore with r322917 in place.

Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322919
2018-01-19 03:16:36 +00:00
Matthias Braun dc4b3e87f4 AArch64: Omit callframe setup/destroy when not necessary
Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to
setup and the callframe size is 0.

- Fixes an invalid callframe nesting for byval arguments, which would
  look like this before this patch (as in `big-byval.ll`):
    ...
    ADJCALLSTACKDOWN 32768, 0, ...   # Setup for extfunc
    ...
    ADJCALLSTACKDOWN 0, 0, ...  # setup for memcpy
    ...
    BL &memcpy ...
    ADJCALLSTACKUP 0, 0, ...    # destroy for memcpy
    ...
    BL &extfunc
    ADJCALLSTACKUP 32768, 0, ...   # destroy for extfunc

- Saves us two instructions in the common case of zero-sized stackframes.
- Remove an unnecessary scheduling barrier (hence the small unittest
  changes).

Differential Revision: https://reviews.llvm.org/D42006

llvm-svn: 322917
2018-01-19 02:45:38 +00:00
Amara Emerson d5785775f8 [AArch64][GlobalISel] Add isel support for global values in the large code model.
Fixes PR35958.

Differential Revision: https://reviews.llvm.org/D42175

llvm-svn: 322878
2018-01-18 19:21:27 +00:00
Reid Kleckner 1aa9061c5f [CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64
Every known PE COFF target emits /EXPORT: linker flags into a .drective
section. The AsmPrinter should handle this.

While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.

llvm-svn: 322788
2018-01-17 23:55:23 +00:00
Volkan Keles a79b0620a0 Add a TargetOption to enable/disable GlobalISel
Summary:
This patch adds a new target option in order to control GlobalISel.
This will allow the users to enable/disable GlobalISel prior to the
backend by calling `TargetMachine::setGlobalISel(bool Enable)`.

No test case as there is already a test to check GlobalISel
command line options.
See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.

Reviewers: qcolombet, aemerson, ab, dsanders

Reviewed By: qcolombet

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42137

llvm-svn: 322773
2018-01-17 22:34:21 +00:00
Aditya Nandakumar 18b3f9d384 [GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC
https://reviews.llvm.org/D42149

llvm-svn: 322743
2018-01-17 19:31:33 +00:00
Pablo Barrio f2c29571da [AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.

In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.

Reviewers: craig.topper, jmolloy, olista01

Reviewed By: olista01

Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41863

llvm-svn: 322663
2018-01-17 14:39:29 +00:00
Volkan Keles f7f2568613 [GlobalISel][TableGen] Add support for SDNodeXForm
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.

Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.

def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
                       GISDNodeXFormEquiv<imm8_xform>;

Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);

Reviewers: dsanders, ab, rovka

Reviewed By: dsanders

Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet

Differential Revision: https://reviews.llvm.org/D42012

llvm-svn: 322582
2018-01-16 18:44:05 +00:00
Sander de Smalen 5aa809db79 [AArch64][AsmParser] Cleanup isSImm7s4, isSImm7s8, (etc) functions.
Reviewers: fhahn, rengolin, t.p.northover, echristo, olista01, samparker

Reviewed By: fhahn, samparker

Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41899

llvm-svn: 322481
2018-01-15 12:47:17 +00:00
Jessica Paquette 757e120379 [MachineOutliner] Move hasAddressTaken check to MachineOutliner.cpp
*Mostly* NFC. Still updating the test though just for completeness.

This moves the hasAddressTaken check to MachineOutliner.cpp and replaces it
with a per-basic block test rather than a per-function test. The old test was
too conservative and was preventing functions in C programs from being
outlined even though they were safe to outline.

This was mostly a problem in C sources.

llvm-svn: 322425
2018-01-13 00:42:28 +00:00
Evandro Menezes 2e05279399 [AArch64] Fix scheduling resources for post indexed loads and stores
Fix typos in the default scheduling resources when using the post indexed
addressing modes.

Differential revision: https://reviews.llvm.org/D40511

llvm-svn: 322392
2018-01-12 19:20:11 +00:00
Evgeniy Stepanov 99fa3e774d [hwasan] Stack instrumentation.
Summary:
Very basic stack instrumentation using tagged pointers.
Tag for N'th alloca in a function is built as XOR of:
 * base tag for the function, which is just some bits of SP (poor
   man's random)
 * small constant which is a function of N.

Allocas are aligned to 16 bytes. On every ReturnInst allocas are
re-tagged to catch use-after-return.

This implementation has a bunch of issues that will be taken care of
later:
1. lifetime intrinsics referring to tagged pointers are not
   recognized in SDAG. This effectively disables stack coloring.
2. Generated code is quite inefficient. There is one extra
   instruction at each memory access that adds the base tag to the
   untagged alloca address. It would be better to keep tagged SP in a
   callee-saved register and address allocas as an offset of that XOR
   retag, but that needs better coordination between hwasan
   instrumentation pass and prologue/epilogue insertion.
3. Lifetime instrinsics are ignored and use-after-scope is not
   implemented. This would be harder to do than in ASan, because we
   need to use a differently tagged pointer depending on which
   lifetime.start / lifetime.end the current instruction is dominated
   / post-dominated.

Reviewers: kcc, alekseyshl

Subscribers: srhines, kubamracek, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D41602

llvm-svn: 322324
2018-01-11 22:53:30 +00:00
Joel Jones 90a60501c3 [AArch64] Remove Unsupported = 1 flag for the WriteAtomic WriteRes.
In practice, this patch has no effect on scheduling.

There is no test case as there already exists a comprehensive test case for
LSE Atomics.

Patch by Stefan Teleman

Differential Revision: https://reviews.llvm.org/D40694

llvm-svn: 322291
2018-01-11 16:50:56 +00:00
Matthias Braun e3a8db7ba1 Revert "AArch64: Fix emergency spillslot being out of reach for large callframes"
Revert for now as the testcase is hitting a pre-existing verifier error
that manifest as a failure when expensive checks are enabled (or
-verify-machineinstrs) is used.

This reverts commit r322200.

llvm-svn: 322231
2018-01-10 22:36:28 +00:00
Jessica Paquette c191f1097c [MachineOutliner] Outline ADRPs
ADRP instructions weren't being outlined because they're PC-relative and thus
fail the LR checks. This patch adds a special case for ADRPs to
getOutliningType to make sure that ADRPs can be outlined and updates the MIR
test.

llvm-svn: 322207
2018-01-10 18:49:57 +00:00
Matthias Braun b42ffa1283 AArch64: Fix emergency spillslot being out of reach for large callframes
Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322200
2018-01-10 18:16:24 +00:00
Sander de Smalen a7ec090eaa [AArch64][SVE] Asm: Add support for (mov|dup) of scalar
Summary: This patch adds support for 'dup' (Scalar -> SVE) and its corresponding 'mov' alias.

Reviewers: fhahn, rengolin, evandro, echristo

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41822

llvm-svn: 322172
2018-01-10 11:32:47 +00:00
Sander de Smalen 886510f350 [TableGen][AsmMatcherEmitter] Generate assembler checks for tied operands
Summary:
This extends TableGen's AsmMatcherEmitter with code that generates
a table with tied-operand constraints. The constraints are checked
when parsing the instruction. If an operand is not equal to its tied operand,
the assembler will give an error.

Patch [2/3] in a series to add operand constraint checks for SVE's predicated ADD/SUB.

Reviewers: olista01, rengolin, mcrosier, fhahn, craig.topper, evandro, echristo

Reviewed By: fhahn

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D41446

llvm-svn: 322166
2018-01-10 10:10:56 +00:00
Sander de Smalen 906a5deace Recommit r322073: [AArch64][SVE] Asm: Add predicated ADD/SUB instructions
Fixed issue that was found on sanitizer-x86_64-linux-fast.
I changed the result type of 'Parser.getTok().getString().lower()'
in AArch64AsmParser::tryParseSVEPredicateVector() from 'StringRef' to
'auto', since StringRef::lower() returns a std::string.

llvm-svn: 322092
2018-01-09 17:01:27 +00:00
Sander de Smalen 6595603187 Reverted r322073 because of AddressSanitizer failure on
sanitizer-x86_64-linux-fast builder.

llvm-svn: 322077
2018-01-09 13:51:09 +00:00
Sander de Smalen 1f97363e5f [AArch64][SVE] Asm: Add predicated ADD/SUB instructions
Summary:
Add the predicated ADD/SUB instructions and corresponding tests.

Patch [3/3] in a series to add predicated ADD/SUB instructions for SVE.

Reviewers: rengolin, mcrosier, evandro, fhahn, echristo

Reviewed By: fhahn

Subscribers: aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41443

llvm-svn: 322073
2018-01-09 12:43:46 +00:00
Sander de Smalen 7868e74033 [AArch64][SVE] Asm: Add parsing of merging/zeroing suffix for SVE predicate vector operands
Summary:
Parsing of the '/m' (merging) or '/z' (zeroing) suffix of a predicate operand.

Patch [2/3] in a series to add predicated ADD/SUB instructions for SVE.

Reviewers: rengolin, mcrosier, evandro, fhahn, echristo, MatzeB, t.p.northover

Reviewed By: fhahn

Subscribers: t.p.northover, MatzeB, aemerson, javed.absar, tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D41442

llvm-svn: 322070
2018-01-09 11:17:06 +00:00
Jessica Paquette 3291e7353e [MachineOutliner] AArch64: Handle instrs that use SP and will never need fixups
This commit does two things. Firstly, it adds a collection of flags which can
be passed along to the target to encode information about the MBB that an
instruction lives in to the outliner.

Second, it adds some of those flags to the AArch64 outliner in order to add
more stack instructions to the list of legal instructions that are handled
by the outliner. The two flags added check if

- There are calls in the MachineBasicBlock containing the instruction
- The link register is available in the entire block

If the link register is available and there are no calls, then a stack
instruction can always be outlined without fixups, regardless of what it is,
since in this case, the outliner will never modify the stack to create a
call or outlined frame.

The motivation for doing this was checking which instructions are most often
missed by the outliner. Instructions like, say

%sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy

are very common, but cannot be outlined in the case that the outliner might
modify the stack. This commit allows us to outline instructions like this.
  

llvm-svn: 322048
2018-01-09 00:26:18 +00:00
Reid Kleckner 5619669a5a Fix -Wsign-compare warnings on Windows
These arise because enums are 'int' by default.

llvm-svn: 321887
2018-01-05 19:53:51 +00:00