Commit Graph

2986 Commits

Author SHA1 Message Date
Sjoerd Meijer 195e904002 [ARM][AArch64] Armv8.4-A Enablement
Initial patch adding assembly support for Armv8.4-A.

Besides adding v8.4 as a supported architecture to the usual places, this also
adds target features for the different crypto algorithms. Armv8.4-A introduced
new crypto algorithms, made them optional, and allows different combinations:

- none of the v8.4 crypto functions are supported, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- the v8.4 SHA512 and SHA3 support is implemented, in this case the Armv8.0
  SHA1 and SHA2 instructions must also be implemented.
- the v8.4 SM3 and SM4 support is implemented, which is independent of the
  implementation of the Armv8.0 SHA1 and SHA2 instructions.
- all of the v8.4 crypto functions are supported, in this case the Armv8.0 SHA1
  and SHA2 instructions must also be implemented.

The v8.4 crypto instructions are added to AArch64 only, and not AArch32,
and are made optional extensions to Armv8.2-A.

The user-facing Clang options will map on these new target features, their
naming will be compatible with GCC and added in follow-up patches.

The Armv8.4-A instruction sets can be downloaded here:
https://developer.arm.com/products/architecture/a-profile/exploration-tools

Differential Revision: https://reviews.llvm.org/D48625

llvm-svn: 335953
2018-06-29 08:43:19 +00:00
Jessica Paquette dafa198c96 [MachineOutliner] Define MachineOutliner support in TargetOptions
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.

After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.

https://reviews.llvm.org/D48683

llvm-svn: 335887
2018-06-28 17:45:43 +00:00
Matthias Braun da5e7e11d1 SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Daniel Sanders bdeb880d14 [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64
Now that we have the ability to legalize based on MMO's. Add support for
legalizing based on AtomicOrdering and use it to correct the legalization
of the atomic instructions.

Also extend all() to be a variadic template as this ruleset now requires
3 and 4 argument versions.

llvm-svn: 335767
2018-06-27 19:03:21 +00:00
Jessica Paquette f472f6159a [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across
It isn't safe to outline sequences of instructions where x16/x17/nzcv live
across the sequence.

This teaches the outliner to check whether or not a specific canidate has
x16/x17/nzcv live across it and discard the candidate in the case that that is
true.

https://bugs.llvm.org/show_bug.cgi?id=37573
https://reviews.llvm.org/D47655

llvm-svn: 335758
2018-06-27 17:43:27 +00:00
Luke Geeson 316327150b [AArch64] Reverting FP16 vcvth_n_s64_f16 to fix
llvm-svn: 335737
2018-06-27 14:34:40 +00:00
Adhemerval Zanella cadcfed7aa [AArch64] Add custom lowering for v4i8 trunc store
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h)
and default action for v4i8 is to extract each element and issue 4
byte stores.

A better strategy would be to extended the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvectores. The construction:

  define void @foo(<4 x i16> %x, i8* nocapture %p) {
    %0 = trunc <4 x i16> %x to <4 x i8>
    %1 = bitcast i8* %p to <4 x i8>*
    store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
    ret void
  }

Can be optimized from:

  umov    w8, v0.h[3]
  umov    w9, v0.h[2]
  umov    w10, v0.h[1]
  umov    w11, v0.h[0]
  strb    w8, [x0, #3]
  strb    w9, [x0, #2]
  strb    w10, [x0, #1]
  strb    w11, [x0]
  ret

To:

  xtn     v0.8b, v0.8h
  str     s0, [x0]
  ret

The patch also adjust the memory cost for autovectorization, so the C
code:

  void foo (const int *src, int width, unsigned char *dst)
  {
    for (int i = 0; i < width; i++)
       *dst++ = *src++;
  }

can be vectorized to:

  .LBB0_4:                                // %vector.body
                                          // =>This Inner Loop Header: Depth=1
        ldr     q0, [x0], #16
        subs    x12, x12, #4            // =4
        xtn     v0.4h, v0.4s
        xtn     v0.8b, v0.8h
        st1     { v0.s }[0], [x2], #4
        b.ne    .LBB0_4

Instead of byte operations.

llvm-svn: 335735
2018-06-27 13:58:46 +00:00
Luke Geeson 68cb233c0f [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns
llvm-svn: 335715
2018-06-27 09:20:13 +00:00
Simon Pilgrim 9c8f9374b5 [CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion.

This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.

I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.

Differential Revision: https://reviews.llvm.org/D48172

llvm-svn: 335329
2018-06-22 09:45:31 +00:00
Sirish Pande b60acb9e48 Revert "[AArch64] Coalesce Copy Zero during instruction selection"
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.

Original Commit:

Author: Haicheng Wu <haicheng@codeaurora.org>
Date:   Sun Feb 18 13:51:33 2018 +0000

    [AArch64] Coalesce Copy Zero during instruction selection

    Add special case for copy of zero to avoid a double copy.

    Differential Revision: https://reviews.llvm.org/D36104

Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessmizes MachineSinking by introducing a
copy, such that mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only mov instruction. This works well
if we have only one fall through branch as there is only one "extra"
mov instruction.

If we have multiple fall throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.

This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.

llvm-svn: 335251
2018-06-21 16:05:24 +00:00
Tim Northover 70666e7765 [AArch64] Implement FLT_ROUNDS macro.
Very similar to ARM implementation, just maps to an MRS.

Should fix PR25191.

Patch by Michael Brase.

llvm-svn: 335118
2018-06-20 12:09:01 +00:00
Vlad Tsyrklevich 98724e582e Revert r334980 and 334983
This reverts commits r334980 and r334983 because they were causing build
timeouts on the x86_64-linux-ubsan bot.

llvm-svn: 335085
2018-06-20 00:02:32 +00:00
Jessica Paquette 32de26d432 [MachineOutliner] NFC: Remove insertOutlinerPrologue, rename insertOutlinerEpilogue
insertOutlinerPrologue was not used by any target, and prologue-esque code was
beginning to appear in insertOutlinerEpilogue. Refactor that into one function,
buildOutlinedFrame.

This just removes insertOutlinerPrologue and renames insertOutlinerEpilogue.

llvm-svn: 335076
2018-06-19 21:14:48 +00:00
Sander de Smalen 067eee1c13 [AArch64][SVE] Asm: Fix predicate pattern diagnostics.
This patch uses the DiagnosticPredicate for SVE predicate patterns
to improve their diagnostics, now giving a 'invalid operand' diagnostic
if the type is not an immediate or one of the expected pattern
labels.

Reviewers: samparker, SjoerdMeijer, javed.absar, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48220

llvm-svn: 334983
2018-06-18 21:03:02 +00:00
Sander de Smalen 7ac9e193ec [AArch64][SVE] Asm: Support for saturating INC/DEC (32bit scalar) instructions.
The variants added by this patch are:
- SQINC     signed increment, e.g. sqinc x0, w0, all, mul #4
- SQDEC     signed decrement, e.g. sqdec x0, w0, all, mul #4
- UQINC   unsigned increment, e.g. uqinc w0, all, mul #4
- UQDEC   unsigned decrement, e.g. uqdec w0, all, mul #4
 
This patch includes asmparser changes to parse a GPR64 as a GPR32 in
order to satisfy the constraint check:
  x0 == GPR64(w0)
in:
  sqinc x0, w0, all, mul #4
         ^___^ (must match)

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47716

llvm-svn: 334980
2018-06-18 20:50:33 +00:00
Sander de Smalen 13684d8400 [AArch64][SVE] Asm: Support for saturating INC/DEC (64bit scalar) instructions.
Summary:
The variants added by this patch are:
- SQINC  (signed increment)
- UQINC  (unsigned increment)
- SQDEC  (signed decrement)
- UQDEC  (unsigned decrement)

For example:
  uqincw  x0, all, mul #4

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Differential Revision: https://reviews.llvm.org/D47715

llvm-svn: 334948
2018-06-18 14:47:52 +00:00
Sander de Smalen d521c4353e [AArch64][SVE] Asm: Support for vector element compares.
This patch adds instructions for comparing elements from two vectors, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.s

and also adds support for comparing to a 64-bit wide element vector, e.g.
  cmpgt p0.s, p0/z, z0.s, z1.d

The patch also contains aliases for certain comparisons, e.g.:
  cmple p0.s, p0/z, z0.s, z1.s => cmpge p0.s, p0/z, z1.s, z0.s
  cmplo p0.s, p0/z, z0.s, z1.s => cmphi p0.s, p0/z, z1.s, z0.s
  cmpls p0.s, p0/z, z0.s, z1.s => cmphs p0.s, p0/z, z1.s, z0.s
  cmplt p0.s, p0/z, z0.s, z1.s => cmpgt p0.s, p0/z, z1.s, z0.s

llvm-svn: 334931
2018-06-18 10:59:19 +00:00
Sander de Smalen 279b7e74e7 [AArch64][SVE] Asm: Support for bitwise operations on predicate vectors.
This patch adds support for instructions performing bitwise operations
on predicate vectors, including AND, BIC, EOR, NAND, NOR, ORN, ORR, and
their status flag setting variants ANDS, BICS, EORS, NANDS, ORNS, ORRS.

This patch also adds several aliases:

  orr  p0.b, p1/z, p1.b, p1.b  => mov  p0.b, p1.b
  orrs p0.b, p1/z, p1.b, p1.b  => movs p0.b, p1.b

  and  p0.b, p1/z, p2.b, p2.b  => mov  p0.b, p1/z, p2.b
  ands p0.b, p1/z, p2.b, p2.b  => movs p0.b, p1/z, p2.b

  eor  p0.b, p1/z, p2.b, p1.b  => not  p0.b, p1/z, p2.b
  eors p0.b, p1/z, p2.b, p1.b  => nots p0.b, p1/z, p2.b

llvm-svn: 334906
2018-06-17 10:48:21 +00:00
Sander de Smalen 2c25b4cd36 [AArch64][SVE] Asm: Support for SEL (vector/predicate) instructions.
Support for SVE's predicated select instructions to select elements
from either vector, both in a data-vector and a predicate-vector
variant.

llvm-svn: 334905
2018-06-17 10:11:04 +00:00
Sander de Smalen a6edca72ba [AArch64][SVE] Asm: Support for CPY SIMD/FP and GPR instructions.
Predicated splat/copy of SIMD/FP register or general purpose
register to SVE vector, along with MOV-aliases.

llvm-svn: 334842
2018-06-15 16:39:46 +00:00
Sander de Smalen 18ac8f9f25 [AArch64][SVE] Asm: Support for INC/DEC (scalar) instructions.
Increment/decrement scalar register by (scaled) element count given by
predicate pattern, e.g. 'incw x0, all, mul #4'.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47713

llvm-svn: 334838
2018-06-15 15:47:44 +00:00
Sander de Smalen 5eb51d7495 [AArch64][SVE] Asm: Support for FADD, FMUL and FMAX immediate instructions.
Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47712

llvm-svn: 334831
2018-06-15 13:57:51 +00:00
Sander de Smalen 3cbf171479 [AArch64][SVE] Asm: Add parsing/printing support for exact FP immediates.
Some instructions require of a limited set of FP immediates as operands,
for example '#0.5 or #1.0' for SVE's FADD instruction.

This patch adds support for parsing and printing such FP immediates as
exact values (e.g. #0.499999 is not accepted for #0.5).

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47711

llvm-svn: 334826
2018-06-15 13:11:49 +00:00
Clement Courbet 5eeed77f87 [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles.
Summary:
For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files.
For more obvious cases, I've ventured a fix.

Some notes:
 - Exynos is especially fishy.
 - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice.
 - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit.

Also see PR37310.

Reviewers: RKSimon, craig.topper, javed.absar

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 334586
2018-06-13 09:41:49 +00:00
Petr Hosek 7250908016 [AArch64] Support reserving x20 register
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.

Differential Revision: https://reviews.llvm.org/D46552

llvm-svn: 334531
2018-06-12 20:00:50 +00:00
Luke Geeson dc82aa44e6 [AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns
llvm-svn: 334488
2018-06-12 09:35:20 +00:00
Clement Courbet f4f6899cdf [ExynosM1][Sched] Fix resource usage in scheduling model.
This is part of https://reviews.llvm.org/D46356.

llvm-svn: 334391
2018-06-11 07:33:08 +00:00
Evandro Menezes b2c8244715 [AArch64, ARM] Add support for Samsung Exynos M4
Create a separate feature set for Exynos M4 and add test cases.

llvm-svn: 334115
2018-06-06 18:56:00 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Jessica Paquette aa087327ce [MachineOutliner] NFC - Move intermediate data structures to MachineOutliner.h
This is setting up to fix bug 37573 cleanly.

This moves data structures that are technically both used in some way by the
target and the general-purpose outlining algorithm into MachineOutliner.h. In
particular, the `Candidate` class is of importance.

Before, the outliner passed the locations of `Candidates` to the target, which
would then make some decisions about the prospective outlined function. This
change allows us to just pass `Candidates` along to the target. This will allow
the target to discard `Candidates` that would be considered unsafe before cost
calculation. Thus, we will be able to remove the unsafe candidates described in
the bug without resorting to torching the entire prospective function.

Also, as a side-effect, it makes the outliner a bit cleaner.

https://bugs.llvm.org/show_bug.cgi?id=37573

llvm-svn: 333952
2018-06-04 21:14:16 +00:00
Nicolai Haehnle 01d261f18d TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
Luke Geeson 43e4367961 [AArch64] Audit on rL333634 to fix FP16 Disasm BitPatterns
llvm-svn: 333879
2018-06-04 09:41:32 +00:00
Sander de Smalen d0a6f6a502 [AArch64][SVE] Fix range for DUP immediates (16bit elts)
For immediates used in DUP instructions that have the range
-128 to 127, or a multiple of 256 in the range -32768 to 32512,
one could argue that when the result element size is 16bits (.h),
the value can be considered both signed and unsigned.

Reviewers: rengolin, fhahn, SjoerdMeijer, samparker, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47619

llvm-svn: 333873
2018-06-04 07:24:23 +00:00
Sander de Smalen fd54a781f6 [AArch64][SVE] Asm: Print indexed element 0 as FPR.
Print the first indexed element as a FP register, for example:

  mov z0.d, z1.d[0]

Is now printed as:

  mov z0.d, d1

Next to printing, this patch also adds aliases to parse 'mov z0.d, d1'.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47571

llvm-svn: 333872
2018-06-04 07:07:35 +00:00
Sander de Smalen c33d668ab7 [AArch64][SVE] Asm: Support for indexed DUP instructions.
Unpredicated copy of indexed SVE element to SVE vector,
along with MOV-aliases.

For example:

  dup     z0.h, z1.h[0]

duplicates the first 16-bit element from z1 to all elements in
the result vector z0.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47570

llvm-svn: 333871
2018-06-04 06:40:55 +00:00
Sander de Smalen 367a53b059 [AArch64][SVE] Asm: Support for FCPY immediate instructions.
Predicated copy of floating-point immediate value to SVE vector,
along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47518

llvm-svn: 333869
2018-06-04 05:58:06 +00:00
Sander de Smalen 512d57f1a5 [AArch64][SVE] Asm: Support for CPY immediate instructions
Predicated copy of possibly shifted immediate value into SVE
vector, along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47517

llvm-svn: 333868
2018-06-04 05:40:46 +00:00
Amara Emerson 5a3bb68e12 [AArch64][GlobalISel] Zero-extend s1 values when returning.
Before we were relying on the any extend of the s1 to s32, but
for AAPCS we need to zero-extend it to at least s8.

Fixes PR36719

Differential Revision: https://reviews.llvm.org/D47425

llvm-svn: 333747
2018-06-01 13:20:32 +00:00
Sander de Smalen f95ea047e5 [AArch64][SVE] Asm: Support for FDUP_ZI (copy fp immediate) instruction.
Unpredicated copy of floating-point immediate value into SVE vector,
along with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47482

llvm-svn: 333744
2018-06-01 12:54:46 +00:00
Sander de Smalen 97ca6b9e09 [AArch64][SVE] Asm: Support for DUPM (masked immediate) instruction.
Unpredicated copy of repeating immediate pattern to SVE vector, along
with MOV-aliases.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47328

llvm-svn: 333731
2018-06-01 07:25:46 +00:00
Francis Visoiu Mistrih 90aba024c5 [MC] Fallback on DWARF when generating compact unwind on AArch64
Instead of asserting when using the def_cfa directive with a register
different from fp, fallback on DWARF.

Easily triggered with:

.cfi_def_cfa x1, 32;

rdar://40249694

Differential Revision: https://reviews.llvm.org/D47593

llvm-svn: 333667
2018-05-31 16:33:26 +00:00
Luke Geeson 2e09995d42 [AArch64] Reverted rL333427 fixing Clang UnitTest Failure
llvm-svn: 333634
2018-05-31 08:27:53 +00:00
Roman Tereshin 5a65eb75c7 [GlobalISel][AArch64] LegalizerInfo verifier: Fixing bugs exposed by LegalizerInfo::verify(...)
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333618
2018-05-31 01:56:05 +00:00
Roman Tereshin 8f1753e994 [GlobalISel][AArch64] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call w/o fixing bugs
This is to make it clear what kind of bugs the LegalizerInfo::verifier
is able to catch and test its output

Reviewers: aemerson, qcolombet

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46338

llvm-svn: 333597
2018-05-30 22:10:04 +00:00
Tim Northover d8949f5002 AArch64: print correct annotation for ADRP addresses.
The immediate on an ADRP MCInst needs to be multiplied by 0x1000 to obtain the
actual PC-offset that will be calculated.

llvm-svn: 333525
2018-05-30 09:54:59 +00:00
Sander de Smalen bdf09fe7a2 [AArch64][AsmParser] Fix segfault on illegal fpimm.
Floating point immediate combining a negative sign and
a hexadecimal number, e.g. #-0x0  caused the compiler to crash.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: javed.absar

Differential Revision: https://reviews.llvm.org/D47483

llvm-svn: 333524
2018-05-30 09:54:19 +00:00
Evandro Menezes f8425340e4 [AArch64] Fix PR32384: bump up the number of stores per memset and memcpy
As suggested in https://bugs.llvm.org/show_bug.cgi?id=32384#c1, this change
makes the inlining of `memset()` and `memcpy()` more aggressive when
compiling for speed.  The tuning remains the same when optimizing for size.

Patch by: Sebastian Pop <s.pop@samsung.com>
          Evandro Menezes <e.menezes@samsung.com>

Differential revision: https://reviews.llvm.org/D45098

llvm-svn: 333429
2018-05-29 15:58:50 +00:00
Amara Emerson d5a9e7bbc9 Revert "[AArch64] added FP16 vcvth intrinsic support"
This reverts commit r333410 due to bot failures.

llvm-svn: 333427
2018-05-29 15:34:22 +00:00
Sander de Smalen 8704b03c4d [AArch64][SVE] Asm: Support for predicated LSL/LSR (vectors)
Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D47365

llvm-svn: 333422
2018-05-29 14:40:24 +00:00
Sander de Smalen 26b9b2a8c3 [AArch64][SVE] Asm: Support for AND, ORR, EOR and BIC instructions.
This patch addresses the following variants:
  - bitmask immediate,         e.g. 'and z0.d, z0.d, #0x6'.
  - unpredicated data vectors, e.g. 'and z0.d, z1.d, z2.d'.
  - predicated data vectors,   e.g. 'and z0.d, p0/m, z0.d, z1.d'.

And also several aliases, such as: 
  - ORN, alias of ORR.
  - EON, alias of EOR.
  - BIC, alias of AND (immediate variant)
  - MOV, alias of ORR (if unpredicated and source register operands are the same)

Reviewers: rengolin, huntergr, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47363

llvm-svn: 333414
2018-05-29 13:08:43 +00:00
Luke Geeson 16092ab3c5 [AArch64] added FP16 vcvth intrinsic support
Summary: Change-Id: I0df845749c7689dfc99150ba7c19c7d0dadbd705

Reviewers: javed.absar, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: llvm-commits, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46311

llvm-svn: 333410
2018-05-29 11:40:33 +00:00
Sander de Smalen 98686c6b15 [AArch64][SVE] Asm: Support for ADD (immediate) instructions.
This patch adds addsub_imm8_opt_lsl_(i8|i16|i32|i64) operands
that are unsigned values in the range 0 to 255. For element widths of
16 bits or higher it may also be a signed multiple of 256 in the
range 0 to 65280.

Note: This also does some refactoring to reuse convenience function
getShiftedVal<shift>(), and now allows AArch64 scalar 'ADD #-4096' to be
accepted to be mapped to SUB #4096.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47310

llvm-svn: 333408
2018-05-29 10:39:49 +00:00
Sander de Smalen 6e2a5b4cf0 Fix ubsan errors introduced by r333263 re. left-shifting negative values.
llvm-svn: 333270
2018-05-25 11:41:04 +00:00
Sander de Smalen 62770795a5 [AArch64][SVE] Asm: Support for DUP (immediate) instructions.
Unpredicated copy of optionally-shifted immediate to SVE vector,
along with MOV-aliases.

This patch contains parsing and printing support for
cpy_imm8_opt_lsl_(i8|i16|i32|i64). This operand allows a signed value in
the range -128 to +127. For element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512.
For element-width of 8 bits a range of -128 to 255 is accepted, since a copy
of a byte can be considered either signed/unsigned.

Note: This patch renames tryParseAddSubImm() -> tryParseImmWithOptionalShift()
and moves the behaviour of trying to shift a plain immediate by an allowed
shift-value to its addImmWithOptionalShiftOperands() method, so that the
parsing itself is generic and allows immediates from multiple shifted operands.
This is done because an immediate can be divisible by both shifted operands.

Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D47309

llvm-svn: 333263
2018-05-25 09:47:52 +00:00
Eli Friedman 9e177882aa [AArch64] Improve orr+movk sequences for MOVi64imm.
The existing code has three different ways to try to lower a 64-bit
immediate to the sequence ORR+MOVK.  The result is messy: it misses
some possible sequences, and the order of the checks means we sometimes
emit two MOVKs when we only need one.

Instead, just use a simple loop to try all possible two-instruction
ORR+MOVK sequences.

Differential Revision: https://reviews.llvm.org/D47176

llvm-svn: 333218
2018-05-24 19:38:23 +00:00
Geoff Berry 98150e3a62 [AArch64] Take advantage of variable shift/rotate amount implicit mod operation.
Summary:
Optimize code generated for variable shifts/rotates by taking advantage
of the implicit and/mod done on the variable shift amount register.

Resolves bug 27582 and bug 37421.

Reviewers: t.p.northover, qcolombet, MatzeB, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46844

llvm-svn: 333214
2018-05-24 18:29:42 +00:00
Chad Rosier 3f66363139 [CodeGen][AArch64] Use RegUnits to track register aliases. (NFC)
Use RegUnits to track register aliases in AArch64RedundantCopyElimination.

Differential Revision: https://reviews.llvm.org/D47269

llvm-svn: 333107
2018-05-23 17:49:38 +00:00
Alex Bradbury 0a59f18951 [AArch64] Use addAliasForDirective to support data directives
The AArch64 asm parser currently has custom parsing logic for .hword, .word, 
and .xword. Rather than use this custom logic, we can just use 
addAliasForDirective to enable the reuse of AsmParser::parseDirectiveValue.

Differential Revision: https://reviews.llvm.org/D47000

llvm-svn: 333077
2018-05-23 11:17:20 +00:00
Eli Friedman 785acce51d Delete unused variable from r333015.
(The assertion suppressed the unused variable warning on
Release+Asserts builds, so I didn't notice.)

llvm-svn: 333018
2018-05-22 19:38:07 +00:00
Eli Friedman 042dc9e092 [MachineOutliner] Add "thunk" outlining for AArch64.
When we're outlining a sequence that ends in a call, we can save up to
three instructions in the outlined function by turning the call into
a tail-call. I refer to this as thunk outlining because the resulting
outlined function looks like a thunk; suggestions welcome for a better
name.

In addition to making the outlined function shorter, thunk outlining
allows outlining calls which would otherwise be illegal to outline:
we don't need to save/restore LR, so we don't need to prove anything
about the stack access patterns of the callee.

To make this work effectively, I also added
MachineOutlinerInstrType::LegalTerminator to the generic MachineOutliner
code; this allows treating an arbitrary instruction as a terminator in
the suffix tree.

Differential Revision: https://reviews.llvm.org/D47173

llvm-svn: 333015
2018-05-22 19:11:06 +00:00
Roman Lebedev 7772de25d0 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Peter Collingbourne dcd7d6c331 MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Peter Collingbourne 571a3301ae MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Peter Collingbourne e3f652973e Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Peter Collingbourne f7b81db715 MC: Change the streamer ctors to take an object writer instead of a stream. NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47050

llvm-svn: 332749
2018-05-18 18:26:45 +00:00
Clement Courbet 8892c7db08 [ExynosM3] Fix scheduling info.
Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 332713
2018-05-18 13:10:41 +00:00
Eli Friedman 4081a57af7 [MachineOutliner] Count savings from outlining in bytes.
Counting the number of instructions is both unintuitive and inaccurate.
On AArch64, this only affects the generated remarks and certain rare
pseudo-instructions, but it will have a bigger impact on other targets.

Differential Revision: https://reviews.llvm.org/D46921

llvm-svn: 332685
2018-05-18 01:52:16 +00:00
Sander de Smalen 75cfa34156 [AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+scalar) store instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46680

llvm-svn: 332584
2018-05-17 09:05:41 +00:00
Eli Friedman ddbf6d6514 [MachineOutliner] Don't outline instructions that modify SP.
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.

Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.

Differential Revision: https://reviews.llvm.org/D46920

llvm-svn: 332529
2018-05-16 21:20:16 +00:00
Eli Friedman 02709bcb78 [MachineOutliner] Don't save/restore LR for tail calls.
The cost computation assumes we do this correctly, but the actual
lowering was wrong.

Differential Revision: https://reviews.llvm.org/D46923

llvm-svn: 332514
2018-05-16 19:49:01 +00:00
Sander de Smalen 22176a2242 [AArch64][SVE] Improve diagnostics for vectors with incorrect element-size.
For regular SVE vector operands, this patch introduces a more
sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b).

For example:
  add z0.s, z1.s, z2.b      -> invalid element width
               ^_____^
               mismatch

For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes
a slightly different approach and instead returns a 'invalid operand'
if the element size is not as expected. This is because the diagnostics
are more specificied to suggest using the right shift/extend suffix. This
is a trade-off not to introduce more operand classes and still provide
useful diagnostics for LD1 and PRF instructions.

For example:
  ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw)'
  ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand
          ^________________^
               mismatch

For gather prefetches, both 'z0.s' and 'z0.d' would be allowed:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw) #2'
  prfw #0, p0, [x0, z0.d]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Without this change, the diagnostic would unnecessarily suggest a
different element size:
  prfw #0, p0, [x0, z0.s]   -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'

Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46688

llvm-svn: 332483
2018-05-16 15:45:17 +00:00
Sirish Pande cabe50a308 [AArch64] Gangup loads and stores for pairing.
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.

Differential Revision https://reviews.llvm.org/D46477

llvm-svn: 332482
2018-05-16 15:36:52 +00:00
Sander de Smalen bbc4e9a4e3 [AArch64][SVE] Asm: Support for gather PRF prefetch instructions
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46686

llvm-svn: 332472
2018-05-16 14:16:01 +00:00
Amara Emerson 0d6a26dffc [GlobalISel][IRTranslator] Split aggregates during IR translation.
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.

This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.

A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.

Patch is based on the original work of Tim Northover.

Differential Revision: https://reviews.llvm.org/D46018

llvm-svn: 332449
2018-05-16 10:32:02 +00:00
Peter Smith c811758da6 [AArch64] Support "S" inline assembler constraint
This patch re-introduces the "S" inline assembler constraint. This matches
an absolute symbolic address or a label reference. The primary use case is

asm("adrp %0, %1\n\t"
    "add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var));

I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.

Fixes PR37180

Differential Revision: https://reviews.llvm.org/D46745

llvm-svn: 332444
2018-05-16 09:33:25 +00:00
Sander de Smalen a680f558be [AArch64][SVE] Asm: Support for structured LD2, LD3 and LD4 (scalar+scalar) load instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46679

llvm-svn: 332442
2018-05-16 09:16:20 +00:00
Sander de Smalen 67f9154964 [AArch64][SVE] Asm: Support for contiguous PRF prefetch instructions.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46682

llvm-svn: 332433
2018-05-16 07:50:09 +00:00
Evandro Menezes 8d522d811a [AArch64] Improve single vector lane unscaled stores
When storing the 0th lane of a vector, use a simpler and usually more
efficient scalar store instead.  In this case, also using the unscaled
offset.

Differential revision: https://reviews.llvm.org/D46762

llvm-svn: 332394
2018-05-15 20:41:12 +00:00
Evandro Menezes 14fa2e4fa5 [AArch64] Improve single vector lane stores
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Differential revision: https://reviews.llvm.org/D46655

llvm-svn: 332251
2018-05-14 15:26:35 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Sander de Smalen 93380371bb [AArch64][SVE] Extend parsing of Prefetch operation for SVE.
Reviewers: rengolin, fhahn, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46681

llvm-svn: 332234
2018-05-14 11:54:41 +00:00
Geoff Berry 60460268c0 [AArch64] Fix performPostLD1Combine to check for constant lane index.
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction.  The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:

- an assert when lowering the LD1LANE ISD node since it assumes an
  constant operand

- an assert in isel if the lane index value depends on the
  post-incremented base register

Both of these issues are avoided by simply checking that the lane index
is a constant.

Fixes bug 35822.

Reviewers: t.p.northover, javed.absar

Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D46591

llvm-svn: 332103
2018-05-11 16:25:06 +00:00
Haicheng Wu 0aae2bc260 [CGP] Split large data structres to sink more GEPs
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.

This patch tries to split a large data struct starting from %base like the
following.

Before:
BB0:
  %base     =

BB1:
  %gep0     = gep %base, off0
  %gep1     = gep %base, off1
  %gep2     = gep %base, off2

BB2:
  %load1    = load %gep0
  %load2    = load %gep1
  %load3    = load %gep2

After:
BB0:
  %base     =
  %new_base = gep %base, off0

BB1:
  %new_gep0 = %new_base
  %new_gep1 = gep %new_base, off1 - off0
  %new_gep2 = gep %new_base, off2 - off0

BB2:
  %load1    = load i32, i32* %new_gep0
  %load2    = load i32, i32* %new_gep1
  %load3    = load i32, i32* %new_gep2

In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.

The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.

Differential Revision: https://reviews.llvm.org/D42759

llvm-svn: 332015
2018-05-10 18:27:36 +00:00
Adhemerval Zanella f384bc7166 [AArch64] Improve cost of vector division by constant
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32).  The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.

Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.

llvm-svn: 331873
2018-05-09 12:48:22 +00:00
Daniel Sanders 618437459c Revert r331816 and r331820 - [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Reverting this to see if the clang-cmake-aarch64-global-isel and
clang-cmake-aarch64-quick bots are failing because of this commit.
We know it wasn't r331819.

llvm-svn: 331846
2018-05-09 05:00:17 +00:00
Shiva Chen 801bf7ebbe [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Daniel Sanders d24dcdd1f7 [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Reviewed By: aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 331816
2018-05-08 22:26:39 +00:00
Sander de Smalen d8e76494fc [AArch64][SVE] Asm: Support for LD1R load-and-replicate scalar instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46251

llvm-svn: 331758
2018-05-08 10:46:55 +00:00
Sander de Smalen 20eede7093 [AArch64] Disallow vector operand if FPR128 Q register is required.
Patch https://reviews.llvm.org/D41445 changed the behaviour of 'isReg()'
to also return 'true' if the parsed register operand is a vector
register. Code in the AsmMatcher checks if a register is a subclass of the
expected register class. However, even though both parsed registers map
to the same physical register, the 'v' register is of kind 'NeonVector',
where 'q' is of type Scalar, where isSubclass() does not distinguish
between the two cases.

The solution is to use an AsmOperand instead of the register directly,
and use the PredicateMethod to distinguish the two operands.

This fixes for example:
  ldr v0, [x0]    // 'v0' is an invalid operand for this instruction
  ldr q0, [x0]    // valid

Reviewers: aemerson, Gerolf, SjoerdMeijer, javed.absar

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D46310

llvm-svn: 331755
2018-05-08 10:01:04 +00:00
Daniel Sanders f84bc3793e [globalisel] Update GlobalISel emitter to match new representation of extending loads
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601
2018-05-05 20:53:24 +00:00
Craig Topper 781aa181ab Fix a bunch of places where operator-> was used directly on the return from dyn_cast.
Inspired by r331508, I did a grep and found these.

Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa.

llvm-svn: 331577
2018-05-05 01:57:00 +00:00
Michael Berg 7acc81b744 Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

llvm-svn: 331547
2018-05-04 18:48:20 +00:00
Adhemerval Zanella a57ef17ab6 [AArch64] Custom Lower MULLH{S,U} for v16i8, v8i16, and v4i32
This patch adds a custom lowering for ISD::MULH{S,U} used on divide by
constant optimization (DAGCombiner::BuildSDIV and DAGCombiner::BuildUDIV).

New patterns for smull and umull are added, so AArch64ISD::{S,U}MULL
can be correctly lowered to smull2 and umull2.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46009

llvm-svn: 331522
2018-05-04 14:33:55 +00:00
Martin Storsjo d0b5034b8a [COFF, ARM64] Hook up a few remaining relocations
Differential Revision: https://reviews.llvm.org/D46355

llvm-svn: 331384
2018-05-02 18:24:37 +00:00
Sander de Smalen 659a48cd38 [AArch64][SVE] Asm: Support for LDR/STR fill and spill instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46270

llvm-svn: 331352
2018-05-02 13:32:39 +00:00
Sander de Smalen 57da042e32 [AArch64][SVE] Asm: Support for scatter ST1 store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46248

llvm-svn: 331349
2018-05-02 13:00:30 +00:00
Sander de Smalen 414d2358a4 [AArch64][SVE] Asm: Support for non-temporal, contiguous LDNT1/STNT1 load/store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46269

llvm-svn: 331343
2018-05-02 11:48:49 +00:00
Sander de Smalen c1e44bdfc7 [AArch64][SVE] Asm: Support for LD1RQ load-and-replicate quad-word vector instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46250

llvm-svn: 331339
2018-05-02 08:49:08 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Sander de Smalen 788dc70c78 [AArch64][SVE] Asm: Support for contiguous ST1 (scalar+scalar) store instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46121

llvm-svn: 331260
2018-05-01 13:36:03 +00:00
Daniel Sanders 2de9d4ad5d Fix infinite loop after r331115
There are two separate fixes here:
* The lowering code for non-extending loads should report UnableToLegalize instead of emitting the same instruction.
* The target should not be requesting lowering of non-extending loads.

llvm-svn: 331201
2018-04-30 17:20:01 +00:00
Sander de Smalen 5861c263e0 [AArch64][SVE] Asm: Improve diagnostics for gather loads.
This patch extends the 'isSVEVectorRegWithShiftExtend' function to 
improve diagnostics for SVE's gather load (scalar + vector) addressing 
modes. Instead of always suggesting the 'unscaled' addressing mode, 
the use of DiagnosticPredicate enables a more specific error message
in the context where the scaling is incorrect. For example:

  ld1h z0.d, p0/z, [x0, z0.d, lsl #2]
                                   ^ 
           shift amount should be '1'

Instead of suggesting the packed, unscaled addressing mode:
  expected 'z[0..31].d, (uxtw|sxtw)'

the assembler now suggests using the proper scaling:
  expected 'z[0..31].d, (lsl|uxtw|sxtw) #1'

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46124

llvm-svn: 331162
2018-04-30 07:24:38 +00:00
Sander de Smalen afe1ee2180 [AArch64][AsmParser] NFC: Cleanup of addOperands functions
Most of the add<operandname>Operands() functions are the same
and can be replaced by using a single 'RenderMethod' in
the AArch64InstrFormats.td file. Since many of the scaled
immediates (with different scaling/bits) are the same, most of
these can reuse the same AsmOperandClass.

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D46122

llvm-svn: 331146
2018-04-29 18:18:21 +00:00
Sander de Smalen 50ded90072 [AArch64][SVE] Asm: Support for gather LD1/LDFF1 (vector + imm) load instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, javed.absar

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D46120

llvm-svn: 331145
2018-04-29 17:33:38 +00:00
Daniel Sanders 5eb9f581b6 [globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.

Depends on D45466

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45540

llvm-svn: 331115
2018-04-28 18:14:50 +00:00
Jessica Paquette 0b6724917a [MachineOutliner] Add defs to calls + don't track liveness on outlined functions
This commit makes it so that if you outline a def of some register, then the
call instruction created by the outliner actually reflects that the register
is defined by the call. It also makes it so that outlined functions don't
have the TracksLiveness property.

Outlined calls shouldn't break liveness assumptions that someone might make.

This also un-XFAILs the noredzone test, and updates the calls test.

llvm-svn: 331095
2018-04-27 23:36:35 +00:00
Daniel Sanders 27fe8a5011 [globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand
Summary:
Currently only the memory size is supported but others can be added as
needed.

narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45466

llvm-svn: 331071
2018-04-27 19:48:53 +00:00
Jun Bum Lim 47aece1344 [CodeGen] Use RegUnits to track register aliases (NFC)
Summary: Use RegUnits to track register aliases in PostRASink and AArch64LoadStoreOptimizer.

Reviewers: thegameg, mcrosier, gberry, qcolombet, sebpop, MatzeB, t.p.northover, javed.absar

Reviewed By: thegameg, sebpop

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45695

llvm-svn: 331066
2018-04-27 18:44:37 +00:00
Francis Visoiu Mistrih c855e92ca9 [AArch64] Place the first ldp at the end when ReverseCSRRestoreSeq is true
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.

This didn't work in case no frame was needed and resulted in code size
regressions.

llvm-svn: 331044
2018-04-27 15:30:54 +00:00
Oliver Stannard 76088a5929 [AArch64] Codegen for v8.2A dot product intrinsics
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.

Differential revisioon: https://reviews.llvm.org/D46107

llvm-svn: 331036
2018-04-27 13:45:32 +00:00
Eli Friedman da018e5687 [MachineOutliner] Don't outline from functions with a section marking.
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.

I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.

Differential Revision: https://reviews.llvm.org/D46091

llvm-svn: 331007
2018-04-27 00:21:34 +00:00
Geoff Berry 08ab8c9544 [AArch64] Fix scavenged spill slot base when stack realignment required.
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.

Based on problem and patch reported by @paulwalker-arm

This is an alternative to solution proposed in D45770

Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar

Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46063

llvm-svn: 330976
2018-04-26 18:50:45 +00:00
Matthew Simpson b4096ebe26 [TTI, AArch64] Add transpose shuffle kind
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.

Differential Revision: https://reviews.llvm.org/D45982

llvm-svn: 330941
2018-04-26 13:48:33 +00:00
Sander de Smalen fe17a78b86 [AArch64][SVE] Enable DiagnosticPredicates for SVE LD1 instructions.
This patch extends the PredicateMethod of AsmOperands used in SVE's 
LD1 instructions with a DiagnosticPredicate. This makes them 'context
sensitive' to the operand that has been parsed and tells the user to
use the right register (with expected shift/extend), rather than telling 
the immediate is out of range when it actually parsed a register.

Patch [2/2] in a series to improve assembler diagnostics for SVE:
-  Patch [1/2]: https://reviews.llvm.org/D45879
-  Patch [2/2]: https://reviews.llvm.org/D45880

Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45880

llvm-svn: 330934
2018-04-26 12:54:42 +00:00
Sander de Smalen 74f9e6720b [AArch64][SVE] Asm: Support for gather LD1/LDFF1 (scalar + vector) load instructions.
Patch [2/3] in series to add support for SVE's gather load instructions
that use scalar+vector addressing modes:
- Patch [1/3]: https://reviews.llvm.org/D45951
- Patch [2/3]: https://reviews.llvm.org/D46023
- Patch [3/3]: https://reviews.llvm.org/D45958

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D46023

llvm-svn: 330928
2018-04-26 08:19:53 +00:00
Amara Emerson 1f5d994119 [AArch64][GlobalISel] Implement selection for the llvm.trap intrinsic.
rdar://38674040

llvm-svn: 330831
2018-04-25 14:43:59 +00:00
Sander de Smalen eb896b148b [AArch64][SVE] Asm: Add AsmOperand classes for SVE gather/scatter addressing modes.
This patch adds parsing support for 'vector + shift/extend' and
corresponding asm operand classes, needed for implementing SVE's
gather/scatter addressing modes.

The added combinations of vector (ZPR) and Shift/Extend are:

Unscaled:
  ZPR64ExtLSL8:           signed 64-bit offsets  (z0.d)
  ZPR32ExtUXTW8:        unsigned 32-bit offsets  (z0.s, uxtw)
  ZPR32ExtSXTW8:          signed 32-bit offsets  (z0.s, sxtw)

Unpacked and unscaled:
  ZPR64ExtUXTW8:        unsigned 32-bit offsets  (z0.d, uxtw)
  ZPR64ExtSXTW8:          signed 32-bit offsets  (z0.d, sxtw)

Unpacked and scaled:
  ZPR64ExtUXTW<scale>:  unsigned 32-bit offsets  (z0.d, uxtw #<shift>)
  ZPR64ExtSXTW<scale>:    signed 32-bit offsets  (z0.d, sxtw #<shift>)

Scaled:
  ZPR32ExtUXTW<scale>:  unsigned 32-bit offsets  (z0.s, uxtw #<shift>)
  ZPR32ExtSXTW<scale>:    signed 32-bit offsets  (z0.s, sxtw #<shift>)
  ZPR64ExtLSL<scale>:   unsigned 64-bit offsets  (z0.d,  lsl #<shift>)
  ZPR64ExtLSL<scale>:     signed 64-bit offsets  (z0.d,  lsl #<shift>)


Patch [1/3] in series to add support for SVE's gather load instructions
that use scalar+vector addressing modes:
- Patch [1/3]: https://reviews.llvm.org/D45951
- Patch [2/3]: https://reviews.llvm.org/D46023
- Patch [3/3]: https://reviews.llvm.org/D45958

Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45951

llvm-svn: 330805
2018-04-25 09:26:47 +00:00
Jessica Paquette 4f56428de1 [MachineOutliner] Check for explicit uses of LR/W30 in MI operands
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.

This also adds a test that ensures that those sorts of ADRPs won't be outlined.

llvm-svn: 330783
2018-04-24 22:38:15 +00:00
Sander de Smalen eb1053f9d3 [AArch64][SVE] Asm: Support for contiguous, first-faulting LDFF1 (scalar+scalar) load instructions.
Reviewers: fhahn, rengolin, samparker, SjoerdMeijer, t.p.northover, echristo, evandro, javed.absar

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45946

llvm-svn: 330697
2018-04-24 08:59:08 +00:00
Peter Collingbourne 5ab4a4793e Reland r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses.", with a fix for the bot failure.
This reland includes a check to prevent the DAG combiner from folding an
offset that is smaller than the existing one. This can cause oscillations
between two possible DAGs, which was the cause of the hang and later assertion
failure observed on the lnt-ctmark-aarch64-O3-flto bot.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

Original commit message:
> This is a code size win in code that takes offseted addresses
> frequently, such as C++ constructors that typically need to compute
> an offseted address of a vtable. This reduces the size of Chromium
> for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 330630
2018-04-23 19:09:34 +00:00
Nico Weber 5d53aed419 Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt
llvm-svn: 330584
2018-04-23 12:49:34 +00:00
Sander de Smalen 7893f722b2 [AArch64][SVE] Asm: Support for contiguous, non-faulting LDNF1 (scalar+imm) load instructions
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45684

llvm-svn: 330583
2018-04-23 12:43:19 +00:00
Sander de Smalen 1b6d374422 [AArch64][SVE] Asm: Support for structured ST2, ST3 and ST4 (scalar+imm) store instructions.
Reviewers: fhahn, rengolin, javed.absar, SjoerdMeijer, t.p.northover, echristo, evandro, huntergr

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45681

llvm-svn: 330565
2018-04-23 07:50:35 +00:00
Jessica Paquette 2e5ada5c81 [MachineOutliner] Change B instruction for tail calls to TCRETURNdi
First off, this is more correct than having the B. Second off, this was making
a bot upset. This fixes that.

Update the test to include -verify-machineinstrs as well to prevent stuff like
this slipping by non debug/assert builds in the future.

llvm-svn: 330459
2018-04-20 18:03:21 +00:00
Sander de Smalen 30f9f11d51 [AArch64][SVE] Asm: Support for contiguous LD1 (scalar+scalar) load instructions.
This is patch [4/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D45690

llvm-svn: 330423
2018-04-20 12:52:01 +00:00
Sander de Smalen 137efb231e [AArch64][SVE] Fix diagnostic for SVE LD4 instructions:
Diagnostic:
  'index must be multiple of 3 in range [-32, 28]'

Must be:
  'index must be multiple of 4 in range [-32, 28]'

llvm-svn: 330407
2018-04-20 09:45:50 +00:00
Sander de Smalen 367694b093 [AArch64][SVE] Added GPR64shifted and GPR64NoXZRshifted register classes.
Summary:
This is patch [3/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: SjoerdMeijer

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45689

llvm-svn: 330406
2018-04-20 08:54:49 +00:00
Sander de Smalen 149916d29a [AArch64][AsmParser] Extend RegOp with integrated 'shift/extend'.
Summary:
In some cases the shift/extend needs to be explicitly parsed together
with the register, rather than as a separate operand. This is needed
for addressing modes where the instruction as a whole dictates the
scaling/extend, rather than specific bits in the instruction.
By parsing them as a single operand, we avoid the need to pass an
extra operand in all CodeGen patterns (because all operands need to
have an associated value), and we avoid the need to update TableGen to
accept operands that have no associated bits in the instruction.

An added benefit of parsing them together is that the assembler
can give a sensible diagnostic if the scaling is not correct.

This is patch [2/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn, SjoerdMeijer

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45688

llvm-svn: 330394
2018-04-20 07:24:20 +00:00
Sander de Smalen 50d8702f26 [AArch64][AsmParser] NFC: Cleanup parsing of scalar registers.
Summary:
- Renamed tryParseRegister to tryParseScalarRegister, which
  now returns an OperandMatchResultTy.
- Moved matching of certain aliases into matchRegisterNameAlias.
- Changed type of most 'Reg' variables to 'unsigned'.

This is patch [1/4] in a series to add assembler/disassembler support for
SVE's contiguous LD1 (scalar+scalar) instructions:
- Patch [1/4]: https://reviews.llvm.org/D45687
- Patch [2/4]: https://reviews.llvm.org/D45688
- Patch [3/4]: https://reviews.llvm.org/D45689
- Patch [4/4]: https://reviews.llvm.org/D45690

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro, samparker

Reviewed By: samparker

Subscribers: samparker, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45687

llvm-svn: 330311
2018-04-19 07:35:08 +00:00
Amara Emerson 9de072f8ae [AArch64] Add isel pattern for v8i8->v2f32 NVCASTs.
rdar://39454635

llvm-svn: 330276
2018-04-18 17:10:19 +00:00
Sander de Smalen 7a210db81e [AArch64][SVE] Asm: Support for structured LD4 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45624

llvm-svn: 330120
2018-04-16 10:46:18 +00:00
Sander de Smalen d239eb3ce3 [AArch64][SVE] Asm: Support for structured LD3 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45623

llvm-svn: 330116
2018-04-16 10:10:48 +00:00
Sander de Smalen f836af869d [AArch64][SVE] Asm: Support for structured LD2 (scalar+imm) load instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45622

llvm-svn: 330108
2018-04-16 07:09:29 +00:00
Tim Northover 271d3d2771 MachO: trap unreachable instructions
Debugability is more important than saving 4 bytes to let us to fall
through to nonense.

llvm-svn: 330073
2018-04-13 22:25:20 +00:00
Peter Collingbourne ab40e44ba1 Revert r329956, "AArch64: Introduce a DAG combine for folding offsets into addresses."
Caused a hang and eventually an assertion failure in LTO builds
of 7zip-benchmark on aarch64 iOS targets.
http://green.lab.llvm.org/green/job/lnt-ctmark-aarch64-O3-flto/2024/

llvm-svn: 330063
2018-04-13 20:21:00 +00:00
Sander de Smalen 5b12db5d23 [AArch64][SVE] Asm: Support for contiguous LD1 (scalar+imm) load instructions
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45618

llvm-svn: 330024
2018-04-13 14:41:36 +00:00
Sander de Smalen 5c62598b0d [AArch64][SVE] Asm: Support for contiguous ST1 (scalar+imm) store instructions.
Summary:
Added instructions for contiguous stores, ST1, with scalar+imm addressing
modes and corresponding tests. The patch also adds parsing of
'mul vl' as needed for the VL-scaled immediate.

This is patch [6/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45432

llvm-svn: 330014
2018-04-13 12:56:14 +00:00
Sander de Smalen ea626e37ba [AArch64][SVE] Asm: Add support for parsing and printing SVE vector lists.
Summary:
Added Z_(b|h|s|d) vector list RegisterOperands along with support to
add/print the vector lists.

This is patch [5/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Subscribers: tschuett, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45431

llvm-svn: 330000
2018-04-13 09:11:53 +00:00
Peter Collingbourne 00db326b0d AArch64: Introduce a DAG combine for folding offsets into addresses.
This is a code size win in code that takes offseted addresses
frequently, such as C++ constructors that typically need to compute
an offseted address of a vtable. This reduces the size of Chromium
for Android's .text section by 108KB.

Differential Revision: https://reviews.llvm.org/D45199

llvm-svn: 329956
2018-04-12 21:23:55 +00:00
Jessica Paquette 8aa6cd5cb9 [AArch64] Move AFI->setRedZone(false) to top of emitPrologue
AFI->setRedZone(false) was put in the wrong place before, and so it only fired
on functions that didn't have stack frames. This moves that to the top of
emitPrologue to make sure that every function without a redzone has it set
correctly.

This also adds a function representing one of the early exit cases (GHC calling
convention) to the MachineOutliner noredzone test to ensure that we can outline
from functions like these, where we never use a redzone.

llvm-svn: 329922
2018-04-12 16:16:18 +00:00
Sander de Smalen 525e3225c2 [AArch64][AsmParser] Unify 'addVectorListOperands' functions.
Summary:
Merged 'addVectorList64Operands' and 'addVectorList128Operands' into a
generic 'addVectorListOperands', which can be easily extended to work
for SVE vectors.

This is patch [4/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45430

llvm-svn: 329909
2018-04-12 13:19:32 +00:00
Sander de Smalen 650234ba36 [AArch64][AsmParser] Make parse function for VectorLists generic to other vector types.
Summary:
Added 'RegisterKind' to the VectorListOp structure, so that this operand 
type can be reused for SVE vector lists in a later patch. It also
refactors the 'tryParseVectorList' function so it can be used directly
in the ParserMethod of an operand. The parsing can now parse multiple 
kinds of vectors and recover if there is no match.

This is patch [3/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45429

llvm-svn: 329900
2018-04-12 11:40:52 +00:00
Sander de Smalen c88f9a1a57 [AArch64][AsmParser] Split index parsing from vector list.
Summary:
Place parsing of a vector index into a separate function to reduce
duplication, since the code is duplicated in both the parsing of a
Neon vector register operand and a Neon vector list.

This is patch [2/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: rengolin

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45428

llvm-svn: 329809
2018-04-11 14:10:37 +00:00
Francis Visoiu Mistrih 6463922e3a [AArch64] Fix regression after r329691
In r329691, we would choose FP even if the offset wouldn't fit, just
because the offset is smaller than the one from BP. This made many
accesses through FP need to scavenge a register, which resulted in
slower and bigger code for no good reason.

This patch now always picks the offset that fits first, even if FP is
preferred.

llvm-svn: 329797
2018-04-11 12:36:55 +00:00
Sander de Smalen 73937b7c9d [AArch64][AsmParser] Unify code for parsing Neon/SVE vectors.
Summary:
Merged 'tryMatchVectorRegister' (specific to Neon) and
'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and
created a generic 'parseVectorKind()' function that returns the #Elements
and ElementWidth of a vector suffix. This reduces the duplication of
this functionality between two the vector implementations.

This is patch [1/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.

Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro

Reviewed By: fhahn

Subscribers: tschuett, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45427

llvm-svn: 329782
2018-04-11 07:36:10 +00:00
Geoff Berry 5696e075c3 [AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass.
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.

Reviewers: mcrosier

Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45502

llvm-svn: 329761
2018-04-10 21:43:03 +00:00
Jessica Paquette a450ed2352 Recommit r329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation"
This commit fixes the bot failures that were coming up before with r329716.

The fix was to move the check for "isInSection()" inside of the if condition
and emit the error there instead of waiting to get past the unreachable statement.

This should work in debug and release builds now.

llvm-svn: 329746
2018-04-10 19:46:43 +00:00
Amara Emerson e27d5016ef [AArch64] Fix isel failure when BUILD_PAIR nodes are left over.
rdar://39175175

llvm-svn: 329743
2018-04-10 19:01:58 +00:00
Jessica Paquette c140bbddaf Revert 329716 "Add missing nullptr check before getSection() to AArch64MachObjectWriter::recordRelocation"
This broke a bunch of bots so I'm reverting while I figure it out.

llvm-svn: 329728
2018-04-10 17:53:41 +00:00
Peter Collingbourne a7d936f0c0 Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF."
Caused a build failure in check-tsan.

llvm-svn: 329718
2018-04-10 16:19:30 +00:00