Commit Graph

3643 Commits

Author SHA1 Message Date
Amara Emerson 2806fd01a1 [AArch64][GlobalISel] Fix a crash when selecting shufflevectors with an undef mask element.
If a shufflevector's mask vector has an element with "undef" then the generic
instruction defining that element register is a G_IMPLICIT_DEF instead of G_CONSTANT.
This fixes the selector to handle this case, and for now assumes that undef just means
zero. In future we'll optimize this case properly.

llvm-svn: 358312
2019-04-12 21:31:21 +00:00
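An illustrative LLVM IR sketch of the case above (the function is invented for illustration, not taken from the patch's tests): the undef element in the shuffle mask is what the IR translator turns into a G_IMPLICIT_DEF rather than a G_CONSTANT.

    ; Mask element 1 is undef, so GlobalISel models that index with
    ; G_IMPLICIT_DEF instead of G_CONSTANT; the selector now treats it as zero.
    define <4 x i32> @shuffle_undef_mask(<4 x i32> %a, <4 x i32> %b) {
      %s = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 undef, i32 2, i32 3>
      ret <4 x i32> %s
    }
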
Eric Christopher 6b06c6a5ef Add explicit dependencies on MCSection.h and MCDwarf.h to the .cpp
files rather than rely on transitive includes from MCStreamer.h.

llvm-svn: 358263
2019-04-12 07:40:01 +00:00
Amara Emerson 7e9355f870 [AArch64][GlobalISel] Flesh out vector load/store support for more types.
Some of these were legalizing into smaller vector types unnecessarily,
others were simply not supported yet.

llvm-svn: 358223
2019-04-11 20:40:01 +00:00
Amara Emerson b956051415 [AArch64][GlobalISel] Legalization and ISel support for load/stores of vectors of pointers.
Loads and stores of values with types like <2 x p0> currently don't get imported
because SelectionDAG has no knowledge of pointer types. To leverage the existing
support for vector load/stores, we can bitcast the value to have s64 element
types instead. We do this as a custom legalization.

This patch also adds support for general loads of <2 x s64>, and relaxes some
type conditions on selecting G_BITCAST.

Differential Revision: https://reviews.llvm.org/D60534

llvm-svn: 358221
2019-04-11 20:32:24 +00:00
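A sketch of the kind of IR this enables (illustrative only, using the typed-pointer syntax of the time): a load of a vector of pointers, which is custom-legalized by bitcasting to s64 elements.

    ; Loads/stores of <2 x p0> are legalized by bitcasting the value to <2 x s64>.
    define <2 x i8*> @load_v2p0(<2 x i8*>* %p) {
      %v = load <2 x i8*>, <2 x i8*>* %p, align 16
      ret <2 x i8*> %v
    }
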
Diogo N. Sampaio 8ddfd46c61 [AArch64] Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64
Summary:  Add lowering pattern for llvm.aarch64.neon.vcvtfxs2fp.f16.i64

Reviewers: pbarrio, DavidSpickett, LukeGeeson

Reviewed By: LukeGeeson

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60259

llvm-svn: 358171
2019-04-11 14:19:43 +00:00
Amara Emerson 213e0bde04 [AArch64][GlobalISel] Make <2 x p0> = G_BUILD_VECTOR legal.
The existing isel support already works for p0 once the legalizer accepts it.

llvm-svn: 358144
2019-04-10 23:06:14 +00:00
Amara Emerson a7ff111b04 [AArch64][GlobalISel] Add legalizer support for <8 x s16> and <16 x s8> G_ADD.
llvm-svn: 358143
2019-04-10 23:06:11 +00:00
Amara Emerson ae878dab03 [AArch64][GlobalISel] Scalarize vector SDIV.
llvm-svn: 358142
2019-04-10 23:06:08 +00:00
Craig Topper 35fe07916a [AArch64] Teach getTestBitOperand to look through ANY_EXTENDS
This patch teaches getTestBitOperand to look through ANY_EXTENDs when the extended bits aren't used. The test case changed here is based on what D60358 did to test16 in tbz-tbnz.ll. So this patch will avoid that regression.

Differential Revision: https://reviews.llvm.org/D60482

llvm-svn: 358108
2019-04-10 17:27:29 +00:00
Nick Desaulniers 5277b3ff25 [AsmPrinter] refactor to remove AsmVariant. NFC
Summary:
The InlineAsm::AsmDialect is only required for X86; no other architecture
makes use of it, and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.

Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.

This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.

Reviewers: craig.topper

Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60488

llvm-svn: 358101
2019-04-10 16:38:43 +00:00
Diogo N. Sampaio aae424a2d2 [AArch64] Add lowering pattern for scalar fp16 facge and facgt
Summary: The fp16 scalar versions of facge and facgt require custom pattern matching, as the result type is not the same width as the operands.

Reviewers: olista01, javed.absar, pbarrio

Reviewed By: javed.absar

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60212

llvm-svn: 358083
2019-04-10 13:34:18 +00:00
Clement Courbet 48e2eb0b27 [NFC] Fix unused variable warning.
llvm-svn: 358080
2019-04-10 13:18:05 +00:00
Amara Emerson 9bf092d719 [AArch64][GlobalISel] Add isel support for vector G_ICMP and G_ASHR & G_SHL
The selection for G_ICMP is unfortunately not currently importable from SDAG
due to the use of custom SDNodes. To support this, this selection method has an
opcode table which has been generated by a script, indexed by various
instruction properties. Ideally, in the future we will have GISel-native selection
patterns that we can write in tablegen to improve on this.

For selection of some types we also need support for G_ASHR and G_SHL which are
generated as a result of legalization. This patch also adds support for them,
generating the same code as SelectionDAG currently does.

Differential Revision: https://reviews.llvm.org/D60436

llvm-svn: 358035
2019-04-09 21:22:43 +00:00
Amara Emerson 888dd5d198 [AArch64][GlobalISel] Legalize vector G_ICMP.
Selection support will be coming in a later patch.

Differential Revision: https://reviews.llvm.org/D60435

llvm-svn: 358034
2019-04-09 21:22:40 +00:00
Amara Emerson 92d74f19cf [AArch64][GlobalISel] Add legalization for some vector G_SHL and G_ASHR.
This is needed for some future support for vector ICMP.

Differential Revision: https://reviews.llvm.org/D60433

llvm-svn: 358033
2019-04-09 21:22:37 +00:00
Amara Emerson 2b523f8162 [GlobalISel][AArch64] Allow CallLowering to handle types which are normally
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able to
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened back in the appropriate way.

Differential Revision: https://reviews.llvm.org/D60425

llvm-svn: 358032
2019-04-09 21:22:33 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Sander de Smalen 772e4734d9 [AArch64][AsmParser] Fix .arch_extension directive parsing
This patch fixes .arch_extension directive parsing to handle a wider
range of architecture extension options. The existing parser was parsing
extensions as an identifier which breaks for extensions containing a
"-", such as the "tlb-rmi" extension.

The extension is now parsed as a string. This is consistent with the
extension parsing in the .arch and .cpu directive parsing.

Patch by Cullen Rhodes (c-rhodes)

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D60118

llvm-svn: 357677
2019-04-04 09:11:17 +00:00
Jessica Paquette e794121cd0 [AArch64][GlobalISel] Legalize G_FEXP2
Same as G_FEXP. Add a test, and update legalizer-info-validation.mir and
f16-instructions.ll.

Differential Revision: https://reviews.llvm.org/D60165

llvm-svn: 357605
2019-04-03 16:58:32 +00:00
Javed Absar 5820db93c9 [AArch64] Update v8.5a MTE LDG/STG instructions
The latest MTE specification adds register Xt to the STG instruction family:
  STG [Xn, #offset] -> STG Xt, [Xn, #offset]
The tag written to memory is taken from Xt rather than Xn.
The LDG instruction was also changed to read the return address from Xt:
  LDG Xt, [Xn, #offset].
This patch includes those changes and tests.
Specification is at: https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60188

llvm-svn: 357583
2019-04-03 14:12:13 +00:00
Jessica Paquette 22c6215c7e [AArch64][GlobalISel] Select llvm.aarch64.stlxr(i64, i64*)
This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.

Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.

Differential Revision: https://reviews.llvm.org/D60100

llvm-svn: 357518
2019-04-02 19:57:26 +00:00
Jessica Paquette e44c20a68d [AArch64][GlobalISel] Select STRQui for stores into v2s64s instead of scalarizing
This improves selection for vector stores into v2s64s. Before, we just
scalarized them, but we can use a STRQui instead.

Differential Revision: https://reviews.llvm.org/D60083

llvm-svn: 357432
2019-04-01 22:19:13 +00:00
David Spickett 3d233d5d4d [AArch64] Add v8.5-a Memory Tagging STZGM instruction
This instruction writes a block of allocation tags
and stores zero to the associated data locations.

It differs from STGM by 1 bit and has the same
arguments.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60065

llvm-svn: 357397
2019-04-01 14:56:37 +00:00
David Spickett 9142b8ef1b [AArch64] Add v8.5-a Memory Tagging STGM/LDGM instructions
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

Differential Revision: https://reviews.llvm.org/D60064

llvm-svn: 357395
2019-04-01 14:52:18 +00:00
David Spickett efe376add6 [AArch64] Add v8.5-a Memory Tagging GMID_EL1 register
The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.

The specification can be found here:
https://developer.arm.com/docs/ddi0596/c

llvm-svn: 357392
2019-04-01 14:41:14 +00:00
Jessica Paquette d3ffd47df9 [GlobalISel][AArch64] Add isel support for G_INSERT_VECTOR_ELT on v2s32s
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59910

llvm-svn: 357318
2019-03-29 21:39:36 +00:00
Evandro Menezes 0f797b8732 [CodeGen] Refactor the option for the maximum jump table size
Refactor the option `max-jump-table-size` to default to the maximum
representable number.  Essentially, NFC.

llvm-svn: 357280
2019-03-29 17:28:11 +00:00
Amara Emerson 8a02aea6fc [AArch64][GlobalISel] Make G_PHI of v2s64, v4s32, v2s32 legal.
llvm-svn: 357108
2019-03-27 18:31:46 +00:00
Sander de Smalen e1eab42f65 [AArch64][SVE] Asm: error on unexpected SVE vector register type suffix
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.

The following are examples of what was previously valid:

    movprfx z0.b, z0.b
    movprfx z0.b, z0.s
    movprfx z0, z0.s

These instructions are now erroneous.

Patch by Cullen Rhodes (c-rhodes)

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D59636

llvm-svn: 357094
2019-03-27 17:23:38 +00:00
Sander de Smalen 90d1b551e1 [AArch64] NFC: Cleanup isAArch64FrameOffsetLegal
Cleanup isAArch64FrameOffsetLegal by:
- Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo().
- Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction
  has an unscaled variant.
- Simplifying the logic that calculates the offset to fit the immediate.

Reviewers: paquette, evandro, eli.friedman, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D59636

llvm-svn: 357064
2019-03-27 13:16:19 +00:00
Sander de Smalen 46edefe3c4 [AArch64] Adds cases for LDRSHWui and LDRSHXui to getMemOpInfo
This patch also adds cases PRFUMi and PRFMui.
This change was discussed in https://reviews.llvm.org/D59635.

llvm-svn: 357059
2019-03-27 10:39:03 +00:00
Eli Friedman 92d0d13366 [AArch64] Prefer "mov" over "orr" to materialize constants.
This is generally more readable due to the way the assembler aliases
work.

(This causes a lot of test changes, but it's not really as scary as it
looks at first glance; it's just mechanically changing a bunch of checks
for orr to check for mov instead.)

Differential Revision: https://reviews.llvm.org/D59720

llvm-svn: 356954
2019-03-25 21:25:28 +00:00
Evandro Menezes 4a7739b681 [AArch64, ARM] Add support for Exynos M5
Add Exynos M5 support and test cases.

llvm-svn: 356793
2019-03-22 18:42:14 +00:00
Amara Emerson c10b24691a [AArch64] Split the neon.addp intrinsic into integer and fp variants.
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.

This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.

This is a more permanent solution to PR40968.

Differential Revision: https://reviews.llvm.org/D59655

llvm-svn: 356722
2019-03-21 22:31:37 +00:00
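A rough sketch, assuming the intrinsic naming described by the patch, of what upgraded IR looks like (illustrative function, not taken from the tests): a call to aarch64.neon.addp with fp vector arguments becomes a call to the new faddp intrinsic.

    declare <2 x float> @llvm.aarch64.neon.faddp.v2f32(<2 x float>, <2 x float>)

    ; Previously this would have been a call to @llvm.aarch64.neon.addp.v2f32;
    ; AutoUpgrade now rewrites such fp calls to the dedicated faddp intrinsic.
    define <2 x float> @pairwise_fadd(<2 x float> %a, <2 x float> %b) {
      %r = call <2 x float> @llvm.aarch64.neon.faddp.v2f32(<2 x float> %a, <2 x float> %b)
      ret <2 x float> %r
    }
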
Evandro Menezes e5e77815b4 [AArch64] Update for Exynos
Fix the feature set for Exynos M4 by removing support for `+fp16fml` and fix test case.

llvm-svn: 356698
2019-03-21 18:54:58 +00:00
Oliver Stannard defdb1070f [AArch64] Allow -mattr=tpidr-el[1|2|3]
Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.

Patch by Philip Derrin!

Differential revision: https://reviews.llvm.org/D54685

llvm-svn: 356657
2019-03-21 11:30:17 +00:00
Amara Emerson 761ca2e53b [AArch64][GlobalISel] Add an optimization to select vector DUP instructions.
This adds pattern matching for the insert+shufflevector sequence so we can
generate dup instructions instead of the current TBL sequence.

Differential Revision: https://reviews.llvm.org/D59558

llvm-svn: 356526
2019-03-19 21:43:05 +00:00
Amara Emerson 18e2c5724a [AArch64][GlobalISel] Make v4s32 G_IMPLICIT_DEF legal.
llvm-svn: 356525
2019-03-19 21:43:02 +00:00
Amara Emerson 8627178d46 Revert r356304: remove subreg parameter from MachineIRBuilder::buildCopy()
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().

For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.

This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.

llvm-svn: 356396
2019-03-18 19:20:10 +00:00
Adhemerval Zanella 270249de2b [AArch64] Small fix for getIntImmCost
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instructions used in immediate materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58461

llvm-svn: 356391
2019-03-18 18:50:58 +00:00
Adhemerval Zanella a3cefa5d64 [AArch64] Optimize floating point materialization
This patch follows some ideas from r352866 to optimize the floating
point materialization even further. It changes isFPImmLegal to
consider up to 2 mov instructions, or up to 5 in case the subtarget has
fused literals.

The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but
the mov+fmov sequence is always better because of the reduced d-cache
pressure. The timings are still the same if you consider movw+movk+fmov
vs. adrp+ldr will be fused (although one instruction longer).

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58460

llvm-svn: 356390
2019-03-18 18:45:57 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Adhemerval Zanella 8a595b1d2e [AArch64] Refactor floating point materialization. NFC
It splits the logic of actual instruction emission away from the logic
that figures out the appropriate sequence in AArch64ExpandPseudo::expandMOVImm.
The new function AArch64_IMM::expandMOVImm, which returns the list of
instructions to materialize the immediate constant, is implemented in a
separate unit because it will be used in a subsequent patch to optimize
floating point materialization.

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D58915

llvm-svn: 356387
2019-03-18 18:23:23 +00:00
Christof Douma 8cfd91dcc7 [AArch64] Fix bug 35094 atomicrmw on Armv8.1-A+lse
Fixes https://bugs.llvm.org/show_bug.cgi?id=35094

The Dead register definition pass should leave the atomicrmw
instructions on AArch64 (LSE extension) alone. The reason is the following
statement in the Arm ARM:

"The ST<OP> instructions, and LD<OP> instructions where the destination
register is WZR or XZR, are not regarded as doing a read for the purpose
of a DMB LD barrier."

A good example was given in the gcc thread by Will Deacon (linked in the
bugzilla ticket 35094):

    P0 (atomic_int* y,atomic_int* x) {
      atomic_store_explicit(x,1,memory_order_relaxed);
      atomic_thread_fence(memory_order_release);
      atomic_store_explicit(y,1,memory_order_relaxed);
    }

    P1 (atomic_int* y,atomic_int* x) {
      atomic_fetch_add_explicit(y,1,memory_order_relaxed);  // STADD
      atomic_thread_fence(memory_order_acquire);
      int r0 = atomic_load_explicit(x,memory_order_relaxed);
    }

    P2 (atomic_int* y) {
      int r1 = atomic_load_explicit(y,memory_order_relaxed);
    }

    My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
    this test has executed. However, if the relaxed add in P1 compiles to
    STADD and the subsequent acquire fence is compiled as DMB LD, then we
    don't have any ordering guarantees in P1 and the forbidden result could
    be observed.

Change-Id: I419f9f9df947716932038e1100c18d10a96408d0
llvm-svn: 356360
2019-03-18 09:21:06 +00:00
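The P1 shape above expressed as a minimal LLVM IR sketch (illustrative, not from the patch's tests): a relaxed RMW that could select to STADD, followed by an acquire fence, which is why dead-register-definition must not rewrite the RMW's destination to wzr/xzr.

    define i32 @p1(i32* %y, i32* %x) {
      ; May select to an LSE atomic; keeping its destination register live
      ; preserves the DMB LD ordering discussed above.
      %old = atomicrmw add i32* %y, i32 1 monotonic
      fence acquire
      %r0 = load atomic i32, i32* %x monotonic, align 4
      ret i32 %r0
    }
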
Amara Emerson 3739a20875 [GlobalISel] Allow MachineIRBuilder to build subregister copies.
This relaxes some asserts about sizes, and adds an optional subreg parameter
to buildCopy().

Also update AArch64 instruction selector to use this in places where we
previously used MachineInstrBuilder manually.

Differential Revision: https://reviews.llvm.org/D59434

llvm-svn: 356304
2019-03-15 21:59:50 +00:00
Nikita Popov 1a26144ff5 [AArch64] Turn BIC immediate creation into a DAG combine
Switch BIC immediate creation for vector ANDs from custom lowering
to a DAG combine, which gives generic DAG combines a change to
apply first. In particular this avoids (and x, -1) being turned into
a (bic x, 0) instead of being eliminated entirely.

Differential Revision: https://reviews.llvm.org/D59187

llvm-svn: 356299
2019-03-15 21:04:34 +00:00
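A minimal sketch of the case called out above (illustrative function): with the combine ordering fixed, the all-ones mask is simply eliminated instead of being emitted as a BIC with a zero immediate.

    define <4 x i32> @and_allones(<4 x i32> %x) {
      ; (and x, -1) should fold away entirely rather than become "bic x, 0".
      %r = and <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
      ret <4 x i32> %r
    }
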
Amara Emerson d55016b276 [AArch64][GlobalISel] Regbankselect: Fix G_BUILD_VECTOR trying to use s16 gpr sources.
Since we can't insert s16 gprs as we don't have 16 bit GPR registers, we need to
teach RBS to assign them to the FPR bank so our selector works.

llvm-svn: 356282
2019-03-15 18:00:01 +00:00
Jessica Paquette 7d6784f522 [AArch64][GlobalISel] Add isel support for G_UADDO on s32s and s64s
This adds instruction selection support for G_UADDO on s32s and s64s.

Also
- Add an instruction selection test
- Update the arm64-xaluo.ll test to show that we generate the correct assembly

Differential Revision: https://reviews.llvm.org/D58734

llvm-svn: 356214
2019-03-14 22:54:29 +00:00
Amara Emerson d61b89be8d [AArch64][GlobalISel] Implement selection for G_UNMERGE of vectors to vectors.
This re-uses the previous support for extract vector elt to extract the
subvectors.

Differential Revision: https://reviews.llvm.org/D59390

llvm-svn: 356213
2019-03-14 22:48:18 +00:00
Amara Emerson 2ff2298c3e [AArch64][GlobalISel] Add some support for G_CONCAT_VECTORS.
Handles concatenating 2 x v2s32 and 2 x v4s16

Differential Revision: https://reviews.llvm.org/D59390

llvm-svn: 356212
2019-03-14 22:48:15 +00:00
Jessica Paquette 5aff1f475c [GlobalISel][AArch64] Add partial selection support for G_INSERT_VECTOR_ELT
This adds support for inserting elements into packed vectors. It also adds
two tests: one for selection, and one for regbank select.

Unpacked vectors will come in a follow-up.

Differential Revision: https://reviews.llvm.org/D59325

llvm-svn: 356182
2019-03-14 18:01:30 +00:00
Jessica Paquette 85ace6269f [AArch64][GlobalISel] Gardening: Simplify subregister copy in selectBuildVector
NFC. Some more preliminary factoring for G_INSERT_VECTOR_ELT.

Also better code-reuse, etc., etc.

Differential Revision: https://reviews.llvm.org/D59323

llvm-svn: 356107
2019-03-13 23:29:54 +00:00
Jessica Paquette 16d67a3e32 [GlobalISel][AArch64] Gardening: Factor out vector inserts
Factor out the vector insert code in `selectBuildVector`. Replace part of it
with `emitScalarToVector`, since it was pretty much equivalent.

This will make implementing G_INSERT_VECTOR_ELT easier.

Differential Revision: https://reviews.llvm.org/D59322

llvm-svn: 356106
2019-03-13 23:22:23 +00:00
Jessica Paquette bb1aced80d [GlobalISel][AArch64] Gardening: Factor out code to find lane indices
Some more refactoring for G_INSERT_VECTOR_ELT.

Factor out the code used to find a lane index from `selectExtractElt`. Put it
into a more general-purpose `getConstantValueForReg` function.

This will be shared with the code for G_INSERT_VECTOR_ELT.

Differential Revision: https://reviews.llvm.org/D59324

llvm-svn: 356101
2019-03-13 21:19:29 +00:00
Jessica Paquette 607774c960 Recommit "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without
running into any problematic intrinsics.

Also add a fix for lane copies, which don't support index 0.

llvm-svn: 355871
2019-03-11 22:18:01 +00:00
Jessica Paquette 42d16501e6 [GlobalISel][AArch64] Always fall back on aarch64.neon.addp.*
Overloaded intrinsics aren't necessarily safe for instruction selection. One
such intrinsic is aarch64.neon.addp.*.

This is a temporary workaround to ensure that we always fall back on that
intrinsic. Eventually this will be replaced with a proper solution.

https://bugs.llvm.org/show_bug.cgi?id=40968

Differential Revision: https://reviews.llvm.org/D59062

llvm-svn: 355865
2019-03-11 20:51:17 +00:00
Nikita Popov aa7cfa75f9 [SDAG][AArch64] Legalize VECREDUCE
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.

Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.

This also includes a few more changes to make this work somewhat
reasonably:

 * Add support for expanding VECREDUCE in SDAG. Usually
   experimental.vector.reduce is expanded prior to codegen, but if the
   target does have native vector reduce, it may of course still be
   necessary to expand due to legalization issues. This uses a shuffle
   reduction if possible, followed by a naive scalar reduction.
 * Allow the result type of integer VECREDUCE to be larger than the
   vector element type. For example we need to be able to reduce a v8i8
   into an (nominally) i32 result type on AArch64.
 * Use the vector operand type rather than the scalar result type to
   determine the action, so we can control exactly which vector types are
   supported. Also change the legalize vector op code to handle
   operations that only have vector operands, but no vector results, as
   is the case for VECREDUCE.
 * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
   explicitly specify for which vector types the reductions are supported.

This does not handle anything related to VECREDUCE_STRICT_*.

Differential Revision: https://reviews.llvm.org/D58015

llvm-svn: 355860
2019-03-11 20:22:13 +00:00
Stanislav Mekhanoshin e98944ed47 Use bitset for assembler predicates
The AMDGPU target ran out of Subtarget feature flags, hitting the limit of 64.
AssemblerPredicates use at most a uint64_t for their representation.
At the same time, CodeGen exhausted this limit a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.

This patch completes the transition to the bitset for feature bits, extending
it to the asm matcher and MC code emitter.

Differential Revision: https://reviews.llvm.org/D59002

llvm-svn: 355839
2019-03-11 17:04:35 +00:00
Amara Emerson 7a05d1c1f1 [AArch64][GlobalISel] Fix i1 arguments not being zero-extended as required by ABI.
Fixes PR41001.

llvm-svn: 355745
2019-03-08 22:17:00 +00:00
Mitch Phillips 790edbc16e [HWASan] Save + print registers when tag mismatch occurs in AArch64.
Summary:
This change changes the instrumentation to allow users to view the registers at the point at which a tag mismatch occurred. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance.

In this implementation, the fast path of __hwasan_check is unmodified. There are an additional 4 instructions (16B) emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable.

The failure trace now contains a list of registers at the point at which the failure occurred, in a format similar to that of Android's tombstones. It currently has the following format:

Registers where the failure occurred (pc 0x0055555561b4):
    x0  0000000000000014  x1  0000007ffffff6c0  x2  1100007ffffff6d0  x3  12000056ffffe025
    x4  0000007fff800000  x5  0000000000000014  x6  0000007fff800000  x7  0000000000000001
    x8  12000056ffffe020  x9  0200007700000000  x10 0200007700000000  x11 0000000000000000
    x12 0000007fffffdde0  x13 0000000000000000  x14 02b65b01f7a97490  x15 0000000000000000
    x16 0000007fb77376b8  x17 0000000000000012  x18 0000007fb7ed6000  x19 0000005555556078
    x20 0000007ffffff768  x21 0000007ffffff778  x22 0000000000000001  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000000  x26 0000000000000000  x27 0000000000000000
    x28 0000000000000000  x29 0000007ffffff6f0  x30 00000055555561b4

... and prints after the dump of memory tags around the buggy address.

Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging.

Reviewers: pcc, eugenis

Reviewed By: pcc, eugenis

Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D58857

llvm-svn: 355738
2019-03-08 21:22:35 +00:00
Abderrazek Zaafrani 5ced596198 [AArch64] Improve FP16 instruction selection for vector round and vector convert from half instructions
https://reviews.llvm.org/D58855

llvm-svn: 355545
2019-03-06 20:30:06 +00:00
Amara Emerson 21f44dfe9c [AArch64] Remove a stray test from the AArch64 directory.
llvm-svn: 355534
2019-03-06 18:54:07 +00:00
Jessica Paquette 00d5847b5c Revert "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT"
This broke test-suite::aarch64_neon_intrinsics.test

Reverting while I look into it.

Example failure:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-quick/builds/17740

llvm-svn: 355408
2019-03-05 15:47:00 +00:00
Jessica Paquette caf62b1d47 [GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT
This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases
where the index is defined by a G_CONSTANT.

It also factors out the lane copy opcode selection part into its own function,
`getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and
`selectExtractElt`.

Differential Revision: https://reviews.llvm.org/D58469

llvm-svn: 355344
2019-03-04 22:35:32 +00:00
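An illustrative IR sketch of the newly selectable pattern (function invented for illustration): the lane index is a constant, so it reaches the selector as a G_CONSTANT feeding G_EXTRACT_VECTOR_ELT.

    define i64 @extract_lane1(<2 x i64> %v) {
      %e = extractelement <2 x i64> %v, i32 1
      ret i64 %e
    }
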
Jessica Paquette 0632e12f89 [GlobalISel][AArch64] Legalize vector G_SELECT
Just scalarize it, and add a test showing it works.

Differential Revision: https://reviews.llvm.org/D58747

llvm-svn: 355339
2019-03-04 21:12:46 +00:00
Amara Emerson 8acb0d9c82 Re-commit r355104: "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
The code to materialize a mask from a constant pool load tried to use a 128 bit
LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted
in a link failure in the NEON tests in the test suite since the LDR address was
unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is
64 bit, before converting back to a 128 bit register for the TBL.

llvm-svn: 355326
2019-03-04 19:16:00 +00:00
Jonas Hahnfeld 65a401f6a9 [AArch64/ARM] Fix two compiler warnings in InstructionSelector, NFCI
1) GCC complains that KnownValid is set but not used.
2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral
   and non-enumeral type in conditional expression". Solve this by casting
   to unsigned which is the final type anyway.

Differential Revision: https://reviews.llvm.org/D58834

llvm-svn: 355304
2019-03-04 08:51:32 +00:00
Eli Friedman d19a7060c6 [AArch64] [Windows] Don't skip constructing UnwindHelp.
In certain cases, the first non-frame-setup instruction in a function is
a branch.  For example, it could be a cbz on an argument.  Make sure we
correctly allocate the UnwindHelp, and find an appropriate register to
use to initialize it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40184

Differential Revision: https://reviews.llvm.org/D58752

llvm-svn: 355136
2019-02-28 20:38:45 +00:00
Abderrazek Zaafrani abfd10807c [AArch64] Improve FP16 vector convert from short instructions.
https://reviews.llvm.org/D58563

llvm-svn: 355134
2019-02-28 20:21:46 +00:00
Amara Emerson 8d70e6425c Revert "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1."
Seems to break some neon intrinsics tests.

llvm-svn: 355115
2019-02-28 18:47:29 +00:00
Amara Emerson 85c3afd7f6 [AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1.
This extends the existing support for shufflevector to handle cases like
<2 x float>, which we can implement by concatenating the vectors and using a TBL1.

Differential Revision: https://reviews.llvm.org/D58684

llvm-svn: 355104
2019-02-28 16:43:11 +00:00
Abderrazek Zaafrani 2fc498a652 [AArch64] Generate FP16 vector compare instructions.
https://reviews.llvm.org/D58561

llvm-svn: 355050
2019-02-28 00:31:38 +00:00
Amara Emerson 6bcfa1c419 [AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC.
This is a preparatory change as I want to use emitScalarToVector() elsewhere,
and in general we want to transition to MIRBuilder instead of using BuildMI
directly.

Differential Revision: https://reviews.llvm.org/D58528

llvm-svn: 354807
2019-02-25 18:52:54 +00:00
Luke Cheeseman 59f77e7891 [AArch64] Add support for Cortex-A76 and Cortex-A76AE
- Add LLVM backend support for Cortex-A76 and Cortex-A76AE
- Documentation can be found at
  https://developer.arm.com/products/processors/cortex-a/cortex-a76

llvm-svn: 354788
2019-02-25 15:08:27 +00:00
Amara Emerson 1abe05c0dd Re-land "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR"
Thanks to Richard Trieu for pointing out that the failures were due to a
use-after-free of an ArrayRef.

llvm-svn: 354616
2019-02-21 20:20:16 +00:00
David Spickett 8ac2b181a1 [AArch64] Print instruction before atomic semantic annotations
Commit r353303 added annotations when acquire semantics
were dropped from an instruction.

printAnnotation was called before printInstruction.
So if you didn't set a separate comment output stream
you got <comment><instr> instead of <instr><comment>
as expected.

To fix this move the new printAnnotation to after
the instruction is printed.

Differential Revision: https://reviews.llvm.org/D58059

llvm-svn: 354565
2019-02-21 10:42:49 +00:00
Amara Emerson 71f2a5e60f Revert "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR"
This reverts r354521 because it broke the bots, but passes on Darwin somehow.

llvm-svn: 354532
2019-02-21 00:31:13 +00:00
Amara Emerson a946d057b4 [AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR
This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and
implements them with a very pessimistic TBL2 instruction in the selector.

For TBL2, support is also needed to generate constant pool entries and load from
them in order to materialize the mask register.

Currently supports <2 x s64> and <4 x s32> result types.

Differential Revision: https://reviews.llvm.org/D58466

llvm-svn: 354521
2019-02-20 22:11:39 +00:00
Jessica Paquette b53e0f4b81 [GlobalISel][AArch64] Legalize + select some llvm.ctlz.* intrinsics
Legalize/select llvm.ctlz.*

Add select-ctlz to show that we actually select them. Update arm64-clrsb.ll and
arm64-vclz.ll to show that we perform valid transformations in optimized builds,
and document where GISel can improve.

Differential Revision: https://reviews.llvm.org/D58155

llvm-svn: 354299
2019-02-18 23:33:24 +00:00
Matt Arsenault 530d05e94a GlobalISel: Add alignment to LegalityQuery MMOs
This allows targets to specify the minimum alignment required for the
load/store.

llvm-svn: 354071
2019-02-14 22:41:09 +00:00
Petr Hosek fcbec02ea6 [AArch64] Support reserving arbitrary general purpose registers
This is a follow up to D48580 and D48581 which allows reserving
arbitrary general purpose registers with the exception of registers
with special purpose (X8, X16-X18, X29, X30) and registers used by LLVM
(X0, X19). This change also generalizes some of the existing logic to
rely entirely on values generated from tablegen.

Differential Revision: https://reviews.llvm.org/D56305

llvm-svn: 353957
2019-02-13 17:28:47 +00:00
Nikita Popov a3be17ea1c [AArch64] Expand v8i8 cttz (PR39729)
Fix for https://bugs.llvm.org/show_bug.cgi?id=39729.

Rather than adding just a case for v8i8 I'm setting cttz to expand
for all vector types.

Differential Revision: https://reviews.llvm.org/D58008

llvm-svn: 353872
2019-02-12 18:55:53 +00:00
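A minimal sketch of the PR39729 shape (illustrative only): a v8i8 cttz, which now expands instead of failing during selection.

    declare <8 x i8> @llvm.cttz.v8i8(<8 x i8>, i1)

    define <8 x i8> @cttz_v8i8(<8 x i8> %x) {
      ; Vector cttz is now marked Expand for all vector types on AArch64.
      %r = call <8 x i8> @llvm.cttz.v8i8(<8 x i8> %x, i1 false)
      ret <8 x i8> %r
    }
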
Jessica Paquette 0e71e73faa [GlobalISel][AArch64] Select llvm.bswap* for non-vector types
This teaches the IRTranslator to emit G_BSWAP when it runs into
Intrinsic::bswap. This allows us to select G_BSWAP for non-vector types in
AArch64.

Add a select-bswap.mir test, and add global isel checks to a couple existing
tests in test/CodeGen/AArch64.

This doesn't handle every bswap case, since some of these rely on known bits
stuff. This just lets us handle the naive case.

Differential Revision: https://reviews.llvm.org/D58081

llvm-svn: 353861
2019-02-12 17:28:17 +00:00
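An illustrative sketch of the naive scalar case handled here (not taken from the added tests): the bswap intrinsic is translated to G_BSWAP and selected directly.

    declare i32 @llvm.bswap.i32(i32)

    define i32 @swap32(i32 %x) {
      %r = call i32 @llvm.bswap.i32(i32 %x)
      ret i32 %r
    }
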
Jessica Paquette e57fe23f70 [AArch64][GlobalISel] Add isel support for a couple vector exts/truncs
Add support for

- v4s16 <-> v4s32
- v2s64 <-> v2s32

And update tests that use them to show that we generate the correct
instructions.

Differential Revision: https://reviews.llvm.org/D57832

llvm-svn: 353732
2019-02-11 18:56:39 +00:00
Jessica Paquette ebdb021031 [GlobalISel][AArch64] Select G_FFLOOR
This teaches the legalizer about G_FFLOOR, and lets us select G_FFLOOR in
AArch64.

It updates the existing floating point tests, and adds a select-floor.mir test.

Differential Revision: https://reviews.llvm.org/D57486

llvm-svn: 353722
2019-02-11 17:22:58 +00:00
Benjamin Kramer 711950c116 Move some classes into anonymous namespaces. NFC.
llvm-svn: 353710
2019-02-11 15:16:21 +00:00
Eli Friedman 29c0609301 [AArch64] Fix condition for "high-vector" DUP optimizations.
AArch64 NEON has a bunch of instructions with a "2" suffix that extract
the top half of the source vectors, instead of the bottom half.  We have
some DAGCombines to try to take advantage of that.  However, they
assumed that any EXTRACT_VECTOR was extracting the high half of the
vector in question.

This issue has apparently existed since the AArch64 backend was merged.

Fixes https://bugs.llvm.org/show_bug.cgi?id=40632 .

Differential Revision: https://reviews.llvm.org/D57862

llvm-svn: 353486
2019-02-08 00:23:35 +00:00
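A hedged IR-level sketch of the distinction involved (illustrative function): only the second shuffle extracts the high half of its source, so only it may feed the "2"-suffixed (high-half) instruction forms; the bad combine treated both kinds of extracts as high-half.

    define <4 x i32> @mixed_halves(<8 x i16> %a, <8 x i16> %b) {
      %lo = shufflevector <8 x i16> %a, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
      %hi = shufflevector <8 x i16> %b, <8 x i16> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
      %lo32 = zext <4 x i16> %lo to <4 x i32>
      %hi32 = zext <4 x i16> %hi to <4 x i32>
      %r = add <4 x i32> %lo32, %hi32
      ret <4 x i32> %r
    }
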
Matt Arsenault fbec8fe93b GlobalISel: Implement narrowScalar for shift main type
This is pretty much directly ported from SelectionDAG. Doesn't include
the shift by non-constant but known bits version, since there isn't a
globalisel version of computeKnownBits yet.

This shows a disadvantage of targets not specifying which type
should be used for the shift amount. If type 0 is legalized before
type 1, the operations on the shift amount type use the wider type
(which are also less likely to legalize). This can be avoided by
targets specifying legalization actions on type 1 earlier than for
type 0.

llvm-svn: 353455
2019-02-07 19:37:44 +00:00
Tim Northover 638110a208 AArch64: implement copy for paired GPR registers.
When doing 128-bit atomics using CASP we might need to copy a GPRPair to a
different register, but that was unimplemented up to now.

llvm-svn: 353383
2019-02-07 10:35:34 +00:00
Tim Northover 474f5d9b55 AArch64: enforce even/odd register pairs for CASP instructions.
ARMv8.1a CASP instructions need the first of the pair to be an even register
(otherwise the encoding is unallocated). We enforced this during assembly, but
not CodeGen before.

llvm-svn: 353308
2019-02-06 15:26:35 +00:00
Tim Northover 71025a2f3e AArch64: annotate atomics with dropped acquire semantics when printing.
A quirk of the v8.1a spec is that when the writeback register for an atomic
read-modify-write instruction is wzr/xzr, the instruction no longer enforces
acquire ordering. However, it's still written with the misleading 'a' mnemonic.

So this adds an annotation when disassembling such instructions, mentioning the
change.

llvm-svn: 353303
2019-02-06 15:07:59 +00:00
Aditya Nandakumar fef7619b05 [NFC][GlobalISel]: Add a convenience method to MachineInstrBuilder to simplify getOperand(i).getReg()
https://reviews.llvm.org/D57608

It's a common pattern in GISel to have a MachineInstrBuilder from which we get various regs
(commonly MIB->getOperand(0).getReg()). This adds a helper method and the above can be
replaced with MIB.getReg(0).

llvm-svn: 353223
2019-02-05 22:14:40 +00:00
Oliver Stannard 78dc38ec94 [AArch64][Outliner] Don't outline BTI instructions
We can't outline BTI instructions, because they need to be the very first
instruction executed after an indirect call or branch. If we outline them, then
an indirect call might go to the branch to the outlined function, which will
fault.

Differential revision: https://reviews.llvm.org/D57753

llvm-svn: 353190
2019-02-05 17:21:57 +00:00
Matt Arsenault 86504fb20e AArch64/GlobalISel: Don't clamp from 2 to 2
This is equivalent to clampMaxNumElements, but saves a check.

llvm-svn: 353188
2019-02-05 16:57:18 +00:00
Florian Hahn 3b251963c3 [CGP] Add support for sinking operands to their users, if they are free.
This patch improves code generation for some AArch64 ACLE intrinsics. It adds
support to CGP to duplicate and sink operands to their user, if they can be
folded into a target instruction, like zexts and sub into usubl. It adds a
TargetLowering hook shouldSinkOperands, which looks at the operands of
instructions to see if sinking is profitable.

I decided to add a new target hook, as for the sinking to be profitable,
at least on AArch64, we have to look at multiple operands of an
instruction, instead of looking at the users of a zext for example.

The sinking is done in CGP, because it works around an instruction
selection limitation. If instruction selection is not limited to a
single basic block, this patch should not be needed any longer.

Alternatively this could be done in the LoopSink pass, which tries to
undo LICM for instructions in blocks that are not executed frequently.

Note that we do not force the operands to sink to have a single user,
because we duplicate them before sinking. Therefore this is only
desirable if they really can be done for free. Additionally we could
consider the impact on live ranges later on.

This should fix https://bugs.llvm.org/show_bug.cgi?id=40025.

As for performance, we have internal code that uses intrinsics and can
be speed up by 10% by this change.

Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma, RKSimon, spatel

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D57377

llvm-svn: 353152
2019-02-05 10:27:40 +00:00
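A minimal sketch of the foldable shape (assumed, single-block form; the pass matters when the zexts originally live in a different block and must be duplicated and sunk next to the sub so isel can fold them into a usubl).

    define <8 x i16> @widening_sub(<8 x i8> %a, <8 x i8> %b) {
      ; zext + zext + sub in one block can fold into a single usubl during isel.
      %za = zext <8 x i8> %a to <8 x i16>
      %zb = zext <8 x i8> %b to <8 x i16>
      %r = sub <8 x i16> %za, %zb
      ret <8 x i16> %r
    }
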
Andrea Di Biagio edbf06a767 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implementation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Mandeep Singh Grang dc1e778369 [AArch64] Fix unused variable [NFC]
llvm-svn: 352940
2019-02-01 23:42:34 +00:00
Mandeep Singh Grang 70d484d94e [COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects
Summary: This fixes the choice of stack registers used for SEH when stack realignment is needed or when variable size objects are present.

Reviewers: rnk, efriedma, ssijaric, TomTan

Reviewed By: rnk, efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D57183

llvm-svn: 352923
2019-02-01 21:41:33 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
James Y Knight 7976eb5838 [opaque pointer types] Pass function types to CallInst creation.
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57170

llvm-svn: 352909
2019-02-01 20:43:25 +00:00
Adhemerval Zanella b3ccc5550d [AArch64] Optimize floating point materialization
This patch changes isFPImmLegal to return true if the value can be encoded
as the immediate operand of a logical instruction, besides checking the
immediate field for fmov.

This optimizes some floating point materialization, including values
used in isinf lowering.

Reviewed By: rengolin, efriedma, evandro

Differential Revision: https://reviews.llvm.org/D57044

llvm-svn: 352866
2019-02-01 12:26:06 +00:00
James Y Knight 13680223b9 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.

Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352827
2019-02-01 02:28:03 +00:00
James Y Knight fadf25068e Revert "[opaque pointer types] Add a FunctionCallee wrapper type, and use it."
This reverts commit f47d6b38c7 (r352791).

Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.

llvm-svn: 352800
2019-01-31 21:51:58 +00:00
James Y Knight f47d6b38c7 [opaque pointer types] Add a FunctionCallee wrapper type, and use it.
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.

Then:
- update the CallInst/InvokeInst instruction creation functions to
  take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.

One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.

However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)

Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.

Differential Revision: https://reviews.llvm.org/D57315

llvm-svn: 352791
2019-01-31 20:35:56 +00:00
Sjoerd Meijer f7cc34cae8 [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS
And instead just generate a libcall. My motivating example on ARM was a simple:
  
  shl i64 %A, %B

for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.

Differential Revision: https://reviews.llvm.org/D57386

llvm-svn: 352736
2019-01-31 08:07:30 +00:00
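A minimal sketch of the motivating case on a 64-bit target (illustrative): a 128-bit shift in a minsize function, which now becomes a shift libcall instead of an inline SHIFT_PARTS expansion.

    define i128 @shl128(i128 %a, i128 %b) minsize {
      %r = shl i128 %a, %b
      ret i128 %r
    }
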
Matt Arsenault 2a64598ef2 GlobalISel: Fix creating MMOs with align 0
llvm-svn: 352712
2019-01-31 01:38:47 +00:00
Jessica Paquette 84bedac7e9 [GlobalISel][AArch64] Select G_FEXP
This teaches the legalizer to handle G_FEXP in AArch64. As a result, it also
allows us to select G_FEXP.

It...

- Updates the legalizer-info tests
- Adds a test for legalizing exp
- Updates the existing fp tests to show that we can now select G_FEXP

https://reviews.llvm.org/D57483

llvm-svn: 352692
2019-01-30 23:46:15 +00:00
Jessica Paquette 10f59405ae [GlobalISel][AArch64] Select G_FABS
This adds instruction selection support for G_FABS in AArch64. It also updates
the existing basic FP tests, adds a selection test for G_FABS.

https://reviews.llvm.org/D57418

llvm-svn: 352684
2019-01-30 22:54:21 +00:00
Jessica Paquette 0154bd1385 [GlobalISel][AArch64] Add instruction selection support for @llvm.log2
This teaches GlobalISel to emit a RTLib call for @llvm.log2 when it encounters
it.

It updates the existing floating point tests to show that we don't fall back on
the intrinsic, and select the correct instructions. It also adds a legalizer
test for G_FLOG2.

https://reviews.llvm.org/D57357

llvm-svn: 352673
2019-01-30 21:16:04 +00:00
Jessica Paquette 22457f8e9b [GlobalISel][AArch64] Add instruction selection support for @llvm.sqrt
This teaches the legalizer about G_FSQRT in AArch64. Also adds a legalizer
test for G_FSQRT, a selection test for it, and updates existing floating point
tests.

https://reviews.llvm.org/D57361

llvm-svn: 352671
2019-01-30 21:03:52 +00:00
Amara Emerson 102c9ed768 [AArch64][GlobalISel] Unmerge into scalars from a vector should use FPR bank.
This currently shows up as a selection fallback since the dest regs were given
GPR banks but the source was a vector FPR reg.

Differential Revision: https://reviews.llvm.org/D57408

llvm-svn: 352545
2019-01-29 21:19:33 +00:00
Martin Storsjo f5884d255e [COFF, ARM64] Don't put jump table into a separate COFF section for EK_LabelDifference32
Windows ARM64 has PIC relocation model and uses jump table kind
EK_LabelDifference32. This produces a jump table entry such as
".word LBB123 - LJTI1_2", which represents the distance between the block
and the jump table.

A new relocation type (IMAGE_REL_ARM64_REL32) is needed to do the fixup
correctly if they are in different COFF section.

This change saves the jump table to the same COFF section as the
associated code. An ideal fix could be utilizing IMAGE_REL_ARM64_REL32
relocation type.

Patch by Tom Tan!

Differential Revision: https://reviews.llvm.org/D57277

llvm-svn: 352465
2019-01-29 09:36:48 +00:00
Reid Kleckner 96c581d7d0 [AArch64] Include AArch64GenCallingConv.inc once
Summary:
Avoids duplicating generated static helpers for calling convention
analysis.

This also means you can modify AArch64CallingConv.td without recompiling
the AArch64ISelLowering.cpp monolith, so it provides faster incremental
rebuilds.

Saves 12K in llc.exe, but adds a new object file, which is large.

Reviewers: efriedma, t.p.northover

Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56948

llvm-svn: 352430
2019-01-28 21:28:40 +00:00
Jessica Paquette 2d73ecd0a3 [GlobalISel][AArch64] Add legalization for G_FLOG
This adds support for legalizing G_FLOG into a RTLib call.

It adds a legalizer test, and updates the existing floating point tests.

https://reviews.llvm.org/D57347

llvm-svn: 352429
2019-01-28 21:27:23 +00:00
Jessica Paquette c49428a97d [GlobalISel][AArch64] Add instruction selection support for @llvm.log10
This adds instruction selection support for @llvm.log10 in AArch64. It teaches
GISel to lower it to a library call, updates the relevant tests, and adds a
legalizer test for log10.

https://reviews.llvm.org/D57341

llvm-svn: 352418
2019-01-28 19:53:14 +00:00
Francis Visoiu Mistrih 556ea7d2e0 [AArch64] Add 'apple-latest' CPU alias
The 'apple-latest' alias is supposed to provide a CPU that contains the
latest Apple processor model supported by LLVM.

This is supposed to be used by tools like lldb to provide a target that
supports most of the CPU features.

For now, this is mapped to Cyclone.

Differential Revision: https://reviews.llvm.org/D56384

llvm-svn: 352412
2019-01-28 19:27:33 +00:00
Jessica Paquette 7db82d7257 [GlobalISel][AArch64] Add instruction selection support for G_FCOS and G_FSIN
This contains all of the legalizer changes from D57197 necessary to select
G_FCOS and G_FSIN. It also updates several existing IR tests in
test/CodeGen/AArch64 that verify that we correctly lower the G_FCOS and G_FSIN
instructions.

https://reviews.llvm.org/D57197
3/3

llvm-svn: 352402
2019-01-28 18:34:18 +00:00
Amara Emerson fd31bf95c1 [AArch64][GlobalISel] Teach RBS about G_FNEG default mapping.
llvm-svn: 352340
2019-01-28 03:21:14 +00:00
Amara Emerson 0bfa2faccc [AArch64][GlobalISel] Add some missing vector support for FP arithmetic ops.
Moved the fneg lowering legalization test from AArch64 to X86, as we want to
specify that it's already legal.

llvm-svn: 352338
2019-01-28 02:28:22 +00:00
Amara Emerson 92ffb305cc [AArch64][GlobalISel] Add some vector support for fp <-> int conversions.
Some unrelated, but benign, test changes as well due to the test update script.

llvm-svn: 352337
2019-01-28 02:27:59 +00:00
Jessica Paquette 1f9bc2854f [GlobalISel][AArch64][NFC] Fix incorrect comment in selectUnmergeValues
s/scalar/vector/

llvm-svn: 352243
2019-01-25 21:28:27 +00:00
Simon Pilgrim dea6174b0b Fix gcc -Wparentheses warning. NFCI.
llvm-svn: 352193
2019-01-25 11:38:40 +00:00
Matt Arsenault 990f507704 GlobalISel: Add convenience mutations to scalarize
llvm-svn: 352143
2019-01-25 00:51:00 +00:00
Benjamin Kramer 653020d3cc [GlobalISel][AArch64] Avoid unused variable warning for variable only used in assert
llvm-svn: 352133
2019-01-24 23:45:07 +00:00
Benjamin Kramer 1411ecf08b [GlobalISel][AArch64] Avoid unused function warnings in Release builds
llvm-svn: 352129
2019-01-24 23:39:47 +00:00
Jessica Paquette 76c40f827d Suppress unused capture warning in CheckCopy
Werror bots didn't like the lambda + assert thing in my previous commit.

Capture everything to suppress the error.

Example failure here:
http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/29393

llvm-svn: 352124
2019-01-24 22:51:31 +00:00
Jessica Paquette 245047dfe8 [GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceil
This patch adds support for vector @llvm.ceil intrinsics when full 16 bit
floating point support isn't available.

To do this, this patch...

- Implements basic isel for G_UNMERGE_VALUES
- Teaches the legalizer about 16 bit floats
- Teaches AArch64RegisterBankInfo to respect floating point registers on
  G_BUILD_VECTOR and G_UNMERGE_VALUES
- Teaches selectCopy about 16-bit floating point vectors

It also adds

- A legalizer test for the 16-bit vector ceil which verifies that we create a
  G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported
- An instruction selection test which makes sure we lower to G_FCEIL when
  full fp16 is supported
- A test for selecting G_UNMERGE_VALUES

And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types
work as expected.

https://reviews.llvm.org/D56682

llvm-svn: 352113
2019-01-24 22:00:41 +00:00
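An illustrative sketch of the case this targets (not one of the added tests): a v4f16 ceil on a subtarget without full fp16, now legalized through G_UNMERGE_VALUES/G_BUILD_VECTOR rather than falling back.

    declare <4 x half> @llvm.ceil.v4f16(<4 x half>)

    define <4 x half> @ceil_v4f16(<4 x half> %x) {
      %r = call <4 x half> @llvm.ceil.v4f16(<4 x half> %x)
      ret <4 x half> %r
    }
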
Benjamin Kramer 4ebed81fc4 [AArch64] Fix out of bounds strlen
CFIInst is not zero-terminated. This is one of the more annoying functional
differences between StringRef and ArrayRef.
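
As a rough sketch of the general pitfall rather than the exact code this commit
touched (the function and buffer below are illustrative), constructing a
StringRef from a bare pointer runs strlen(), while passing an explicit length
stays in bounds:

```
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
using namespace llvm;

// Illustrative only: CFIInst holds raw bytes and is not NUL-terminated.
StringRef takeBytes(const SmallString<64> &CFIInst) {
  // StringRef Bad(CFIInst.data());                  // runs strlen(); may read past the end
  return StringRef(CFIInst.data(), CFIInst.size());  // bounded by an explicit length
}
```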

Found by asan.

llvm-svn: 351955
2019-01-23 14:51:21 +00:00
Kristof Beyls 3ff5dfd735 [SLH] AArch64: correctly pick temporary register to mask SP
As part of speculation hardening, the stack pointer gets masked with the
taint register (X16) before a function call or before a function return.
Since there are no instructions that can directly mask writing to the
stack pointer, the stack pointer must first be transferred to another
register, where it can be masked, before that value is transferred back
to the stack pointer.
Before, that temporary register was always picked to be x17, since the
ABI allows clobbering x17 on any function call, resulting in the
following instruction pattern being inserted before function calls and
returns/tail calls:

mov x17, sp
and x17, x17, x16
mov sp, x17
However, x17 can be live in those locations, for example when the call
is an indirect call, using x17 as the target address (blr x17).

To fix this, this patch looks for an available register just before the
call or terminator instruction and uses that.

In the rare case when no register turns out to be available (this
situation is only encountered twice across the whole test-suite), just
insert a full speculation barrier at the start of the basic block where
this occurs.

Differential Revision: https://reviews.llvm.org/D56717

llvm-svn: 351930
2019-01-23 08:18:39 +00:00
Peter Collingbourne 73078ecd38 hwasan: Move memory access checks into small outlined functions on aarch64.
Each hwasan check requires emitting a small piece of code like this:
https://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html#memory-accesses
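
As a very rough, hedged sketch of what each such inline check amounts to (the
names, the top-byte tag position, and the 16-byte granule mapping below are
assumptions for illustration, not taken from this patch or the runtime):

```
#include <cstdint>

// Illustrative only: compare the pointer's top-byte tag against the shadow
// tag for its granule; a mismatch sends execution to the reporting slow path.
bool hwasanTagMismatch(uintptr_t Addr, const uint8_t *ShadowBase) {
  uint8_t PtrTag = uint8_t(Addr >> 56);                    // assumed top-byte tag
  uintptr_t Untagged = Addr & ((uintptr_t(1) << 56) - 1);  // strip the tag
  uint8_t MemTag = ShadowBase[Untagged >> 4];              // assumed 16-byte granules
  return PtrTag != MemTag;
}
```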

The problem with this is that these code blocks typically bloat code
size significantly.

An obvious solution is to outline these blocks of code. In fact, this
has already been implemented under the -hwasan-instrument-with-calls
flag. However, as currently implemented this has a number of problems:
- The functions use the same calling convention as regular C functions.
  This means that the backend must spill all temporary registers as
  required by the platform's C calling convention, even though the
  check only needs two registers on the hot path.
- The functions take the address to be checked in a fixed register,
  which increases register pressure.
Both of these factors can diminish the code size effect and increase
the performance hit of -hwasan-instrument-with-calls.

The solution that this patch implements is to involve the aarch64
backend in outlining the checks. An intrinsic and pseudo-instruction
are created to represent a hwasan check. The pseudo-instruction
is register allocated like any other instruction, and we allow the
register allocator to select almost any register for the address to
check. A particular combination of (register selection, type of check)
triggers the creation in the backend of a function to handle the check
for specifically that pair. The resulting functions are deduplicated by
the linker. The pseudo-instruction (really the function) is specified
to preserve all registers except for the registers that the AAPCS
specifies may be clobbered by a call.

To measure the code size and performance effect of this change, I
took a number of measurements using Chromium for Android on aarch64,
comparing a browser with inlined checks (the baseline) against a
browser with outlined checks.

Code size: Size of .text decreases from 243897420 to 171619972 bytes,
or a 30% decrease.

Performance: Using Chromium's blink_perf.layout microbenchmarks I
measured a median performance regression of 6.24%.

The fact that a perf/size tradeoff is evident here suggests that
we might want to make the new behaviour conditional on -Os/-Oz.
But for now I've enabled it unconditionally, my reasoning being that
hwasan users typically expect a relatively large perf hit, and ~6%
isn't really adding much. We may want to revisit this decision in
the future, though.

I also tried experimenting with varying the number of registers
selectable by the hwasan check pseudo-instruction (which would result
in fewer variants being created), on the hypothesis that creating
fewer variants of the function would expose another perf/size tradeoff
by reducing icache pressure from the check functions at the cost of
register pressure. Although I did observe a code size increase with
fewer registers, I did not observe a strong correlation between the
number of registers and the performance of the resulting browser on the
microbenchmarks, so I conclude that we might as well use ~all registers
to get the maximum code size improvement. My results are below:

Regs | .text size | Perf hit
-----+------------+---------
~all | 171619972  | 6.24%
  16 | 171765192  | 7.03%
   8 | 172917788  | 5.82%
   4 | 177054016  | 6.89%

Differential Revision: https://reviews.llvm.org/D56954

llvm-svn: 351920
2019-01-23 02:20:10 +00:00
Matt Arsenault 30989e492b GlobalISel: Allow shift amount to be a different type
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.

X86 uses i8, but seemed to be hacking around this before.

llvm-svn: 351882
2019-01-22 21:42:11 +00:00
Matt Arsenault 39508331ef Reapply "IR: Add fp operations to atomicrmw"
This reapplies commits r351778 and r351782 with
RISCV test fixes.

llvm-svn: 351850
2019-01-22 18:18:02 +00:00
Chandler Carruth 285fe716c5 Revert r351778: IR: Add fp operations to atomicrmw
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving no clear test pattern to use. Generally it seems bad for
the IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!), and nothing simple I tried would
fix it, so I'm reverting back to green.

This also required reverting the RISCV build fix in r351782.

llvm-svn: 351796
2019-01-22 10:29:58 +00:00
Matt Arsenault bfdba5e4fc IR: Add fp operations to atomicrmw
Add just fadd/fsub for now.

llvm-svn: 351778
2019-01-22 03:32:36 +00:00
Eli Friedman 23e60c7893 [AArch64] Add patterns for zext/sext of shift amount.
Not sure this is the best fix, but it saves an instruction for certain
constructs involving variable shifts.

Differential Revision: https://reviews.llvm.org/D55572

llvm-svn: 351768
2019-01-22 00:21:35 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Sanjin Sijaric 4d1450298c Fix the buildbot failure introduced by r351404
EXPENSIVE_CHECKS buildbots are failing due to r351404.

Add x1 as live in to the funclet basic block for SEH funclets, as well as
-verify-machineinstrs to the test case that triggered the failure.

llvm-svn: 351472
2019-01-17 20:24:14 +00:00
Matt Arsenault 0cb08e448a Allow FP types for atomicrmw xchg
llvm-svn: 351427
2019-01-17 10:49:01 +00:00
Sanjin Sijaric 685565ae9a [SEH] [ARM64] Retrieve the frame pointer from SEH funclets
The Windows ARM64 runtime passes the establisher frame to funclets as the first
argument.

llvm-svn: 351404
2019-01-17 00:24:38 +00:00
Mandeep Singh Grang 33c49c0c82 [COFF, ARM64] Implement support for SEH extensions __try/__except/__finally
Summary:
This patch supports MS SEH extensions __try/__except/__finally. The intrinsics localescape and localrecover are responsible for communicating escaped static allocas from the try block to the handler.

We need to preserve frame pointers for SEH. So we create a new function/property HasLocalEscape.
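
As an illustrative sketch (an invented example compiled in MSVC/clang-cl mode,
not code from this patch), the kind of construct this enables, with a local
that the handler funclet must recover:

```
// Illustrative only. The handler runs as a funclet; its access to the escaped
// local 'Status' is what localescape/localrecover make possible.
int guardedRead(int *P) {
  int Status = 0;                            // escaped local
  __try {
    Status = *P;                             // may fault
  } __except (1 /* EXCEPTION_EXECUTE_HANDLER */) {
    Status = -1;                             // handler funclet writes the escaped local
  }
  return Status;
}
```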

Reviewers: rnk, compnerd, mstorsjo, TomTan, efriedma, ssijaric

Reviewed By: rnk, efriedma

Subscribers: smeenai, jrmuizel, alex, majnemer, ssijaric, ehsan, dmajor, kristina, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53540

llvm-svn: 351370
2019-01-16 19:52:59 +00:00
Aditya Nandakumar 500e3ead9f [GISel]: Add support for CSEing continuously during GISel passes.
https://reviews.llvm.org/D52803

This patch adds support to continuously CSE instructions during
each of the GISel passes. It consists of a GISelCSEInfo analysis pass
that can be used by the CSEMIRBuilder.

llvm-svn: 351283
2019-01-16 00:40:37 +00:00
Evandro Menezes f793fe1402 [AArch64] Adjust the feature set for Exynos
Enable the fusion of arithmetic and logic instructions for Exynos M4.

llvm-svn: 351149
2019-01-15 01:53:49 +00:00
Eli Friedman 33aecc8182 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected.  This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.

While I'm here, also add some error checking so we don't generate
illegal ABS nodes.

Differential Revision: https://reviews.llvm.org/D56616

llvm-svn: 351141
2019-01-15 00:15:24 +00:00
Evandro Menezes bf59cb02c3 [AArch64] Add new target feature to fuse arithmetic and logic operations
This feature enables the fusion of some arithmetic and logic instructions
together.

Differential revision: https://reviews.llvm.org/D56572

llvm-svn: 351139
2019-01-14 23:54:36 +00:00
Evandro Menezes 7d7e3256cd [AArch64] Improve Exynos predicates
Expand the predicate using shifted arithmetic and logic instructions to also
consider the respective not shifted instructions.

llvm-svn: 350976
2019-01-11 22:39:47 +00:00
Evandro Menezes 0c14c87d00 [AArch64] Add pipeline model for Exynos M4
Add the scheduling and cost model for Exynos M4.

llvm-svn: 350960
2019-01-11 19:36:25 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Bryan Chan 7ce5775e62 [AArch64] Fix operation actions for FP16 vector intrinsics
Summary:
This patch changes the legalization action for some half-precision floating-
point vector intrinsics (FSIN, FLOG, etc.) from Promote to Expand. These ops
are not supported in hardware for half-precision vectors, but promotion is
not always possible (for v8f16 operands). Changing the action to Expand fixes
an assertion failure in the legalizer when the frontend produces such ops.
In addition, a quick microbenchmark shows that, in the v4f16 case,
expanding introduces fewer spills and is therefore slightly faster than
promoting.

Reviewers: t.p.northover, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56296

llvm-svn: 350825
2019-01-10 15:02:37 +00:00
Mandeep Singh Grang 859cb2e35d [AArch64] Emit the correct MCExpr relocation specifiers like VK_ABS_G0, etc
Summary:
D55896 and D56029 add support to emit fixups for :abs_g0: , :abs_g1_s: , etc.
This patch adds the necessary enums and MCExpr needed for lowering these.

Reviewers: rnk, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56037

llvm-svn: 350798
2019-01-10 04:59:44 +00:00
Kristof Beyls c650ff77eb Initial AArch64 SLH implementation.
This is an initial implementation for Speculative Load Hardening for
AArch64. It builds on top of the recently introduced
AArch64SpeculationHardening pass.
This doesn't implement (yet) some of the optimizations implemented for
the X86SpeculativeLoadHardening pass. I thought introducing the
optimizations incrementally in follow-up patches should make this easier
to review.

Differential Revision: https://reviews.llvm.org/D55929

llvm-svn: 350729
2019-01-09 15:13:34 +00:00
Diogo N. Sampaio 1eb31c8e94 [AArch64] Move feature predctrl to predres
Follow-up patch to rL350385 for adding the predres
command line option. This patch renames the
feature to keep it aligned with the option
passed by/to clang.

Differential Revision: https://reviews.llvm.org/D56484

llvm-svn: 350702
2019-01-09 11:24:15 +00:00
Matt Arsenault 3dddb163dd GlobalISel: Implement fewerElements for implicit_def
llvm-svn: 350697
2019-01-09 07:51:52 +00:00
Evandro Menezes 39c97bf6cd [AArch64] Adjust the cost model for Exynos
Improve the modeling of ALU instructions.

llvm-svn: 350663
2019-01-08 22:29:58 +00:00
Tim Northover 964eea7ad2 AArch64: avoid splitting vector truncating stores.
We have code to split vector splats (of zero and non-zero) for performance
reasons, but it ignores the fact that a store might be truncating.

Actually, truncating stores are formed for vNi8 and vNi16 types. Since the
truncation is from a legal type, the size of the store is always <= 64-bits and
so they don't actually benefit from being split up anyway, so this patch just
disables that transformation.

llvm-svn: 350620
2019-01-08 13:30:27 +00:00
Mandeep Singh Grang f286bee9fe [MC] [AArch64] Support resolving signed fixups for :abs_g0_s: etc.
Summary: This patch is a follow-up to D55896.

Reviewers: efriedma, mstorsjo

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D56029

llvm-svn: 350606
2019-01-08 04:48:00 +00:00
Evandro Menezes 9f53bea536 [AArch64] Adjust the cost model for Exynos M3
Improve the modeling of ASIMD loads and stores.

llvm-svn: 350434
2019-01-04 21:02:25 +00:00
Evandro Menezes 0f67746c92 [AArch64] Add new scheduling predicates
Add new scheduling predicates to identify the ASIMD loads and stores using the post indexed addressing mode.

llvm-svn: 350332
2019-01-03 17:28:09 +00:00
Martin Storsjo 74d93f9b24 [AArch64] Accept "sve" as arch feature in assembler
Differential Revision: https://reviews.llvm.org/D56128

llvm-svn: 350174
2018-12-31 10:22:04 +00:00
Martin Storsjo 2018777836 [AArch64] Implement the .arch_extension directive
Differential Revision: https://reviews.llvm.org/D56131

llvm-svn: 350169
2018-12-30 21:06:32 +00:00
Diogo N. Sampaio 9123f82cc4 [AArch64] Add command-line option for SB
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.

This patch also moves the old FeatureSpecRestrict to FeatureSB.

Reviewers: pbarrio, olista01, t.p.northover, LukeCheeseman	

Differential Revision: https://reviews.llvm.org/D55921

llvm-svn: 350126
2018-12-28 17:14:58 +00:00
Jessica Paquette 453ab1db5b [GlobalISel][AArch64] Add support for widening G_FCEIL
This adds support for widening G_FCEIL in LegalizerHelper and
AArch64LegalizerInfo. More specifically, it teaches the AArch64 legalizer to
widen G_FCEIL from a 16-bit float to a 32-bit float when the subtarget doesn't
support full FP 16.

This also updates AArch64/f16-instructions.ll to show that we perform the
correct transformation.

llvm-svn: 349927
2018-12-21 17:05:26 +00:00
Evandro Menezes 96c11eceb2 [AArch64] Refactor Exynos predicate (NFC)
Change order of conditions in predicate.

llvm-svn: 349918
2018-12-21 15:51:34 +00:00
Simon Pilgrim 148957f336 [AArch64] Always use the version of computeKnownBits that returns a value. NFCI.
Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version.

llvm-svn: 349908
2018-12-21 15:05:10 +00:00
Luke Cheeseman 41a9e53500 [Dwarf/AArch64] Return address signing B key dwarf support
- When signing return addresses with -msign-return-address=<scope>{+<key>},
  either the A key instructions or the B key instructions can be used. To
  correctly authenticate the return address, the unwinder/debugger must know
  which key was used to sign the return address.
- When an exception is thrown or a breakpoint is reached, it may be necessary to
  unwind the stack. To accomplish this, the unwinder/debugger must be able to
  first authenticate the return address if it has been signed.
- To enable this, the augmentation string of CIEs has been extended to allow
  inclusion of a 'B' character. Functions that are signed using the B key
  variant of the instructions should have an FDE whose associated CIE has a 'B'
  in the augmentation string.
- One must also be able to preserve these semantics when first stepping from a
  high level language into assembly and then, as a second step, into an object
  file. To achieve this, I have introduced a new assembly directive
  '.cfi_b_key_frame ', that tells the assembler the current frame uses return
  address signing with the B key.
- This ensures that the FDE is associated with a CIE that has 'B' in the
  augmentation string.

Differential Revision: https://reviews.llvm.org/D51798

llvm-svn: 349895
2018-12-21 10:45:08 +00:00
Jessica Paquette a6b9c68a85 [GlobalISel][AArch64] Add G_FCEIL to isPreISelGenericFloatingPointOpcode
If you don't do this, then if you hit a G_LOAD in getInstrMapping, you'll end
up with GPRs on the G_FCEIL instead of FPRs. This causes a fallback.

Add it to the switch, and add a test verifying that this happens.

llvm-svn: 349822
2018-12-20 21:14:15 +00:00
Eli Friedman 48397102d0 [MC] [AArch64] Correctly resolve ":abs_g1:3" etc.
We have to treat constructs like this as if they were "symbolic", to use
the correct codepath to resolve them.  This mostly only affects movz
etc. because the other uses of classifySymbolRef conservatively treat
everything that isn't a constant as if it were a symbol.

Differential Revision: https://reviews.llvm.org/D55906

llvm-svn: 349800
2018-12-20 19:46:14 +00:00
Eli Friedman 4648209e16 [MC] [AArch64] Support resolving fixups for abs_g0 etc.
This requires a bit more code than other fixups, to distingush between
abs_g0/abs_g1/etc.  Actually, I think some of the other fixups are
missing some checks, but I won't try to address that here.

I haven't seen any real-world code that uses a construct like this, but
it clearly should work, and we're considering using it in the
implementation of localescape/localrecover on Windows (see
https://reviews.llvm.org/D53540). I've verified that binutils produces
the same code as llvm-mc for the testcase.

This currently doesn't include support for the *_s variants (that
requires a bit more work to set the opcode).

Differential Revision: https://reviews.llvm.org/D55896

llvm-svn: 349799
2018-12-20 19:38:07 +00:00
Amara Emerson 321bfb210a Fix build errors introduced by r349712 on aarch64 bots.
llvm-svn: 349723
2018-12-20 03:27:42 +00:00
Amara Emerson 8cb186ce17 [AArch64][GlobalISel] Implement selection of G_MERGE of two s32s into s64.
This code pattern is an unfortunate side effect of the way some types get split
at call lowering. Ideally we'd either not generate it at all or combine it away
in the legalizer artifact combiner.

Until then, add selection support anyway which is a significant proportion of
our current fallbacks on CTMark.

rdar://46491420

llvm-svn: 349712
2018-12-20 01:11:04 +00:00
Evandro Menezes 374ccf6768 [AArch64] Improve Exynos predicates
Expand the predicate `ExynosResetPred` to include all forms of immediate
moves.

llvm-svn: 349686
2018-12-19 22:24:36 +00:00
Evandro Menezes ff827d737a [AArch64] Use canonical copy idiom
Use only the canonical form of the alias for register transfers in the
`IsCopyIdiomPred` predicate.

llvm-svn: 349685
2018-12-19 22:24:31 +00:00
Jessica Paquette 3560e93dc1 [GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.

It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.

llvm-svn: 349664
2018-12-19 19:01:36 +00:00
Evandro Menezes 5d409b2278 [AArch64] Improve the Exynos M3 pipeline model
llvm-svn: 349652
2018-12-19 17:37:51 +00:00
Evandro Menezes f03c45d582 [AArch64] Simplify the Exynos M3 pipeline model
llvm-svn: 349569
2018-12-18 23:19:57 +00:00
Evandro Menezes 4e39fa4474 [AArch64] Fix instructions order (NFC)
llvm-svn: 349568
2018-12-18 23:19:55 +00:00
Luke Cheeseman f57d7d8237 [AArch64] - Return address signing dwarf support
- Reapply changes initially introduced in r343089
- The architecture info is no longer loaded whenever a DWARFContext is created
- The runtime libraries (sanitizers) make use of the dwarf context classes but
  do not initialise the target info
- The architecture of the object can be obtained without loading the target info
- Adding a method to the dwarf context to get this information and multiplex the
  string printing later on

Differential Revision: https://reviews.llvm.org/D55774

llvm-svn: 349472
2018-12-18 10:37:42 +00:00
Kristof Beyls e66bc1f756 Introduce control flow speculation tracking pass for AArch64
The pass implements tracking of control flow miss-speculation into a "taint"
register. That taint register can then be used to mask off registers with
sensitive data when executing under miss-speculation, a.k.a. "transient
execution".
This pass is aimed at mitigating SpectreV1-style vulnerabilities.

At the moment, it implements the tracking of miss-speculation of control
flow into a taint register, but doesn't implement a mechanism yet to then
use that taint register to mask off vulnerable data in registers (something
for a follow-on improvement). Possible strategies to mask out vulnerable
data that can be implemented on top of this are:
- speculative load hardening to automatically mask off data loaded
  in registers.
- using intrinsics to mask off data in registers as indicated by the
  programmer (see https://lwn.net/Articles/759423/).

For AArch64, the following implementation choices are made.
Some of these are different than the implementation choices made in
the similar pass implemented in X86SpeculativeLoadHardening.cpp, as
the instruction set characteristics result in different trade-offs.
- The speculation hardening is done after register allocation. With a
  relative abundance of registers, one register is reserved (X16) to be
  the taint register. X16 is expected to not clash with other register
  reservation mechanisms with very high probability because:
  . The AArch64 ABI doesn't guarantee X16 to be retained across any call.
  . The only way for a programmer to request that X16 be used is through
    inline assembly. In the rare case a function explicitly demands to
    use X16/W16, this pass falls back to hardening against speculation
    by inserting a DSB SYS/ISB barrier pair which will prevent control
    flow speculation.
- It is easy to insert mask operations at this late stage as we have
  mask operations available that don't set flags.
- The taint variable contains all-ones when no miss-speculation is detected,
  and contains all-zeros when miss-speculation is detected. Therefore, when
  masking, an AND instruction (which only changes the register to be masked,
  no other side effects) can easily be inserted anywhere that's needed.
- The tracking of miss-speculation is done by using a data-flow conditional
  select instruction (CSEL) to evaluate the flags that were also used to
  make conditional branch direction decisions. Speculation of the CSEL
  instruction can be limited with a CSDB instruction - so the combination of
  CSEL + a later CSDB gives the guarantee that the flags as used in the CSEL
  aren't speculated. When conditional branch direction gets miss-speculated,
  the semantics of the inserted CSEL instruction is such that the taint
  register will contain all zero bits.
  One key requirement for this to work is that the conditional branch is
  followed by an execution of the CSEL instruction, where the CSEL
  instruction needs to use the same flags status as the conditional branch.
  This means that the conditional branches must not be implemented as one
  of the AArch64 conditional branches that do not use the flags as input
  (CB(N)Z and TB(N)Z). This is implemented by ensuring that the instruction
  selectors do not produce these instructions when speculation hardening
  is enabled. This pass will assert if it does encounter such an instruction.
- On function call boundaries, the miss-speculation state is transferred from
  the taint register X16 to be encoded in the SP register as value 0.

Future extensions/improvements could be:
- Implement this functionality using full speculation barriers, akin to the
  x86-slh-lfence option. This may be more useful for the intrinsics-based
  approach than for the SLH approach to masking.
  Note that this pass already inserts the full speculation barriers if the
  function for some niche reason makes use of X16/W16.
- no indirect branch misprediction gets protected/instrumented; but this
  could be done for some indirect branches, such as switch jump tables.

Differential Revision: https://reviews.llvm.org/D54896

llvm-svn: 349456
2018-12-18 08:50:02 +00:00
Martin Storsjo 8f0cb9c3a8 [AArch64] [MinGW] Allow enabling SEH exceptions
The default still is dwarf, but SEH exceptions can now be enabled
optionally for the MinGW target.

Differential Revision: https://reviews.llvm.org/D55748

llvm-svn: 349451
2018-12-18 08:32:37 +00:00
Tim Northover 256a16d031 FastIsel: take care to update iterators when removing instructions.
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointing at dangling
memory.

llvm-svn: 349365
2018-12-17 17:25:53 +00:00
Alexandros Lamprineas 490ae11717 [AArch64] Re-run load/store optimizer after aggressive tail duplication
The Load/Store Optimizer runs before Machine Block Placement. At O3 the
Tail Duplication Threshold is set to 4 instructions and this can create
new opportunities for the Load/Store Optimizer. It seems worthwhile to
run it once again.

llvm-svn: 349338
2018-12-17 10:45:43 +00:00
Evandro Menezes ea9d90083f [AArch64] Simplify the scheduling predicates (NFC)
The instruction encodings make it unnecessary to distinguish extended W-form
from X-form instructions.

llvm-svn: 349185
2018-12-14 20:04:58 +00:00
Evandro Menezes 6fe51ac973 [AArch64] Fix Exynos predicates (NFC)
Fix the logic in the definition of the `ExynosShiftExPred` as a more
specific version of `ExynosShiftPred`.  But, since `ExynosShiftExPred` is
not used yet, this change is NFC.

llvm-svn: 349091
2018-12-13 23:19:46 +00:00
Arnaud A. de Grandmaison dfe861087d [AArch64] Catch some more CMN opportunities.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33486

llvm-svn: 349022
2018-12-13 10:31:32 +00:00
Mandeep Singh Grang 802dc40f41 [COFF, ARM64] Emit COFF function header
Summary:
Emit COFF header when printing out the function. This is important as the
header contains two important pieces of information: the storage class for the
symbol and the symbol type information. This bit of information is required for
the linker to correctly identify the type of symbol that it is dealing with.

This patch mimics X86 and ARM COFF behavior for function header emission.

Reviewers: rnk, mstorsjo, compnerd, TomTan, ssijaric

Reviewed By: mstorsjo

Subscribers: dmajor, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D55535

llvm-svn: 348875
2018-12-11 18:36:14 +00:00
Aditya Nandakumar cef44a2342 [GISel]: Refactor MachineIRBuilder to allow passing additional parameters to build Instrs
https://reviews.llvm.org/D55294

Previously MachineIRBuilder::buildInstr used to accept variadic
arguments for sources (which were either unsigned or
MachineInstrBuilder). While this worked well in common cases, it doesn't
allow us to build instructions that have multiple destinations.
Additionally, passing in other optional parameters at the end (such as
flags) is not trivially possible. Also a trivial call such as

B.buildInstr(Opc, Reg1, Reg2, Reg3)
can be interpreted differently based on the opcode (2defs + 1 src for
unmerge vs 1 def + 2srcs).
This patch refactors the buildInstr to

buildInstr(Opc, ArrayRef<DstOps>, ArrayRef<SrcOps>)
where DstOps and SrcOps are typed unions that know how to add themselves to
MachineInstrBuilder.
After this patch, most invocations would look like

B.buildInstr(Opc, {s32, DstReg}, {SrcRegs..., SrcMIBs..});
Now all the other calls (such as buildAdd, buildSub etc) forward to
buildInstr. It also makes it possible to build instructions with
multiple defs.
Additionally in a subsequent patch, we should make it possible to add
flags directly while building instructions.
Additionally, the main buildInstr method is now virtual, so other
builders only have to override buildInstr (for, say, constant
folding/CSEing).

Also attached here (https://reviews.llvm.org/F7675680) is a clang-tidy
patch that should upgrade the API calls if necessary.

llvm-svn: 348815
2018-12-11 00:48:50 +00:00
Amara Emerson 5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Evandro Menezes 53f0d41dc4 [AArch64] Refactor the Exynos scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, for the Exynos processors.

Differential revision: https://reviews.llvm.org/D55345

llvm-svn: 348774
2018-12-10 17:17:26 +00:00
Evandro Menezes 1ec1a0d342 [AArch64] Refactor the scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  Augment the
number of helper predicates used by processor specific predicates.

Differential revision: https://reviews.llvm.org/D55375

llvm-svn: 348768
2018-12-10 16:24:30 +00:00
David Green ca29c271d2 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Evandro Menezes 799b76eae2 [AArch64] Fix Exynos predicate
Fix predicate for arithmetic instructions with shift and/or extend.

llvm-svn: 348510
2018-12-06 18:25:37 +00:00
Diogo N. Sampaio 9c9067316b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html


Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

--

It was reverted in rL348249 due to a buildbot failure in one of the
regression tests:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

The problem seems to be that FileCheck behaves
differently on Windows and Linux. This new patch
splits the test file into multiple files,
and does more exact pattern matching, attempting
to circumvent the issue.

llvm-svn: 348493
2018-12-06 15:39:17 +00:00
Matthias Braun d041212c07 AArch64: Fix invalid CCMP emission
The code emitting AND-subtrees used to check whether any of the operands
was an OR in order to figure out if the result needs to be negated.
However the OR could be hidden in further subtrees and not immediately
visible.
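
For illustration (an invented example, not taken from the patch or PR39550), a
source-level condition where the OR sits one level below the AND, so it is not
visible by only inspecting the AND's immediate operands:

```
// Illustrative only: the nested OR is the shape that used to confuse the
// negation decision when emitting the CCMP chain.
int pick(int A, int B, int C, int D) {
  return (A == 0 && (B == 1 || C == 2)) ? D : 0;
}
```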

Change the code so that canEmitConjunction() determines whether the
result of the generated subtree needs to be negated. Clean up the emission
logic to use this. I also changed the code a bit to make all negation
decisions early before we actually emit the subtrees.

This fixes http://llvm.org/PR39550

Differential Revision: https://reviews.llvm.org/D54137

llvm-svn: 348444
2018-12-06 01:40:23 +00:00
Aditya Nandakumar f75d4f329c [GISel]: Provide standard interface to observe changes in GISel passes
https://reviews.llvm.org/D54980

This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides a setObserver method.

Reviewed by: vkeles.

llvm-svn: 348406
2018-12-05 20:14:52 +00:00
Evandro Menezes 4b5707550c [AArch64] Reword description of feature (NFC)
Reword the description of the feature that enables custom handling of cheap
instructions.

llvm-svn: 348398
2018-12-05 18:42:57 +00:00
Saleem Abdulrasool efd2cb8a0d AArch64: support funclets in fastcall and swift_call
Functions annotated with `__fastcall` or `__attribute__((__fastcall__))`
or `__attribute__((__swiftcall__))` may contain SEH handlers even on
Win64.  This matches the behaviour of cl which allows for
`__try`/`__except` inside a `__fastcall` function.  This was detected
while trying to self-host clang on Windows ARM64.

llvm-svn: 348337
2018-12-05 07:09:20 +00:00
Amara Emerson 8547f4fb7f [AArch64][GlobalISel] Re-enable selection of volatile loads.
We previously disabled this in r323371 because of a bug where we selected an
extending load, but didn't delete the old G_LOAD, resulting in two loads being
generated for volatile loads.

Since we now have dedicated G_SEXTLOAD/G_ZEXTLOAD operations, and the
tablegen patterns should no longer be able to select (ext(load x)) patterns, it
should be safe to re-enable it.

The old test case should still work as expected.

llvm-svn: 348320
2018-12-05 00:03:09 +00:00
Saleem Abdulrasool a9248fdfab AArch64: clean up some whitespace in Windows CC (NFC)
Drive by clean up for Windows ARM64 variadic CC (NFC).

llvm-svn: 348310
2018-12-04 22:19:29 +00:00
Simon Pilgrim 58d44235e5 Fix -Wparentheses warning. NFCI.
llvm-svn: 348254
2018-12-04 12:24:10 +00:00
Simon Pilgrim 1a2e0200ac Revert rL348121 from llvm/trunk: [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

........

This has been causing buildbot failures for the past 24 hours: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/14386

llvm-svn: 348249
2018-12-04 10:55:48 +00:00
Sanjin Sijaric dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Jessica Paquette 2f5833ecd9 [MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.

llvm-svn: 348219
2018-12-04 00:31:47 +00:00
Jessica Paquette 2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left is less than the
number of bytes that it'd take to emit a save + restore for all candidates.

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still will demote all candidates to that case

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Pablo Barrio a17f855698 [AArch64] Add command-line option for SSBS
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.

Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html

Reviewers: olista01, samparker, aemerson

Reviewed By: samparker

Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D54629

llvm-svn: 348137
2018-12-03 14:00:47 +00:00
Diogo N. Sampaio 3c7d062b6b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
complex numbers extension is targeting an v8.3
architecture, where after the patch this extension
can be added separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

llvm-svn: 348121
2018-12-03 11:08:13 +00:00
Jessica Paquette 9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Jessica Paquette 1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne 35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Francis Visoiu Mistrih 0b8dd4488e [MachineScheduler] Order FI-based memops based on stack direction
It makes more sense to order FI-based memops in descending order when
the stack goes down. This allows offsets to stay "consecutive" and allow
easier pattern matching.

llvm-svn: 347906
2018-11-29 20:03:19 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of the vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG a chance to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Petr Pavlu e6406d568c [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.

The EnableGlobalISel flag previously had only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347861
2018-11-29 12:56:32 +00:00
Francis Visoiu Mistrih 879087ce5b [MachineScheduler] Add support for clustering mem ops with FI base operands
Before this patch, the following stores in `merge_fail` would fail to be
merged, while they would get merged in `merge_ok`:

```
void use(unsigned long long *);
void merge_fail(unsigned key, unsigned index)
{
  unsigned long long args[8];
  args[0] = key;
  args[1] = index;
  use(args);
}
void merge_ok(unsigned long long *dst, unsigned a, unsigned b)
{
  dst[0] = a;
  dst[1] = b;
}
```

The reason is that `getMemOpBaseImmOfs` would return false for FI base
operands.

This adds support for this.

Differential Revision: https://reviews.llvm.org/D54847

llvm-svn: 347747
2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih d7eebd6d83 [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

llvm-svn: 347746
2018-11-28 12:00:20 +00:00
Evandro Menezes 9ef79c884a [TableGen] Refactor macro names (NFC)
Make the names for the macros for `TargetInstrInfo` uniform.

llvm-svn: 347706
2018-11-27 20:58:27 +00:00
David Blaikie 1fecbec5fa AArch64ISelLowering: Remove a return-of-assignment to allow NRVO
Patch by Arthur O'Dwyer!

llvm-svn: 347609
2018-11-26 22:57:18 +00:00
Evandro Menezes 6a38a5effe [AArch64] Refactor the scheduling predicates (3/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasExtendedReg()`.

Differential revision: https://reviews.llvm.org/D54822

llvm-svn: 347599
2018-11-26 21:47:46 +00:00
Evandro Menezes 56368c6fa5 [AArch64] Refactor the scheduling predicates (2/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::hasShiftedReg()`.

Differential revision: https://reviews.llvm.org/D54820

llvm-svn: 347598
2018-11-26 21:47:41 +00:00
Evandro Menezes b02ac8bd21 [AArch64] Refactor the scheduling predicates (1/3) (NFC)
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, `AArch64InstrInfo::isScaledAddr()`

Differential revision: https://reviews.llvm.org/D54777

llvm-svn: 347597
2018-11-26 21:47:28 +00:00
Than McIntosh 30c804bbb1 [CodeGen] Support custom format of stack maps
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.

For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.

This patch authored by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, apilipenko, reames

Reviewed By: reames

Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D53892

llvm-svn: 347584
2018-11-26 18:43:48 +00:00
Luke Cheeseman 6db3a6a4a7 Revert r347490 as it breaks address sanitizer builds
llvm-svn: 347499
2018-11-23 17:13:06 +00:00
Luke Cheeseman d6dbd64104 Revert r343341
- Cannot reproduce the build failure locally and the build logs have
  been deleted.

llvm-svn: 347490
2018-11-23 11:01:47 +00:00
John Brawn d6e0ebea10 [AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR
A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into
BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn
BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop.

Fix this by making LowerBUILD_VECTOR not do this transformation for those
vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element
constant vector. Doing that means we get a DUP, which we then need to recognise
in ISel as a copy.

llvm-svn: 347456
2018-11-22 11:45:23 +00:00
Simon Pilgrim c9cc6cca42 Fix MSVC 'truncation of constant value' warning. NFCI.
llvm-svn: 347308
2018-11-20 14:29:40 +00:00
Martin Elshuber fef3036d37 [CodeGen] Add pass to combine interleaved loads.
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.

The pass extends LLVM's capabilities to use target-specific instruction
selection of interleaved load patterns (e.g.: ld4 on AArch64
architectures).
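
As a rough illustration (invented source, not from the patch), the kind of
strided access whose vectorized form, wide loads feeding shufflevectors, this
pass can hand to the InterleavedAccessPass so that an ld4 gets selected:

```
// Illustrative only: a stride-4 de-interleaving loop.
void deinterleave4(const float *In, float *A, float *B, float *C, float *D,
                   int N) {
  for (int I = 0; I < N; ++I) {
    A[I] = In[4 * I + 0];
    B[I] = In[4 * I + 1];
    C[I] = In[4 * I + 2];
    D[I] = In[4 * I + 3];
  }
}
```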

Differential Revision: https://reviews.llvm.org/D52653

llvm-svn: 347208
2018-11-19 14:26:10 +00:00
Peter Collingbourne 527024469a AArch64: Emit a call frame instruction for the shadow call stack register.
When unwinding past a function that uses shadow call stack, we must
subtract 8 from the value of the x18 register. This patch causes us
to emit a call frame instruction that causes that to happen.

Differential Revision: https://reviews.llvm.org/D54609

llvm-svn: 347089
2018-11-16 20:08:54 +00:00
Jessica Paquette 27e1754fc9 [MachineOutliner][NFC] Don't compute liveness if X16/X17/NZCV are unused
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.

If this holds for all MBBs, then we can avoid checking for liveness on
that candidate. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.

llvm-svn: 346901
2018-11-14 22:23:38 +00:00
Jessica Paquette 4e97ec94d9 [MachineOutliner][NFC] Use flags set in all candidates to check for calls
If we keep track of if the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.

This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example.)

llvm-svn: 346816
2018-11-13 23:41:31 +00:00
Jessica Paquette cad864d49e [MachineOutliner][NFC] Use MBB flags to avoid call checks in getOutliningInfo
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.

The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.

llvm-svn: 346809
2018-11-13 23:01:34 +00:00
Jessica Paquette b2d53c5d7d [MachineOutliner][NFC] Exit getOutliningType if there are < 2 candidates
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there's less than that.

llvm-svn: 346803
2018-11-13 22:16:27 +00:00
Jessica Paquette 106946329d [MachineOutliner][NFC] Simplify isMBBSafeToOutlineFrom check in AArch64 outliner
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.

llvm-svn: 346721
2018-11-13 00:32:09 +00:00
Jessica Paquette 82d9c0a3fa [MachineOutliner][NFC] Change getMachineOutlinerMBBFlags to isMBBSafeToOutlineFrom
Instead of returning Flags, return true if the MBB is safe to outline from.

This lets us check for unsafe situations, like say, in AArch64, X17 is live
across a MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.

llvm-svn: 346718
2018-11-12 23:51:32 +00:00
Sanjay Patel 0a515595a7 [x86] allow vector load narrowing with multi-use values
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.

I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.

For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.

Differential Revision: https://reviews.llvm.org/D54073

llvm-svn: 346595
2018-11-10 20:05:31 +00:00
Eli Friedman ad1151cf6a [ARM64] [Windows] Handle funclets
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.

Patch by Sanjin Sijaric, with some fixes by me.

Differential Revision: https://reviews.llvm.org/D51524

llvm-svn: 346568
2018-11-09 23:33:30 +00:00
Bryan Chan 123553921f [AArch64] Support HiSilicon's TSV110 processor
Reviewers: t.p.northover, SjoerdMeijer, kristof.beyls

Reviewed By: kristof.beyls

Subscribers: olista01, javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D53908

llvm-svn: 346546
2018-11-09 19:32:08 +00:00
Clement Courbet eee2e06e2a [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target.
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.

Reviewers: gchatelet, john.brawn, jsji

Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54297

llvm-svn: 346489
2018-11-09 13:15:32 +00:00
Mandeep Singh Grang 397765bc51 [COFF, ARM64] Add support for MSVC buffer security check
Reviewers: rnk, mstorsjo, compnerd, efriedma, TomTan

Reviewed By: rnk

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D54248

llvm-svn: 346469
2018-11-09 02:48:36 +00:00
Eli Friedman 0917d0c80c [AArch64] [Windows] Address post-commit review comment on r346358.
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.

llvm-svn: 346366
2018-11-07 22:30:56 +00:00
Eli Friedman d00fb2e0a8 [AArch64] [Windows] Trap after noreturn calls.
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.

Differential Revision: https://reviews.llvm.org/D54129

llvm-svn: 346358
2018-11-07 21:31:14 +00:00
Evandro Menezes f1a0d93b1d [AArch64] Refactor helper functions (NFC)
Refactor helper functions in AArch64InstrInfo to be static methods.

llvm-svn: 346273
2018-11-06 22:17:14 +00:00
Matthias Braun 96d12513a1 AArch64: Cleanup CCMP code; NFC
Cleanup CCMP pattern matching code in preparation for review/bugfix:
- Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()`
  (it won't accept arbitrary disjunctions and is really about whether we
   can transform the subtree into a conjunction that we can emit).
- Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()`

llvm-svn: 346203
2018-11-06 03:15:22 +00:00
Craig Topper 0b5f8169b0 [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.

llvm-svn: 346180
2018-11-05 23:26:13 +00:00
Mandeep Singh Grang 547a0d765a [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform.

Patch by: Yin Ma (yinma@codeaurora.org)

Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53996

llvm-svn: 345909
2018-11-01 23:22:25 +00:00
Mandeep Singh Grang df19e57a1c [COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic
Reviewers: rnk, mstorsjo, efriedma, TomTan

Reviewed By: efriedma

Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits

Differential Revision: https://reviews.llvm.org/D53962

llvm-svn: 345892
2018-11-01 21:23:47 +00:00
Volkan Keles 0a8dc9eb0f [GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.

Reviewers: dsanders, aemerson, aditya_nandakumar

Reviewed By: dsanders

Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53734

llvm-svn: 345875
2018-11-01 19:01:53 +00:00
Reid Kleckner eb56894a4b [AArch64] Fix unintended fallthrough and strengthen cast
This was added in r330630. GCC's -Wimplicit-fallthrough seems to not
fire when the previous case contains a switch itself.

This fallthrough was benign because the helper function implementing the
case used dyn_cast to re-check the type of the node in question. After
fixing the fallthrough, we can strengthen the cast.

llvm-svn: 345864
2018-11-01 18:02:27 +00:00
Mandeep Singh Grang b0cdf56dd7 Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64"
This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2.

llvm-svn: 345863
2018-11-01 17:53:57 +00:00
Chad Rosier 1546efd4a7 [AArch64] Add support for ARMv8.4 in Saphira.
llvm-svn: 345827
2018-11-01 13:45:16 +00:00
Mandeep Singh Grang 88ad9ac720 [COFF, ARM64] Implement Intrinsic.sponentry for AArch64
Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for the AArch64 Windows platform.

Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma

Reviewed By: efriedma

Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53673

llvm-svn: 345791
2018-10-31 23:16:20 +00:00
Evandro Menezes 3a06c46470 [AArch64] Sort switch cases (NFC)
llvm-svn: 345786
2018-10-31 21:56:49 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Sanjin Sijaric fadebc8aae [ARM64] [Windows] Exception handling support in frame lowering
Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue.  These are used by the MCLayer to
populate the .xdata section.

Differential Revision: https://reviews.llvm.org/D50288

llvm-svn: 345701
2018-10-31 09:27:01 +00:00
Martin Storsjo 315357faca [AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstk
This is similar to SVN r311061 for ARM.

Differential Revision: https://reviews.llvm.org/D53878

llvm-svn: 345698
2018-10-31 08:14:09 +00:00
Mandeep Singh Grang 71e0cc2a0b [COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg
Summary:
    Thunk functions in Windows are vararg functions that call a musttail function
    to pass the arguments after the fixup is done.  We need to make sure that we
    forward the arguments from the caller vararg to the callee vararg function.
    This is the same mechanism that is used for Windows on X86.

Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma

Reviewed By: efriedma

Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D53843

llvm-svn: 345641
2018-10-30 20:46:10 +00:00
Eli Friedman 93d0129b78 [AArch64] [Windows] SEH opcodes should be scheduling boundaries.
Prevents the post-RA scheduler from modifying the prologue sequences
emitted by frame lowering. This is roughly similar to what we do for
other targets: TargetInstrInfo::isSchedulingBoundary checks
isPosition(), which checks for CFI_INSTRUCTION.

isSEHInstruction is taken from D50288; it'll land with whatever patch
lands first.

Differential Revision: https://reviews.llvm.org/D53851

llvm-svn: 345634
2018-10-30 19:24:51 +00:00
David Greene 3e89fa8e08 [AArch64] Create proper memoperand for multi-vector stores
Re-apply r345315 with testcase fixes.

Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345631
2018-10-30 19:17:51 +00:00
Diogo N. Sampaio a3783b2ac2 [AArch64] Add support for UDF instruction
Summary: Add support for AArch64 UDF instruction.
UDF - Permanently Undefined generates an Undefined
Instruction exception (ESR_ELx.EC = 0b000000).

Reviewers: DavidSpickett, javed.absar, t.p.northover 

Reviewed By: javed.absar

Subscribers: nhaehnle, kristof.beyls

Differential Revision: https://reviews.llvm.org/D53319

llvm-svn: 345581
2018-10-30 11:06:50 +00:00
Reid Kleckner 23c9efc071 Remove unneeded friend declarations that clang-cl warns on
llvm-svn: 345549
2018-10-29 22:38:13 +00:00
Bryan Chan bfd32d4377 [AArch64] Rename FP16FML instruction format (NFC)
Rename SIMDThreeSameMult (etc.) to SIMDThreeSameVectorFML (etc.) to follow
the usual naming convention, and add some comments in the .td files.

llvm-svn: 345515
2018-10-29 17:27:34 +00:00
Luke Cheeseman 71c989ae1f [AArch64] Return address signing B key support
- Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return
  address signing
- The key used to sign the function is controlled by the function attribute
  "sign-return-address-key"

Differential Revision: https://reviews.llvm.org/D51427

llvm-svn: 345511
2018-10-29 16:26:58 +00:00
Sanjin Sijaric 96f2ea3dd4 [ARM64][Windows] MCLayer support for exception handling
Add ARM64 unwind codes to MCLayer, as well as SEH directives that will be emitted
by the frame lowering patch to follow.  We only emit unwind codes into
object files for now.

Differential Revision: https://reviews.llvm.org/D50166

llvm-svn: 345450
2018-10-27 06:13:06 +00:00
Vlad Tsyrklevich 21beeb29ea Revert "[AArch64] Create proper memoperand for multi-vector stores"
This reverts commit r345315, it was causing test failures on
sanitizer-x86_64-linux-fast.

llvm-svn: 345356
2018-10-26 02:00:14 +00:00
Bryan Chan f0923f16f8 [AArch64] Implement FP16FML intrinsics
Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a
DAG pattern to define the indexed-form intrinsics in terms of the vector-form
ones, similarly to how the Dot Product intrinsics were implemented.

Based on a patch by Gao Yiling.

Differential Revision: https://reviews.llvm.org/D53632

llvm-svn: 345337
2018-10-25 23:36:41 +00:00
David Greene 53e869da7d [AArch64] Create proper memoperand for multi-vector stores
Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.

Differential Revision: https://reviews.llvm.org/D52816

llvm-svn: 345315
2018-10-25 21:10:39 +00:00
Volkan Keles f87473fe1c [GISel] LegalizerInfo: Rename MemDesc::Size to SizeInBits to make the value clearer
Requested in D53679.

llvm-svn: 345288
2018-10-25 17:37:07 +00:00
Volkan Keles 3a103b1d25 [AArch64][GlobalISel] Fix the LegalityPredicate for lowerIf for G_LOAD/G_STORE
Summary:
Currently, Legalizer is trying to lower G_LOAD with a vector type
that has more than two elements due to the incorrect LegalityPredicate.

This patch fixes the issue by removing the multiplication by 8
as `MemDesc.Size` already contains the size in bits.

Reviewers: dsanders, aemerson

Reviewed By: dsanders

Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53679

llvm-svn: 345282
2018-10-25 17:23:25 +00:00
Evandro Menezes b53cf99388 [AArch64] Refactor Exynos feature sets (NFC)
llvm-svn: 345279
2018-10-25 16:45:46 +00:00
John Brawn 958865202d [AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector
If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit
vector then in some cases we can eliminate an extract_subvector by converting
to a 128-bit EXT of the 128-bit vector.

Differential Revision: https://reviews.llvm.org/D53582

llvm-svn: 345275
2018-10-25 15:31:51 +00:00
John Brawn b8e7887f33 [AArch64] Refactor definition of EXT patterns to use a multiclass
Using a multiclass reduces duplication, and makes it easier to add new patterns
later. This refactoring does add some new patterns, but as far as I can tell
there's no IR that will end up triggering them so this is effectively NFC.

Differential Revision: https://reviews.llvm.org/D53580

llvm-svn: 345271
2018-10-25 15:00:10 +00:00
John Brawn 49e61d90ca [AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move
Currently a vector move of 0 or -1 will use different instructions depending on
the size of the vector. Using a single instruction (the 128-bit one) for both
gives more opportunity for Machine CSE to eliminate instructions.

Differential Revision: https://reviews.llvm.org/D53579

llvm-svn: 345270
2018-10-25 14:56:48 +00:00
Amara Emerson cbd86d8429 [GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index.
Allows for better imported pattern re-use.

llvm-svn: 345265
2018-10-25 14:04:54 +00:00
Simon Pilgrim 071e82218f [TTI] Add generic SK_Broadcast shuffle costs
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles.

This patch adds SK_BROADCAST handling, but exposes ARM/AARCH64's lack of handling of this type, which I've added a fix for at the same time.

Differential Revision: https://reviews.llvm.org/D53570

llvm-svn: 345253
2018-10-25 10:52:36 +00:00
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Evandro Menezes 096e2497b5 [AArch64] Refactor Exynos machine model
Effectively, NFC.

llvm-svn: 345201
2018-10-24 21:40:43 +00:00
Tim Northover 1c353419ab AArch64: add a pass to compress jump-table entries when possible.
llvm-svn: 345188
2018-10-24 20:19:09 +00:00
Evandro Menezes 769d4cebad [AArch64] Refactor Exynos machine model (NFC)
llvm-svn: 345187
2018-10-24 20:03:24 +00:00
Evandro Menezes 80bc136732 [AArch64] Fix overlapping instructions
Fix overlapping instruction descriptions in the machine model for Exynos M3.
Effectively, NFC.

llvm-svn: 345186
2018-10-24 20:03:20 +00:00
Fangrui Song 2e83b2e9ee Use llvm::{all,any,none}_of instead std::{all,any,none}_of. NFC
llvm-svn: 344774
2018-10-19 06:12:02 +00:00
Evandro Menezes c98decf864 [PATCH] [NFC][AArch64] Fix refactoring of macro fusion
Fix compiler error.

llvm-svn: 344632
2018-10-16 17:41:45 +00:00
Evandro Menezes de655c6d3a [NFC][AArch64] Refactor macro fusion
Simplify API of checking functions.

llvm-svn: 344624
2018-10-16 17:19:28 +00:00
Simon Pilgrim 095a7fe635 [AARCH64] Improve vector popcnt lowering with ADDLP
AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors.

This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258.

Differential Revision: https://reviews.llvm.org/D53259

llvm-svn: 344554
2018-10-15 21:15:58 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Arnaud A. de Grandmaison 162435e7b5 [AArch64] Swap comparison operands if that enables some folding.
Summary:
AArch64 can fold some shift+extend operations on the RHS operand of
comparisons, so swap the operands if that makes sense.

This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751

Reviewers: efriedma, t.p.northover, javed.absar

Subscribers: mcrosier, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D53067

llvm-svn: 344439
2018-10-13 07:43:56 +00:00
Diogo N. Sampaio 352a2fa1e7 [AARCH64][FIX] Emit data symbol for constant pool data
The ARM64 ELF emitter would omit printing the data
symbol for zero-filled constant data. This patch
overrides the emitFill method so as to enforce that
the symbol is correctly printed.

Differential revision: https://reviews.llvm.org/D53132

llvm-svn: 344248
2018-10-11 14:10:32 +00:00
Oliver Stannard 367b4741f4 [AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled
When branch target identification is enabled, we can only do indirect
tail-calls through x16 or x17. This means that the outliner can't
transform a BLR instruction at the end of an outlined region into a BR.

Differential revision: https://reviews.llvm.org/D52869

llvm-svn: 343969
2018-10-08 14:12:08 +00:00
Oliver Stannard c922116a51 [AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI
When branch target identification is enabled, all indirectly-callable
functions start with a BTI C instruction. This instruction can only be
the target of certain indirect branches (direct branches and
fall-through are not affected):
- A BLR instruction, in either a protected or unprotected page.
- A BR instruction in a protected page, using x16 or x17.
- A BR instruction in an unprotected page, using any register.

Without BTI, we can use any non call-preserved register to hold the
address for an indirect tail call. However, when BTI is enabled,
the code being compiled might be loaded into a BTI-protected page, where
only x16 and x17 can be used for indirect tail calls.

Legacy code without this restriction can still indirectly tail-call
BTI-protected functions, because they will be loaded into an unprotected
page, so any register is allowed.

Differential revision: https://reviews.llvm.org/D52868

llvm-svn: 343968
2018-10-08 14:09:15 +00:00
Oliver Stannard 250e5a5b65 [AArch64][v8.5A] Branch Target Identification code-generation pass
The Branch Target Identification extension, introduced to AArch64 in
Armv8.5-A, adds the BTI instruction, which is used to mark valid targets
of indirect branches. When enabled, the processor will trap if an
instruction in a protected page tries to perform an indirect branch to
any instruction other than a BTI. The BTI instruction uses encodings
which were NOPs in earlier versions of the architecture, so BTI-enabled
code will still run on earlier hardware, just without the extra
protection.

There are 3 variants of the BTI instruction, which are valid targets for
different kinds of branches:
- BTI C can be targeted by call instructions, and is intended to be
  used at function entry points. These are the BLR instruction, as well
  as BR with x16 or x17. These BR instructions are allowed for use in
  PLT entries, and we can also use them to allow indirect tail-calls.
- BTI J can be targeted by BR only, and is intended to be used by jump
  tables.
- BTI JC acts as both a BTI C and a BTI J instruction, and can be
  targeted by any BLR or BR instruction.

Note that RET instructions are not restricted by branch target
identification; the reason for this is that return addresses can be
protected more effectively using return address signing. Direct branches
and calls are also unaffected, as it is assumed that an attacker cannot
modify executable pages (if they could, they wouldn't need to do a
ROP/JOP attack).

This patch adds a MachineFunctionPass which:
- Adds a BTI C at the start of every function which could be indirectly
  called (either because it is address-taken, or externally visible so
  could be address-taken in another translation unit).
- Adds a BTI J at the start of every basic block which could be
  indirectly branched to. This could be either done by a jump table, or
  by taking the address of the block (e.g. using the GCC label values
  extension).

We only need to use BTI JC when a function is indirectly-callable, and
takes the address of the entry block. I've not been able to trigger this
from C or IR, but I've included a MIR test just in case.

Using BTI C at function entries relies on the fact that no other code in
BTI-protected pages uses indirect tail-calls, unless they use x16 or x17
to hold the address. I'll add that code-generation restriction as a
separate patch.

Differential revision: https://reviews.llvm.org/D52867

llvm-svn: 343967
2018-10-08 14:04:24 +00:00
Oliver Stannard 9ecdac8ee0 [AArch64] Fix verifier error when outlining indirect calls
The MachineOutliner for AArch64 transforms indirect calls into indirect
tail calls, replacing the call with the TCRETURNri pseudo-instruction.
This pseudo lowers to a BR, but has the isCall and isReturn flags set.

The problem is that TCRETURNri takes a tcGPR64 as the register argument,
to prevent indirect tail-calls from using caller-saved registers. The
indirect calls transformed by the outliner could use caller-saved
registers. This is fine, because the outliner ensures that the register
is available at all call sites. However, this causes a verifier failure
when the register is not in tcGPR64. The fix is to add a new
pseudo-instruction like TCRETURNri, but which accepts any GPR.

Differential revision: https://reviews.llvm.org/D52829

llvm-svn: 343959
2018-10-08 09:18:48 +00:00
Matthias Braun 81578e9f77 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895
2018-10-05 22:00:13 +00:00
Jonas Paulsson faad1b3056 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Matthias Braun 0c67a4e958 AArch64: Fix XSeqPairs/WSeqPairs problems
- Fix spill/reloads of XSeqPairs failing with vregs (only physregs
  worked correctly)
- Add missing spill/reload code for WSeqPairs class

Differential Revision: https://reviews.llvm.org/D52761

llvm-svn: 343799
2018-10-04 17:02:53 +00:00
Daniel Sanders 34eac35a60 Add the missing new files from r343654
llvm-svn: 343655
2018-10-03 02:21:30 +00:00
Daniel Sanders c973ad1878 Re-commit: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

The previous commit failed portions of the test-suite on GreenDragon due to
duplicate COPY instructions and iterator invalidation. Both issues have now
been fixed. To assist with this, a helper (cloneVirtualRegister) has been added
to MachineRegisterInfo that can be used to get another register that has the same
type and class/bank as an existing one.

llvm-svn: 343654
2018-10-03 02:12:17 +00:00
Matt Morehouse 4b1ec17fb0 Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions"
This reverts r343520 due to breakage of HWASan tests on Android.

llvm-svn: 343616
2018-10-02 18:35:44 +00:00
Oliver Stannard c41902807e [AArch64][v8.5A] Add Memory Tagging instructions
This adds new instructions to manipulate tagged pointers, and to load
and store the tags associated with memory.

Patch by Pablo Barrio, David Spickett and Oliver Stannard!

Differential revision: https://reviews.llvm.org/D52490

llvm-svn: 343572
2018-10-02 10:04:39 +00:00
Oliver Stannard 2a5fcba94b [AArch64][v8.5A] Add Memory Tagging system registers
This adds new system registers introduced by the Memory Tagging
extension.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52488

llvm-svn: 343571
2018-10-02 09:54:35 +00:00
Oliver Stannard 4493f421ac [AArch64][v8.5A] Add MTE system instructions
The Memory Tagging Extension adds system instructions for data cache
maintenance, implemented as new operands to the DC instruction.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52487

llvm-svn: 343570
2018-10-02 09:48:43 +00:00
Oliver Stannard 85de54090e [AArch64][v8.5A] Add MTE as an optional AArch64 extension
This adds the memory tagging extension, which is an optional extension
introduced in v8.5A. The new instructions and registers will be added by
subsequent patches.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52486

llvm-svn: 343563
2018-10-02 09:36:28 +00:00
Daniel Sanders 33f42f97af Revert: r343521 and r343541: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
There's a strange assertion on two of the Green Dragon bots that goes away when
this is reverted. The assertion is in RegBankAlloc and if it is this commit then
-verify-machine-instrs should have caught it earlier in the pipeline.

llvm-svn: 343546
2018-10-01 22:32:08 +00:00
Daniel Sanders 9659bfda5a [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
Summary: Depends on D45541

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson

Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D45543

llvm-svn: 343521
2018-10-01 18:56:47 +00:00
Matthias Braun 3e081703c3 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343520
2018-10-01 18:56:39 +00:00
Evandro Menezes 55b9a5395b [AArch64] Refactor cheap cost model
Refactor the order in `TII::isAsCheapAsAMove()` to ease future development
and maintenance.  Practically NFC.

llvm-svn: 343489
2018-10-01 16:11:19 +00:00
Evandro Menezes fc1852ff1c [AArch64] Split zero cycle feature more granularly
Split the `zcz` feature into specific ones for GP and FP registers, `zcz-gp`
and `zcz-fp`, respectively, while retaining the original feature option to
mean both.

Differential revision: https://reviews.llvm.org/D52621

llvm-svn: 343354
2018-09-28 19:05:09 +00:00
Luke Cheeseman 10981cc884 Revert r343317
- asan buildbots are breaking and I need to investigate the issue

llvm-svn: 343341
2018-09-28 17:01:50 +00:00
Luke Cheeseman 21f2955bb2 Reapply changes reverted by r343235
- Add fix so that all code paths that create DWARFContext
  with an ObjectFile initialise the target architecture in the context
- Add an assert that the Arch is known in the Dwarf CallFrameString method

llvm-svn: 343317
2018-09-28 13:37:27 +00:00
David Spickett a799fe40dc Remove extra whitespace. NFC. (test commit)
llvm-svn: 343301
2018-09-28 08:45:28 +00:00
Luke Cheeseman 8e5676b1aa Revert r343192 as an ubsan build is currently failing
llvm-svn: 343235
2018-09-27 16:47:30 +00:00
Oliver Stannard 2721e6f0ed [AArch64] Refactor immediate details out of add/sub tblgen class (NFCI)
Bits [23-22] are used in Add and Sub to specify the shift. The value of the
shift field must be 0x; values of 1x are unallocated. MTE adds some instructions
that use such encodings, and this patch refactors the Add/Sub class so that
another class could derive from this one to implement other encodings and other
formats of bitfields.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52489

llvm-svn: 343231
2018-09-27 16:19:04 +00:00
Oliver Stannard a4f68bf4ad [AArch64][v8.5A] Add speculation barriers SSBB and PSSBB
This adds two new barrier instructions which can be used to restrict
speculative execution of load instructions.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52483

llvm-svn: 343229
2018-09-27 16:09:05 +00:00
Oliver Stannard a9a5eee169 [AArch64][v8.5A] Add Branch Target Identification instructions
This adds new instructions used by the Branch Target Identification
feature. When this is enabled, these are the only instructions which can
be targeted by indirect branch instructions.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52485

llvm-svn: 343225
2018-09-27 14:54:33 +00:00
Oliver Stannard 8459d34e82 [AArch64][v8.5A] Add speculation restriction system registers
This adds some new system registers which can be used to restrict
certain types of speculative execution.

Patch by Pablo Barrio and David Spickett!

Differential revision: https://reviews.llvm.org/D52482

llvm-svn: 343218
2018-09-27 14:05:46 +00:00
Oliver Stannard dc837e3f1f [AArch64][v8.5A] Add Armv8.5-A random number instructions
This adds two new system registers, used to generate random numbers.

This is an optional extension to v8.5-A, and will be controlled by the
"+rng" modifier of the -march= and -mcpu= options.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52481

llvm-svn: 343217
2018-09-27 14:01:40 +00:00
Oliver Stannard 6930b12d53 [AArch64][v8.5A] Add Armv8.5-A "DC CVADP" instruction
This adds a new variant of the DC system instruction for persistent
memory.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52480

llvm-svn: 343216
2018-09-27 13:53:35 +00:00
Oliver Stannard 224428c06a [AArch64][v8.5A] Add prediction invalidation instructions to AArch64
This adds new system instructions which act as barriers to speculative
execution based on earlier execution within a particular execution
context.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52479

llvm-svn: 343214
2018-09-27 13:47:40 +00:00
Oliver Stannard e481f1d95a [AArch64][v8.5A] Add speculation barrier to AArch64 instruction set
This is a new barrier which limits speculative execution of the
instructions following it.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52476

llvm-svn: 343211
2018-09-27 13:39:06 +00:00
Oliver Stannard ddb7d46aa5 [AArch64][v8.5A] Add FRINT[32,64][Z,X] instructions
These are some new variants of the "Floating-point Round to Integral"
family of instructions, which round to the nearest floating-point value
which fits in a 32- or 64-bit integer.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52475

llvm-svn: 343209
2018-09-27 13:32:06 +00:00
Luke Cheeseman f6844b307a Reapply changes reverted in r343114, lldb patch to follow shortly
llvm-svn: 343192
2018-09-27 10:39:20 +00:00
Oliver Stannard 31af178f4a [AArch64][v8.5A] Add PSTATE manipulation instructions XAFlag and AXFlag
These new instructions manipulate the NZCV bits, to convert between the
regular Arm floating-point compare format and an alternative format.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52473

llvm-svn: 343187
2018-09-27 09:11:27 +00:00
Fangrui Song 0cac726a00 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
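A minimal before/after sketch of this mechanical change (the container and comparator here are made up for illustration; the range-based llvm::sort overload is the STLExtras convenience wrapper from rL342102 mentioned above):

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/ADT/SmallVector.h"

    void sortValues(llvm::SmallVector<int, 8> &Values) {
      // Old form: explicit iterator pair.
      // llvm::sort(Values.begin(), Values.end(), [](int A, int B) { return A < B; });
      // New form: pass the range directly.
      llvm::sort(Values, [](int A, int B) { return A < B; });
    }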
Oliver Stannard 2905937435 [AArch64] Extend single-operand FP insns to match Arm ARM (NFCI)
The Armv8.3-A reference manual defines floating-point data-processing
instructions with one source operand to have an opcode of 6 bits
[20:15]. The current class in tablegen, BaseSingleOperandFPData, only
allows [18:15]. This was ok because [20:19] could only be '00', with
other encodings unallocated. Armv8.5-A brings in the FRINT group of
instructions which use other values for these bits.

This patch refactors the existing class a bit to allow using the full 6
bits of the opcode, as defined in the Arm ARM.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52474

llvm-svn: 343120
2018-09-26 15:42:47 +00:00
Luke Cheeseman 77aaa22081 Revert r343112 as CallFrameString API change has broken lldb builds
llvm-svn: 343114
2018-09-26 14:48:03 +00:00
Oliver Stannard c5d192b611 [AArch64] Refactor instructions that write PSTATE (NFCI)
Reuse some code in preparation for the v8.5A XAFlag/AXFlag instructions,
which shares part of the encoding of the MSR-immediate.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52472

llvm-svn: 343113
2018-09-26 14:42:59 +00:00
Luke Cheeseman 03ad8812f5 [AArch64] - Return address signing dwarf support
- Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll

llvm-svn: 343112
2018-09-26 14:30:29 +00:00
Oliver Stannard 89b1604935 [AArch64][AsmParser] Show name of missing feature for system instructions
Parsing of the system instructions (IC, DC, AT and TLBI) uses this
function to show the required architecture when the operand is valid,
but the architecture is not enabled. Armv8.5A adds a few different
system instructions as part of optional features, so we need to extend
it to show individual features, not just base architectures.

This is NFC for now, but will be used by three different features added
in v8.5A, and will be tested by them.

Patch by David Spickett!

Differential revision: https://reviews.llvm.org/D52478

llvm-svn: 343109
2018-09-26 13:52:27 +00:00
Hans Wennborg 00b88bbcaf Revert r343089 "[AArch64] - Return address signing dwarf support"
This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.

> Functions that have signed return addresses need additional dwarf support:
> - After signing the LR, and before authenticating it, the LR register is in a
>   state the is unusable by a debugger or unwinder
> - To account for this a new directive, .cfi_negate_ra_state, is added
> - This directive says the signed state of the LR register has now changed,
>   i.e. unsigned -> signed or signed -> unsigned
> - This directive has the same CFA code as the SPARC directive GNU_window_save
>   (0x2d), adding a macro to account for multiply defined codes
> - This patch matches the gcc implementation of this support:
>   https://patchwork.ozlabs.org/patch/800271/
>
> Differential Revision: https://reviews.llvm.org/D50136

llvm-svn: 343103
2018-09-26 12:57:45 +00:00
Oliver Stannard 7c3c4baa3f [ARM/AArch64][v8.5A] Add Armv8.5-A target
This patch allows targeting Armv8.5-A, adding the architecture to
tablegen and setting the options to be identical to Armv8.4-A for the
time being. Subsequent patches will add support for the different
features included in the Armv8.5-A Reference Manual.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52470

llvm-svn: 343102
2018-09-26 12:48:21 +00:00
Luke Cheeseman f755e687fc [AArch64] - Return address signing dwarf support
Functions that have signed return addresses need additional dwarf support:
- After signing the LR, and before authenticating it, the LR register is in a
  state that is unusable by a debugger or unwinder
- To account for this a new directive, .cfi_negate_ra_state, is added
- This directive says the signed state of the LR register has now changed,
  i.e. unsigned -> signed or signed -> unsigned
- This directive has the same CFA code as the SPARC directive GNU_window_save
  (0x2d), adding a macro to account for multiply defined codes
- This patch matches the gcc implementation of this support:
  https://patchwork.ozlabs.org/patch/800271/

Differential Revision: https://reviews.llvm.org/D50136

llvm-svn: 343089
2018-09-26 10:14:15 +00:00
Nirav Dave e40e2bbd37 [AArch64] Share search bookkeeping in combines. NFCI.
Share predecessor search bookkeeping in both performPostLD1Combine
and performNEONPostLDSTCombine. This should be approximately a 4x and
2x performance improvement, respectively.

llvm-svn: 342986
2018-09-25 15:30:22 +00:00
Benjamin Kramer b3478fcf0e [Aarch64] Fix memcpy that was copying 4x too many bytes
Found by asan.

llvm-svn: 342845
2018-09-23 18:43:28 +00:00
Tri Vo 6c47c62588 [AArch64] Support adding X[8-15,18] registers as CSRs.
Summary:
Specifying X[8-15,18] registers as callee-saved is used to support
CONFIG_ARM64_LSE_ATOMICS in the Linux kernel. As part of this patch we:
- use a custom CSR list/mask when the user specifies custom CSRs
- update Machine Register Info's list of CSRs with additional custom CSRs in
LowerCall and LowerFormalArguments.

Reviewers: srhines, nickdesaulniers, efriedma, javed.absar

Reviewed By: nickdesaulniers

Subscribers: kristof.beyls, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52216

llvm-svn: 342824
2018-09-22 22:17:50 +00:00
Matthias Braun c0ef786004 AArch64FastISel: Abort if we failed to select operand of intrinsic
rdar://44642447

Differential Revision: https://reviews.llvm.org/D52335

llvm-svn: 342742
2018-09-21 15:47:41 +00:00
Matthias Braun 28d6a4ac9a AArch64: Add FuseCryptoEOR fusion rules
There are some additional rules available on newer Apple CPUs.

rdar://41235346

llvm-svn: 342590
2018-09-19 20:50:51 +00:00
Alex Bradbury 79518b02cd [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have 
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they 
will see no functional change. Previously this hook returned bool, but it now 
returns AtomicExpansionKind.

This hook allows targets to select how a given cmpxchg is to be expanded. 
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

See my associated RFC for more info on the motivation for this change 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D48130

llvm-svn: 342550
2018-09-19 14:51:42 +00:00
Calixte Denizet 7413a43886 Verify commit access in fixing typo
llvm-svn: 342538
2018-09-19 11:26:20 +00:00
Matthias Braun 934be5fecf AArch64MacroFusion: Factor out some opcode handling code; NFC
llvm-svn: 342521
2018-09-19 00:23:37 +00:00
David Green 85d6a55995 [AArch64] Attempt to parse more operands as expressions
This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef
to parse more complex expressions as relocatable operands. It is hopefully better than
the existing code which only handles Symbol +- Constant.

This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also
loosens the requirements on parsing addends in ld/st and mov's and adds a number of
tests.

Differential Revision: https://reviews.llvm.org/D51792

llvm-svn: 342455
2018-09-18 09:44:53 +00:00
Sander de Smalen 2d77e788f2 [AArch64] Implement aarch64_vector_pcs codegen support.
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51479

llvm-svn: 342049
2018-09-12 12:10:22 +00:00
Sander de Smalen 7140363cd0 [AArch64] NFC: Refactoring to prepare for vector PCS.
This patch refactors several parts of AArch64FrameLowering
so that it can be easily extended to support saving/restoring
of FPR128 (Q) registers.

Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D51478

llvm-svn: 342038
2018-09-12 09:44:46 +00:00
Sander de Smalen 4dbc512676 [AArch64] Add parsing of aarch64_vector_pcs attribute.
This patch adds parsing support for the 'aarch64_vector_pcs'
calling convention attribute to calls and function declarations.

More information describing the vector ABI and procedure call standard
can be found here:

  https://developer.arm.com/products/software-development-tools/\
                            hpc/arm-compiler-for-hpc/vector-function-abi

Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D51477

llvm-svn: 342030
2018-09-12 08:54:06 +00:00
JF Bastien 49ddd5aca1 NFC: use bit_cast more in AArch64AddressingModes
This was previously committed as r341749, then reverted as r341750, because
bit_cast needed to do its own thing to check is_trivially_copyable on GCC 4.x.
This is now done and std::array should now get accepted.

llvm-svn: 341897
2018-09-11 04:08:05 +00:00
Benjamin Kramer 27c769d28a [Target] Untangle disassemblers
Disassemblers cannot depend on main target headers. The same is true for
MCTargetDesc, but there's a lot more cleanup needed for that.

llvm-svn: 341822
2018-09-10 12:53:46 +00:00
JF Bastien 6d010103db Revert "NFC: use bit_cast more in AArch64AddressingModes"
It seems some bots think std::array is either not trivially-copyable, or isn't
the right size.

llvm-svn: 341750
2018-09-08 16:50:56 +00:00
JF Bastien 1825d10d3c NFC: use bit_cast more in AArch64AddressingModes
llvm-svn: 341749
2018-09-08 16:43:49 +00:00
JF Bastien c4986cef12 ADT: add <bit> header, implement C++20 bit_cast, use
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.

This was originally committed as r341728 and reverted in r341730.

Reviewers: javed.absar, steven_wu, srhines

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51693

llvm-svn: 341741
2018-09-08 03:55:25 +00:00
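A minimal sketch of the kind of non-constexpr, memcpy-based bit_cast described above (illustrative only, not the actual llvm::bit_cast source; the name bit_cast_sketch is made up to avoid implying otherwise):

    #include <cstring>
    #include <type_traits>

    // Reinterpret the object representation of Src as a value of type To,
    // without union type punning. Both types must have the same size and be
    // trivially copyable.
    template <class To, class From>
    To bit_cast_sketch(const From &Src) {
      static_assert(sizeof(To) == sizeof(From), "sizes must match");
      static_assert(std::is_trivially_copyable<From>::value,
                    "From must be trivially copyable");
      static_assert(std::is_trivially_copyable<To>::value,
                    "To must be trivially copyable");
      To Dst;
      std::memcpy(&Dst, &Src, sizeof(To));
      return Dst;
    }

    // Example: view a double's bit pattern as a 64-bit integer.
    // std::uint64_t Bits = bit_cast_sketch<std::uint64_t>(1.0);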
JF Bastien 05430cc6e5 Revert "ADT: add <bit> header, implement C++20 bit_cast, use"
Bots sad. Looks like missing std::is_trivially_copyable.

llvm-svn: 341730
2018-09-07 23:23:47 +00:00
JF Bastien 28655081a4 ADT: add <bit> header, implement C++20 bit_cast, use
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.

Reviewers: javed.absar

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51693

llvm-svn: 341728
2018-09-07 23:08:26 +00:00
Nick Desaulniers 287a3be379 [AArch64] Support reserving x1-7 registers.
Summary:
Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in the Linux kernel. This change adds support for reserving registers x1 through x7.

Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma

Reviewed By: nickdesaulniers, efriedma

Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D48580

llvm-svn: 341706
2018-09-07 20:58:57 +00:00
JF Bastien 2920061105 ARM64: improve non-zero memset isel by ~2x
Summary:
I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated
where the generated code was bad. This patch fixes the majority of the issues by
requesting that a 2xi64 vector be used for memset of 32 bytes and above.

The patch leaves the former request for f128 unchanged, despite f128
materialization being suboptimal: doing otherwise runs into other asserts in
isel and makes this patch too broad.

This patch hides the issue that was present in bzero_40_stack and bzero_72_stack
because the code now generates in a better order which doesn't have the store
offset issue. I'm not aware of that issue appearing elsewhere at the moment.

<rdar://problem/44157755>

Reviewers: t.p.northover, MatzeB, javed.absar

Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D51706

llvm-svn: 341558
2018-09-06 16:03:32 +00:00