Commit Graph

41809 Commits

Author SHA1 Message Date
Matt Arsenault 52d1b62a28 AMDGPU: Don't wait at end of block with a trivial successor
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.

llvm-svn: 297251
2017-03-08 01:06:58 +00:00
Matt Arsenault d8ed207a20 AMDGPU: Constant fold rcp node
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.

llvm-svn: 297248
2017-03-08 00:48:46 +00:00
Changpeng Fang 6b49fa4ca7 AMDGPU/SI: Do not insert EndCf in an unreachable block
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D22025

llvm-svn: 297243
2017-03-07 23:29:36 +00:00
Daniel Sanders 52b4ce727a Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297241
2017-03-07 23:20:35 +00:00
Krzysztof Parzyszek 434d50a796 [Hexagon] Check for presence before looking registers up in bit tracker
llvm-svn: 297240
2017-03-07 23:12:04 +00:00
Krzysztof Parzyszek 8e4d2e0512 [Hexagon] Generate bitsplit instruction
llvm-svn: 297239
2017-03-07 23:08:35 +00:00
Artem Belevich 2524a22562 [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.
Differential Revision: https://reviews.llvm.org/D30672

llvm-svn: 297198
2017-03-07 20:33:38 +00:00
Joel Jones 2852088126 [AArch64] Vulcan is now ThunderXT99
Broadcom Vulcan is now Cavium ThunderX2T99.

LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113

Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).

Patch was tested with SpecCPU2006.

Patch by Stefan Teleman

Differential Revision: https://reviews.llvm.org/D30510

llvm-svn: 297190
2017-03-07 19:42:40 +00:00
Daniel Sanders 8ebec37d26 Revert r297177: Change LLT constructor string into an LLT-based object ...
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.

Somehow, this change causes the build to need Attributes.gen before it's been
generated.

llvm-svn: 297188
2017-03-07 19:21:23 +00:00
Sanjoy Das c08a79fbf2 [X86] Add option to specify preferable loop alignment
Summary:
Loop alignment can cause a significant change of
the perfromance for short loops.
To be able to evaluate the impact of loop alignment this change
introduces the new option x86-experimental-pref-loop-alignment.
The alignment will be 2^Value bytes, the default value is 4.

Patch by Serguei Katkov!

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D30391

llvm-svn: 297178
2017-03-07 18:47:22 +00:00
Daniel Sanders 8612326a08 [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297177
2017-03-07 18:32:25 +00:00
John Brawn eba9fdac7e [ARM] Correct handling of LSL #0 in an IT block
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.

Differential Revision: https://reviews.llvm.org/D30692

llvm-svn: 297142
2017-03-07 14:42:03 +00:00
Krzysztof Parzyszek 3cceffb752 [Hexagon] Do not insert instructions before PHI nodes
llvm-svn: 297141
2017-03-07 14:20:19 +00:00
Ranjeet Singh 3d0af578cc [ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other""
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.

The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.

Differential Revision: https://reviews.llvm.org/D30645

llvm-svn: 297137
2017-03-07 11:17:53 +00:00
Jonas Paulsson 1d33cd3988 [SystemZ] Add check VT.isSimple() in canTreateAsByteVector()
Since BB-vectorizer can produce vectors of for example 3 elements,
this check is needed.

Review: Ulrich Weigand
llvm-svn: 297136
2017-03-07 09:49:31 +00:00
Artyom Skrobov 1388e2f792 In Thumb1, materialize a move between low registers as a `movs`, if CPSR isn't live.
Summary: Previously, it had always been materialized as a push/pop sequence.

Reviewers: labrinea, jroelofs

Reviewed By: jroelofs

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30648

llvm-svn: 297134
2017-03-07 09:38:16 +00:00
Ayman Musa 850fc977c8 [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables.
X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible.
It uses manually supported 2 large tables that map the EVEX instructions to their VEX ideticals.
This TableGen backend replaces the tables by automatically generating them.

Differential Revision: https://reviews.llvm.org/D30451

llvm-svn: 297127
2017-03-07 08:11:19 +00:00
Ayman Musa ac5a2c43af [X86][AVX512] Add missing entries to EVEX2VEX tables
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.

Differential Revision: https://reviews.llvm.org/D30501

llvm-svn: 297126
2017-03-07 08:05:53 +00:00
Tim Shen 70054bb827 Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"
This reverts commit r296771.

We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)

llvm-svn: 297124
2017-03-07 07:40:10 +00:00
Konstantin Zhuravlyov e8aaab8abe Revert "AMDGPU: Set MCAsmInfo::PointerSize"
It breaks line tables because the patch is not complete, working on a complete one at the moment

This reverts commit r294031.

llvm-svn: 297118
2017-03-07 04:44:33 +00:00
Tim Northover c2c545b8f7 GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

llvm-svn: 297100
2017-03-06 23:50:28 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Chad Rosier 9a70c7c02a [AArch64][Redundant Copy Elim] Add support for CMN and shifted imm.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as a shifted
immediates.

Differential Revision: https://reviews.llvm.org/D30576.

llvm-svn: 297078
2017-03-06 21:20:00 +00:00
Jan Vesely 3ea1704434 AMDGPU/R600: Fix ALU clause markers use detection
also exit early on kill instead of redefinition.

Differential Revision: https://reviews.llvm.org/D30230

llvm-svn: 297060
2017-03-06 20:10:05 +00:00
Reid Kleckner 812191584f [X86] Fix arg copy elision for illegal types
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.

Fixes PR32136 and re-enables copy elision from i64 arguments.

Reverts the workaround in from r296950.

llvm-svn: 297045
2017-03-06 18:39:39 +00:00
Krzysztof Parzyszek 8a4c601abc [Hexagon] Early-if-convert branches that may exit the loop
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.

For example:
  loop:                           loop:
    // loop body                      // loop body
    if (...) jump exit      -->       // more body
  more:                               if (...) jump exit
    // more body                      jump loop
    jump loop

llvm-svn: 297033
2017-03-06 17:24:04 +00:00
Krzysztof Parzyszek e16ce15687 [Hexagon] Mark dead defs as <dead> in expand-condsets
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.

llvm-svn: 297032
2017-03-06 17:09:06 +00:00
Krzysztof Parzyszek 143158b72e [Hexagon] Pick a dot-old instruction that matches the architecture
llvm-svn: 297031
2017-03-06 17:03:16 +00:00
Nemanja Ivanovic 12e67d868a [PowerPC] Fix failure with STBRX when store is narrower than the bswap
Fixes a crash caused by r296811 by truncating the input of the STBRX node
when the bswap is wider than i32.

Fixes https://bugs.llvm.org/show_bug.cgi?id=32140

Differential Revision: https://reviews.llvm.org/D30615

llvm-svn: 297001
2017-03-06 07:32:13 +00:00
Benjamin Kramer bb635e034c [X86] Silence GCC enum compare warning.
X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional
expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'
[-Werror=enum-compare]

llvm-svn: 296986
2017-03-05 12:53:20 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Sanjay Patel b974be5ef4 [x86] don't require a zext when forming ADC/SBB
The larger goal is to move the ADC/SBB transforms currently in 
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're 
creating ADC/SBB in lots of places where we shouldn't.

This was intended to be an NFC change, but avx-512 has something 
strange going on. It doesn't seem like any of the affected tests 
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...

llvm-svn: 296978
2017-03-04 20:35:19 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Simon Pilgrim 40a0e66b37 [X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks.

Differential Revision: https://reviews.llvm.org/D30213

llvm-svn: 296966
2017-03-04 12:50:47 +00:00
Matthias Braun 21f340fd25 X86ISelLowering: Only perform copy elision on legal types.
This fixes cases where i1 types were not properly legalized yet and lead
to the creating of 0-sized stack slots.

This fixes http://llvm.org/PR32136

llvm-svn: 296950
2017-03-04 01:40:40 +00:00
Sanjay Patel a84fd041c6 [x86] check for commuted add pattern to find ADC/SBB
llvm-svn: 296933
2017-03-04 00:18:31 +00:00
Tim Northover 3e6a7afd81 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

llvm-svn: 296923
2017-03-03 23:05:47 +00:00
Sanjay Patel 7ee83b41e0 [x86] refactor combineAddOrSubToADCOrSBB(); NFCI
The comments were wrong, and this is not an obvious transform.
This hopefully makes it clearer that we're missing the commuted
patterns for adds. It's less clear that this is actually a good
transform for all micro-arch.

This is prep work for trying to clean up the current adc/sbb 
codegen because it's definitely not happening optimally.

llvm-svn: 296918
2017-03-03 22:35:11 +00:00
Krzysztof Parzyszek cc31871dc4 Make TargetInstrInfo::isPredicable take a const reference, NFC
llvm-svn: 296901
2017-03-03 18:30:54 +00:00
Sanjay Patel 58e241896d [x86] clean up materializeSBB(); NFCI
This is producing SBB where it is obviously not necessary, so it needs to be limited.

llvm-svn: 296894
2017-03-03 17:58:39 +00:00
Sanjay Patel e8674825fe [x86] fix formatting; NFC
llvm-svn: 296875
2017-03-03 15:17:41 +00:00
Simon Pilgrim c37a32d2b9 Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask creation
llvm-svn: 296874
2017-03-03 14:37:57 +00:00
Dmitry Preobrazhensky 03880f8d24 [AMDGPU][MC] Fix for Bug 30829 + LIT tests
Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction).
Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code).
Added LIT tests.

llvm-svn: 296873
2017-03-03 14:31:06 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Amjad Aboud 4f97751798 [X86] Generate VZEROUPPER for Skylake-avx512.
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.

Differential Revision: https://reviews.llvm.org/D29874

llvm-svn: 296859
2017-03-03 09:03:24 +00:00
Sjoerd Meijer 69bccf96bd [AArch64AsmParser] rewrite of function parseSysAlias
This is a cleanup/rewrite of the parseSysAlias function. It was not using the
tablegen instruction descriptions, but was “manually” matching the mnemonics
and recreating the operands whereas all this information is already in
tablegen; all this code has been replaced with calls to lookupXYZByName
tablegen calls.

Differential Revision: https://reviews.llvm.org/D30491

llvm-svn: 296857
2017-03-03 08:12:47 +00:00
Igor Breger 321cf3c650 [GlobalISel][X86] Support float/double and vector types.
Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector.

Reviewers: delena, zvi

Reviewed By: zvi

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30533

llvm-svn: 296856
2017-03-03 08:06:46 +00:00
Matt Arsenault 31a58c6ac0 AMDGPU: Fix missing dominator tree dependency
llvm-svn: 296842
2017-03-02 23:50:51 +00:00
Krzysztof Parzyszek e720feb1c6 [Hexagon] Pick the right branch opcode depending on branch probabilities
Specifically, pick the opcode with the correct branch prediction, i.e.
jump:t or jump:nt.

llvm-svn: 296821
2017-03-02 21:49:49 +00:00
Eli Friedman bb821276d0 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.

This patch fixes some cases where we would move stores to the wrong
insert point.

Re-commit with a fix to increment NumMove in the right place.

Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296815
2017-03-02 21:39:39 +00:00
Guozhi Wei ed28e742ee [PPC] Fix code generation for bswap(int32) followed by store16
This patch fixes pr32063.

Current code in PPCTargetLowering::PerformDAGCombine can transform

bswap
store

into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications,

1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().

2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.

Differential Revision: https://reviews.llvm.org/D30362

llvm-svn: 296811
2017-03-02 21:07:59 +00:00
Chad Rosier ea25eca04a [AArch64] Extend redundant copy elimination pass to handle non-zero stores.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle non-zero cases such as:

BB#0:
  cmp x0, #1
  b.eq .LBB0_1
.LBB0_1:
  orr x0, xzr, #0x1  ; <-- redundant copy; x0 known to hold #1.

Differential Revision: https://reviews.llvm.org/D29344

llvm-svn: 296809
2017-03-02 20:48:11 +00:00
Vadzim Dambrouski eafb805506 [MSP430] Add SRet support to MSP430 target
This patch adds support for struct return values to the MSP430
target backend. It also reverses the order of argument and return
registers in the calling convention to bring it into closer
alignment with the published EABI from TI.

Patch by Andrew Wygle (awygle).

Differential Revision: https://reviews.llvm.org/D29069

llvm-svn: 296807
2017-03-02 20:25:10 +00:00
Artem Belevich ee7dd12ff4 [NVPTX] Reduce amount of boilerplate code used to select load instruction opcode.
Make opcode selection code for the load instruction a bit easier
to read and maintain.

This patch also catches number of f16 load/store variants that were
not handled before.

Differential Revision: https://reviews.llvm.org/D30513

llvm-svn: 296785
2017-03-02 19:14:14 +00:00
Artem Belevich 5920babc4f [NVPTX] Added missing LDU/LDG intrinsics for f16.
Differential Revision: https://reviews.llvm.org/D30512

llvm-svn: 296784
2017-03-02 19:14:10 +00:00
Simon Pilgrim b3067dc374 [X86][MMX] Fixed i32 extraction on 32-bit targets
MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal

llvm-svn: 296782
2017-03-02 18:56:06 +00:00
Krzysztof Parzyszek 056c945a5d [Hexagon] Skip blocks that define vector predicate registers in early-if
llvm-svn: 296777
2017-03-02 18:10:59 +00:00
Krzysztof Parzyszek fcbb7d10fe [Hexagon] Properly handle 'q' constraint in 128-byte vector mode
llvm-svn: 296772
2017-03-02 17:50:24 +00:00
Nemanja Ivanovic db8425eff0 [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29881

llvm-svn: 296771
2017-03-02 17:38:59 +00:00
Tim Northover e80d6d1360 GlobalISel: record correct stack usage for signext parameters.
The CallingConv.td rules allocate 8 bytes for these kinds of arguments
on AAPCS targets, but we were only recording the smaller amount. The
difference is theoretical on AArch64 because we don't actually store
more than the smaller amount, but it's still much better to have these
two components in agreement.

Based on Diana Picus's ARM equivalent patch (where it matters a lot
more).

llvm-svn: 296754
2017-03-02 15:34:18 +00:00
Matthew Simpson aee9771ae2 [ARM/AArch64] Update costs for interleaved accesses with wide types
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.

Differential Revision: https://reviews.llvm.org/D29675

llvm-svn: 296751
2017-03-02 15:15:35 +00:00
Matthew Simpson 1bfa159db9 [ARM/AArch64] Support wide interleaved accesses
This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types
to interleaved access intrinsics as long as the types are multiples of the
vector register width. A "wide" access will now be mapped to multiple
interleave intrinsics similar to the way in which non-interleaved accesses with
illegal types are legalized into multiple accesses. I'll update the associated
TTI costs (in getInterleavedMemoryOpCost) as a follow-on.

Differential Revision: https://reviews.llvm.org/D29466

llvm-svn: 296750
2017-03-02 15:11:20 +00:00
Eli Friedman 933863ce61 Revert r296708; causing test failures on ARM hosts.
Original commit message:

[ARM] Fix insert point for store rescheduling.
    
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.

llvm-svn: 296718
2017-03-02 00:08:50 +00:00
Ahmed Bougacha 120ae22d70 [GlobalISel] Add a way for targets to enable GISel.
Until now, we've had to use -global-isel to enable GISel.  But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.

This first step adds an override for the target to explicitly define its
level of support.  For AArch64, do that using
a new command-line option (I know..):
  -aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.

Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!

While there, remove a couple LLVM_UNLIKELYs.  Building the pipeline is
such a cold path that in practice that shouldn't matter at all.

llvm-svn: 296710
2017-03-01 23:33:08 +00:00
Eli Friedman 1c9216b003 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.
    
Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296708
2017-03-01 23:20:29 +00:00
Eli Friedman 28c2c0e311 [ARM] Check correct instructions for load/store rescheduling.
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
    
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
    
Differential Revision: https://reviews.llvm.org/D30368

llvm-svn: 296701
2017-03-01 22:56:20 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Krzysztof Parzyszek 8144f37dd8 [RDF] Replace {} with explicit constructor, since not all compilers like it
llvm-svn: 296666
2017-03-01 19:59:28 +00:00
Krzysztof Parzyszek ebabd99adb [RDF] Add recursion limit to getAllReachingDefsRec
For large programs this function can take significant amounts of time.
Let it abort gracefully when the program is too complex.

llvm-svn: 296662
2017-03-01 19:30:42 +00:00
Krzysztof Parzyszek 8f23dd6d68 [Hexagon] Fix lowering of formal arguments of type i1
On Hexagon, values of type i1 are passed in registers of type i32,
even though i1 is not a legal value for these registers. This is a
special case and needs special handling to maintain consistency of
the lowering information.

This fixes PR32089.

llvm-svn: 296645
2017-03-01 17:30:10 +00:00
Diana Picus 3841522259 clang-format r296631
Apparently I forgot to run it after fixing up some things...

llvm-svn: 296634
2017-03-01 15:54:21 +00:00
Diana Picus 9c52309b37 [ARM] GlobalISel: Lower call params that need extensions
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.

The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.

On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended paramteres.

llvm-svn: 296631
2017-03-01 15:35:14 +00:00
Oliver Stannard 5d35b9e56c [ARM] Fix parsing of special register masks
This parsing code was incorrectly checking for invalid characters, so an
invalid instruction like:
  msr spsr_w, r0
would be emitted as:
  msr spsr_cxsf, r0

Differential revision: https://reviews.llvm.org/D30462

llvm-svn: 296607
2017-03-01 10:51:04 +00:00
Ayman Musa 9b802e4650 [X86] Fix creating vreg def after use.
llvm-svn: 296601
2017-03-01 10:20:48 +00:00
Dan Gohman 7d7409e553 [WebAssembly] Convert the remaining unit tests to the new wasm-object-file target.
To facilitate this, add a new hidden command-line option to disable
the explicit-locals pass. That causes llc to emit invalid code that doesn't
have all locals converted to get_local/set_local, however it simplifies
testwriting in many cases.

llvm-svn: 296540
2017-02-28 23:37:04 +00:00
Eli Friedman 36795239f5 [ARM] Don't generate deprecated T1 STM.
This prevents generating stm r1!, {r0, r1} on Thumb1, where value
stored for r1 is UNKONWN.

Patch by Zhaoshi Zheng.

Differential Revision: https://reviews.llvm.org/D27910

llvm-svn: 296538
2017-02-28 23:32:55 +00:00
Krzysztof Parzyszek 33fd0bbbe8 [Hexagon] Generate extract instructions more aggressively
llvm-svn: 296537
2017-02-28 23:27:33 +00:00
Krzysztof Parzyszek f208681731 [Hexagon] Fix instruction selection for sign-extending i1 to i64
llvm-svn: 296532
2017-02-28 22:37:01 +00:00
Matt Arsenault 8f016df1ed AMDGPU: Fix types for VOP_I16_I16_I16
llvm-svn: 296523
2017-02-28 21:31:45 +00:00
Matt Arsenault 4d263f6f18 AMDGPU: Add definition for v_swap_b32
This is somewhat tricky because there are two
pairs of tied operands, and it isn't allowed to be
VOP3 encoded.

llvm-svn: 296519
2017-02-28 21:09:04 +00:00
Matt Arsenault 03612631cb AMDGPU: Add definition for v_xad_u32
llvm-svn: 296515
2017-02-28 20:27:30 +00:00
Matt Arsenault 781249833b AMDGPU: Add ds_nop to assembler
llvm-svn: 296513
2017-02-28 20:15:46 +00:00
Matt Arsenault dedc544ac7 AMDGPU: Add definitions for ds_{read|write}_b{96|128}
It's not clear to me if this is always better than
doing ds_write2_b64 This adds the constraint of
a 128-bit register input instead of a pair of
64-bit.

llvm-svn: 296512
2017-02-28 20:15:43 +00:00
Stanislav Mekhanoshin 357d3db0a4 [AMDGPU] Add second pass of the scheduler
If during scheduling we have identified that we cannot keep optimistic
occupancy increase critical register pressure limit and try scheduling
of the whole function again. In this case blocks with smaller pressure
will have a chance for better scheduling.

Differential Revision: https://reviews.llvm.org/D30442

llvm-svn: 296506
2017-02-28 19:20:33 +00:00
Stanislav Mekhanoshin 282e8e4a72 [AMDGPU] New method to estimate register pressure
This change introduces new method to estimate register pressure in
GCNScheduler. Standard RPTracker gives huge error due to the following
reasons:

1. It does not account for live-ins or live-outs if value is not used
in the region itself. That creates a huge error in a very common case
if there are a lot of live-thu registers.
2. It does not properly count subregs.
3. It assumes a register used as an input operand can be reused as an
output. This is not always possible by itself, this is not what RA
will finally do in many cases for various reasons not limited to RA's
inability to do so, and this is not so if the value is actually a
live-thu.

In addition we can now see clear separation between live-in pressure
which we cannot change with the scheduling and tentative pressure
which we can change.

Differential Revision: https://reviews.llvm.org/D30439

llvm-svn: 296491
2017-02-28 17:22:39 +00:00
Konstantin Zhuravlyov 182e9cc6d5 [AMDGPU] Change amd_kernel_code_t's minor version to 1
- We do emit amd_kernel_code_t v1.1

Differential Revision: https://reviews.llvm.org/D30433

llvm-svn: 296489
2017-02-28 17:17:52 +00:00
Stanislav Mekhanoshin 080889cad7 [AMDGPU] Fix read-undef flags when schedule is reverted
If two subregs of the same register are defined and we need to revert
schedule changing def order, we will end up with both instructions
having def,read-undef flags because adjustLaneLiveness() will only set
this flag but will not remove it.

Fix this by removing read-undef flags before calling adjustLaneLiveness.

Differential Revision: https://reviews.llvm.org/D30428

llvm-svn: 296484
2017-02-28 16:26:27 +00:00
Simon Dardis e3cceed3b4 [mips] Fix 64bit slt/sltu/nor with immediates
Patch By: Alexander Richardson

Reviewers: atanasyan, theraven, sdardis

Differential Revision: https://reviews.llvm.org/D30330

llvm-svn: 296482
2017-02-28 15:55:23 +00:00
Daniel Sanders 983c9b98e9 Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1.

llvm-svn: 296478
2017-02-28 15:00:27 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Daniel Sanders a5afdefec6 [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 296474
2017-02-28 14:21:31 +00:00
Diana Picus 1ffca2aeaf [ARM] GlobalISel: Lower i32 and fp call parameters on the stack
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.

We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.

llvm-svn: 296473
2017-02-28 14:17:53 +00:00
Diana Picus 5a7203a0af [ARM] GlobalISel: Select 32-bit G_CONSTANT
Put it into a register by means of a MOVi.

llvm-svn: 296471
2017-02-28 13:05:42 +00:00
Diana Picus 5b8514559e [ARM] GlobalISel: Add mapping for G_CONSTANT
Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register
operand.

llvm-svn: 296469
2017-02-28 12:13:58 +00:00
Diana Picus e6beac6742 [ARM] GlobalISel: Legalize 32-bit constants
llvm-svn: 296468
2017-02-28 11:33:46 +00:00
Diana Picus 9d07094913 [ARM] GlobalISel: Select G_GEP
At this point, G_GEP is just an add, so we treat it exactly like a G_ADD.

llvm-svn: 296462
2017-02-28 10:14:38 +00:00
Oliver Stannard 85d4d5b493 [ARM] Diagnose PC-writing instructions in IT blocks
In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in
an IT block but not the last instruction in the block.

Previously, we only diagnosed this for LDM instructions, this patch extends the
diagnostic to cover all of the relevant instructions.

Differential Revision: https://reviews.llvm.org/D30398

llvm-svn: 296459
2017-02-28 10:04:36 +00:00
Diana Picus 566a15d749 [ARM] GlobalISel: Add reg bank mapping for G_GEP
This should be the same as the mapping for G_ADD etc.

llvm-svn: 296455
2017-02-28 09:35:10 +00:00
Diana Picus 8598b17076 [ARM] GlobalISel: Legalize G_GEP with 32-bit offsets
At the moment we're only interested in GEPs for putting call parameters on the
stack, so we'll stick to 32-bit offsets.

llvm-svn: 296452
2017-02-28 09:02:42 +00:00
Vadzim Dambrouski cc33fc8722 Test commit, fix typo, NFC.
llvm-svn: 296447
2017-02-28 08:27:43 +00:00
Matthias Braun 81f68ec3a9 Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

llvm-svn: 296426
2017-02-28 02:24:30 +00:00
Matthias Braun d36410945f Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

llvm-svn: 296418
2017-02-28 00:33:32 +00:00
Dan Gohman d37dc2f773 [WebAssembly] Add some comments and tidy up whitespace.
llvm-svn: 296402
2017-02-27 22:41:39 +00:00
Matt Arsenault 10268f93e8 AMDGPU: Use v_med3_{f16|i16|u16}
llvm-svn: 296401
2017-02-27 22:40:39 +00:00
Dan Gohman f52ee17a09 [WebAssembly] Split CFG-sorting into its own pass. NFC.
CFG sorting was already an independent algorithm from block/loop insertion;
this change makes it more convenient to debug.

llvm-svn: 296399
2017-02-27 22:38:58 +00:00
Matt Arsenault eb522e68bc AMDGPU: Support v2i16/v2f16 packed operations
llvm-svn: 296396
2017-02-27 22:15:25 +00:00
Sanjay Patel ae7873fe55 [ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition
The transform in question claims to be doing:

// fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))

...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node
for the sext/zext patterns.

This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(),
so I was seeing infinite loops with my draft of a patch applied.

The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll
is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's 
not affected by this code change.

Differential Revision:
https://reviews.llvm.org/D30355

llvm-svn: 296389
2017-02-27 21:30:54 +00:00
Matt Arsenault c9f2517e96 AMDGPU: Add some of the new gfx9 VOP3 instructions
llvm-svn: 296382
2017-02-27 21:04:41 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Matt Arsenault 7596f13d15 AMDGPU: Support inlineasm for packed instructions
Add packed types as legal so they may be used with inlineasm.
Keep all operations expanded for now.

llvm-svn: 296379
2017-02-27 20:52:10 +00:00
Matt Arsenault 2ed2193218 AMDGPU: Don't fold immediate if clamp/omod are set
Doesn't fix any practical problems because clamp/omod
are currently folded after peephole optimizer.

llvm-svn: 296375
2017-02-27 20:21:31 +00:00
Matt Arsenault 3cb390498e AMDGPU: Fold omod into instructions
llvm-svn: 296372
2017-02-27 19:35:42 +00:00
Matt Arsenault e2d1d3a940 AMDGPU: Add f16 to shader calling conventions
Mostly useful for writing tests for f16 features.

llvm-svn: 296370
2017-02-27 19:24:47 +00:00
Matt Arsenault 9be7b0d485 AMDGPU: Add VOP3P instruction format
Add a few non-VOP3P but instructions related to packed.

Includes hack with dummy operands for the benefit of the assembler

llvm-svn: 296368
2017-02-27 18:49:11 +00:00
Krzysztof Parzyszek e9be35596e [Hexagon] Defs and clobbers can overlap
llvm-svn: 296365
2017-02-27 18:03:35 +00:00
Craig Topper 7502119ce8 [X86] Use APInt instead of SmallBitVector tracking undef elements from getTargetConstantBitsFromNode and getConstVector.
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30392

llvm-svn: 296355
2017-02-27 16:15:32 +00:00
Craig Topper 3917ca2af4 [X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30390

llvm-svn: 296354
2017-02-27 16:15:30 +00:00
Craig Topper e1be95c3d0 [X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation
Some of the vectors are under sized to avoid heap allocation. In one case the vector was oversized.

Differential Revision: https://reviews.llvm.org/D30387

llvm-svn: 296353
2017-02-27 16:15:27 +00:00
Craig Topper 53e5a38da9 [X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30386

llvm-svn: 296352
2017-02-27 16:15:25 +00:00
Sjoerd Meijer 32ecac7ac8 AArch64InstPrinter: rewrite of printSysAlias
This is a cleanup/rewrite of the printSysAlias function. This was not using the
tablegen instruction descriptions, but was "manually" decoding the
instructions. This has been replaced with calls to lookup_XYZ_ByEncoding
tablegen calls.

This revealed several problems. First, instruction IVAU had the wrong encoding.
This was cancelled out by the parser that incorrectly matched the wrong
encoding. Second, instruction CVAP was missing from the SystemOperands tablegen
descriptions, so this has been added. And third, the required target features
were not captured in the tablegen descriptions, so support for this has also
been added.

Differential Revision: https://reviews.llvm.org/D30329

llvm-svn: 296343
2017-02-27 14:45:34 +00:00
John Brawn c97b714ffb [ARM] LSL #0 is an alias of MOV
Currently we handle this correctly in arm, but in thumb we don't which leads to
an unpredictable instruction being emitted for LSL #0 in an IT block and SP not
being permitted in some cases when it should be.

For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the
.td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to
get the IT handling right. We also need to adjust the handling of
MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We
should also adjust it to allow SP in the same way that it is allowed in
MOV rd, rn, but I haven't done that here because it looks like it would take
quite a lot of work to get right.

Additionally correct the selection of the 16-bit shift instructions in
processInstruction, where it was checking if the two registers were equal when
it should have been checking if they were low. It appears that previously this
code was never executed and the 16-bit encoding was selected by default, but
the other changes I've done here have somehow made it start being used.

Differential Revision: https://reviews.llvm.org/D30294

llvm-svn: 296342
2017-02-27 14:40:51 +00:00
Sjoerd Meijer 6d171006f4 AArch64AsmParser: don't try to parse “[1]” for non-vector register operands
There are no instructions that have "[1]" as part of the assembly string;
FMOVXDhighr is out of date. This removes dead code.

Differential Revision: https://reviews.llvm.org/D30165

llvm-svn: 296327
2017-02-27 10:51:11 +00:00
Konstantin Zhuravlyov 972948b36e [AMDGPU] Runtime metadata fixes:
- Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it:
    .amdgpu_runtime_metadata
    { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ...
  - Make IsaInfo optional, and always emit it.

Differential Revision: https://reviews.llvm.org/D30349

llvm-svn: 296324
2017-02-27 07:55:17 +00:00
Craig Topper ed0101a0b9 [X86] Check for less than 0 rather than explicit compare with -1. NFC
llvm-svn: 296321
2017-02-27 06:05:30 +00:00
Craig Topper 6028584d8c [X86] Fix execution domain for cmpss/sd instructions.
llvm-svn: 296293
2017-02-26 06:45:59 +00:00
Craig Topper 036693302b [AVX-512] Fix execution domain for scalar commutable min/max instructions.
llvm-svn: 296292
2017-02-26 06:45:56 +00:00
Craig Topper e70231be51 [AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps.
llvm-svn: 296291
2017-02-26 06:45:54 +00:00
Craig Topper fe25988c68 [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.
llvm-svn: 296290
2017-02-26 06:45:51 +00:00
Craig Topper 49ba3f5406 [AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and VPBROADCASTWr_Alt instructions. NFC
llvm-svn: 296289
2017-02-26 06:45:48 +00:00
Craig Topper 6bf9b809ce [AVX-512] Fix execution domain for VPMADD52 instructions.
llvm-svn: 296288
2017-02-26 06:45:45 +00:00
Craig Topper aa8e903150 [AVX-512] Fix the execution domain for VSCALEF instructions.
llvm-svn: 296286
2017-02-26 06:45:40 +00:00
Craig Topper cac5d698df [AVX-512] Fix execution domain of scalar VRANGE/REDUCE/GETMANT with sae.
llvm-svn: 296285
2017-02-26 06:45:37 +00:00
Craig Topper ed64904c74 [X86] Fix the execution domain for scalar SQRT intrinsic instruction.
llvm-svn: 296284
2017-02-26 06:45:35 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Eric Christopher 4a8208c266 vec perm can go down either pipeline on P8.
No observable changes, spotted while looking at the scheduling description.

llvm-svn: 296277
2017-02-26 00:11:58 +00:00
Simon Pilgrim 0f5fb5f549 [APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied)
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296272
2017-02-25 20:01:58 +00:00
Craig Topper 2caa97c891 [AVX-512] Fix the execution domain for scalar FMA instructions.
llvm-svn: 296271
2017-02-25 19:36:28 +00:00
Craig Topper 176f3310b6 [AVX-512] Fix the execution domain on some instructions.
llvm-svn: 296270
2017-02-25 19:18:11 +00:00
Craig Topper d2011e3612 [AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS using the scalar register class. We only have patterns for the masked intrinsics.
llvm-svn: 296264
2017-02-25 18:43:42 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Junmo Park 7ff4c045eb Minor code cleanup. NFC.
llvm-svn: 296207
2017-02-25 00:08:53 +00:00
Dan Gohman 82607f56bd [WebAssembly] Add support for using a wasm global for the stack pointer.
This replaces the __stack_pointer variable which was allocated in linear
memory.

llvm-svn: 296201
2017-02-24 23:46:05 +00:00
Krzysztof Parzyszek 0d67b10a3c [Hexagon] Undo shift folding where it could simplify addressing mode
For example, avoid (single shift):
  r0 = and(##536870908,lsr(r0,#3))
  r0 = memw(r1+r0<<#0)

in favor of (two shifts):
  r0 = lsr(r0,#5)
  r0 = memw(r1+r0<<#2)

llvm-svn: 296196
2017-02-24 23:34:24 +00:00
Dan Gohman d934cb8806 [WebAssembly] Basic support for Wasm object file encoding.
With the "wasm32-unknown-unknown-wasm" triple, this allows writing out
simple wasm object files, and is another step in a larger series toward
migrating from ELF to general wasm object support. Note that this code
and the binary format itself is still experimental.

llvm-svn: 296190
2017-02-24 23:18:00 +00:00
Krzysztof Parzyszek be5028aed3 [Hexagon] Prettify code in HexagonDAGToDAGISel::Select
llvm-svn: 296187
2017-02-24 23:00:40 +00:00
Wei Ding 4d3d4ca1b3 AMDGPU : Replace FMAD with FMA when denormals are enabled.
Differential Revision: http://reviews.llvm.org/D29958

llvm-svn: 296186
2017-02-24 23:00:29 +00:00
Stanislav Mekhanoshin 42259cf35e Revert "Correct register pressure calculation in presence of subregs"
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.

llvm-svn: 296182
2017-02-24 21:56:16 +00:00
Dan Gohman 6999c4fd28 [WebAssembly] Handle f16 in fast-isel.
llvm-svn: 296172
2017-02-24 21:05:35 +00:00
Davide Italiano 74f27b80d4 [Target/MIPS] Kill dead code, no functional change intended.
Hopefully placates gcc with -Werror.

llvm-svn: 296153
2017-02-24 18:48:10 +00:00
Simon Pilgrim cdf2bd656a Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296147
2017-02-24 18:31:04 +00:00
Nemanja Ivanovic 195c5452d3 [PowerPC] Use subfic instruction for subtract from immediate
Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate.
The corresponding pattern already exists for 32-bit integers.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29387

llvm-svn: 296144
2017-02-24 18:16:06 +00:00
Nemanja Ivanovic 82d53ed492 [PowerPC] Use rldicr instruction for AND with an immediate if possible
Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that
clear bits from the right hand size.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29388

llvm-svn: 296143
2017-02-24 18:03:16 +00:00
Simon Pilgrim bd9fb2ae95 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296141
2017-02-24 17:46:18 +00:00
Simon Dardis ae6f2bcb25 Recommit "[mips] Fix atomic compare and swap at O0."
This time with the missing files.

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296134
2017-02-24 16:32:18 +00:00
Simon Dardis 3c58c18ff0 Revert "[mips] Fix atomic compare and swap at O0."
This reverts r296132. I forgot to include the tests.

llvm-svn: 296133
2017-02-24 16:30:27 +00:00
Simon Dardis cf0e06d375 [mips] Fix atomic compare and swap at O0.
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296132
2017-02-24 16:27:45 +00:00
Daniel Sanders 066ebbfd46 [globalisel] Decouple src pattern operands from dst pattern operands.
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.

The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.

The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
  %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
  %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.

Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.

Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.

Depends on D29711

Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar

Reviewed By: rovka

Subscribers: aemerson, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29712

llvm-svn: 296131
2017-02-24 15:43:30 +00:00
Simon Pilgrim 7f6a7c97a7 [X86][SSE] Target shuffle combine can try to combine up to 16 vectors
Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth).

llvm-svn: 296130
2017-02-24 15:35:52 +00:00
Sanjay Patel 9f0fa52aa2 [x86] use DAG.getAllOnesConstant(); NFCI
llvm-svn: 296128
2017-02-24 15:09:59 +00:00
Simon Dardis aa20881749 [mips] Handle 64 bit immediate in and/or/xor pseudo instructions on mips64
Previously LLVM was assuming 32-bit signed immediates which results in and with
a bitmask that has bit 31 set to incorrectly include bits 63-32 in the result.
After applying this patch I can now compile all of the FreeBSD mips assembly
code with clang.

This issue also affects the nor, slt and sltu macros and I will fix those in a
separate review.

Patch By: Alexander Richardson

Commit message reformatted by sdardis.

Reviewers: atanasyan, theraven, sdardis

Differential Revision: https://reviews.llvm.org/D30298

llvm-svn: 296125
2017-02-24 14:34:32 +00:00
Diana Picus 3b99c64ba1 [ARM] GlobalISel: Select G_STORE
Same as selecting G_LOAD.

llvm-svn: 296122
2017-02-24 14:01:27 +00:00
Diana Picus 1f432f995a [ARM] GlobalISel: Add reg bank mappings for stores
Same as the ones for loads.

llvm-svn: 296115
2017-02-24 13:07:25 +00:00
Diana Picus a2b632a353 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296108
2017-02-24 11:28:24 +00:00
Simon Dardis a5f52dc00d [mips][mc] Fix a crash when disassembling odd sized sections
Make the MIPS disassembler consistent with the other targets in returning
a Size of zero when the input buffer cannot contain an instruction due
to it's size. Previously it reported the minimum instruction size when
it failed due to the buffer not being big enough for an instruction
causing llvm-objdump to crash when disassembling all sections.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D29984

llvm-svn: 296105
2017-02-24 10:50:27 +00:00
Diana Picus c21d1e5d94 Revert "[ARM] GlobalISel: Legalize stores"
This reverts commit r296103 because the test broke on one of the bots. Sorry!

llvm-svn: 296104
2017-02-24 10:35:39 +00:00
Diana Picus a5f1cfd1a7 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296103
2017-02-24 10:19:23 +00:00
Simon Pilgrim aed352273e [APInt] Add APInt::setBits() method to set all bits in range
The current pattern for setting bits in range is typically:

Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable.

This is one of the key compile time issues identified in PR32037.

This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible.

I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial.

Differential Revision: https://reviews.llvm.org/D30265

llvm-svn: 296102
2017-02-24 10:15:29 +00:00
Dan Gohman 411dc07aba [WebAssembly] Add a README.txt entry for mergeable sections.
llvm-svn: 296095
2017-02-24 07:33:55 +00:00
Craig Topper 8783bbb598 [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC
llvm-svn: 296094
2017-02-24 07:21:10 +00:00
Craig Topper f2529c188b [AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select.
Clang has been emitting cltz intrinsics for a while now.

llvm-svn: 296091
2017-02-24 05:35:04 +00:00
Petr Hosek a7d5916308 [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack
The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D30237

llvm-svn: 296081
2017-02-24 03:10:10 +00:00
Artem Belevich 620db1f3dd [NVPTX] Added support for .f16x2 instructions.
This patch enables support for .f16x2 operations.

Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.

Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310

llvm-svn: 296032
2017-02-23 22:38:24 +00:00
Tim Northover 063a56e81c ARM: make sure FastISel bails on f64 operations for Cortex-M4.
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.

The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.

llvm-svn: 296031
2017-02-23 22:35:00 +00:00
Krzysztof Parzyszek 128e191eac [Hexagon] Handle saturations in Hexagon bit tracker
llvm-svn: 296026
2017-02-23 22:11:52 +00:00
Krzysztof Parzyszek 998e49e5c8 [Hexagon] Allow setting register in BitVal without storing into map
In the bit tracker, references to other bit values in which the register
is 0 are prohibited. This means that generating self-referential register
cells like { w:32 [0-15]:s[0-15] [16-31]:s[15] } is impossible. In order
to get a self-referential cell, it had to be stored into a map and then
reloaded from it. To avoid this step, add a function that will set the
register to a given value without going through the map.

llvm-svn: 296025
2017-02-23 22:08:50 +00:00
Stanislav Mekhanoshin 78468e48cf [AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC.
Clang issues warning about hidden overload. That was intended, so
add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it.

llvm-svn: 296021
2017-02-23 21:51:28 +00:00
Evgeniy Stepanov ee2d77f6d6 Disable TLS for stack protector on Android API<17.
The TLS slot did not exist back then.

llvm-svn: 296014
2017-02-23 21:06:35 +00:00
Stanislav Mekhanoshin ce3ddd2de4 Correct register pressure calculation in presence of subregs
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.

Differential Revision: https://reviews.llvm.org/D29835

llvm-svn: 296009
2017-02-23 20:19:44 +00:00
Krzysztof Parzyszek 2cfc7a48de [Hexagon] Avoid IMPLICIT_DEFs as new-value producers
llvm-svn: 295997
2017-02-23 17:47:34 +00:00
Jan Vesely 70293a045b AMDGPU/SI: Fix trunc i16 pattern
Hit on ASICs that support 16bit instructions.

Differential Revision: https://reviews.llvm.org/D30281

llvm-svn: 295990
2017-02-23 16:12:21 +00:00
Krzysztof Parzyszek af5ff65d67 [Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE
llvm-svn: 295981
2017-02-23 15:02:09 +00:00
Diana Picus a8cb0cd8f2 [ARM] GlobalISel: Lower call returns
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).

llvm-svn: 295973
2017-02-23 14:18:41 +00:00
Diana Picus a606713c33 [ARM] GlobalISel: Lower call parameters in regs
Add support for lowering calls with parameters than can fit into regs.  Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.

llvm-svn: 295971
2017-02-23 13:25:43 +00:00
Ayman Musa 4b2c968c43 [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).

llvm-svn: 295970
2017-02-23 13:15:44 +00:00
Simon Dardis d410fc8f28 [mips][ias] Further relax operands of certain assembly instructions
This patch adjusts the most relaxed predicate of immediate operands to accept
immediate forms such as ~(0xf0000000|0x000f00000). Previously these forms
would be accepted by GAS and rejected by IAS.

This partially resolves PR/30383.

Thanks to Sean Bruno for reporting the issue!

Reviewers: slthakur, seanbruno

Differential Revision: https://reviews.llvm.org/D29218

llvm-svn: 295965
2017-02-23 12:40:58 +00:00
Kristof Beyls 5ac6adbb6d Fix assertion failure in ARMConstantIslandPass.
The ARMConstantIslandPass didn't have support for handling accesses to
constant island objects through ARM::t2LDRBpci instructions. This adds
support for that.

This fixes PR31997.

llvm-svn: 295964
2017-02-23 12:24:55 +00:00
Ayman Musa 524dbdaa2b [X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding.
(Quick fix to buildbot fail). 

llvm-svn: 295946
2017-02-23 08:13:36 +00:00
Ayman Musa 6e670cf44f [X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions.
AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors.

Differential Revision: https://reviews.llvm.org/D29988

llvm-svn: 295940
2017-02-23 07:24:21 +00:00
Matt Arsenault f0a88dbaab LoadStoreVectorizer: Split even sized illegal chains properly
Implement isLegalToVectorizeLoadChain for AMDGPU to avoid
producing private address spaces accesses that will need to be
split up later. This was doing the wrong thing in the case
where the queried chain was an even number of elements.

A possible <4 x i32> store was being split into
store <2 x i32>
store i32
store i32

rather than
store <2 x i32>
store <2 x i32>

when legal.

llvm-svn: 295933
2017-02-23 03:58:53 +00:00
Matt Arsenault a9e16e6597 AMDGPU: Add another BFE pattern
This is the pattern that falls out of the instruction's
definition if offset == 0.

llvm-svn: 295912
2017-02-23 00:23:43 +00:00
Matt Arsenault 79a45db7f5 AMDGPU: Use clamp with f64
llvm-svn: 295908
2017-02-22 23:53:37 +00:00
Matt Arsenault d5c6515b68 AMDGPU: Fold FP clamp as modifier bit
The manual is unclear on the details of this. It's not
clear to me if denormals are not allowed with clamp,
or if that is only omod. Not allowing denorms for
fp16 or fp64 isn't useful so I also question if that
is really a restriction. Same with whether this is valid
without IEEE mode enabled.

llvm-svn: 295905
2017-02-22 23:27:53 +00:00
Wei Ding f2cce02eb2 AMDGPU : Update TrapCode based on Trap Handler ABI.
Differential Revision: http://reviews.llvm.org/D30232

llvm-svn: 295904
2017-02-22 23:22:19 +00:00
Matt Arsenault f5262256a1 AMDGPU: Add replacement bfe intrinsics
llvm-svn: 295899
2017-02-22 23:04:58 +00:00
Krzysztof Parzyszek ab57c2bad3 [Hexagon] Implement @llvm.readcyclecounter()
llvm-svn: 295892
2017-02-22 22:28:47 +00:00
Matt Arsenault 7b6c5d28f5 AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR
This should avoid reporting any stack needs to be allocated in the
case where no stack is truly used. An unused stack slot is still
left around in other cases where there are real stack objects
but no spilling occurs.

llvm-svn: 295891
2017-02-22 22:23:32 +00:00
Krzysztof Parzyszek 3596a81c69 [RDF] Support for partial structural aliases in RegisterAggr
llvm-svn: 295883
2017-02-22 21:42:15 +00:00
Krzysztof Parzyszek 65971d97b0 [Hexagon] Add intrinsics for masked vector stores
Patch by Harsha Jagasia.

llvm-svn: 295879
2017-02-22 21:23:09 +00:00
Matt Arsenault 93e65ea733 AMDGPU: Don't look at chain users when adjusting writemask
Fixes not adjusting using new intrinsics with chains.

llvm-svn: 295878
2017-02-22 21:16:41 +00:00
Matt Arsenault 707780b420 AMDGPU: Always allocate emergency stack slot at offset 0
This allows us to ensure that 0 is never a valid pointer
to a user object, and ensures that the offset is always legal
without needing a register to access it. This comes at the cost
of usable offsets and wasted stack space.

llvm-svn: 295877
2017-02-22 21:05:25 +00:00