Commit Graph

41667 Commits

Author SHA1 Message Date
Sjoerd Meijer 7f1a982d3d [ARM] remove FIXMEs and add vcmp MC test
Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for
vcmp that was actually missing.

Differential Revision: https://reviews.llvm.org/D30745

llvm-svn: 297376
2017-03-09 13:28:37 +00:00
Simon Dardis 158956c6cc [mips] Fix return lowering
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.

This partially resolves PR/27458

Thanks to Quentin Colombet for reporting the issue!

llvm-svn: 297372
2017-03-09 11:19:48 +00:00
Changpeng Fang 1be9b9f816 AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized.
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D30719

llvm-svn: 297328
2017-03-09 00:07:00 +00:00
Krzysztof Parzyszek 1b7197e690 [Hexagon] Use correct offset when extracting from the high word
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).

llvm-svn: 297288
2017-03-08 15:46:28 +00:00
Daniel Cederman 9db582a656 [Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty()
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: davide, RKSimon, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D27089

llvm-svn: 297285
2017-03-08 15:23:10 +00:00
Simon Pilgrim 836bcc689f [X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 elements wide
By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage.

llvm-svn: 297267
2017-03-08 09:36:39 +00:00
Tim Shen c7472d912b Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size""
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/

Re-commit r296771 to continue testing on the patch.

Sorry for the trouble!

llvm-svn: 297256
2017-03-08 02:41:35 +00:00
Justin Lebar 1d1cf7ba5d [NVPTX] Remove unnecessary isImageReadoOnly(), isImageWriteOnly(), & isImageReadWrite calls
This is repetition of isImage() function in NVPTXUtilities.cpp.

Patch by Briana Grace!

Differential Revision: https://reviews.llvm.org/D30706

llvm-svn: 297252
2017-03-08 01:14:15 +00:00
Matt Arsenault 52d1b62a28 AMDGPU: Don't wait at end of block with a trivial successor
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.

llvm-svn: 297251
2017-03-08 01:06:58 +00:00
Matt Arsenault d8ed207a20 AMDGPU: Constant fold rcp node
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.

llvm-svn: 297248
2017-03-08 00:48:46 +00:00
Changpeng Fang 6b49fa4ca7 AMDGPU/SI: Do not insert EndCf in an unreachable block
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D22025

llvm-svn: 297243
2017-03-07 23:29:36 +00:00
Daniel Sanders 52b4ce727a Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297241
2017-03-07 23:20:35 +00:00
Krzysztof Parzyszek 434d50a796 [Hexagon] Check for presence before looking registers up in bit tracker
llvm-svn: 297240
2017-03-07 23:12:04 +00:00
Krzysztof Parzyszek 8e4d2e0512 [Hexagon] Generate bitsplit instruction
llvm-svn: 297239
2017-03-07 23:08:35 +00:00
Artem Belevich 2524a22562 [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.
Differential Revision: https://reviews.llvm.org/D30672

llvm-svn: 297198
2017-03-07 20:33:38 +00:00
Joel Jones 2852088126 [AArch64] Vulcan is now ThunderXT99
Broadcom Vulcan is now Cavium ThunderX2T99.

LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113

Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).

Patch was tested with SpecCPU2006.

Patch by Stefan Teleman

Differential Revision: https://reviews.llvm.org/D30510

llvm-svn: 297190
2017-03-07 19:42:40 +00:00
Daniel Sanders 8ebec37d26 Revert r297177: Change LLT constructor string into an LLT-based object ...
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.

Somehow, this change causes the build to need Attributes.gen before it's been
generated.

llvm-svn: 297188
2017-03-07 19:21:23 +00:00
Sanjoy Das c08a79fbf2 [X86] Add option to specify preferable loop alignment
Summary:
Loop alignment can cause a significant change of
the perfromance for short loops.
To be able to evaluate the impact of loop alignment this change
introduces the new option x86-experimental-pref-loop-alignment.
The alignment will be 2^Value bytes, the default value is 4.

Patch by Serguei Katkov!

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D30391

llvm-svn: 297178
2017-03-07 18:47:22 +00:00
Daniel Sanders 8612326a08 [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297177
2017-03-07 18:32:25 +00:00
John Brawn eba9fdac7e [ARM] Correct handling of LSL #0 in an IT block
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.

Differential Revision: https://reviews.llvm.org/D30692

llvm-svn: 297142
2017-03-07 14:42:03 +00:00
Krzysztof Parzyszek 3cceffb752 [Hexagon] Do not insert instructions before PHI nodes
llvm-svn: 297141
2017-03-07 14:20:19 +00:00
Ranjeet Singh 3d0af578cc [ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other""
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.

The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.

Differential Revision: https://reviews.llvm.org/D30645

llvm-svn: 297137
2017-03-07 11:17:53 +00:00
Jonas Paulsson 1d33cd3988 [SystemZ] Add check VT.isSimple() in canTreateAsByteVector()
Since BB-vectorizer can produce vectors of for example 3 elements,
this check is needed.

Review: Ulrich Weigand
llvm-svn: 297136
2017-03-07 09:49:31 +00:00
Artyom Skrobov 1388e2f792 In Thumb1, materialize a move between low registers as a `movs`, if CPSR isn't live.
Summary: Previously, it had always been materialized as a push/pop sequence.

Reviewers: labrinea, jroelofs

Reviewed By: jroelofs

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30648

llvm-svn: 297134
2017-03-07 09:38:16 +00:00
Ayman Musa 850fc977c8 [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables.
X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible.
It uses manually supported 2 large tables that map the EVEX instructions to their VEX ideticals.
This TableGen backend replaces the tables by automatically generating them.

Differential Revision: https://reviews.llvm.org/D30451

llvm-svn: 297127
2017-03-07 08:11:19 +00:00
Ayman Musa ac5a2c43af [X86][AVX512] Add missing entries to EVEX2VEX tables
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.

Differential Revision: https://reviews.llvm.org/D30501

llvm-svn: 297126
2017-03-07 08:05:53 +00:00
Tim Shen 70054bb827 Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"
This reverts commit r296771.

We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)

llvm-svn: 297124
2017-03-07 07:40:10 +00:00
Konstantin Zhuravlyov e8aaab8abe Revert "AMDGPU: Set MCAsmInfo::PointerSize"
It breaks line tables because the patch is not complete, working on a complete one at the moment

This reverts commit r294031.

llvm-svn: 297118
2017-03-07 04:44:33 +00:00
Tim Northover c2c545b8f7 GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

llvm-svn: 297100
2017-03-06 23:50:28 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Chad Rosier 9a70c7c02a [AArch64][Redundant Copy Elim] Add support for CMN and shifted imm.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as a shifted
immediates.

Differential Revision: https://reviews.llvm.org/D30576.

llvm-svn: 297078
2017-03-06 21:20:00 +00:00
Jan Vesely 3ea1704434 AMDGPU/R600: Fix ALU clause markers use detection
also exit early on kill instead of redefinition.

Differential Revision: https://reviews.llvm.org/D30230

llvm-svn: 297060
2017-03-06 20:10:05 +00:00
Reid Kleckner 812191584f [X86] Fix arg copy elision for illegal types
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.

Fixes PR32136 and re-enables copy elision from i64 arguments.

Reverts the workaround in from r296950.

llvm-svn: 297045
2017-03-06 18:39:39 +00:00
Krzysztof Parzyszek 8a4c601abc [Hexagon] Early-if-convert branches that may exit the loop
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.

For example:
  loop:                           loop:
    // loop body                      // loop body
    if (...) jump exit      -->       // more body
  more:                               if (...) jump exit
    // more body                      jump loop
    jump loop

llvm-svn: 297033
2017-03-06 17:24:04 +00:00
Krzysztof Parzyszek e16ce15687 [Hexagon] Mark dead defs as <dead> in expand-condsets
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.

llvm-svn: 297032
2017-03-06 17:09:06 +00:00
Krzysztof Parzyszek 143158b72e [Hexagon] Pick a dot-old instruction that matches the architecture
llvm-svn: 297031
2017-03-06 17:03:16 +00:00
Nemanja Ivanovic 12e67d868a [PowerPC] Fix failure with STBRX when store is narrower than the bswap
Fixes a crash caused by r296811 by truncating the input of the STBRX node
when the bswap is wider than i32.

Fixes https://bugs.llvm.org/show_bug.cgi?id=32140

Differential Revision: https://reviews.llvm.org/D30615

llvm-svn: 297001
2017-03-06 07:32:13 +00:00
Benjamin Kramer bb635e034c [X86] Silence GCC enum compare warning.
X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional
expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'
[-Werror=enum-compare]

llvm-svn: 296986
2017-03-05 12:53:20 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Sanjay Patel b974be5ef4 [x86] don't require a zext when forming ADC/SBB
The larger goal is to move the ADC/SBB transforms currently in 
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're 
creating ADC/SBB in lots of places where we shouldn't.

This was intended to be an NFC change, but avx-512 has something 
strange going on. It doesn't seem like any of the affected tests 
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...

llvm-svn: 296978
2017-03-04 20:35:19 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Simon Pilgrim 40a0e66b37 [X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks.

Differential Revision: https://reviews.llvm.org/D30213

llvm-svn: 296966
2017-03-04 12:50:47 +00:00
Matthias Braun 21f340fd25 X86ISelLowering: Only perform copy elision on legal types.
This fixes cases where i1 types were not properly legalized yet and lead
to the creating of 0-sized stack slots.

This fixes http://llvm.org/PR32136

llvm-svn: 296950
2017-03-04 01:40:40 +00:00
Sanjay Patel a84fd041c6 [x86] check for commuted add pattern to find ADC/SBB
llvm-svn: 296933
2017-03-04 00:18:31 +00:00
Tim Northover 3e6a7afd81 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

llvm-svn: 296923
2017-03-03 23:05:47 +00:00
Sanjay Patel 7ee83b41e0 [x86] refactor combineAddOrSubToADCOrSBB(); NFCI
The comments were wrong, and this is not an obvious transform.
This hopefully makes it clearer that we're missing the commuted
patterns for adds. It's less clear that this is actually a good
transform for all micro-arch.

This is prep work for trying to clean up the current adc/sbb 
codegen because it's definitely not happening optimally.

llvm-svn: 296918
2017-03-03 22:35:11 +00:00
Krzysztof Parzyszek cc31871dc4 Make TargetInstrInfo::isPredicable take a const reference, NFC
llvm-svn: 296901
2017-03-03 18:30:54 +00:00
Sanjay Patel 58e241896d [x86] clean up materializeSBB(); NFCI
This is producing SBB where it is obviously not necessary, so it needs to be limited.

llvm-svn: 296894
2017-03-03 17:58:39 +00:00
Sanjay Patel e8674825fe [x86] fix formatting; NFC
llvm-svn: 296875
2017-03-03 15:17:41 +00:00
Simon Pilgrim c37a32d2b9 Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask creation
llvm-svn: 296874
2017-03-03 14:37:57 +00:00