Commit Graph

43055 Commits

Author SHA1 Message Date
Jun Bum Lim dee5565869 [CodeGenPrep] move aarch64-type-promotion to CGP
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.

Reviewers: jmolloy, mcrosier, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: llvm-commits, aemerson, rengolin, mcrosier

Differential Revision: https://reviews.llvm.org/D28680

llvm-svn: 299379
2017-04-03 19:20:07 +00:00
Matt Arsenault 754dd3eaef AMDGPU: Remove legacy bfe intrinsics
llvm-svn: 299372
2017-04-03 18:08:08 +00:00
Krzysztof Parzyszek 44173f7d02 [Hexagon] Factor out some common code in HexagonEarlyIfConv.cpp, NFC
llvm-svn: 299367
2017-04-03 17:26:40 +00:00
Craig Topper d33ee1b960 [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565

llvm-svn: 299362
2017-04-03 16:34:59 +00:00
Sjoerd Meijer 1179470ff8 ARMAsmParser: clean up of isImmediate functions
- we are now using immediate AsmOperands so that the range check functions are
  tablegen'ed.
- Big bonus is that error messages become much more accurate, i.e. instead of a
  useless "invalid operand" error message it will not say that the immediate
  operand must in range [x,y], which is why regression tests needed updating.

More tablegen operand descriptions could probably benefit from using
immediateAsmOperand, but this is a first good step to get rid of most of the
nearly identical range check functions. I will address the remaining immediate
operands in next clean ups.

Differential Revision: https://reviews.llvm.org/D31333

llvm-svn: 299358
2017-04-03 14:50:04 +00:00
Simon Pilgrim 0e2f8cd875 [X86][MMX] Improve support for folding fptosi from XMM to MMX
llvm-svn: 299338
2017-04-02 17:45:41 +00:00
Simon Pilgrim ba28263b03 [X86][MMX] Simplify tablegen patterns by always combining MOVDQ2Q from v2i64
llvm-svn: 299336
2017-04-02 16:20:34 +00:00
Simon Pilgrim e56a2d7b4c [X86][MMX] Added support for subvector extraction to MMX register
llvm-svn: 299335
2017-04-02 15:52:28 +00:00
Davide Italiano c88169e61b [AMDGPU] Garbage collect now unused dead code. NFCI.
llvm-svn: 299310
2017-04-01 19:30:17 +00:00
Quentin Colombet 49d70d0529 Revert "Feature generic option to setup start/stop-after/before"
This reverts commit r299282.

Didn't intend to commit this :(

llvm-svn: 299288
2017-04-01 01:26:24 +00:00
Quentin Colombet 35a47010b1 Revert "Instrument SDISel C++ patterns"
This reverts commit r299284.

Didn't intend to commit this :(

llvm-svn: 299286
2017-04-01 01:26:17 +00:00
Quentin Colombet b43da15602 Instrument SDISel C++ patterns
llvm-svn: 299284
2017-04-01 01:21:32 +00:00
Quentin Colombet ffe3053a66 Feature generic option to setup start/stop-after/before
This patch refactors the code used in llc such that all the users of the
addPassesToEmitFile API have access to a homogeneous way of handling
start/stop-after/before options right out of the box.

Previously each user would have needed to duplicate this logic and set
up its own options.

NFC

llvm-svn: 299282
2017-04-01 01:21:24 +00:00
Eric Christopher 60a245e0ff Reduce the number of times we query the subtarget for the same information.
llvm-svn: 299278
2017-03-31 23:12:27 +00:00
Eric Christopher cf965f2f03 Small cleanup to remove extraneous cast.
llvm-svn: 299277
2017-03-31 23:12:24 +00:00
Krzysztof Parzyszek d04c9b999c [Hexagon] Remove unused variables
Found by PVS-Studio. Fixes llvm.org/PR31676.

llvm-svn: 299262
2017-03-31 21:03:59 +00:00
Krzysztof Parzyszek b326411fdc [Hexagon] Fix typo in HexagonEarlyIfCConv.cpp
Found by PVS-Studio. Fixes llvm.org/PR32480.

llvm-svn: 299258
2017-03-31 20:36:00 +00:00
Stanislav Mekhanoshin 12aa5b733e [AMDGPU] Remove assumption that vector and scalar types do not alias
Differential Revision: https://reviews.llvm.org/D31547

llvm-svn: 299250
2017-03-31 20:16:54 +00:00
Matt Arsenault 8edfaee7be AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

llvm-svn: 299246
2017-03-31 19:53:03 +00:00
Jan Vesely 3c99441ef4 AMDGPU/R600: Fix amdgpu alias analysis pass.
R600 uses higher AS number to access kernel parameters

Fixes: r298846
Differential Revision: https://reviews.llvm.org/D31520

llvm-svn: 299245
2017-03-31 19:26:23 +00:00
Balaram Makam 2aba753e84 [AArch64] Add new subtarget feature to fold LSL into address mode.
Summary:
This feature enables folding of logical shift operations of up to 3 places into addressing mode on Kryo and Falkor that have a fastpath LSL.

Reviewers: mcrosier, rengolin, t.p.northover

Subscribers: junbuml, gberry, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D31113

llvm-svn: 299240
2017-03-31 18:16:53 +00:00
Craig Topper 9601168670 [AVX-512] Update lowering for gather/scatter prefetch intrinsics to match the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers.
Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too.

Fixes PR32411.

llvm-svn: 299234
2017-03-31 17:24:29 +00:00
Petar Jovanovic 9bff3b7818 [mips][msa] Prevent output operand from commuting for dpadd_[su].df ins
Implementation of TargetInstrInfo::findCommutedOpIndices for MIPS target,
restricting commutativity to second and third operand only for
dpaadd_[su].df instructions therein.

Prior to this change, there were cases where the vector that is to be added
to the dot product of the other two could take a position other than the
first one in the instruction, generating false output in the destination
vector.

Such behavior has been noticed in the two functions generating v2i64 output
values so far. Other ones may exhibit such behavior as well, just not for
the vector operands which are present in the test at the moment.

Tests altered so that the function's first operand is a constant splat so
that it can be loaded with a ldi instruction, since that is the case in
which the erroneous instruction operand placement has occurred. We check
that the register which is present in the ldi instruction is placed as the
first operand in the corresponding dpadd instruction.

Patch by Stefan Maksimovic.

Differential Revision: https://reviews.llvm.org/D30827

llvm-svn: 299223
2017-03-31 14:31:55 +00:00
Jonas Paulsson c7bb22e75f [SystemZ] Make sure of correct regclasses in insertSelect()
Since LOCR only accepts GR32 virtual registers, its operands must be copied
into this regclass in insertSelect(), when an LOCR is built. Otherwise, the
case where the source operand was GRX32 will produce invalid IR.

Review: Ulrich Weigand
llvm-svn: 299220
2017-03-31 14:06:59 +00:00
Simon Pilgrim 3c81c34d8d [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.

Followup to D25691.

Differential Revision: https://reviews.llvm.org/D31311

llvm-svn: 299219
2017-03-31 13:54:09 +00:00
Jonas Paulsson 56bb0857e9 [SystemZ] Skip DAGCombining of vector node for older subtargets.
Even on older subtargets that lack vector support, there may be vector values
with just one element in the input program. These are converted during DAG
legalization to scalar values.

The pre-legalize SystemZ DAGCombiner methods should in this circumstance not
touch these nodes. This patch adds a check for this in
SystemZTargetLowering::combineEXTRACT_VECTOR_ELT().

Review: Ulrich Weigand
llvm-svn: 299213
2017-03-31 13:22:59 +00:00
Sam Kolton 27e0f8bc72 [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.

llvm-svn: 299202
2017-03-31 11:42:43 +00:00
Simon Pilgrim 37b536e4b3 [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

llvm-svn: 299201
2017-03-31 11:24:16 +00:00
Eric Christopher 9fd267c221 Temporarily revert "[PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64" as it's causing test failures, I've given Carrot a testcase offline.
This reverts commit r298955.

llvm-svn: 299153
2017-03-31 02:16:54 +00:00
Dan Gohman 970d02c42d [WebAssembly] Initial linking metadata support
Add support for the new relocations and linking metadata section support in
https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. In
particular, this allows LLVM to indicate which variable is the stack pointer,
so that it can be linked with other objects.

This also adds support for emitting type relocations for call_indirect
instructions.

Right now, this is mainly tested by using wabt and hexdump to examine the
output on selected testcases. We'll add more tests as the design stablizes
and more of the pieces are in place.

llvm-svn: 299141
2017-03-30 23:58:19 +00:00
Matt Arsenault 1074cb5420 AMDGPU: Rename isKernel
What we really want to do is distinguish functions that may
be called by other functions, and graphics shaders are not
called kernels.

llvm-svn: 299140
2017-03-30 23:58:04 +00:00
Matt Arsenault 79f837c254 AMDGPU: Add all atomicrmw fields to atomic.inc/dec
Add scope, order, isVolatile

llvm-svn: 299122
2017-03-30 22:21:40 +00:00
Craig Topper 3001b35189 [AVX-512] Fix bad comment from r299112. NFC
llvm-svn: 299114
2017-03-30 21:05:33 +00:00
Craig Topper 533b1bde1b [AVX-512] Fix another case where fastisel was generating a GR8 to VK1 copy. This time after calls returning i1.
Fixes PR32472.

llvm-svn: 299112
2017-03-30 21:02:52 +00:00
Stanislav Mekhanoshin 89653dfd2a [AMDGPU] Add GlobalOpt parameter to Always Inliner pass
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.

Differential Revision: https://reviews.llvm.org/D31489

llvm-svn: 299108
2017-03-30 20:16:02 +00:00
Davide Italiano a0bd28c4d9 [AArch64ISelLowering] Remove `else` after `return` in LowerGlobalTLSAddress.
llvm-svn: 299103
2017-03-30 19:52:31 +00:00
Davide Italiano de05686ec6 [AArch64] Simplify isSingExtended()/isZeroExtended(). NFCI.
llvm-svn: 299102
2017-03-30 19:46:18 +00:00
Simon Pilgrim 68168d17b9 Spelling mistakes in comments. NFCI.
Based on corrections mentioned in patch for clang for PR27635

llvm-svn: 299072
2017-03-30 12:59:53 +00:00
Simon Pilgrim ef4509b36e Spelling mistakes in comments. NFCI.
llvm-svn: 299069
2017-03-30 12:30:15 +00:00
Davide Italiano e13920f407 [X86IselLowering] Remove extraneous semicolon. NFCI.
Unbreaks the build with GCC -Werror.

llvm-svn: 299030
2017-03-29 21:34:58 +00:00
Simon Pilgrim 8362c95257 [X86] Tidied up comment - we don't custom lower add/sub i64 on i686 anymore. NFCI.
llvm-svn: 299004
2017-03-29 15:41:58 +00:00
Simon Pilgrim fc97d5049f Spelling mistakes in comments. NFCI.
llvm-svn: 299000
2017-03-29 15:27:24 +00:00
Simon Pilgrim 2845189bd1 [X86][AVX2] Prevent unary interleaving patterns from calling lowerVectorShuffleAsSplitOrBlend (PR32453)
llvm-svn: 298993
2017-03-29 13:00:00 +00:00
Simon Pilgrim b670ba4e87 [AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI.
Based on comment in D31249.

llvm-svn: 298991
2017-03-29 12:09:25 +00:00
Simon Pilgrim ebd433d9fc [X86] Removed old comment. NFCI.
No longer makes sense as the previous opcode mnemonic it was referring to is long gone.

llvm-svn: 298988
2017-03-29 10:44:51 +00:00
Eric Christopher 5829741c46 Move the x86 cpu feature rtm from Haswell to Skylake matching clang commit r298956.
llvm-svn: 298986
2017-03-29 07:40:44 +00:00
Craig Topper d9f51350b8 [AVX-512] Remove explicit KMOVWrk from isel patterns. COPY_TO_REGCLASS to GR32 is enough.
llvm-svn: 298985
2017-03-29 07:31:56 +00:00
Craig Topper d284606327 [AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where we can just use COPY_TO_REGCLASS instead.
This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together.

llvm-svn: 298984
2017-03-29 06:55:28 +00:00
Craig Topper 331297c62e [AVX-512] Punt on fast-isel of truncates to i1 when AVX512 is enabled.
We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing.

After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate.

This fixes PR32451.

llvm-svn: 298957
2017-03-28 23:20:37 +00:00
Guozhi Wei f8d40181c9 [PPC] In PPCBoolRetToInt change the bool value to i64 if the target is ppc64
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.

This patch fixed PR32442.

Differential Revision: https://reviews.llvm.org/D31407

llvm-svn: 298955
2017-03-28 22:55:01 +00:00
Stanislav Mekhanoshin baf31ac7c8 [AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.

Differential Revision: https://reviews.llvm.org/D31412

llvm-svn: 298948
2017-03-28 22:13:51 +00:00
Stanislav Mekhanoshin b933c3f554 [AMDGPU] Fix recorded region boundaries in max-occupancy scheduler
This is incorrect to record region boundaries before scheduling,
it may change after scheduling. As a result second pass may see less
instructions to schedule than it should.

Differential Revision: https://reviews.llvm.org/D31434

llvm-svn: 298945
2017-03-28 21:48:54 +00:00
Simon Pilgrim c7c5aa47cf [X86][MMX] Match MMX fp_to_sint conversions from XMM registers
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).

This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.

Differential Revision: https://reviews.llvm.org/D30868

llvm-svn: 298943
2017-03-28 21:32:11 +00:00
Stanislav Mekhanoshin 9053f22eeb [AMDGPU] Split -amdgpu-early-inline-all option
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.

Differential Revision: https://reviews.llvm.org/D31429

llvm-svn: 298935
2017-03-28 18:23:24 +00:00
Sanjay Patel f01a1dad7f [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality
Follow-up to:
https://reviews.llvm.org/rL298775

llvm-svn: 298933
2017-03-28 17:23:49 +00:00
Weiming Zhao da4d12a8e5 Revert "Dont emit Mapping symbols for sections that contain only data."
It breaks some lld tests.

This reverts commit 3a50eea6d9732ab40e9a7aebe6be777b53a8b35c.

llvm-svn: 298932
2017-03-28 17:15:11 +00:00
Simon Pilgrim 3e2aa7f40e [X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDW
llvm-svn: 298929
2017-03-28 16:40:38 +00:00
Craig Topper 058f2f6d72 [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.

This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.

I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.

Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.

This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.

Differential Revision: https://reviews.llvm.org/D30968

llvm-svn: 298928
2017-03-28 16:35:29 +00:00
Simon Pilgrim 6b30172372 [X86][SSE] Refactored shuffle BLEND combining to make future 16i16 support easier. NFCI.
Call the matchVectorShuffleAsBlend test as early as possible.

llvm-svn: 298925
2017-03-28 15:50:23 +00:00
Simon Pilgrim aa675ca77d Fix signed/unsigned comparison warning
llvm-svn: 298917
2017-03-28 13:40:09 +00:00
Simon Pilgrim d48f47e25c [X86][SSE] Begin merging vector shuffle to BLEND for lowering and combining.
Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining.

llvm-svn: 298914
2017-03-28 13:05:48 +00:00
Simon Pilgrim 61437ebaf4 Wdocumentation fix
llvm-svn: 298911
2017-03-28 12:29:09 +00:00
Simon Pilgrim 6afe0e2833 [X86][SSE] Set second operand to undef instead of first operand in unary shuffle combines.
Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier.

llvm-svn: 298910
2017-03-28 12:16:42 +00:00
Simon Pilgrim defee5683c Strip trailing whitespace
llvm-svn: 298909
2017-03-28 11:15:17 +00:00
Sanne Wouda d4658ee634 [AArch64] [Assembler] option to disable negative immediate conversions
Summary:
Similar to the ARM target in https://reviews.llvm.org/rL298380, this
patch adds identical infrastructure for disabling negative immediate
conversions, and converts the existing aliases to the new infrastucture.

Reviewers: rengolin, javed.absar, olista01, SjoerdMeijer, samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D31243

llvm-svn: 298908
2017-03-28 10:02:56 +00:00
Igor Breger f580fce2c3 [GlobalISel][X86] support G_FRAME_INDEX instruction selection.
Summary:
    G_LOAD/G_STORE, add alternative RegisterBank mapping.
    For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank ) for the G_GLOAD + G_FADD , can't get rid of cross register bank copy GprRegBank->VecRegBank.

    Reviewers: zvi, rovka, qcolombet, ab

    Reviewed By: zvi

    Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

    Differential Revision: https://reviews.llvm.org/D30979

llvm-svn: 298907
2017-03-28 09:35:06 +00:00
Valery Pykhtin 9f3eca96eb [AMDGPU] Update SI scheduler colorHighLatenciesGroups
Depends on rL298896: MachineScheduler/ScheduleDAG: Add support for GetSubGraph

Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30152

llvm-svn: 298902
2017-03-28 07:19:48 +00:00
Weiming Zhao 320848458b Dont emit Mapping symbols for sections that contain only data.
Summary:
Dont emit mapping symbols for sections that contain only data.

Patched by Shankar Easwaran <shankare@codeaurora.org>

Reviewers: rengolin, peter.smith, weimingz, kparzysz, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, llvm-commits

Differential Revision: https://reviews.llvm.org/D30724

llvm-svn: 298901
2017-03-28 05:40:36 +00:00
Eric Christopher f48ef3355f Remove an oddly unnecessary temporary.
llvm-svn: 298888
2017-03-27 22:40:51 +00:00
Javed Absar 3d59437093 Improve machine schedulers for in-order processors
This patch enables schedulers to specify instructions that 
cannot be issued with any other instructions.
It also fixes BeginGroup/EndGroup.

Reviewed by: Andrew Trick
Differential Revision: https://reviews.llvm.org/D30744

llvm-svn: 298885
2017-03-27 20:46:37 +00:00
Valery Pykhtin fb9905545c [AMDGPU] SISched: Detect dependency types between blocks
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30153

llvm-svn: 298872
2017-03-27 18:22:39 +00:00
Ahmed Bougacha f0b22c471b [GlobalISel][AArch64] Extract a variable out of an NDEBUG block. NFC.
r298863 used PtrReg, but that's never defined in release builds. Fix it.

llvm-svn: 298869
2017-03-27 18:14:20 +00:00
Ahmed Bougacha f75782f9dc [GlobalISel][AArch64] Fold FI into LDR/STR ui addressing mode.
A majority of loads and stores at O0 access an alloca.

It's trivial to fold the G_FRAME_INDEX into the instruction; do it.

llvm-svn: 298864
2017-03-27 17:31:56 +00:00
Ahmed Bougacha 8a654085d0 [GlobalISel][AArch64] Fold G_GEP into LDR/STR ui addressing mode.
We're not to the point of supporting the load/store patterns yet
(because they extensively use PatFrags).

But in the meantime, we can implement some of the simplest addressing
modes.

llvm-svn: 298863
2017-03-27 17:31:52 +00:00
Ahmed Bougacha 85a66a6d9f [GlobalISel][AArch64] Select store of zero to WZR/XZR.
These occur very frequently, and are quite trivial to catch.

llvm-svn: 298862
2017-03-27 17:31:48 +00:00
Valery Pykhtin ba3a4def29 [AMDGPU] SISched: Update colorEndsAccordingToDependencies
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30150

llvm-svn: 298861
2017-03-27 17:26:40 +00:00
Valery Pykhtin f70f683670 [AMDGPU] Fix SI scheduler LiveOut Refcount issue
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30145

llvm-svn: 298857
2017-03-27 17:06:36 +00:00
Ahmed Bougacha 641cb203b6 [GlobalISel][AArch64] Select CBZ.
CBZ/CBNZ represent a substantial portion of all conditional branches.
Look through G_ICMP to select them.

We can't use tablegen yet because the existing patterns match an
AArch64ISD node.

llvm-svn: 298856
2017-03-27 16:35:31 +00:00
Dmitry Preobrazhensky c512d44845 [AMDGPU][MC] Fix for Bug 28207 + LIT tests
Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type

Reviewers: vpykhtin, arsenm

Differential Revision: https://reviews.llvm.org/D31327

llvm-svn: 298852
2017-03-27 15:57:17 +00:00
Chad Rosier 862a41270f [AArch64] Mark mrs of TPIDR_EL0 (thread pointer) as not having side effects.
Among other things, this allows Machine LICM to hoist a costly 'mrs'
instruction from within a loop.

Differential Revision: http://reviews.llvm.org/D31151

llvm-svn: 298851
2017-03-27 15:52:38 +00:00
Yaxun Liu 1a14bfa022 [AMDGPU] Get address space mapping by target triple environment
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284

llvm-svn: 298846
2017-03-27 14:04:01 +00:00
Gadi Haber 89d5f9391a [X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave in AVX2
This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1.
 The patch includes a fix for the AVX2 case where vector unpack instructions use less operations than the vector blend operations available in AVX2.
 In this case using vector unpack instructions is more efficient.

Reviewers:
zvi  
delena  
igorb  
craig.topper  
guyblank  
eladcohen  
m_zuckerman  
aymanmus  
RKSimon 

llvm-svn: 298840
2017-03-27 12:13:37 +00:00
Davide Italiano a2c4e4b929 [Target] Remove some code probably copy/pasted from another backend.
llvm-svn: 298825
2017-03-26 21:45:04 +00:00
Davide Italiano 5c2aa5d3e4 [MachineScheduler] Reference the correct header.
llvm-svn: 298823
2017-03-26 21:27:21 +00:00
Simon Pilgrim 92925ea701 [X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL instructions
llvm-svn: 298806
2017-03-26 13:17:55 +00:00
Simon Pilgrim 049d9c921f [X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk
Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481)

The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X

Differential Revision: https://reviews.llvm.org/D31200

llvm-svn: 298805
2017-03-26 12:52:28 +00:00
Igor Breger 531a203a06 [GlobalISel][X86] support G_FRAME_INDEX instruction selection.
Summary:
    Support G_FRAME_INDEX instruction selection.

    Reviewers: zvi, rovka, ab, qcolombet

    Reviewed By: ab

    Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

    Differential Revision: https://reviews.llvm.org/D30980

llvm-svn: 298800
2017-03-26 08:11:12 +00:00
Simon Pilgrim bec234c970 [X86] Pull out repeated ScalarValueSizeInBits code. NFCI.
llvm-svn: 298783
2017-03-25 21:22:12 +00:00
Simon Pilgrim c0720a4052 [X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, (NumSignBits-1))
Part 3 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298782
2017-03-25 20:43:01 +00:00
Simon Pilgrim 6397963c81 [X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAI
Part 2 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298780
2017-03-25 19:58:36 +00:00
Simon Pilgrim 5400a4d0af [X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASK
Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask.

Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm.

Part 1 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298779
2017-03-25 19:50:14 +00:00
Sanjay Patel 9ebb68843e [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, 
we can replace memcmp calls with inline code that is both smaller and faster.

Differential Revision: https://reviews.llvm.org/D31290

llvm-svn: 298775
2017-03-25 16:05:33 +00:00
Balaram Makam cf0e5e1c62 [AArch64] Refine Falkor Machine Model - Part1
llvm-svn: 298768
2017-03-25 04:02:39 +00:00
Yaxun Liu 14834c3e3d [AMDGPU] Switch data layout by triple environment amdgiz
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.

amdgiz is for non-OpenCL environment where generic address space is 0.

amdgizcl is for OpenCL environment where generic address space is 0.

Differential Revision: https://reviews.llvm.org/D31211

llvm-svn: 298758
2017-03-25 02:05:44 +00:00
Eli Friedman 95ddd18703 [ARM] Fix mixup between Lo and Hi in SMLALBB formation.
llvm-svn: 298752
2017-03-25 00:13:24 +00:00
Jessica Paquette eac8633d6d [Outliner] Revert r298734.
When I tested r298734, I thought that red zones were enabled by default like in
X86. Since red zones are behind a flag on AArch64 the testing wasn't true.

llvm-svn: 298747
2017-03-24 23:00:21 +00:00
Matt Arsenault 0607a4427b AMDGPU: Fix annotating loops with nested loop conditions
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.

llvm-svn: 298737
2017-03-24 20:57:10 +00:00
Jessica Paquette 167af85ec7 [Outliner] Remove no red zone requirment for AArch64
AArch64 doesn't require -mno-red-zone; stack fixups are sufficient here. This was
unnecessarily copied over from the X86 target.

(You can now outline with red zones! Yay!)

Removing the requirement passes all Single/MultiSource tests.

llvm-svn: 298734
2017-03-24 20:47:59 +00:00
Matt Arsenault b5d23271e2 AMDGPU: Implement f16 fround
llvm-svn: 298730
2017-03-24 20:04:18 +00:00
Matt Arsenault b8f8dbc227 AMDGPU: Unify divergent function exits.
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.

llvm-svn: 298729
2017-03-24 19:52:05 +00:00
Matt Arsenault 18bb24a1be TTI: Split IsSimple in MemIntrinsicInfo
All this did before was assert in EarlyCSE.

llvm-svn: 298724
2017-03-24 18:56:43 +00:00
Stanislav Mekhanoshin 70603dcef2 [AMDGPU] Fold V_CNDMASK with identical source operands
Such instructions sometimes appear after lowering and folding.

Differential Revision: https://reviews.llvm.org/D31318

llvm-svn: 298723
2017-03-24 18:55:20 +00:00
Konstantin Zhuravlyov 4986d9fb45 [AMDGPU] Rename Kind to ValueKind in metadata to be consistent
llvm-svn: 298722
2017-03-24 18:43:15 +00:00
Stanislav Mekhanoshin a27b2cac03 [AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline
Previously it was added only to the BE.

Differential Revision: https://reviews.llvm.org/D31323

llvm-svn: 298721
2017-03-24 18:01:14 +00:00
Benjamin Kramer 80e3d5bb24 [AMDGPU] Don't enforce constexpr, there are still old standard libraries around that don't have a constexpr std::pair.
llvm-svn: 298719
2017-03-24 17:53:06 +00:00
Valery Pykhtin e2419dc907 [AMDGPU] Remove double map lookups in SI scheduler
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30382

llvm-svn: 298718
2017-03-24 17:49:05 +00:00
Valery Pykhtin f7d1023a73 [AMDGPU] Fix SGPR usage count in SI scheduler
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30149

llvm-svn: 298710
2017-03-24 16:45:50 +00:00
Valery Pykhtin 57ab699933 [AMDGPU] Add a new line after a debug message
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30146

llvm-svn: 298708
2017-03-24 16:37:48 +00:00
Simon Pilgrim 6aac646308 [X86][SSE] Generalised lowerTruncate by PACKSS to work with any 'zero/all bits' result, not just comparisons.
Added vector compare opcodes to X86TargetLowering::ComputeNumSignBitsForTargetNode

Covered by existing tests added for D22814.

llvm-svn: 298704
2017-03-24 16:12:31 +00:00
Benjamin Kramer c06d672a7a Don't build up std::vectors with constant sizes when an array suffices.
NFC.

llvm-svn: 298701
2017-03-24 14:11:47 +00:00
Meador Inge 5d3c599e82 [AVR] Fix build after r298178
r298178 capitalized the fields in `ArgListEntry`.  All the official
targets were updated accordingly, but as an experimental target AVR
was missed.

llvm-svn: 298677
2017-03-24 01:57:29 +00:00
Krzysztof Parzyszek 10fbac009d [Hexagon] Avoid infinite loops in HexagonLoopIdiomRecognition
- Avoid explosive growth of the simplification queue by not queuing
  expressions that are alredy in it.
- Add an iteration counter and abort after a sufficiently large number
  of iterations (assuming that it's a symptom of an infinite loop).

llvm-svn: 298655
2017-03-23 23:01:22 +00:00
Eric Christopher c78be4d3be Kill some trailing whitespace to make some new changes a bit easier.
llvm-svn: 298637
2017-03-23 19:41:10 +00:00
Nirav Dave 9ebefeb9b1 [X86] Fix Stale SDNode use in X86ISelDAGtoDAG
Summary: Fixes pr32329.

Reviewers: spatel, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31286

llvm-svn: 298633
2017-03-23 18:25:17 +00:00
Eric Christopher cff8492492 Remove the subtarget argument from LowerFP_TO_INT since there's one
stored on X86TargetLowering.

llvm-svn: 298628
2017-03-23 17:35:08 +00:00
Eric Christopher a19a14b42f Remove unused X86Subtarget argument from getOnesVector.
llvm-svn: 298627
2017-03-23 17:35:06 +00:00
Pirama Arumuga Nainar bc26482717 [ARM] Fix computeKnownBits for ARMISD::CMOV
Summary:
The true and false operands for the CMOV are operands 0 and 1.
ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2
instead.  This can cause CMOV instructions to be incorrectly folded into
BFI if value set by the CMOV is another CMOV, whose known bits are
computed incorrectly.

This patch fixes the issue and adds a test case.

Reviewers: kristof.beyls, jmolloy

Subscribers: llvm-commits, aemerson, srhines, rengolin

Differential Revision: https://reviews.llvm.org/D31265

llvm-svn: 298624
2017-03-23 16:47:47 +00:00
Simon Pilgrim 1c048ab6ba [X86][SSE] Extract elements from narrower shuffle masks.
Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle.

llvm-svn: 298616
2017-03-23 16:09:34 +00:00
Igor Breger a8ba572dcf [GlobalISel][X86] Support G_STORE/G_LOAD operation
Summary:
1. Support pointer type as function argumnet and return value
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128
3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank.
4. Support instruction selection for G_LOAD/G_STORE

Reviewers: zvi, rovka, ab, qcolombet

Reviewed By: rovka

Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

Differential Revision: https://reviews.llvm.org/D30973

llvm-svn: 298609
2017-03-23 15:25:57 +00:00
Zvi Rackover db4b032205 X86FixupBWInsts: Minor cleanup. NFC
Summary: Cleanup some remnants of code from when the X86FixupBWInsts pass did both forward liveness analysis and backward liveness analysis.

Reviewers: MatzeB, myatsina, DavidKreitzer

Reviewed By: MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31264

llvm-svn: 298599
2017-03-23 14:08:26 +00:00
Strahinja Petrovic f9fa62e576 [Mips] Emit the correct DINS variant
This patch fixes emitting of correct variant of DINS instruction.

Differential Revision: https://reviews.llvm.org/D30988

llvm-svn: 298596
2017-03-23 13:40:07 +00:00
Simon Pilgrim 8a18299f20 [X86][SSE] Tidyup canWidenShuffleElements. NFCI.
Pull out mask elements at the start, allowing us to make the widening pattern matching more readable.

llvm-svn: 298594
2017-03-23 13:33:03 +00:00
Strahinja Petrovic cac14b5334 [Mips] Fix for decoding DINS instruction - disassembler
This patch fixes decoding of size and position for DINSM
and DINSU instructions.

Differential Revision: https://reviews.llvm.org/D31072

llvm-svn: 298593
2017-03-23 13:19:04 +00:00
Igor Breger 8a924bea78 [GlobalISel][X86] clang-format. NFC
llvm-svn: 298590
2017-03-23 12:13:29 +00:00
Michael Zuckerman 85436ece89 [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction
Up until now, vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds new pattern for the vpmovm2
instruction. The node describes new expansion of the destination (from
{128|256} to 512).

Differential Revision: https://reviews.llvm.org/D30654

llvm-svn: 298586
2017-03-23 09:57:01 +00:00
Davide Italiano 6974dd6412 [ARM] Reduce code duplication by factoring out in a lambda. NFCI.
llvm-svn: 298572
2017-03-23 01:34:45 +00:00
Davide Italiano 0145e751c4 [AArch64] Drive-by cleanup, make this code shorter. NFCI.
llvm-svn: 298563
2017-03-22 23:37:58 +00:00
Artyom Skrobov 92c0653095 Reapply r298417 "[ARM] Recommit the glueless lowering of addc/adde in Thumb1"
The UB in t2_so_imm_neg conversion has been addressed under D31242 / r298512

This reverts commit r298482.

llvm-svn: 298562
2017-03-22 23:35:51 +00:00
Konstantin Zhuravlyov 4cbb68959b [AMDGPU] Do not emit isa info as code object metadata
- It was decided to expose this information through other means (rocr)

Differential Revision: https://reviews.llvm.org/D30970

llvm-svn: 298560
2017-03-22 23:27:09 +00:00
Artyom Skrobov ee66d6e18e [ARM] simplifying t2_so_imm_neg as suggested by Eli Friedman in D31242 (NFC)
llvm-svn: 298559
2017-03-22 23:12:59 +00:00
Konstantin Zhuravlyov a780ffaac2 [AMDGPU] Emit kernel debug properties as code object metadata
Differential Revision: https://reviews.llvm.org/D30969

llvm-svn: 298558
2017-03-22 23:10:46 +00:00
Konstantin Zhuravlyov ca0e7f6472 [AMDGPU] Emit kernel code properties as code object metadata
- These are not required for low level runtime

Differential Revision: https://reviews.llvm.org/D29949

llvm-svn: 298556
2017-03-22 22:54:39 +00:00
Eric Christopher fd8510cfec Clean up some Subtarget uses and casts in the X86 backend, removing unnecessary work or calls.
llvm-svn: 298555
2017-03-22 22:44:52 +00:00
Konstantin Zhuravlyov 7498cd61fb [AMDGPU] Restructure code object metadata creation
- Rename runtime metadata -> code object metadata
  - Make metadata not flow
  - Switch enums to use ScalarEnumerationTraits
  - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
  - Introduce in-memory representation for attributes
  - Code object metadata streamer
  - Create metadata for isa and printf during EmitStartOfAsmFile
  - Create metadata for kernel during EmitFunctionBodyStart
  - Finalize and emit metadata to .note during EmitEndOfAsmFile
  - Other minor improvements/bug fixes

Differential Revision: https://reviews.llvm.org/D29948

llvm-svn: 298552
2017-03-22 22:32:22 +00:00
Konstantin Zhuravlyov eb685e5f27 [AMDGPU] Fix bug 31610
Differential Revision: https://reviews.llvm.org/D31258

llvm-svn: 298551
2017-03-22 21:48:18 +00:00
Artyom Skrobov 50a066b313 [ARM] t2_so_imm_neg had a subtle bug in the conversion, and could trigger UB by negating (int)-2147483648. By pure luck, none of the pre-existing tests triggered this; so I'm adding one.
Summary: Thanks to Vitaly Buka for helping catch this.

Reviewers: rengolin, jmolloy, efriedma, vitalybuka

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D31242

llvm-svn: 298512
2017-03-22 15:09:30 +00:00
Dmitry Preobrazhensky 895d377dc7 [AMDGPU][MC] Fix for Bug 28204 + LIT tests
Fixed v_mad_i64_i32/u64_u32 encoding

Reviewers: artem.tamazov

Differential Revision: https://reviews.llvm.org/D30828

llvm-svn: 298502
2017-03-22 13:31:01 +00:00
Simon Pilgrim b19a507a88 [X86] Remove unnecessary duplicate code (PR30649). NFCI.
llvm-svn: 298495
2017-03-22 11:23:49 +00:00
Craig Topper 3eb6ff9d09 [X86] Remove an unused function from release builds. Reported by gccs unused function warning.
llvm-svn: 298485
2017-03-22 06:07:58 +00:00
Jonas Paulsson 808c89f467 [SystemZ] Don't drop any operands in expandZExtPseudo()
Make sure that any operands, e.g. of an implicit def of a super reg is
transferred to the new instruction.

Review: Ulrich Weigand
llvm-svn: 298484
2017-03-22 06:03:32 +00:00
Vitaly Buka e69c137f90 Revert "[ARM] Recommit the glueless lowering of addc/adde in Thumb1, including the amended (no UB anymore) fix for adding/subtracting -2147483648."
Fails check-llvm with ubsan

This reverts commit r298417.

llvm-svn: 298482
2017-03-22 05:07:44 +00:00
Matt Arsenault 513cb7a87d AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOG
llvm-svn: 298454
2017-03-21 22:28:48 +00:00
Matt Arsenault 5b20fbb748 AMDGPU: Rename SI_RETURN
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.

llvm-svn: 298452
2017-03-21 22:18:10 +00:00
George Burgess IV 56c7e88c2c Let llvm.objectsize be conservative with null pointers
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.

This fixes PR23277.

Differential Revision: https://reviews.llvm.org/D28494

llvm-svn: 298430
2017-03-21 20:08:59 +00:00
Coby Tayree 07a8974c48 [X86][MS-compatability][llvm] allow MS TYPE/SIZE/LENGTH operators as a part of a compound expression
This patch introduces X86AsmParser with the ability to handle the aforementioned ops within compound "MS" arithmetical expressions.
Currently - only supported as a stand alone Operand, e.g.:
"TYPE X"
now allowed :
"4 + TYPE X * 128"

Clang side: https://reviews.llvm.org/D31174

Differential Revision: https://reviews.llvm.org/D31173

llvm-svn: 298425
2017-03-21 19:31:55 +00:00
Davide Italiano 200e5e184a [X86] Remove extra semicolon to placate GCC. NFCI.
llvm-svn: 298423
2017-03-21 19:17:23 +00:00
Artyom Skrobov 40a4f40679 [ARM] Recommit the glueless lowering of addc/adde in Thumb1,
including the amended (no UB anymore) fix for adding/subtracting -2147483648.

This reverts r298328 "[ARM] Revert r297443 and r297820."
and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648""

llvm-svn: 298417
2017-03-21 18:39:41 +00:00
Krzysztof Parzyszek d033d1fd82 Recommit r298282 with fixes for memory allocation/deallocation
[Hexagon] Recognize polynomial-modulo loop idiom again

Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.

llvm-svn: 298400
2017-03-21 17:09:27 +00:00
Marek Olsak 5c7a61d221 AMDGPU: Buffer descriptor changes for GFX9
Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31158

llvm-svn: 298397
2017-03-21 17:00:39 +00:00
Marek Olsak e22fdb9cac AMDGPU: Always use VGPR indexing on GFX9
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31157

llvm-svn: 298396
2017-03-21 17:00:32 +00:00
Reid Kleckner b518054b87 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

llvm-svn: 298393
2017-03-21 16:57:19 +00:00
Matt Arsenault 5af82a7ae1 AMDGPU: Fix not including v2i16/v2f16 in register class
llvm-svn: 298390
2017-03-21 16:42:50 +00:00
Matt Arsenault f8fb605a68 AMDGPU: Fix asserting on 0 dmask for image intrinsics
Fold these to undef during lowering so users get eliminated.

llvm-svn: 298387
2017-03-21 16:32:17 +00:00
Sanne Wouda 2409c6403d [ARM] [Assembler] Support negative immediates for A32, T32 and T16
Summary:
To support negative immediates for certain arithmetic instructions, the
instruction is converted to the inverse instruction with a negated (or inverted)
immediate. For example, "ADD r0, r1, #FFFFFFFF" cannot be encoded as an ADD
instruction.  However, "SUB r0, r1, #1" is equivalent.

These conversions are different from instruction aliases.  An alias maps
several assembler instructions onto one encoding.  A conversion, however, maps
an *invalid* instruction--e.g. with an immediate that cannot be represented in
the encoding--to a different (but equivalent) instruction.

Several instructions with negative immediates were being converted already, but
this was not systematically tested, nor did it cover all instructions.

This patch implements all possible substitutions for ARM, Thumb1 and
Thumb2 assembler and adds tests.  It also adds a feature flag
(-mattr=+no-neg-immediates) to turn these substitutions off.  This is
helpful for users who want their code to assemble to exactly what they
wrote.

Reviewers: t.p.northover, rovka, samparker, javed.absar, peter.smith, rengolin

Reviewed By: javed.absar

Subscribers: aadg, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D30571

llvm-svn: 298380
2017-03-21 14:59:17 +00:00
Sanjay Patel 79379cae15 [x86] use PMOVMSK for vector-sized equality comparisons
We could do better by splitting any oversized type into whatever vector size the target supports, 
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.

Differential Revision: https://reviews.llvm.org/D31156

llvm-svn: 298376
2017-03-21 13:50:33 +00:00
Valery Pykhtin fd4c410f4d [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler
Differential revision: https://reviews.llvm.org/D31046

llvm-svn: 298368
2017-03-21 13:15:46 +00:00
Sam Kolton f60ad58dad [ADMGPU] SDWA peephole optimization pass.
Summary:
First iteration of SDWA peephole.

This pass tries to combine several instruction into one SDWA instruction. E.g. it converts:
'''
    V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
   V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''

Pass structure:
    1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
    2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0'''
    3. Iterate over all potential instructions and check if they can be converted into SDWA.
    4. Convert instructions to SDWA.

This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
    1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass.
    2. Introduce more SDWA patterns
    3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365
2017-03-21 12:51:34 +00:00
Andrea Di Biagio 7937be7dd3 [DebugInfo][X86] Teach Optimize LEAs pass to handle debug values
This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were
not removed because they were being used by debug values. The debug values are
now ignored when determining whether LEAs are redundant.

For now the debug values for the redundant LEAs are marked as undefined,
effectively lost. The intention is for a follow up patch which will attempt to
preserve the debug values where possible.

Patch by Andrew Ng.

Differential Revision: https://reviews.llvm.org/D30835

llvm-svn: 298360
2017-03-21 11:36:21 +00:00
Jonas Paulsson bd65421f08 [SystemZ] Don't drop MO flags in foldMemoryOperandImpl()
The def operand of the new LG/LD should have the old def operands
flags and subreg index.

New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll

Review: Ulrich Weigand
llvm-svn: 298341
2017-03-21 05:49:40 +00:00
Vitaly Buka c12716e742 Revert "[Hexagon] Recognize polynomial-modulo loop idiom again"
Fix memory leaks on check-llvm tests detected by Asan.

This reverts commit r298282.

llvm-svn: 298329
2017-03-21 00:59:51 +00:00
Eli Friedman 76732acc23 [ARM] Revert r297443 and r297820.
The glueless lowering of addc/adde in Thumb1 has known serious
miscompiles (see https://reviews.llvm.org/D31081), and r297820
causes an infinite loop for certain constructs.  It's not
clear when they will be fixed, so let's just take them out
of the tree for now.

(I resolved a small conflict with r297453.)

llvm-svn: 298328
2017-03-21 00:26:39 +00:00
Vadzim Dambrouski ba789cbd3d [ARM] Fix PR32130: Handle promotion of zero sized constants.
The special case of zero sized values was previously not handled correctly.
This patch handles this by not promoting if the size is zero.

Patch by Tim Neumann.

Differential Revision: https://reviews.llvm.org/D31116

llvm-svn: 298320
2017-03-20 22:59:57 +00:00
Evgeniy Stepanov e829eecc05 [Fuchsia] Use %gs for ABI slots under -mcmodel=kernel
Make x86_64-fuchsia targets under -mcmodel=kernel use %gs rather
than %fs to access ABI slots for stack-protector and safe-stack

Patch by Roland McGrath.

Differential Revision: https://reviews.llvm.org/D30870

llvm-svn: 298302
2017-03-20 20:35:37 +00:00
Krzysztof Parzyszek 8490251de3 [Hexagon] Recognize polynomial-modulo loop idiom again
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.

llvm-svn: 298282
2017-03-20 18:12:58 +00:00
Konstantin Zhuravlyov 2534bc07f4 [AMDGPU] Run always inliner early in opt
Differential Revision: https://reviews.llvm.org/D31141

llvm-svn: 298281
2017-03-20 18:06:45 +00:00
Dmitry Preobrazhensky 1e124e1825 [AMDGPU][MC] Fix for Bugs 28201, 28199, 28170 + LIT tests
This fix enables sp3 abs modifier with constants

Reviewers: artem.tamazov

Differential Revision: https://reviews.llvm.org/D30825

llvm-svn: 298265
2017-03-20 16:33:20 +00:00
Jessica Paquette 02cbfb2926 [Outliner] ACTUALLY remove the errs output
I don't know how to type. This fixes the last commit which would have made all
of the overflows legal, and kept the screaming.

llvm-svn: 298263
2017-03-20 16:25:04 +00:00
Jessica Paquette 5d59a4ee19 [Outliner] Remove output for offset range check
Forgot to remove some output before committing last time. (Instruction fixups
don't actually overflow anywhere in the test suite so far, so I missed it).

To prevent the outliner from screaming "Overflow!" in the event that that
does happen, this commit removes that output.

llvm-svn: 298260
2017-03-20 15:51:45 +00:00
Dmitry Preobrazhensky 40af9c35d3 [AMDGPU][MC] Fix for Bugs 28200, 28202 + LIT tests
Fixed several related issues with VOP3 fp modifiers.

Reviewers: artem.tamazov

Differential Revision: https://reviews.llvm.org/D30821

llvm-svn: 298255
2017-03-20 14:50:35 +00:00
Diana Picus d79253a9f7 [GlobalISel] Use the correct calling conv for calls
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.

Differential Revision: https://reviews.llvm.org/D31039

llvm-svn: 298254
2017-03-20 14:40:18 +00:00
Konstantin Zhuravlyov 8a67eb144f Revert "[AMDGPU] Run always inliner early in opt"
This reverts commit r297958, it breaks device-libs build.

llvm-svn: 298239
2017-03-20 09:26:08 +00:00
Craig Topper 5992c8d1dc [AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel
Summary:
Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coallescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust.

This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test break. But this patch should dodge the problem entirely.

Reviewers: zvi, delena, RKSimon, igorb

Reviewed By: igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31056

llvm-svn: 298228
2017-03-19 17:11:09 +00:00
Simon Pilgrim 5fa1b9a12f Fix MSVC warning: "switch statement contains 'default' but no 'case' labels". NFCI.
llvm-svn: 298225
2017-03-19 16:39:04 +00:00
Oren Ben Simhon 0ef61ec32a [MIR] Support Customed Register Mask and CSRs
The MIR printer dumps a string that describe the register mask of a function.
A static predefined list of register masks matches a static list of strings.
However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails.
This patch adds support to custom register mask printing and dumping.
Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic.
As such this data needs to be dumped and parsed back to the Machine Register Info.

Differential Revision: https://reviews.llvm.org/D30971

llvm-svn: 298207
2017-03-19 08:14:18 +00:00
Matthias Braun e6ff30b696 ExecutionDepsFix: Let targets specialize the pass; NFC
Let targets specialize the pass with the register class so we can get a
parameterless default constructor and can put the pass into the pass
registry to enable testing with -run-pass=.

llvm-svn: 298184
2017-03-18 05:08:58 +00:00
Matthias Braun e9f8209e87 ExecutionDepsFix: Normalize names; NFC
Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and
ExecutionDepsFix to the last one.

llvm-svn: 298183
2017-03-18 05:05:40 +00:00
Nirav Dave ac6081cb67 Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

llvm-svn: 298179
2017-03-18 00:44:07 +00:00
Nirav Dave 6de2c77944 Capitalize ArgListEntry fields. NFC.
llvm-svn: 298178
2017-03-18 00:43:57 +00:00
Stanislav Mekhanoshin 8e45acfc38 [AMDGPU] Add address space based alias analysis pass
This is direct port of HSAILAliasAnalysis pass, just cleaned for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172
2017-03-17 23:56:58 +00:00
Jessica Paquette ea8cc09be0 [Outliner] Add outliner for AArch64
This commit adds the necessary target hooks for outlining in AArch64. It also
refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a
more general function, `getMemOpInfo`. This allows the outliner to share that
code without copying and pasting it.

The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with
the X86-64 outliner.

The test for this pass verifies that the outliner does, in fact outline
functions, fixes up the stack accesses properly, and can correctly generate a
tail call. In the future, this test should be replaced with a MIR test, so that
we can properly test immediate offset overflows in fixed-up instructions.

llvm-svn: 298162
2017-03-17 22:26:55 +00:00
Matt Arsenault 59ece95f6c AMDGPU: Fix broken condition in hazard recognizer
Fixes bug 32248.

llvm-svn: 298125
2017-03-17 21:36:28 +00:00
Matt Arsenault e70d5dcf3e AMDGPU: Fix handling of constant phi input loop conditions
If the loop condition was an i1 phi with a constantexpr input, this
would add a loop intrinsic fed by a phi dependent on a call to
if.break in the same block. Insert the call in the loop header.

llvm-svn: 298121
2017-03-17 20:52:21 +00:00
Matt Arsenault c5b641ac02 AMDGPU: Cleanup control flow intrinsics
Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the type list.

It's surprising this was working before. fdiv.fast had
the wrong number of parameters. The control flow intrinsic
declaration attributes were not being applied, and
their types were inconsistent. The actual IR use types
did not match the declaration, and were closer to the
types used for the patterns. The brcond lowering
was changing the types, so introduce new nodes for those.

llvm-svn: 298119
2017-03-17 20:41:45 +00:00
Sanjay Patel 455703a0c6 [x86] clean up setcc with negated operand transform and add missing test; NFCI
llvm-svn: 298118
2017-03-17 20:29:40 +00:00
Reid Kleckner edf1cbb580 [X86] Emit fewer instructions to allocate >16GB stack frames
Summary:
Use this code pattern when RAX is live, instead of emitting up to 2
billion adjustments:
  pushq %rax
  movabsq +-$Offset+-8, %rax
  addq %rsp, %rax
  xchg %rax, (%rsp)
  movq (%rsp), %rsp

Try to clean this code up a bit while I'm here. In particular, hoist the
logic that handles the entire adjustment with `movabsq $imm, %rax` out
of the loop.

This negates the offset in the prologue and uses ADD because X86 only
has a two operand subtract which always subtracts from the destination
register, which can no longer be RSP.

Fixes PR31962

Reviewers: majnemer, sdardis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30052

llvm-svn: 298116
2017-03-17 20:25:49 +00:00
Sanjay Patel 25bd713d33 [x86] avoid adc/sbb assert when both sides of add are zexted (PR32316)
As noted in the comment, we might want to account for this case,
but I didn't look at what that would mean for the asm. 

I'm also not sure why this only reproduces with avx512, but I'm 
putting a conservative fix in for now to avoid the crash. 

Also, if both sides of an add are zexted, shouldn't we shrink that add?

https://bugs.llvm.org/show_bug.cgi?id=32316

llvm-svn: 298107
2017-03-17 17:27:31 +00:00
Reid Kleckner 98e56430b9 Fix wasm build after arg_begin iterator type change
llvm-svn: 298106
2017-03-17 17:24:03 +00:00
Stanislav Mekhanoshin ee2dd785f6 Only unswitch loops with uniform conditions
Loop unswitching can be extremely harmful for a SIMT target. In case
if hoisted condition is not uniform a SIMT machine will execute both
clones of a loop sequentially. Therefor LoopUnswitch checks if the
condition is non-divergent.

Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
not needed for non-SIMT targets a new option is added to avoid unneded
analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available and we cannot use it here.
For that reason a new field DivergentTarget is added to PassManagerBuilder
to control the behavior and set this field from a target.

Differential Revision: https://reviews.llvm.org/D30796

llvm-svn: 298104
2017-03-17 17:13:41 +00:00
Chad Rosier a69dcb6b66 [AArch64] Use alias analysis in the load/store optimization pass.
This allows the optimization to rearrange loads and stores more aggressively.

Differential Revision: http://reviews.llvm.org/D30903

llvm-svn: 298092
2017-03-17 14:19:55 +00:00
Andre Vieira 913ffeb5ba [ARM] Fix triple format in test branch disassemble test
Fixing triple format in the tests added for the branch label fix for Thumb
Targets. Also recommitting previously approved patch, see
https://reviews.llvm.org/D30943.

Reviewed by: samparker

Differential Revision: https://reviews.llvm.org/D30987

llvm-svn: 298056
2017-03-17 09:37:10 +00:00
Craig Topper a8d4097445 [AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line.
We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't.

I didn't mask FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled.

Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512, but not FMA just using the VEX instructions seems better.

llvm-svn: 298051
2017-03-17 07:37:31 +00:00
Craig Topper 02cd0bfa46 [X86] Remove unused predicate. NFC
llvm-svn: 298050
2017-03-17 07:37:27 +00:00
Jonas Paulsson 8a7bd24c82 [SystemZ] Add use of super-reg in splitMove()
If one of the subregs of the 128 bit reg is undefined when splitMove() splits
a store into two instructions, a use of an undefined physical register
results.

To remedy this, an implicit use of the super register is added onto both new
instructions, along with propagated kill and undef flags.

This was discovered with llvm-stress, and that test case is attached as
test/CodeGen/SystemZ/splitMove_undefReg_mverifier.ll

Thanks to Matthias Braun for helping with a nice explanation.

Review: Ulrich Weigand
llvm-svn: 298047
2017-03-17 06:47:08 +00:00
Craig Topper 6a1290a0fd [AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX.
We were giving priority if VLX was enabled.

llvm-svn: 298046
2017-03-17 06:10:37 +00:00
Craig Topper e4d5aa7efc [X86] Cleanup the AddedComplexity values on move immediate instructions. NFC
This makes the values a little more consistent between similar instruction and reduces the values some. This results in better grouping in the isel table saving a few bytes.

llvm-svn: 298043
2017-03-17 05:59:54 +00:00
Eric Christopher 53da761570 Remove LessPreciseFPMADOption from TargetOptions along with all of the
associated command line options and functions - it's currently unused
in all of llvm and clang other than being set and reset.

llvm-svn: 298023
2017-03-17 00:38:03 +00:00
Eli Friedman da228fee0c [ARM] Use alias analysis in ARMPreAllocLoadStoreOpt.
This allows the optimization to rearrange loads and stores more
aggressively. This doesn't really affect performance, but it helps
codesize.

Differential Revision: https://reviews.llvm.org/D30839

llvm-svn: 298021
2017-03-17 00:34:26 +00:00
Jacques Pienaar da9352c173 clean Lanai namespace
Summary: This patch cleans the namespace of the Lanai target.

Reviewers: jpienaar

Reviewed By: jpienaar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30955

llvm-svn: 298015
2017-03-16 23:22:10 +00:00
Reid Kleckner 45707d4d5a Remove getArgumentList() in favor of arg_begin(), args(), etc
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.

In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D31052

llvm-svn: 298010
2017-03-16 22:59:15 +00:00
Derek Schuff b879539aac [WebAssembly] Fix some broken type encodings in wasm binary
A recent change switch the in-memory wasm value types
to be signed integers, but I missing a few cases where
these were being writing to the binary.

Differential Revision: https://reviews.llvm.org/D31014

Patch by Sam Clegg

llvm-svn: 297991
2017-03-16 20:49:48 +00:00
Matthias Braun e959544517 TargetInstrInfo: Provide default implementation of isTailCall().
In fact this default implementation should be the only implementation,
keep it virtual for now to accomodate targets that don't model flags
correctly.

Differential Revision: https://reviews.llvm.org/D30747

llvm-svn: 297980
2017-03-16 20:02:30 +00:00
Daniel Sanders 0e64202871 [globalisel] Correct G_CONSTANT path of selectArithImmed()
Earlier stages of GlobalISel always use ConstantInt in G_CONSTANT so that's
what we should check for.

This fixes a crash introduced in r297782.

llvm-svn: 297968
2017-03-16 18:04:50 +00:00
Hiroshi Inoue 138a3faa3e Test commit.
llvm-svn: 297959
2017-03-16 16:30:06 +00:00
Stanislav Mekhanoshin f80507979d [AMDGPU] Run always inliner early in opt
We can mark functions to always inline early in the opt. Since we do not have
call support this early inlining creates opportunities for inter-procedural
optimizations which would not occur otherwise.

Differential Revision: https://reviews.llvm.org/D31016

llvm-svn: 297958
2017-03-16 16:11:46 +00:00
Colin LeMahieu ddebad956e [Hexagon] Updating inline saturate lanes for v62 version.
llvm-svn: 297920
2017-03-16 00:35:28 +00:00
Simon Pilgrim cee3fc61cb Remove redundant condition (PR32263). NFCI.
llvm-svn: 297915
2017-03-15 23:27:43 +00:00
Matt Arsenault 7dc01c96ae AMDGPU: Allow sinking of addressing modes for atomic_inc/dec
llvm-svn: 297913
2017-03-15 23:15:12 +00:00
Simon Pilgrim 06c70adcf0 [X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars
Prep work for PR31810

llvm-svn: 297876
2017-03-15 19:34:55 +00:00
Ahmed Bougacha 62cd73d989 [GlobalISel][AArch64] Select ADDXri.
We're now able to select ADDWri thanks to the new complex pattern
support.  Extend that to ADDXri.

llvm-svn: 297874
2017-03-15 19:20:59 +00:00
Matt Arsenault 86e02ce2dc AMDGPU: Fix unnecessary ands when packing f16 vectors
computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.

llvm-svn: 297873
2017-03-15 19:04:26 +00:00
Tim Northover 0d98b03b9f ARM: avoid clobbering register in v6 jump-table expansion.
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.

Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.

llvm-svn: 297871
2017-03-15 18:38:13 +00:00
Matt Arsenault 0e6e018054 AMDGPU: Minor SIAnnotateControlFlow cleanups
Newline fixes, early return, range loops.

llvm-svn: 297865
2017-03-15 18:00:12 +00:00
Nemanja Ivanovic ffcf0fb1cc [PowerPC][Altivec] Add mfvrd and mffprd extended mnemonic
mfvrd and mffprd are both alias to mfvrsd.
This patch enables correct parsing of the aliases, but we still emit a mfvrsd.

Committing on behalf of brunoalr (Bruno Rosa).

Differential Revision: https://reviews.llvm.org/D29177

llvm-svn: 297849
2017-03-15 16:04:53 +00:00
Sanjay Patel fa929a2134 Cyle -> Cycle; NFCI
llvm-svn: 297846
2017-03-15 15:37:42 +00:00
Artyom Skrobov e72e1ba434 Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648"
This reverts r297820 which apparently fails on A15 hosts.

llvm-svn: 297842
2017-03-15 14:50:43 +00:00
Simon Pilgrim 6778b8f715 Reverted unintended commit
llvm-svn: 297841
2017-03-15 14:47:30 +00:00
Simon Pilgrim 3804a12fc3 Fix Wint-in-bool-context warning (PR32248)
llvm-svn: 297840
2017-03-15 14:38:19 +00:00
Sam Parker db20d48336 Reverting r297821 due to breaking lld test.
llvm-svn: 297838
2017-03-15 14:06:42 +00:00
Simon Pilgrim 493f4462bf [X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs
Turns out it can happen, so the assertion was too harsh

Found during fuzz testing

llvm-svn: 297833
2017-03-15 13:16:46 +00:00
Petar Jovanovic b71386a4a4 [Mips] Add support to match more patterns for DEXT and CINS
This patch adds support for recognizing more patterns to match to DEXT and
CINS instructions.
It finds cases where multiple instructions could be replaced with a single
DEXT or CINS instruction.

For example, for the following:

define i64 @dext_and32(i64 zeroext %a) {
entry:

 %and = and i64 %a, 4294967295
 ret i64 %and
}

instead of generating:

 0000000000000088 <dext_and32>:

 88:   64010001        daddiu  at,zero,1
 8c:   0001083c        dsll32  at,at,0x0
 90:   6421ffff        daddiu  at,at,-1
 94:   03e00008        jr      ra
 98:   00811024        and     v0,a0,at
 9c:   00000000        nop

the following gets generated:

 0000000000000068 <dext_and32>:

 68:   03e00008        jr      ra
 6c:   7c82f803        dext    v0,a0,0x0,0x20

Cases that are covered:

DEXT:

 1. and $src, mask where mask > 0xffff
 2. zext $src zero extend from i32 to i64

CINS:

 1. and (shl $src, pos), mask
 2. shl (and $src, mask), pos
 3. zext (shl $src, pos) zero extend from i32 to i64

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D30464

llvm-svn: 297832
2017-03-15 13:10:08 +00:00
Simon Pilgrim a0b0b74b9a Align cost model columns. NFCI.
llvm-svn: 297824
2017-03-15 11:57:42 +00:00
Sam Parker 274472f7c5 [ARM] Fix for branch label disassembly for Thumb
Different MCInstrAnalysis classes for arm and thumb mode, each with
their own evaluateBranch implementation. I added a test case and
fixed the coff-relocations test to use '<label>:' rather than
'<label>' in the CHECK-LABEL entries, since the ones without the
colon would match branch targets. Might be worth noticing that
llvm-objdump does not lookup the relocation and thus assigns it a
target depending on the encoded immediate which #0, so it thinks it
branches to the next instruction.

Committed on behalf of Andre Vieira (avieira).

Differential Revision: https://reviews.llvm.org/D30943

llvm-svn: 297821
2017-03-15 10:21:23 +00:00
Artyom Skrobov 3fa5fd1dd2 [Thumb1] Fix the bug when adding/subtracting -2147483648
Differential Revision: https://reviews.llvm.org/D30829

llvm-svn: 297820
2017-03-15 10:19:16 +00:00
Sam Parker 654cb8263a [ARM] Enable SMLAL[B|T] isel
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.

Differential Revision: https://reviews.llvm.org/D30044

llvm-svn: 297809
2017-03-15 08:27:11 +00:00
Daniel Sanders a228df75c0 [globalisel] LLVM_BUILD_GLOBAL_ISEL=OFF should prevent GlobalISel instruction selector from being declared.
llvm-svn: 297786
2017-03-14 22:09:29 +00:00
Daniel Sanders 8a4bae9993 [globalisel][tblgen] Add support for ComplexPatterns
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.

Depends on D30046, D29712

Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30089

llvm-svn: 297782
2017-03-14 21:32:08 +00:00
Simon Pilgrim cf2da96c82 [SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.

ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.

At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.

Differential Revision: https://reviews.llvm.org/D29639

llvm-svn: 297780
2017-03-14 21:26:58 +00:00
Derek Schuff e2688c432f [WebAssembly] Use LEB encoding for value types
Previously we were using the encoded LEB hex values
for the value types.  This change uses the decoded
negative value and the LEB encoder to write them out.

Differential Revision: https://reviews.llvm.org/D30847

Patch by Sam Clegg

llvm-svn: 297777
2017-03-14 20:23:22 +00:00
Evgeniy Stepanov 43dcf4d330 Fix asm printing of associated sections.
Make MCSectionELF::AssociatedSection be a link to a symbol, because
that's how it works in the assembly, and use it in the asm printer.

llvm-svn: 297769
2017-03-14 19:28:51 +00:00
Eli Friedman caea769f11 [ARM] Replace some C++ selection code with TableGen patterns. NFC.
Differential Revision: https://reviews.llvm.org/D30794

llvm-svn: 297768
2017-03-14 18:43:37 +00:00
Krzysztof Parzyszek 9416abbc4a [Hexagon] Fix a condition in HexagonEarlyIfConv.cpp
This fixes llvm.org/PR32265.

llvm-svn: 297745
2017-03-14 15:21:33 +00:00
Artyom Skrobov 2de256642b Fix typo in comment
llvm-svn: 297742
2017-03-14 14:13:19 +00:00
Oliver Stannard 6ee22c41f8 [ARM] Diagnose ARM MOVT without :lower16: or :upper16: expression
This instruction was missing from the list of opcodes that we check, so we were
hitting an llvm_unreachable in ARMMCCodeEmitter.cpp for the ARM MOVT
instruction, rather than the diagnostic that is emitted for the other MOVW/MOVT
instructions.

Differential revision: https://reviews.llvm.org/D30936

llvm-svn: 297739
2017-03-14 13:50:10 +00:00
Artyom Skrobov 283316b5c0 De-duplicate the two implementations of ARMBaseInstrInfo::isProfitableToIfCvt() [NFC]
Reviewers: congh, rengolin

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D30934

llvm-svn: 297738
2017-03-14 13:38:45 +00:00
Sam Parker 916b1ba617 [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.

Differential Revision: https://reviews.llvm.org/D30708

llvm-svn: 297716
2017-03-14 09:13:22 +00:00
Oren Ben Simhon fe34c5e429 Disable Callee Saved Registers
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC.

Differential Revision: https://reviews.llvm.org/D28566

llvm-svn: 297715
2017-03-14 09:09:26 +00:00
Craig Topper 7a5ee1c5ed [AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index.
llvm-svn: 297707
2017-03-14 06:40:04 +00:00
Jonas Paulsson a48ea231c0 [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Craig Topper 9d50e187cd [AVX-512] Pre-emptively fix more places in fastisel where we might copy a VK1 register into a AH/BH/CH/DH register.
llvm-svn: 297704
2017-03-14 04:18:25 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Artyom Skrobov bf19d4bc29 [Thumb1] combine ADDC/SUBC with a negative immediate
Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400

Reviewers: efriedma, jmolloy

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30829

llvm-svn: 297682
2017-03-13 22:36:14 +00:00
Craig Topper 784f241b59 [AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel.
Fixes PR32256. Still planning to do an audit for other possible cases.

llvm-svn: 297678
2017-03-13 21:58:54 +00:00
Simon Pilgrim 9df7d08cb2 [X86][MMX] Fix folding of shift value loads to cover whole 64-bits
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value.

This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source.

Found during fuzz testing.

Differential Revision: https://reviews.llvm.org/D30833

llvm-svn: 297667
2017-03-13 21:23:29 +00:00
Andrew Kaylor a11d020699 Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier
I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced.

llvm-svn: 297664
2017-03-13 20:35:10 +00:00
Matt Arsenault 747bf8afa8 AMDGPU: Re-use TM.getNullPointerValue
llvm-svn: 297662
2017-03-13 20:18:14 +00:00
Matt Arsenault 971c85ebb4 AMDGPU: Treat 0 as private null pointer in addrspacecast lowering
llvm-svn: 297658
2017-03-13 19:47:31 +00:00
Jessica Paquette c984e21394 [Outliner] Add tail call support
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.

Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.

llvm-svn: 297653
2017-03-13 18:39:33 +00:00
Craig Topper 616641632e [X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies.
For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2.

llvm-svn: 297652
2017-03-13 18:34:46 +00:00
Craig Topper eb7ea28bdd [AVX-512] If gather mask is all ones, force the input to a zero vector.
We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too.

Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today.

llvm-svn: 297651
2017-03-13 18:17:46 +00:00
Diana Picus 94db2e288b [ARM] GlobalISel: Support SP in regbankselect
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.

llvm-svn: 297621
2017-03-13 14:28:34 +00:00
Balaram Makam cacc08bb46 [AArch64] Map Sched Read/Write resources for Falkor.
llvm-svn: 297611
2017-03-13 10:42:17 +00:00
Sjoerd Meijer aea3a990a2 ARMDisassembler: loop over ARM decode tables
Loop over the ARM decode tables; this is a clean-up to reduce some code
duplication.

Differential Revision: https://reviews.llvm.org/D30814

llvm-svn: 297608
2017-03-13 09:41:10 +00:00
Craig Topper 48ba1e2d66 [AVX-512] Add VEX_WIG to VEX vcvtsd2ss/vcvtss2sd intrinsic instructions so they can be correctly matched by EVEX2VEX table generation.
llvm-svn: 297601
2017-03-13 05:14:47 +00:00
Craig Topper 08b413acf2 [AVX-512] Use sse_loadf32/f64 for vcvtss2sd and vcvtsd2ss intrinsic patterns.
llvm-svn: 297600
2017-03-13 05:14:44 +00:00
Craig Topper 5a63ca2ad2 [AVX-512] Use sse_load_f64/f32 in VCVTSS2SI/VCVTSD2SI patterns.
llvm-svn: 297599
2017-03-13 03:59:06 +00:00
Craig Topper 111b2d6997 [X86] Remove unused SDTypeProfile. NFC
llvm-svn: 297594
2017-03-12 23:05:03 +00:00
Craig Topper 2b92542908 [X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.
This allows us to remove a duplicate set of patterns.

llvm-svn: 297593
2017-03-12 23:05:00 +00:00
Craig Topper 7d56c8315b [AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics.
The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly.

llvm-svn: 297591
2017-03-12 22:29:12 +00:00
Sanjay Patel f06b963a2b [x86] don't blindly transform SETB into SBB
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. 
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.

This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.

Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.

The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.

Differential Revision: https://reviews.llvm.org/D30611

llvm-svn: 297586
2017-03-12 18:28:48 +00:00
Azharuddin Mohammed 473b75c3d5 Remove CRC32 instructions from AArch64InstrInfo::hasShiftedReg
Summary:
A53 scheduler causes an assertion failure on all CRC instructions:
include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand
&llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i <
getNumOperands() && "getOperand() out of range!"' failed.

The case statements corresponding to CRC instructions are incorrect and should
be removed.

Also adding a testcase while on this.

Reviewers: t.p.northover, javed.absar, apazos, rengolin

Reviewed By: rengolin

Subscribers: evandro, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30274

llvm-svn: 297582
2017-03-12 14:02:32 +00:00
Craig Topper 58647b16e5 [AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask.
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.

I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.

llvm-svn: 297574
2017-03-12 03:37:37 +00:00
Craig Topper 6ab5edfa73 [AVX-512] Remove unused field in X86VectorVTInfo tablegen class.
llvm-svn: 297572
2017-03-12 03:37:32 +00:00
Simon Pilgrim 18debfa5b4 [X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41)
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.

This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.

Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?

Differential Revision: https://reviews.llvm.org/D29841

llvm-svn: 297568
2017-03-11 20:42:31 +00:00
Simon Pilgrim 9ff5732c92 Remove unnecessary whitespace.
llvm-svn: 297567
2017-03-11 20:23:59 +00:00
Craig Topper 02b463270c [X86] Remove unnecessary commented out code. NFC
llvm-svn: 297563
2017-03-11 18:25:56 +00:00
Matt Arsenault dd905b0e9b AMDGPU: Remove packf16 intrinsic
llvm-svn: 297557
2017-03-11 05:51:16 +00:00
Matt Arsenault 3cb9ff8863 AMDGPU: Keep track of modifiers when converting v_mac to v_mad
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.

This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.

Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 297556
2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin 79da2a7698 [AMDGPU] Remove getBidirectionalReasonRank
This method inverts the Reason field of a scheduling candidate.
It does right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer less strong reason
such as Weak over RegCritical because Weak > -RegCritical.

The CandReason enum is properly sorted, so just remove artificial
ranking.

Differential Revision: https://reviews.llvm.org/D30557

llvm-svn: 297536
2017-03-11 00:29:27 +00:00
Krzysztof Parzyszek 0e7b1f83b7 [RDF] Remove the map of reaching defs from copy propagation
Use Liveness::getNearestAliasedRef to find the reaching def instead.

llvm-svn: 297526
2017-03-10 22:44:24 +00:00
Krzysztof Parzyszek 0b8f184d12 [RDF] Implement Liveness::getNearestAliasedRef(Reg, Inst)
This function will find the closest ref node aliased to Reg that is
in an instruction preceding Inst. This could be used to identify the
hypothetical reaching def of Reg, if Reg was a member of Inst.

llvm-svn: 297524
2017-03-10 22:42:17 +00:00
Simon Pilgrim 128a10a41d [X86][SSE] Fix load folding for (V)CVTDQ2PD
This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl

llvm-svn: 297523
2017-03-10 22:35:07 +00:00
Simon Pilgrim bfe263352a [X86] Fix Wunused-lambda-capture warning
llvm-svn: 297521
2017-03-10 22:10:34 +00:00
Eric Christopher f025a89b3c Sink accessing TII to fix release Werror builds.
llvm-svn: 297507
2017-03-10 21:20:17 +00:00
Evandro Menezes 8f70e249a7 [AArch64, X86] Additional debug information for MacroFusion
In order to make it easier to parse information about the performance of
MacroFusion, this patch adds the function and the instruction names to the
debug output of this pass.

llvm-svn: 297504
2017-03-10 20:20:04 +00:00
Konstantin Zhuravlyov ffdb00eda9 [AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets for SI
Differential Revision: https://reviews.llvm.org/D29674

llvm-svn: 297499
2017-03-10 19:39:07 +00:00
Yaxun Liu 874d26a89d Rename PT_NOTE namespace name used in AMDGPUPTNote.h
Patch by Guansong Zhang.

Differential Revision: https://reviews.llvm.org/D30750

llvm-svn: 297498
2017-03-10 19:35:43 +00:00
Simon Pilgrim b02667c469 [APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt
We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64).

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target.

Differential Revision: https://reviews.llvm.org/D30780

llvm-svn: 297458
2017-03-10 13:44:32 +00:00
Simon Dardis 7090d145e8 [mips][msa] Accept more values for constant splats
This patches teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes
that constraint so that any constant value that be splatted is accepted.

As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30640

llvm-svn: 297457
2017-03-10 13:27:14 +00:00
Artyom Skrobov 94fb0bb65f imm_comp_XFORM (defined in ARMInstrThumb.td) duplicates imm_not_XFORM (defined in ARMInstrInfo.td)
Reviewers: grosbach, rengolin, jmolloy

Reviewed By: jmolloy

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D30782

llvm-svn: 297456
2017-03-10 13:21:12 +00:00
Artyom Skrobov 0cc80c1f5a Refactor the multiply-accumulate combines to act on
ARMISD::ADD[CE] nodes, instead of the generic ISD::ADD[CE].

Summary:
This allows for some simplification because the combines
are no longer limited to just one go at the node before
it gets legalized into an ARM target-specific one.

Reviewers: jmolloy, rogfer01

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30401

llvm-svn: 297453
2017-03-10 12:41:33 +00:00
Artyom Skrobov 0c93ceb5d8 For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.

Reviewers: jmolloy, rogfer01, efriedma

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30400

llvm-svn: 297443
2017-03-10 07:40:27 +00:00
Dan Gohman 3a74cfec20 [WebAssembly] Fix the opcode numbers for floating-point le and gt.
llvm-svn: 297420
2017-03-09 23:08:21 +00:00
Krzysztof Parzyszek 544210304f [Hexagon] Fixes to the bitsplit generation
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
  could be reused.

llvm-svn: 297414
2017-03-09 22:02:14 +00:00
Krzysztof Parzyszek fe267a37f4 [Hexagon] Refactor the DAG preprocessing code, NFC
Extract individual transformations into their own functions.

llvm-svn: 297401
2017-03-09 19:14:23 +00:00
Krzysztof Parzyszek 7a0981aa38 [Hexagon] Add -mhvx option to the Hexagon backend
llvm-svn: 297393
2017-03-09 17:05:11 +00:00
Krzysztof Parzyszek 78c4fcf12e [Hexagon] Propagate zext of i1 into arithmetic code in selection DAG
(op ... (zext i1 c) ...) -> (select c (op ... 1 ...),
                                      (op ... 0 ...))

llvm-svn: 297391
2017-03-09 16:29:30 +00:00
Simon Pilgrim e86b7e2256 [X86][SSE] Speed up constant pool shuffle mask decoding with direct copy (PR32037).
If the constants are already the correct size, we can copy them directly into the shuffle mask.

llvm-svn: 297381
2017-03-09 14:06:39 +00:00
Simon Dardis 7577ce2140 [mips] Revert fixes for PR32020.
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.

Revert "[Target/MIPS] Kill dead code, no functional change intended."

This reverts commit r296153.

Revert "Recommit "[mips] Fix atomic compare and swap at O0.""

This reverts commit r296134.

llvm-svn: 297380
2017-03-09 14:03:26 +00:00
Sjoerd Meijer 7f1a982d3d [ARM] remove FIXMEs and add vcmp MC test
Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for
vcmp that was actually missing.

Differential Revision: https://reviews.llvm.org/D30745

llvm-svn: 297376
2017-03-09 13:28:37 +00:00
Simon Dardis 158956c6cc [mips] Fix return lowering
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.

This partially resolves PR/27458

Thanks to Quentin Colombet for reporting the issue!

llvm-svn: 297372
2017-03-09 11:19:48 +00:00
Changpeng Fang 1be9b9f816 AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized.
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D30719

llvm-svn: 297328
2017-03-09 00:07:00 +00:00
Krzysztof Parzyszek 1b7197e690 [Hexagon] Use correct offset when extracting from the high word
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).

llvm-svn: 297288
2017-03-08 15:46:28 +00:00
Daniel Cederman 9db582a656 [Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty()
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: davide, RKSimon, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D27089

llvm-svn: 297285
2017-03-08 15:23:10 +00:00
Simon Pilgrim 836bcc689f [X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 elements wide
By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage.

llvm-svn: 297267
2017-03-08 09:36:39 +00:00
Tim Shen c7472d912b Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size""
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/

Re-commit r296771 to continue testing on the patch.

Sorry for the trouble!

llvm-svn: 297256
2017-03-08 02:41:35 +00:00
Justin Lebar 1d1cf7ba5d [NVPTX] Remove unnecessary isImageReadoOnly(), isImageWriteOnly(), & isImageReadWrite calls
This is repetition of isImage() function in NVPTXUtilities.cpp.

Patch by Briana Grace!

Differential Revision: https://reviews.llvm.org/D30706

llvm-svn: 297252
2017-03-08 01:14:15 +00:00
Matt Arsenault 52d1b62a28 AMDGPU: Don't wait at end of block with a trivial successor
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.

llvm-svn: 297251
2017-03-08 01:06:58 +00:00
Matt Arsenault d8ed207a20 AMDGPU: Constant fold rcp node
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.

llvm-svn: 297248
2017-03-08 00:48:46 +00:00
Changpeng Fang 6b49fa4ca7 AMDGPU/SI: Do not insert EndCf in an unreachable block
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D22025

llvm-svn: 297243
2017-03-07 23:29:36 +00:00
Daniel Sanders 52b4ce727a Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297241
2017-03-07 23:20:35 +00:00
Krzysztof Parzyszek 434d50a796 [Hexagon] Check for presence before looking registers up in bit tracker
llvm-svn: 297240
2017-03-07 23:12:04 +00:00
Krzysztof Parzyszek 8e4d2e0512 [Hexagon] Generate bitsplit instruction
llvm-svn: 297239
2017-03-07 23:08:35 +00:00
Artem Belevich 2524a22562 [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.
Differential Revision: https://reviews.llvm.org/D30672

llvm-svn: 297198
2017-03-07 20:33:38 +00:00
Joel Jones 2852088126 [AArch64] Vulcan is now ThunderXT99
Broadcom Vulcan is now Cavium ThunderX2T99.

LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113

Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).

Patch was tested with SpecCPU2006.

Patch by Stefan Teleman

Differential Revision: https://reviews.llvm.org/D30510

llvm-svn: 297190
2017-03-07 19:42:40 +00:00
Daniel Sanders 8ebec37d26 Revert r297177: Change LLT constructor string into an LLT-based object ...
More module problems. This time it only showed up in the stage 2 compile of
clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile.

Somehow, this change causes the build to need Attributes.gen before it's been
generated.

llvm-svn: 297188
2017-03-07 19:21:23 +00:00
Sanjoy Das c08a79fbf2 [X86] Add option to specify preferable loop alignment
Summary:
Loop alignment can cause a significant change of
the perfromance for short loops.
To be able to evaluate the impact of loop alignment this change
introduces the new option x86-experimental-pref-loop-alignment.
The alignment will be 2^Value bytes, the default value is 4.

Patch by Serguei Katkov!

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D30391

llvm-svn: 297178
2017-03-07 18:47:22 +00:00
Daniel Sanders 8612326a08 [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 297177
2017-03-07 18:32:25 +00:00
John Brawn eba9fdac7e [ARM] Correct handling of LSL #0 in an IT block
The check for LSL #0 in an IT block was checking if operand 4 was zero, but
operand 4 is the condition code operand so it was actually checking for LSLEQ.
Fix this by checking operand 3, which really is the immediate operand, and add
some tests.

Differential Revision: https://reviews.llvm.org/D30692

llvm-svn: 297142
2017-03-07 14:42:03 +00:00
Krzysztof Parzyszek 3cceffb752 [Hexagon] Do not insert instructions before PHI nodes
llvm-svn: 297141
2017-03-07 14:20:19 +00:00
Ranjeet Singh 3d0af578cc [ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other""
The original patch r296865 was reverted as it broke the chromium builds for
Android https://bugs.llvm.org/show_bug.cgi?id=32134, this patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.

The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel this was because the code to manually select this intrinsic still thought
it was the version with no side-effects (INTRINSIC_WO_CHAIN) which is wrong as
it doesn't semantically match the definition in the tablegen code which says it
does have side-effects, I've fixed this by updating the intrinsic type to
INTRINSIC_W_CHAIN (has side-effects). I've also added a test for this based on
Hans original reproducer.

Differential Revision: https://reviews.llvm.org/D30645

llvm-svn: 297137
2017-03-07 11:17:53 +00:00
Jonas Paulsson 1d33cd3988 [SystemZ] Add check VT.isSimple() in canTreateAsByteVector()
Since BB-vectorizer can produce vectors of for example 3 elements,
this check is needed.

Review: Ulrich Weigand
llvm-svn: 297136
2017-03-07 09:49:31 +00:00
Artyom Skrobov 1388e2f792 In Thumb1, materialize a move between low registers as a `movs`, if CPSR isn't live.
Summary: Previously, it had always been materialized as a push/pop sequence.

Reviewers: labrinea, jroelofs

Reviewed By: jroelofs

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30648

llvm-svn: 297134
2017-03-07 09:38:16 +00:00
Ayman Musa 850fc977c8 [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX compressing tables.
X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible.
It uses manually supported 2 large tables that map the EVEX instructions to their VEX ideticals.
This TableGen backend replaces the tables by automatically generating them.

Differential Revision: https://reviews.llvm.org/D30451

llvm-svn: 297127
2017-03-07 08:11:19 +00:00
Ayman Musa ac5a2c43af [X86][AVX512] Add missing entries to EVEX2VEX tables
evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries.

Differential Revision: https://reviews.llvm.org/D30501

llvm-svn: 297126
2017-03-07 08:05:53 +00:00
Tim Shen 70054bb827 Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"
This reverts commit r296771.

We found some wide spread test failures internally. I'm working on a
testcase. Politely revert the patch in the mean time. :)

llvm-svn: 297124
2017-03-07 07:40:10 +00:00
Konstantin Zhuravlyov e8aaab8abe Revert "AMDGPU: Set MCAsmInfo::PointerSize"
It breaks line tables because the patch is not complete, working on a complete one at the moment

This reverts commit r294031.

llvm-svn: 297118
2017-03-07 04:44:33 +00:00
Tim Northover c2c545b8f7 GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

llvm-svn: 297100
2017-03-06 23:50:28 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Chad Rosier 9a70c7c02a [AArch64][Redundant Copy Elim] Add support for CMN and shifted imm.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as a shifted
immediates.

Differential Revision: https://reviews.llvm.org/D30576.

llvm-svn: 297078
2017-03-06 21:20:00 +00:00
Jan Vesely 3ea1704434 AMDGPU/R600: Fix ALU clause markers use detection
also exit early on kill instead of redefinition.

Differential Revision: https://reviews.llvm.org/D30230

llvm-svn: 297060
2017-03-06 20:10:05 +00:00
Reid Kleckner 812191584f [X86] Fix arg copy elision for illegal types
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.

Fixes PR32136 and re-enables copy elision from i64 arguments.

Reverts the workaround in from r296950.

llvm-svn: 297045
2017-03-06 18:39:39 +00:00
Krzysztof Parzyszek 8a4c601abc [Hexagon] Early-if-convert branches that may exit the loop
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.

For example:
  loop:                           loop:
    // loop body                      // loop body
    if (...) jump exit      -->       // more body
  more:                               if (...) jump exit
    // more body                      jump loop
    jump loop

llvm-svn: 297033
2017-03-06 17:24:04 +00:00
Krzysztof Parzyszek e16ce15687 [Hexagon] Mark dead defs as <dead> in expand-condsets
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.

llvm-svn: 297032
2017-03-06 17:09:06 +00:00
Krzysztof Parzyszek 143158b72e [Hexagon] Pick a dot-old instruction that matches the architecture
llvm-svn: 297031
2017-03-06 17:03:16 +00:00
Nemanja Ivanovic 12e67d868a [PowerPC] Fix failure with STBRX when store is narrower than the bswap
Fixes a crash caused by r296811 by truncating the input of the STBRX node
when the bswap is wider than i32.

Fixes https://bugs.llvm.org/show_bug.cgi?id=32140

Differential Revision: https://reviews.llvm.org/D30615

llvm-svn: 297001
2017-03-06 07:32:13 +00:00
Benjamin Kramer bb635e034c [X86] Silence GCC enum compare warning.
X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional
expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType'
[-Werror=enum-compare]

llvm-svn: 296986
2017-03-05 12:53:20 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Sanjay Patel b974be5ef4 [x86] don't require a zext when forming ADC/SBB
The larger goal is to move the ADC/SBB transforms currently in 
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're 
creating ADC/SBB in lots of places where we shouldn't.

This was intended to be an NFC change, but avx-512 has something 
strange going on. It doesn't seem like any of the affected tests 
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...

llvm-svn: 296978
2017-03-04 20:35:19 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Simon Pilgrim 40a0e66b37 [X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks.

Differential Revision: https://reviews.llvm.org/D30213

llvm-svn: 296966
2017-03-04 12:50:47 +00:00
Matthias Braun 21f340fd25 X86ISelLowering: Only perform copy elision on legal types.
This fixes cases where i1 types were not properly legalized yet and lead
to the creating of 0-sized stack slots.

This fixes http://llvm.org/PR32136

llvm-svn: 296950
2017-03-04 01:40:40 +00:00
Sanjay Patel a84fd041c6 [x86] check for commuted add pattern to find ADC/SBB
llvm-svn: 296933
2017-03-04 00:18:31 +00:00
Tim Northover 3e6a7afd81 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

llvm-svn: 296923
2017-03-03 23:05:47 +00:00
Sanjay Patel 7ee83b41e0 [x86] refactor combineAddOrSubToADCOrSBB(); NFCI
The comments were wrong, and this is not an obvious transform.
This hopefully makes it clearer that we're missing the commuted
patterns for adds. It's less clear that this is actually a good
transform for all micro-arch.

This is prep work for trying to clean up the current adc/sbb 
codegen because it's definitely not happening optimally.

llvm-svn: 296918
2017-03-03 22:35:11 +00:00
Krzysztof Parzyszek cc31871dc4 Make TargetInstrInfo::isPredicable take a const reference, NFC
llvm-svn: 296901
2017-03-03 18:30:54 +00:00
Sanjay Patel 58e241896d [x86] clean up materializeSBB(); NFCI
This is producing SBB where it is obviously not necessary, so it needs to be limited.

llvm-svn: 296894
2017-03-03 17:58:39 +00:00
Sanjay Patel e8674825fe [x86] fix formatting; NFC
llvm-svn: 296875
2017-03-03 15:17:41 +00:00
Simon Pilgrim c37a32d2b9 Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask creation
llvm-svn: 296874
2017-03-03 14:37:57 +00:00
Dmitry Preobrazhensky 03880f8d24 [AMDGPU][MC] Fix for Bug 30829 + LIT tests
Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction).
Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code).
Added LIT tests.

llvm-svn: 296873
2017-03-03 14:31:06 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Amjad Aboud 4f97751798 [X86] Generate VZEROUPPER for Skylake-avx512.
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.

Differential Revision: https://reviews.llvm.org/D29874

llvm-svn: 296859
2017-03-03 09:03:24 +00:00
Sjoerd Meijer 69bccf96bd [AArch64AsmParser] rewrite of function parseSysAlias
This is a cleanup/rewrite of the parseSysAlias function. It was not using the
tablegen instruction descriptions, but was “manually” matching the mnemonics
and recreating the operands whereas all this information is already in
tablegen; all this code has been replaced with calls to lookupXYZByName
tablegen calls.

Differential Revision: https://reviews.llvm.org/D30491

llvm-svn: 296857
2017-03-03 08:12:47 +00:00
Igor Breger 321cf3c650 [GlobalISel][X86] Support float/double and vector types.
Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector.

Reviewers: delena, zvi

Reviewed By: zvi

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30533

llvm-svn: 296856
2017-03-03 08:06:46 +00:00
Matt Arsenault 31a58c6ac0 AMDGPU: Fix missing dominator tree dependency
llvm-svn: 296842
2017-03-02 23:50:51 +00:00
Krzysztof Parzyszek e720feb1c6 [Hexagon] Pick the right branch opcode depending on branch probabilities
Specifically, pick the opcode with the correct branch prediction, i.e.
jump:t or jump:nt.

llvm-svn: 296821
2017-03-02 21:49:49 +00:00
Eli Friedman bb821276d0 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.

This patch fixes some cases where we would move stores to the wrong
insert point.

Re-commit with a fix to increment NumMove in the right place.

Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296815
2017-03-02 21:39:39 +00:00
Guozhi Wei ed28e742ee [PPC] Fix code generation for bswap(int32) followed by store16
This patch fixes pr32063.

Current code in PPCTargetLowering::PerformDAGCombine can transform

bswap
store

into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications,

1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().

2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.

Differential Revision: https://reviews.llvm.org/D30362

llvm-svn: 296811
2017-03-02 21:07:59 +00:00
Chad Rosier ea25eca04a [AArch64] Extend redundant copy elimination pass to handle non-zero stores.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle non-zero cases such as:

BB#0:
  cmp x0, #1
  b.eq .LBB0_1
.LBB0_1:
  orr x0, xzr, #0x1  ; <-- redundant copy; x0 known to hold #1.

Differential Revision: https://reviews.llvm.org/D29344

llvm-svn: 296809
2017-03-02 20:48:11 +00:00
Vadzim Dambrouski eafb805506 [MSP430] Add SRet support to MSP430 target
This patch adds support for struct return values to the MSP430
target backend. It also reverses the order of argument and return
registers in the calling convention to bring it into closer
alignment with the published EABI from TI.

Patch by Andrew Wygle (awygle).

Differential Revision: https://reviews.llvm.org/D29069

llvm-svn: 296807
2017-03-02 20:25:10 +00:00
Artem Belevich ee7dd12ff4 [NVPTX] Reduce amount of boilerplate code used to select load instruction opcode.
Make opcode selection code for the load instruction a bit easier
to read and maintain.

This patch also catches number of f16 load/store variants that were
not handled before.

Differential Revision: https://reviews.llvm.org/D30513

llvm-svn: 296785
2017-03-02 19:14:14 +00:00
Artem Belevich 5920babc4f [NVPTX] Added missing LDU/LDG intrinsics for f16.
Differential Revision: https://reviews.llvm.org/D30512

llvm-svn: 296784
2017-03-02 19:14:10 +00:00
Simon Pilgrim b3067dc374 [X86][MMX] Fixed i32 extraction on 32-bit targets
MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal

llvm-svn: 296782
2017-03-02 18:56:06 +00:00
Krzysztof Parzyszek 056c945a5d [Hexagon] Skip blocks that define vector predicate registers in early-if
llvm-svn: 296777
2017-03-02 18:10:59 +00:00
Krzysztof Parzyszek fcbb7d10fe [Hexagon] Properly handle 'q' constraint in 128-byte vector mode
llvm-svn: 296772
2017-03-02 17:50:24 +00:00
Nemanja Ivanovic db8425eff0 [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29881

llvm-svn: 296771
2017-03-02 17:38:59 +00:00
Tim Northover e80d6d1360 GlobalISel: record correct stack usage for signext parameters.
The CallingConv.td rules allocate 8 bytes for these kinds of arguments
on AAPCS targets, but we were only recording the smaller amount. The
difference is theoretical on AArch64 because we don't actually store
more than the smaller amount, but it's still much better to have these
two components in agreement.

Based on Diana Picus's ARM equivalent patch (where it matters a lot
more).

llvm-svn: 296754
2017-03-02 15:34:18 +00:00
Matthew Simpson aee9771ae2 [ARM/AArch64] Update costs for interleaved accesses with wide types
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.

Differential Revision: https://reviews.llvm.org/D29675

llvm-svn: 296751
2017-03-02 15:15:35 +00:00
Matthew Simpson 1bfa159db9 [ARM/AArch64] Support wide interleaved accesses
This patch teaches (ARM|AArch64)ISelLowering.cpp to match illegal vector types
to interleaved access intrinsics as long as the types are multiples of the
vector register width. A "wide" access will now be mapped to multiple
interleave intrinsics similar to the way in which non-interleaved accesses with
illegal types are legalized into multiple accesses. I'll update the associated
TTI costs (in getInterleavedMemoryOpCost) as a follow-on.

Differential Revision: https://reviews.llvm.org/D29466

llvm-svn: 296750
2017-03-02 15:11:20 +00:00
Eli Friedman 933863ce61 Revert r296708; causing test failures on ARM hosts.
Original commit message:

[ARM] Fix insert point for store rescheduling.
    
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.

llvm-svn: 296718
2017-03-02 00:08:50 +00:00
Ahmed Bougacha 120ae22d70 [GlobalISel] Add a way for targets to enable GISel.
Until now, we've had to use -global-isel to enable GISel.  But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.

This first step adds an override for the target to explicitly define its
level of support.  For AArch64, do that using
a new command-line option (I know..):
  -aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.

Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!

While there, remove a couple LLVM_UNLIKELYs.  Building the pipeline is
such a cold path that in practice that shouldn't matter at all.

llvm-svn: 296710
2017-03-01 23:33:08 +00:00
Eli Friedman 1c9216b003 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.
    
Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296708
2017-03-01 23:20:29 +00:00
Eli Friedman 28c2c0e311 [ARM] Check correct instructions for load/store rescheduling.
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
    
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
    
Differential Revision: https://reviews.llvm.org/D30368

llvm-svn: 296701
2017-03-01 22:56:20 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Krzysztof Parzyszek 8144f37dd8 [RDF] Replace {} with explicit constructor, since not all compilers like it
llvm-svn: 296666
2017-03-01 19:59:28 +00:00
Krzysztof Parzyszek ebabd99adb [RDF] Add recursion limit to getAllReachingDefsRec
For large programs this function can take significant amounts of time.
Let it abort gracefully when the program is too complex.

llvm-svn: 296662
2017-03-01 19:30:42 +00:00
Krzysztof Parzyszek 8f23dd6d68 [Hexagon] Fix lowering of formal arguments of type i1
On Hexagon, values of type i1 are passed in registers of type i32,
even though i1 is not a legal value for these registers. This is a
special case and needs special handling to maintain consistency of
the lowering information.

This fixes PR32089.

llvm-svn: 296645
2017-03-01 17:30:10 +00:00
Diana Picus 3841522259 clang-format r296631
Apparently I forgot to run it after fixing up some things...

llvm-svn: 296634
2017-03-01 15:54:21 +00:00
Diana Picus 9c52309b37 [ARM] GlobalISel: Lower call params that need extensions
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.

The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.

On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended paramteres.

llvm-svn: 296631
2017-03-01 15:35:14 +00:00
Oliver Stannard 5d35b9e56c [ARM] Fix parsing of special register masks
This parsing code was incorrectly checking for invalid characters, so an
invalid instruction like:
  msr spsr_w, r0
would be emitted as:
  msr spsr_cxsf, r0

Differential revision: https://reviews.llvm.org/D30462

llvm-svn: 296607
2017-03-01 10:51:04 +00:00
Ayman Musa 9b802e4650 [X86] Fix creating vreg def after use.
llvm-svn: 296601
2017-03-01 10:20:48 +00:00
Dan Gohman 7d7409e553 [WebAssembly] Convert the remaining unit tests to the new wasm-object-file target.
To facilitate this, add a new hidden command-line option to disable
the explicit-locals pass. That causes llc to emit invalid code that doesn't
have all locals converted to get_local/set_local, however it simplifies
testwriting in many cases.

llvm-svn: 296540
2017-02-28 23:37:04 +00:00
Eli Friedman 36795239f5 [ARM] Don't generate deprecated T1 STM.
This prevents generating stm r1!, {r0, r1} on Thumb1, where value
stored for r1 is UNKONWN.

Patch by Zhaoshi Zheng.

Differential Revision: https://reviews.llvm.org/D27910

llvm-svn: 296538
2017-02-28 23:32:55 +00:00
Krzysztof Parzyszek 33fd0bbbe8 [Hexagon] Generate extract instructions more aggressively
llvm-svn: 296537
2017-02-28 23:27:33 +00:00
Krzysztof Parzyszek f208681731 [Hexagon] Fix instruction selection for sign-extending i1 to i64
llvm-svn: 296532
2017-02-28 22:37:01 +00:00
Matt Arsenault 8f016df1ed AMDGPU: Fix types for VOP_I16_I16_I16
llvm-svn: 296523
2017-02-28 21:31:45 +00:00
Matt Arsenault 4d263f6f18 AMDGPU: Add definition for v_swap_b32
This is somewhat tricky because there are two
pairs of tied operands, and it isn't allowed to be
VOP3 encoded.

llvm-svn: 296519
2017-02-28 21:09:04 +00:00
Matt Arsenault 03612631cb AMDGPU: Add definition for v_xad_u32
llvm-svn: 296515
2017-02-28 20:27:30 +00:00
Matt Arsenault 781249833b AMDGPU: Add ds_nop to assembler
llvm-svn: 296513
2017-02-28 20:15:46 +00:00
Matt Arsenault dedc544ac7 AMDGPU: Add definitions for ds_{read|write}_b{96|128}
It's not clear to me if this is always better than
doing ds_write2_b64 This adds the constraint of
a 128-bit register input instead of a pair of
64-bit.

llvm-svn: 296512
2017-02-28 20:15:43 +00:00
Stanislav Mekhanoshin 357d3db0a4 [AMDGPU] Add second pass of the scheduler
If during scheduling we have identified that we cannot keep optimistic
occupancy increase critical register pressure limit and try scheduling
of the whole function again. In this case blocks with smaller pressure
will have a chance for better scheduling.

Differential Revision: https://reviews.llvm.org/D30442

llvm-svn: 296506
2017-02-28 19:20:33 +00:00
Stanislav Mekhanoshin 282e8e4a72 [AMDGPU] New method to estimate register pressure
This change introduces new method to estimate register pressure in
GCNScheduler. Standard RPTracker gives huge error due to the following
reasons:

1. It does not account for live-ins or live-outs if value is not used
in the region itself. That creates a huge error in a very common case
if there are a lot of live-thu registers.
2. It does not properly count subregs.
3. It assumes a register used as an input operand can be reused as an
output. This is not always possible by itself, this is not what RA
will finally do in many cases for various reasons not limited to RA's
inability to do so, and this is not so if the value is actually a
live-thu.

In addition we can now see clear separation between live-in pressure
which we cannot change with the scheduling and tentative pressure
which we can change.

Differential Revision: https://reviews.llvm.org/D30439

llvm-svn: 296491
2017-02-28 17:22:39 +00:00
Konstantin Zhuravlyov 182e9cc6d5 [AMDGPU] Change amd_kernel_code_t's minor version to 1
- We do emit amd_kernel_code_t v1.1

Differential Revision: https://reviews.llvm.org/D30433

llvm-svn: 296489
2017-02-28 17:17:52 +00:00
Stanislav Mekhanoshin 080889cad7 [AMDGPU] Fix read-undef flags when schedule is reverted
If two subregs of the same register are defined and we need to revert
schedule changing def order, we will end up with both instructions
having def,read-undef flags because adjustLaneLiveness() will only set
this flag but will not remove it.

Fix this by removing read-undef flags before calling adjustLaneLiveness.

Differential Revision: https://reviews.llvm.org/D30428

llvm-svn: 296484
2017-02-28 16:26:27 +00:00
Simon Dardis e3cceed3b4 [mips] Fix 64bit slt/sltu/nor with immediates
Patch By: Alexander Richardson

Reviewers: atanasyan, theraven, sdardis

Differential Revision: https://reviews.llvm.org/D30330

llvm-svn: 296482
2017-02-28 15:55:23 +00:00
Daniel Sanders 983c9b98e9 Revert r296474 - [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1.

llvm-svn: 296478
2017-02-28 15:00:27 +00:00
Nirav Dave f830dec3f2 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296476
2017-02-28 14:24:15 +00:00
Daniel Sanders a5afdefec6 [globalisel] Change LLT constructor string into an LLT subclass that knows how to generate it.
Summary:
This will allow future patches to inspect the details of the LLT. The implementation is now split between
the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns.

Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem.

Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar

Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30046

llvm-svn: 296474
2017-02-28 14:21:31 +00:00
Diana Picus 1ffca2aeaf [ARM] GlobalISel: Lower i32 and fp call parameters on the stack
Lower i32, float and double parameters that need to live on the stack. This
boils down to creating some G_GEPs starting from the stack pointer and storing
the values there. During the process we also keep track of the stack size and
use the final value in the ADJCALLSTACKDOWN/UP instructions.

We currently assert for smaller types, since they usually require extensions.
They will be handled in a separate patch.

llvm-svn: 296473
2017-02-28 14:17:53 +00:00
Diana Picus 5a7203a0af [ARM] GlobalISel: Select 32-bit G_CONSTANT
Put it into a register by means of a MOVi.

llvm-svn: 296471
2017-02-28 13:05:42 +00:00
Diana Picus 5b8514559e [ARM] GlobalISel: Add mapping for G_CONSTANT
Like G_FRAME_INDEX, G_CONSTANT has one register operand and one non-register
operand.

llvm-svn: 296469
2017-02-28 12:13:58 +00:00
Diana Picus e6beac6742 [ARM] GlobalISel: Legalize 32-bit constants
llvm-svn: 296468
2017-02-28 11:33:46 +00:00
Diana Picus 9d07094913 [ARM] GlobalISel: Select G_GEP
At this point, G_GEP is just an add, so we treat it exactly like a G_ADD.

llvm-svn: 296462
2017-02-28 10:14:38 +00:00
Oliver Stannard 85d4d5b493 [ARM] Diagnose PC-writing instructions in IT blocks
In Thumb2, instructions which write to the PC are UNPREDICTABLE if they are in
an IT block but not the last instruction in the block.

Previously, we only diagnosed this for LDM instructions, this patch extends the
diagnostic to cover all of the relevant instructions.

Differential Revision: https://reviews.llvm.org/D30398

llvm-svn: 296459
2017-02-28 10:04:36 +00:00
Diana Picus 566a15d749 [ARM] GlobalISel: Add reg bank mapping for G_GEP
This should be the same as the mapping for G_ADD etc.

llvm-svn: 296455
2017-02-28 09:35:10 +00:00
Diana Picus 8598b17076 [ARM] GlobalISel: Legalize G_GEP with 32-bit offsets
At the moment we're only interested in GEPs for putting call parameters on the
stack, so we'll stick to 32-bit offsets.

llvm-svn: 296452
2017-02-28 09:02:42 +00:00
Vadzim Dambrouski cc33fc8722 Test commit, fix typo, NFC.
llvm-svn: 296447
2017-02-28 08:27:43 +00:00
Matthias Braun 81f68ec3a9 Revert "Add MIR-level outlining pass"
Revert Machine Outliner for now, as it breaks the asan bot.

This reverts commit r296418.

llvm-svn: 296426
2017-02-28 02:24:30 +00:00
Matthias Braun d36410945f Add MIR-level outlining pass
This is a patch for the outliner described in the RFC at:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding
repeated sequences of instructions in a program, and replacing them with
calls to functions. This is useful to people working in low-memory
environments, where sacrificing performance for space is acceptable.

This adds an interprocedural outliner directly before printing assembly.
For reference on how this would work, this patch also includes X86
target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

Patch by Jessica Paquette<jpaquette@apple.com>!

rdar://29166825

Differential Revision: https://reviews.llvm.org/D26872

llvm-svn: 296418
2017-02-28 00:33:32 +00:00
Dan Gohman d37dc2f773 [WebAssembly] Add some comments and tidy up whitespace.
llvm-svn: 296402
2017-02-27 22:41:39 +00:00
Matt Arsenault 10268f93e8 AMDGPU: Use v_med3_{f16|i16|u16}
llvm-svn: 296401
2017-02-27 22:40:39 +00:00
Dan Gohman f52ee17a09 [WebAssembly] Split CFG-sorting into its own pass. NFC.
CFG sorting was already an independent algorithm from block/loop insertion;
this change makes it more convenient to debug.

llvm-svn: 296399
2017-02-27 22:38:58 +00:00
Matt Arsenault eb522e68bc AMDGPU: Support v2i16/v2f16 packed operations
llvm-svn: 296396
2017-02-27 22:15:25 +00:00
Sanjay Patel ae7873fe55 [ARM] don't transform an add(ext Cond), C to select unless there's a setcc of the condition
The transform in question claims to be doing:

// fold (add (select cc, 0, c), x) -> (select cc, x, (add, x, c))

...starting in PerformADDCombineWithOperands(), but it wasn't actually checking for a setcc node
for the sext/zext patterns.

This is exactly the opposite of a transform I'd like to add to DAGCombiner's foldSelectOfConstants(),
so I was seeing infinite loops with my draft of a patch applied.

The changes in select_const.ll look positive (less instructions). The change in arm-and-tst-peephole.ll
is unrelated. We're changing the input IR in that test to preserve the intent of the test, but that's 
not affected by this code change.

Differential Revision:
https://reviews.llvm.org/D30355

llvm-svn: 296389
2017-02-27 21:30:54 +00:00
Matt Arsenault c9f2517e96 AMDGPU: Add some of the new gfx9 VOP3 instructions
llvm-svn: 296382
2017-02-27 21:04:41 +00:00
Simon Pilgrim 5c4efcdddf [X86][SSE] Attempt to extract vector elements through target shuffles
DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Differential Revision: https://reviews.llvm.org/D30176

llvm-svn: 296381
2017-02-27 21:01:57 +00:00
Matt Arsenault 7596f13d15 AMDGPU: Support inlineasm for packed instructions
Add packed types as legal so they may be used with inlineasm.
Keep all operations expanded for now.

llvm-svn: 296379
2017-02-27 20:52:10 +00:00
Matt Arsenault 2ed2193218 AMDGPU: Don't fold immediate if clamp/omod are set
Doesn't fix any practical problems because clamp/omod
are currently folded after peephole optimizer.

llvm-svn: 296375
2017-02-27 20:21:31 +00:00
Matt Arsenault 3cb390498e AMDGPU: Fold omod into instructions
llvm-svn: 296372
2017-02-27 19:35:42 +00:00
Matt Arsenault e2d1d3a940 AMDGPU: Add f16 to shader calling conventions
Mostly useful for writing tests for f16 features.

llvm-svn: 296370
2017-02-27 19:24:47 +00:00
Matt Arsenault 9be7b0d485 AMDGPU: Add VOP3P instruction format
Add a few non-VOP3P but instructions related to packed.

Includes hack with dummy operands for the benefit of the assembler

llvm-svn: 296368
2017-02-27 18:49:11 +00:00
Krzysztof Parzyszek e9be35596e [Hexagon] Defs and clobbers can overlap
llvm-svn: 296365
2017-02-27 18:03:35 +00:00
Craig Topper 7502119ce8 [X86] Use APInt instead of SmallBitVector tracking undef elements from getTargetConstantBitsFromNode and getConstVector.
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30392

llvm-svn: 296355
2017-02-27 16:15:32 +00:00
Craig Topper 3917ca2af4 [X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30390

llvm-svn: 296354
2017-02-27 16:15:30 +00:00
Craig Topper e1be95c3d0 [X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap allocation
Some of the vectors are under sized to avoid heap allocation. In one case the vector was oversized.

Differential Revision: https://reviews.llvm.org/D30387

llvm-svn: 296353
2017-02-27 16:15:27 +00:00
Craig Topper 53e5a38da9 [X86] Use APInt instead of SmallBitVector for tracking undef elements in constant pool shuffle decoding
Summary:
SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc.

APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30386

llvm-svn: 296352
2017-02-27 16:15:25 +00:00
Sjoerd Meijer 32ecac7ac8 AArch64InstPrinter: rewrite of printSysAlias
This is a cleanup/rewrite of the printSysAlias function. This was not using the
tablegen instruction descriptions, but was "manually" decoding the
instructions. This has been replaced with calls to lookup_XYZ_ByEncoding
tablegen calls.

This revealed several problems. First, instruction IVAU had the wrong encoding.
This was cancelled out by the parser that incorrectly matched the wrong
encoding. Second, instruction CVAP was missing from the SystemOperands tablegen
descriptions, so this has been added. And third, the required target features
were not captured in the tablegen descriptions, so support for this has also
been added.

Differential Revision: https://reviews.llvm.org/D30329

llvm-svn: 296343
2017-02-27 14:45:34 +00:00
John Brawn c97b714ffb [ARM] LSL #0 is an alias of MOV
Currently we handle this correctly in arm, but in thumb we don't which leads to
an unpredictable instruction being emitted for LSL #0 in an IT block and SP not
being permitted in some cases when it should be.

For the thumb2 LSL we can handle this by making LSL #0 an alias of MOV in the
.td file, but for thumb1 we need to handle it in checkTargetMatchPredicate to
get the IT handling right. We also need to adjust the handling of
MOV rd, rn, LSL #0 to avoid generating the 16-bit encoding in an IT block. We
should also adjust it to allow SP in the same way that it is allowed in
MOV rd, rn, but I haven't done that here because it looks like it would take
quite a lot of work to get right.

Additionally correct the selection of the 16-bit shift instructions in
processInstruction, where it was checking if the two registers were equal when
it should have been checking if they were low. It appears that previously this
code was never executed and the 16-bit encoding was selected by default, but
the other changes I've done here have somehow made it start being used.

Differential Revision: https://reviews.llvm.org/D30294

llvm-svn: 296342
2017-02-27 14:40:51 +00:00
Sjoerd Meijer 6d171006f4 AArch64AsmParser: don't try to parse “[1]” for non-vector register operands
There are no instructions that have "[1]" as part of the assembly string;
FMOVXDhighr is out of date. This removes dead code.

Differential Revision: https://reviews.llvm.org/D30165

llvm-svn: 296327
2017-02-27 10:51:11 +00:00
Konstantin Zhuravlyov 972948b36e [AMDGPU] Runtime metadata fixes:
- Verify that runtime metadata is actually valid runtime metadata when assembling, otherwise we could accept the following when assembling, but ocl runtime will reject it:
    .amdgpu_runtime_metadata
    { amd.MDVersion: [ 2, 1 ], amd.RandomUnknownKey, amd.IsaInfo: ...
  - Make IsaInfo optional, and always emit it.

Differential Revision: https://reviews.llvm.org/D30349

llvm-svn: 296324
2017-02-27 07:55:17 +00:00
Craig Topper ed0101a0b9 [X86] Check for less than 0 rather than explicit compare with -1. NFC
llvm-svn: 296321
2017-02-27 06:05:30 +00:00
Craig Topper 6028584d8c [X86] Fix execution domain for cmpss/sd instructions.
llvm-svn: 296293
2017-02-26 06:45:59 +00:00
Craig Topper 036693302b [AVX-512] Fix execution domain for scalar commutable min/max instructions.
llvm-svn: 296292
2017-02-26 06:45:56 +00:00
Craig Topper e70231be51 [AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps.
llvm-svn: 296291
2017-02-26 06:45:54 +00:00
Craig Topper fe25988c68 [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.
llvm-svn: 296290
2017-02-26 06:45:51 +00:00
Craig Topper 49ba3f5406 [AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and VPBROADCASTWr_Alt instructions. NFC
llvm-svn: 296289
2017-02-26 06:45:48 +00:00
Craig Topper 6bf9b809ce [AVX-512] Fix execution domain for VPMADD52 instructions.
llvm-svn: 296288
2017-02-26 06:45:45 +00:00
Craig Topper aa8e903150 [AVX-512] Fix the execution domain for VSCALEF instructions.
llvm-svn: 296286
2017-02-26 06:45:40 +00:00
Craig Topper cac5d698df [AVX-512] Fix execution domain of scalar VRANGE/REDUCE/GETMANT with sae.
llvm-svn: 296285
2017-02-26 06:45:37 +00:00
Craig Topper ed64904c74 [X86] Fix the execution domain for scalar SQRT intrinsic instruction.
llvm-svn: 296284
2017-02-26 06:45:35 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Eric Christopher 4a8208c266 vec perm can go down either pipeline on P8.
No observable changes, spotted while looking at the scheduling description.

llvm-svn: 296277
2017-02-26 00:11:58 +00:00
Simon Pilgrim 0f5fb5f549 [APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied)
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296272
2017-02-25 20:01:58 +00:00
Craig Topper 2caa97c891 [AVX-512] Fix the execution domain for scalar FMA instructions.
llvm-svn: 296271
2017-02-25 19:36:28 +00:00
Craig Topper 176f3310b6 [AVX-512] Fix the execution domain on some instructions.
llvm-svn: 296270
2017-02-25 19:18:11 +00:00
Craig Topper d2011e3612 [AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS using the scalar register class. We only have patterns for the masked intrinsics.
llvm-svn: 296264
2017-02-25 18:43:42 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Junmo Park 7ff4c045eb Minor code cleanup. NFC.
llvm-svn: 296207
2017-02-25 00:08:53 +00:00
Dan Gohman 82607f56bd [WebAssembly] Add support for using a wasm global for the stack pointer.
This replaces the __stack_pointer variable which was allocated in linear
memory.

llvm-svn: 296201
2017-02-24 23:46:05 +00:00
Krzysztof Parzyszek 0d67b10a3c [Hexagon] Undo shift folding where it could simplify addressing mode
For example, avoid (single shift):
  r0 = and(##536870908,lsr(r0,#3))
  r0 = memw(r1+r0<<#0)

in favor of (two shifts):
  r0 = lsr(r0,#5)
  r0 = memw(r1+r0<<#2)

llvm-svn: 296196
2017-02-24 23:34:24 +00:00
Dan Gohman d934cb8806 [WebAssembly] Basic support for Wasm object file encoding.
With the "wasm32-unknown-unknown-wasm" triple, this allows writing out
simple wasm object files, and is another step in a larger series toward
migrating from ELF to general wasm object support. Note that this code
and the binary format itself is still experimental.

llvm-svn: 296190
2017-02-24 23:18:00 +00:00
Krzysztof Parzyszek be5028aed3 [Hexagon] Prettify code in HexagonDAGToDAGISel::Select
llvm-svn: 296187
2017-02-24 23:00:40 +00:00
Wei Ding 4d3d4ca1b3 AMDGPU : Replace FMAD with FMA when denormals are enabled.
Differential Revision: http://reviews.llvm.org/D29958

llvm-svn: 296186
2017-02-24 23:00:29 +00:00
Stanislav Mekhanoshin 42259cf35e Revert "Correct register pressure calculation in presence of subregs"
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.

llvm-svn: 296182
2017-02-24 21:56:16 +00:00
Dan Gohman 6999c4fd28 [WebAssembly] Handle f16 in fast-isel.
llvm-svn: 296172
2017-02-24 21:05:35 +00:00
Davide Italiano 74f27b80d4 [Target/MIPS] Kill dead code, no functional change intended.
Hopefully placates gcc with -Werror.

llvm-svn: 296153
2017-02-24 18:48:10 +00:00
Simon Pilgrim cdf2bd656a Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296147
2017-02-24 18:31:04 +00:00
Nemanja Ivanovic 195c5452d3 [PowerPC] Use subfic instruction for subtract from immediate
Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate.
The corresponding pattern already exists for 32-bit integers.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29387

llvm-svn: 296144
2017-02-24 18:16:06 +00:00
Nemanja Ivanovic 82d53ed492 [PowerPC] Use rldicr instruction for AND with an immediate if possible
Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that
clear bits from the right hand size.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29388

llvm-svn: 296143
2017-02-24 18:03:16 +00:00
Simon Pilgrim bd9fb2ae95 [APInt] Add APInt::extractBits() method to extract APInt subrange
The current pattern for extract bits in range is typically:

Mask.lshr(BitOffset).trunc(SubSizeInBits);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable.

This is another of the compile time issues identified in PR32037 (see also D30265).

This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation.

Differential Revision: https://reviews.llvm.org/D30336

llvm-svn: 296141
2017-02-24 17:46:18 +00:00
Simon Dardis ae6f2bcb25 Recommit "[mips] Fix atomic compare and swap at O0."
This time with the missing files.

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296134
2017-02-24 16:32:18 +00:00
Simon Dardis 3c58c18ff0 Revert "[mips] Fix atomic compare and swap at O0."
This reverts r296132. I forgot to include the tests.

llvm-svn: 296133
2017-02-24 16:30:27 +00:00
Simon Dardis cf0e06d375 [mips] Fix atomic compare and swap at O0.
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296132
2017-02-24 16:27:45 +00:00
Daniel Sanders 066ebbfd46 [globalisel] Decouple src pattern operands from dst pattern operands.
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.

The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.

The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
  %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
  %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.

Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.

Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.

Depends on D29711

Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar

Reviewed By: rovka

Subscribers: aemerson, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29712

llvm-svn: 296131
2017-02-24 15:43:30 +00:00
Simon Pilgrim 7f6a7c97a7 [X86][SSE] Target shuffle combine can try to combine up to 16 vectors
Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth).

llvm-svn: 296130
2017-02-24 15:35:52 +00:00
Sanjay Patel 9f0fa52aa2 [x86] use DAG.getAllOnesConstant(); NFCI
llvm-svn: 296128
2017-02-24 15:09:59 +00:00
Simon Dardis aa20881749 [mips] Handle 64 bit immediate in and/or/xor pseudo instructions on mips64
Previously LLVM was assuming 32-bit signed immediates which results in and with
a bitmask that has bit 31 set to incorrectly include bits 63-32 in the result.
After applying this patch I can now compile all of the FreeBSD mips assembly
code with clang.

This issue also affects the nor, slt and sltu macros and I will fix those in a
separate review.

Patch By: Alexander Richardson

Commit message reformatted by sdardis.

Reviewers: atanasyan, theraven, sdardis

Differential Revision: https://reviews.llvm.org/D30298

llvm-svn: 296125
2017-02-24 14:34:32 +00:00
Diana Picus 3b99c64ba1 [ARM] GlobalISel: Select G_STORE
Same as selecting G_LOAD.

llvm-svn: 296122
2017-02-24 14:01:27 +00:00
Diana Picus 1f432f995a [ARM] GlobalISel: Add reg bank mappings for stores
Same as the ones for loads.

llvm-svn: 296115
2017-02-24 13:07:25 +00:00
Diana Picus a2b632a353 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296108
2017-02-24 11:28:24 +00:00
Simon Dardis a5f52dc00d [mips][mc] Fix a crash when disassembling odd sized sections
Make the MIPS disassembler consistent with the other targets in returning
a Size of zero when the input buffer cannot contain an instruction due
to it's size. Previously it reported the minimum instruction size when
it failed due to the buffer not being big enough for an instruction
causing llvm-objdump to crash when disassembling all sections.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D29984

llvm-svn: 296105
2017-02-24 10:50:27 +00:00
Diana Picus c21d1e5d94 Revert "[ARM] GlobalISel: Legalize stores"
This reverts commit r296103 because the test broke on one of the bots. Sorry!

llvm-svn: 296104
2017-02-24 10:35:39 +00:00
Diana Picus a5f1cfd1a7 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296103
2017-02-24 10:19:23 +00:00
Simon Pilgrim aed352273e [APInt] Add APInt::setBits() method to set all bits in range
The current pattern for setting bits in range is typically:

Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos);

Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable.

This is one of the key compile time issues identified in PR32037.

This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible.

I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial.

Differential Revision: https://reviews.llvm.org/D30265

llvm-svn: 296102
2017-02-24 10:15:29 +00:00
Dan Gohman 411dc07aba [WebAssembly] Add a README.txt entry for mergeable sections.
llvm-svn: 296095
2017-02-24 07:33:55 +00:00
Craig Topper 8783bbb598 [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC
llvm-svn: 296094
2017-02-24 07:21:10 +00:00
Craig Topper f2529c188b [AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select.
Clang has been emitting cltz intrinsics for a while now.

llvm-svn: 296091
2017-02-24 05:35:04 +00:00
Petr Hosek a7d5916308 [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack
The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D30237

llvm-svn: 296081
2017-02-24 03:10:10 +00:00
Artem Belevich 620db1f3dd [NVPTX] Added support for .f16x2 instructions.
This patch enables support for .f16x2 operations.

Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.

Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310

llvm-svn: 296032
2017-02-23 22:38:24 +00:00
Tim Northover 063a56e81c ARM: make sure FastISel bails on f64 operations for Cortex-M4.
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.

The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.

llvm-svn: 296031
2017-02-23 22:35:00 +00:00
Krzysztof Parzyszek 128e191eac [Hexagon] Handle saturations in Hexagon bit tracker
llvm-svn: 296026
2017-02-23 22:11:52 +00:00
Krzysztof Parzyszek 998e49e5c8 [Hexagon] Allow setting register in BitVal without storing into map
In the bit tracker, references to other bit values in which the register
is 0 are prohibited. This means that generating self-referential register
cells like { w:32 [0-15]:s[0-15] [16-31]:s[15] } is impossible. In order
to get a self-referential cell, it had to be stored into a map and then
reloaded from it. To avoid this step, add a function that will set the
register to a given value without going through the map.

llvm-svn: 296025
2017-02-23 22:08:50 +00:00
Stanislav Mekhanoshin 78468e48cf [AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC.
Clang issues warning about hidden overload. That was intended, so
add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it.

llvm-svn: 296021
2017-02-23 21:51:28 +00:00
Evgeniy Stepanov ee2d77f6d6 Disable TLS for stack protector on Android API<17.
The TLS slot did not exist back then.

llvm-svn: 296014
2017-02-23 21:06:35 +00:00
Stanislav Mekhanoshin ce3ddd2de4 Correct register pressure calculation in presence of subregs
If a subreg is used in an instruction it counts as a whole superreg
for the purpose of register pressure calculation. This patch corrects
improper register pressure calculation by examining operand's lane mask.

Differential Revision: https://reviews.llvm.org/D29835

llvm-svn: 296009
2017-02-23 20:19:44 +00:00
Krzysztof Parzyszek 2cfc7a48de [Hexagon] Avoid IMPLICIT_DEFs as new-value producers
llvm-svn: 295997
2017-02-23 17:47:34 +00:00
Jan Vesely 70293a045b AMDGPU/SI: Fix trunc i16 pattern
Hit on ASICs that support 16bit instructions.

Differential Revision: https://reviews.llvm.org/D30281

llvm-svn: 295990
2017-02-23 16:12:21 +00:00
Krzysztof Parzyszek af5ff65d67 [Hexagon] Patterns for CTPOP, BSWAP and BITREVERSE
llvm-svn: 295981
2017-02-23 15:02:09 +00:00
Diana Picus a8cb0cd8f2 [ARM] GlobalISel: Lower call returns
Introduce a common ValueHandler for call returns and formal arguments, and
inherit two different versions for handling the differences (at the moment the
only difference is the way physical registers are marked as used).

llvm-svn: 295973
2017-02-23 14:18:41 +00:00
Diana Picus a606713c33 [ARM] GlobalISel: Lower call parameters in regs
Add support for lowering calls with parameters than can fit into regs.  Use the
same ValueHandler that we used for function returns, but rename it to match its
new, extended purpose.

llvm-svn: 295971
2017-02-23 13:25:43 +00:00
Ayman Musa 4b2c968c43 [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register class of their first input when creating node in fast-isel.
(Quick fix to buildbot failure after rL295940 commit).

llvm-svn: 295970
2017-02-23 13:15:44 +00:00
Simon Dardis d410fc8f28 [mips][ias] Further relax operands of certain assembly instructions
This patch adjusts the most relaxed predicate of immediate operands to accept
immediate forms such as ~(0xf0000000|0x000f00000). Previously these forms
would be accepted by GAS and rejected by IAS.

This partially resolves PR/30383.

Thanks to Sean Bruno for reporting the issue!

Reviewers: slthakur, seanbruno

Differential Revision: https://reviews.llvm.org/D29218

llvm-svn: 295965
2017-02-23 12:40:58 +00:00
Kristof Beyls 5ac6adbb6d Fix assertion failure in ARMConstantIslandPass.
The ARMConstantIslandPass didn't have support for handling accesses to
constant island objects through ARM::t2LDRBpci instructions. This adds
support for that.

This fixes PR31997.

llvm-svn: 295964
2017-02-23 12:24:55 +00:00
Ayman Musa 524dbdaa2b [X86][AVX512] Remove VCVTSS2SDZ & VCVTSD2SSZ from memory folding tables as they introduce new read dependency when folding.
(Quick fix to buildbot fail). 

llvm-svn: 295946
2017-02-23 08:13:36 +00:00
Ayman Musa 6e670cf44f [X86][AVX512] Change VCVTSS2SD and VCVTSD2SS node types to keep consistency between VEX/EVEX versions.
AVX versions of the converts work on f32/f64 types, while AVX512 version work on vectors.

Differential Revision: https://reviews.llvm.org/D29988

llvm-svn: 295940
2017-02-23 07:24:21 +00:00
Matt Arsenault f0a88dbaab LoadStoreVectorizer: Split even sized illegal chains properly
Implement isLegalToVectorizeLoadChain for AMDGPU to avoid
producing private address spaces accesses that will need to be
split up later. This was doing the wrong thing in the case
where the queried chain was an even number of elements.

A possible <4 x i32> store was being split into
store <2 x i32>
store i32
store i32

rather than
store <2 x i32>
store <2 x i32>

when legal.

llvm-svn: 295933
2017-02-23 03:58:53 +00:00
Matt Arsenault a9e16e6597 AMDGPU: Add another BFE pattern
This is the pattern that falls out of the instruction's
definition if offset == 0.

llvm-svn: 295912
2017-02-23 00:23:43 +00:00
Matt Arsenault 79a45db7f5 AMDGPU: Use clamp with f64
llvm-svn: 295908
2017-02-22 23:53:37 +00:00
Matt Arsenault d5c6515b68 AMDGPU: Fold FP clamp as modifier bit
The manual is unclear on the details of this. It's not
clear to me if denormals are not allowed with clamp,
or if that is only omod. Not allowing denorms for
fp16 or fp64 isn't useful so I also question if that
is really a restriction. Same with whether this is valid
without IEEE mode enabled.

llvm-svn: 295905
2017-02-22 23:27:53 +00:00
Wei Ding f2cce02eb2 AMDGPU : Update TrapCode based on Trap Handler ABI.
Differential Revision: http://reviews.llvm.org/D30232

llvm-svn: 295904
2017-02-22 23:22:19 +00:00
Matt Arsenault f5262256a1 AMDGPU: Add replacement bfe intrinsics
llvm-svn: 295899
2017-02-22 23:04:58 +00:00
Krzysztof Parzyszek ab57c2bad3 [Hexagon] Implement @llvm.readcyclecounter()
llvm-svn: 295892
2017-02-22 22:28:47 +00:00
Matt Arsenault 7b6c5d28f5 AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR
This should avoid reporting any stack needs to be allocated in the
case where no stack is truly used. An unused stack slot is still
left around in other cases where there are real stack objects
but no spilling occurs.

llvm-svn: 295891
2017-02-22 22:23:32 +00:00
Krzysztof Parzyszek 3596a81c69 [RDF] Support for partial structural aliases in RegisterAggr
llvm-svn: 295883
2017-02-22 21:42:15 +00:00
Krzysztof Parzyszek 65971d97b0 [Hexagon] Add intrinsics for masked vector stores
Patch by Harsha Jagasia.

llvm-svn: 295879
2017-02-22 21:23:09 +00:00
Matt Arsenault 93e65ea733 AMDGPU: Don't look at chain users when adjusting writemask
Fixes not adjusting using new intrinsics with chains.

llvm-svn: 295878
2017-02-22 21:16:41 +00:00
Matt Arsenault 707780b420 AMDGPU: Always allocate emergency stack slot at offset 0
This allows us to ensure that 0 is never a valid pointer
to a user object, and ensures that the offset is always legal
without needing a register to access it. This comes at the cost
of usable offsets and wasted stack space.

llvm-svn: 295877
2017-02-22 21:05:25 +00:00
Matt Arsenault 61ec6a03ca AMDGPU: Change exp with compr bit printing
llvm-svn: 295873
2017-02-22 20:37:12 +00:00
Wei Ding 6ade56e0a0 Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."
This reverts commit r295867.

llvm-svn: 295871
2017-02-22 20:29:22 +00:00
Wei Ding 4991d3570f AMDGPU : Update TrapCode based on Trap Handler ABI.
Differential Revision: http://reviews.llvm.org/D30232

llvm-svn: 295867
2017-02-22 20:05:06 +00:00
Geoff Berry 6bb79157dd [AArch64] Extend AArch64RedundantCopyElimination to do simple copy propagation.
Summary:
Extend AArch64RedundantCopyElimination to catch cases where the register
that is known to be zero is COPY'd in the predecessor block.  Before
this change, this pass would catch cases like:

      CBZW %W0, <BB#1>
  BB#1:
      %W0 = COPY %WZR // removed

After this change, cases like the one below are also caught:

      %W0 = COPY %W1
      CBZW %W1, <BB#1>
  BB#1:
      %W0 = COPY %WZR // removed

This change results in a 4% increase in static copies removed by this
pass when compiling the llvm test-suite.  It also fixes regressions
caused by doing post-RA copy propagation (a separate change to be put up
for review shortly).

Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30113

llvm-svn: 295863
2017-02-22 19:10:45 +00:00
Dan Gohman 38b42b4a95 [WebAssembly] Define a table of function signatures for runtime library calls.
LLVM CodeGen emits references to external symbols that are never declared in
LLVM IR level, so they have no declared signature. However, WebAssembly requires
all functions be declared with signatures. This patch adds a table for providing
signatures for known runtime libcalls that will be used in subsequent patches to
emit declarations for such functions.

llvm-svn: 295857
2017-02-22 18:34:16 +00:00
Krzysztof Parzyszek ace1b89060 [RDF] Skip undef uses when calculating kill flags
llvm-svn: 295856
2017-02-22 18:29:16 +00:00
Krzysztof Parzyszek ba36b92bef [RDF] Only access block live-ins when tracking liveness
llvm-svn: 295855
2017-02-22 18:27:36 +00:00
Dan Gohman a63e8eb138 [WebAssembly] Configure codegen to legalize f16 values.
llvm-svn: 295850
2017-02-22 16:28:00 +00:00
Simon Pilgrim 13cdd57964 [X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks.
Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead.

llvm-svn: 295848
2017-02-22 15:38:13 +00:00
Simon Pilgrim 3a895c4873 [X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI.
llvm-svn: 295845
2017-02-22 15:04:55 +00:00
Simon Pilgrim 3b97067ae8 Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' checking
llvm-svn: 295830
2017-02-22 13:37:31 +00:00
Benjamin Kramer 5a7e0f8357 [GlobalISel] Fix compiler warnings and make assert assert something.
llvm-svn: 295827
2017-02-22 12:59:47 +00:00
Igor Breger f7359d893a [X86][GlobalISel] Initial implementation , select G_ADD gpr, gpr
Summary: Initial implementation for X86InstructionSelector. Handle selection COPY and G_ADD/G_SUB gpr, gpr .

Reviewers: qcolombet, rovka, zvi, ab

Reviewed By: rovka

Subscribers: mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29816

llvm-svn: 295824
2017-02-22 12:25:09 +00:00
Roger Ferrer Ibanez 56db97d4de [ARM] Fix constant islands pass.
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.

Thus, remove the tPUSH as well in this case.

Differential Revision: https://reviews.llvm.org/D30207

llvm-svn: 295816
2017-02-22 09:06:21 +00:00
Ayman Musa ceea56c705 [X86] Fix memory operands definition for some instructions.
Change integer memory operands to FP memory operands to some FP instructions.

Differential Revision: https://reviews.llvm.org/D30201

llvm-svn: 295813
2017-02-22 08:06:29 +00:00
Javed Absar b672722810 [ARM] Classification Improvements to ARM Sched-Models. NFCI.
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.

Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953

llvm-svn: 295811
2017-02-22 07:22:57 +00:00
Craig Topper 56d4022997 [AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.

I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.

Differential revision: https://reviews.llvm.org/D30186

llvm-svn: 295810
2017-02-22 06:54:18 +00:00
Dan Gohman 18eafb6c68 [WebAssembly] Add skeleton MC support for the Wasm container format
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.

Differential Revision: https://reviews.llvm.org/D26722

llvm-svn: 295803
2017-02-22 01:23:18 +00:00
Matt Arsenault 1f17c66890 AMDGPU: Add cvt.pkrtz intrinsic
Convert llvm.SI.packf16 test uses

llvm-svn: 295797
2017-02-22 00:27:34 +00:00
Matt Arsenault 9417505f7d AMDGPU: Remove llvm.AMDGPU.clamp intrinsic
llvm-svn: 295789
2017-02-21 23:46:04 +00:00
Matt Arsenault 2fdf2a1a18 AMDGPU: Redefine clamp node as clamp 0.0-1.0
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.

Also allow using clamp with f16, and use knowledge
of dx10_clamp.

llvm-svn: 295788
2017-02-21 23:35:48 +00:00
Artem Belevich 29bbdc1c32 [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
  which returns an array of flags marking boundaries of vectorized
  load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

llvm-svn: 295784
2017-02-21 22:56:05 +00:00
Matt Arsenault 7d6b71db4f AMDGPU: Formatting fixes
llvm-svn: 295783
2017-02-21 22:50:41 +00:00
Evandro Menezes a8d3301ee1 [AArch64, X86] Add statistics for the MacroFusion pass
llvm-svn: 295777
2017-02-21 22:16:13 +00:00
Evandro Menezes b9b7f4b8d3 [AArch64, X86] Guard against both instrs being wild cards
If both instrs are wild cards, the result can be a crash.

llvm-svn: 295776
2017-02-21 22:16:11 +00:00
Evgeniy Stepanov 1fd19c6e5d Fix PR31896.
Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset).

llvm-svn: 295762
2017-02-21 20:17:34 +00:00
Matt Arsenault c2a44e4c3c AMDGPU: Remove llvm.AMDGPU.flbit intrinsic
llvm-svn: 295754
2017-02-21 19:27:33 +00:00
Matt Arsenault e0bf7d02f0 AMDGPU: Don't use stack space for SGPR->VGPR spills
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.

I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.

The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.

llvm-svn: 295753
2017-02-21 19:12:08 +00:00
Geoff Berry 5d534b6a11 [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746
2017-02-21 18:53:14 +00:00
Simon Pilgrim 8eb515d8c4 [X86] EltsFromConsecutiveLoads SDLoc argument should be const&.
There appears never to have been a time that the reference was updated.

llvm-svn: 295739
2017-02-21 17:42:28 +00:00
Simon Pilgrim 791955819c [X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets.
As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ.

llvm-svn: 295733
2017-02-21 16:41:44 +00:00
John Brawn a6e95e1652 [ARM] Correct SP/PC handling in t2MOVr
PC isn't allowed in the source operand of t2MOVr, so change the register class
to one without PC. SP handling is slightly trickier and changes depending on if
we're in ARMv8, so do that in checkTargetMatchPredicate.

Differential Revision: https://reviews.llvm.org/D30199

llvm-svn: 295732
2017-02-21 16:41:29 +00:00
Simon Pilgrim 3546156122 [X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL.
This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns.

llvm-svn: 295723
2017-02-21 15:09:00 +00:00
Igor Breger 812f319794 [AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index.
Differential Revision: https://reviews.llvm.org/D30189

llvm-svn: 295718
2017-02-21 14:01:25 +00:00
Diana Picus 613b65696a [ARM] GlobalISel: Lower calls to void() functions
For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair
with amount 0.

llvm-svn: 295716
2017-02-21 11:33:59 +00:00
Craig Topper d88389aa7e [X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs
Summary:
Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.

This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.

Reviewers: zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30181

llvm-svn: 295697
2017-02-21 06:39:13 +00:00
Craig Topper 16d9730b86 [X86] Fix formatting. NFC
llvm-svn: 295695
2017-02-21 06:27:13 +00:00
Craig Topper d9fe664868 [AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns.
llvm-svn: 295693
2017-02-21 04:26:10 +00:00
Craig Topper d890db6952 [AVX-512] Fix the ExeDomain for vcmpss/vcmpsd.
llvm-svn: 295691
2017-02-21 04:26:04 +00:00
Sanjoy Das 90208720e3 Add a wrapper around copy_if in STLExtras; NFC
I will add one more use for this in a later change.

llvm-svn: 295685
2017-02-21 00:38:44 +00:00
Craig Topper 2012dda9a0 [AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0.
llvm-svn: 295673
2017-02-20 17:44:09 +00:00
Simon Pilgrim 2967ed1c7e [X86] Tidyup combineExtractVectorElt. NFCI.
Pull out repeated code for extraction index operand and source vector value type.

Use isNullConstant helper to check for zero extraction index.

llvm-svn: 295670
2017-02-20 16:09:45 +00:00
Diana Picus 1c33c9f0b0 [ARM] GlobalISel: Don't select atomic loads
There used to be a check in the IRTranslator that prevented us from having to
deal with atomic loads/stores. That check has been removed in r294993 and the
AArch64 backend was updated accordingly. This commit does the same thing for the
ARM backend.

In general, in the ARM backend we introduce fences during the atomic expand
pass, so we don't have to worry about atomics, *except* for the 32-bit ARMv8
target, which handles atomics more like AArch64. Since we don't want to worry
about that yet, just bail out of instruction selection if we find any atomic
loads.

llvm-svn: 295662
2017-02-20 14:45:58 +00:00
Igor Breger fda32d266a [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
Its more profitable to go through memory (1 cycles throughput)
than using VMOVD + VPERMV/PSHUFB sequence ( 2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index.
IACA tool was used to get performace estimation (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer)
For example for var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. 
Removing the VINSERT node, we don't need it any more.

Differential Revision: https://reviews.llvm.org/D29690

llvm-svn: 295660
2017-02-20 14:16:29 +00:00
Simon Pilgrim 5910ebe720 [X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX
Use v8i64 ASHR instructions if we don't have VLX.

Differential Revision: https://reviews.llvm.org/D28537

llvm-svn: 295656
2017-02-20 12:16:38 +00:00
Sjoerd Meijer e22a79e898 AArch64AsmParser: tablegen the isBranchTarget helper functions
Use tablegen to autogenerate isBranchtarget helper functions. This is a cleanup
that removes almost identical functions that differ only in a few constants.

Differential Revision: https://reviews.llvm.org/D30160

llvm-svn: 295649
2017-02-20 10:57:54 +00:00
Ayman Musa 51ffeab8c8 [X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value.
Add WIG value to all of AVX instructions which ignore the W-bit in their encoding, instead of giving them the default value of 0.
This patch is needed for a follow up work on EVEX2VEX pass (replacing EVEX encoded instructions with their corresponding VEX version when possible).

Differential Revision: https://reviews.llvm.org/D29876

llvm-svn: 295643
2017-02-20 08:27:54 +00:00
Craig Topper c6c68f5958 [AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0.
llvm-svn: 295640
2017-02-20 07:00:40 +00:00
Craig Topper a5fa2e40f9 [AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns.
llvm-svn: 295638
2017-02-20 07:00:34 +00:00
Craig Topper 5b4e36aafa [AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2.
llvm-svn: 295634
2017-02-20 02:47:42 +00:00
Craig Topper c184b671d9 [X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size.
An earlier commit already did this for the register form.

llvm-svn: 295626
2017-02-20 00:37:23 +00:00
Craig Topper 63801df251 [AVX-512] Remove AddedComplexity from masked operations. The size of the patterns already increases their priority.
llvm-svn: 295619
2017-02-19 21:44:35 +00:00
Simon Pilgrim 14a7eee0b4 [X86] Use peekThroughOneUseBitcasts helper. NFCI.
llvm-svn: 295618
2017-02-19 21:40:51 +00:00
Davide Italiano 16b476ffcc [X86] Prefer static_cast<> to C-style cast. NFCI.
llvm-svn: 295617
2017-02-19 21:35:41 +00:00
Craig Topper 489057715e [AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on it own.
llvm-svn: 295616
2017-02-19 21:32:15 +00:00
Davide Italiano 1aef59eb44 [AArch64] Prefer static_cast<> to C-style cast. NFCI.
llvm-svn: 295615
2017-02-19 21:31:14 +00:00
Simon Pilgrim d590de2998 [X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements.
Replaces existing approach that could only search BUILD_VECTOR nodes.

Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. 

llvm-svn: 295613
2017-02-19 19:40:31 +00:00
Craig Topper 4e794c71a6 [AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0.
This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched.

llvm-svn: 295612
2017-02-19 19:36:58 +00:00
Simon Pilgrim 4271186f9c [X86][SSE] Enable initial support for domain crossing at high shuffle combine depths.
As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths.

We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles.

llvm-svn: 295608
2017-02-19 17:19:38 +00:00
Simon Pilgrim 6d07d514de [X86][SSE] Generalize INSERTPS/SHUFPS/SHUFPD combines across domains.
Relax the INSERTPS/SHUFPS/SHUFPD combines to support integer inputs if permitted.

llvm-svn: 295606
2017-02-19 15:15:40 +00:00
Simon Pilgrim b4460cf5a9 [X86][SSE] Add domain crossing support for target shuffle combines.
Add the infrastructure to flag whether float and/or int domains are permitable.

A future patch will enable domain crossing based off shuffle depth and the value types of the source vectors.

llvm-svn: 295604
2017-02-19 14:12:25 +00:00
Craig Topper 218d1a020e [AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch.
The instructions are marked commutable, but without special handling we don't get the immediate correct.

While here also remove the masked memory forms that aren't commutable.

llvm-svn: 295602
2017-02-19 08:03:26 +00:00
Craig Topper 007c93b2b9 [X86] Remove patterns for MOVSD with v4i32 types. We don't appear to really need them and if we do we should just use a bitcast to a 64-bit element type.
llvm-svn: 295589
2017-02-19 02:08:48 +00:00
Craig Topper 06ae5e821c [X86] Tighten up some of the SDNode type constraints.
llvm-svn: 295588
2017-02-19 01:54:47 +00:00
Simon Pilgrim 599b872ca2 [X86] Fix enumeral/non-enumeral conditional expression warning.
gcc only allows you to mix enums / ints if they have the same signedness.

llvm-svn: 295586
2017-02-19 00:04:30 +00:00
Simon Pilgrim ee6ef4d6dd Fix 'variable set but not used' warning when assertions are disabled.
llvm-svn: 295585
2017-02-19 00:03:46 +00:00
Simon Pilgrim 2f2d8dc630 Fix signed/unsigned comparison warning.
llvm-svn: 295580
2017-02-18 22:56:17 +00:00
Craig Topper 811756b4dc [X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously.
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.

llvm-svn: 295579
2017-02-18 22:53:43 +00:00
Simon Pilgrim b092166a76 [AArch64] Fix enumeral/non-enumeral conditional expression warning.
gcc only allows you to mix enums / ints if they have the same signedness.

llvm-svn: 295577
2017-02-18 22:50:28 +00:00
Simon Pilgrim 7a87eebcad [X86] Fix enumeral/non-enumeral comparison warning.
gcc only allows you to mix enums / ints if they have the same signedness.

llvm-svn: 295576
2017-02-18 22:40:58 +00:00
Simon Pilgrim 2e78c94ea5 [X86][SSE] Avoid repeated calls to SDValue::getValueType.
Added assertion to check input type of X86ISD::VZEXT during target known bits calculation.

llvm-svn: 295575
2017-02-18 22:25:27 +00:00
Craig Topper de10312bea Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
Clang has now been fixed to not use these intrinsics.

llvm-svn: 295571
2017-02-18 21:50:58 +00:00
Sanjay Patel 12c2093e1e [x86] fold sext (xor Bool, -1) --> sub (zext Bool), 1
This is the same transform that is current used for:
select Bool, 0, -1

llvm-svn: 295568
2017-02-18 21:03:28 +00:00
Craig Topper ba2a726cc6 Revert "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
This reverts r295564. I missed that clang was still using the intrinsics despite our half implemented autoupgrade support.

llvm-svn: 295565
2017-02-18 20:14:20 +00:00
Craig Topper 884db3f85d [X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR.
It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses.

llvm-svn: 295564
2017-02-18 19:51:25 +00:00
Matt Arsenault 2021f08080 AMDGPU: Fix assembler subtarget predicate for gfx9
This was accepting GFX9 instructions on VI.

llvm-svn: 295557
2017-02-18 19:12:26 +00:00
Matt Arsenault a3b3b489fb AMDGPU: Fix disassembly of aperture registers
llvm-svn: 295555
2017-02-18 18:41:41 +00:00
Matt Arsenault e823d92f7f AMDGPU: Merge initial gfx9 support
llvm-svn: 295554
2017-02-18 18:29:53 +00:00
Craig Topper a505169ca5 [AVX-512] Remove 128/256-bit masked fp max/min intrinsics. Upgrade them to legacy unmasked intrinsics and select instructions.
llvm-svn: 295543
2017-02-18 07:07:50 +00:00
Jan Vesely 4b1243facb AMDGPU/R600: Assert on infinite loop in EmitClauseMarkers
Differential Revision: https://reviews.llvm.org/D29792

llvm-svn: 295539
2017-02-18 04:24:10 +00:00
Dylan McKay 83788ff349 [AVR] Set UseIntegratedAssembler
llvm-svn: 295535
2017-02-18 02:26:11 +00:00
Matthias Braun d9a59a8df8 AArch64LoadStoreOptimizer: Correctly clear kill flags
When promoting the Load of a Store-Load pair to a COPY all kill flags
between the store and the load need to be cleared.

rdar://30402435

Differential Revision: https://reviews.llvm.org/D30110

llvm-svn: 295512
2017-02-17 23:15:03 +00:00
Guozhi Wei 7ec2c72095 [PPC] Give unaligned memory access lower cost on processor that supports it
Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.

This patch fixes pr31492.

Differential Revision: https://reviews.llvm.org/D28630

This is resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug.

llvm-svn: 295506
2017-02-17 22:29:39 +00:00
Krzysztof Parzyszek 1aaf41af54 [Hexagon] Start using regmasks on calls
Reapply r295371 with a fix for the Windows bot failures.

llvm-svn: 295504
2017-02-17 22:14:51 +00:00
Simon Pilgrim 7db8f42fe3 [X86] Simplify by pulling out valuetype. NFCI.
llvm-svn: 295502
2017-02-17 22:10:10 +00:00
Simon Pilgrim a4c350ff17 [X86][SSE] Add (V)MOVD folding pattern with zextloadi64i32 load node.
Fixes PRPR31309

llvm-svn: 295492
2017-02-17 20:43:32 +00:00
Matt Arsenault f6cf1032fd AMDGPU: Fix crashes on invalid icmp/fcmp intrinsics
llvm-svn: 295489
2017-02-17 19:49:10 +00:00
Artyom Skrobov 4592f6206c In Thumb1 mode, the custom lowering for ARMISD::CMPZ could never emit tADDi3
Reviewers: jmolloy, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D30097

llvm-svn: 295478
2017-02-17 18:59:16 +00:00
Joel Jones ab0f3b43e3 [AArch64] Add Cavium ThunderX support
This set of patches adds support for Cavium ThunderX ARM64 processors:

  * ThunderX
  * ThunderX T81
  * ThunderX T83
  * ThunderX T88

Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D28891

llvm-svn: 295475
2017-02-17 18:34:24 +00:00
Sam Parker 58af0c55d2 [ARM] Replace HasT2ExtractPack with HasDSP
Removed the HasT2ExtractPack feature and replaced its references
with HasDSP. This then allows the Thumb2 extend instructions to be
selected for ARMv8M +dsp. These instruction descriptions have also
been refactored and more target tests have been added for their isel.

Differential Revision: https://reviews.llvm.org/D29623

llvm-svn: 295452
2017-02-17 15:42:44 +00:00
Diana Picus e836878bf1 [ARM] GlobalISel: Clean up some helpers
Return invalid opcodes when some of the helpers in the instruction selection
pass can't handle a given combination.

llvm-svn: 295446
2017-02-17 13:44:19 +00:00
Diana Picus 38699dbac5 [ARM] GlobalISel: Check mappings used by reg bank select
Add some asserts to make sure we're using the mappings that we think we're
using. This is to keep us from accidentally breaking functionality while moving
to TableGen'erated mappings.

llvm-svn: 295441
2017-02-17 13:14:25 +00:00
Diana Picus 7cab0786bd [ARM] GlobalISel: Use Subtarget in Legalizer
Start using the Subtarget to make decisions about what's legal. In particular,
we only mark floating point operations as legal if we have VFP2, which is
something we should've done from the very start.

llvm-svn: 295439
2017-02-17 11:25:17 +00:00
Rafael Espindola 6eab4044b9 Revert "[Hexagon] Start using regmasks on calls"
This reverts commit r295371.

It broke windows bots:

http://bb.pgr.jp/builders/ninja-clang-i686-msc19-R/builds/11402/steps/test-llvm/logs/stdio

llvm-svn: 295402
2017-02-17 02:08:58 +00:00
David Blaikie fc4857f80b Fix -Wunused-lambda-capture by removing some unused lambda captures
llvm-svn: 295373
2017-02-16 20:55:48 +00:00
Krzysztof Parzyszek fb9503c080 [Hexagon] Start using regmasks on calls
All the cool targets are doing it...

llvm-svn: 295371
2017-02-16 20:25:23 +00:00
Krzysztof Parzyszek cac10f9768 [RDF] Aggregate shadow phi uses into one cluster when propagating live info
llvm-svn: 295366
2017-02-16 19:28:06 +00:00
Matt Arsenault b95ddd7cea AMDGPU: Remove llvm.AMDGPU.cube intrinsic
llvm-svn: 295359
2017-02-16 19:09:04 +00:00
Matt Arsenault eb65cda986 AMDGPU: Remove llvm.AMDGPU.rsq intrinsic
llvm-svn: 295358
2017-02-16 19:08:58 +00:00
Hans Wennborg 35905d6a67 Re-apply r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)"
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.

llvm-svn: 295357
2017-02-16 19:04:42 +00:00
Krzysztof Parzyszek 84cd4ea301 [RDF] Differentiate between defining and clobbering nodes
Defining nodes should not alias with one another, while clobbering
nodes can. When pushing defs on stacks, push clobbers first, link
non-clobbering defs, then push the defs.

The data flow in a statement is now: uses -> clobbers -> defs.

llvm-svn: 295356
2017-02-16 18:53:04 +00:00
Krzysztof Parzyszek 5226ba8daa [RDF] Move normalize(RegisterRef) to PhysicalRegisterInfo
Remove the duplicate from DFG and make some members of PRI private.

llvm-svn: 295351
2017-02-16 18:45:23 +00:00
Andrea Di Biagio 42f7712e23 x86 interrupt calling convention: only save xmm registers if the target supports SSE
The existing code always saves the xmm registers for 64-bit targets even if the
target doesn't support SSE (which is common for kernels). Thus, the compiler
inserts movaps instructions which lead to CPU exceptions when an interrupt
handler is invoked.

This commit fixes this bug by returning a register set without xmm registers
from getCalleeSavedRegs and getCallPreservedMask for such targets.

Patch by Philipp Oppermann.

Differential Revision: https://reviews.llvm.org/D29959

llvm-svn: 295347
2017-02-16 18:25:37 +00:00
Sjoerd Meijer cb2d950214 [AArch64] AArch64AsmParser clean up of isImmediate functions. NFC
Regression test neon-diagnostics.s needed changing because it now
produces a more specific diagnostic about the immediate ranges. One
change in the expected error message is not obvious, but there multiple
candidate and it happens to pick the immediate diagnostic.

Differential Revision: https://reviews.llvm.org/D29939

llvm-svn: 295331
2017-02-16 15:52:22 +00:00
Dan Gohman 4a5496902c [WebAssembly] Add a cast to void to fix an unused private member warning, for now.
llvm-svn: 295327
2017-02-16 15:21:37 +00:00
Simon Pilgrim 2fe568c95e [X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead.
llvm-svn: 295326
2017-02-16 15:11:49 +00:00
Diana Picus 1540b06ef8 [ARM] GlobalISel: Select floating point loads
llvm-svn: 295321
2017-02-16 14:10:50 +00:00
Diana Picus b1701e0b05 [ARM] GlobalISel: Select G_SEQUENCE and G_EXTRACT
Since they're only used for passing around double precision floating point
values into the general purpose registers, we'll lower them to VMOVDRR and
VMOVRRD.

llvm-svn: 295310
2017-02-16 12:19:57 +00:00
Diana Picus 6beef3c087 [ARM] GlobalISel: Select double G_FADD and copies
Just use VADDD if available, bail out if not.

llvm-svn: 295309
2017-02-16 12:19:52 +00:00
Diana Picus 9b32faa821 [ARM] GlobalISel: Assert that we don't use the FPR bank if we don't have VFP
llvm-svn: 295308
2017-02-16 11:25:09 +00:00
Diana Picus a93803b9fe [ARM] GlobalISel: Add reg bank mappings for G_SEQUENCE and G_EXTRACT
Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating
point values in the soft-fp float mode.

llvm-svn: 295306
2017-02-16 11:00:31 +00:00
Diana Picus 7f82c87022 [ARM] GlobalISel: Make the FPR bank 64-bit wide
Also add mappings for single and double precision FP, and use them for G_FADD
and G_LOAD.

llvm-svn: 295302
2017-02-16 10:12:49 +00:00
Diana Picus 21c3d8e0fc [ARM] GlobalISel: Legalize 64-bit G_FADD and G_LOAD
For now we just mark them as legal all the time and let the other passes bail
out if they can't handle it. In the future, we'll want to move more of the
brains into the legalizer.

llvm-svn: 295300
2017-02-16 09:09:49 +00:00
Diana Picus ca6a890d7f [ARM] GlobalISel: Lower double precision FP args
For the hard float calling convention, we just use the D registers.

For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.

For pure soft float targets, we still bail out because we don't support the
libcalls yet.

llvm-svn: 295295
2017-02-16 07:53:07 +00:00
Craig Topper 715873ead3 [AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked intrinsics with select instructions. For 512-bit add new unmasked intrinsics.
The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO.

llvm-svn: 295290
2017-02-16 06:31:54 +00:00
Matt Arsenault d3e5cb77e4 AMDGPU: Remove llvm.SI.sendmsg
llvm-svn: 295270
2017-02-16 02:01:17 +00:00
Matt Arsenault d2c8a337aa AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics
Update test uses with expansion in terms of new intrinsics.

llvm-svn: 295269
2017-02-16 02:01:13 +00:00
Hans Wennborg a468601e0e [X86] Re-enable conditional tail calls and fix PR31257.
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.

Differential Revision: https://reviews.llvm.org/D29856

llvm-svn: 295262
2017-02-16 00:04:05 +00:00
Tim Northover 9136617a3f GlobalISel: legalize va_arg on AArch64.
Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.

llvm-svn: 295255
2017-02-15 23:22:50 +00:00
Matt Arsenault 824de226a1 AMDGPU: Remove dead node definitions
llvm-svn: 295247
2017-02-15 22:23:04 +00:00
Matt Arsenault a78ca62c64 AMDGPU: Consolidate sendmsg/sendmsghalt handling and tests
llvm-svn: 295244
2017-02-15 22:17:09 +00:00
Matt Arsenault d122abead4 AMDGPU: Replace assert with report_fatal_error
Also use a more refined condition.

llvm-svn: 295239
2017-02-15 21:50:34 +00:00
Simon Pilgrim 5b4c30fb32 [X86][SSE] Don't call EltsFromConsecutiveLoads if any element is missing.
Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't both calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow.

llvm-svn: 295235
2017-02-15 21:09:00 +00:00
Ahmed Bougacha f8acf568f1 [AArch64] Make am_ldrlit an iPTR - not OtherVT - operand. NFC-ish.
am_ldrlit diverged from am_brcond in r207105, but kept the OtherVT
operand type.  It made sense for branch targets, as those are
represented as MVT::Other in SDAG.  But loads operate on pointers.

This shouldn't have an observable effect on any in-tree code, but helps
make the patterns consistent for external users.

llvm-svn: 295229
2017-02-15 20:38:31 +00:00
Simon Pilgrim da25d5c7b6 [X86][SSE] Propagate undef upper elements from scalar_to_vector during shuffle combining
Only do this for integer types currently - floats types (in particular insertps) load folding often fails with this.

llvm-svn: 295208
2017-02-15 17:41:33 +00:00
Stanislav Mekhanoshin 582a5237f9 [AMDGPU] Revert failed scheduling
This patch reverts region's scheduling to the original untouched state
in case if we have have decreased occupancy.

In addition it switches to use TargetRegisterInfo occupancy callback
for pressure limits instead of gradually increasing limits which were
just passed by. We are going to stay with the best schedule so we do
not need to tolerate worsened scheduling anymore.

Differential Revision: https://reviews.llvm.org/D29971

llvm-svn: 295206
2017-02-15 17:19:50 +00:00
Simon Pilgrim 0f0e5bd3c6 [X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs
Add support for specifying an UNPCK input as ZERO, particularly improves ZEXT cases with non-zero offsets

llvm-svn: 295169
2017-02-15 11:46:15 +00:00
Sagar Thakur ec65792910 [LLVM][XRAY][MIPS] Support xray on mips/mipsel/mips64/mips64el
Summary: Adds support for xray instrumentation on mips for both 32-bit and 64-bit.

Reviewed by sdardis, dberris
Differential: D27697

llvm-svn: 295164
2017-02-15 10:48:11 +00:00
Ayman Musa b8a4f255dd [X86][AVX] Remove REX_W from AVX instructions.
There is no meaning for REX_W in VEX encoded AVX instruction.

Differential Revision: https://reviews.llvm.org/D29894

llvm-svn: 295157
2017-02-15 08:12:16 +00:00
Craig Topper fbc7805e25 [X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types
Summary:
We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.

As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast.

I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.

This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused.

Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases were we don't remove a useless insert into element 1 before broadcasting element 0.

Reviewers: delena, RKSimon, zvi

Reviewed By: zvi

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D28747

llvm-svn: 295155
2017-02-15 06:58:47 +00:00
Craig Topper ec5df5f4aa [AVX-512] Add PACKSS/PACKUS instructions to load folding tables.
llvm-svn: 295154
2017-02-15 06:51:39 +00:00
Stanislav Mekhanoshin 19f98c6a09 [AMDGPU] Fix MaxWorkGroupsPerCU for large workgroups
This patch corrects the maximum workgroups per CU if we have big
workgroups (more than 128). This calculation contributes to the
occupancy calculation in respect to LDS size.

Differential Revision: https://reviews.llvm.org/D29974

llvm-svn: 295134
2017-02-15 01:03:59 +00:00
Simon Dardis 454f2e7840 [mips] Correct mips16 return instructions definitions
Correct the definition of MIPS16 instructions that act as return instructions
so that isReturn = 1 as expected.

llvm-svn: 295109
2017-02-14 21:53:23 +00:00
Tim Northover 398c5f57f9 GlobalISel: deal with new G_PTR_MASK instruction on AArch64.
It's just an AND-immediate instruction for us, surprisingly simple to select.

llvm-svn: 295104
2017-02-14 20:56:29 +00:00
Krzysztof Parzyszek d3b5641586 [Hexagon] Remove leftover debugging code
llvm-svn: 295078
2017-02-14 17:37:44 +00:00
Diego Novillo 8adfc8ef3a Remove unused variable.
llvm-svn: 295065
2017-02-14 16:39:54 +00:00
Simon Pilgrim 6f732e026d [X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise UNDEF inputs
Add support for specifying an UNPCK input as UNDEF

llvm-svn: 295061
2017-02-14 16:22:04 +00:00
Alexander Timofeev 9f61feac4a Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track"
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.

llvm-svn: 295054
2017-02-14 14:29:05 +00:00
Simon Pilgrim a0878dea9e [X86][SSE] Move unary inputs handling inside matchVectorShuffleWithUNPCK.
llvm-svn: 295053
2017-02-14 13:47:17 +00:00
Simon Pilgrim 3efdffcb27 [X86][SSE] Tidyup matchVectorShuffleWithUNPCK helper function call.
Don't bother setting the V1/V2 operands again for unary shuffles.

Don't bother legalizing the value type unless the match succeeds.

llvm-svn: 295051
2017-02-14 12:54:39 +00:00
Craig Topper d2d50cba2a [AVX-512] Add PAVGB/PAVGW to load folding tables.
llvm-svn: 295035
2017-02-14 06:54:57 +00:00
Alex Bradbury e4f731b813 [RISCV] Fix RV32 datalayout string and ensure initAsmInfo is called
llvm-svn: 295028
2017-02-14 05:20:20 +00:00
Alex Bradbury 6be16fbfb8 [RISCV] Pseudo instructions are isCodeGenOnly, have blank asmstr
llvm-svn: 295027
2017-02-14 05:17:23 +00:00
Alex Bradbury d36e04cb6c [RISCV] Fix unused variable in RISCVMCTargetDesc. NFC
Also, for better uniformity use TargetRegistry::RegisterMCAsmInfo rather than 
RegisterMCAsmInfoFn. Again, no functional change.

llvm-svn: 295026
2017-02-14 05:15:24 +00:00
Eugene Zelenko d96089b248 [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
Same changes in files affected by reduced MC headers dependencies.

llvm-svn: 295009
2017-02-14 00:33:36 +00:00
Andrew Kaylor 709f1c2a9b [X86] Add MXCSR register
This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions.

Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now.

Differential Revision: https://reviews.llvm.org/D29903

llvm-svn: 295004
2017-02-13 23:38:52 +00:00
Tim Northover 48dfa1a6ed GlobalISel: represent atomic loads & stores via the MachineMemOperand.
Also make sure the AArch64 backend doesn't try to convert them into normal
loads and stores.

llvm-svn: 294993
2017-02-13 22:14:16 +00:00
James Molloy 0ae2202235 [ARM] Fix crash caused by r294945
I'd missed a creator of FCMP nodes - duplicateCmp().

Kindly and promptly reported by Gabor Ballabas, due to his CSiBE test suite.

llvm-svn: 294968
2017-02-13 17:18:00 +00:00
Simon Dardis 509da1a46d [mips] divide macro instruction cleanup.
Clean up the implementation of divide macro expansion by getting rid of a
FIXME regarding magic numbers and branch instructions. Match GAS' behaviour
for expansion of ddiv / div in the two and three operand cases. Add the two
operand alias for MIPSR6. Finally, optimize macro expansion cases where the
divisior is the $zero register.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D29887

llvm-svn: 294960
2017-02-13 16:06:48 +00:00
Simon Pilgrim fd6a84fbaa Fix indentation. NFCI.
llvm-svn: 294959
2017-02-13 15:31:08 +00:00
Sanne Wouda 490d4a6da6 [CodeGen] fix alignment of JUMPTABLE_INSTS on v8M.base
Summary:
The attached test case fails with "fatal error: error in backend:
misaligned pc-relative fixup value" as the jump table is misaligned.
The EmitAlignment existed already for ARM and Thumb-1 code, but was
missing for Thumb-2.

The test checks that the fatal error disappears when generating an obj
file, as well as checking the align directive is there when producing an
asm file.


Reviewers: rengolin, grosbach, t.p.northover, jmolloy, SjoerdMeijer, samparker

Reviewed By: samparker

Subscribers: samparker, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D29650

llvm-svn: 294950
2017-02-13 14:07:45 +00:00
James Molloy 92497542e7 [Thumb-1] TBB generation: spot redefinitions of index register
We match a sequence of 3-4 instructions into a tTBB pseudo. One of our checks is that
a particular register in that sequence is killed (so it can be clobbered by the pseudo).

We weren't noticing if an errant MOV or other instruction had infiltrated the
sequence we were walking. If it had, and it defined the register we've already
identified as killed, it makes it live across the tBR_JT and thus unclobberable.

Notice this case and bail out.

llvm-svn: 294949
2017-02-13 14:07:39 +00:00
James Molloy 9b3b899669 [ARM] Register ConstantIslands with the pass manager
This allows us to use -stop-before/-stop-after/-run-pass - we can now write
.mir tests.

llvm-svn: 294948
2017-02-13 14:07:25 +00:00
James Molloy d508789668 [ARM] Use VCMP, not VCMPE, for floating point equality comparisons
When generating a floating point comparison we currently unconditionally
generate VCMPE. This has the sideeffect of setting the cumulative Invalid
bit in FPSCR if any of the operands are QNaN.

It is expected that use of a relational predicate on a QNaN value should
raise Invalid. Quoting from the C standard:

  The relational and equality operators support the usual mathematical
  relationships between numeric values. For any ordered pair of numeric
  values exactly one of relationships the less, greater, equal and is true.
  Relational operators may raise the floating-point exception when argument
  values are NaNs.

The standard doesn't explicitly state the expectation for equality operators,
but the implication and obvious expectation is that equality operators
should not raise Invalid on a QNaN input, as those predicates are wholly
defined on unordered inputs (to return not equal).

Therefore, add a new operand to ARMISD::FPCMP and FPCMPZ indicating if
QNaN should raise Invalid, and pipe that through to TableGen.

llvm-svn: 294945
2017-02-13 12:32:47 +00:00
Simon Pilgrim 828dee1f70 [X86][SSE] Create matchVectorShuffleWithUNPCK helper function.
Currently only used by target shuffle combining - will use it for lowering as well in a future patch.

llvm-svn: 294943
2017-02-13 11:52:58 +00:00
Ayman Musa f77219e035 [X86][AVX512] Fix operand classes for some AVX512 instructions to keep consistency between VEX/EVEX versions of the same instruction.
Differential Revision: https://reviews.llvm.org/D29873

llvm-svn: 294937
2017-02-13 09:55:48 +00:00
Craig Topper 680c73e7ab [X86] Genericize the handling of INSERT_SUBVECTOR from an EXTRACT_SUBVECTOR to support 512-bit vectors with 128-bit or 256-bit subvectors.
We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operations for 512-bit vectors.

llvm-svn: 294931
2017-02-13 04:53:29 +00:00
Craig Topper 53eafa8ea4 [X86] Don't let LowerEXTRACT_SUBVECTOR call getNode for EXTRACT_SUBVECTOR.
This results in the simplifications inside of getNode running while we're legalizing nodes popped off the worklist during the final DAG combine. This basically makes a DAG combine like operation occur during this legalize step, but we don't handle something quite the same way. I think we don't recursively added the removed nodes to the DAG combiner worklist.

llvm-svn: 294929
2017-02-12 23:49:46 +00:00
Simon Pilgrim cc9242bd1c [X86] Fix typo in function name. NFCI.
convertBitVectorToUnsiged - convertBitVectorToUnsigned

llvm-svn: 294914
2017-02-12 20:53:44 +00:00
Craig Topper cfe8ce3a58 [AVX-512] Add various EVEX move instructions to load folding tables using the VEX equivalents as a guide.
llvm-svn: 294908
2017-02-12 18:47:46 +00:00
Craig Topper 5971b5488e [AVX-512] Add VMOV64toSDZrm CodeGenOnly instruction based on the same instruction from AVX/SSE.
I can't prove that we can select this instruction or the AVX/SSE version, but I'm adding it for consistency for now so I can continue matching the load folding tables.

llvm-svn: 294907
2017-02-12 18:47:44 +00:00
Craig Topper ec26801483 [X86] Fix a couple instruction names to use 'mr' instead of 'rm' to indicate they are stores. AVX-512 version was already named with 'mr'.
llvm-svn: 294906
2017-02-12 18:47:40 +00:00
Craig Topper 6eca3170a8 [AVX-512] Add VPEXTRD/Q to load folding tables.
llvm-svn: 294905
2017-02-12 18:47:37 +00:00
Simon Pilgrim 04ec0f2b2a [X86][SSE] Update argument names to match function name. NFCI.
The target shuffle match function arguments were using the term 'Ops' but the function names referred to them as 'Inputs' - use 'Inputs' consistently.

llvm-svn: 294900
2017-02-12 16:46:41 +00:00
Simon Pilgrim 4cd841757a [X86][AVX2] Add support for combining target shuffles to VPMOVZX
Initial 256-bit vector support - 512-bit support requires extra checks for AVX512BW support (PMOVZXBW) that will be handled in a future patch.

llvm-svn: 294896
2017-02-12 14:31:23 +00:00
NAKAMURA Takumi 022c6e4f33 AMDGPU::expandMemIntrinsicUses(): Fix an uninitialized variable. This function returned true or undef.
llvm-svn: 294895
2017-02-12 13:15:31 +00:00
Elena Demikhovsky 5d91ab46c0 AVX-512: Fixed DWARF register numbers for XMM16-31
The reference is here: 
https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

llvm-svn: 294890
2017-02-12 07:56:50 +00:00
Craig Topper 1c37e991e6 [X86] Move code for using blendi for insert_subvector out to an isel pattern. This gives the DAG combiner more opportunity to optimize without needing to dig through the blend.
llvm-svn: 294876
2017-02-11 22:57:12 +00:00
Simon Pilgrim 755d9127f5 [X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG
Preparatory step for PR31712

llvm-svn: 294874
2017-02-11 22:47:06 +00:00
Simon Pilgrim 437d64c49e [X86][SSE] Improve VSEXT/VZEXT constant folding.
Generalize VSEXT/VZEXT constant folding to work with any target constant bits source not just BUILD_VECTOR .

llvm-svn: 294873
2017-02-11 21:55:24 +00:00
Simon Pilgrim 4ef9672f0f [X86][SSE] Add early-out when trying to match blend shuffle. NFCI.
llvm-svn: 294864
2017-02-11 18:06:24 +00:00
Amaury Sechet 58ce15aba1 Fix indentation in X86ISelLowering. NFC
llvm-svn: 294859
2017-02-11 17:48:48 +00:00
Craig Topper 255343483d [AVX-512] Add VPMINS/MINU/MAXS/MAXU instructions to load folding tables.
llvm-svn: 294858
2017-02-11 17:35:28 +00:00
Craig Topper b2fa216dd5 [X86] Improve alphabetizing of load folding tables. NFC
llvm-svn: 294857
2017-02-11 17:35:25 +00:00
Simon Pilgrim 0e6945e48a [X86][SSE] Convert getTargetShuffleMaskIndices to use getTargetConstantBitsFromNode.
Removes duplicate constant extraction code in getTargetShuffleMaskIndices.

getTargetConstantBitsFromNode - adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fail if the caller doesn't support undef bits.

llvm-svn: 294856
2017-02-11 17:27:21 +00:00
Simon Pilgrim d59fa0e38a [X86] Merge repeated getScalarValueSizeInBits calls. NFCI.
llvm-svn: 294852
2017-02-11 16:42:07 +00:00
Simon Pilgrim 6411a0ebed [X86][3DNow!] Enable PFSUB<->PFSUBR commutation
llvm-svn: 294847
2017-02-11 13:51:14 +00:00
Simon Pilgrim 4ead1d4aa9 [X86][3DNow!] Enable commutation for PFADD/PFMUL/PFCMPEQ/PAVGUSB/PMULHRW
All commutations confirmed to give identical results - note PFMAX/PFMIN do not

PFSUB<->PFSUBR should be commutable as well

llvm-svn: 294846
2017-02-11 13:32:55 +00:00
Vitaly Buka bcb6622c95 Fix "left shift of negative value -1" introduced by r294805
llvm-svn: 294843
2017-02-11 12:44:03 +00:00
Benjamin Kramer efcf06f5f2 Move symbols from the global namespace into (anonymous) namespaces. NFC.
llvm-svn: 294837
2017-02-11 11:06:55 +00:00
Craig Topper 1f6153bab4 [AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables.
llvm-svn: 294830
2017-02-11 07:01:40 +00:00
Craig Topper a9818aadab [AVX-512] Fix apparent typo in instruction name VMOVSSDrr_REV->VMOVSDZrr_REV.
llvm-svn: 294829
2017-02-11 07:01:38 +00:00
Craig Topper 3afa777f10 [AVX-512] Add VPSADBW instructions to load folding tables.
llvm-svn: 294827
2017-02-11 06:24:03 +00:00
Craig Topper 464b8cb244 [X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available.
Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available.

This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp.

Overall I think this produces better results in the modified test cases.

llvm-svn: 294824
2017-02-11 05:32:57 +00:00
Ahmed Bougacha 8425f453ef [ARM] Make f16 interleaved accesses expensive.
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.

Teach the cost model to consider f16 interleaved operations as
expensive.  Otherwise, we are all but guaranteed to end up with
a large block of scalarized vector code.

llvm-svn: 294819
2017-02-11 01:53:04 +00:00
Ahmed Bougacha fc979dc9dd [ARM] Don't lower f16 interleaved accesses.
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.

Reject f16 interleaved accesses.  If we try to emit the f16 intrinsics,
we'll just end up with a selection failure.

llvm-svn: 294818
2017-02-11 01:53:00 +00:00
Dan Gohman dfe6ce7abd [WebAssembly] Remove old experimental disassemler code.
Remove support for disassembling an old experimental wasm binary format, which
is no longer in use anywhere.

llvm-svn: 294809
2017-02-11 00:02:23 +00:00
Krzysztof Parzyszek f9015e62fd [Hexagon] Introduce Hexagon V62
llvm-svn: 294805
2017-02-10 23:46:45 +00:00
Benjamin Kramer aa5adfa360 [PPC] Silence warning in Release builds.
llvm-svn: 294791
2017-02-10 22:13:34 +00:00
Tim Shen 21a960b6a6 Fix a silly syntax error.
llvm-svn: 294783
2017-02-10 21:17:35 +00:00
Tim Shen 918ed871df [XRay] Implement powerpc64le xray.
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.

Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.

Reviewers: dberris, echristo, iteratee, kbarton, hfinkel

Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D29742

llvm-svn: 294781
2017-02-10 21:03:24 +00:00
Krzysztof Parzyszek fe58ca3218 [Hexagon] Remove unused .td files
llvm-svn: 294775
2017-02-10 19:54:00 +00:00
Ahmed Bougacha 2e275e272f [X86] Bitcast subvector before broadcasting it.
Since r274013, we've been looking through bitcasts on broadcast inputs.
In the scalar-folding case (from a load, build_vector, or sc2vec),
the input type didn't matter, as we'd simply bitcast the resulting
scalar back.

However, when broadcasting a 128-bit-lane-aligned element, we create an
EXTRACT_SUBVECTOR.  Use proper types, by creating an extract_subvector
of the original input type.

llvm-svn: 294774
2017-02-10 19:51:47 +00:00
John Brawn e60f4e4b8d [ARM] Fix incorrect mask bits in MSR encoding for write_register intrinsic
In the encoding of system registers in the M-class MSR instruction the mask bits
should be 2 for registers that don't take a _<bits> qualifier (the instruction
is unpredictable otherwise), and should also be 2 if the register takes a
_<bits> qualifier but it's not present as no _<bits> is an alias for _nzcvq.

Differential Revision: https://reviews.llvm.org/D29828

llvm-svn: 294762
2017-02-10 17:41:08 +00:00
Krzysztof Parzyszek a72fad980c [Hexagon] Replace instruction definitions with auto-generated ones
llvm-svn: 294753
2017-02-10 15:33:13 +00:00
Rafael Espindola be99157127 Move some error handling down to MCStreamer.
This makes sure we get the same redefinition rules regardless of who
is printing (asm parser, codegen) and to what (asm, obj).

This fixes an unintentional regression in r293936.

llvm-svn: 294752
2017-02-10 15:13:12 +00:00
Simon Pilgrim 8c8b10389d [X86][SSE] Use SDValue::getConstantOperandVal helper. NFCI.
Also reordered an if statement to test low cost comparisons first

llvm-svn: 294748
2017-02-10 14:27:59 +00:00
Simon Pilgrim c371159aac [X86][SSE] Add support for extracting target constants from BUILD_VECTOR
In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet

Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode
llvm-svn: 294746
2017-02-10 14:04:11 +00:00
Simon Pilgrim 1140281413 [X86][SSE] Add missing comment describing combing to SHUFPS. NFCI
llvm-svn: 294745
2017-02-10 13:16:01 +00:00
Igor Breger 6677999e17 add #ifdef, fix compilation error in case LLVM_BUILD_GLOBAL_ISEL=OFF
llvm-svn: 294726
2017-02-10 07:33:14 +00:00
Igor Breger b4442f34cd [X86][GlobalISel] Add general-purpose Register Bank
Summary:
[X86][GlobalISel] Add general-purpose Register Bank.
Add trivial  handling of G_ADD legalization .
Add Regestry Bank selection for COPY and G_ADD  instructions

Reviewers: rovka, zvi, ab, t.p.northover, qcolombet

Reviewed By: qcolombet

Subscribers: qcolombet, mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29771

llvm-svn: 294723
2017-02-10 07:05:56 +00:00
Eric Christopher 0824096cc0 Temporarily revert "For X86-64 linux and PPC64 linux align int128 to 16 bytes."
until we can get better TargetMachine::isCompatibleDataLayout to compare - otherwise
we can't code generate existing bitcode without a string equality data layout.

This reverts commit r294702.

llvm-svn: 294709
2017-02-10 04:35:32 +00:00
Eric Christopher 42b9248803 For X86-64 linux and PPC64 linux align int128 to 16 bytes.
For other platforms we should find out what they need and likely
make the same change, however, a smaller additional change is easier
for platforms we know have it specified	in the ABI. As part of this
rewrite some of the handling in the backends for data layout and update
a bunch of testcases.

Based on a patch by Simonas Kazlauskas!

llvm-svn: 294702
2017-02-10 03:32:21 +00:00
Matt Arsenault b4493e909f AMDGPU: Fix trailing whitespace
llvm-svn: 294694
2017-02-10 02:42:31 +00:00
Wei Ding 205bfdb3e9 AMDGPU : Add trap handler support.
Differential Revision: http://reviews.llvm.org/D26010

llvm-svn: 294692
2017-02-10 02:15:29 +00:00
Stanislav Mekhanoshin 6dec24316b [AMDGPU] Override PSet for M0
This change returns empty PSet list for M0 register. Otherwise its
PSet as defined by tablegen is SReg_32. This results in incorrect
register pressure calculation every time an instruction uses M0.
Such uses count as SReg_32 PSet and inadequately increase pressure
on SGPRs.

Differential Revision: https://reviews.llvm.org/D29798

llvm-svn: 294691
2017-02-10 02:07:58 +00:00
Dan Gohman df4f4d45f0 [WebAssembly] Pass an MCContext to WebAssemblyMCCodeEmitter. NFC.
llvm-svn: 294679
2017-02-10 00:14:42 +00:00
Matthias Braun 2bef2a08c0 Fix syntax error
llvm-svn: 294678
2017-02-10 00:09:20 +00:00
Matthias Braun 62e1e8531b ARMSubtarget.h: Change to one line per enum element; NFC
Change syntax to have enum elements sorted alphabetically and one per
line as that is more merge/cherry pick friendly.

llvm-svn: 294677
2017-02-10 00:06:44 +00:00
George Burgess IV ccf11c2f9f [ARM] Add support for armv7ve triple in llvm (PR31358).
Gcc supports target armv7ve which is armv7-a with virtualization
extensions. This change adds support for this in llvm for gcc
compatibility.

Also remove redundant FeatureHWDiv, FeatureHWDivARM for a few models as
this is specified automatically by FeatureVirtualization.

Patch by Manoj Gupta.

Differential Revision: https://reviews.llvm.org/D29472

llvm-svn: 294661
2017-02-09 23:29:14 +00:00
Dan Gohman b6afd2070a [WebAssembly] Refactor void return peephole using MaybeRewriteToFallthrough. NFC.
llvm-svn: 294652
2017-02-09 23:19:03 +00:00
Simon Pilgrim 7f0d7e08b2 [X86] Remove duplicate call to getValueType. NFCI.
llvm-svn: 294640
2017-02-09 22:35:59 +00:00
Peter Collingbourne ef089bdb4b X86: Introduce relocImm-based patterns for cmp.
Differential Revision: https://reviews.llvm.org/D28690

llvm-svn: 294636
2017-02-09 22:02:28 +00:00
Matt Arsenault 0699ef39ce AMDGPU: Add pass to expand memcpy/memmove/memset
llvm-svn: 294635
2017-02-09 22:00:42 +00:00
Peter Collingbourne d7dd65ad7c X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols.
This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.

Note that there were already instructions where the second operand was not a
register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.

Differential Revision: https://reviews.llvm.org/D28621

llvm-svn: 294634
2017-02-09 21:58:24 +00:00
Konstantin Zhuravlyov fd87137710 [AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement
Differential Revision: https://reviews.llvm.org/D29741

llvm-svn: 294627
2017-02-09 21:33:23 +00:00
Daniel Berlin 73ad5cb9b1 Drop graph_ prefix
llvm-svn: 294621
2017-02-09 20:37:46 +00:00
Daniel Berlin 58a6e57394 GraphTraits: Add range versions of graph traits functions (graph_nodes, graph_children, inverse_graph_nodes, inverse_graph_children).
Summary:
Convert all obvious node_begin/node_end and child_begin/child_end
pairs to range based for.

Sending for review in case someone has a good idea how to make
graph_children able to be inferred. It looks like it would require
changing GraphTraits to be two argument or something. I presume
inference does not happen because it would have to check every
GraphTraits in the world to see if the noderef types matched.

Note: This change was 3-staged with clang as well, which uses
Dominators/etc from LLVM.

Reviewers: chandlerc, tstellarAMD, dblaikie, rsmith

Subscribers: arsenm, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D29767

llvm-svn: 294620
2017-02-09 20:37:24 +00:00
Simon Pilgrim e0b5c2acbd Convert to for-range loop. NFCI.
llvm-svn: 294610
2017-02-09 18:52:24 +00:00
Simon Pilgrim 6bf1bd3ed6 [X86][MMX] Remove the (long time) unused MMX_PINSRW ISD opcode.
llvm-svn: 294596
2017-02-09 17:08:47 +00:00
Pierre Gousseau 6953b32475 [X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math.
In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants.

Differential Revision: https://reviews.llvm.org/D29756

llvm-svn: 294588
2017-02-09 14:43:58 +00:00
Diana Picus 7232af352f [ARM] GlobalISel: Lower single precision FP args
Both for aapcscc and aapcs_vfpcc. We currently filter out soft float targets
because we don't support libcalls yet.

llvm-svn: 294584
2017-02-09 13:09:59 +00:00
Simon Pilgrim 563e23e66e [X86][SSE] Attempt to break register dependencies during lowerBuildVector
LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register.

This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD.

On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.)

Differential Revision: https://reviews.llvm.org/D29720

llvm-svn: 294581
2017-02-09 11:50:19 +00:00
Craig Topper 3cac763532 [X86] Remove the HLE feature flag.
We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.

llvm-svn: 294562
2017-02-09 06:51:02 +00:00
Craig Topper 86576bd921 [X86] Remove INVPCID and SMAP feature flags. They aren't currently used by any instructions and not tested.
If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing.

llvm-svn: 294561
2017-02-09 06:50:59 +00:00
Craig Topper 50f3d1452c [X86] Clzero intrinsic and its addition under znver1
This patch does the following.

1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.

Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.

Differential revision: https://reviews.llvm.org/D29385

llvm-svn: 294558
2017-02-09 04:27:34 +00:00
Arnold Schwaighofer 26f016f143 SwiftCC: swifterror register cannot be as the base register
Functions that have a dynamic alloca require a base register which is defined to
be X19 on AArch64 and r6 on ARM.  We have defined the swifterror register to be
the same register. Use a different callee save register for swifterror instead:

 X21 on AArch64
 R8 on ARM

rdar://30433803

llvm-svn: 294551
2017-02-09 01:52:17 +00:00
Tim Northover e041841811 GlobalISel: legalize G_FPOW to a libcall on AArch64.
There's no instruction to implement it.

llvm-svn: 294531
2017-02-08 23:23:39 +00:00
Arnold Schwaighofer db7bbcbe78 [ARM/AArch ISel] SwiftCC: First parameters that are marked swiftself are not 'this returns'
We mark X0 as preserved by a call that passes the returned parameter.

 x0 = ...
 fun(x0) // no implicit def of x0

This no longer is valid if we pass the parameter in a different register then
the returned value as is the case with a swiftself parameter (passed in x20).

x20 = ...
fun(x20) // there should be an implict def of x8

rdar://30425845

llvm-svn: 294527
2017-02-08 22:30:47 +00:00
Eugene Zelenko 41e7e34c49 [ARM] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MC headers dependencies.

llvm-svn: 294525
2017-02-08 22:19:56 +00:00
Amara Emerson c3a4b282bb Revert r294437 as it broke an asan buildbot.
llvm-svn: 294523
2017-02-08 21:41:16 +00:00
Tim Northover 9dd78f8a6d GlobalISel: select G_[SU]MULH on AArch64.
Hopefully this'll be nuked by tablegen pretty soon, but until then it's
reasonably important for supporting C++ operator new[].

llvm-svn: 294520
2017-02-08 21:22:25 +00:00
Tim Northover 0a9b27933a GlobalISel: expand mul-with-overflow into mul-hi on AArch64.
AArch64 has specific instructions to multiply two numbers at double the width
and produce the high part of the result. These can be used to implement LLVM's
mul.with.overflow instructions fairly simply. Helps with C++ operator new[].

llvm-svn: 294519
2017-02-08 21:22:15 +00:00
Stanislav Mekhanoshin 4a24705dd6 [AMDGPU] Implement register pressure callbacks
Implement getRegPressureLimit and getRegPressureSetLimit callbacks in
SIRegisterInfo.

This makes standard converge scheduler to behave almost the same as
GCNScheduler, sometime slightly better sometimes a bit worse.
In gerenal that is also possible to switch GCNScheduler to use these
callbacks instead of getMaxWaves(), which also makes GCNScheduler
slightly better on some tests and slightly worse on another. A big
win is behavior with converge scheduler.

Note, these are used not only by scheduling, but in places like
MachineLICM.

Differential Revision: https://reviews.llvm.org/D29700

llvm-svn: 294518
2017-02-08 21:22:03 +00:00
Chris Bieneman e7a982040b [CMake] Fix `is_llvm_target_library` and support out-of-order components
Summary: This patch is required by D28855, and enables us to rely on CMake's ability to handle out of order target dependencies.

Reviewers: mgorny, chapuni, bryant

Subscribers: llvm-commits, jgosnell

Differential Revision: https://reviews.llvm.org/D28869

llvm-svn: 294514
2017-02-08 20:58:37 +00:00
Simon Dardis 2e8cdbd795 [DebugInfo] Rename EmitDebugValue to EmitDebugThreadLocal (NFC)
As pointed out by David Blaikie in the post commit review of
r292624, EmitDebugValue should be called EmitDebugThreadLocal.

llvm-svn: 294500
2017-02-08 19:03:46 +00:00
Tim Northover e9600d861c GlobalISel: select G_VASTART on iOS AArch64.
The AAPCS ABI is substantially more complicated so that's coming in a separate
patch. For now we can generate correct code for iOS though.

llvm-svn: 294493
2017-02-08 17:57:27 +00:00
Tim Northover f19d467ff6 GlobalISel: translate @llvm.va_start intrinsic.
Because we need to preserve the memory access being performed we need a
separate instruction to represent this.

llvm-svn: 294492
2017-02-08 17:57:20 +00:00
Matt Arsenault 560665250f NVPTX: Extract mem intrinsic expansions into utilities
llvm-svn: 294490
2017-02-08 17:49:52 +00:00
Krzysztof Parzyszek 1df58fc2f9 [Hexagon] Fix decoding conflict between A2_zxtb and A4_ext
llvm-svn: 294472
2017-02-08 16:31:00 +00:00
Simon Dardis 3c82a64636 [mips] MUL macro variations
[mips] MUL macro variations

Adds support for MUL macro variations.

Patch by: Srdjan Obucina

Reviewers: zoran.jovanovic, vkalintiris, dsanders, sdardis, obucina, seanbruno

Differential Revision: https://reviews.llvm.org/D16807

llvm-svn: 294471
2017-02-08 16:25:05 +00:00
Simon Pilgrim dcd10344a3 [X86][SSE] Tidyup LowerBuildVectorv16i8 and LowerBuildVectorv8i16. NFCI.
Run clang-format and standardized variable names between functions.

llvm-svn: 294456
2017-02-08 14:44:45 +00:00
Konstantin Zhuravlyov b5acb8ec47 [AMDGPU][NFC] Assign IsaInfo to reference variable in order to shorten long lines
llvm-svn: 294454
2017-02-08 14:34:10 +00:00
Konstantin Zhuravlyov 9f89ede107 [AMDGPU] Add target information that is required by tools to metadata
Differential Revision: https://reviews.llvm.org/D28760#fb670e28

llvm-svn: 294449
2017-02-08 14:05:23 +00:00
Konstantin Zhuravlyov 27d64c3566 [AMDGPU][NFC] De-tabify
llvm-svn: 294445
2017-02-08 13:29:23 +00:00
Diana Picus 4fa83c03fd [ARM] GlobalISel: Add FPR reg bank
Add a register bank for floating point values and select simple instructions
using them (add, copies from GPR).

This assumes that the hardware can cope with a single precision add (VADDS)
instruction, so the legalizer will treat G_FADD as legal and the instruction
selector will refuse to select if the hardware doesn't support it. In the future
we'll want to be more careful about this, and legalize to libcalls if we have to
use soft float.

llvm-svn: 294442
2017-02-08 13:23:04 +00:00
Konstantin Zhuravlyov e22fbcb264 [AMDGPU] Distinguish between S/VGPR allocation and encoding granularities
Differential Revision: https://reviews.llvm.org/D29633

llvm-svn: 294441
2017-02-08 13:18:40 +00:00
Konstantin Zhuravlyov e03b1d7b6a [AMDGPU] Move register related queries to subtarget class
Differential Revision: https://reviews.llvm.org/D29318

llvm-svn: 294440
2017-02-08 13:02:33 +00:00
Amara Emerson fecdb36f92 [AArch64][TableGen] Skip tied result operands for InstAlias
This patch checks the number of operands in the resulting
instruction instead of just the alias, then skips over
tied operands when generating the printing method.

This allows us to generate the preferred assembly syntax
for the AArch64 'ins' instruction, which should always be
displayed as 'mov' according to the ARMARM.

Several unit tests have changed as a result, but only to
reflect the preferred disassembly.

Some other InstAlias patterns (movk/bic/orr) needed a
slight adjustment to stop them becoming the default
and breaking other unit tests.

Patch by Graham Hunter.

Differential Revision: https://reviews.llvm.org/D29219

llvm-svn: 294437
2017-02-08 11:28:08 +00:00
Dylan McKay 2e35987dc8 [AVR] Add missing #includes
A previous change seems to have remove #includes from header files. This
fixes the build.

llvm-svn: 294427
2017-02-08 08:52:46 +00:00
Matt Arsenault 417e0072d6 AMDGPU: Enable InferAddressSpaces
llvm-svn: 294408
2017-02-08 06:16:04 +00:00
Craig Topper 3fd463a15a [X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set.
llvm-svn: 294407
2017-02-08 05:45:46 +00:00
Craig Topper 6c05192018 [X86] Remove the VMFUNC feature flag. It was only partially implemented and we have no support for codegening vmfunc instructions today.
If that support ever gets added, the full feature flag support should come along with it.

llvm-svn: 294406
2017-02-08 05:45:42 +00:00
Craig Topper e0ac7f3beb [X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it.
Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction

llvm-svn: 294405
2017-02-08 05:45:39 +00:00
Craig Topper 55bc6cb4a7 Move mnemonicIsValid to Mips target.
Summary:
The Mips target is the only user of mnemonicIsValid. This patch
moves this method from AsmMatcherEmitter.cpp to MipsAsmParser.cpp,
getting rid of the method in all other targets where it generated
warnings about an unused function.

Patch by Gonsolo.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: sdardis

Differential Revision: https://reviews.llvm.org/D28748

llvm-svn: 294400
2017-02-08 02:54:12 +00:00
Eugene Zelenko ee513ed84c [PowerPC] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MC headers dependencies.

llvm-svn: 294368
2017-02-07 22:59:46 +00:00
Hans Wennborg 819e3e02a9 [X86] Disable conditional tail calls (PR31257)
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).

Reverting until we figure out the proper solution.

llvm-svn: 294348
2017-02-07 20:37:45 +00:00
Sanjay Patel b0cee9b273 [x86] improve comments for SHRUNKBLEND node creation; NFC
llvm-svn: 294344
2017-02-07 19:54:16 +00:00
Sanjoy Das 2f63cbcc0c [ImplicitNullCheck] Extend Implicit Null Check scope by using stores
Summary:
This change allows usage of store instruction for implicit null check.

Memory Aliasing Analisys is not used and change conservatively supposes
that any store and load may access the same memory. As a result
re-ordering of store-store, store-load and load-store is prohibited.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: atrick, llvm-commits

Differential Revision: https://reviews.llvm.org/D29400

llvm-svn: 294338
2017-02-07 19:19:49 +00:00
Sanjay Patel ef6d573f67 [x86] use range-for loops; NFCI
llvm-svn: 294337
2017-02-07 19:18:25 +00:00
Sanjay Patel 633ecbf3c4 [x86] use getSignBit() for clarity; NFCI
llvm-svn: 294333
2017-02-07 19:01:35 +00:00
Nemanja Ivanovic 17aeb5a260 [PowerPC][Altivec] Add vnot extended mnemonic
Adds the vnot extended mnemonic for the vnor instruction.

Committing on behalf of brunoalr (Bruno Rosa).

Differential Revision: https://reviews.llvm.org/D29225

llvm-svn: 294330
2017-02-07 18:57:29 +00:00
Alexander Timofeev a3dace3619 [AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track
lane masks.

	 Differential revision: https://reviews.llvm.org/D29442

llvm-svn: 294324
2017-02-07 17:57:48 +00:00
Krzysztof Parzyszek 5ea971ced5 [Hexagon] Update instruction types
Remove TypeXTYPE, TypeALU32, TypeSYSTEM, TypeJR, and instead use their
architecture counterparts.

Patch by Colin LeMahieu.

llvm-svn: 294321
2017-02-07 17:47:37 +00:00
Krzysztof Parzyszek c8d676ef72 [Hexagon] Remove encoding bits from mapped instructions
- Map A2_zxtb to A2_andir.
- Map PS_call_nr J2_call.
- Map A2_tfr[t|f][new] to A2_padd[t|f][new].
    
Patch by Colin LeMahieu.

llvm-svn: 294320
2017-02-07 17:42:11 +00:00
Simon Pilgrim 8c0f62d293 [X86][SSE] Ensure that vector shift-by-immediate inputs are correctly bitcast to the result type
vXi8/vXi64 vector shifts are often shifted as vYi16/vYi32 types but we weren't always remembering to bitcast the input.

Tested with a new assert as we don't currently manipulate these shifts enough for test cases to catch them.

llvm-svn: 294308
2017-02-07 14:22:25 +00:00
Christof Douma d3ed8380e0 [ARM] Make RWPI use movw/movt when available
When constructing global address literals while targeting the RWPI
relocation model. LLVM currently only uses literal pools. If MOVW/MOVT
instructions are available we can use these instead. Beside being more
efficient it allows -arm-execute-only to work with
-relocation-model=RWPI as well.

When we generate MOVW/MOVT for global addresses when targeting the RWPI
relocation model, we need to use base relative relocations. This patch
does the needed plumbing in MC to generate these for MOVW/MOVT.

Differential Revision: https://reviews.llvm.org/D29487

Change-Id: I446786e43a6f5aa9b6a5bb2cd216d60d41c7755d
llvm-svn: 294298
2017-02-07 13:07:12 +00:00
Craig Topper 9191c3324a [AVX-512] Add masked and unmasked shift by immediate instructions to load folding tables.
llvm-svn: 294287
2017-02-07 07:31:00 +00:00
Craig Topper 62304d80e3 [AVX-512] Add masked shift instructions to load folding tables.
This adds the masked versions of everything, but the shift by immediate instructions.

llvm-svn: 294286
2017-02-07 07:30:57 +00:00
Craig Topper 45d9ddc687 [AVX-512] Add some of the shift instructions to the load folding tables.
This includes unmasked forms of variable shift and shifting by the lower element of a register.

Still need to do shift by immediate which was not foldable prior to avx512 and all the masked forms.

llvm-svn: 294285
2017-02-07 07:30:54 +00:00
Matt Arsenault f48169862d AMDGPU: Fix missing static
llvm-svn: 294281
2017-02-07 04:37:59 +00:00
Craig Topper 39d86bb688 [X86] Change the Defs list for VZEROALL/VZEROUPPER back to not including YMM16-31.
llvm-svn: 294277
2017-02-07 04:10:57 +00:00
Krzysztof Parzyszek 0605f3fddd [Hexagon] Address ASAN and UBSAN failures after r294226
Reinstate r294256 with a fix.

llvm-svn: 294269
2017-02-07 02:31:53 +00:00
Yaxun Liu 8f844f3960 [AMDGPU] Lower null pointers in static variable initializer
For amdgcn target Clang generates addrspacecast to represent null pointers in private and local address spaces.

    In LLVM codegen, the static variable initializer is lowered by virtual function AsmPrinter::lowerConstant which is target generic. Since addrspacecast is target specific, AsmPrinter::lowerConst

    This patch overrides AsmPrinter::lowerConstant with AMDGPUAsmPrinter::lowerConstant, which is able to lower the target-specific addrspacecast in the null pointer representation so that -1 is co

    Differential Revision: https://reviews.llvm.org/D29284

llvm-svn: 294265
2017-02-07 00:43:21 +00:00
Tim Northover 868332d6bf GlobalISel: legalize narrow G_SELECTS on AArch64.
Otherwise there aren't any patterns to select them.

llvm-svn: 294261
2017-02-06 23:41:27 +00:00
Krzysztof Parzyszek becf0a362a Revert "[Hexagon] Address ASAN and UBSAN failures after r294226"
This reverts commit r294256. It seems to be causing more problems instead
of solving them.

llvm-svn: 294259
2017-02-06 23:30:17 +00:00
Krzysztof Parzyszek 5b4a6b67c5 [Hexagon] Adding gp+ to the syntax of gp-relative instructions
Patch by Colin LeMahieu.

llvm-svn: 294258
2017-02-06 23:18:57 +00:00
Stanislav Mekhanoshin 99be1aff31 [AMDGPU] Fix GCNSchedStrategy.cpp debug output
There is typo in the debug output: top and bottom candidates are switched.

Differential Revision: https://reviews.llvm.org/D29608

llvm-svn: 294257
2017-02-06 23:16:51 +00:00
Krzysztof Parzyszek 35a7838954 [Hexagon] Address ASAN and UBSAN failures after r294226
llvm-svn: 294256
2017-02-06 23:11:32 +00:00
Tim Northover 6f2db57dae GlobalISel: fall back gracefully when we can't map an operand's size.
AArch64 was asserting when it was asked to provide a register-bank of a size it
couldn't deal with (in this case an s128 IMPLICIT_DEF). But we want a robust
fallback path so this isn't allowed.

llvm-svn: 294248
2017-02-06 21:57:06 +00:00
Tim Northover 0e6afbdd77 GlobalISel: legalize G_INSERT instructions
We don't handle all cases yet (see arm64-fallback.ll for an example), but this
is enough to cover most common C++ code so it's a good place to start.

llvm-svn: 294247
2017-02-06 21:56:47 +00:00
Eugene Zelenko 90562dfb50 [X86] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.(vlsj-clangbuild)[622]

llvm-svn: 294246
2017-02-06 21:55:43 +00:00
Krzysztof Parzyszek 8cdfe8ecf3 [Hexagon] Update MCTargetDesc
Changes include:
- Updates to the instruction descriptor flags.
- Improvements to the packet shuffler and checker.
- Updates to the handling of certain relocations.
- Better handling of duplex instructions.

llvm-svn: 294226
2017-02-06 19:35:46 +00:00
John Brawn 3a9c842a9d [AArch64] Fix incorrect MachinePointerInfo in splitStoreSplat
When splitting up one store into several in splitStoreSplat we have to
make sure we get the MachinePointerInfo right, otherwise alias
analysis thinks they all store to the same location. This can then
cause invalid scheduling later on.

Differential Revision: https://reviews.llvm.org/D29446

llvm-svn: 294203
2017-02-06 18:07:20 +00:00
Simon Pilgrim bfd4495512 [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Differential Revision: https://reviews.llvm.org/D29399

llvm-svn: 294183
2017-02-06 13:44:45 +00:00
Simon Dardis 3aa8a90eff [mips] dla expansion without the at register
Previously only the superscalar scheduled expansion of the dla macro for
MIPS64 was implemented. If assembler temporary register is not available
and the optional source register is not the destination register, synthesize
the address using the naive solution of adds and shifts.

This partially resolves PR/30383.

Thanks to Sean Bruno for reporting the issue!

Reviewers: slthakur, seanbruno

Differential Revision: https://reviews.llvm.org/D29328

llvm-svn: 294182
2017-02-06 12:43:46 +00:00
Dylan McKay 0acfafdbd6 [AVR] Use 'print' instead of 'dump'
This should fix an undefined reference on the AVR buildbot.

llvm-svn: 294175
2017-02-06 08:43:30 +00:00
Igor Breger 5c31a4c9a3 [X86][GlobalISel] Add limited ret lowering support to the IRTranslator.
Summary:
Support return lowering for i8/i16/i32/i64/float/double, vector type supported for 64bit platform only.
Support argument lowering for float/double types.

Reviewers: t.p.northover, zvi, ab, rovka

Reviewed By: zvi

Subscribers: dberris, kristof.beyls, delena, llvm-commits

Differential Revision: https://reviews.llvm.org/D29261

llvm-svn: 294173
2017-02-06 08:37:41 +00:00
Craig Topper 5d9ecd23e8 [AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables.
llvm-svn: 294170
2017-02-06 05:12:14 +00:00
Craig Topper f0eb60a6f3 [AVX-512] Add VPABSB/D/Q/W to load folding tables.
llvm-svn: 294169
2017-02-06 03:18:01 +00:00
Craig Topper 864b1a5376 [AVX-512] Add VSHUFPS/PD to load folding tables.
llvm-svn: 294168
2017-02-06 03:17:58 +00:00
Craig Topper 75218fb6b1 [AVX-512] Add VPMULLD/Q/W instructions to load folding tables.
llvm-svn: 294164
2017-02-06 01:19:26 +00:00
Craig Topper 452a7770e6 [AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables.
llvm-svn: 294163
2017-02-05 23:31:48 +00:00
Simon Pilgrim 380ce75687 [X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle
Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so to avoid a unnecessary gpr value and insertion into a vector

llvm-svn: 294162
2017-02-05 22:50:29 +00:00
Craig Topper 8eb1f315ac [AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables.
llvm-svn: 294153
2017-02-05 22:25:46 +00:00
Craig Topper cb4bc8be5b [AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables.
llvm-svn: 294152
2017-02-05 22:25:42 +00:00
Craig Topper 59af67206d [AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads.
llvm-svn: 294151
2017-02-05 22:25:40 +00:00
Dylan McKay ccd819ad94 [AVR] Implement stacksave/stackrestore by expanding (PR31342)
Summary:
Authored by Florian Zeitz.

This implements the missing stacksave/stackrestore intrinsics via expansion.

Output of `llc -O0 -march=avr ~/devel/llvm/test/CodeGen/Generic/stacksave-restore.ll` for sanity checking (comments mine):

```
	.text
	.file	".../llvm/test/CodeGen/Generic/stacksave-restore.ll"
	.globl	test
	.p2align	1
	.type	test,@function
test:                                   ; @test
; BB#0:
	push	r28
	push	r29

	in	r28, 61
	in	r29, 62
	sbiw	r28, 4
	in	r0, 63
	cli
	out	62, r29
	out	63, r0
	out	61, r28

	in	r18, 61
	in	r19, 62

	mov	r20, r22
	mov	r21, r23

	in	r30, 61
	in	r31, 62

	lsl	r22
	rol	r23
	lsl	r22
	rol	r23
	in	r26, 61
	in	r27, 62
	sub	r26, r22
	sbc	r27, r23
	andi	r26, 252
	in	r0, 63
	cli
	out	62, r27
	out	63, r0
	out	61, r26

	in	r0, 63
	cli
	out	62, r31
	out	63, r0
	out	61, r30

	in	r30, 61
	in	r31, 62
	sub	r30, r22
	sbc	r31, r23
	andi	r30, 252
	in	r0, 63
	cli
	out	62, r31
	out	63, r0
	out	61, r30

	std	Y+3, r24                ; 2-byte Folded Spill
	std	Y+4, r25                ; 2-byte Folded Spill

	mov	r24, r26
	mov	r25, r27

	in	r0, 63
	cli
	out	62, r19
	out	63, r0
	out	61, r18

	std	Y+1, r20                ; 2-byte Folded Spill
	std	Y+2, r21                ; 2-byte Folded Spill

	adiw	r28, 4
	in	r0, 63
	cli
	out	62, r29
	out	63, r0
	out	61, r28

	pop	r29
	pop	r28
	ret
.Lfunc_end0:
	.size	test, .Lfunc_end0-test
```

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29553

llvm-svn: 294146
2017-02-05 21:35:45 +00:00
Kamil Rytarowski 5d2bd8dd54 Revamp llvm::once_flag to be closer to std::once_flag
Summary:
Make this interface reusable similarly to std::call_once and std::once_flag interface.

This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical.

Sponsored by <The NetBSD Foundation>

Reviewers: mehdi_amini, labath, joerg

Reviewed By: mehdi_amini

Subscribers: emaste, clayborg

Differential Revision: https://reviews.llvm.org/D29566

llvm-svn: 294143
2017-02-05 21:13:06 +00:00
Craig Topper cac328f25e [X86] Fix printing of sha256rnds2 to include the implicit %xmm0 argument.
llvm-svn: 294132
2017-02-05 18:33:31 +00:00
Craig Topper d7ae9ab1fa [X86] Fix printing of blendvpd/blendvps/pblendvb to include the implicit %xmm0 argument. This makes codegen output more obvious about the %xmm0 usage.
llvm-svn: 294131
2017-02-05 18:33:24 +00:00
Craig Topper 6a35a81fc5 [X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later.
Similar was already done for several other shuffles in this function.

The test changes are because the old code used explicity zeroing for elements that could have been undef.

While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.

llvm-svn: 294130
2017-02-05 18:33:14 +00:00
Daniel Sanders c3ac566754 [globalisel][arm] Tablegen-erate current Register Bank Information.
Summary:
This patch tablegen-erates the ARM register bank information so that the
static tables added in D27807 no longer need to be maintained.

Depends on D27338

Reviewers: t.p.northover, rovka, ab, qcolombet, aditya_nandakumar

Reviewed By: rovka

Subscribers: aemerson, rengolin, mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D28567

llvm-svn: 294124
2017-02-05 12:07:55 +00:00
Dylan McKay b78f36657e [AVR] Fix a bug where asm operands are printed twice
We would unconditionally call printOperand, even if PrintAsmOperand
already printed the immediate.

llvm-svn: 294121
2017-02-05 10:42:49 +00:00
Dylan McKay 7a3eb290ef [AVR] Support zero-sized arguments in defined methods
It is sufficient to skip emission of these arguments as we have nothing
to actually pass through the function call.

The AVR-GCC reference has nothing to say about zero-sized arguments,
presumably because C/C++ doesn't support them. This means we don't have
to worry about ABI differences.

llvm-svn: 294119
2017-02-05 09:53:45 +00:00
Craig Topper 978fdb75a4 [X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1).
llvm-svn: 294112
2017-02-04 23:26:46 +00:00
Craig Topper 3d95228dbe [X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI
llvm-svn: 294111
2017-02-04 23:26:42 +00:00
Eric Christopher b128abcf7a Remove a bunch of unnecessary casts to a target specific version of TII and TRI as we're working from a target specific STI.
llvm-svn: 294081
2017-02-04 01:52:17 +00:00
Eugene Zelenko 3f37f07c7f [Sparc] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294072
2017-02-04 00:36:49 +00:00
Eugene Zelenko cd8ea02b4a [Mips] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294069
2017-02-03 23:39:33 +00:00
Eugene Zelenko 06869c04f3 [SystemZ] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294068
2017-02-03 23:39:06 +00:00
Eugene Zelenko e894b4dc59 [AMDGPU] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294067
2017-02-03 23:38:40 +00:00
Eugene Zelenko 939f6b0167 [AArch64] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294053
2017-02-03 21:49:13 +00:00
Eugene Zelenko 07dc38f67a [ARM] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294052
2017-02-03 21:48:12 +00:00
Eugene Zelenko c3164b9c2f [XCore] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294051
2017-02-03 21:46:55 +00:00
Matt Arsenault f15da6c419 AMDGPU: AsmParser cleanups
Use typedef, remove unnecessary enum, line wraps.

llvm-svn: 294039
2017-02-03 20:49:51 +00:00
Stanislav Mekhanoshin 81db53109d [AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000
This has quite positive performance impact according to measurements.
Before previous fixes to limit the optimization that was too high
and blowed compile time and scratch usage, but now this is gone and
we can bump the threshold.

Differential Revision: https://reviews.llvm.org/D29505

llvm-svn: 294032
2017-02-03 20:08:29 +00:00
Matt Arsenault 1fa5eacf9d AMDGPU: Set MCAsmInfo::PointerSize
llvm-svn: 294031
2017-02-03 20:02:23 +00:00
Matt Arsenault d9cd736585 AMDGPU: Don't unroll for private with dynamic allocas
This won't be elimnated, so this will just bloat code
if/when these are ever used/supported.

llvm-svn: 294030
2017-02-03 19:36:00 +00:00
Simon Pilgrim 034c1bd32c [X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle.
Correctly flagging upper elements as undef.

llvm-svn: 294020
2017-02-03 17:59:58 +00:00
Simon Dardis 68e9d94055 [mips] Remove absolute size assertion for end directive
The .end <symbol> directive for MIPS marks the end of a symbol and sets the
symbol's size. Previously, the corresponding emitDirective handler asserted
that a function's size could be evaluated to an absolute value at that point
in time.

This cannot be done with when directives like .align have been encountered,
instead set the function's size to the corresponding symbolic expression and
let ELFObjectWriter resolve the expression to an absolute value. This avoids
a redundant call to evaluateAsAbsolute.

llvm-svn: 294012
2017-02-03 15:48:53 +00:00
Justin Lebar e90c468444 [NVPTX] Enable combineRepeatedFPDivisors for NVPTX.
Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D29477

llvm-svn: 294011
2017-02-03 15:13:50 +00:00
Artem Tamazov 43b61561b0 [AMDGPU][mc] Fix AddressSanitizer leftover issue in gfx7_asm_all test
Issue occurs when assembling "ds_ordered_count v0, v0 gds".

llvm-svn: 294004
2017-02-03 12:47:30 +00:00
Sanne Wouda a994185757 [ARM] Change TCReturn to tBL if tailcall optimization fails.
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.

Reviewers: rengolin, jmolloy, rovka, olista01

Reviewed By: olista01

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D29020

llvm-svn: 294000
2017-02-03 11:15:53 +00:00
Stanislav Mekhanoshin f29602df65 [AMDGPU] Unroll preferences improvements
Exit loop analysis early if suitable private access found.
Do not account for GEPs which are invariant to loop induction variable.
Do not account for Allocas which are too big to fit into register file anyway.
Add option for tuning: -amdgpu-unroll-threshold-private.

Differential Revision: https://reviews.llvm.org/D29473

llvm-svn: 293991
2017-02-03 02:20:05 +00:00
Matt Arsenault e1b595306d AMDGPU: Fold fneg into fmin/fmax_legacy
llvm-svn: 293972
2017-02-03 00:51:50 +00:00
Craig Topper bbb2b95ce5 [X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering.
llvm-svn: 293969
2017-02-03 00:24:49 +00:00
Matt Arsenault 2511c031de AMDGPU: Fold fneg into fminnum/fmaxnum
llvm-svn: 293968
2017-02-03 00:23:15 +00:00
Matt Arsenault a8fcfadf46 AMDGPU: Check if users of fneg can fold mods
In multi-use cases this can save a few instructions.

llvm-svn: 293962
2017-02-02 23:21:23 +00:00
Eugene Zelenko fbd13c5c12 [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293949
2017-02-02 22:55:55 +00:00
Reid Kleckner 3c467e225e [X86] Avoid sorted order check in release builds
Effectively reverts r290248 and fixes the unused function warning with
ifndef NDEBUG.

llvm-svn: 293945
2017-02-02 22:06:30 +00:00
Craig Topper c45657375b [X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine.
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.

llvm-svn: 293944
2017-02-02 22:02:57 +00:00
Reid Kleckner c35139ec0d [CodeGen] Remove dead call-or-prologue enum from CCState
This enum has been dead since Olivier Stannard re-implemented ARM byval
handling in r202985 (2014).

llvm-svn: 293943
2017-02-02 21:58:22 +00:00
Rafael Espindola 13a79bbfe5 Change how we handle section symbols on ELF.
On ELF every section can have a corresponding section symbol. When in
an assembly file we have

.quad .text

the '.text' refers to that symbol.

The way we used to handle them is to leave .text an undefined symbol
until the very end when the object writer would map them to the
actual section symbol.

The problem with that is that anything before the end would see an
undefined symbol. This could result in bad diagnostics
(test/MC/AArch64/label-arithmetic-diags-elf.s), or incorrect results
when using the asm streamer (est/MC/Mips/expansion-jal-sym-pic.s).

Fixing this will also allow using the section symbol earlier for
setting sh_link of SHF_METADATA sections.

This patch includes a few hacks to avoid changing our behaviour when
handling conflicts between section symbols and other symbols. I
reported pr31850 to track that.

llvm-svn: 293936
2017-02-02 21:26:06 +00:00
Javed Absar bb8dcc6aec [ARM] Classification Improvements to ARM Sched-Model. NFCI.
This is the second in the series of patches to enable adding
of machine sched-models for ARM processors easier and compact.
This patch focuses on integer instructions and adds missing
sched definitions.

Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127

llvm-svn: 293935
2017-02-02 21:08:12 +00:00
Krzysztof Parzyszek d0d42f0ec8 [Hexagon] Adding opExtentBits and opExtentAlign to GPrel instructions
Patch by Colin LeMahieu.

llvm-svn: 293933
2017-02-02 20:35:12 +00:00
Michael Kuperstein e6d59fdca5 [X86] Add costs for non-AVX512 single-source permutation integer shuffles
Differential Revision: https://reviews.llvm.org/D29416

llvm-svn: 293932
2017-02-02 20:27:13 +00:00
Krzysztof Parzyszek e17b0bfb24 [Hexagon] Fix relocation kind for extended predicated calls
Patch by Sid Manning.

llvm-svn: 293931
2017-02-02 20:21:56 +00:00
Krzysztof Parzyszek 357b048666 [Hexagon] Remove A4_ext_* pseudo instructions
Patch by Colin LeMahieu.

llvm-svn: 293929
2017-02-02 19:58:22 +00:00
Krzysztof Parzyszek d67ab623f6 [Hexagon] Fix insertBranch for loops with multiple ENDLOOP instructions
llvm-svn: 293925
2017-02-02 19:36:37 +00:00
Dan Gohman b89f2d3d92 [WebAssembly] Add instruction definitions for drop and get/set_global.
llvm-svn: 293922
2017-02-02 19:29:44 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Simon Dardis 08ce5fb66b [mips] Expansion of BEQL and BNEL with immediate operands
Adds support for BEQL and BNEL macros with immediate operands.

Patch by: Srdjan Obucina

Reviewers: dsanders, zoran.jovanovic, vkalintiris, sdardis, obucina, seanbruno

Differential Revision: https://reviews.llvm.org/D17040

llvm-svn: 293905
2017-02-02 16:13:49 +00:00
Jonas Paulsson b7a2ef8375 [SystemZ] Add comment for ISD::FP_TO_UINT expansion.
(Copied from the fp-conv-10.ll test to SystemZISelLowering.cpp)

Review: Ulrich Weigand
llvm-svn: 293900
2017-02-02 15:42:14 +00:00
Krzysztof Parzyszek bc4dc9b4b9 [Hexagon] Emitting individual instructions without copying them
Patch by Colin LeMahieu.

llvm-svn: 293899
2017-02-02 15:32:26 +00:00
Krzysztof Parzyszek f65b8f14f4 [Hexagon] Rename TypeCOMPOUND to TypeCJ
llvm-svn: 293894
2017-02-02 15:03:30 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Nirav Dave e14300e270 [X86,ISEL] Fix X86 increment chain dependence calculation
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.

llvm-svn: 293892
2017-02-02 14:39:26 +00:00
Diana Picus 32cd9b434c [ARM] GlobalISel: Lower pointer args and returns
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.

llvm-svn: 293889
2017-02-02 14:01:00 +00:00
Diana Picus 0c11c7b5c7 [ARM] GlobalISel: Error out instead of asserting
Allow unknown types in TLI.getValueType, otherwise we get asserts for certain
types that we do not support yet (instead of returning that we don't support
them and falling through the normal error path).

llvm-svn: 293888
2017-02-02 14:00:54 +00:00
Diana Picus fc19a8ff07 [ARM] GlobalISel: Legalize loading pointers
Make it legal to load pointer values. Also check that pointers are assigned
to the GPR reg bank by default.

llvm-svn: 293886
2017-02-02 13:20:49 +00:00
Simon Pilgrim 20ab6b875a [X86][SSE] Use MOVMSK for all_of/any_of reduction patterns
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).

So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc.

Differential Revision: https://reviews.llvm.org/D28810

llvm-svn: 293880
2017-02-02 11:52:33 +00:00
Craig Topper 047a8be18a [X86] Remove some unused DAGCombinerInfo parameters. NFC
llvm-svn: 293873
2017-02-02 08:03:23 +00:00
Craig Topper 94ed54b49a [X86] Move some INSERT_SUBVECTOR optimizations from legalize to DAG combine.
This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together.

This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases.

llvm-svn: 293872
2017-02-02 08:03:20 +00:00
Craig Topper b81e6c48f8 [AVX-512] Fix the implicit defs for VZEROALL/VZEROUPPER to include YMM16-YMM31.
llvm-svn: 293862
2017-02-02 04:17:18 +00:00
Matt Arsenault 9dba9bd4cf AMDGPU: Use source modifiers with f16->f32 conversions
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.

For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.

llvm-svn: 293857
2017-02-02 02:27:04 +00:00
Matthias Braun 5b49f95592 AArch64RegisterInfo: Simplify getReservedReg(); NFC
After marking a 32bit register and all its super registers the 64bit
register does not need to be marked again.

llvm-svn: 293855
2017-02-02 02:23:25 +00:00
Matt Arsenault 8e190b2f23 NVPTX: Fix not preserving volatile when expanding memset
llvm-svn: 293851
2017-02-02 01:20:34 +00:00
Peter Collingbourne dc5e583687 X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).
Differential Revision: https://reviews.llvm.org/D28689

llvm-svn: 293844
2017-02-02 00:32:03 +00:00
Stanislav Mekhanoshin 2b913b1f49 [AMDGPU] Account workgroup size in LDS occupancy limits
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.

In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.

Differential Revision: https://reviews.llvm.org/D29423

llvm-svn: 293837
2017-02-01 22:59:50 +00:00
Eugene Zelenko c5eb8e29d0 [AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293836
2017-02-01 22:56:06 +00:00
Dehao Chen 0944a8c2ec Change debug-info-for-profiling from a TargetOption to a function attribute.
Summary: LTO requires the debug-info-for-profiling to be a function attribute.

Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl

Reviewed By: mehdi_amini, dblaikie, aprantl

Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29203

llvm-svn: 293833
2017-02-01 22:45:09 +00:00
Matt Arsenault 74f64833bc AMDGPU: Allow clustering flat memory operations
llvm-svn: 293809
2017-02-01 20:22:51 +00:00
Simon Dardis ac9c30c37f [mips] Parse the 'bopt' and 'nobopt' directives in IAS.
The GAS assembler supports the ".set bopt" directive but according
to the sources it doesn't do anything. It's supposed to optimize
branches by filling the delay slot of a branch with it's target.

This patch teaches the MIPS asm parser to accept both and warn in
the case of 'bopt' that the bopt directive is unsupported.

This resolves PR/31841.

Thanks to Sean Bruno for reporting the issue!

llvm-svn: 293798
2017-02-01 18:50:24 +00:00
Matthew Simpson ba5cf9dfee [LV] Move interleaved access helper functions to VectorUtils (NFC)
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.

Differential Revision: https://reviews.llvm.org/D29398

llvm-svn: 293788
2017-02-01 17:45:46 +00:00
Simon Pilgrim ca931efc21 [X86][SSE] Remove unused argument. NFCI.
llvm-svn: 293777
2017-02-01 16:34:50 +00:00
Matt Arsenault d59e640455 AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops
These were simply preserving the flags of the original operation,
which was too conservative in most cases and incorrect for mul.

nsw/nuw may be needed for some combines to cleanup messes when
intermediate sext_inregs are introduced later.

Tested valid combinations with alive.

llvm-svn: 293776
2017-02-01 16:25:23 +00:00
Simon Dardis 6433d5af6b [mips] Fix an initialization issue with MipsABIInfo in MipsTargetELFStreamer
DebugInfoDWARFTests is the only user so far which initializes the
MCObjectStreamer without initializing the ASMParser. The MIPS backend
relies on the ASMParser to initialize the MipsABIInfo object and to
update the target streamer with it. This should turn the mips buildbots
green.

Reviewers: atanasyan, zoran.jovanovic

Differential Revision: https://reviews.llvm.org/D28025

llvm-svn: 293772
2017-02-01 15:39:23 +00:00
Kit Barton d26978796e [PowerPC] Fix sjlj pseduo instructions to use G8RC_NOX0 register class
The the following instructions:
  - LD/LWZ (expanded from sjLj pseudo-instructions)
  - LXVL/LXVLL vector loads
  - STXVL/STXVLL vector stores
all require G8RC_NO0X class registers for RA.

Differential Revision: https://reviews.llvm.org/D29289

Committed for Lei Huang

llvm-svn: 293769
2017-02-01 14:33:57 +00:00
Simon Pilgrim 55a9c79bd1 [X86][SSE] Merge SSE2 PINSRW lowering with SSE41 PINSRB/PINSRW lowering. NFCI.
These are identical apart from the extra SSE41 guard for PINSRB.

llvm-svn: 293766
2017-02-01 13:32:19 +00:00
Javed Absar e5ad87e939 [ARM] Enable Cortex-M23 and Cortex-M33 support.
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.

Committed on behalf of Sanne Wouda.
Reviewers : rengolin, olista01.

Differential Revision: https://reviews.llvm.org/D29073

llvm-svn: 293761
2017-02-01 11:55:03 +00:00
NAKAMURA Takumi 468487d71a *MacroFusion.cpp: Suppress warnings to eliminate \param(s). [-Wdocumentation]
llvm-svn: 293744
2017-02-01 07:30:46 +00:00
Craig Topper 0bcba19cdf [X86] For AVX1/AVX2 isel, don't use FP move instructions for 128-bit loads/stores of integer types.
For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal.

This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too.

llvm-svn: 293743
2017-02-01 07:17:16 +00:00
Evandro Menezes 455382ea22 [AArch64] Add new target feature to fuse literal generation
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, sections 4.14 and 4.15.

Differential revision: https://reviews.llvm.org/D28698

llvm-svn: 293739
2017-02-01 02:54:42 +00:00
Evandro Menezes b21fb29c26 [AArch64] Add new subtarget feature to fuse AES crypto operations
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, section 4.13, and on Exynos
M1.

Differential revision: https://reviews.llvm.org/D28491

llvm-svn: 293738
2017-02-01 02:54:39 +00:00
Evandro Menezes 94edf02923 [CodeGen] Move MacroFusion to the target
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.

Differential revision: https://reviews.llvm.org/D28489

llvm-svn: 293737
2017-02-01 02:54:34 +00:00
Eugene Zelenko 926883e1c2 [Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293729
2017-02-01 01:22:51 +00:00
Matt Arsenault da7a656542 AMDGPU: Cleanup fmin/fmax legacy function
Use a more specific subtarget check and combine hasOneUse checks

llvm-svn: 293726
2017-02-01 00:42:40 +00:00
Matt Arsenault 1575cb893c AMDGPU: Fix warning
llvm-svn: 293717
2017-01-31 23:48:37 +00:00
Justin Lebar 06fcea4cd9 [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.

Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.

llvm-svn: 293713
2017-01-31 23:08:57 +00:00
Michael Kuperstein e18aad39ab Shut up GCC warning about operator precedence. NFC.
Technically, this is actually changes the expression and the original
assert was "wrong", but since the conjunction is with true, it doesn't
matter in this case.

llvm-svn: 293709
2017-01-31 22:48:45 +00:00
Peter Collingbourne d763c4cc85 MC: Introduce the ABS8 symbol modifier.
@ABS8 can be applied to symbols which appear as immediate operands to
instructions that have a 8-bit immediate form for that operand. It causes
the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8
or R_X86_64_8) for the symbol.

Differential Revision: https://reviews.llvm.org/D28688

llvm-svn: 293667
2017-01-31 18:28:44 +00:00
Matt Arsenault d5d78510c7 AMDGPU: Use source mods with fcanonicalize
llvm-svn: 293654
2017-01-31 17:28:40 +00:00
Nirav Dave a7c041d147 [X86] Implement -mfentry
Summary: Insert calls to __fentry__ at function entry.

Reviewers: hfinkel, craig.topper

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28000

llvm-svn: 293648
2017-01-31 17:00:27 +00:00
Tom Stellard 124f5cc8c2 AMDGPU/SI: Fix inst-select-load-smrd.mir on some builds
Summary:
For some reason instructions are being inserted in the wrong order with some
builds.  I'm not sure why this is happening.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D29325

llvm-svn: 293639
2017-01-31 15:24:11 +00:00
Simon Pilgrim 1b39d5db7b [X86][SSE] Add support for combining PINSRB into a target shuffle.
llvm-svn: 293637
2017-01-31 14:59:44 +00:00
Sam Parker 9bf658d5fe [ARM] Avoid using ARM instructions in Thumb mode
The Requires class overrides the target requirements of an instruction,
rather than adding to them, so all ARM instructions need to include the
IsARM predicate when they have overwitten requirements.

This caused the swp and swpb instructions to be allowed in thumb mode
assembly, and the ARM encoding of CDP to be selected in codegen (which
is different for conditional instructions).

Differential Revision: https://reviews.llvm.org/D29283

llvm-svn: 293634
2017-01-31 14:35:01 +00:00
Benjamin Kramer 94a833962c [X86] Silence unused variable warning in Release builds.
llvm-svn: 293631
2017-01-31 14:13:53 +00:00
Simon Pilgrim 4eab18f6b8 [X86][SSE] Detect unary PBLEND shuffles.
These can appear during shuffle combining.

llvm-svn: 293628
2017-01-31 13:58:01 +00:00
Simon Pilgrim c29eab52e8 [X86][SSE] Add support for combining PINSRW into a target shuffle.
Also add the ability to recognise PINSR(Vex, 0, Idx).

Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.

The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.

llvm-svn: 293627
2017-01-31 13:51:10 +00:00
Nemanja Ivanovic 2f2a6ab991 [PowerPC][Altivec] Add vmr extended mnemonic
Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in
the PPC back end.

Committing on behalf of brunoalr (Bruno Rosa).

Differential Revision: https://reviews.llvm.org/D29133

llvm-svn: 293626
2017-01-31 13:43:11 +00:00
Simon Dardis 12850eeac5 [mips] Addition of the immediate cases for the instructions [d]div, [d]divu
Related to http://reviews.llvm.org/D15772

Depends on http://reviews.llvm.org/D16888

Adds support for immediate operand for [D]DIV[U] instructions.

Patch By: Srdjan Obucina

Reviewers: zoran.jovanovic, vkalintiris, dsanders, obucina

Differential Revision: https://reviews.llvm.org/D16889

llvm-svn: 293614
2017-01-31 10:49:24 +00:00
Craig Topper 2cfa2071bd [AVX-512] Don't both looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway.
llvm-svn: 293608
2017-01-31 06:49:55 +00:00
Craig Topper 797e32dd98 [X86] Add AVX and SSE2 version of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version.
llvm-svn: 293607
2017-01-31 06:49:53 +00:00
Craig Topper 779e4c5bb4 [AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions.
llvm-svn: 293606
2017-01-31 06:49:50 +00:00
Justin Lebar 1c9692a46f [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.
Summary:

This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.

* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
  when fast-math is enabled.  Previously, we only would emit it for
  calls to @llvm.nvvm.sqrt.f.  (With this patch we no longer emit
  sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
  simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)

* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
  Previously, we only emitted rsqrt.approx when ftz was disabled.

Reviewers: hfinkel

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D28508

llvm-svn: 293605
2017-01-31 05:58:22 +00:00
Craig Topper 06e038c6de [X86] Update the broadcast fallback patterns to use shuffle instructions from the appropriate execution domain.
llvm-svn: 293603
2017-01-31 05:18:29 +00:00
Craig Topper e9e84c8284 [AVX-512] Fix the ExeDomain for VMOVDDUP, VMOVSLDUP, and VMOVSHDUP.
llvm-svn: 293601
2017-01-31 05:18:24 +00:00
Matt Arsenault f84e5d9a27 AMDGPU: Generalize matching of v_med3_f32
I think this is safe as long as no inputs are known to ever
be nans.

Also add an intrinsic for fmed3 to be able to handle all safe
math cases.

llvm-svn: 293598
2017-01-31 03:07:46 +00:00
Craig Topper d064cc93b2 [X86] Remove patterns for X86VPermilpi with integer types. I don't think we've formed these since the shuffle lowering rewrite.
llvm-svn: 293592
2017-01-31 02:09:53 +00:00
Craig Topper 85935f69fb [X86] Remove duplicate patterns for X86VPermilpv that already exist in the instructions themselves.
llvm-svn: 293591
2017-01-31 02:09:51 +00:00
Craig Topper ced68315ce [X86] Remove patterns for selecting PSHUFD with FP types. We don't seem to do this anymore and the AVX case definitely should be using VPERMILPS anyway.
llvm-svn: 293590
2017-01-31 02:09:49 +00:00
Craig Topper b76494e017 [X86] Remove 'else' after 'return'. NFC
llvm-svn: 293589
2017-01-31 02:09:46 +00:00
Craig Topper f9d901f0ea [X86] Use integer broadcast instructions for integer broadcast patterns.
I'm not sure why we were using an FP instruction before and had to have a comment calling attention to it, but not justifying it.

llvm-svn: 293588
2017-01-31 02:09:43 +00:00
Matt Arsenault b6491cc854 AMDGPU: Implement hook for InferAddressSpaces
For now just port some of the existing NVPTX tests
and from an old HSAIL optimization pass which
approximately did the same thing.

Don't enable the pass yet until more testing is done.

llvm-svn: 293580
2017-01-31 01:20:54 +00:00
Matt Arsenault 850657a439 NVPTX: Move InferAddressSpaces to generic code
llvm-svn: 293579
2017-01-31 01:10:58 +00:00
Eugene Zelenko 342257ea92 [ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293578
2017-01-31 00:56:17 +00:00
Matt Arsenault 9f432ec24c NVPTX: Trivial cleanups of NVPTXInferAddressSpaces
- Move DEBUG_TYPE below includes
- Change unknown address space constant to be consistent with other
  passes
- Grammar fixes in debug output

llvm-svn: 293567
2017-01-30 23:27:11 +00:00
Eugene Zelenko dde94e4c4f [Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293565
2017-01-30 23:21:32 +00:00
Matt Arsenault 42b6478344 NVPTX: Refactor NVPTXInferAddressSpaces to check TTI
Add a new TTI hook for getting the generic address space value.

llvm-svn: 293563
2017-01-30 23:02:12 +00:00
Simon Pilgrim 3905e03a47 [X86][SSE] Fix unsigned <= 0 warning in assert. NFCI.
Thanks to @mkuper

llvm-svn: 293561
2017-01-30 22:58:44 +00:00
Simon Pilgrim a80a47afef [X86][SSE] Generalize the number of decoded shuffle inputs. NFCI.
combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs.

This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.

llvm-svn: 293560
2017-01-30 22:48:49 +00:00
Eli Friedman 2345733246 Fix line endings.
llvm-svn: 293554
2017-01-30 22:04:23 +00:00
Tom Stellard 887a2562b7 AMDGPU: Fix release build broken by r293551
llvm-svn: 293553
2017-01-30 22:02:58 +00:00
Tom Stellard ca16621b2a Re-commit AMDGPU/GlobalISel: Add support for simple shaders
Fix build when global-isel is disabled and fix a warning.

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293551
2017-01-30 21:56:46 +00:00
Stanislav Mekhanoshin a3b72798af [AMDGPU] Internalize non-kernel symbols
Since we have no call support and late linking we can produce code
only for used symbols. This saves compilation time, size of the final
executable, and size of any intermediate dumps.

Run Internalize pass early in the opt pipeline followed by global
DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option.

Differential Revision: https://reviews.llvm.org/D29214

llvm-svn: 293549
2017-01-30 21:05:18 +00:00
Matt Arsenault af635240d5 AMDGPU: Undo sub x, c -> add x, -c canonicalization
This is worse if the original constant is an inline immediate.

This should also be done for 64-bit adds, but requires fixing
operand folding bugs first.

llvm-svn: 293540
2017-01-30 19:30:24 +00:00
Krzysztof Parzyszek 3695d06a10 [RDF] Add support for regmasks
llvm-svn: 293538
2017-01-30 19:16:30 +00:00
Matt Arsenault 0c3293844b AMDGPU: Run AMDGPUCodeGenPrepare after inlining
With leaf functions, this makes nonsensical decisions
based on the uniformity of the arguments.

llvm-svn: 293525
2017-01-30 18:40:29 +00:00
Matt Arsenault ee3f0acf20 AMDGPU: Make i32 uaddo/usubo legal
llvm-svn: 293514
2017-01-30 18:11:38 +00:00
Krzysztof Parzyszek 49ffff12e5 [RDF] Extract the physical register information into a separate class
llvm-svn: 293510
2017-01-30 17:46:56 +00:00
Tom Stellard 7a19d56f73 Revert "AMDGPU/GlobalISel: Add support for simple shaders"
This reverts commit r293503.

Revert while I investigate some of the buildbot failures.

llvm-svn: 293509
2017-01-30 17:42:41 +00:00
Matt Arsenault 41c1499504 AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent
llvm-svn: 293504
2017-01-30 17:09:47 +00:00
Tom Stellard e48f60aec8 AMDGPU/GlobalISel: Add support for simple shaders
Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293503
2017-01-30 17:09:15 +00:00
Simon Pilgrim 098998aef0 [X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles
llvm-svn: 293500
2017-01-30 16:58:34 +00:00
Krzysztof Parzyszek b561cf953a [RDF] Add phis for entry block live-ins (in addition to function live-ins)
llvm-svn: 293491
2017-01-30 16:20:30 +00:00
Rafael Espindola e0eba3c493 Only print architecture dependent flags for that architecture.
Different architectures can have different meaning for flags in the
SHF_MASKPROC mask, so we should always check what the architecture use
before checking the flag.

NFC for now, but will allow fixing the value of an xmos flag.

llvm-svn: 293484
2017-01-30 15:38:43 +00:00
Benjamin Kramer 73564981fe [Hexagon] Make header self-contained.
llvm-svn: 293482
2017-01-30 14:55:33 +00:00
Asaf Badouh e11d2d73bf [X86][MCU] Minor bug fix for r293469 + test case
llvm-svn: 293478
2017-01-30 13:14:37 +00:00
Marek Olsak e81adb52b1 AMDGPU: Remove a useless VI SMRD pattern
Summary: already covered by complex patterns

Reviewers: arsenm, nhaehnle, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28995

llvm-svn: 293477
2017-01-30 12:25:14 +00:00
Marek Olsak 8e93529020 AMDGPU: Fix assembler encoding for EXP instructions on VI
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28992

llvm-svn: 293476
2017-01-30 12:25:03 +00:00
Kristof Beyls 65a12c012f [GlobalISel] Add support for indirectbr
Differential Revision: https://reviews.llvm.org/D28079

llvm-svn: 293470
2017-01-30 09:13:18 +00:00
Asaf Badouh 53713df0c2 [X86][MCU] replace select with bit manipulation instead of branches
Differential Revision: https://reviews.llvm.org/D28354


 

llvm-svn: 293469
2017-01-30 08:16:59 +00:00
Craig Topper f6df4a6978 [AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast. We can use COPY_TO_REGCLASS like AVX does.
This causes stack spill slots be oversized sometimes, but the same should already be happening with AVX.

llvm-svn: 293464
2017-01-30 06:59:06 +00:00
Craig Topper 0265a39472 [AVX-512] Remove KSET0B/KSET1B in favor of the patterns that select KSET0W/KSET1W for v8i1.
llvm-svn: 293458
2017-01-30 05:37:47 +00:00
Craig Topper 3b7e823f92 [AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSHRI shift within elements while KSHIFT moves whole elements.
llvm-svn: 293448
2017-01-30 00:06:01 +00:00
Chris Ray 30b3fafb94 [X86][Disassembler] Added SALC instruction
Reviewers: joe.abbey, craig.topper

Reviewed By: craig.topper

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D29201

llvm-svn: 293447
2017-01-29 23:02:47 +00:00
Craig Topper db919caf1b [AVX-512] Fix lowering for mask register concatenation with undef in the lower half.
Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width.

llvm-svn: 293446
2017-01-29 22:53:33 +00:00
Chris Ray ba3741cb2b [X86] Fixing flag usage for RCL and RCR
Summary: The RCL and RCR instructions use the carry flag.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29237

llvm-svn: 293441
2017-01-29 20:05:30 +00:00
Simon Pilgrim 76073f8d22 [X86][SSE] Lower scalar_to_vector(0) to zero vector
Replaces an xor+movd/movq with an xorps which will be shorter in codesize, avoid an int-fpu transfer, allow modern cores to fast path the result during decode and helps other combines recognise an all-zero vector.

The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise the upper elts are undef but this doesn't seem to be a problem.

Differential Revision: https://reviews.llvm.org/D29097

llvm-svn: 293438
2017-01-29 18:13:37 +00:00
Saleem Abdulrasool 5282eed06c ARM: support `-mlong-calls` with AEABI TLS on ELF
Support lowering AEABI TLS access (__aeabi_read_tp) with long calls.
This requires adjusting the call sequence to use an indirect call to get
full addressability.

Resolves PR31769!

llvm-svn: 293433
2017-01-29 16:46:22 +00:00
Elena Demikhovsky 17fe27f1f2 [X86 Codegen] Fixed a bug in unsigned saturation
PACKUSWB converts Signed word to Unsigned byte, (the same about DW) and it can't be used for umin+truncate pattern.
AVX-512 VPMOVUS* instructions fit the pattern since they convert Unsigned to Unsigned.

See https://llvm.org/bugs/show_bug.cgi?id=31773

Differential Revision: https://reviews.llvm.org/D29196

llvm-svn: 293431
2017-01-29 13:18:30 +00:00
Igor Breger 9ea154d4ad [X86][GlobalISel] Add limited argument lowering support to the IRTranslator.
Summary:
Add limited (i8/i16/i32/i64)  argument lowering support to the IRTranslator.
Inspired by commit 289940.

Reviewers: t.p.northover, qcolombet, ab, zvi, rovka

Reviewed By: rovka

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D28987

llvm-svn: 293427
2017-01-29 08:35:42 +00:00
Justin Hibbits 10b6147e23 Add some Book-E instructions to the asm parser and printer.
Summary:
Adds the following instructions:
* mfpmr
* mtpmr
* icblc
* icblq
* icbtls

Fix the scheduling for mtspr on e5500, which uses CFX0, instead of
SFX0/SFX1 as on e500mc.

Addresses PR 31538.

Differential Revision: https://reviews.llvm.org/D29002

llvm-svn: 293417
2017-01-29 04:55:57 +00:00
Craig Topper 6533e40e9d [X86] Fix vector ANDN matching to work correctly when both inputs to the AND are XORs.
llvm-svn: 293403
2017-01-28 23:52:09 +00:00
Will Dietz 10294b932c AMDGPU: Add GlobalISel to required_libraries.
llvm-svn: 293387
2017-01-28 18:13:08 +00:00
Arpith Chacko Jacob 2b156edf56 [NVPTX] Add intrinsics to support named barriers.
Support for barrier synchronization between a subset of threads
in a CTA through one of sixteen explicitly specified barriers.
These intrinsics are not directly exposed in CUDA but are
critical for forthcoming support of OpenMP on NVPTX GPUs.

The intrinsics allow the synchronization of an arbitrary
(multiple of 32) number of threads in a CTA at one of 16
distinct barriers. The two intrinsics added are as follows:

call void @llvm.nvvm.barrier.n(i32 10)
waits for all threads in a CTA to arrive at named barrier #10.

call void @llvm.nvvm.barrier(i32 15, i32 992)
waits for 992 threads in a CTA to arrive at barrier #15.

Detailed description of these intrinsics are available in the PTX manual.
http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions

Reviewers: hfinkel, jlebar
Differential Revision: https://reviews.llvm.org/D17657

llvm-svn: 293384
2017-01-28 16:38:15 +00:00
Matthias Braun 194ded551c Use print() instead of dump() in code
llvm-svn: 293371
2017-01-28 06:53:55 +00:00
Richard Trieu 3de487b2e8 [WebAssembly] Use print instead of dump method.
This fixes non-debug non-assert builds after r293359.

llvm-svn: 293368
2017-01-28 03:23:49 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Eugene Zelenko e79c077ef9 [ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293348
2017-01-27 23:58:02 +00:00
Artem Tamazov 33b01e9cfe [AMDGPU][mc] Fix memory corruption uncovered by AddressSanitizer during coverage/smoke Gfx7/8 testing.
Coverage/smoke Gfx7/8 tests were committed r292922 but then reverted
by r292974 due to AddressSanitizer failure, which is fixed by this patch.
Tests to be re-committed soon.

llvm-svn: 293338
2017-01-27 22:19:42 +00:00
Krzysztof Parzyszek 35ce5dac7f [Hexagon] Remove unused variable (and silence a warning)
llvm-svn: 293331
2017-01-27 20:40:14 +00:00
Tom Stellard 08efb7ebf6 AMDGPU/SI: Move some ISel helpers into utils so they can be shared with GISel
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D29068

llvm-svn: 293321
2017-01-27 18:41:14 +00:00
Konstantin Zhuravlyov a304c83608 [AMDGPU] Grab MCSubtargetInfo from TargetMachine instead of constructing it
Differential Revision: https://reviews.llvm.org/D29224

llvm-svn: 293318
2017-01-27 18:32:40 +00:00
Chris Ray 535e7d1547 [X86] Adding FFREEP instruction.
Summary: Small change to get the FREEP instruction to decode properly.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29193

llvm-svn: 293314
2017-01-27 18:02:53 +00:00
Matt Arsenault d8f7ea381f AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands
Accomplishes what r292982 was supposed to, which ended up
only really making the necessary test changes.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 293310
2017-01-27 17:42:26 +00:00
Matt Arsenault 32b9600a7e NVPTX: Make NVPTXInferAddressSpaces preserve CFG
llvm-svn: 293308
2017-01-27 17:30:39 +00:00
Stanislav Mekhanoshin f6c1feb8c3 [AMDGPU] Turn AMDGPUUnifyMetadata back into module pass
With the adjustPassManager interface that is now possible to use
custom early module passes.

Differential Revision: https://reviews.llvm.org/D29189

llvm-svn: 293300
2017-01-27 16:38:10 +00:00
Simon Dardis ca74dd79e9 [mips] Recommit: "N64 static relocation model support"
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

This version corrects a "Conditional jump or move depends on uninitialised
value(s)" error detected by valgrind present in the original commit.

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293279
2017-01-27 11:36:52 +00:00
Saleem Abdulrasool 26c00e3700 ARM: fix vectorized division on WoA
The Windows on ARM target uses custom division for normal division as
the backend needs to insert division-by-zero checks.  However, it is
designed to only handle non-vectorized division.  ARM has custom
lowering for vectorized division as that can avoid loading registers
with the values and invoke a division routine for each one, preferring
to lower using NEON instructions.  Fall back to the custom lowering for
the NEON instructions if we encounter a vectorized division.

Resolves PR31778!

llvm-svn: 293259
2017-01-27 03:41:53 +00:00
NAKAMURA Takumi 0d299191d0 NVPTXCodeGen: Add IPO to libdeps, since r293189.
llvm-svn: 293256
2017-01-27 02:11:10 +00:00
Quentin Colombet 89dbea06f1 [ARM][LegalizerInfo] Specify the type of the opcode.
This is to fix the win7 bot that does not seem to be very
good at infering the type when it gets used in an initiliazer list.

llvm-svn: 293248
2017-01-27 01:30:46 +00:00
Quentin Colombet 24203cf997 [AArch64][LegalizerInfo] Specify the type of the opcode.
This is an attempt to fix the win7 bot that does not seem to be very
good at infering the type when it gets used in an initiliazer list.

llvm-svn: 293246
2017-01-27 01:13:30 +00:00
Quentin Colombet e15e460c05 Revert "[AArch64][LegalizerInfo] Specify the type of the initialization list."
This reverts commit r293238.
Even with that the win7 bot is still failing:
http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/3862

llvm-svn: 293245
2017-01-27 01:13:25 +00:00
Quentin Colombet 86fc8305ec [AArch64][LegalizerInfo] Specify the type of the initialization list.
This is an attempt to fix the win7 bot that does not seem to be very
good at infering the type.

llvm-svn: 293238
2017-01-27 00:39:03 +00:00
Eugene Zelenko e6cf4374b0 [ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293229
2017-01-26 23:40:06 +00:00
Krzysztof Parzyszek d6c8e3c9ce [Hexagon] Require IPO library in Hexagon build
This should unbreak the Hexagon build bots.

llvm-svn: 293221
2017-01-26 23:03:22 +00:00
Krzysztof Parzyszek c8b943860f [Hexagon] Add Hexagon-specific loop idiom recognition pass
llvm-svn: 293213
2017-01-26 21:41:10 +00:00
Balaram Makam b73d2962ba [AArch64] Refine Kryo Machine Model
Summary: Refine floating point SQRT and DIV with accurate latency information.

Reviewers: mcrosier

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D29191

llvm-svn: 293204
2017-01-26 20:10:41 +00:00
Sean Fertile 3c8c385a77 [PPC] cleanup of mayLoad/mayStore flags and memory operands.
1) Explicitly sets mayLoad/mayStore property in the tablegen files on load/store
   instructions.
2) Updated the flags on a number of intrinsics indicating that they write
    memory.
3) Added SDNPMemOperand flags for some target dependent SDNodes so that they
   propagate their memory operand

Review: https://reviews.llvm.org/D28818
llvm-svn: 293200
2017-01-26 18:59:15 +00:00
Stanislav Mekhanoshin 81598117b6 Replace addEarlyAsPossiblePasses callback with adjustPassManager
This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.

This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.

Differential Revision: https://reviews.llvm.org/D28336

llvm-svn: 293189
2017-01-26 16:49:08 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Serge Rogatch e09ba748cf [XRay][Arm32] Reduce the portion of the stub and implement more staging for tail calls - in LLVM
Summary:
This patch provides more staging for tail calls in XRay Arm32 . When the logging part of XRay is ready for tail calls, its support in the core part of XRay Arm32 may be as easy as changing the number passed to the handler from 1 to 2.
Coupled patch:
- https://reviews.llvm.org/D28674

Reviewers: dberris, rengolin

Reviewed By: dberris

Subscribers: llvm-commits, iid_iunknown, aemerson, rengolin, dberris

Differential Revision: https://reviews.llvm.org/D28673

llvm-svn: 293185
2017-01-26 16:17:03 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Rafael Espindola 82149a1aa9 Use shouldAssumeDSOLocal in classifyGlobalReference.
And teach shouldAssumeDSOLocal that ppc has no copy relocations.

The resulting code handle a few more case than before. For example, it
knows that a weak symbol can be resolved to another .o file, but it
will still be in the main executable.

llvm-svn: 293180
2017-01-26 15:02:31 +00:00
Simon Pilgrim 027bb453d9 [X86][SSE] Add support for combining ANDNP byte masks with target shuffles
llvm-svn: 293178
2017-01-26 14:31:12 +00:00
Simon Pilgrim 3057fd53f9 [X86][SSE] Pull out target shuffle resolve code into helper. NFCI.
Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit.

llvm-svn: 293175
2017-01-26 13:06:02 +00:00
Valery Pykhtin 75d1de903f [AMDGPU] Fix typo in GCNSchedStrategy
Differential revision: https://reviews.llvm.org/D28980

llvm-svn: 293171
2017-01-26 10:51:47 +00:00
Simon Dardis 5b67a4f75f Revert "[mips] N64 static relocation model support"
This reverts commit r293164. There are multiple tests failing.

llvm-svn: 293170
2017-01-26 10:46:07 +00:00
Simon Dardis 09e65efd09 [mips] N64 static relocation model support
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293164
2017-01-26 10:19:02 +00:00
Diana Picus 278c722e6d [ARM] GlobalISel: Load i1, i8 and i16 args from stack
Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.

When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.

Differential Revision: https://reviews.llvm.org/D27803

llvm-svn: 293163
2017-01-26 09:20:47 +00:00
Craig Topper bad53cce26 [AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it.
llvm-svn: 293157
2017-01-26 07:17:58 +00:00
Craig Topper f0bab7b739 [X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand.
llvm-svn: 293156
2017-01-26 07:17:53 +00:00
Jonas Paulsson 8e2f948ef0 [TargetTransformInfo] Refactor and improve getScalarizationOverhead()
Refactoring to remove duplications of this method.

New method getOperandsScalarizationOverhead() that looks at the present unique
operands and add extract costs for them. Old behaviour was to just add extract
costs for one operand of the type always, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.

This is a good start of improving on this, but there are more places
that can be improved by using getOperandsScalarizationOverhead().

Review: Hal Finkel
https://reviews.llvm.org/D29017

llvm-svn: 293155
2017-01-26 07:03:25 +00:00
Matt Arsenault 53f0cc238c AMDGPU: Fold fneg into round instructions
llvm-svn: 293127
2017-01-26 01:25:36 +00:00
Daniel Jasper 65144c852d Revert "[PPC] Give unaligned memory access lower cost on processor that supports it"
This reverts commit r292680. It is causing significantly worse
performance and test timeouts in our internal builds. I have already
routed reproduction instructions your way.

llvm-svn: 293092
2017-01-25 21:21:08 +00:00
Konstantin Zhuravlyov 400771edd6 [AMDGPU] Bump up n_type for metadata v2
Differential Revision: https://reviews.llvm.org/D29115

llvm-svn: 293083
2017-01-25 20:47:17 +00:00
Matt Arsenault 5d9101941f AMDGPU: Set call_convention bit in kernel_code_t
According to the documentation this is supposed to be -1
if indirect calls are not supported.

llvm-svn: 293081
2017-01-25 20:21:57 +00:00
Serge Rogatch bc2d34394d [XRay][AArch64] More staging for tail call support in XRay on AArch64 - in LLVM
Summary:
This patch prepares more for tail call support in XRay. Until the logging part supports tail calls, this is just staging, so it seems LLVM part is mostly ready with this patch.
Related: https://reviews.llvm.org/D28948 (compiler-rt)

Reviewers: dberris, rengolin

Reviewed By: dberris

Subscribers: llvm-commits, iid_iunknown, aemerson

Differential Revision: https://reviews.llvm.org/D28947

llvm-svn: 293080
2017-01-25 20:21:49 +00:00
Krzysztof Parzyszek ee9aa3ffee Add iterator_range<regclass_iterator> to {Target,MC}RegisterInfo, NFC
llvm-svn: 293077
2017-01-25 19:29:04 +00:00
Matthias Braun aeb8e33968 PowerPC: Slight cleanup of getReservedRegs(); NFC
Change getReservedRegs() to not mark a register as reserved and then
revert that decision in some cases. Motivated by the discussion in
https://reviews.llvm.org/D29056

llvm-svn: 293073
2017-01-25 17:12:10 +00:00
Chad Rosier 072e70b365 [AArch64] Minor code refactoring. NFC.
llvm-svn: 293063
2017-01-25 15:56:59 +00:00
Martin Bohme 8396e14e7f [ARM] GlobalISel: Fix stack-use-after-scope bug.
Summary:
Lifetime extension wasn't triggered on the result of BuildMI because the
reference was non-const. However, instead of adding a const, I've
removed the reference entirely as RVO should kick in anyway.

Reviewers: rovka, bkramer

Reviewed By: bkramer

Subscribers: aemerson, rengolin, dberris, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D29124

llvm-svn: 293059
2017-01-25 14:28:19 +00:00
Mohammed Agabaria 20caee95e1 [X86] enable memory interleaving for X86\SLM arch.
Differential Revision: https://reviews.llvm.org/D28547

llvm-svn: 293040
2017-01-25 09:14:48 +00:00
Diana Picus d83df5d372 [ARM] GlobalISel: Support i1 add and ABI extensions
Add support for:
* i1 add
* i1 function arguments, if passed through registers
* i1 returns, with ABI signext/zeroext

Differential Revision: https://reviews.llvm.org/D27706

llvm-svn: 293035
2017-01-25 08:47:40 +00:00
Diana Picus 8b6c6bedcb [ARM] GlobalISel: Support i8/i16 ABI extensions
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.

Note that this does not include support for other extensions (i8 to i16), those
will be added later.

Differential Revision: https://reviews.llvm.org/D27705

llvm-svn: 293034
2017-01-25 08:10:40 +00:00
Coby Tayree 77807d93af [X86]Enable the use of 'mov' with a 64bit GPR and a large immediate
Enable the next form (intel style):
"mov <reg64>, <largeImm>"
which is should be available,
where <largeImm> stands for immediates which exceed the range of a singed 32bit integer

Differential Revision: https://reviews.llvm.org/D28988

llvm-svn: 293030
2017-01-25 07:09:42 +00:00
Diana Picus 1d8eaf4387 [ARM] GlobalISel: Bail out on Thumb. NFC
Thumb is not supported yet, so bail out early.

llvm-svn: 293029
2017-01-25 07:08:53 +00:00
Matt Arsenault 74a576e7d3 AMDGPU: Check nsz instead of unsafe math
llvm-svn: 293028
2017-01-25 06:27:02 +00:00
Matt Arsenault 732a531506 DAG: Recognize no-signed-zeros-fp-math attribute
clang already emits this with -cl-no-signed-zeros, but codegen
doesn't do anything with it. Treat it like the other fast math
attributes, and change one place to use it.

llvm-svn: 293024
2017-01-25 06:08:42 +00:00
Matt Arsenault 9f5e0ef0c5 AMDGPU: Implement early ifcvt target hooks.
Leave early ifcvt disabled for now since there are some
shader-db regressions.

This causes some immediate improvements, but could be better.
The cost checking that the pass does is based on critical path
length for out of order CPUs which we do not want so it skips out
on many cases we want.

llvm-svn: 293016
2017-01-25 04:25:02 +00:00
Ahmed Bougacha 05a5f7dc0b [GlobalISel] Generate selector for more integer binop patterns.
This surprisingly isn't NFC because there are patterns to select GPR
sub to SUBSWrr (rather than SUBWrr/rs); SUBS is later optimized to
SUB if NZCV is dead.  From ISel's perspective, both are fine.

llvm-svn: 293010
2017-01-25 02:41:38 +00:00
Tom Stellard 2f3f9855f0 AMDGPU add support for spilling to a user sgpr pointed buffers
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].

Patch By: Dave Airlie

Reviewers: nhaehnle, arsenm, tstellarAMD

Reviewed By: arsenm

Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25428

llvm-svn: 293000
2017-01-25 01:25:13 +00:00
Eugene Zelenko 11f6907f40 [AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292996
2017-01-25 00:29:26 +00:00
Eugene Zelenko 8c6ed0f3a0 [XCore] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292988
2017-01-24 23:02:48 +00:00
Matt Arsenault bf67cf7e4b AMDGPU: Remove spurious out branches after a kill
The sequence like this:
  v_cmpx_le_f32_e32 vcc, 0, v0
  s_branch BB0_30
  s_cbranch_execnz BB0_30
  ; BB#29:
  exp null off, off, off, off done vm
  s_endpgm
  BB0_30:
  ; %endif110

is likely wrong. The s_branch instruction will unconditionally jump
to BB0_30 and the skip block (exp done + endpgm) inserted for
performing the kill instruction will never be executed. This results
in a GPU hang with Star Ruler 2.

The s_branch instruction is added during the "Control Flow Optimizer"
pass which seems to re-organize the basic blocks, and we assume
that SI_KILL_TERMINATOR is always the last instruction inside a
basic block. Thus, after inserting a skip block we just go to the
next BB without looking at the subsequent instructions after the
kill, and the s_branch op is never removed.

Instead, we should remove the unconditional out branches and let
skip the two instructions if the exec mask is non-zero.

This patch fixes the GPU hang and doesn't introduce any regressions
with "make check".

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 292985
2017-01-24 22:18:39 +00:00
Eugene Zelenko 3943d2b0d7 [SystemZ] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292983
2017-01-24 22:10:43 +00:00
Matt Arsenault 7aad8fd8f4 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Changpeng Fang c85abbd955 AMDGPU/SI: Give up in promote alloca when a pointer may be captured.
Differential Revision:
  http://reviews.llvm.org/D28970

Reviewer:
  Matt

llvm-svn: 292966
2017-01-24 19:06:28 +00:00
Chad Rosier 8e11fbd15d [AArch64] Fix typo. NFC.
llvm-svn: 292959
2017-01-24 18:08:10 +00:00
Stanislav Mekhanoshin 22a56f2f5a [AMDGPU] Add VGPR copies post regalloc fix pass
Regalloc creates COPY instructions which do not formally use VALU.
That results in v_mov instructions displaced after exec mask modification.
One pass which do it is SIOptimizeExecMasking, but potentially it can be
done by other passes too.

This patch adds a pass immediately after regalloc to add implicit exec
use operand to all VGPR copy instructions.

Differential Revision: https://reviews.llvm.org/D28874

llvm-svn: 292956
2017-01-24 17:46:17 +00:00
Evandro Menezes 7784cacd91 [AArch64] Rename 'no-quad-ldst-pairs' to 'slow-paired-128'
In order to follow the pattern of the existing 'slow-misaligned-128store'
option, rename the option 'no-quad-ldst-pairs' to 'slow-paired-128'.

llvm-svn: 292954
2017-01-24 17:34:31 +00:00
Chris Bieneman bef847c3ae [Lanai] Rename LanaiInstPrinter library to LanaiAsmPrinter
Summary:
    This is in keeping with LLVM convention. The classes are InstPrinters, but the library is ${target}AsmPrinter.

This patch is in response to bryant pointing out to me that Lanai was the only backend deviating from convention here. Thanks!

Reviewers: jpienaar, bryant

Subscribers: mgorny, jgosnell, llvm-commits

Differential Revision: https://reviews.llvm.org/D29043

llvm-svn: 292953
2017-01-24 17:27:01 +00:00
Simon Pilgrim 893d2119ee [X86][AVX512] Remove unused argument from PMOVX tablegen patterns. NFCI.
Seems to be a copy+paste legacy from the AVX2 patterns.

llvm-svn: 292941
2017-01-24 16:16:29 +00:00
Martin Bohme 526299c81c [X86][SSE] Add explicit braces to avoid -Wdangling-else warning.
Reviewers: RKSimon

Subscribers: llvm-commits, igorb

Differential Revision: https://reviews.llvm.org/D29076

llvm-svn: 292924
2017-01-24 12:31:30 +00:00
Simon Pilgrim 0c45338961 Fix unused variable warning
llvm-svn: 292921
2017-01-24 11:54:27 +00:00
Simon Pilgrim e1ec9072f6 [X86][SSE] Add support for constant folding vector arithmetic shift by immediates
llvm-svn: 292919
2017-01-24 11:46:13 +00:00
Simon Pilgrim 6340e54861 [X86][SSE] Add support for constant folding vector logical shift by immediates
llvm-svn: 292915
2017-01-24 11:21:57 +00:00
Craig Topper fc8798fa1b [X86] Remove unnecessary peakThroughBitcasts call that's already take care of by the ISD::isBuildVectorAllOnes check below.
llvm-svn: 292894
2017-01-24 06:57:29 +00:00
Wei Ding ee21a36f8a AMDGPU : Add trap handler support.
llvm-svn: 292893
2017-01-24 06:41:21 +00:00
Craig Topper b0cbd5b5b0 [AVX-512] Simplify multiclasses for integer logic operations. There were several inputs that didn't vary.
While there give them the same scheduling itinerary as the SSE/AVX versions.

llvm-svn: 292892
2017-01-24 06:25:34 +00:00
Jonas Paulsson 463e2a6f3d [SystemZ] Gracefully fail in GeneralShuffle::add() instead of assertion.
The GeneralShuffle::add() method used to have an assert that made sure that
source elements were at least as big as the destination elements. This was
wrong, since it is actually expected that an EXTRACT_VECTOR_ELT node with a
smaller source element type than the return type gets extended.

Therefore, instead of asserting this, it is just checked and if this is the
case 'false' is returned from the GeneralShuffle::add() method. This case
should be very rare and is not handled further by the backend.

Review: Ulrich Weigand.
llvm-svn: 292888
2017-01-24 05:43:03 +00:00
Craig Topper 993edc9db1 [X86] Don't split v8i32 all ones values if only AVX1 is available. Keep it intact and split it at isel.
This allows us to remove the check in ANDN combining that had to look through the extraction.

llvm-svn: 292881
2017-01-24 04:33:03 +00:00
Craig Topper eb440a14a5 [X86] Remove Undef handling from extractSubVector. This is now handled inside getNode.
llvm-svn: 292877
2017-01-24 02:43:54 +00:00
Matthias Braun 1d77599ba3 PowerPC: Mark super regs of reserved regs reserved.
When a register like R1 is reserved, X1 should be reserved as well. This
was already done "manually" when 64bit code was enabled, however using
the markSuperRegs() function on the base register is more convenient and
allows to use the checksAllSuperRegsMarked() function even in 32bit mode
to avoid accidental breakage in the future.

This is also necessary to allow https://reviews.llvm.org/D28881

Differential Revision: https://reviews.llvm.org/D29056

llvm-svn: 292870
2017-01-24 01:12:30 +00:00
Derek Schuff 4b320b72a2 [WebAssembly] Update LibFunc::Func -> LibFunc
Fixes compile failures after r292848

llvm-svn: 292857
2017-01-24 00:01:18 +00:00
Eugene Zelenko a63528cf9c [AMDGPU] Fix obsolete comments, spotted by Malcolm Parsons. (NFC)
llvm-svn: 292853
2017-01-23 23:41:16 +00:00
David L. Jones d21529fa0d [Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).

Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.

The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)

There are additional changes required in clang.

Reviewers: rsmith

Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D28476

llvm-svn: 292848
2017-01-23 23:16:46 +00:00
Matt Arsenault 3aef809384 AMDGPU: Custom lower more vector operations
This avoids stack usage.

llvm-svn: 292846
2017-01-23 23:09:58 +00:00
Krzysztof Parzyszek 09a8638724 [RDF] Add registers to live set even if they are live already
When calculating kills, a register may be considered live because a part
of it is live, but if there is a use of that (whole) register, the whole
register (and its subregisters) need to be added to the live set.

llvm-svn: 292845
2017-01-23 23:03:49 +00:00
Matt Arsenault a6867fd441 AMDGPU: Combine fp16/fp64 subtarget features
The same control register controls both, and are set to
the same defaults. Keep the old names around as aliases.

llvm-svn: 292837
2017-01-23 22:31:03 +00:00
Krzysztof Parzyszek f86d385813 [Hexagon] Explicitly reserve aliases of reserved registers
llvm-svn: 292836
2017-01-23 22:13:05 +00:00
Ahmed Bougacha b6137063eb [AArch64][GlobalISel] Legalize narrow scalar fp->int conversions.
Since we're now avoiding operations using narrow scalar integer types,
we have to legalize the integer side of the FP conversions.

This requires teaching the legalizer how to do that.

llvm-svn: 292828
2017-01-23 21:10:14 +00:00
Ahmed Bougacha cfb384d39d [AArch64][GlobalISel] Legalize narrow scalar ops again.
Since r279760, we've been marking as legal operations on narrow integer
types that have wider legal equivalents (for instance, G_ADD s8).
Compared to legalizing these operations, this reduced the amount of
extends/truncates required, but was always a weird legalization decision
made at selection time.

So far, we haven't been able to formalize it in a way that permits the
selector generated from SelectionDAG patterns to be sufficient.

Using a wide instruction (say, s64), when a narrower instruction exists
(s32) would introduce register class incompatibilities (when one narrow
generic instruction is selected to the wider variant, but another is
selected to the narrower variant).

It's also impractical to limit which narrow operations are matched for
which instruction, as restricting "narrow selection" to ranges of types
clashes with potentially incompatible instruction predicates.

Concerns were also raised regarding  MIPS64's sign-extended register
assumptions, as well as wrapping behavior.
See discussions in https://reviews.llvm.org/D26878.

Instead, legalize the operations.

Should we ever revert to selecting these narrow operations, we should
try to represent this more accurately: for instance, by separating
a "concrete" type on operations, and an "underlying" type on vregs, we
could move the "this narrow-looking op is really legal" decision to the
legalizer, and let the selector use the "underlying" vreg type only,
which would be guaranteed to map to a register class.

In any case, we eventually should mitigate:
- the performance impact by selecting no-op extract/truncates to COPYs
  (which we currently do), and the COPYs to register reuses (which we
  don't do yet).
- the compile-time impact by optimizing away extract/truncate sequences
  in the legalizer.

llvm-svn: 292827
2017-01-23 21:10:05 +00:00
Javed Absar 00cce41752 [ARM] Classification Improvements to ARM Sched-Models. NFCI.
This is a series of patches to enable adding of machine sched
models for ARM processors easier and compact. They define new
sched-readwrites for groups of ARM instructions. This has been
missing so far, and as a consequence, machine scheduler models
for individual sub-targets have tended to be larger than they
needed to be. 

The current patch focuses on floating-point instructions.

Reviewers: Diana Picus (rovka), Renato Golin (rengolin)

Differential Revision: https://reviews.llvm.org/D28194

llvm-svn: 292825
2017-01-23 20:20:39 +00:00
Matt Arsenault 7b49ad74ed AMDGPU: Propagate fast math flags in fneg combines
Can't for fma/mad since it seems they can't have flags currently.

llvm-svn: 292818
2017-01-23 19:08:34 +00:00
Matt Arsenault 78916e17ea AMDGPU: Remove unnecessary check
There are no scalar FP types that can be extended.

llvm-svn: 292816
2017-01-23 19:00:15 +00:00
Jonas Paulsson d034e7ddc8 [SystemZ] Mark vector immediate load instructions with useful flags.
Vector immediate load instructions should have the isAsCheapAsAMove, isMoveImm
and isReMaterializable flags set. With them, these instruction will get
hoisted out of loops.

Review: Ulrich Weigand
llvm-svn: 292790
2017-01-23 14:09:58 +00:00
Simon Pilgrim 0218ce1080 [X86][SSE] Add missing X86ISD::ANDNP combines.
llvm-svn: 292767
2017-01-22 22:45:23 +00:00
Simon Pilgrim 7e1cc97513 [X86][SSE] Improve shuffle combining with zero insertions
Add support for handling shuffles with scalar_to_vector(0)

llvm-svn: 292766
2017-01-22 22:21:44 +00:00
Sanjay Patel 8f49aede82 [x86] avoid crashing with illegal vector type (PR31672)
https://llvm.org/bugs/show_bug.cgi?id=31672

llvm-svn: 292758
2017-01-22 17:06:12 +00:00
Craig Topper 8e0724d332 [X86] Don't allow commuting to form phsub operations.
Fixes PR31714.

llvm-svn: 292713
2017-01-21 06:59:38 +00:00
Matthias Braun 28eae8f4e0 LiveRegUnits: Add accumulateBackward() function
Re-Commit r292543 with a fix for the situation when the chain end is
MBB.end().

This function can be used to accumulate the set of all read and modified
register in a sequence of instructions.

Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove
the concept.

- The AArch64A57LoadBalancing code is using a backwards analysis now
  which is irrespective of kill flags. This is the main motivation for
  this change.

Differential Revision: http://reviews.llvm.org/D22082

llvm-svn: 292705
2017-01-21 02:21:04 +00:00
Eugene Zelenko 06f90ef45c [AMDGPU] Fix build broken in r292688.
llvm-svn: 292699
2017-01-21 01:34:25 +00:00
Justin Lebar 46624a822d [NVPTX] Auto-upgrade some NVPTX intrinsics to LLVM target-generic code.
Summary:
Specifically, we upgrade llvm.nvvm.:

 * brev{32,64}
 * clz.{i,ll}
 * popc.{i,ll}
 * abs.{i,ll}
 * {min,max}.{i,ll,u,ull}
 * h2f

These either map directly to an existing LLVM target-generic
intrinsic or map to a simple LLVM target-generic idiom.

In all cases, we check that the code we generate is lowered to PTX as we
expect.

These builtins don't need to be backfilled in clang: They're not
accessible to user code from nvcc.

Reviewers: tra

Subscribers: majnemer, cfe-commits, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28793

llvm-svn: 292694
2017-01-21 01:00:32 +00:00
Justin Lebar 077f8fb168 [NVPTX] Move getDivF32Level, usePrecSqrtF32, and useF32FTZ into out of DAGToDAG and into TargetLowering.
Summary:
DADToDAG has access to TargetLowering, but not vice versa, so this is
the more general location for these functions.

NFC

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28795

llvm-svn: 292693
2017-01-21 01:00:14 +00:00
Eugene Zelenko 6620376da7 [AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292688
2017-01-21 00:53:49 +00:00
Guozhi Wei a5c6ed5a5c [PPC] Give unaligned memory access lower cost on processor that supports it
Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.

This patch fixes pr31492.

Differential Revision: https://reviews.llvm.org/D28630

llvm-svn: 292680
2017-01-20 23:35:27 +00:00
Jan Vesely f170504c41 AMDGPU/R600: Serialize vector trunc stores to private AS
Add DUMMY_CHAIN SDNode to denote stores of interest

Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411

Differential Revision: https://reviews.llvm.org/D27964

llvm-svn: 292651
2017-01-20 21:24:26 +00:00
Dan Gohman a99b717f52 [WebAssembly] Don't create bitcast-wrappers for varargs.
WebAssembly varargs functions use a significantly different ABI than
non-varargs functions, and the current code in
WebAssemblyFixFunctionBitcasts doesn't handle that difference. For now,
just avoid creating wrapper functions in the presence of varargs.

llvm-svn: 292645
2017-01-20 20:50:29 +00:00
Matthias Braun 856548a616 ARM: tLDR_postidx should be marked mayLoad
This fixes -verify-machineinstrs complaints.

llvm-svn: 292629
2017-01-20 18:30:28 +00:00
Matthias Braun 2e8c11e4b3 AArch64LoadStoreOptimizer: Update kill flags when merging stores
Kill flags need to be updated correctly when moving stores up/down to
form store pair instructions.
Those invalid flags have been ignored before but as of r290014 they are
recognized when using -mllvm -verify-machineinstrs.

Also simplifies test/CodeGen/AArch64/ldst-opt-dbg-limit.mir, renames it
to ldst-opt.mir test and adds a new tests for this change.

Differential Revision: https://reviews.llvm.org/D28875

llvm-svn: 292625
2017-01-20 18:04:27 +00:00
Petar Jovanovic dbb39356b4 [mips] Fix debug information for __thread variable
This patch fixes debug information for __thread variable on Mips
using .dtprelword and .dtpreldword directives.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D28770

llvm-svn: 292624
2017-01-20 17:53:30 +00:00
Eugene Zelenko 734bb7bb09 [AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 292623
2017-01-20 17:52:16 +00:00
Simon Pilgrim 3e5b525699 Remove trailing whitespace. NFCI.
llvm-svn: 292613
2017-01-20 15:15:59 +00:00
Simon Pilgrim 0da4d2bc03 [CostModel][X86] Removed unused cost. NFCI.
SHL v8i32 is already handled in the SSE41 cost table

llvm-svn: 292612
2017-01-20 15:14:38 +00:00
Sjoerd Meijer 2db2a947f6 [Thumb] Add support for tMUL in the compare instruction peephole optimizer.
We also want to optimise tests like this: return a*b == 0.  The MULS
instruction is flag setting, so we don't need the CMP instruction but can
instead branch on the result of the MULS. The generated instructions sequence
for this example was: MULS, MOVS, MOVS, CMP. The MOVS instruction load the
boolean values resulting from the select instruction, but these MOVS
instructions are flag setting and were thus preventing this optimisation. Now
we first reorder and move the MULS to before the CMP and generate sequence
MOVS, MOVS, MULS, CMP so that the optimisation could trigger. Reordering of the
MULS and MOVS is safe to do because the subsequent MOVS instructions just set
the CPSR register and don't use it, i.e. the CPSR is dead.

Differential Revision: https://reviews.llvm.org/D27990

llvm-svn: 292608
2017-01-20 13:10:12 +00:00
Benjamin Kramer 11590b8281 Pacify -Wreorder.
llvm-svn: 292599
2017-01-20 10:37:53 +00:00
Sam Kolton 07dbde214b [AMDGPU] Add subtarget features for SDWA/DPP
Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28900

llvm-svn: 292596
2017-01-20 10:01:25 +00:00
Diana Picus bd66b7dc87 [ARM] Use helpers for adding pred / CC operands. NFC
Hunt down some of the places where we use bare addReg(0) or addImm(AL).addReg(0)
and replace with add(condCodeOp()) and add(predOps()). This should make it
easier to understand what those operands represent (without having to look at
the definition of the instruction that we're adding to).

Differential Revision: https://reviews.llvm.org/D27984

llvm-svn: 292587
2017-01-20 08:15:24 +00:00
Matthias Braun d9217c0b86 Revert "LiveRegUnits: Add accumulateBackward() function"
This seems to be breaking some bots.

This reverts commit r292543.

llvm-svn: 292574
2017-01-20 03:58:42 +00:00
Ahmed Bougacha d294823930 [AArch64][GlobalISel] Widen scalar int->fp conversions.
It's incorrect to ignore the higher bits of the integer source.
Teach the legalizer how to widen it.

llvm-svn: 292563
2017-01-20 01:37:24 +00:00
Stanislav Mekhanoshin 6ec3e3a728 [AMDGPU] Prevent spills before exec mask is restored
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.

Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.

Differential Revision: https://reviews.llvm.org/D27997

llvm-svn: 292554
2017-01-20 00:44:31 +00:00
Matthias Braun 3ffeb68869 LiveRegUnits: Add accumulateBackward() function
This function can be used to accumulate the set of all read and modified
register in a sequence of instructions.

Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove
the concept.

- The AArch64A57LoadBalancing code is using a backwards analysis now
  which is irrespective of kill flags. This is the main motivation for
  this change.

Differential Revision: http://reviews.llvm.org/D22082

llvm-svn: 292543
2017-01-20 00:16:17 +00:00
Stanislav Mekhanoshin 68257700f8 [AMDGPU] Add exec copy to LiveIntervals in SILowerControlFlow::emitElse
This instruction is missing from LiveIntervals.
I'm not aware of any problems because of this though.

Differential Revision: https://reviews.llvm.org/D28879

llvm-svn: 292521
2017-01-19 21:26:22 +00:00
Serge Rogatch f83d2a25bf [XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623

Reviewers: rengolin, dberris

Reviewed By: dberris

Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown

Differential Revision: https://reviews.llvm.org/D28624

llvm-svn: 292516
2017-01-19 20:24:23 +00:00
Simon Pilgrim db101e4d57 [X86][SSE] Improve comments describing combineTruncatedArithmetic. NFCI.
llvm-svn: 292502
2017-01-19 18:18:32 +00:00
Simon Pilgrim 5f2f53b106 [X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further

llvm-svn: 292493
2017-01-19 16:25:02 +00:00
Kristof Beyls e9412b4d47 [GlobalISel] Pointers are legal operands for G_SELECT on AArch64
Differential Revision: https://reviews.llvm.org/D28805

llvm-svn: 292481
2017-01-19 13:32:14 +00:00
Elena Demikhovsky e01512cecf Recommiting unsigned saturation with a bugfix.
A test case that crached is added to avx512-trunc.ll.
(PR31589)

llvm-svn: 292479
2017-01-19 12:08:21 +00:00
Daniel Sanders d64d5024a4 Re-commit: [globalisel] Tablegen-erate current Register Bank Information
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.

Changes since first commit attempt:
* Added missing guards
* Added more missing guards
* Found and fixed a use-after-free bug involving Twine locals

Reviewers: t.p.northover, ab, rovka, qcolombet

Reviewed By: qcolombet

Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27338

llvm-svn: 292478
2017-01-19 11:15:55 +00:00
Craig Topper 200ea31684 [AVX-512] Support ADD/SUB/MUL of mask vectors
Summary:
Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.

We already do this for scalar i1 operations so I just extended it to vectors of i1.

Reviewers: zvi, delena

Reviewed By: delena

Subscribers: guyblank, llvm-commits

Differential Revision: https://reviews.llvm.org/D28888

llvm-svn: 292474
2017-01-19 07:12:35 +00:00
Matt Arsenault 3e6f9b5773 AMDGPU: Disable some fneg combines unless nsz
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.

fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.

llvm-svn: 292473
2017-01-19 06:35:27 +00:00
Matt Arsenault 3b99f12a4e AMDGPU: Remove modifiers from v_div_scale_*
They seem to produce nonsense results when used.

This should be applied to the release branch.

llvm-svn: 292472
2017-01-19 06:04:12 +00:00
Craig Topper c227529105 [X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical.
llvm-svn: 292469
2017-01-19 03:49:29 +00:00
Craig Topper b561e66384 [AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load.
llvm-svn: 292466
2017-01-19 02:34:29 +00:00
Dehao Chen 1ce8d6ca59 Add -debug-info-for-profiling to emit more debug info for sample pgo profile collection
Summary:
SamplePGO binaries built with -gmlt to collect profile. The current -gmlt debug info is limited, and we need some additional info:

* start line of all subprograms
* linkage name of all subprograms
* standalone subprograms (functions that has neither inlined nor been inlined)

This patch adds these information to the -gmlt binary. The impact on speccpu2006 binary size (size increase comparing with -g0 binary, also includes data for -g binary, which does not change with this patch):

               -gmlt(orig) -gmlt(patched) -g
433.milc       4.68%       5.40%          19.73%
444.namd       8.45%       8.93%          45.99%
447.dealII     97.43%      115.21%        374.89%
450.soplex     27.75%      31.88%         126.04%
453.povray     21.81%      26.16%         92.03%
470.lbm        0.60%       0.67%          1.96%
482.sphinx3    5.77%       6.47%          26.17%
400.perlbench  17.81%      19.43%         73.08%
401.bzip2      3.73%       3.92%          12.18%
403.gcc        31.75%      34.48%         122.75%
429.mcf        0.78%       0.88%          3.89%
445.gobmk      6.08%       7.92%          42.27%
456.hmmer      10.36%      11.25%         35.23%
458.sjeng      5.08%       5.42%          14.36%
462.libquantum 1.71%       1.96%          6.36%
464.h264ref    15.61%      16.56%         43.92%
471.omnetpp    11.93%      15.84%         60.09%
473.astar      3.11%       3.69%          14.18%
483.xalancbmk  56.29%      81.63%         353.22%
geomean        15.60%      18.30%         57.81%

Debug info size change for -gmlt binary with this patch:

433.milc       13.46%
444.namd       5.35%
447.dealII     18.21%
450.soplex     14.68%
453.povray     19.65%
470.lbm        6.03%
482.sphinx3    11.21%
400.perlbench  8.91%
401.bzip2      4.41%
403.gcc        8.56%
429.mcf        8.24%
445.gobmk      29.47%
456.hmmer      8.19%
458.sjeng      6.05%
462.libquantum 11.23%
464.h264ref    5.93%
471.omnetpp    31.89%
473.astar      16.20%
483.xalancbmk  44.62%
geomean        16.83%

Reviewers: davidxl, echristo, dblaikie

Reviewed By: echristo, dblaikie

Subscribers: aprantl, probinson, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25434

llvm-svn: 292457
2017-01-19 00:44:11 +00:00
Artem Belevich 3d3f6190ab [NVPTX] Fix lowering of fp16 ISD::FNEG.
There's no neg.f16 instruction, so negation has to
be done via subtraction from zero.

Differential Revision: https://reviews.llvm.org/D28876

llvm-svn: 292452
2017-01-19 00:14:45 +00:00
Krzysztof Parzyszek 954dd8d9ba [Hexagon] Remove dead defs from the live set when expanding wstores
llvm-svn: 292445
2017-01-18 23:11:40 +00:00
Michael Kuperstein d3d2925933 Revert r291670 because it introduces a crash.
r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.

PR31589 has the new reproducer.

llvm-svn: 292444
2017-01-18 23:05:58 +00:00
Evandro Menezes 7960b2e19a [AArch64] Generate literals by the little end
ARM seems to prefer that long literals be formed from their little end in
order to promote the fusion of the instrs pairs MOV/MOVK and MOVK/MOVK on
Cortex A57 and others (v.  "Cortex A57 Software Optimisation Guide", section
4.14).

Differential revision: https://reviews.llvm.org/D28697

llvm-svn: 292422
2017-01-18 18:57:08 +00:00
Stanislav Mekhanoshin a4e63ead4b [AMDGPU] Do not allow register coalescer to create big superregs
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.

With more super-regs we would need to allocate adjacent registers
and constraint regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers
allocation even more, resulting in excessive spilling.

Differential Revision: https://reviews.llvm.org/D28782

llvm-svn: 292413
2017-01-18 17:30:05 +00:00
Kirill Bobyrev 6afbaf0944 Revert 292404 due to buildbot failures.
llvm-svn: 292407
2017-01-18 16:34:25 +00:00
Kirill Bobyrev 9ad06dbe17 [X86] Minor code cleanup to fix several clang-tidy warnings. NFC
llvm-svn: 292404
2017-01-18 16:15:47 +00:00
Chad Rosier 771db6f895 [Assembler] Fix crash when assembling .quad for AArch32.
A 64-bit relocation does not exist in 32-bit ARMELF. Report an error
instead of crashing.

PR23870
Patch by Sanne Wouda (sanwou01).
Differential Revision: https://reviews.llvm.org/D28851

llvm-svn: 292373
2017-01-18 15:02:54 +00:00
Florian Hahn 8485cecd3f [thumb,framelowering] Reset NoVRegs in Thumb1FrameLowering::emitPrologue.
Summary:
In this function, virtual registers can be introduced (for example
through calls to emitThumbRegPlusImmInReg). doScavengeFrameVirtualRegs
will replace those virtual registers with concrete registers later on
in PrologEpilogInserter, which sets NoVRegs again.

This patch fixes the Codegen/Thumb/segmented-stacks.ll test case which
failed with expensive checks.
https://llvm.org/bugs/show_bug.cgi?id=27484


Reviewers: rnk, bkramer, olista01

Reviewed By: olista01

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D28829

llvm-svn: 292372
2017-01-18 15:01:22 +00:00
Daniel Sanders af76f989b5 Re-revert: [globalisel] Tablegen-erate current Register Bank Information
More missing guards. My build didn't notice it due to a stale file left over
from a Global ISel build.

llvm-svn: 292369
2017-01-18 14:26:12 +00:00
Daniel Sanders 517b61cb69 Re-commit: [globalisel] Tablegen-erate current Register Bank Information
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.

Changes since last commit:
The new tablegen pass is now correctly guarded by LLVM_BUILD_GLOBAL_ISEL and
this should fix the buildbots however it may not be the whole fix. The previous
buildbot failures suggest there may be a memory bug lurking that I'm unable to
reproduce (including when using asan) or spot in the source. If they re-occur
on this commit then I'll need assistance from the bot owners to track it down.

Reviewers: t.p.northover, ab, rovka, qcolombet

Reviewed By: qcolombet

Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27338

llvm-svn: 292367
2017-01-18 14:17:50 +00:00
Sam Parker df7c6ef96f [ARM] Create objdump subtarget from build attrs
Enable an ELFObjectFile to read the its arm build attributes to
produce a target triple with a specific ARM architecture.
llvm-objdump now uses this functionality to automatically produce
a more accurate target.

Differential Revision: https://reviews.llvm.org/D28769

llvm-svn: 292366
2017-01-18 13:52:12 +00:00
Michael Zuckerman 0c0240ce84 [X86] Improve mul combine for negative multiplayer (2^c - 1)
This patch improves the mul instruction combine function (combineMul) 
by adding new layer of logic. 
In this patch, we are adding the ability to fold (mul x, -((1 << c) -1)) 
or (mul x, -((1 << c) +1)) into (neg(X << c) -x) or (neg((x << c) + x) respective.

Differential Revision: https://reviews.llvm.org/D28232

llvm-svn: 292358
2017-01-18 09:31:13 +00:00
Renato Golin 03c5e69d07 Revert "[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier"
This reverts commit r292210, as it broke the Thumb buldbot with:

clang-5.0: error: the clang compiler does not support '-fxray-instrument
on thumbv7-unknown-linux-gnueabihf'.

llvm-svn: 292357
2017-01-18 09:08:43 +00:00
Jonas Paulsson a9bb00d82b [SystemZ] Proper handling of undef flag while expanding pseudo.
During post-RA pseudo expansion, an 'undef' flag of the source operand should
be propagated by emitGRX32Move().

Review: Ulrich Weigand
llvm-svn: 292353
2017-01-18 08:32:54 +00:00
Marina Yatsina 197db00e3e [X86] Fix for bugzilla 31576 - add support for "data32" instruction prefix
This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576).

"data32" instruction prefix was not defined in the llvm.
An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes).

Differential Revision: https://reviews.llvm.org/D28468

llvm-svn: 292352
2017-01-18 08:07:51 +00:00
Dan Gohman 73e3aaa61e [WebAssembly] Update grow_memory's return type.
The grow_memory instruction now returns the previous memory size. Add the
return type to the LLVM intrinsic.

llvm-svn: 292322
2017-01-18 01:02:45 +00:00
Justin Lebar 1cf6bf4989 [NVPTX] Support global variables of integer type larger than i64.
Reviewers: tra, majnemer

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28825

llvm-svn: 292316
2017-01-18 00:29:53 +00:00
Justin Lebar 9c46450dbb [NVPTX] Standardize asm printer on "foo \tbar".
Some instructions were printed as "foo\tbar", but most are printed as
"foo \bar".  Standardize on the latter form.

llvm-svn: 292306
2017-01-18 00:09:36 +00:00
Justin Lebar 2a2d6f0ddd [NVPTX] Clean up nested !strconcat calls.
!strconcat is a variadic function; it will concatenate an arbitrary
number of strings.  There's no need to nest it.

llvm-svn: 292305
2017-01-18 00:09:19 +00:00
Justin Lebar cc938fc197 [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.
Summary:
This change also lets us use max.{s,u}16.  There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16.  It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).

In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc.  In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28732

llvm-svn: 292304
2017-01-18 00:09:01 +00:00
Justin Lebar 7dc3d6c341 [NVPTX] Lower integer absolute value idiom to abs instruction.
Summary: Previously we lowered it literally, to shifts and xors.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28722

llvm-svn: 292303
2017-01-18 00:08:44 +00:00
Justin Lebar 1091a9f566 [NVPTX] Improve lowering of llvm.ctpop.
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.

(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28721

llvm-svn: 292302
2017-01-18 00:08:27 +00:00
Justin Lebar c7d20128bd [NVPTX] Add lowering for llvm.bitreverse.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28720

llvm-svn: 292301
2017-01-18 00:08:10 +00:00
Justin Lebar d17de5380b [NVPTX] Improve lowering of llvm.ctlz.
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
  has defined behavior on x == 0 to check whether x is, in fact zero.

* Add DAG patterns that avoid re-truncating or re-expanding the result
  of the 16- and 64-bit ctz instructions.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28719

llvm-svn: 292299
2017-01-18 00:07:35 +00:00
Tim Northover 33a1a0b001 GlobalISel: fix comparison order for G_FCMP
As with G_ICMP we'd written the CSET instructions backwards.

llvm-svn: 292285
2017-01-17 23:04:01 +00:00
Tim Northover 509091f9e0 GlobalISel: add callseq instructions to record stack usage
llvm-svn: 292284
2017-01-17 22:43:34 +00:00
Tim Northover d943354216 GlobalISel: correctly handle varargs
Some platforms (notably iOS) use a different calling convention for unnamed vs
named parameters in varargs functions, so we need to keep track of this
information when translating calls.

Since not many platforms are involved, the guts of the special handling is in
the ValueHandler class (with a generic implementation that should work for most
targets).

llvm-svn: 292283
2017-01-17 22:30:10 +00:00
Alexei Starovoitov efefbc4a19 [bpf] fix stack-use-after-scope
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 292258
2017-01-17 21:14:00 +00:00
Joerg Sonnenberger 270dd41f75 Remove an overeager assert from r288844.
llvm-svn: 292244
2017-01-17 19:29:15 +00:00
Bob Wilson f2d0b68b3b Revert r291640 change to fold X86 comparison with atomic_load_add.
Even with the fix from r291630, this still causes problems. I get
widespread assertion failures in the Swift runtime's WeakRefCount::increment()
function. I sent a reduced testcase in reply to the commit.

llvm-svn: 292242
2017-01-17 19:18:57 +00:00
Sam Kolton 9dffada98b [AMDGPU] Assembler: fix v_mac_f16 immediates
Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28802

llvm-svn: 292224
2017-01-17 15:26:02 +00:00
Serge Rogatch 50be6b45a9 [XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623

Reviewers: rengolin, dberris

Reviewed By: dberris

Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown

Differential Revision: https://reviews.llvm.org/D28624

llvm-svn: 292210
2017-01-17 11:52:10 +00:00
Matt Arsenault 4165efdc58 AMDGPU: Add replacement export intrinsics
llvm-svn: 292205
2017-01-17 07:26:53 +00:00
Alexei Starovoitov e4975487f5 [bpf] error when unknown bpf helper is called
Emit error when BPF backend sees a call to a global function or to an external symbol.
The kernel verifier only allows calls to predefined helpers from bpf.h
which are defined in 'enum bpf_func_id'. Such calls in assembler must
look like 'call [1-9]+' where number matches bpf_func_id.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 292204
2017-01-17 07:26:17 +00:00
Craig Topper 729d30d0ae [AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation.
llvm-svn: 292201
2017-01-17 06:49:59 +00:00
Alexei Starovoitov 05de2e4818 [bpf] error when BPF stack size exceeds 512 bytes
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 292180
2017-01-17 01:05:17 +00:00
Matt Arsenault 2aab1d45ff AMDGPU: Remove dead pattern
This is the unsafe conversion pattern, but not guarded by
an unsafe math check. It is also already done in LegalizeDAG.

llvm-svn: 292173
2017-01-17 00:10:43 +00:00
Jan Vesely 334f51a6fe ADMGPU/EG,CM: Implement _noret global atomics
_RTN versions will be a lot more complicated

Differential Revision: https://reviews.llvm.org/D28067

llvm-svn: 292162
2017-01-16 21:20:13 +00:00
Tony Jiang 8e8c444d3d [PowerPC] Expand ISEL instruction into if-then-else sequence.
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.

llvm-svn: 292154
2017-01-16 20:12:26 +00:00
Chad Rosier 58fb5f5e58 [AArch64] Falkor supports Rounding Double Multiply Add/Subtract instructions.
Falkor only partially implements the ARMv8.1a extensions, so this patch
refactors the support for the SQRDML[A|S]H instruction into a separate
feature.

Differential Revision: https://reviews.llvm.org/D28681

llvm-svn: 292142
2017-01-16 16:28:43 +00:00
Daniel Sanders a83a1a69c5 Revert r292132: [globalisel] Tablegen-erate current Register Bank Information'...
Several buildbots encountered a crash in tablegen when building this commit.
Reverting while I investigate the cause.

llvm-svn: 292136
2017-01-16 15:34:43 +00:00
Daniel Sanders ab8194def0 [globalisel] Tablegen-erate current Register Bank Information
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.

Reviewers: t.p.northover, ab, rovka, qcolombet

Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27338

llvm-svn: 292132
2017-01-16 15:20:43 +00:00
Tony Jiang 8da139a9fd Revert "[PowerPC] Expand ISEL instruction into if-then-else sequence."
This reverts commit 1d0e0374438ca6e153844c683826ba9b82486bb1.

llvm-svn: 292131
2017-01-16 15:01:07 +00:00
Tony Jiang 7630b8c5ee [PowerPC] Expand ISEL instruction into if-then-else sequence.
Generally, the ISEL is expanded into if-then-else sequence, in some
cases (like when the destination register is the same with the true
or false value register), it may just be expanded into just the if
or else sequence.

llvm-svn: 292128
2017-01-16 14:43:12 +00:00
Simon Dardis 730fdb73a1 [mips] Correct c.cond.fmt instruction definition.
Permit explicit $fcc<X> operand in c.cond.fmt instruction.

Add c.cond.fmt to the MIPS to microMIPS instruction mapping table.

Check that $fcc1 - $fcc7 are unusable for MIPS-I to MIPS-III for
c.cond.fmt, bc1t, bc1f.

Reviewers: seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D24510

llvm-svn: 292117
2017-01-16 13:55:58 +00:00
Craig Topper fba613e407 [X86] Merge the disassemblers handling of the different TYPE_RELs by getting the size information from the ENCODING field. NFCI
llvm-svn: 292096
2017-01-16 06:49:09 +00:00
Craig Topper ad944a1cac [X86] Reduce the number of operand 'types' the disassembler needs to deal with. NFCI
We were frequently checking for a list of types and the different types
conveyed no real information. So lump them together explicitly.

llvm-svn: 292095
2017-01-16 06:49:03 +00:00
Craig Topper 3173a1f8ff [AVX-512] Teach the disassembler about all of the EVEX gather and scatter instructions.
llvm-svn: 292094
2017-01-16 05:44:33 +00:00
Craig Topper 33ac064137 [AVX-512] Begin giving the disassembler a way to recognize that VSIB is a different encoding than regular addressing modes.
This part first teaches it not to check error if EVEX.V2 is used by a VSIB instruction.

llvm-svn: 292093
2017-01-16 05:44:25 +00:00
Craig Topper 7dfd583644 [AVX-512] Correct memory operand size for VPGATHERQPS and VPGATHERQD
with ZMM index. Similar for SCATTER and the prefetch gather and scatter
instructions.

Fixes PR31618.

llvm-svn: 292088
2017-01-16 00:55:58 +00:00
Craig Topper 8be6ebce2b [AVX-512] Fix register class in one of the gather/scatter memory operands so that all 32 bit registers can be allowed.
llvm-svn: 292087
2017-01-16 00:55:50 +00:00
Simon Pilgrim 6ed996cdf0 [CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types
We already have patterns in place to support 128/256-bit shifts without AVX512VL

llvm-svn: 292077
2017-01-15 20:44:00 +00:00
Justin Lebar 38746d9718 [NVPTX] Let there be One True Way to set NVVMReflect params.
Summary:
Previously there were three ways to inform the NVVMReflect pass whether
you wanted to flush denormals to zero:

  * An LLVM command-line option
  * Parameters to the NVVMReflect constructor
  * Metadata on the module itself.

This change removes the first two, leaving only the third.

The motivation for this change, aside from simplifying things, is that
we want LLVM to be aware of whether it's operating in FTZ mode, so other
passes can use this information.  Ideally we'd have a target-generic
piece of metadata on the module.  This change moves us in that
direction.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28700

llvm-svn: 292068
2017-01-15 16:54:35 +00:00
Michael Zuckerman 6baa3838e9 Fix blend mask by switch the side of the operand since Blend node uses opposite mask then Select NODE.
llvm-svn: 292066
2017-01-15 16:43:14 +00:00
Craig Topper f1388ef006 [AVX-512] Remove unnecessary duplicate broadcast patterns. NFC
llvm-svn: 292053
2017-01-15 06:15:45 +00:00
Craig Topper 52317e8b6e [AVX-512] Replicate some broadcast patterns to VLX and disable the AVX2 patterns when VLX is available.
llvm-svn: 292051
2017-01-15 05:47:45 +00:00
Craig Topper c294cff863 [X86] Remove untested MOVDDUP patterns.
These all involve bitcasts around the memory operands. This isn't
something we normally do for isel patterns. I suspect DAG combine should
convert the load type making this unnecessary.

llvm-svn: 292050
2017-01-15 05:21:29 +00:00
Davide Italiano 76de68eaf9 [TargetLowering] Simplfiy a bit. NFCI.
llvm-svn: 292024
2017-01-14 20:09:29 +00:00
Simon Pilgrim d419b73a42 [CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed
llvm-svn: 292023
2017-01-14 19:24:23 +00:00
Simon Pilgrim 8e5ecf8ad1 [X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns
VPMADCSWD act as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator

llvm-svn: 292021
2017-01-14 18:52:13 +00:00
Simon Pilgrim b290805e94 [X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns
VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator

llvm-svn: 292020
2017-01-14 18:08:54 +00:00
Simon Pilgrim a1631749f8 [X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns
VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages

llvm-svn: 292019
2017-01-14 17:13:52 +00:00
Craig Topper 63e2cd6caa [AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when its beneficial.
Isel now selects masked move instructions for vselect instead of blendm. But sometimes it beneficial to register allocation to remove the tied register constraint by using blendm instructions.

This also picks up cases where the masked move was created due to a masked load intrinsic.

Differential Revision: https://reviews.llvm.org/D28454

llvm-svn: 292005
2017-01-14 07:50:52 +00:00
Craig Topper 09b7e0f01d [AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible.
We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX available. Or if its not, but register allocation has selected a non-extended register we will use VEX VXORPS. And if its an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.

This makes it possible for the register allocator to have all 32 registers available to work with.

llvm-svn: 292004
2017-01-14 07:29:24 +00:00
Craig Topper 9cc685a56e [X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop.
llvm-svn: 291996
2017-01-14 04:29:15 +00:00
Craig Topper 9850210d03 [AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there.
This fixes a undefined sanitizer failure from r291888.

llvm-svn: 291994
2017-01-14 04:19:35 +00:00
David L. Jones 41cecba8e9 "Use" lambda captures which are otherwise only used in asserts. NFC
Summary:
The LLVM coding standards recommend "using" values that are only
needed by asserts:
http://llvm.org/docs/CodingStandards.html#assert-liberally

Without this change, LLVM cannot bootstrap with -Werror as the second
stage fails with this new warning:
https://reviews.llvm.org/rL291905

See also the previous fixes:
https://reviews.llvm.org/rL291916
https://reviews.llvm.org/rL291939
https://reviews.llvm.org/rL291940
https://reviews.llvm.org/rL291941

Reviewers: rsmith

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28695

llvm-svn: 291957
2017-01-13 21:02:41 +00:00
Artem Belevich 64dc9be7b4 [NVPTX] Added support for half-precision floating point.
Only scalar half-precision operations are supported at the moment.

- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
  (can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
  fp16 operations are promoted to fp32 and results are converted back
  to fp16 for storage.

Differential Revision: https://reviews.llvm.org/D28540

llvm-svn: 291956
2017-01-13 20:56:17 +00:00
Konstantin Zhuravlyov 7d88275577 [AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64)
Differential Revision: https://reviews.llvm.org/D28496

llvm-svn: 291954
2017-01-13 19:49:25 +00:00
Artem Belevich d109f46573 [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.
Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32
instruction even when unsafe FP math was not allowed.

Clang-generated IR is not affected by this as it uses precise sin/cos
from CUDA's libdevice when unsafe math is disabled.

Differential Revision: https://reviews.llvm.org/D28619

llvm-svn: 291936
2017-01-13 18:48:13 +00:00
Malcolm Parsons 17d266bc96 Remove unused lambda captures. NFC
llvm-svn: 291916
2017-01-13 17:12:16 +00:00
Ivan Krasin 1ed7896c1b Revert r291903 and r291898. Reason: they break check-lld on the bots.
Summary:
Revert [ARM] Fix ubig32_t read in ARMAttributeParser

Now using support functions to read data instead of trying to
perform casts.
===========================================================

Revert [ARM] Enable objdump to construct triple for ARM

Now that The ARMAttributeParser has been moved into the library,
it has been modified so that it can parse the attributes without
printing them and stores them in a map. ELFObjectFile now queries
the attributes to fill out the architecture details of a provided
triple for 'arm' and 'thumb' targets. llvm-objdump uses this new
functionality.

Subscribers: llvm-commits, samparker, aemerson, mgorny

Differential Revision: https://reviews.llvm.org/D28683

llvm-svn: 291911
2017-01-13 16:45:15 +00:00
Saleem Abdulrasool 6ef45916c6 ARM: match GCC's behaviour for builtins
GCC changes the CC between the user-code and the builtins based on the
value of `-target` rather than `-mfloat-abi`.  When a HF target is used,
the VFP variant of the AAPCS CC is used.  Otherwise, the AAPCS variant
is used.  In all cases, the AEABI functions use the AAPCS CC.  Adjust
the calling convention based on the target.

Resolves PR30543!

llvm-svn: 291909
2017-01-13 16:25:33 +00:00
Benjamin Kramer 061f4a5fe6 Apply clang-tidy's performance-unnecessary-value-param to LLVM.
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.

llvm-svn: 291904
2017-01-13 14:39:03 +00:00
Daniel Sanders d6a1831ea7 [globalisel][aarch64] Make getCopyMapping() take register banks ID's rather than IsGPR booleans
Summary:
This allows the function to handle architectures with more than two register banks.

Depends on D27978

Reviewers: ab, t.p.northover, rovka, qcolombet

Subscribers: aditya_nandakumar, kristof.beyls, aemerson, rengolin, vkalintiris, dberris, llvm-commits, rovka

Differential Revision: https://reviews.llvm.org/D27339

llvm-svn: 291902
2017-01-13 14:16:33 +00:00