Commit Graph

40780 Commits

Author SHA1 Message Date
Craig Topper f51ba1e3da [AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX.
llvm-svn: 291402
2017-01-08 21:32:30 +00:00
Simon Pilgrim e6d948b857 Strip trailing whitespace.
llvm-svn: 291395
2017-01-08 18:37:42 +00:00
Simon Pilgrim b2a80950fe Fix line endings and strip trailing whitespace.
llvm-svn: 291393
2017-01-08 16:45:39 +00:00
Sanjay Patel bf51c8a975 [x86] fix usage of stale operands when lowering select
I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR.

The debug output shows nodes like this:

t4: i32 = xor t2, Constant:i32<-1>
    t21: i8 = setcc t4, Constant:i32<0>, setlt:ch
  t14: i32 = select t21, t4, Constant:i32<-1>

And because the select is holding onto the t4 (xor) node while EmitTest creates a new 
x86-specific xor node, the lowering results in:

  t4: i32 = xor t2, Constant:i32<-1>
  t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1>
t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1

Differential Revision: https://reviews.llvm.org/D28374

llvm-svn: 291392
2017-01-08 15:53:40 +00:00
Simon Pilgrim 9c58950eeb [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.

llvm-svn: 291391
2017-01-08 14:14:36 +00:00
Simon Pilgrim 1fa5487c05 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

llvm-svn: 291390
2017-01-08 13:12:03 +00:00
Craig Topper 5c46c7526e [AVX-512] Remove redundant patterns that select unaligned moves with zero masking for patterns that already use the aligned form. NFC
llvm-svn: 291383
2017-01-08 05:46:21 +00:00
Dylan McKay 8fa6d8db9c [AVR] Implement TargetLoweing::getRegisterByName
This allows the use of the 'read_register' intrinsics used by clang's
named register globals features.

llvm-svn: 291375
2017-01-07 23:39:47 +00:00
Simon Pilgrim 9681c407b4 [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.

llvm-svn: 291372
2017-01-07 22:27:43 +00:00
Craig Topper a74e3088df [AVX-512] Remove patterns from the other VBLENDM instructions. They are all redundant with masked move instructions.
We should probably teach the two address instruction pass to turn masked moves into BLENDM when its beneficial to the register allocator.

llvm-svn: 291371
2017-01-07 22:20:34 +00:00
Craig Topper 81f20aa336 [AVX-512] Remove patterns from masked broadcast versions of BLENDM instructions.
All but (v2f64 broadcast f64) are handled with VBROADCAST instructions. The v2f64 version can be handled with VMOVDDUP.

We may want to consider converting to BLENDM instructions in the two address instruction pass if its beneficial to register allocation.

llvm-svn: 291369
2017-01-07 22:20:26 +00:00
Craig Topper da84ff3ed4 [AVX-512] Add masked forms of the alternate MOVDDUP patterns.
I'm not too sure how to get isel to select even all of the unmasked forms, but at least we have a consistent set now.

llvm-svn: 291368
2017-01-07 22:20:23 +00:00
Simon Pilgrim a470296367 [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
llvm-svn: 291366
2017-01-07 22:08:09 +00:00
Simon Pilgrim 82e3e05fe2 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

llvm-svn: 291365
2017-01-07 21:47:10 +00:00
Simon Pilgrim e70644dab7 [CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.
llvm-svn: 291364
2017-01-07 21:33:00 +00:00
Simon Pilgrim 725997154d [CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.
llvm-svn: 291355
2017-01-07 18:19:25 +00:00
Simon Pilgrim a4109d6433 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
llvm-svn: 291354
2017-01-07 17:54:10 +00:00
Simon Pilgrim df7de7a87e [CostModel][X86] Added missing AVX2 arithmetic costs.
Allows us to correctly fall through to the lower AVX1 costs if look up failed.

llvm-svn: 291353
2017-01-07 17:27:39 +00:00
Simon Pilgrim 100eae1ee0 [CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.
llvm-svn: 291352
2017-01-07 17:03:51 +00:00
Simon Pilgrim a1b8e2c725 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Craig Topper 42b848a683 [X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size so unfolding would increase the load size.
llvm-svn: 291338
2017-01-07 06:56:54 +00:00
Dan Gohman 0e2ceb8121 [WebAssembly] Don't abort on code with UB.
Gracefully leave code that performs function-pointer bitcasts implying
non-trivial pointer conversions alone, rather than aborting, since it's
just undefined behavior.

llvm-svn: 291326
2017-01-07 01:50:01 +00:00
Dan Gohman d5eda35557 [WebAssembly] Move a SmallVector to a more specific scope. NFC.
llvm-svn: 291324
2017-01-07 01:31:18 +00:00
Dylan McKay c5e209a0b2 [AVR] Parenthesize a boolean expression
Without the parentheses, clang would emit warnings while compiling the
code.

llvm-svn: 291320
2017-01-07 00:55:28 +00:00
Dan Gohman 1b637458f6 [WebAssembly] Add a pass to create wrappers for function bitcasts.
WebAssembly requires caller and callee signatures to match exactly. In LLVM,
there are a variety of circumstances where signatures may be mismatched in
practice, and one can bitcast a function address to another type to call it
as that type. This patch adds a pass which replaces bitcasted function
addresses with wrappers to replace the bitcasts.

This doesn't catch everything, but it does match many common cases.

llvm-svn: 291315
2017-01-07 00:34:54 +00:00
Eugene Zelenko 4282c404f9 [BPF] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291297
2017-01-06 23:06:25 +00:00
Jan Vesely 06200bd7bc AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes
This will make transition to SCRATCH_MEMORY easier

Differential Revision: https://reviews.llvm.org/D24746

llvm-svn: 291279
2017-01-06 21:00:46 +00:00
Matthias Braun 258b847c4f AArch64CollectLOH: Rewrite as block-local analysis.
Re-apply r288561: This time with a fix where the ADDs that are part of a
3 instruction LOH would not invalidate the "LastAdrp" state. This fixes
http://llvm.org/PR31361

Previously this pass was using up to 5% compile time in some cases which
is a bit much for what it is doing. The pass featured a full blown
data-flow analysis which in the default configuration was restricted to a
single block.

This rewrites the pass under the assumption that we only ever work on a
single block. This is done in a single pass maintaining a state machine
per general purpose register to catch LOH patterns.

Differential Revision: https://reviews.llvm.org/D27329

This reverts commit 9e6cedb0a4f14364d6511597a9160305e7d34493.

llvm-svn: 291266
2017-01-06 19:22:01 +00:00
Chad Rosier e177185e79 [AArch64] Reduce vector insert/extract cost for Falkor.
Differential Revision: https://reviews.llvm.org/D28403

llvm-svn: 291254
2017-01-06 18:03:26 +00:00
Simon Pilgrim 3128d6b520 [X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI.
Early step towards ignoring domain above a certain shuffle depth.

llvm-svn: 291248
2017-01-06 17:34:30 +00:00
Konstantin Zhuravlyov 31dbb0391d [AMDGPU] Remove extra semicolon. NFC
llvm-svn: 291246
2017-01-06 17:23:21 +00:00
Konstantin Zhuravlyov 67a6d5401a [AMDGPU] Do not emit .AMDGPU.config section for amdhsa
Differential Revision: https://reviews.llvm.org/D27732

llvm-svn: 291245
2017-01-06 17:02:10 +00:00
Simon Pilgrim bd3c6824d4 [X86][SSE] Simplify float domain requirement in unary shuffle matching.
The AVX1-only limit is never actually required in matchUnaryVectorShuffle

llvm-svn: 291244
2017-01-06 17:00:59 +00:00
Simon Pilgrim a08d7b9913 Remove trailing whitespace. NFCI.
llvm-svn: 291240
2017-01-06 15:31:52 +00:00
Simon Pilgrim 9b8c7caf4e [X86] Add X86Subtarget argument. NFCI.
All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it.

llvm-svn: 291239
2017-01-06 15:29:17 +00:00
Simon Pilgrim d8333372bc [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

llvm-svn: 291229
2017-01-06 11:12:53 +00:00
Craig Topper e86fb932ea [AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp.
llvm-svn: 291214
2017-01-06 05:18:48 +00:00
David Blaikie 1e58b463d9 Remove unused private fields to fix the clang -Werror build.
llvm-svn: 291201
2017-01-06 00:48:24 +00:00
Eugene Zelenko 049b017538 [AArch64, Lanai] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291197
2017-01-06 00:30:53 +00:00
Logan Chien ce542eefe3 Code cleanup: Remove tab indents.
llvm-svn: 291193
2017-01-05 23:41:33 +00:00
Simon Pilgrim aa186c632d [CostModel][X86] Tidyup arithmetic costs code. NFCI.
Remove unnecessary braces, remove one use variables and keep LUTs to similar naming convention.

llvm-svn: 291187
2017-01-05 22:48:02 +00:00
Geoff Berry d46b6e8096 [AArch64] Fold some filled/spilled subreg COPYs
Summary:
Extend AArch64 foldMemoryOperandImpl() to handle folding spills of
subreg COPYs with read-undef defs like:

  %vreg0:sub_32<def,read-undef> = COPY %WZR; GPR64:%vreg0

by widening the spilled physical source reg and generating:

  STRXui %XZR <fi#0>

as well as folding fills of similar COPYs like:

  %vreg0:sub_32<def,read-undef> = COPY %vreg1; GPR64:%vreg0, GPR32:%vreg1

by generating:

  %vreg0:sub_32<def,read-undef> = LDRWui <fi#0>

Reviewers: MatzeB, qcolombet

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27425

llvm-svn: 291180
2017-01-05 21:51:42 +00:00
Evgeniy Stepanov e8e11eb726 Revert "Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")"
Summary: This reverts commit r291144. It breaks build bots.

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/3270, http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/2058

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp:1638:12: error: could not convert ‘(const unsigned int*)(& Variants)’ from ‘const unsigned int*’ to ‘llvm::ArrayRef<unsigned int>’
     return Variants;

Reviewers: eugenis, tstellarAMD

Patch by Alex Shlyapnikov.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D28372

llvm-svn: 291168
2017-01-05 19:51:13 +00:00
Simon Pilgrim 4c050c2190 [CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.
llvm-svn: 291165
2017-01-05 19:42:43 +00:00
Simon Pilgrim 6f72eba606 Remove trailing whitespace. NFCI.
llvm-svn: 291163
2017-01-05 19:24:25 +00:00
Simon Pilgrim 5b06e4d319 [CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.
llvm-svn: 291162
2017-01-05 19:19:39 +00:00
Simon Pilgrim a8bf97569a [CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.
Removes need for yet another LUT.

llvm-svn: 291158
2017-01-05 19:01:50 +00:00
Simon Pilgrim 430d34fc14 [CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.
Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead.

llvm-svn: 291152
2017-01-05 18:36:48 +00:00
Simon Pilgrim b01e844241 [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

llvm-svn: 291149
2017-01-05 18:20:25 +00:00
Simon Pilgrim f74700aa8c [CostModel][X86] Merged SK_PermuteSingleSrc/SK_PermuteTwoSrc into common shuffle cost LUTs. NFCI.
llvm-svn: 291146
2017-01-05 17:56:19 +00:00
Matt Arsenault ec63f62c58 Reapply r291025 ("AMDGPU: Remove unneccessary intermediate vector")
Arrays are supposed to be static const

llvm-svn: 291144
2017-01-05 17:36:11 +00:00
Sanjay Patel dea5a7bd53 less braces; NFC
llvm-svn: 291126
2017-01-05 16:47:32 +00:00
Simon Pilgrim bca02f9e20 [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

llvm-svn: 291122
2017-01-05 15:56:08 +00:00
Zvi Rackover 4b7d724d62 [X86] Optimize vector shifts with variable but uniform shift amounts
Summary:
For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register.
The lower 64-bits of the register are evaluated to determine the shift amount.
This patch improves the construction of the vector containing the shift amount.

Reviewers: craig.topper, delena, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28353

llvm-svn: 291120
2017-01-05 15:11:43 +00:00
Tony Jiang 3a2f00b024 [PowerPC] Implement missing ISA 2.06 instructions.
Instructions: fctidu[.], fctiwu[.], ftdiv, ftsqrt are not implemented. Implement
them and add corresponding test cases in this patch.

llvm-svn: 291116
2017-01-05 15:00:45 +00:00
Simon Pilgrim a62395a4bd [CostModel][X86] Pulled out common type legalization code
llvm-svn: 291109
2017-01-05 14:33:32 +00:00
Mohammed Agabaria 23599ba794 Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.

Differential Revision: https://reviews.llvm.org/D27518

llvm-svn: 291106
2017-01-05 14:03:41 +00:00
Kristof Beyls 2252440b81 [GlobalISel] Fix AArch64 ICMP instruction selection
Differential Revision: https://reviews.llvm.org/D28175

llvm-svn: 291097
2017-01-05 10:16:08 +00:00
Mohammed Agabaria 189e2d29ba [Test Commit] fixing some format issue in X86TTI to match clang-format output.
llvm-svn: 291095
2017-01-05 09:51:02 +00:00
Elena Demikhovsky 143cbc425b AVX-512: Optimized pattern for truncate with unsigned saturation.
DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512.
Differential revision: https://reviews.llvm.org/D28216

llvm-svn: 291092
2017-01-05 08:21:09 +00:00
Richard Smith d4d575b955 Revert r291025 ("AMDGPU: Remove unneccessary intermediate vector")
This caused buildbot failures due to returning ArrayRefs referencing local
(temporary) objects.

llvm-svn: 291067
2017-01-05 03:13:10 +00:00
Matt Arsenault 6796d7ea8b AMDGPU: Remove unneccessary intermediate vector
llvm-svn: 291025
2017-01-04 22:54:10 +00:00
Chad Rosier 63687e40bc [AArch64] Update the feature set for Qualcomm's Falkor CPU.
llvm-svn: 291010
2017-01-04 21:26:23 +00:00
Nirav Dave 0f9d111f97 [AArch64] Fix over-eager early-exit in load-store combiner
Fix early-exit analysis for memory operation pairing when operations are
not emitted in ascending order.

Reviewers: mcrosier, t.p.northover

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D28251

llvm-svn: 291008
2017-01-04 21:21:46 +00:00
Hal Finkel b2f951d87a [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)
This change aims to unify and correct our logic for when we need to allow for
the possibility of the linker adding a TOC restoration instruction after a
call. This comes up in two contexts:

 1. When determining tail-call eligibility. If we make a tail call (i.e.
    directly branch to a function) then there is no place for the linker to add
    a TOC restoration.
 2. When determining when we need to add a nop instruction after a call.
    Likewise, if there is a possibility that the linker might need to add a
    TOC restoration after a call, then we need to put a nop after the call
    (the bl instruction).

First problem: We were using similar, but different, logic to decide (1) and
(2). This is just wrong. Both the resideInSameModule function (used when
determining tail-call eligibility) and the isLocalCall function (used when
deciding if the post-call nop is needed) were supposed to be determining the
same underlying fact (i.e. might a TOC restoration be needed after the call).
The same logic should be used in both places.

Second problem: The logic in both places was wrong. We only know that two
functions will share the same TOC when both functions come from the same
section of the same object. Otherwise the linker might cause the functions to
use different TOC base addresses (unless the multi-TOC linker option is
disabled, in which case only shared-library boundaries are relevant). There are
a number of factors that can cause functions to be placed in different sections
or come from different objects (-ffunction-sections, explicitly-specified
section names, COMDAT, weak linkage, etc.). All of these need to be checked.
The existing logic only checked properties of the callee, but the properties of
the caller must also be checked (for example, calling from a function in a
COMDAT section means calling between sections).

There was a conceptual error in the resideInSameModule function in that it
allowed tail calls to functions with weak linkage and protected/hidden
visibility. While protected/hidden visibility does prevent the function
implementation from being replaced at runtime (via interposition), it does not
prevent the linker from using an alternate implementation at link time (i.e.
using some strong definition to replace the provided weak one during linking).
If this happens, then we're still potentially looking at a required TOC
restoration upon return.

Otherwise, in general, the post-call nop is needed wherever ELF interposition
needs to be supported. We don't currently support ELF interposition at the IR
level (see http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html
for more information), and I don't think we should try to make it appear to
work in the backend in spite of that fact. Unfortunately, because of the way
that the ABI works, we need to generate code as if we supported interposition
whenever the linker might insert stubs for the purpose of supporting it.

Differential Revision: https://reviews.llvm.org/D27231

llvm-svn: 291003
2017-01-04 21:05:13 +00:00
Eric Christopher 568c113ac0 Remove dead and unused variable NumSentinelElements.
Fixes PR31529.

llvm-svn: 290998
2017-01-04 20:05:18 +00:00
Jan Vesely d48445d513 AMDGPU/SI: Implement sendmsghalt intrinsic
v2: expose using amdgcn prefix

Differential Revision: https://reviews.llvm.org/D23511

llvm-svn: 290977
2017-01-04 18:06:55 +00:00
Simon Pilgrim bb895f3e9c [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that was assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Simon Pilgrim 939b8cd708 [X86] Merged Reverse/Alternate shuffle cost tables. NFCI.
As discussed on D27811, merged the shuffle cost LUTs and use the shuffle kind to perform the lookup instead of the ISD opcode.

llvm-svn: 290956
2017-01-04 12:08:41 +00:00
Florian Hahn 5815f6c53c [framelowering] Skip dbg values when getting next/previous instruction.
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.

In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.

Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)

Reviewers: aprantl, MatzeB, mkuper

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D27688

llvm-svn: 290955
2017-01-04 12:08:35 +00:00
Nitesh Jain b0bc573ca8 [LLC][MIPS] Fix crash after enabling LLVM_ENABLE_EXPENSIVE_CHECKS
Reviewers: sdardis, vkalintiris

Subscribers: jaydeep, slthakur, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27841

llvm-svn: 290949
2017-01-04 09:34:37 +00:00
Ayman Musa 02f9533823 [X86][AVX512] Passing the appropriate memory operand class to INT_{U}COMIS{S|D} instructions
Replacing the memory operand in the intrinsic versions of the comis/ucomis instrucions from f128mem to ssmem/sdmem accordingly.

Differential Revision: https://reviews.llvm.org/D28138

llvm-svn: 290948
2017-01-04 08:21:54 +00:00
Simon Pilgrim c76ea4b638 [X86] Attempt to pre-truncate arithmetic operations if useful
In some cases its more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.

This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or an one use constant we can fold).

Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs.

I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need.

Differential Revision: https://reviews.llvm.org/D28219

llvm-svn: 290947
2017-01-04 08:05:42 +00:00
Craig Topper d0aa53b9ae [AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources.
These are best handled with a vinsert32x4 or vinsert64x2 instruction.

llvm-svn: 290946
2017-01-04 07:32:03 +00:00
Craig Topper 83115a809f [AVX-512] Simplify code for creating 512-bit SHUF128 operations.
We don't need two loops and we can safely assume assume and hardcode the size of the widened mask.

llvm-svn: 290942
2017-01-04 07:31:51 +00:00
Eugene Zelenko b2ca1b3f37 [Hexagon, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 290925
2017-01-04 02:02:05 +00:00
Craig Topper 48d232d3e7 [X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle.
llvm-svn: 290872
2017-01-03 07:36:41 +00:00
Craig Topper 785e58fdc9 [AVX-512] Simplify the code added in r290870 to recognized 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask.
llvm-svn: 290871
2017-01-03 07:36:39 +00:00
Craig Topper 9496e3f916 [AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts.
llvm-svn: 290870
2017-01-03 07:00:40 +00:00
Craig Topper fa875a1d3d [AVX-512] Teach EVEX to VEX conversion pass to handle VINSERT and VEXTRACT instructions.
llvm-svn: 290869
2017-01-03 05:46:18 +00:00
Craig Topper be9ef55152 [X86] Remove trailing whitespace and an unnecessary line wrap. NFC
llvm-svn: 290867
2017-01-03 05:46:06 +00:00
Craig Topper 06bae884bd [X86] Fix header comment. NFC
llvm-svn: 290866
2017-01-03 05:46:05 +00:00
Craig Topper c849172105 [AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation.
llvm-svn: 290865
2017-01-03 05:46:02 +00:00
Craig Topper 0cda8bbf74 [AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits.
llvm-svn: 290864
2017-01-03 05:45:57 +00:00
Craig Topper 4d47c6ae57 [AVX-512] Remove vextract intrinsics and autoupgrade to native shufflevectors. This unfortunately generates some really terrible code without VLX support due to v2i1 and v4i1 not being legal.
Hopefully we can improve that in future patches.

llvm-svn: 290863
2017-01-03 05:45:46 +00:00
Dean Michael Berris f7e7b938ea [XRay] Merge instrumentation point table emission code into AsmPrinter.
Summary:
No need to have this per-architecture.  While there, unify 32-bit ARM's
behaviour with what changed elsewhere and start function names lowercase
as per the coding standards.  Individual entry emission code goes to the
entry's own class.

Fully tested on amd64, cross-builds on both ARMs and PowerPC.

Reviewers: dberris

Subscribers: aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28209

llvm-svn: 290858
2017-01-03 04:30:21 +00:00
Elena Demikhovsky d96200d60a Fixed shuffle-reverse cost on AVX-512.
(This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately).

llvm-svn: 290812
2017-01-02 11:44:10 +00:00
Elena Demikhovsky 21706cbd24 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426).

* Shiffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118

llvm-svn: 290810
2017-01-02 10:37:52 +00:00
Dylan McKay 97cf837b46 [AVR] Optimize 16-bit ANDs with '1'
Summary: Fixes PR 31345

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28186

llvm-svn: 290778
2016-12-31 01:07:14 +00:00
Aaron Ballman 58a61e723e Caught a simple typo. I do not know of a way to test this, but it seems like an unlikely thing to regress in the future.
llvm-svn: 290757
2016-12-30 15:57:56 +00:00
Dylan McKay 453d042969 [AVR] Optimize 16-bit ORs with '0'
Summary: Fixes PR 31344

Authored by Anmol P. Paralkar

Reviewers: dylanmckay

Subscribers: fhahn, llvm-commits

Differential Revision: https://reviews.llvm.org/D28121

llvm-svn: 290732
2016-12-30 00:21:56 +00:00
Reid Kleckner cd46c1df80 Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.

llvm-svn: 290714
2016-12-29 17:07:10 +00:00
Artem Tamazov 25478d821b [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directive
Among other stuff, this allows to use predefined .option.machine_version_major
/minor/stepping symbols in the directive.

Relevant test expanded at once (also file renamed for clarity).

Differential Revision: https://reviews.llvm.org/D28140

llvm-svn: 290710
2016-12-29 15:41:52 +00:00
Reid Kleckner c9e0a153cf [COFF] Use 32-bit jump table entries in .rdata for Win64
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.

Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code.  COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.

Fixes PR31488

Reviewers: majnemer, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28141

llvm-svn: 290694
2016-12-29 00:12:39 +00:00
Gadi Haber 19c4fc5e62 This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.

Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky 
Differential Revision: https://reviews.llvm.org/D27901

llvm-svn: 290663
2016-12-28 10:12:48 +00:00
Chad Rosier 2ff37b8615 [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.
Differential Revision: https://reviews.llvm.org/D27953

llvm-svn: 290609
2016-12-27 16:58:09 +00:00
Artem Tamazov a01cce8887 [AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count)
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.

Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
	value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.

Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
dummy scope that lies from the beginning of source file til the
first .amdgpu_hsa_kernel.

Test added.

Differential Revision: https://reviews.llvm.org/D27859

llvm-svn: 290608
2016-12-27 16:00:11 +00:00
Sam Kolton e66365e07d [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28051

llvm-svn: 290599
2016-12-27 10:06:42 +00:00
Craig Topper e77e901130 [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.
llvm-svn: 290591
2016-12-27 06:51:09 +00:00
Craig Topper 2da265b7bf [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select.
llvm-svn: 290583
2016-12-27 05:30:14 +00:00