Commit Graph

37221 Commits

Author SHA1 Message Date
Hans Wennborg 6596977130 [X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325)
The size savings are significant, and from what I can tell, both ICC and GCC do this.

Differential Revision: http://reviews.llvm.org/D18573

llvm-svn: 264966
2016-03-30 23:38:01 +00:00
Matthias Braun 8d41436004 CodeGen: Factor out code for tail call result compatibility check; NFC
llvm-svn: 264959
2016-03-30 22:46:04 +00:00
Matt Arsenault 2fe4fbc184 AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
2016-03-30 22:28:52 +00:00
Aaron Ballman ef0fe1eed8 Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
llvm-svn: 264929
2016-03-30 21:30:00 +00:00
Simon Pilgrim c49bd2ede0 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.

llvm-svn: 264922
2016-03-30 20:52:24 +00:00
Justin Lebar e3804cc932 [NVPTX] Make NVVMReflect a function pass.
Summary:
Currently it's a module pass.  Make it a function pass so that we can
move it to PassManagerBuilder's EP_EarlyAsPossible extension point,
which only accepts function passes.

Reviewers: rnk

Subscribers: tra, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D18615

llvm-svn: 264919
2016-03-30 20:40:11 +00:00
Chad Rosier f7ac5f28ab [AArch64] Fix warnings pointed out by Hal.
llvm-svn: 264882
2016-03-30 18:08:51 +00:00
Tom Stellard 1d5e6d4bdc AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

llvm-svn: 264877
2016-03-30 16:35:13 +00:00
Tom Stellard 0bc954e3bc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

llvm-svn: 264876
2016-03-30 16:35:09 +00:00
Jonas Paulsson f76123386a [SystemZ] Add nop and nopr InstAliases.
For compatability with GAS, nop and nopr are recognized as alises for
bc and bcr, respectively. A mask of 0 turns these instructions
effectively into no-operations.

Reviewed by Ulrich Weigand.

llvm-svn: 264875
2016-03-30 16:11:58 +00:00
Nirav Dave 8dd66e5753 Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed

Reviewers: hans

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D18564

llvm-svn: 264872
2016-03-30 15:41:12 +00:00
Simon Pilgrim b87ffe8519 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Benjamin Kramer 9415e06da7 [NVPTX] Avoid temporary std::string and make single-use function local to the cpp file.
No functionality change intended.

llvm-svn: 264861
2016-03-30 12:31:51 +00:00
Chandler Carruth 8e06a10d1f [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leveragable write of a 64bit quantity to an
unintended offset (the first element of the array intead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occured widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse codegeneration. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
Chandler Carruth 81c3ddeb1c [x86] Extract a helper function to compute the full addressing mode from
an x86 MachineInstr's operands. This will be super useful to fix some
bad atomics code in my next commit.

No functionality changed.

llvm-svn: 264819
2016-03-30 03:10:24 +00:00
Adam Nemet fb8fbba584 [Aarch64] Turn on the LoopDataPrefetch pass for Cyclone
llvm-svn: 264811
2016-03-30 00:21:29 +00:00
Adam Nemet b81f1e0db3 [PPC] Remove -ppc-loop-prefetch-distance in favor of -prefetch-distance
After the previous change, this can now be overridden centrally in the
pass.

llvm-svn: 264807
2016-03-29 23:45:56 +00:00
Adam Nemet 1428d41f9a [LoopDataPrefetch] Centralize the tuning cl::opts under the pass
This is effectively NFC, minus the renaming of the options
(-cyclone-prefetch-distance -> -prefetch-distance).

The change was requested by Tim in D17943.

llvm-svn: 264806
2016-03-29 23:45:52 +00:00
James Y Knight 7306cd47d4 [SPARC] Use AtomicExpandPass to expand AtomicRMW instructions.
They were previously expanded to CAS loops in a custom isel expansion,
but AtomicExpandPass knows how to do that generically.

Testing is covered by the existing sparc atomics.ll testcases.

llvm-svn: 264771
2016-03-29 19:09:54 +00:00
Manman Ren f46262e0b7 Swift Calling Convention: add swiftself attribute.
Differential Revision: http://reviews.llvm.org/D17866

llvm-svn: 264754
2016-03-29 17:37:21 +00:00
Konstantin Zhuravlyov ecc7cbf611 Test commit access
llvm-svn: 264736
2016-03-29 15:15:44 +00:00
Simon Dardis 9a3f32c00d [mips] Test commit: Mark insertNoop as dead code (NFC)
llvm-svn: 264728
2016-03-29 13:02:19 +00:00
Daniel Sanders 5d3840fdf9 [mips] Correct MIPS16 jal/jalx to have uimm26 offsets and add MC layer range checks. NFC.
Summary:
However, this has no effect at this time because the instructions affected
are marked 'isCodeGenOnly=1' and have no alternative for the MC layer.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18179

llvm-svn: 264712
2016-03-29 09:40:38 +00:00
Elena Demikhovsky 95629caaa9 AVX-512: fixed a bug in fp_to_uint pattern on KNL
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for <4 x double> to <4 x i32>

Differential Revision: http://reviews.llvm.org/D18512

llvm-svn: 264701
2016-03-29 06:33:41 +00:00
Hal Finkel fa7057a415 [PowerPC] Refactor popcnt[dw] target features
Instead of using two feature bits, one to indicate the availability of the
popcnt[dw] instructions, and another to indicate whether or not they're fast,
use a single enum. This allows more consistent control via target attribute
strings, and via Clang's command line.

llvm-svn: 264690
2016-03-29 01:36:01 +00:00
Derek Schuff ecabac6244 [WebAssembly] Remove duplicate disabling of passes
Also put all the disabled passes together

llvm-svn: 264684
2016-03-28 22:52:20 +00:00
Hal Finkel 69ada2f514 [PowerPC] Clarify a comment in PPCTTI about vector loads
This should say that we could do unaligned vector loads on the P7 using VSX
instructions, not that we should.

llvm-svn: 264683
2016-03-28 22:39:35 +00:00
Simon Pilgrim d3df400fa9 [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements.
If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors.

The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.

Its not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least.

Differential Revision: http://reviews.llvm.org/D18492

llvm-svn: 264666
2016-03-28 21:33:52 +00:00
Haicheng Wu 6a6bc750d5 [AArch64] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize
Mimic what x86 does when optimizing sdiv/udiv for minsize.

llvm-svn: 264606
2016-03-28 18:17:07 +00:00
Hal Finkel 7059d41622 [PowerPC] On the A2, popcnt[dw] are very slow
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.

I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
  1. This allows us to return more accurate information via the TTI interface
     (I recognize that this currently makes no practical difference)
  2. Is hopefully easier to understand (it allows the core's features to match
     its manual while still having the desired effect).

llvm-svn: 264600
2016-03-28 17:52:08 +00:00
Derek Schuff ad154c837e Introduce MachineFunctionProperties and the AllVRegsAllocated property
MachineFunctionProperties represents a set of properties that a MachineFunction
can have at particular points in time. Existing examples of this idea are
MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which
will eventually be switched to use this mechanism.
This change introduces the AllVRegsAllocated property; i.e. the property that
all virtual registers have been allocated and there are no VReg operands
left.

With this mechanism, passes can declare that they require a particular property
to be set, or that they set or clear properties by implementing e.g.
MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class
verifies that the requirements are met, and handles the setting and clearing
based on the delcarations. Passes can also directly query and update the current
properties of the MF if they want to have conditional behavior.

This change annotates the target-independent post-regalloc passes; future
changes will also annotate target-specific ones.

Reviewers: qcolombet, hfinkel

Differential Revision: http://reviews.llvm.org/D18421

llvm-svn: 264593
2016-03-28 17:05:30 +00:00
Tom Stellard a76bcc2ea1 AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.

This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

llvm-svn: 264589
2016-03-28 16:10:13 +00:00
Krzysztof Parzyszek 2d65ea74dc [Hexagon] Improve handling of unaligned vector loads and stores
llvm-svn: 264584
2016-03-28 15:43:03 +00:00
Krzysztof Parzyszek bb63f66686 [Hexagon] Only use restore functions for single register at -Oz
llvm-svn: 264581
2016-03-28 14:52:21 +00:00
Krzysztof Parzyszek a34901aae9 [Hexagon] Speed up frame lowering when no optimizations are enabled
- Do not optimize stack slots in optnone functions.
- Get aligned-base register from HexagonMachineFunctionInfo instead of
  looking for ALIGNA instruction in the function's body.

llvm-svn: 264580
2016-03-28 14:42:03 +00:00
Douglas Katzman d0c11cf7ad Sparc: silently ignore .proc assembler directive
Differential Revision: http://reviews.llvm.org/D18463

llvm-svn: 264579
2016-03-28 14:00:11 +00:00
Jacques Pienaar fcef3e4617 [lanai] Add Lanai backend.
Add the Lanai backend to lib/Target.

General Lanai backend discussion on llvm-dev thread "[RFC] Lanai backend" (http://lists.llvm.org/pipermail/llvm-dev/2016-February/095118.html).

Differential Revision: http://reviews.llvm.org/D17011

llvm-svn: 264578
2016-03-28 13:09:54 +00:00
Chuang-Yu Cheng d5eb774eb6 [Power9] Implement new altivec instructions: bcd* series
This patch implements the following altivec instructions:

- Decimal Convert From/to National/Zoned/Signed-QWord:
    bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.

- Decimal Copy-Sign/Set-Sign:
    bcdcpsgn. bcdsetsgn.

- Decimal Shift/Unsigned-Shift/Shift-and-Round:
    bcds. bcdus. bcdsr.

- Decimal (Unsigned) Truncate:
    bcdtrunc. bcdutrunc.

Total 13 instructions

Thanks Amehsan's advice! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D17838

llvm-svn: 264568
2016-03-28 09:04:23 +00:00
Chuang-Yu Cheng 80722719eb [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat
This change implements the following vsx instructions:

- Scalar Insert/Extract
    xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp

- Vector Insert/Extract
    xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
    xxextractuw xxinsertw

- Scalar/Vector Test Data Class
    xststdcdp xststdcsp xststdcqp
    xvtstdcdp xvtstdcsp

- Maximum/Minimum
    xsmaxcdp xsmaxjdp
    xsmincdp xsminjdp

- Vector Byte-Reverse/Permute/Splat
    xxbrd xxbrh xxbrq xxbrw
    xxperm xxpermr
    xxspltib

30 instructions

Thanks Nemanja for invaluable discussion! Thanks Kit's great help!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16842

llvm-svn: 264567
2016-03-28 08:34:28 +00:00
Elena Demikhovsky 83f0647d85 AVX-512: Fixed ICMP instruction selection for i1 operands
ICMP instruction selection fails on SKX and KNL for i1 operand.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0

Differential Revision: http://reviews.llvm.org/D18511

llvm-svn: 264566
2016-03-28 07:47:58 +00:00
Chuang-Yu Cheng 5663848996 [Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic
This change implements the following vsx instructions:

- quad-precision move
    xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp

- quad-precision fp-arithmetic
    xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
    xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)

22 instructions

Thanks Nemanja and Kit for careful review and invaluable discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

http://reviews.llvm.org/D16110

llvm-svn: 264565
2016-03-28 07:38:01 +00:00
Hal Finkel 0b37175ca6 [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality
Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
function calls (fmax[f], etc.) generally map to function calls after lowering.
For some vector types with QPX at least, however, we can legally lower these,
and we don't need to prohibit CTR-based loops on their account.

It turned out, however, that the logic that checked the opcodes associated with
intrinsics was broken (it would set the Opcode variable, but that variable was
later checked only if set for some otherwise-external function call.

This fixes the latter problem and adds the FMAX/MINNUM mappings.

llvm-svn: 264532
2016-03-27 05:40:56 +00:00
Simon Pilgrim dcdf85033c [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization

llvm-svn: 264517
2016-03-26 18:32:13 +00:00
Simon Pilgrim e4dbeb40c6 [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264512
2016-03-26 15:44:55 +00:00
Simon Pilgrim 3eef33a806 [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
Currently this is to mainly to prevent scalarization of integer division by constants.

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264511
2016-03-26 15:27:20 +00:00
Simon Pilgrim 7379a70677 [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.

llvm-svn: 264509
2016-03-26 09:44:27 +00:00
David Majnemer b549ab02b4 [PowerPC] Disable the CTR optimization in the presence of {min,max}num
The minnum and maxnum intrinsics get lowered to libcalls which
invalidates the CTR optimization.

This fixes PR27083.

llvm-svn: 264508
2016-03-26 09:42:31 +00:00
Simon Pilgrim ff7b7141cd [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC.
LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead.

llvm-svn: 264506
2016-03-26 09:29:04 +00:00
Chuang-Yu Cheng 065969ec8e [Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10
This change implements the following vector operations:
- vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
- vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
- vnegd vnegw
- vprtybd vprtybq vprtybw
- vbpermd vpermr
- vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
- vmul10cuq vmul10uq vmul10ecuq vmul10euq

28 instructions

Thanks Nemanja, Kit for invaluable hints and discussion!
Reviewers: hal, nemanja, kbarton, tjablin, amehsan

Phabricator: http://reviews.llvm.org/D15887
llvm-svn: 264504
2016-03-26 05:46:11 +00:00
David Majnemer 020e890a19 [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.

This fixes PR27071.

llvm-svn: 264465
2016-03-25 21:49:11 +00:00
Saleem Abdulrasool 750a90df6a ARM: maintain BB ordering when expanding WIN__DBZCHK
It is possible to have a fallthrough MBB prior to MBB placement.  The original
addition of the BB would result in reordering the BB as not preceding the
successor.  Because of the fallthrough nature of the BB, we could end up
executing incorrect code or even a constant pool island!  Insert the spliced BB
into the same location to avoid that.

Thanks to Tim Northover for invaluable hints and Fiora for the discussion on
what may have been occurring!

llvm-svn: 264454
2016-03-25 19:48:06 +00:00
Justin Bogner f2a0d349a6 AMDGPU: Fix a use-after free and a missing break
We're erasing MI here, but then immediately using it again inside the
`if`. This moves the erase after we're done using it.

Doing that reveals a second problem though - this case is missing a
break, so we fall through to the default and dereference MI again.
This is obviously a bug, though I don't know how to write a test that
triggers it - all we do in the error case is print some extra debug
output.

Both of these issue crash on lots of tests under ASAN with the
recycling allocator changes from PR26808 applied.

llvm-svn: 264442
2016-03-25 18:33:16 +00:00
Hans Wennborg 5f916d3df4 [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with 8-bit immediate is only three bytes.

Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.

Differential Revision: http://reviews.llvm.org/D18374

llvm-svn: 264440
2016-03-25 18:11:31 +00:00
Jonas Paulsson 5dd1e56de5 [SystemZ] Remove isBranch and isTerminator flags on BRCT and BRCTG.
The BranchUnaryRI instruction class already sets these flags.

Reviewed by Ulrich Weigand.

llvm-svn: 264411
2016-03-25 15:42:30 +00:00
Chad Rosier 59bcbba6b4 [AArch64] Fix typo. NFC.
llvm-svn: 264408
2016-03-25 14:37:43 +00:00
Simon Pilgrim ac04923b0f [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerShift. NFC.
LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith.

llvm-svn: 264403
2016-03-25 14:17:54 +00:00
Elena Demikhovsky abc9c04ab7 fixed typo
llvm-svn: 264395
2016-03-25 10:08:36 +00:00
Matt Arsenault 8c8fcb2585 AMDGPU: Cost model for basic integer operations
This resolves bug 21148 by preventing promotion to
i64 induction variables.

llvm-svn: 264376
2016-03-25 01:16:40 +00:00
Hans Wennborg 4ae5119eeb X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).

Differential Revision: http://reviews.llvm.org/D18246

llvm-svn: 264375
2016-03-25 01:10:56 +00:00
Matt Arsenault 9651813ee0 AMDGPU: Partially implement getArithmeticInstrCost for FP ops
llvm-svn: 264374
2016-03-25 01:00:32 +00:00
Duncan P. N. Exon Smith 1d15a9f0c9 IR: Reserve an MDKind for !llvm.loop; NFC
This reserves an MDKind for !llvm.loop, which allows callers to avoid a
string-based lookup.  I'm not sure why it was missing.

There should be no functionality change here, just a small compile-time
speedup.

llvm-svn: 264371
2016-03-25 00:35:38 +00:00
Saleem Abdulrasool 0dab98d926 ARM: fix optimised division on WoA
We did not have an explicit branch to the continuation BB.  When the check was
hoisted, this could permit control follow to fall through into the division
trap.  Add the explicit branch to the continuation basic block to ensure that
code execution is correct.

llvm-svn: 264370
2016-03-25 00:34:11 +00:00
Matt Arsenault 59767cea79 AMDGPU: TTI: Make insertelement free.
We don't want to have a cost to scalarizing operations.

llvm-svn: 264364
2016-03-25 00:14:11 +00:00
Eric Christopher b979d51afa Finish the incomplete 'd' inline asm constraint support for PPC by
making sure we give it a register and mark it as a register constraint.

llvm-svn: 264340
2016-03-24 21:04:52 +00:00
Krzysztof Parzyszek 01598de3ec [Hexagon] Be sure to treat subregisters of a CSR as CSRs as well
llvm-svn: 264331
2016-03-24 20:31:41 +00:00
Krzysztof Parzyszek c9d4caa32c [Hexagon] Add support for run-time stack overflow checking
Patch by Sundeep Kushwaha.

llvm-svn: 264328
2016-03-24 20:20:07 +00:00
Krzysztof Parzyszek 181fdbd174 [Hexagon] Generate PIC-specific versions of save/restore routines
In PIC mode, the registers R14, R15 and R28 are reserved for use by
the PLT handling code. This causes all functions to clobber these
registers. While this is not new for regular function calls, it does
also apply to save/restore functions, which do not follow the standard
ABI conventions with respect to the volatile/non-volatile registers.

Patch by Jyotsna Verma.

llvm-svn: 264324
2016-03-24 19:18:48 +00:00
Simon Atanasyan 26fe92d19f [MC][mips] Add MipsMCInstrAnalysis class and register it as MC instruction analyzer
The `MipsMCInstrAnalysis` class overrides the `evaluateBranch` method
and calculates target addresses for branch and calls instructions.
That allows llvm-objdump to print functions' names in branch instructions
in the disassemble mode.

Differential Revision: http://reviews.llvm.org/D18209

llvm-svn: 264309
2016-03-24 17:18:14 +00:00
Simon Pilgrim a6ba27fbde [X86][XOP] Fixed instruction postfixes to more closely match operands
Suggested by Sanjay in D18189 as the multiple folding options in XOP instructions can be tricky

llvm-svn: 264305
2016-03-24 16:31:30 +00:00
Elena Demikhovsky 95f3173ce9 AVX-512: Generate KTEST instead of TEST fir i1 vectors
KTEST instruction may be used instead of TEST in this case:

%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, zeroinitializer
br i1 %res, label %L2, label %L1

Differential Revision: http://reviews.llvm.org/D18444

llvm-svn: 264298
2016-03-24 15:53:45 +00:00
Tim Northover 4498eff9bb CodeGen: extend RHS when splitting ATOMIC_CMP_SWAP_WITH_SUCCESS.
If the operation's type has been promoted during type legalization, we
need to account for the fact that the high bits of the comparison
operand are likely unspecified.

The LHS is usually zero-extended, but MIPS sign extends it, so we have
to be slightly careful.

Patch by Simon Dardis.

llvm-svn: 264296
2016-03-24 15:38:38 +00:00
Tom Stellard 9babad25e5 AMDGPU/SI: Add Polaris support
Patch By: Sonny Jiang

llvm-svn: 264295
2016-03-24 15:31:05 +00:00
Simon Pilgrim d7c4fce47d [X86][XOP] Merged 128/256 bit 4op instruction definitions. NFCI.
llvm-svn: 264294
2016-03-24 15:28:02 +00:00
Daniel Sanders 15f8fb6f83 [mips] Range check vsplat_simm5 and vsplat_simm10
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18177

llvm-svn: 264287
2016-03-24 14:53:40 +00:00
Nemanja Ivanovic 5ebc92dbe1 [PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode
This patch corresponds to review:
http://reviews.llvm.org/D17711

It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.

llvm-svn: 264282
2016-03-24 13:40:33 +00:00
Daniel Sanders 837f15187b [mips] Range check simm10
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18148

llvm-svn: 264279
2016-03-24 13:26:59 +00:00
Simon Pilgrim 572ca71573 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Daniel Sanders f692130216 [mips] Tidy up cnMIPS tablegen definitions. NFC.
Summary:
In particular, make the cnMIPS predicates much more obvious and prefer
  def ... : ... {
    let Foo = bar;
  }
over:
  let Foo = bar in
  def ... : ...;

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18354

llvm-svn: 264258
2016-03-24 11:40:48 +00:00
Vasileios Kalintiris b8a37205d2 Fix sequence point warning. NFC.
llvm-svn: 264255
2016-03-24 10:53:28 +00:00
Zlatko Buljan 94af4cbcf4 [mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D17137

llvm-svn: 264248
2016-03-24 09:22:45 +00:00
Hrvoje Varga 2cb74ac3c3 [mips][microMIPS] Implement MTC*, MTHC* and DMTC* instructions
Differential Revision: http://reviews.llvm.org/D17328

llvm-svn: 264246
2016-03-24 08:02:09 +00:00
Hrvoje Varga dbea1a1e51 [mips][microMIPS] Fix for "Cannot copy registers" assertion
Differential Revision: http://reviews.llvm.org/D17068

llvm-svn: 264245
2016-03-24 06:05:35 +00:00
Paul Robinson f81836bd18 [PS4] Guarantee an instruction after a 'noreturn' call.
We need the "return address" of a noreturn call to be within the
bounds of the calling function; TrapUnreachable turns 'unreachable'
into a 'ud2' instruction, which has that desired effect.

Differential Revision: http://reviews.llvm.org/D18414

llvm-svn: 264224
2016-03-24 00:10:03 +00:00
Matt Arsenault 30d37a74da AMDGPU: Remove atomic inc/dec patterns
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max

llvm-svn: 264215
2016-03-23 23:23:38 +00:00
Matt Arsenault 0a30e456b4 AMDGPU: Promote alloca should skip volatiles
llvm-svn: 264214
2016-03-23 23:17:29 +00:00
Matt Arsenault f43c2a0b49 AMDGPU: Insert moves of frame index to value operands
Strengthen tests of storing frame indices.

Right now this just creates irrelevant scheduling changes.

We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.

There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.

This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.

llvm-svn: 264200
2016-03-23 21:49:25 +00:00
Cong Hou 94710840fb Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:

jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 264199
2016-03-23 21:45:37 +00:00
Sanjay Patel 7876f180b5 [x86] make peekThroughBitcasts() a helper function
This should be hoisted further up so it can be used in DAGCombiner and other backends,
but I'm limiting the scope in the interest of patch minimalism.

It's not quite NFC because some of the replaced code was using an 'if' check rather
than a 'while' loop, so those cases would only look through a single bitcast.

llvm-svn: 264186
2016-03-23 20:16:37 +00:00
Chad Rosier 85c8594056 [AArch64] Replace return 0 with return false. NFC.
llvm-svn: 264185
2016-03-23 20:07:28 +00:00
Kyle Butt 613112826e Codegen: [PPC] Word Rotates are Zero Extending.
Add Word rotates to the list of instructions that are zero extending.
This allows them to be used in dot form to compare with zero.

llvm-svn: 264183
2016-03-23 19:51:22 +00:00
Artyom Skrobov e6f1b7f094 Replace a string comparison in ARMSubtarget.h with a tablegen entry in ARM.td (NFC)
Reviewers: rengolin, t.p.northover

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D18393

llvm-svn: 264165
2016-03-23 16:18:13 +00:00
Oliver Stannard aa77b1e025 [AArch64] Replace some uses of report_fatal_error with reportError in AArch64 ELF object writer
If we can't handle a relocation type, report it as an error in the source,
rather than asserting. I've added a more descriptive message and a test for the
only cases of this that I've been able to trigger.

Differential Revision: http://reviews.llvm.org/D18388

llvm-svn: 264156
2016-03-23 13:45:03 +00:00
Andrey Turetskiy 6a3d561ea0 [X86] Introduction of FeatureX87.
Add FeatureX87 in X86 backend to be able to define CPUs which doesn't have x87.

Differential Revision: http://reviews.llvm.org/D13979

llvm-svn: 264148
2016-03-23 11:13:54 +00:00
Hrvoje Varga c45baf212a [mips][microMIPS] Delay slot filler modifications
Differential Revision: http://reviews.llvm.org/D18181

llvm-svn: 264147
2016-03-23 10:29:38 +00:00
Valery Pykhtin c0a77c5064 [AMDGPU] Fix missing assembler predicates.
Differential Revision: http://reviews.llvm.org/D18351

llvm-svn: 264137
2016-03-23 04:27:26 +00:00
Tom Stellard 52ecd2d69b AMDGPU: Cache information about register pressure sets
We can statically decide whether or not a register pressure set is for
SGPRs or VGPRs, so we don't need to re-compute this information in
SIRegisterInfo::getRegPressureSetLimit().

Differential Revision: http://reviews.llvm.org/D14805

llvm-svn: 264126
2016-03-23 01:53:22 +00:00
Joerg Sonnenberger 772bb5b65d Typo
llvm-svn: 264110
2016-03-22 22:24:52 +00:00
Dan Gohman 665d7e3838 [WebAssembly] Implement the rotate instructions.
llvm-svn: 264076
2016-03-22 18:01:49 +00:00
Simon Pilgrim 25fb4177fb [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Reapplied with a fix for PR26953 (missing vector widening legalization).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 264062
2016-03-22 16:22:08 +00:00
Daniel Sanders f3599eb683 [mips] Make simm6 consistent with the rest. NFC.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18147

llvm-svn: 264057
2016-03-22 14:50:22 +00:00
Daniel Sanders 97297770a6 [mips] Range check simm7.
Summary:
Also renamed li_simm7 to li16_imm since it's not a simm7 and has an unusual
encoding (it's a uimm7 except that 0x7f represents -1).

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18145

llvm-svn: 264056
2016-03-22 14:40:00 +00:00
Daniel Sanders 0f17d0da4a [mips] Range check simm5.
Summary:
We can't check the error message for this one because there's another lw/sw
available that covers a larger range. We therefore check the transition
between the two sizes.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D18144

llvm-svn: 264054
2016-03-22 14:29:53 +00:00
Daniel Sanders 946dee3b5b [mips] Range check vsplat_uimm[1234568].
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18143

llvm-svn: 264053
2016-03-22 14:17:41 +00:00
Daniel Sanders 93fa4ce9b7 [mips] Range check uimm4_ptr, remove uimm6_ptr, and use correctly sized immediates in MSA copy/insert.
Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18142

llvm-svn: 264052
2016-03-22 13:58:53 +00:00
Nicolai Haehnle 0a33abdfd2 AMDGPU: Fix dangling references introduced by r263982
Fixes Valgrind errors on the test cases that were reported as failing
by buildbots.

llvm-svn: 264000
2016-03-21 22:54:02 +00:00
Duncan P. N. Exon Smith 20be876a64 Fix -Wdocumentation warnings from r263853
Thanks to chapuni for catching this.

llvm-svn: 263993
2016-03-21 22:13:44 +00:00
Nicolai Haehnle a56e6b6a53 AMDGPU: Coding style fixes
I meant to add these before committing r263982 as per the review,
but I forgot to squash.

llvm-svn: 263983
2016-03-21 20:39:24 +00:00
Nicolai Haehnle 213e87f2ee AMDGPU: Add SIWholeQuadMode pass
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

llvm-svn: 263982
2016-03-21 20:28:33 +00:00
Krzysztof Parzyszek b14f4fd0de [Hexagon] Add handling fixups and instruction relaxation
llvm-svn: 263981
2016-03-21 20:27:17 +00:00
Krzysztof Parzyszek c6f1e1a709 [Hexagon] Properly encode registers in duplex instructions
llvm-svn: 263980
2016-03-21 20:13:33 +00:00
Krzysztof Parzyszek 6514a887f4 [Hexagon] Fix reserving emergency spill slots for register scavenger
- R10 and R11 are not reserved registers.
- Check for reserved registers when finding unused caller-saved registers.

llvm-svn: 263977
2016-03-21 19:57:08 +00:00
Dan Gohman c8d7f14506 [WebAssembly] Implement the eqz instructions.
llvm-svn: 263976
2016-03-21 19:54:41 +00:00
Tom Stellard 92339e888f AMDGPU/SI: Fix threshold calculation for branching when exec is zero
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.

The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18282

llvm-svn: 263969
2016-03-21 18:56:58 +00:00
Chad Rosier cf173ffb46 [AArch64] Add a helpful assert. NFC.
llvm-svn: 263965
2016-03-21 18:04:10 +00:00
Matt Arsenault cb38a6bd35 AMDGPU: Remove SignBitIsZero for mubuf scratch offsets
These instructions do not have the same negative base
address problem that DS instructions do on SI.

llvm-svn: 263964
2016-03-21 18:02:18 +00:00
Peter Collingbourne 86b9fbe980 ARM: Better codegen for 64-bit compares.
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.

Before:

	push	{r7, lr}
	cmp	r0, r2
	mov.w	r0, #0
	mov.w	r12, #0
	it	hs
	movhs	r0, #1
	cmp	r1, r3
	it	ge
	movge.w	r12, #1
	it	eq
	moveq	r12, r0
	cmp.w	r12, #0
	bne	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

After:

	push	{r7, lr}
	subs	r0, r0, r2
	sbcs.w	r0, r1, r3
	bge	.LBB1_2
@ BB#1:                                 @ %bb1
	bl	f
	pop	{r7, pc}
.LBB1_2:                                @ %bb2
	bl	g
	pop	{r7, pc}

Saves around 80KB in Chromium's libchrome.so.

Some notes on this patch:

- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
  introduced (nothing else needs them). However, they are necessary in
  order to avoid poor codegen, and they seem similar to existing combines
  in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
  (brcond Compare)).

- No support for Thumb-1. This is in principle possible, but we'd need
  to implement ARMISD::SUBE for Thumb-1.

Differential Revision: http://reviews.llvm.org/D15256

llvm-svn: 263962
2016-03-21 18:00:02 +00:00
Renato Golin 2b6b7ffd6c [ARM] Add Cortex-A32 support
Adding Cortex-A32 as an available target in the ARM backend.

Patch by Sam Parker.

llvm-svn: 263956
2016-03-21 17:29:01 +00:00
Matt Arsenault b96b57347a AMDGPU: Add frexp_mant intrinsic
llvm-svn: 263948
2016-03-21 16:11:05 +00:00
Chad Rosier 4aeab5fbf2 [AArch64] Fix a -Wdocumentation warning. NFC.
llvm-svn: 263942
2016-03-21 13:43:58 +00:00
Jingyue Wu 1375560bdb [NVPTX] Adds a new address space inference pass.
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).

The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.

Reviewers: jholewinski, tra, jlebar

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D17965

llvm-svn: 263916
2016-03-20 20:59:20 +00:00
Simon Pilgrim fcc4532afa [X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements
Based on feedback for D14261

llvm-svn: 263911
2016-03-20 17:43:07 +00:00
Simon Pilgrim c44472a5bc [X86][SSE] Detect zeroable shuffle elements from different value types
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask.

Differential Revision: http://reviews.llvm.org/D14261

llvm-svn: 263906
2016-03-20 15:45:42 +00:00
Igor Breger 3ea8af5108 AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR
Differential Revision: http://reviews.llvm.org/D18211

llvm-svn: 263898
2016-03-20 13:09:43 +00:00
Michael Kuperstein 048cc3b7a8 Use a range-based for loop. NFC.
llvm-svn: 263889
2016-03-20 00:16:13 +00:00
Manman Ren a3a019cf90 [CXX_FAST_TLS] Fix issues in ARM.
We need to be careful on which registers can be explicitly handled
via copies. Prologue, Epilogue use physical registers and if one belongs
to the set of CSRsViaCopy, it will no longer be CSRed, since PEI overwrites
it after the explicit copies.

llvm-svn: 263857
2016-03-18 23:44:37 +00:00
Manman Ren 4865d89653 [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.

llvm-svn: 263856
2016-03-18 23:41:51 +00:00
Manman Ren 2828c57b6f [CXX_FAST_TLS] fix issues with O0 on ARM, AArch64 and X86.
Since at O0, explicit copies via SplitCSR may not be removed even if
they are unnecessary, we choose not to use SplitCSR at O0.

llvm-svn: 263855
2016-03-18 23:38:49 +00:00
Duncan P. N. Exon Smith c3fa1eded2 AArch64: Don't modify other modules in AArch64PromoteConstant
Avoid modifying other modules in `AArch64PromoteConstant` when the
constant is `ConstantData` (a horrible accident, I'm sure, caught by an
experimental follow-up to r261464).

Previously, this walked through all the users of a constant, but that
reaches into other modules when the constant doesn't depend transitively
on a `GlobalValue`!  Since we're walking instructions anyway, just
modify the instructions we actually see.

As a drive-by, instead of storing `Use` and getting the instructions
again via `Use::getUser()` (which is not a constantant time lookup),
store `std::pair<Instruction, unsigned>`.  Besides being cheaper, this
makes it easier to drop use-lists form `ConstantData` in the future.
(I threw this in because I was touching all the code anyway.)

Because the patch completely changes the traversal logic, it looks
like a rewrite of the pass, but the core logic is all the same (or
should be, minus the out-of-module changes).  In other words, there
should be NFC as long as the LLVMContext only has a single Module.

I didn't think of a good way to test this, but I hope to submit a patch
eventually that makes walking these use-lists illegal/impossible.

llvm-svn: 263853
2016-03-18 23:30:54 +00:00
Alexei Starovoitov 7e453bb8be BPF: emit an error message for unsupported signed division operation
Signed-off-by: Yonghong Song <yhs@plumgrid.com>
Signed-off-by: Alexei Starovoitov <ast@fb.com>
llvm-svn: 263842
2016-03-18 22:02:47 +00:00
Nicolai Haehnle fa771811b3 AMDGPU: add missing braces around multi-line if block
This fixes an issue with rL263658 pointed out by Tom Stellard.

llvm-svn: 263823
2016-03-18 20:32:04 +00:00
Chad Rosier cdfd7e7201 [AArch64] Enable more load clustering in the MI Scheduler.
This patch adds unscaled loads and sign-extend loads to the TII
getMemOpBaseRegImmOfs API, which is used to control clustering in the MI
scheduler. This is done to create more opportunities for load pairing.  I've
also added the scaled LDRSWui instruction, which was missing from the scaled
instructions. Finally, I've added support in shouldClusterLoads for clustering
adjacent sext and zext loads that too can be paired by the load/store optimizer.

Differential Revision: http://reviews.llvm.org/D18048

llvm-svn: 263819
2016-03-18 19:21:02 +00:00
Nicolai Haehnle 95e8ffd398 AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18255

llvm-svn: 263792
2016-03-18 16:24:40 +00:00
Nicolai Haehnle ad63638f6d AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).

The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, rivanvx, llvm-commits

Differential Revision: http://reviews.llvm.org/D18151

llvm-svn: 263791
2016-03-18 16:24:31 +00:00
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Sam Kolton a74cd526e9 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
2016-03-18 15:35:51 +00:00
Ehsan Amiri 631ed04af0 adding another optimization opportunity to readme file
llvm-svn: 263775
2016-03-18 04:02:25 +00:00
Adam Nemet 709e3046ee [LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead
Summary:
It can hurt performance to prefetch ahead too much.  Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.

Reviewers: hfinkel

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17949

llvm-svn: 263772
2016-03-18 00:27:43 +00:00
Adam Nemet 6d8beeca53 [LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses
Summary:
And use this TTI for Cyclone.  As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.

I am also adding tests for this and the previous change (D17943):

* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either

Reviewers: hfinkel

Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17945

llvm-svn: 263771
2016-03-18 00:27:38 +00:00
Adam Nemet 53e758fc55 [Aarch64] Add pass LoopDataPrefetch for Cyclone
Summary:
This wires up the pass for Cyclone but keeps it off for now because we
need a few more TTIs.

The getPrefetchMinStride value is not very well tuned right now but it
works well with CFP2006/433.milc which motivated this.

Tests will be added as part of the upcoming large-stride prefetching
patch.

Reviewers: t.p.northover

Subscribers: llvm-commits, aemerson, hfinkel, rengolin

Differential Revision: http://reviews.llvm.org/D17943

llvm-svn: 263770
2016-03-18 00:27:29 +00:00
Tim Shen 5cdf75084a [PPC, FastISel] Fix ordered/unordered fcmp
For fcmp, major concern about the following 6 cases is NaN result. The
comparison result consists of 4 bits, indicating lt, eq, gt and un (unordered),
only one of which will be set. The result is generated by fcmpu
instruction. However, bc instruction only inspects one of the first 3
bits, so when un is set, bc instruction may jump to to an undesired
place.

More specifically, if we expect an unordered comparison and un is set, we
expect to always go to true branch; in such case UEQ, UGT and ULT still
give false, which are undesired; but UNE, UGE, ULE happen to give true,
since they are tested by inspecting !eq, !lt, !gt, respectively.

Similarly, for ordered comparison, when un is set, we always expect the
result to be false. In such case OGT, OLT and OEQ is good, since they are
actually testing GT, LT, and EQ respectively, which are false. OGE, OLE
and ONE are tested through !lt, !gt and !eq, and these are true.

llvm-svn: 263753
2016-03-17 22:27:58 +00:00
Tim Northover 498c56c240 ARM: stop asserting on weird <3 x Ty> vectors in ISelLowering.
llvm-svn: 263741
2016-03-17 20:10:28 +00:00
Petar Jovanovic 0b44f24033 [PowerPC] Disable CTR loops optimization for soft float operations
This patch prevents CTR loops optimization when using soft float operations
inside loop body. Soft float operations use function calls, but function
calls are not allowed inside CTR optimized loops.

Patch by Aleksandar Beserminji.

Differential Revision: http://reviews.llvm.org/D17600

llvm-svn: 263727
2016-03-17 17:11:33 +00:00
Derek Schuff d4207ba0f6 [WebAssembly] Stackify code emitted by eliminateFrameIndex and SP writeback
Summary:
MRI::eliminateFrameIndex can emit several instructions to do address
calculations; these can usually be stackified. Because instructions with
FI operands can have subsequent operands which may be expression trees,
find the top of the leftmost tree and insert the code before it, to keep
the LIFO property.

Also use stackified registers when writing back the SP value to memory
in the epilog; it's unnecessary because SP will not be used after the
epilog, and it results in better code.

Differential Revision: http://reviews.llvm.org/D18234

llvm-svn: 263725
2016-03-17 17:00:29 +00:00
Changpeng Fang 234fcb81d3 AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute
Symmary:
  ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.

Reviewers
  arsenm, tstellarAMD

Subscribers
  llvm-commits, arsenm

Differential Revision:
  http://reviews.llvm.org/D18197

llvm-svn: 263720
2016-03-17 16:43:50 +00:00
Nicolai Haehnle 79cad857a0 AMDGPU: mark atomic instructions as sources of divergence
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18156

llvm-svn: 263719
2016-03-17 16:21:59 +00:00
Simon Pilgrim 0f37fbac51 [X86][SSE] Simplified blend-with-zero combining
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines

This patch stops the combine if we already have a blend in place (means we miss some domain corrections)

llvm-svn: 263717
2016-03-17 15:59:36 +00:00
Saleem Abdulrasool 071a099102 ARM: Revert SVN r253865, 254158, fix windows division
The two changes together weakened the test and caused a regression with division
handling in MSVC mode.  They were applied to avoid an assertion being triggered
in the block frequency analysis.  However, the underlying problem was simply
being masked rather than solved properly.  Address the actual underlying problem
and revert the changes.  Rather than analyze the cause of the assertion, the
division failure was assumed to be an overflow.

The underlying issue was a subtle bug in the BB construction in the emission of
the div-by-zero check (WIN__DBZCHK).  We did not construct the proper successor
information in the basic blocks, nor did we update the PHIs associated with the
basic block when we split them.  This would result in assertions being triggered
in the block frequency analysis pass.

Although the original tests are being removed, the tests themselves performed
very little in terms of validation but merely tested that we did not assert when
generating code.  Update this with new tests that actually ensure that we do not
regress on the code generation.

llvm-svn: 263714
2016-03-17 14:10:49 +00:00
Simon Atanasyan 58ee875296 [mips] Use `formatImm` call to print immediate value in the `MipsInstPrinter`
That allows, for example, to print hex-formatted immediates using
llvm-objdump --print-imm-hex command line option.

Differential Revision: http://reviews.llvm.org/D18195

llvm-svn: 263704
2016-03-17 10:43:36 +00:00
Scott Egerton d65377da78 [mips] Eliminate instances of "potentially uninitialised local variable" warnings, NFC
Summary:
This should eliminate all occurrences of this within LLVMMipsAsmParser.
This patch is in response to http://reviews.llvm.org/D17983. I was unable
to reproduce the warnings on my machine so please advise if this fixes the
warnings.

Reviewers: ariccio, vkalintiris, dsanders

Subscribers: dblaikie, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D18087

llvm-svn: 263703
2016-03-17 10:37:51 +00:00
James Y Knight f44fc5219f Tweak some atomics functions in preparation for larger changes; NFC.
- Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
  '__sync' libcalls and '__atomic' libcalls, and this function is for
  the '__sync' ones.

- getInsertFencesForAtomic() has been replaced with
  shouldInsertFencesForAtomic(Instruction), so that the decision can be
  made per-instruction. This functionality will be used soon.

- emitLeadingFence/emitTrailingFence are no longer called if
  shouldInsertFencesForAtomic returns false, and thus don't need to
  check the condition themselves.

llvm-svn: 263665
2016-03-16 22:12:04 +00:00
Nicolai Haehnle ef160de3e5 AMDGPU: Prevent uniform loops from becoming infinite
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18137

llvm-svn: 263658
2016-03-16 20:14:33 +00:00
Colin LeMahieu bb0cdfb9f7 [Hexagon] Adding missing break in switch statement. Extra operands would have been appended to the end.
llvm-svn: 263657
2016-03-16 20:00:38 +00:00
Sanjay Patel be37e62e0c fix function names; NFC
llvm-svn: 263646
2016-03-16 18:00:09 +00:00
Michel Danzer 302f83ac4e AMDGPU: Verify instructions in non-debug builds as well
And emit an error if it fails.

This prevents illegal instructions from getting sent to the GPU, which
would potentially result in a hang.

This is a candidate for the stable branch(es).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 263627
2016-03-16 09:10:42 +00:00
Michel Danzer beb79ceb19 AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 263626
2016-03-16 09:10:35 +00:00
Igor Breger 0ba7b04f5f AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512bit vector because PCMPGT supported only for 128/256bit.
Differential Revision: http://reviews.llvm.org/D18204

llvm-svn: 263624
2016-03-16 08:48:26 +00:00
Davide Italiano dfdf278ebf [MC] Rename TLSDESC as it's not ARM specific.
Similarly to what was done for TLSCALL in r263515.

llvm-svn: 263564
2016-03-15 17:29:52 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
Douglas Katzman 708eeb0519 Myriad: Add new sparc CPU kinds.
llvm-svn: 263557
2016-03-15 16:41:47 +00:00
Davide Italiano 249c45d92e [MC] Rename TLSCALL as it's not ARM specific.
`MCSymbolRefExpr` variant kind for TLSCALL is prefixed with 
_ARM_ since this is how it was originally implemented.
The X86_64 version is exactly the same so there's no reason
to create a new variant, we can just rename the existing
one to be machine-independent.
This generalization is the first step to implement support
for GNU2 TLS dialect in MC.

Differential Revision:  http://reviews.llvm.org/D18160

llvm-svn: 263515
2016-03-15 00:25:22 +00:00
Eric Christopher da8b3f1914 Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.

This reverts commit r263303.

llvm-svn: 263512
2016-03-14 23:59:57 +00:00
Ulrich Weigand 52aa7fba3f [SystemZ] Add missing isBranch flags to certain instruction
Some instructions were missing isBranch, isCall, or isTerminator
flags.  This didn't really affect code generation since most of
the affected patterns were used only for the AsmParser and/or
disassembler.

However, it could affect tools using the MC layer to disassemble
and parse binary code (e.g. via MCInstrDesc::mayAffectControlFlow).

llvm-svn: 263478
2016-03-14 20:16:30 +00:00
Chad Rosier 27c352d26d [AArch64] Refactor AArch64FrameLowering::emitPrologue. NFC.
http://reviews.llvm.org/D18125
Patch by Aditya Kumar.

llvm-svn: 263461
2016-03-14 18:24:34 +00:00
Chad Rosier 6d98655070 [AArch64] Break the dependency between FP and SP when possible.
When the SP in not changed because of realignment/VLAs etc., we restore the SP
by using the previous value of SP and not the FP. Breaking the dependency will
help in cases when the epilog of a callee is close to the epilog of the caller;
for then "sub sp, fp, #" depends on the load restoring the FP in the epilog of
the callee.

http://reviews.llvm.org/D18060
Patch by Aditya Kumar and Evandro Menezes.

llvm-svn: 263458
2016-03-14 18:17:41 +00:00
Chad Rosier 7a21bb196b [Mips] Fix -Wunused-private-field warning after r263444.
llvm-svn: 263454
2016-03-14 18:10:20 +00:00
Sanjay Patel 7506852709 [DAG] use !isUndef() ; NFCI
llvm-svn: 263453
2016-03-14 18:09:43 +00:00
Sanjay Patel 5719584129 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Tom Stellard 331f981cc9 AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

llvm-svn: 263447
2016-03-14 17:05:56 +00:00
Sanjay Patel 62d707c8d9 [x86, AVX] replace masked load with full vector load when possible
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.

1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.

Differential Revision: http://reviews.llvm.org/D18094

llvm-svn: 263446
2016-03-14 16:54:43 +00:00
Daniel Sanders e8efff373a [mips] MIPS32R6 compact branch support
Summary:
MIPSR6 introduces a class of branches called compact branches. Unlike the
traditional MIPS branches which have a delay slot, compact branches do not
have a delay slot. The instruction following the compact branch is only
executed if the branch is not taken and must not be a branch.

It works by generating compact branches for MIPS32R6 when the delay slot
filler cannot fill a delay slot. Then, inspecting the generated code for
forbidden slot hazards (a compact branch with an adjacent branch or other
CTI) and inserting nops to clear this hazard.

Patch by Simon Dardis.

Reviewers: vkalintiris, dsanders

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16353

llvm-svn: 263444
2016-03-14 16:24:05 +00:00
Marek Olsak ed2213e6ef AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

llvm-svn: 263441
2016-03-14 15:57:14 +00:00
Nicolai Haehnle 74127fe8d7 AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18101

llvm-svn: 263440
2016-03-14 15:37:18 +00:00
Vasileios Kalintiris 42db3ff47f [mips] Use range-based for loops. NFC.
llvm-svn: 263438
2016-03-14 15:05:30 +00:00
Ulrich Weigand cdce026b4d [SystemZ] Avoid LER on z13 due to partial register dependencies
On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).

Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE.  However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.

This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.

llvm-svn: 263431
2016-03-14 13:50:03 +00:00
Zlatko Buljan fba68931ed [mips] Fix an issue with long double when function roundl is defined
Differential Revision: http://reviews.llvm.org/D17760

llvm-svn: 263428
2016-03-14 12:50:23 +00:00
Daniel Sanders 127d2d2b46 [mips] Range check uimm16_64
Summary:

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D17725

llvm-svn: 263427
2016-03-14 12:44:44 +00:00
Daniel Sanders cfa3483c8e [mips] Simplify ordering of range checked immediate classes.
Summary:
With the addition of checks to ensure that operands have a strict ordering
it has become tricky to manage the order in the way I originally intended.

This patch linearizes the ordering which simplifies the implementation but
requires an order that is arbitrary in places. Here are some examples:
* uimm4 < uimm5 < uimm6
* simm4 < uimm4 < simm5 < uimm5
* uimm5 < uimm5_plus1 (1..32) < uimm5_plus32 (32..63) < uimm6
  The term 'superset' starts to break down here since the *_plus* classes
  are not true supersets of uimm5 (but they are still subsets of uimm6).
* uimm5 < uimm5_64, and uimm5 < vsplat_uimm5
  This is entirely arbitrary. We need an ordering and what we pick is
  unimportant since only one is possible for a given mnemonic.

Reviewers: vkalintiris

Subscribers: llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D17723

llvm-svn: 263423
2016-03-14 11:46:30 +00:00
Nikolay Haustov 79af6b33e0 [AMDGPU] Assembler: SOP* instruction fixes
s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)

Differential Revision: http://reviews.llvm.org/D18040

llvm-svn: 263420
2016-03-14 11:17:19 +00:00
Daniel Sanders 19b7f76afa [mips] Range check uimm6_lsl2.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17291

llvm-svn: 263419
2016-03-14 11:16:56 +00:00
Hans Wennborg 369ebfe4c9 Try to fix build of WebAssemblyRegStackify.cpp on Windows
It's failing to build on VS2015 with:

C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\lib\Target\WebAssembly\WebAssemblyRegStackify.cpp(520):
error C2668: 'llvm::make_reverse_iterator': ambiguous call to overloaded function
C:\b\build\slave\ClangToTWin\build\src\third_party\llvm\include\llvm/ADT/STLExtras.h(217):
note: could be 'std::reverse_iterator<llvm::MachineBasicBlock::iterator>
llvm::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(IteratorTy)'
        with
        [
            IteratorTy=llvm::MachineInstrBundleIterator<llvm::MachineInstr>
        ]
C:\b\depot_tools\win_toolchain\vs_files\391bbf1220d3edcd3cc3fccdb56224181e3b13a7\win_sdk\bin\..\..\VC\include\xutility(1217):
note: or 'std::reverse_iterator<llvm::MachineBasicBlock::iterator>
std::make_reverse_iterator<llvm::MachineInstrBundleIterator<llvm::MachineInstr>>(_RanIt)' [found using argument-dependent lookup]
        with
        [
            _RanIt=llvm::MachineInstrBundleIterator<llvm::MachineInstr>
        ]

I don't have VS2015 locally at the moment, but hopefully this will help.

llvm-svn: 263418
2016-03-14 11:04:15 +00:00
Igor Breger a949100532 AVX512: icmp operation should be always lowered to CMPM (AVX-512) instruction on SKX.
implemented by delena

Differential Revision: http://reviews.llvm.org/D18054

llvm-svn: 263417
2016-03-14 10:26:39 +00:00
Valery Pykhtin 0f97f17152 [AMDGPU] AsmParser: Factor out parseRegister. NFC.
llvm-svn: 263411
2016-03-14 07:43:42 +00:00
Valery Pykhtin 9e33c7f5d3 [AMDGPU] AsmParser: refactor post push_back vector access. NFC.
llvm-svn: 263409
2016-03-14 05:25:44 +00:00
Valery Pykhtin f91911c3ae [AMDGPU] AsmParser: remove redundant isReg checks. NFC.
llvm-svn: 263407
2016-03-14 05:01:45 +00:00
Mehdi Amini ba9fba81d6 Remove PreserveNames template parameter from IRBuilder
This reapplies r263258, which was reverted in r263321 because
of issues on Clang side.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263393
2016-03-13 21:05:13 +00:00
Simon Pilgrim 035b19ecf5 [X86][SSE41] Avoid variable blend for constant v8i16 shifts
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.

llvm-svn: 263383
2016-03-13 18:35:59 +00:00
Craig Topper 955308fbee [X86] Remove many operands that represent memory stores from outs to ins. These operands are the registers and immediates that specify the memory address not the memory itself thus they are inputs.
llvm-svn: 263354
2016-03-13 02:56:31 +00:00
Nemanja Ivanovic bd56e4e25a Fix for PR 26378
This patch corresponds to review:
http://reviews.llvm.org/D17712

We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).

llvm-svn: 263338
2016-03-12 10:23:07 +00:00
Quentin Colombet cf9732b417 [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.
cmpxchg[8|16]b uses RBX as one of its argument.
In other words, using this instruction clobbers RBX as it is defined to hold one
the input. When the backend uses dynamically allocated stack, RBX is used as a
reserved register for the base pointer. 

Reserved registers have special semantic that only the target understands and
enforces, because of that, the register allocator don’t use them, but also,
don’t try to make sure they are used properly (remember it does not know how
they are supposed to be used).

Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. This is the responsibility of the target to do the right thing with
its reserved register.

To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that save rbx.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
  comparison.
- The other is the virtual register holding the save of the value of RBX as the
  base pointer. This saving is done as part of isel (i.e., we emit a copy from
  rbx).

cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp

This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp

Note: The actual modeling of the pseudo is a bit more complicated to make sure
the interferes that appears after the pseudo gets expanded are properly modeled
before that expansion.

This fixes PR26883.

llvm-svn: 263325
2016-03-12 02:25:27 +00:00
Eric Christopher 35abd051c0 Temporarily revert:
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date:   Fri Mar 11 17:15:50 2016 +0000

    Remove PreserveNames template parameter from IRBuilder

    Summary:
    Following r263086, we are now relying on a flag on the Context to
    discard Value names in release builds.

    Reviewers: chandlerc

    Subscribers: mzolotukhin, llvm-commits

    Differential Revision: http://reviews.llvm.org/D18023

    From: Mehdi Amini <mehdi.amini@apple.com>

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
    91177308-0d34-0410-b5e6-96231b3b80d8

until we can figure out what to do about clang and Release build testing.

This reverts commit 263258.

llvm-svn: 263321
2016-03-12 01:47:22 +00:00
Simon Pilgrim 33d57c7547 [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 263303
2016-03-11 22:18:05 +00:00
Ahmed Bougacha 171f7b9986 [AArch64] Don't blindly lower f16/f128 FCCMPs.
Instead, extend f16 (like we do when lowering a standalone SETCC),
and let f128 be legalized to the RT calls.

Fixes PR26803.

llvm-svn: 263301
2016-03-11 22:02:58 +00:00
Dan Gohman da323e88ea [WebAssembly] Add `final` keywords to a few more subclasses, for consistency.
llvm-svn: 263287
2016-03-11 19:45:37 +00:00
Simon Pilgrim 7b2164ffe0 Fix spelling.
llvm-svn: 263266
2016-03-11 17:31:43 +00:00
Mehdi Amini 99eab3dd06 Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.

Reviewers: chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18023

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
2016-03-11 17:15:50 +00:00
Valery Pykhtin a7f480b4e9 [AMDGPU] Fix VOPC instruction operand namings
Differential Revision: http://reviews.llvm.org/D17966

llvm-svn: 263242
2016-03-11 14:53:28 +00:00
Simon Pilgrim 7ca9614c71 [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction.
Its not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases.

llvm-svn: 263239
2016-03-11 14:39:10 +00:00
Vasileios Kalintiris e2cbc21b6f [mips] MIPSR6 Instruction itineraries
Summary: Defines instruction itineraries for common MIPSR6 instructions.

Patch by Simon Dardis.

Reviewers: vkalintiris

Subscribers: MatzeB, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D17198

llvm-svn: 263229
2016-03-11 13:05:06 +00:00
Daniel Sanders 78e8902097 [mips] Range check simm4.
Summary:

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16811

llvm-svn: 263220
2016-03-11 11:37:50 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Chandler Carruth 89c45a162f [PM] Port GVN to the new pass manager, wire it up, and teach a couple of
tests to run GVN in both modes.

This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.

Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.

I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.

Differential Revision: http://reviews.llvm.org/D18019

llvm-svn: 263208
2016-03-11 08:50:55 +00:00
Matt Arsenault bafc9dc591 AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca
Frontend authors are strongly encouraged to keep allocas
in the entry block, so don't bother visiting every instruction
in the other blocks of the function.

llvm-svn: 263206
2016-03-11 08:20:50 +00:00
Matt Arsenault 6b6a2c37bc AMDGPU: R600 code splitting cleanup
Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.

llvm-svn: 263204
2016-03-11 08:00:27 +00:00
Matt Arsenault 9a19c240c0 AMDGPU: Materialize sign bits with bfrev
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.

llvm-svn: 263201
2016-03-11 07:42:49 +00:00
Tim Northover 6092de5075 AArch64: only try to use scaled fcvt ops on legal vector types.
Before we ended up calling getSimpleVectorType on a <3 x float>, which
asserted.

llvm-svn: 263169
2016-03-10 23:02:21 +00:00
Sanjay Patel 0181943b89 [x86] don't use a shuffle when a vselect will do; NFCI
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.

llvm-svn: 263167
2016-03-10 22:35:33 +00:00
Simon Pilgrim 61eb49e437 [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 263159
2016-03-10 20:40:26 +00:00
Peter Collingbourne aba16fca5d ARM: Support relative references using the PREL31 symbol variant.
Differential Revision: http://reviews.llvm.org/D17937

llvm-svn: 263156
2016-03-10 19:30:18 +00:00
Tim Northover 00e2dcec02 AArch64: remove pseudo-instructions used only for their patterns.
There's no real reason for these pseudos to exist, we should be writing real
patterns even if it is slightly less convenient. NFC.

llvm-svn: 263141
2016-03-10 18:46:12 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Michael Kuperstein 8be8de6d62 [X86] Correctly select registers to pop into for x86_64
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check the register
itself isn't defined, we also need to make sure no overlapping registers
are defined either.

This fixes PR26711.

Differential Revision: http://reviews.llvm.org/D18029

llvm-svn: 263139
2016-03-10 18:43:21 +00:00
Balaram Makam e9b2725287 [AArch64] Optimize compare and branch sequence when the compare's constant operand is power of 2
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test and branch sequences like in the example below. This handles
the cases that miss folding of AND into TBZ/TBNZ during ISelLowering of BR_CC

Examples:
   and  w8, w8, #0x400
   cbnz w8, L1
 to
   tbnz w8, #10, L1

Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17942

llvm-svn: 263136
2016-03-10 17:54:55 +00:00
Alexandros Lamprineas 843164242e [ARM] Cortex-R8 support
This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.

Patch by Pablo Barrio.

Differential Revision: http://reviews.llvm.org/D17925

llvm-svn: 263132
2016-03-10 17:38:41 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Saleem Abdulrasool 1632fe1f77 ARM: follow up improvements for SVN r263118
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct.  The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node.  This node set R7 as clobbered.  However, the
code would then follow up with a clobber of R11.  I had failed to notice the
imp-def,kill on R7 in the isel.  Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if.  Instead, construct a new parallel
WIN node and prefer that when targeting windows.  This ensures that we now both
correctly model the __builtin_longjmp as well as construct the frame in a more
ABI conformant manner.

llvm-svn: 263123
2016-03-10 16:26:37 +00:00
David L Kreitzer 14f0077f38 Unified the handling of returns in the X87 stackifier so that the stackifier
runs successfully on routines containing IRETs. This fixes PR26410.

Differential Revision: http://reviews.llvm.org/D17643

llvm-svn: 263120
2016-03-10 15:14:02 +00:00
Saleem Abdulrasool 8b30f9854e ARM: correct __builtin_longjmp on WoA
WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast
to AAPCS which states r7.  This adjusts __builtin_longjmp to not clobber r7 and
to properly restore the frame pointer on execution.

llvm-svn: 263118
2016-03-10 15:11:09 +00:00
Elena Demikhovsky cd9967d160 AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)
(failed on instruction selection phase)

Differential Revision: http://reviews.llvm.org/D17924

llvm-svn: 263111
2016-03-10 13:44:22 +00:00
Valery Pykhtin a4db224d54 [AMDGPU] Fix SMEM instructions encoding/operand namings
Differential Revision: http://reviews.llvm.org/D17651

llvm-svn: 263108
2016-03-10 13:06:08 +00:00
Simon Pilgrim 13d4056795 [X86][AVX] Improve target shuffle combining of BLEND+zero
The BLEND+zero combine was failing to combine equivalent BLEND masks.

Follow up to D17483 and D17858

llvm-svn: 263105
2016-03-10 11:50:15 +00:00
Simon Pilgrim 16d11785a5 [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.

This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.

There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.

Differential Revision: http://reviews.llvm.org/D17858

llvm-svn: 263102
2016-03-10 11:23:51 +00:00
Elena Demikhovsky 38f78a2b92 AVX-512: Fixed a bug in shuffle for v64i8 type
Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on".

Differential Revision: http://reviews.llvm.org/D17994

llvm-svn: 263097
2016-03-10 08:32:09 +00:00
Roman Levenstein 2792b3f02f Add support for a preserve_most calling convention to the AArch64 backend.
This change adds a support for a preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.

There is also a subsequent patch on top of this one to add a tail-calls support for this calling convention.

Differential Revision: http://reviews.llvm.org/D18016

llvm-svn: 263092
2016-03-10 04:35:09 +00:00
Sanjay Patel 9f6c4d50b4 [x86] fix cost model inaccuracy for vector memory ops
The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar
has a completely double-pumped AVX implementation. But getting the cost model to
reflect that is a much bigger problem. The small goal here is simply to improve on
the lie that !AVX2 == SandyBridge.

Differential Revision: http://reviews.llvm.org/D18000

llvm-svn: 263069
2016-03-09 22:23:33 +00:00
Derek Schuff 3e89580571 [WebAssembly] Update known gcc test failures
llvm-svn: 263068
2016-03-09 22:14:33 +00:00
Sanjay Patel 4a8dd89128 [x86, AVX] optimize masked loads with constant masks
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.

Differential Revision: http://reviews.llvm.org/D17899

llvm-svn: 263067
2016-03-09 22:12:08 +00:00
Evandro Menezes 669aaccb89 [AArch64] Minor reformatting (NFC).
llvm-svn: 263054
2016-03-09 19:56:38 +00:00
Chris Dewhurst 52adb575e6 This change adds co-processor condition branching and conditional traps to the Sparc back-end.
This will allow inline assembler code to utilize these features, but no automatic lowering is provided, except for the previously provided @llvm.trap, which lowers to "ta 5".

The change also separates out the different assembly language syntaxes for V8 and V9 Sparc. Previously, only V9 Sparc assembly syntax was provided.

The change also corrects the selection order of trap disassembly, allowing, e.g. "ta %g0 + 15" to be rendered, more readably, as "ta 15", ignoring the %g0 register. This is per the sparc v8 and v9 manuals.

Check-in includes many extra unit tests to check this works correctly on both V8 and V9 Sparc processors.

Code Reviewed at http://reviews.llvm.org/D17960.

llvm-svn: 263044
2016-03-09 18:20:21 +00:00
Kit Barton a1d6a6f1de [PPC] backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions
This has to be committed before the FE changes

Phabricator: http://reviews.llvm.org/D17837
llvm-svn: 263035
2016-03-09 17:48:01 +00:00
Chad Rosier e4e15ba046 [AArch64] Move helper functions into TII, so they can be reused elsewhere. NFC.
llvm-svn: 263032
2016-03-09 17:29:48 +00:00
Chad Rosier 0da267dd1d [AArch64] Minor cleanup/remove redundant code. NFC.
llvm-svn: 263024
2016-03-09 16:46:48 +00:00
Chad Rosier c27a18f39f [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.
http://reviews.llvm.org/D17967

llvm-svn: 263021
2016-03-09 16:00:35 +00:00
Teresa Johnson e50b23c67f Fix build error due to unsigned compare >= 0 in r263008 (NFC)
Fixes error from building with clang:

/usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12:
error: comparison of unsigned expression >= 0 is always true
[-Werror,-Wtautological-compare]
  if ((Imm >= 0x000) && (Imm <= 0x0ff)) {
         ~~~ ^  ~~~~~

llvm-svn: 263014
2016-03-09 14:58:23 +00:00
Sam Kolton dfa29f7c5b [AMDGPU] Assembler: Support DPP instructions.
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.

ToDo:
  - VOP2bInst:
    - vcc is considered as operand
    - AsmMatcher doesn't apply mnemonic aliases when parsing operands
  - v_mac_f32
  - v_nop
  - disable instructions with 64-bit operands
  - change dpp_ctrl assembler representation to conform sp3

Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
2016-03-09 12:29:31 +00:00
Nikolay Haustov 9b7577ed22 [AMDGPU] Assembler: Support abs() syntax.
Support legacy SP3 abs(v1) syntax. InstPrinter still uses |v1|.
Add tests.

Differential Revision: http://reviews.llvm.org/D17887

llvm-svn: 263006
2016-03-09 11:03:21 +00:00
Nikolay Haustov 8e3f099497 [AMDGPU] Assembler: Fix s_setpc_b64
s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to.

Differential Revision: http://reviews.llvm.org/D17888

llvm-svn: 263005
2016-03-09 10:56:19 +00:00
Dan Gohman ddfa1a6c18 [WebAssembly] Update comments about irreducible control flow.
llvm-svn: 262995
2016-03-09 04:17:36 +00:00
Dan Gohman d7a2eea619 [WebAssembly] Implement irreducible control flow.
This implements a very simple conservative transformation that doesn't
require more than linear code size growth. There's room for much more
optimization in this space.

llvm-svn: 262982
2016-03-09 02:01:14 +00:00
Quentin Colombet 4340b55593 Revert r262759 and r262760.
The fix consisting in using the library call for atomic compare and swap when
the instruction is not safe to use may be incorrect. Indeed the library call may
not exist on all platform. In other words, we need a better fix! 

llvm-svn: 262943
2016-03-08 17:29:11 +00:00
Chad Rosier e40b9513a9 [AArch64] Add MMOs to unscaled pairs.
Test to be committed in follow up commit, per discussion in D17097.
http://reviews.llvm.org/D17097

llvm-svn: 262942
2016-03-08 17:16:38 +00:00
Artyom Skrobov 5ddea6a8e9 [ARM] Simplify ARMInstr*.td by getting rid of identity PatFrags (NFC)
Reviewers: t.p.northover, grosbach, resistor

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17636

llvm-svn: 262936
2016-03-08 16:23:54 +00:00
Hans Wennborg e00b6e7249 Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG"
This caused PR26870.

llvm-svn: 262935
2016-03-08 16:21:41 +00:00
Igor Breger 999ac754f2 AVX512: Add extract_subvector patterns v8i1->v4i1 , v4i1->v2i1.
Differential Revision: http://reviews.llvm.org/D17953

llvm-svn: 262929
2016-03-08 15:21:25 +00:00
Kit Barton ba532dc816 [Power9] Implement new vsx instructions: load, store instructions for vector and scalar
We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to
implement this new patch.

This patch implements the following vsx instructions:

Vector load/store:
lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
stxv stxvb16x stxvh8x stxvl stxvll stxvx
Scalar load/store:
lxsd lxssp lxsibzx lxsihzx
stxsd stxssp stxsibx stxsihx
21 instructions

Phabricator: http://reviews.llvm.org/D16919
llvm-svn: 262906
2016-03-08 03:49:13 +00:00
Dan Gohman 1402606477 [WebAssembly] Update for spec change from tableswitch to br_table.
Also note that the operand order changed; the default label is now listed
after the regular labels.

llvm-svn: 262903
2016-03-08 03:18:12 +00:00
Quentin Colombet f574ab292b [AArch64] Initialize GlobalISel as part of the target initialization.
llvm-svn: 262897
2016-03-08 01:45:36 +00:00
Richard Smith c2a2830e94 A couple more UB fixes for C++14 sized deallocation.
llvm-svn: 262891
2016-03-08 00:59:44 +00:00
Matt Arsenault c89f2919a4 AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Matt Arsenault 56356c8a9c AMDGPU: Remove a fixme for ptrrtoint handling
llvm-svn: 262854
2016-03-07 21:12:46 +00:00
Matt Arsenault 81d06015c6 AMDGPU: Move function only used by R600
llvm-svn: 262853
2016-03-07 21:10:13 +00:00
Marina Yatsina 5f5de9f89b [ms-inline-asm][AVX512] Add ability to use k registers in MS inline asm + fix bag with curly braces
Until now curly braces could only be used in MS inline assembly to mark block start/end.
All curly braces were removed completely at a very early stage.
This approach caused bugs like:
"m{o}v eax, ebx" turned into "mov eax, ebx" without any error.

In addition, AVX-512 added special operands (e.g., k registers), which are also surrounded by curly braces that mark them as such.
Now, we need to keep the curly braces and identify at a later stage if they are marking block start/end (if so, ignore them), or surrounding special AVX-512 operands (if so, parse them as such).

This patch fixes the bug described above and enables the use of AVX-512 special operands.

This commit is the the llvm part of the patch.
The clang part of the review is: http://reviews.llvm.org/D17766
The llvm part of the review is: http://reviews.llvm.org/D17767

Differential Revision: http://reviews.llvm.org/D17767

llvm-svn: 262843
2016-03-07 18:11:16 +00:00
Simon Pilgrim 253ca348b2 [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target shuffle combining.
Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles.

This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed.

Removed non-constant pool mask decode path as we have no way of testing it right now.

Differential Revision: http://reviews.llvm.org/D17916

llvm-svn: 262809
2016-03-06 21:54:52 +00:00
Valery Pykhtin dc11054f20 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
Engages code from r262804.

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262808
2016-03-06 20:25:36 +00:00
Valery Pykhtin 50cd3c4ec7 fix sanitizer-ppc64be-linux failure for r262804
error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/930

llvm-svn: 262805
2016-03-06 15:13:54 +00:00
Valery Pykhtin 499a5c6323 [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262804
2016-03-06 13:27:13 +00:00
Igor Breger 4d94d4d5f7 AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element types on SKX
Differential Revision: http://reviews.llvm.org/D17913

llvm-svn: 262803
2016-03-06 12:38:58 +00:00
Valery Pykhtin 0c6293da68 [AMDGPU] SOPxx instructions operand naming fixed in td files.
dst -> sdst
ssrcN -> srcN

Differential Revision: http://reviews.llvm.org/D17646

llvm-svn: 262801
2016-03-06 10:31:44 +00:00
Craig Topper 581c0087b9 [X86] Use high bits of return value from getEncoding instead of predicate functions to populate the REX and VEX prefix bits that extend register encodings. NFC
llvm-svn: 262800
2016-03-06 08:12:47 +00:00
Craig Topper faab5c68d4 [X86] Remove unnecessary masking. The assert above it already guaranteed it. NFC
llvm-svn: 262799
2016-03-06 08:12:44 +00:00
Craig Topper 5e038cf589 [X86] Use uint8_t instead of unsigned char as it shortens the code and more explicitly reflects the desired size.
llvm-svn: 262798
2016-03-06 08:12:42 +00:00
Igor Breger f1bd761e00 AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use kshiftw to implement VSHRI v4i1 , bits 15-4 is undef so the upper bits of v4i1 may not be zeroed. v4i1 should be zero_extend to v16i1 ( or any natively supported vector).
Differential Revision: http://reviews.llvm.org/D17763

llvm-svn: 262797
2016-03-06 07:46:03 +00:00
Simon Pilgrim 40e1a71cdd [X86][AVX] Improved VPERMILPS variable shuffle mask decoding.
Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.

Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization

Followup to D17681

llvm-svn: 262784
2016-03-05 22:53:31 +00:00
Simon Pilgrim aa99331bad [X86] AMD Bobcat CPU (btver1) doesn't support XSAVE
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE.

Differential Revision: http://reviews.llvm.org/D17683

llvm-svn: 262782
2016-03-05 22:00:50 +00:00
Quentin Colombet 2a7676b442 [X86] Fix the lowering of setjmp intrinsic on i386.
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such pointer
gets defined by the CGBR pass.

This fixes PR26742.

llvm-svn: 262762
2016-03-05 00:31:04 +00:00
Quentin Colombet 13b524597d [X86] Do not use cmpxchgXXb when we need the base pointer (RBX).
cmpxchgXXb uses RBX as one of its implicit argument. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, expect when RBX is a reserved register because in that case,
the register allocator will not track its value and will not
save and restore it when interferences occur.

rdar://problem/24851412

llvm-svn: 262759
2016-03-04 23:29:39 +00:00
David Majnemer 71a1c2c619 Fix build breakage
llvm-svn: 262756
2016-03-04 23:02:15 +00:00
David Majnemer d2f767d2f6 [X86] Support cleaning more than 2**16 bytes of stack
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.

There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.

To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:

pop  %ecx     ; return address is now in %ecx
add  $n, %esp ; clean the stack
push %ecx     ; bring the return address back on the stack
ret           ; pop the return address and jmp to it's value

llvm-svn: 262755
2016-03-04 22:56:17 +00:00
Dan Gohman e6b81362e9 [WebAssembly] Add another possible code-size optimization to README.txt
llvm-svn: 262740
2016-03-04 20:09:57 +00:00
Renato Golin 175c6d6d95 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262738
2016-03-04 19:19:36 +00:00
Tom Stellard 649b5db557 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Tom Stellard ebef6f9771 AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter
Summary:
This allows us to use virtual registers when we need extra registers
for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex().

Once all the frame indices have been eliminated, the
PrologEpilogueInserter does an extra pass over the program to replace
all virtual registers with physical ones.

This allows us to make more efficient use of our emergency spill slots,
so we only need to create one.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17591

llvm-svn: 262728
2016-03-04 18:02:01 +00:00
Krzysztof Parzyszek 51155fc0d1 [Hexagon] Fix lowering of calls with the return type of i1
This fixes an assertion in test/CodeGen/Hexagon/ifcvt-edge-weight.ll
when run with -debug-only=isel

llvm-svn: 262726
2016-03-04 17:38:05 +00:00
Zoran Jovanovic a68b67d1ed [mips][microMIPS] Prevent usage of OR16_MMR6 instruction when code for microMIPS is generated.
Author: milena.vujosevic.janicic
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D17373

llvm-svn: 262725
2016-03-04 17:34:31 +00:00
Sam Kolton f51f4b8370 Test commit access
llvm-svn: 262714
2016-03-04 12:29:14 +00:00
Valery Pykhtin 824e804bf6 test commit
llvm-svn: 262709
2016-03-04 10:59:50 +00:00
Benjamin Kramer 4dbf3371bb Make headers self-contained again.
llvm-svn: 262702
2016-03-04 10:49:30 +00:00
Nikolay Haustov 5bf46ac150 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai H.hnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
Simon Pilgrim f33cb61471 [X86][AVX512BW] Fixed 512-bit PSHUFB shuffle mask decode and added combine test.
PSHUFB decoder was assuming that input was 128 or 256-bit vector only.

llvm-svn: 262661
2016-03-03 21:55:01 +00:00
Simon Pilgrim abcee45b7a [X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPS
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.

This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.

Differential Revision: http://reviews.llvm.org/D17681

llvm-svn: 262635
2016-03-03 18:13:53 +00:00
Simon Pilgrim 022afe2538 [X86] Tidied up 256-bit -> 2 x 128-bit vector shift extraction.
lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway.

llvm-svn: 262633
2016-03-03 17:54:35 +00:00
Simon Pilgrim 0107d24810 [X86] Pulled out repeated code testing for constant vector shift amount. NFCI.
llvm-svn: 262631
2016-03-03 17:35:43 +00:00
Amjad Aboud 0ce261d052 MCU target has its own ABI, however X86 interrupt handler calling convention overrides this ABI.
Fixed the ordering to check first for X86 interrupt handler then for MCU target.

Differential Revision: http://reviews.llvm.org/D17801

llvm-svn: 262628
2016-03-03 17:17:54 +00:00
Ahmed Bougacha 671795a985 [X86] Don't assume that shuffle non-mask operands starts at #0.
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).

Since:
  r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.

This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.

Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.

One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).

Differential Revision: http://reviews.llvm.org/D17041

llvm-svn: 262627
2016-03-03 16:53:50 +00:00
Sanjay Patel d6cb4ec2a2 [AArch64] fold 'isPositive' vector integer operations (PR26819)
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26819

Shift and negate is what InstCombine prefers to produce (and I tried to make it do more of that
in http://reviews.llvm.org/rL262424 ), so we should recognize that pattern as something that might
come from autovectorization even if it's unlikely to be produced from C NEON intrinsics.

The patch is based on the x86 equivalent:
http://reviews.llvm.org/rL262036

Differential Revision: http://reviews.llvm.org/D17834

llvm-svn: 262623
2016-03-03 15:56:08 +00:00
Igor Breger 639fde79b0 AVX512: Combine AND + TESTM instructions .
Differential Revision: http://reviews.llvm.org/D17844

llvm-svn: 262621
2016-03-03 14:18:38 +00:00
Simon Pilgrim 91dd0a796c [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 262599
2016-03-03 09:43:28 +00:00
Renato Golin 3d78271eac Revert "[ARM] Merging 64-bit divmod lib calls into one"
This reverts commit r262507, which broke some ARM buildbots.

llvm-svn: 262594
2016-03-03 08:57:44 +00:00
Michael Zuckerman c4d054fa4a [LLVM][AVX512] PSRLWI Chnage imm8 to int
Differential Revision: http://reviews.llvm.org/D17753

llvm-svn: 262592
2016-03-03 08:54:05 +00:00
Tom Stellard cc7067a668 AMDGPU: Insert two S_NOP instructions for every high level source statement.
Patch by: Konstantin Zhuravlyov

Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.

Reviewers: tstellarAMD, arsenm

Subscribers: echristo, dblaikie, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17454

llvm-svn: 262579
2016-03-03 03:53:29 +00:00
Tom Stellard 600ca6fd39 AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs
Summary:
When there were no free SGPRs, we were trying to move this value into
some of the reserved registers which was causing a segmentation fault.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17590

llvm-svn: 262577
2016-03-03 03:45:09 +00:00
Hans Wennborg 153e4b0f11 [X86] Enable forwarding bool arguments in tail calls (PR26305)
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.

Differential Revision: http://reviews.llvm.org/D17786

llvm-svn: 262575
2016-03-03 02:06:32 +00:00
Tim Shen 6e676a84ad [PPCVSXFMAMutate] Temporarily disable this pass
llvm-svn: 262573
2016-03-03 01:27:35 +00:00
David Majnemer 1ef654024f [X86] Don't give catch objects a displacement of zero
Catch objects with a displacement of zero do not initialize a catch
object.  The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.

If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.

Address this by creating our catch objects as fixed objects.  We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.

Differential Revision: http://reviews.llvm.org/D17823

llvm-svn: 262546
2016-03-03 00:01:25 +00:00
Matt Arsenault 8226fc4829 AMDGPU: Simplify boolean conditional return statements
Patch by Richard Thomson

llvm-svn: 262536
2016-03-02 23:00:21 +00:00
Renato Golin 93e42d9934 [ARM] Merging 64-bit divmod lib calls into one
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.

This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.

This patch fixes PR17193 (and a long time FIXME in the tests).

llvm-svn: 262507
2016-03-02 19:35:45 +00:00
Reid Kleckner 65f9d9cd32 Revert "[X86] Elide references to _chkstk for dynamic allocas"
This reverts commit r262370.

It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404

The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.

llvm-svn: 262505
2016-03-02 19:20:59 +00:00
Matthias Braun f290912d22 ARM: Introduce conservative load/store optimization mode
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.

These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.

So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.

Differential Revision: http://reviews.llvm.org/D17015

llvm-svn: 262504
2016-03-02 19:20:00 +00:00
Geoff Berry 62c1a1e7c7 [AArch64] Enable non-leaf frame pointer elimination.
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.

This change should be NFC when compiling via clang without
-fomit-frame-pointer.

Reviewers: t.p.northover

Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines

Differential Revision: http://reviews.llvm.org/D17730

llvm-svn: 262495
2016-03-02 17:58:31 +00:00
Michael Zuckerman 927fdaee88 [LLVM][AVX512]PSRAWI Change imm8 to int.
Differential Revision: http://reviews.llvm.org/D17705

llvm-svn: 262480
2016-03-02 12:05:07 +00:00
Simon Pilgrim c02b72627a [X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanisms
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.

This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.

It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.

Differential Revision: http://reviews.llvm.org/D17680

llvm-svn: 262478
2016-03-02 11:43:05 +00:00
Nikolay Haustov f2fbabe9c1 Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields"
Build failure with clang.

llvm-svn: 262477
2016-03-02 11:16:56 +00:00
Nikolay Haustov f0f24628cb Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler."
Build failure with clang.

llvm-svn: 262475
2016-03-02 10:54:21 +00:00
Nikolay Haustov 73447a9714 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
complementary patch to table-driven amd_kernel_code_t field parser/printer utility. lit tests passed.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262474
2016-03-02 10:36:30 +00:00
Nikolay Haustov 6c8c74969a [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
This is going to be used in .hsatext disassembler and can be used
in current assembler parser (lit tests passed on parsing).
Code using this helpers isn't included in this patch.

Benefits:

unified approach
fast field name lookup on parsing
Later I would like to enhance some of the field naming/syntax using this code.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262473
2016-03-02 10:36:25 +00:00
Craig Topper 1d3f4aefd6 [X86] Remove unnecessary call to isReg from emitter's DestMem handling for VEX prefix. The operand is always a register. NFC
llvm-svn: 262468
2016-03-02 07:32:45 +00:00
Craig Topper 6a7cd42213 [X86] Make X86MCCodeEmitter::DetermineREXPrefix locate operands more like how VEX prefix handling does.
llvm-svn: 262467
2016-03-02 07:32:43 +00:00
David Majnemer 5aadde1ecc [X86] Permit reading of the FLAGS register without it being previously defined
We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS.
While technically correct, this is not be desirable for folks who want
to examine aspects of the FLAGS register which are not related to
computation like whether or not CPUID is a valid instruction.

Differential Revision: http://reviews.llvm.org/D17782

llvm-svn: 262465
2016-03-02 06:46:52 +00:00
Craig Topper d4dabb3939 [X86] Remove assertion I accidentally left in.
llvm-svn: 262464
2016-03-02 06:35:22 +00:00
Craig Topper a267431fa6 [X86] Be more structured about how we capture the register number when it is encoded in bits 7:4 of the immediate.
For some instructions the register is not the last operand and the immediate handling had to detect this and hardcode the index to find it. It also required CurOp to be pointing at the last operand handled in the Form switch whereas for any instruction it would be pointing at the next operand.

Now we just capture the value in the Form switch when we know exactly where it is and the CurOp pointer can behave normally.

llvm-svn: 262462
2016-03-02 06:06:18 +00:00
Craig Topper cf65c62737 [X86] Use MCPhysReg and uint16_t for static arrays of registers and opcodes respectively should reduce size tiny bit. NFC
llvm-svn: 262458
2016-03-02 04:42:31 +00:00
Matt Arsenault f2dcb4737b AMDGPU: Fix bug 26659.
Fix checking the same instruction twice instead of the
second branch that uses vccz. I don't think this matters
currently because s_branch_vccnz is always used currently.

llvm-svn: 262457
2016-03-02 04:12:39 +00:00
Matt Arsenault a266bd8760 AMDGPU: Cleanup suggested in bug 23960
llvm-svn: 262456
2016-03-02 04:05:14 +00:00
Matt Arsenault 5de68cbc4c Bug 20810: Use report_fatal_error instead of unreachable
llvm-svn: 262455
2016-03-02 03:33:55 +00:00
Colin LeMahieu 2d497a0078 [NFC] Convert tabs to spaces.
llvm-svn: 262411
2016-03-01 22:05:03 +00:00
Matthias Braun 37d884fdf4 AArch64: Reenable CompleteModel for A53, A57 and Kryo models
The fixes in r262393 completed them as well.

llvm-svn: 262408
2016-03-01 21:55:35 +00:00
Colin LeMahieu 5cb6eea664 [Hexagon] Modifying r262258 to only be in effect in the hand assembler path, not the integrated assembler.
llvm-svn: 262400
2016-03-01 21:37:41 +00:00
Matthias Braun a6cfb6f682 AArch64: Add missing schedinfo, check completeness for cyclone
This adds some missing generic schedule info definitions, enables
completeness checking for cyclone and fixes a typo uncovered by that.

Differential Revision: http://reviews.llvm.org/D17748

llvm-svn: 262393
2016-03-01 21:20:31 +00:00
Kit Barton e725669483 [Power9] Implement new vector compare, extract, insert instructions
This change implements the following vector operations:

  - Vector Compare Not Equal
    - vcmpneb(.) vcmpneh(.) vcmpnew(.)
    - vcmpnezb(.) vcmpnezh(.) vcmpnezw(.)
  - Vector Extract Unsigned
    - vextractub vextractuh vextractuw vextractd
    - vextublx vextubrx vextuhlx vextuhrx vextuwlx vextuwrx
  - Vector Insert
    - vinsertb vinserth vinsertw vinsertd

26 instructions.

Phabricator: http://reviews.llvm.org/D15916
llvm-svn: 262392
2016-03-01 20:51:57 +00:00
Sanjay Patel 18988ae66c [x86] use getBitcast()
This isn't quite NFC because some of the SDLocs may change which could
cause scheduling differences. But no regression tests are affected and
there is no functional change intended.

llvm-svn: 262391
2016-03-01 20:47:02 +00:00
Geoff Berry a0df341082 Revert "[AArch64] Fix isLegalAddImmediate() to return true for valid negative values."
Revert r262248 in an attempt to fix the clang-native-aarch64-full
    bot and to investigate a performance regression in
    SingleSource/Benchmarks/CoyoteBench/huffbench

llvm-svn: 262388
2016-03-01 20:28:52 +00:00
Vasileios Kalintiris 36901dd1c3 Revert "[mips] Promote the result of SETCC nodes to GPR width."
This reverts commit r262316.

It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.

llvm-svn: 262387
2016-03-01 20:25:43 +00:00
Kit Barton 01cd2e7a4b New file to track implementation status of new POWER9 instructions
llvm-svn: 262386
2016-03-01 20:19:43 +00:00
Matthias Braun 17cb57995e TableGen: Check scheduling models for completeness
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:

- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model

Typical steps necessary to complete a model:

- Ensure all pseudo instructions that are expanded before machine
  scheduling (usually everything handled with EmitYYY() functions in
  XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
  resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.

Differential Revision: http://reviews.llvm.org/D17747

llvm-svn: 262384
2016-03-01 20:03:21 +00:00
Justin Lebar 2daaa1ceca [NVPTX] Annotate param loads/stores as mayLoad/mayStore.
Summary:
Tablegen was unable to determine that param loads/stores were actually
reading or writing from memory.  I think this isn't a problem in
practice for param stores, because those occur in a block right before
we make our call.  But param loads don't have to at the very beginning
of a function, so should be annotated as mayLoad so we don't incorrectly
optimize them.

Reviewers: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D17471

llvm-svn: 262381
2016-03-01 19:44:22 +00:00
Justin Lebar 536c8b7446 [NVPTX] Remove workaround for tablegen crash in NVPTXInstrInfo.td.
Summary: Looks like this was caused by a typo.

Reviewers: jholewinski

Subscribers: jholewinski, llvm-commits, tra

Differential Revision: http://reviews.llvm.org/D17357

llvm-svn: 262380
2016-03-01 19:44:20 +00:00
Justin Lebar b5ca00a58d [NVPTX] Use different, convergent MIs for convergent calls.
Summary:
Calls sometimes need to be convergent.  This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.

Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs.  But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.

Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.

Reviewers: jholewinski

Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17423

llvm-svn: 262373
2016-03-01 19:24:03 +00:00
Justin Lebar 93e7a9b91c [NVPTX] Nix hack used to emit '{' and '}' for NVPTX calls.
Summary: Tablegen understands backslash as an escape char; that's sufficient.

Reviewers: jholewinski

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: http://reviews.llvm.org/D17432

llvm-svn: 262372
2016-03-01 19:24:00 +00:00
Justin Lebar 877f5acc60 [NVPTX] Reformat NVPTXInstrInfo.td, and add additional comments.
Summary:
Also simplify some of the embedded C++ logic.

No functional changes.

Reviewers: jholewinski

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: http://reviews.llvm.org/D17354

llvm-svn: 262371
2016-03-01 19:23:30 +00:00
David Majnemer 791b88b6da [X86] Elide references to _chkstk for dynamic allocas
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations.  However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is fixed size and smaller than the probe
size.

This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cummulative ~133 KB.

Differential Revision: http://reviews.llvm.org/D17679

llvm-svn: 262370
2016-03-01 19:20:23 +00:00
Sanjay Patel 8a37a988e6 fix function names; NFC
llvm-svn: 262367
2016-03-01 19:14:09 +00:00
Matt Arsenault 03dac8d8e4 DAGCombiner: Turn extract of bitcasted integer into truncate
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.

llvm-svn: 262358
2016-03-01 18:01:37 +00:00
Changpeng Fang 24f035af32 AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics
Summary:
  This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17614

llvm-svn: 262356
2016-03-01 17:51:23 +00:00
Michael Zuckerman 433b241570 [LLVM][AVX512] PSRL{DI|QI} Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17713

llvm-svn: 262353
2016-03-01 17:46:32 +00:00
Hans Wennborg e64cf9dddb [X86] Check that attribute parameters match for tail calls (PR26590)
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.

  void g(signed short);
  void f(unsigned short x) {
    g(x);
  }

llvm-svn: 262352
2016-03-01 17:45:23 +00:00
Sanjay Patel 2ca144f14c fix documentation comments; NFC
llvm-svn: 262351
2016-03-01 17:25:35 +00:00
Sanjay Patel 9fea531fec function names start with a lowercase letter; NFC
llvm-svn: 262347
2016-03-01 16:17:48 +00:00
Nikolay Haustov e309e1415d [AMDGPU] Remove unused disassembler code.
llvm-svn: 262346
2016-03-01 16:02:40 +00:00
Nikolay Haustov 47a115cd41 [AMDGPU] Fix build warnings.
llvm-svn: 262338
2016-03-01 14:50:59 +00:00
Nikolay Haustov ac106add0f [AMDGPU] Disassembler code refactored + error messages.
Idea behind this change is to make code shorter and as much common for all targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out.

64bit instructions decoding added.

Error\warning messages on unrecognized instructions operands added, InstPrinter allowed to print invalid operands helping to find invalid/unsupported code.

The change is massive and hard to compare with previous version, so it makes sense just to take a look on the new version. As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles >300K binary source of some blas kernel.

Previous TODOs were saved whenever possible.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17720

llvm-svn: 262332
2016-03-01 13:57:29 +00:00
Michael Zuckerman 7878888690 [AVX512][PSRAQ][PSRAD] Change imm8 to int.
Differential Revision: http://reviews.llvm.org/D17692

llvm-svn: 262320
2016-03-01 11:36:23 +00:00
Amjad Aboud 719325fe11 Disallow generating vzeroupper before return instruction (iret) in interrupt handler function.
This resolves https://llvm.org/bugs/show_bug.cgi?id=26412

Differential Revision: http://reviews.llvm.org/D17542

llvm-svn: 262319
2016-03-01 11:32:03 +00:00
Vasileios Kalintiris 3a8f7f9e31 [mips] Promote the result of SETCC nodes to GPR width.
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.

The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.

Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D10970

llvm-svn: 262316
2016-03-01 10:08:01 +00:00
Nikolay Haustov ea8febde04 [TableGen] AsmMatcher: Skip optional operands in the midle of instruction if it is not present
Previosy, if actual instruction have one of optional operands then other optional operands listed before this also should be presented.
For example instruction v_fract_f32 v0, v1, mul:2 have one optional operand - OMod and do not have optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString:

string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod";
Making this work required some hacks (both OMod and Clamp match classes have same PredicateMethod).

Now, if MatchInstructionImpl meets formal optional operand that is not presented in actual instruction it skips this formal operand and tries to match current actual operand with next formal.

Patch by: Sam Kolton

Review: http://reviews.llvm.org/D17568

[AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods
With this change you should place optional operands in order specified by asm string:

clamp -> omod
offset -> glc -> slc -> tfe
Fixes for several tests.
Depends on D17568

Patch by: Sam Kolton

Review: http://reviews.llvm.org/D17644
llvm-svn: 262314
2016-03-01 08:34:43 +00:00
Craig Topper 073e947f1b [X86] Centralize the masking of TSFlags with FormMask into a variable earlier so we can stop masking in multiple places. NFC
llvm-svn: 262312
2016-03-01 07:15:59 +00:00
Craig Topper 5c8dc5f064 [X86] Localize a temporary variable into the cases its need in. NFC
llvm-svn: 262310
2016-03-01 06:42:48 +00:00
Craig Topper b8c29b4ae9 [X86] Be consistent about using pre/post increment/decrement in nearby code. NFC
llvm-svn: 262309
2016-03-01 06:42:46 +00:00
Craig Topper d40a55064f [X86] Combine some initialization code with variable declaration and comments. NFC
llvm-svn: 262301
2016-03-01 05:42:16 +00:00
Matt Arsenault d275fcabcb AMDGPU: Don't emit build_pair during udivrem legalization
Technically you aren't supposed to emit these after type legalization
for some reason, and we use vector extracts of bitcasted integers
as the canonical way to do this.

llvm-svn: 262298
2016-03-01 05:06:05 +00:00
Matt Arsenault f4dfc1a027 AMDGPU: Don't use estimated stack size when we know the real stack size
llvm-svn: 262297
2016-03-01 04:58:20 +00:00
Matt Arsenault 59b8b77405 AMDGPU: Set HasExtractBitInsn
This currently does not have the control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.

But in most situations we do want the sinking to occur.

llvm-svn: 262296
2016-03-01 04:58:17 +00:00
Eric Christopher 114fa1c3f6 Simplify some boolean conditional return statements in AArch64.
http://reviews.llvm.org/D9979

Patch by Richard Thomson (and some conflict resolution by me).

llvm-svn: 262266
2016-02-29 22:50:49 +00:00
Colin LeMahieu ab9eca4d9f [Hexagon] As a size optimization, not lazy extending TPREL or DTPREL variants since they're usually in range.
llvm-svn: 262258
2016-02-29 21:21:56 +00:00
Colin LeMahieu 9e5a9c32db [Hexagon] Missed member initialization causing ubsan failure.
llvm-svn: 262252
2016-02-29 20:42:25 +00:00
Geoff Berry f5ba61d18c [AArch64] Fix isLegalAddImmediate() to return true for valid negative values.
Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D17463

llvm-svn: 262248
2016-02-29 19:53:22 +00:00
Ahmed Bougacha bb5d7d7ed8 [X86] Move the ATOMIC_LOAD_OP ISel from DAGToDAG to ISelLowering. NFCI.
This is long-standing dirtiness, as acknowledged by r77582:

    The current trick is to select it into a merge_values with
    the first definition being an implicit_def. The proper solution is
    to add new ISD opcodes for the no-output variant.

Doing this before selection will let us combine away some constructs.

Differential Revision: http://reviews.llvm.org/D17659

llvm-svn: 262244
2016-02-29 19:28:07 +00:00
Colin LeMahieu b9f1eae328 [Hexagon] Setting sign mismatch flag on expression instead of using bit tricks.
llvm-svn: 262243
2016-02-29 19:17:56 +00:00
David Majnemer e60ee3b8ce [WinEH] Make setjmp work correctly with EH
32-bit X86 EH on Windows utilizes a stack of registration nodes
allocated and deallocated on entry/exit.  A registration node contains a
bunch of EH personality specific information like which try-state we are
currently in.

Because a setjmp target allows control flow from arbitrary program
points, there is no way to ensure that the try-state we are in is
correctly updated once we transfer control.

MSVC compatible compilers, like MSVC and ICC, utilize runtime helpers to
reinitialize the try-state when a longjmp occurs.  This is implemented
by adding additional arguments to _setjmp3: the desired try-state and
a helper routine to update the try-state.

Differential Revision: http://reviews.llvm.org/D17721

llvm-svn: 262241
2016-02-29 19:16:03 +00:00
Colin LeMahieu 73cd686ce1 [Hexagon] Using MustExtend flag on expression instead of passing around bools.
llvm-svn: 262238
2016-02-29 18:39:51 +00:00
Nemanja Ivanovic 1a5706ca1b Fix for PR26180
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592

This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.

llvm-svn: 262233
2016-02-29 16:42:27 +00:00
Daniel Sanders 03a8d2f8ec [mips] Range check uimm20 and fixed a bug this revealed.
Summary:
The bug was that dextu's operand 3 would print 0-31 instead of 32-63 when
printing assembly. This came up when replacing
MipsInstPrinter::printUnsignedImm() with a version that could handle arbitrary
bit widths.

MipsAsmPrinter::printUnsignedImm*() don't seem to be used so they have been
removed.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15521

llvm-svn: 262231
2016-02-29 16:06:38 +00:00
Vasileios Kalintiris 29620aca3e [mips] Do not use SLL for ANY_EXTEND nodes as the high bits are undefined.
Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15420

llvm-svn: 262230
2016-02-29 15:58:12 +00:00
Daniel Sanders 611eb82953 [mips] Make isel select the correct DEXT variant up front.
Summary:
Previously, it would always select DEXT and substitute any invalid matches
for DEXTU/DEXTM during MipsMCCodeEmitter::encodeInstruction(). This works
but causes problems when adding range checked immediates to IAS.

Now isel selects the correct variant up front.

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16810

llvm-svn: 262229
2016-02-29 15:26:54 +00:00
Daniel Sanders 90f0d0b8e3 [mips] Make symbols an acceptable branch target when expanding compare-to-immediate-and-branch macros.
Reviewers: vkalintiris

Subscribers: llvm-commits, vkalintiris, dim, seanbruno, dsanders

Differential Revision: http://reviews.llvm.org/D15369

llvm-svn: 262213
2016-02-29 11:24:49 +00:00
Vasileios Kalintiris c9aaa3171d [mips] Remove unused function declarations from MipsRegisterInfo.h. NFC.
llvm-svn: 262187
2016-02-28 16:55:28 +00:00
JF Bastien 1afd1e2baa WebAssembly: fix build
More API churn, experimental target got sad.

llvm-svn: 262179
2016-02-28 15:33:53 +00:00
Michael Zuckerman 96836fc81c [AVX512][PSLLW ][PSLLV] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17684

llvm-svn: 262176
2016-02-28 07:32:10 +00:00
Matt Arsenault 3a61985b2f AMDGPU: More bits of frame index are known to be zero
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.

This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.

llvm-svn: 262153
2016-02-27 20:26:57 +00:00
Duncan P. N. Exon Smith be8f8c4478 CodeGen: Update LiveIntervalAnalysis API to use MachineInstr&, NFC
These parameters aren't expected to be null, so take them by reference.

llvm-svn: 262151
2016-02-27 20:14:29 +00:00
Duncan P. N. Exon Smith fd8cc23220 CodeGen: Change MachineInstr to use MachineInstr&, NFC
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null.  Slowly inching
toward being able to fix PR26753.

llvm-svn: 262149
2016-02-27 20:01:33 +00:00
Duncan P. N. Exon Smith 1f6624ae34 AArch64: Use MachineInstr& in guaranteesZeroRegInBlock(), NFC
llvm-svn: 262143
2016-02-27 19:12:54 +00:00
Duncan P. N. Exon Smith 5702287809 CodeGen: Update DFAPacketizer API to take MachineInstr&, NFC
In all but one case, change the DFAPacketizer API to take MachineInstr&
instead of MachineInstr*.  In DFAPacketizer::endPacket(), take
MachineBasicBlock::iterator.  Besides cleaning up the API, this is in
search of PR26753.

llvm-svn: 262142
2016-02-27 19:09:00 +00:00
Duncan P. N. Exon Smith f9ab416d70 WIP: CodeGen: Use MachineInstr& in MachineInstrBundle.h, NFC
Update APIs in MachineInstrBundle.h to take and return MachineInstr&
instead of MachineInstr* when the instruction cannot be null.  Besides
being a nice cleanup, this is tacking toward a fix for PR26753.

llvm-svn: 262141
2016-02-27 17:05:33 +00:00
JF Bastien 13d3b9b777 WebAssembly: fix build
It was broken by the work for PR26753.

llvm-svn: 262140
2016-02-27 16:38:23 +00:00
Simon Pilgrim 9e10b1655c Tidyup for loops - don't repeat upper limit evaluation if you don't have to. NFCI.
llvm-svn: 262137
2016-02-27 13:26:58 +00:00
Chris Dewhurst 053826af69 The patch adds missing registers and instructions to complete all the registers supported by the Sparc v8 manual.
These are all co-processor registers, with the exception of the floating-point deferred-trap queue register.
Although these will not be lowered automatically by any instructions, it allows the use of co-processor
instructions implemented by inline-assembly.

Code Reviewed at http://reviews.llvm.org/D17133, with the exception of a very small change in brace placement in SparcInstrInfo.td,
which was formerly causing a problem in the disassembly of the %fq register.

llvm-svn: 262133
2016-02-27 12:49:59 +00:00
Simon Pilgrim 3b42ca0760 Strip trailing whitespace. NFCI.
llvm-svn: 262131
2016-02-27 11:49:16 +00:00
Matt Arsenault 9d82ee7526 AMDGPU: Split vi-insts subtarget feature
This will be more useful for marking builtins acceptable for which
subtargets.

llvm-svn: 262121
2016-02-27 08:53:55 +00:00
Matt Arsenault 274d34e725 AMDGPU: Add s_sleep intrinsic
llvm-svn: 262120
2016-02-27 08:53:52 +00:00
Matt Arsenault 61738cbcb6 AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
Duncan P. N. Exon Smith 3ac9cc6156 CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals.  The MachineInstrs here are
never null, so this cleans up the API a bit.  It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).

At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.

llvm-svn: 262115
2016-02-27 06:40:41 +00:00
Ahmed Bougacha ffcab7bf32 [X86] Fix a stale comment. NFC.
llvm-svn: 262087
2016-02-26 22:59:57 +00:00
Ahmed Bougacha 55e1592889 [X86] Remove the unused SDTX86atomicBinary. NFC.
llvm-svn: 262086
2016-02-26 22:59:41 +00:00
Simon Pilgrim 4d1a088323 Strip trailing whitespace. NFCI.
llvm-svn: 262083
2016-02-26 22:28:50 +00:00
Kit Barton 915c5ecee1 [PPC] Legalize FNEG on PPC when possible
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode

Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
2016-02-26 21:59:44 +00:00
Simon Pilgrim 10e3ca2cc1 Fix spelling. NFCI.
llvm-svn: 262078
2016-02-26 21:56:27 +00:00
Kit Barton 93612ec5f2 Power9] Implement new vsx instructions: compare and conversion
This change implements the following vsx instructions:

Quad/Double-Precision Compare:
xscmpoqp xscmpuqp
xscmpexpdp xscmpexpqp
xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
xvcmpnedp(.) xvcmpnesp(.)
Quad-Precision Floating-Point Conversion
xscvqpdp(o) xscvdpqp
xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp
xscvdphp xscvhpdp xvcvhpsp xvcvsphp
xsrqpi xsrqpix xsrqpxp
28 instructions

Phabricator: http://reviews.llvm.org/D16709
llvm-svn: 262068
2016-02-26 21:11:55 +00:00
Sanjay Patel 51488ed2d5 [x86] refactor to eliminate duplicated code; NFCI
llvm-svn: 262062
2016-02-26 20:59:05 +00:00
Nirav Dave 2993854bb4 Fix Sparc 32bit Lowering to rebundle up v2i32 values.
Summary: Fix LowerCall to rebundle v2i32 values after lowering and add testcase

Reviewers: jyknight

Subscribers: llvm-commits, jyknight

Differential Revision: http://reviews.llvm.org/D17615

llvm-svn: 262048
2016-02-26 18:55:22 +00:00
Sanjay Patel 155193c3aa [x86, AVX] fold 'isPositive' 256-bit vector integer operations (PR26701)
This extends the fold introduced with:
http://reviews.llvm.org/rL262036

llvm-svn: 262047
2016-02-26 18:42:50 +00:00
Sanjay Patel 4402a32b32 [x86, SSE] fold 'isPositive' vector integer operations (PR26701)
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701

Shift and negate is what InstCombine appears to prefer, so I've started with that pattern. 
Note that the 'pcmpeq' instructions are always generating the negative one for the actual
'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?).

Differential Revision: http://reviews.llvm.org/D17630

llvm-svn: 262036
2016-02-26 16:56:03 +00:00
Chris Dewhurst 829b104dc2 Reverting breaking change. Sorry.
llvm-svn: 262007
2016-02-26 12:20:10 +00:00
Chris Dewhurst 9c3bf91d6e Reviewed at reviews.llvm.org/D17133
llvm-svn: 262005
2016-02-26 11:46:47 +00:00
Chris Dewhurst 6456376fe9 Initial test commit only
llvm-svn: 262003
2016-02-26 11:38:24 +00:00
Nikolay Haustov 2f684f1347 [AMDGPU] Assembler: Basic support for MIMG
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.

Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
2016-02-26 09:51:05 +00:00
James Molloy 4eba0154fb [AArch64] Slight cleanup in FPLoadBalancing
Instead of the convoluted if-statment we can just use getColor. This also fixes
a bug where we relied upon the parity of tablegen-generated register indexes
(instead of using the machine encoding).

llvm-svn: 261990
2016-02-26 09:10:53 +00:00
Craig Topper c929349912 [X86] Null out some redundant patterns for masked vector register to register moves. These can be accomplished with both aligned and unaligned opcodes.
Currently aligned is what is being used so remove the redundant patterns for the unaligned versions. But don't do this for the byte and word vector types since they don't have aligned versions.

llvm-svn: 261985
2016-02-26 06:50:29 +00:00
Craig Topper d50b5f8abc [X86] Add test cases for r261977 and fix a grammatical error.
llvm-svn: 261983
2016-02-26 06:50:24 +00:00
Craig Topper 4d187630de [X86] Remove a couple returns after llvm_unreachables. NFC
llvm-svn: 261979
2016-02-26 05:29:39 +00:00
Craig Topper a11be0be89 [X86] Use inclusive ranges for XMM/YMM/ZMM registers in is32Extended and isX86_64ExtendedReg. NFC
llvm-svn: 261978
2016-02-26 05:29:35 +00:00
Craig Topper 29c2273369 [X86] Explicitly diagnose use of %xmm16-%xmm31, %ymm16-%ymm31 and %zmm16-%zmm31 when AVX512 is not enabled in the asm parser.
llvm-svn: 261977
2016-02-26 05:29:32 +00:00
David L Kreitzer 602ba70a0b Reformatted a comment to fit the 80 column limit. NFC.
llvm-svn: 261916
2016-02-25 18:50:45 +00:00
Hongbin Zheng 3f97840721 Introduce analysis pass to compute PostDominators in the new pass manager. NFC
Differential Revision: http://reviews.llvm.org/D17537

llvm-svn: 261902
2016-02-25 17:54:07 +00:00
Tim Northover aa35bd26c7 ARM: disallow pc as a base register in Thumb2 memory ops.
These should all be deferring to the "OP (literal)" variant according to the
ARM ARM.

llvm-svn: 261895
2016-02-25 16:54:52 +00:00
Geoff Berry 62d472507e [AArch64] Clean up callee-save CFI emission. NFC.
Summary:
Avoid special case for FP, LR CFI emission and just allow general
AArch64FrameLowering::emitCalleeSavedFrameMoves() to handle them.  Also,
stop recalculating the stack offsets in emitCalleeSavedFrameMoves()
since we can just reuse the previously calculated offset stored in the
MachineFrameInfo.

Depends on D17000

Reviewers: t.p.northover, rengolin, mcrosier, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17004

llvm-svn: 261885
2016-02-25 16:36:08 +00:00
Nikolay Haustov 161a158e5c [AMDGPU] Disassembler: Support for all VOP1 instructions.
Support all instructions with VOP1 encoding with 32 or 64-bit operands for VI subtarget:

VGPR_32 and VReg_64 operand register classes
VS_32 and VS_64 operand register classes with inline and literal constants
Tests for VOP1 instructions.

Patch by: skolton

Reviewers: arsenm, tstellarAMD

Review: http://reviews.llvm.org/D17194
llvm-svn: 261878
2016-02-25 16:09:14 +00:00
Igor Breger 45ef10f110 AVX512F: Add GATHER/SCATTER assembler Intel syntax tests for knl/skx/avx . Change memory operand parser handling.
Differential Revision: http://reviews.llvm.org/D17564

llvm-svn: 261862
2016-02-25 13:30:17 +00:00
Hrvoje Varga 46458d0bcc [mips][microMIPS] Implement DINSU, DINSM, DINS instructions
Differential Revision: http://reviews.llvm.org/D16181

llvm-svn: 261860
2016-02-25 12:53:29 +00:00
Nikolay Haustov 2e4c72977c [AMDGPU] Assembler: Simplify handling of optional operands
Resubmit with index problem fixed. Verified with valgrind.

Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands.
However this means that when parsing instruction which has no mnemonic prefix,
we cannot add both default values for VOP3 and for DPP optional operands
to OperandVector - neither instructions would match. So add default values
for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445
llvm-svn: 261856
2016-02-25 10:58:54 +00:00
Simon Pilgrim e4178ae510 [X86][SSE3] Added combine support for MOVDDUP/MOVSHDUP/MOVSLDUP target shuffles
Now that PerformShuffleCombine can handle unary shuffles.

llvm-svn: 261843
2016-02-25 09:12:12 +00:00
NAKAMURA Takumi 3d3d0f4151 Revert r261742, "[AMDGPU] Assembler: Simplify handling of optional operands"
It brought undefined behavior.

llvm-svn: 261839
2016-02-25 08:35:27 +00:00
Elena Demikhovsky e5bbca6ae2 Optimized loading (zextload) of i1 value from memory.
This patch is a partial revert of https://llvm.org/svn/llvm-project/llvm/trunk@237793.
Extra "and" causes performance degradation.

We assume that i1 is stored in zero-extended form. And store operation is responsible for zeroing upper bits.

Differential Revision: http://reviews.llvm.org/D17541

llvm-svn: 261828
2016-02-25 07:05:12 +00:00
Tim Northover ca8e7e2e23 AArch64: remove CRC feature from Cyclone.
Turns out we don't actually support those instructions.

llvm-svn: 261759
2016-02-24 18:10:17 +00:00
Anton Korobeynikov 064dbac212 `MSP430InstrInfo::loadRegFromStackSlot` forgets to set register def.
Summary:
For instance, compiling the below results in a panic:

```
llc: ../lib/CodeGen/InlineSpiller.cpp:1140: bool (anonymous namespace)::InlineSpiller::foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned int> >, llvm::MachineInstr *): Assertion `MO->isDead() && "Cannot fold physreg def"' failed.
#0 0x00007f50fbcf353e llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/h/3rd/llvm/build/../lib/Support/Unix/Signals.inc:321:15
#1 0x00007f50fbcf3929 PrintStackTraceSignalHandler(void*) /home/h/3rd/llvm/build/../lib/Support/Unix/Signals.inc:380:1
#2 0x00007f50fbcf22a3 llvm::sys::RunSignalHandlers() /home/h/3rd/llvm/build/../lib/Support/Signals.cpp:45:5
#3 0x00007f50fbcf3bb4 SignalHandler(int) /home/h/3rd/llvm/build/../lib/Support/Unix/Signals.inc:210:1
#4 0x00007f50fa87a180 (/lib/x86_64-linux-gnu/libc.so.6+0x35180)
#5 0x00007f50fa87a107 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x35107)
#6 0x00007f50fa87b4e8 abort (/lib/x86_64-linux-gnu/libc.so.6+0x364e8)
#7 0x00007f50fa873226 (/lib/x86_64-linux-gnu/libc.so.6+0x2e226)
#8 0x00007f50fa8732d2 (/lib/x86_64-linux-gnu/libc.so.6+0x2e2d2)
#9 0x00007f50fddd9287 (anonymous namespace)::InlineSpiller::foldMemoryOperand(llvm::ArrayRef<std::pair<llvm::MachineInstr*, unsigned int> >, llvm::MachineInstr*) /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1141:21
#10 0x00007f50fddd9ee9 (anonymous namespace)::InlineSpiller::spillAroundUses(unsigned int) /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1286:9
#11 0x00007f50fddd388b (anonymous namespace)::InlineSpiller::spillAll() /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1338:21
#12 0x00007f50fddd221d (anonymous namespace)::InlineSpiller::spill(llvm::LiveRangeEdit&) /home/h/3rd/llvm/build/../lib/CodeGen/InlineSpiller.cpp:1391:3
#13 0x00007f50fdfd921b (anonymous namespace)::RAGreedy::selectOrSplitImpl(llvm::LiveInterval&, llvm::SmallVectorImpl<unsigned int>&, llvm::SmallSet<unsigned int, 16u, std::less<unsigned int> >&, unsigned int) /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocGreedy.cpp:2555:5
#14 0x00007f50fdfd647b (anonymous namespace)::RAGreedy::selectOrSplit(llvm::LiveInterval&, llvm::SmallVectorImpl<unsigned int>&) /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocGreedy.cpp:2221:12
#15 0x00007f50fdfc89f9 llvm::RegAllocBase::allocatePhysRegs() /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocBase.cpp:110:14
#16 0x00007f50fdfd6337 (anonymous namespace)::RAGreedy::runOnMachineFunction(llvm::MachineFunction&) /home/h/3rd/llvm/build/../lib/CodeGen/RegAllocGreedy.cpp:2611:3
#17 0x00007f50fded33ee llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /home/h/3rd/llvm/build/../lib/CodeGen/MachineFunctionPass.cpp:43:3
#18 0x00007f50fd6cdc6f llvm::FPPassManager::runOnFunction(llvm::Function&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1550:23
#19 0x00007f50fd6cdf85 llvm::FPPassManager::runOnModule(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1571:16
#20 0x00007f50fd6ce71a (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1627:23
#21 0x00007f50fd6ce246 llvm::legacy::PassManagerImpl::run(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1730:16
#22 0x00007f50fd6cec31 llvm::legacy::PassManager::run(llvm::Module&) /home/h/3rd/llvm/build/../lib/IR/LegacyPassManager.cpp:1761:3
#23 0x0000000000415bdc compileModule(char**, llvm::LLVMContext&) /home/h/3rd/llvm/build/../tools/llc/llc.cpp:405:5
#24 0x0000000000414571 main /home/h/3rd/llvm/build/../tools/llc/llc.cpp:211:13
#25 0x00007f50fa866b45 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b45)
#26 0x0000000000414296 _start (/home/h/3rd/llvm/build/bin/llc+0x414296)
Stack dump:
0.	Program arguments: ./bin/llc -mtriple msp430 loadstore.ll 
1.	Running pass 'Function Pass Manager' on module 'loadstore.ll'.
2.	Running pass 'Greedy Register Allocator' on function '@inc'
```

Original IR:

```llvm
%struct.VeryLarge = type { i8, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }

; Function Attrs: norecurse nounwind
define void @inc(%struct.VeryLarge* noalias nocapture sret %agg.result, %struct.VeryLarge* byval align 1 %s) #0 {
entry:
  %p0 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 0
  %0 = load i8, i8* %p0, align 1, !tbaa !1
  %p1 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 1
  %1 = load i32, i32* %p1, align 1, !tbaa !6
  %p2 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 2
  %2 = load i32, i32* %p2, align 1, !tbaa !7
  %p3 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 3
  %3 = load i32, i32* %p3, align 1, !tbaa !8
  %p4 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 4
  %4 = load i32, i32* %p4, align 1, !tbaa !9
  %p5 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 5
  %5 = load i32, i32* %p5, align 1, !tbaa !10
  %p6 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 6
  %6 = load i32, i32* %p6, align 1, !tbaa !11
  %p7 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 7
  %7 = load i32, i32* %p7, align 1, !tbaa !12
  %p8 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 8
  %8 = load i32, i32* %p8, align 1, !tbaa !13
  %p9 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 9
  %9 = load i32, i32* %p9, align 1, !tbaa !14
  %p10 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 10
  %10 = load i32, i32* %p10, align 1, !tbaa !15
  %p11 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 11
  %11 = load i32, i32* %p11, align 1, !tbaa !16
  %p12 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 12
  %12 = load i32, i32* %p12, align 1, !tbaa !17
  %p13 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 13
  %13 = load i32, i32* %p13, align 1, !tbaa !18
  %p14 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 14
  %14 = load i32, i32* %p14, align 1, !tbaa !19
  %p15 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 15
  %15 = load i32, i32* %p15, align 1, !tbaa !20
  %p16 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 16
  %16 = load i32, i32* %p16, align 1, !tbaa !21
  %p17 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 17
  %17 = load i32, i32* %p17, align 1, !tbaa !22
  %p18 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 18
  %18 = load i32, i32* %p18, align 1, !tbaa !23
  %p19 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 19
  %19 = load i32, i32* %p19, align 1, !tbaa !24
  %p20 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 20
  %20 = load i32, i32* %p20, align 1, !tbaa !25
  %p21 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 21
  %21 = load i32, i32* %p21, align 1, !tbaa !26
  %p22 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 22
  %22 = load i32, i32* %p22, align 1, !tbaa !27
  %p23 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 23
  %23 = load i32, i32* %p23, align 1, !tbaa !28
  %p24 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 24
  %24 = load i32, i32* %p24, align 1, !tbaa !29
  %p25 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 25
  %25 = load i32, i32* %p25, align 1, !tbaa !30
  %p26 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 26
  %26 = load i32, i32* %p26, align 1, !tbaa !31
  %p27 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 27
  %27 = load i32, i32* %p27, align 1, !tbaa !32
  %p28 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 28
  %28 = load i32, i32* %p28, align 1, !tbaa !33
  %p29 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 29
  %29 = load i32, i32* %p29, align 1, !tbaa !34
  %p30 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 30
  %30 = load i32, i32* %p30, align 1, !tbaa !35
  %p31 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 31
  %31 = load i32, i32* %p31, align 1, !tbaa !36
  %p32 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %s, i32 0, i32 32
  %32 = load i32, i32* %p32, align 1, !tbaa !37
  %add = add i8 %0, 1
  store i8 %add, i8* %p0, align 1, !tbaa !1
  %add2 = add i32 %1, 2
  store i32 %add2, i32* %p1, align 1, !tbaa !6
  %add3 = add i32 %2, 3
  store i32 %add3, i32* %p2, align 1, !tbaa !7
  %add4 = add i32 %3, 4
  store i32 %add4, i32* %p3, align 1, !tbaa !8
  %add5 = add i32 %4, 5
  store i32 %add5, i32* %p4, align 1, !tbaa !9
  %add6 = add i32 %5, 6
  store i32 %add6, i32* %p5, align 1, !tbaa !10
  %add7 = add i32 %6, 7
  store i32 %add7, i32* %p6, align 1, !tbaa !11
  %add8 = add i32 %7, 8
  store i32 %add8, i32* %p7, align 1, !tbaa !12
  %add9 = add i32 %8, 9
  store i32 %add9, i32* %p8, align 1, !tbaa !13
  %add10 = add i32 %9, 10
  store i32 %add10, i32* %p9, align 1, !tbaa !14
  %add11 = add i32 %10, 11
  store i32 %add11, i32* %p10, align 1, !tbaa !15
  %add12 = add i32 %11, 12
  store i32 %add12, i32* %p11, align 1, !tbaa !16
  %add13 = add i32 %12, 13
  store i32 %add13, i32* %p12, align 1, !tbaa !17
  %add14 = add i32 %13, 14
  store i32 %add14, i32* %p13, align 1, !tbaa !18
  %add15 = add i32 %14, 15
  store i32 %add15, i32* %p14, align 1, !tbaa !19
  %add16 = add i32 %15, 16
  store i32 %add16, i32* %p15, align 1, !tbaa !20
  %add17 = add i32 %16, 17
  store i32 %add17, i32* %p16, align 1, !tbaa !21
  %add18 = add i32 %17, 18
  store i32 %add18, i32* %p17, align 1, !tbaa !22
  %add19 = add i32 %18, 19
  store i32 %add19, i32* %p18, align 1, !tbaa !23
  %add20 = add i32 %19, 20
  store i32 %add20, i32* %p19, align 1, !tbaa !24
  %add21 = add i32 %20, 21
  store i32 %add21, i32* %p20, align 1, !tbaa !25
  %add22 = add i32 %21, 22
  store i32 %add22, i32* %p21, align 1, !tbaa !26
  %add23 = add i32 %22, 23
  store i32 %add23, i32* %p22, align 1, !tbaa !27
  %add24 = add i32 %23, 24
  store i32 %add24, i32* %p23, align 1, !tbaa !28
  %add25 = add i32 %24, 25
  store i32 %add25, i32* %p24, align 1, !tbaa !29
  %add26 = add i32 %25, 26
  store i32 %add26, i32* %p25, align 1, !tbaa !30
  %add27 = add i32 %26, 27
  store i32 %add27, i32* %p26, align 1, !tbaa !31
  %add28 = add i32 %27, 28
  store i32 %add28, i32* %p27, align 1, !tbaa !32
  %add29 = add i32 %28, 29
  store i32 %add29, i32* %p28, align 1, !tbaa !33
  %add30 = add i32 %29, 30
  store i32 %add30, i32* %p29, align 1, !tbaa !34
  %add31 = add i32 %30, 31
  store i32 %add31, i32* %p30, align 1, !tbaa !35
  %add32 = add i32 %31, 32
  store i32 %add32, i32* %p31, align 1, !tbaa !36
  %add33 = add i32 %32, 33
  store i32 %add33, i32* %p32, align 1, !tbaa !37
  %33 = getelementptr inbounds %struct.VeryLarge, %struct.VeryLarge* %agg.result, i32 0, i32 0
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %33, i8* %p0, i32 129, i32 1, i1 false), !tbaa.struct !38
  ret void
}

; Function Attrs: argmemonly nounwind
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture readonly, i32, i32, i1) #1

attributes #0 = { norecurse nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { argmemonly nounwind }

!llvm.ident = !{!0}

!0 = !{!"clang version 3.8.0 (git://github.com/llvm-mirror/clang 40ef2b7531472c41212c4719a9294aeb7bddebbc) (git://github.com/llvm-mirror/llvm c601eaf55606dfb9ad372b514b77aa00d1409be1)"}
!1 = !{!2, !3, i64 0}
!2 = !{!"", !3, i64 0, !5, i64 1, !5, i64 5, !5, i64 9, !5, i64 13, !5, i64 17, !5, i64 21, !5, i64 25, !5, i64 29, !5, i64 33, !5, i64 37, !5, i64 41, !5, i64 45, !5, i64 49, !5, i64 53, !5, i64 57, !5, i64 61, !5, i64 65, !5, i64 69, !5, i64 73, !5, i64 77, !5, i64 81, !5, i64 85, !5, i64 89, !5, i64 93, !5, i64 97, !5, i64 101, !5, i64 105, !5, i64 109, !5, i64 113, !5, i64 117, !5, i64 121, !5, i64 125}
!3 = !{!"omnipotent char", !4, i64 0}
!4 = !{!"Simple C/C++ TBAA"}
!5 = !{!"int", !3, i64 0}
!6 = !{!2, !5, i64 1}
!7 = !{!2, !5, i64 5}
!8 = !{!2, !5, i64 9}
!9 = !{!2, !5, i64 13}
!10 = !{!2, !5, i64 17}
!11 = !{!2, !5, i64 21}
!12 = !{!2, !5, i64 25}
!13 = !{!2, !5, i64 29}
!14 = !{!2, !5, i64 33}
!15 = !{!2, !5, i64 37}
!16 = !{!2, !5, i64 41}
!17 = !{!2, !5, i64 45}
!18 = !{!2, !5, i64 49}
!19 = !{!2, !5, i64 53}
!20 = !{!2, !5, i64 57}
!21 = !{!2, !5, i64 61}
!22 = !{!2, !5, i64 65}
!23 = !{!2, !5, i64 69}
!24 = !{!2, !5, i64 73}
!25 = !{!2, !5, i64 77}
!26 = !{!2, !5, i64 81}
!27 = !{!2, !5, i64 85}
!28 = !{!2, !5, i64 89}
!29 = !{!2, !5, i64 93}
!30 = !{!2, !5, i64 97}
!31 = !{!2, !5, i64 101}
!32 = !{!2, !5, i64 105}
!33 = !{!2, !5, i64 109}
!34 = !{!2, !5, i64 113}
!35 = !{!2, !5, i64 117}
!36 = !{!2, !5, i64 121}
!37 = !{!2, !5, i64 125}
!38 = !{i64 0, i64 1, !39, i64 1, i64 4, !40, i64 5, i64 4, !40, i64 9, i64 4, !40, i64 13, i64 4, !40, i64 17, i64 4, !40, i64 21, i64 4, !40, i64 25, i64 4, !40, i64 29, i64 4, !40, i64 33, i64 4, !40, i64 37, i64 4, !40, i64 41, i64 4, !40, i64 45, i64 4, !40, i64 49, i64 4, !40, i64 53, i64 4, !40, i64 57, i64 4, !40, i64 61, i64 4, !40, i64 65, i64 4, !40, i64 69, i64 4, !40, i64 73, i64 4, !40, i64 77, i64 4, !40, i64 81, i64 4, !40, i64 85, i64 4, !40, i64 89, i64 4, !40, i64 93, i64 4, !40, i64 97, i64 4, !40, i64 101, i64 4, !40, i64 105, i64 4, !40, i64 109, i64 4, !40, i64 113, i64 4, !40, i64 117, i64 4, !40, i64 121, i64 4, !40, i64 125, i64 4, !40}
!39 = !{!3, !3, i64 0}
!40 = !{!5, !5, i64 0}
```



Reviewers: asl

Subscribers: qcolombet

Differential Revision: http://reviews.llvm.org/D17441

llvm-svn: 261746
2016-02-24 15:15:02 +00:00
Simon Pilgrim 3b6feeaa7c [X86][SSE41] Combine vector blends with zero
Part 2 of 2
This patch add support for combining target shuffles into blends-with-zero.

Differential Revision: http://reviews.llvm.org/D17483

llvm-svn: 261745
2016-02-24 15:14:21 +00:00
Simon Pilgrim dd01f70085 [X86][SSE41] Combine insertion of zero scalars into vector blends with zero
Part 1 of 2
This patch attempts to replace the insertion of zero scalars with a vector blend with zero, avoiding the use of the integer insertion instructions (which are particularly slow on many targets).
(Part 2 will add support for combining multiple blends-with-zero).

Differential Revision: http://reviews.llvm.org/D17483

llvm-svn: 261743
2016-02-24 14:53:27 +00:00
Nikolay Haustov 4f073ca7fa [AMDGPU] Assembler: Simplify handling of optional operands
Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445

Reviewers: tstellarAMD
llvm-svn: 261742
2016-02-24 14:22:47 +00:00
Nikolay Haustov 3c728e43aa [AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields)
lit tests passed before and after because it doesn't test the binary representation of amd_kernel_code_t.

Patch by: Valery Pykhtin (Valery.Pykhtin@amd.com)

Reviewers: arsenm
llvm-svn: 261732
2016-02-24 10:54:25 +00:00
David Majnemer e2ad73759d [CodeView] Describe variables live in x87 registers
We didn't have a mapping from LLVM's x87 floating point registers to
CodeView's encoding.

llvm-svn: 261730
2016-02-24 10:01:24 +00:00
Simon Pilgrim c291c03702 [X86][SSE] Don't get target shuffle operands prematurely.
PerformShuffleCombine should be usable by unary and binary target shuffles, but was attempting to get the first two operands whatever the instruction type. Since these are only used for VECTOR_SHUFFLE instructions for one particular combine I've moved them inside the relevant if statement.

llvm-svn: 261727
2016-02-24 09:07:47 +00:00
Michael Zuckerman a1f2d27da2 [LLVM][AVX512][PSHUFHW ][PSHUFLW ] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17538

llvm-svn: 261725
2016-02-24 08:39:05 +00:00
Igor Breger c7ba5699c5 AVX512: Add vpmovzxbw/d/q ,vpmovzxw/d/q ,vpmovzxbdq lowering patterns that support 256bit inputs like AVX patterns ( that are disable in case HasVLX , see SS41I_pmovx_avx2_patterns).
Differential Revision: http://reviews.llvm.org/D17504

llvm-svn: 261724
2016-02-24 08:15:20 +00:00
Justin Bogner 38e5217b1e X86: Wrap a helper for an assert in #ifndef NDEBUG
This function is used in exactly one place, and only in asserts
builds. Move it a few lines up before the use and only define it when
asserts are enabled. Fixes the release build under -Werror.

Also remove the forward declaration and commentary that was basically
identical to the code itself.

llvm-svn: 261722
2016-02-24 07:58:02 +00:00
Matt Arsenault cd09961fb3 AMDGPU: Check cheaper condition before SignBitIsZero
Don't do an expensive computeKnownBits call when we
can do the cheap check for legal offsets first.

llvm-svn: 261720
2016-02-24 04:55:29 +00:00
Derek Schuff f9c0a5c377 Revert "[WebAssembly] Stackify code emitted by eliminateFrameIndex"
This reverts r261685 due to wasm test breakage.

llvm-svn: 261702
2016-02-23 22:13:21 +00:00
Tim Northover 87442c1133 AArch64: rename compact unwind forms back to UNWIND_ARM64_*. NFC.
Looks like the global rename last year was a bit over-zealous. These things
really are referred to with ARM64 elsewhere (ld64, libunwind, ...).

llvm-svn: 261698
2016-02-23 21:49:05 +00:00
Derek Schuff b21570cc1d [WebAssembly] Stackify code emitted by eliminateFrameIndex
llvm-svn: 261685
2016-02-23 21:25:17 +00:00
Tim Northover b9edc5ca21 ARM: fix handling of movw/movt relocations with addend.
We were emitting only one half of a the paired relocations needed for these
instructions because we decided that an offset needed a scattered relocation.
In fact, movw/movt relocations can be paired without being scattered.

llvm-svn: 261679
2016-02-23 20:20:23 +00:00
Geoff Berry 27b1ded41a [AArch64] Generate csinv instruction more often
Reviewers: t.p.northover, jmolloy

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17546

llvm-svn: 261675
2016-02-23 19:34:13 +00:00
Davide Italiano 62b7f7a398 [X86ISelLowering] Stop typing the same return over and over and over.
llvm-svn: 261666
2016-02-23 18:39:38 +00:00
Weiming Zhao 5e0c3ebe9e Fix PR25339: ARM Constant Island
Summary:
Currently, the ARM Constant Island may not converge (or not converge quickly).
This patch let it move to the closest water after the user if it doesn't converge after 15 iterations.

This address https://llvm.org/bugs/show_bug.cgi?id=25339

Reviewers: t.p.northover, srhines, kristof.beyls, aadg, rengolin

Subscribers: weimingz, aemerson, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D16890

llvm-svn: 261665
2016-02-23 18:39:19 +00:00
Derek Schuff 770bdfe30b [WebAssembly] Add TODO comment to revisit red zone size
llvm-svn: 261664
2016-02-23 18:17:46 +00:00
Derek Schuff 4b3bb213b2 [WebAssembly] Implement red zone for user stack
Implements a mostly-conventional redzone for the userspace
stack. Because we have unsigned load/store offsets we continue to use a
local SP subtracted from the incoming SP but do not write it back to
memory.

Differential Revision: http://reviews.llvm.org/D17525

llvm-svn: 261662
2016-02-23 18:13:07 +00:00
Geoff Berry a1c6269c91 [AArch64] Fix fastcc -tailcallopt epilog code generation.
Summary:
Fix a bug in epilog generation where the incoming stack arguments were
not being popped for fastcc functions when -tailcallopt was passed.

Reviewers: t.p.northover, mcrosier, jmolloy, rengolin

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16894

llvm-svn: 261650
2016-02-23 16:54:36 +00:00
Aaron Ballman 8374c1f785 Silencing a signed vs unsigned mismatch.
llvm-svn: 261640
2016-02-23 15:02:43 +00:00
Chad Rosier 44a6d443d1 [AArch64] Fix comment typo in Cyclone scheduling defs. NFC.
llvm-svn: 261637
2016-02-23 14:05:13 +00:00
Junmo Park 453f4aa4dd [ARM] fix initialization of PredictableSelectIsExpensive
Summary:
If we want classify OoO or not, using getSchedModel().isOutOfOrder()
could be more proper way than using Subtarget->isLikeA9().

Reviewers: jmolloy, rengolin

Differential Revision: http://reviews.llvm.org/D17433

llvm-svn: 261623
2016-02-23 09:56:58 +00:00
Nikolay Haustov 2a62b3c244 [AMDGPU] Fix operands of S_BFE_U64 and S_BFM_B64
src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64).
src0 and src1 of s_bfm_b64 are 32-bit.
Update tests.

Review: http://reviews.llvm.org/D17480

Reviewers: arsenm
llvm-svn: 261621
2016-02-23 09:19:14 +00:00
Igor Breger 1360f4db47 AVX512: Fix predicate of AVX pcmpeqw/b , pcmpgtb/w/d instructions . AVX512 version of this instructions return result in kmask register, so AVX patterns should not be disabled.
Differential Revision: http://reviews.llvm.org/D17517

llvm-svn: 261619
2016-02-23 08:55:33 +00:00
Duncan P. N. Exon Smith 6307eb5518 CodeGen: TII: Take MachineInstr& in predicate API, NFC
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest).  All of these
functions require non-null parameters already, so references are more
clear.  As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.

No functionality change intended.

llvm-svn: 261605
2016-02-23 02:46:52 +00:00
David Majnemer 964b70d559 [X86] Create mergeable constant pool entries for AVX
We supported creating mergeable constant pool entries for smaller
constants but not for 32-byte AVX constants.

llvm-svn: 261584
2016-02-22 22:23:11 +00:00
Derek Schuff 27e3b8a6e3 [WebAssembly] Fix writeback of stack pointer with dynamic alloca
Previously the stack pointer was only written back to memory in the
prolog. But this is wrong for dynamic allocas, for which
target-independent codegen handles SP updates after the prolog (and
possibly even in another BB). Instead update the SP global in
ADJCALLSTACKDOWN which is generated after the SP update sequence.
This will have further refinements when we add red zone support.

llvm-svn: 261579
2016-02-22 21:57:17 +00:00
Duncan P. N. Exon Smith d84f600653 CodeGen: Bring back MachineBasicBlock::iterator::getInstrIterator()...
This is a little embarrassing.

When I reverted r261504 (getIterator() => getInstrIterator()) in
r261567, I did a `git grep` to see if there were new calls to
`getInstrIterator()` that I needed to migrate.  There were 10-20 hits,
and I blindly did a `sed ...` before calling `ninja check`.

However, these were `MachineInstrBundleIterator::getInstrIterator()`,
which predated r261567.  Perhaps coincidentally, these had an identical
name and return type.

This commit undoes my careless sed and restores
`MachineBasicBlock::iterator::getInstrIterator()`.

llvm-svn: 261577
2016-02-22 21:30:15 +00:00
Davide Italiano 2ec4717c2c [X86ISelLowering] Consolidate duplicated code in a single place.
llvm-svn: 261573
2016-02-22 21:06:46 +00:00
Matt Arsenault fa67bdbde0 AMDGPU/R600: Implement allowsMisalignedMemoryAccess
This avoids some test regressions in a future commit
when unaligned operations are expanded when they
have custom lowering.

llvm-svn: 261570
2016-02-22 21:04:16 +00:00
Duncan P. N. Exon Smith c5b668deb8 Revert "CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC"
This reverts commit r261504, since it's not obvious the new name is
better:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160222/334298.html

I'll recommit if we get consensus that it's the right direction.

llvm-svn: 261567
2016-02-22 20:49:58 +00:00
Dan Gohman 42bb254133 [WebAssembly] Re-enable the TailDuplicate pass.
llvm-svn: 261566
2016-02-22 20:47:12 +00:00
JF Bastien 437786cf60 WebAssembly: update expected failures
clang r261557 lowers va_arg in clang.

llvm-svn: 261564
2016-02-22 20:37:34 +00:00
Dan Gohman 3b09d279be [WebAssembly] Teach address folding to fold bitwise-or nodes.
LLVM converts adds into ors when it can prove that the operands don't share
any non-zero bits. Teach address folding to recognize or instructions with
constant operands with this property that can be folded into addresses as
if they were adds.

llvm-svn: 261562
2016-02-22 20:04:02 +00:00
Tom Stellard d93a34f714 [AMDGPU][llvm-mc] Support for 32-bit inline literals
Patch by: Artem Tamazov

Summary:
Note: Support for 64-bit inline literals TBD
Added: Support of abs/neg modifiers for literals (incomplete; parsing TBD).
Added: Some TODO comments.
Reworked/clarity: rename isInlineImm() to isInlinableImm()
Reworked/robustness: disallow BitsToFloat() with undefined value in isInlinableImm()
Reworked/reuse: isSSrc32/64(), isVSrc32/64()
Tests added.

Reviewers: tstellarAMD, arsenm

Subscribers: vpykhtin, nhaustov, SamWot, arsenm

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D17204

llvm-svn: 261559
2016-02-22 19:17:56 +00:00
Tom Stellard 82fc153baa [AMDGPU] [llvm-mc] [VI] Fix encoding of LDS/GDS instructions.
Patch by: Artem Tamazov

Summary: Tests added.

Reviewers: tstellarAMD, arsenm

Subscribers: vpykhtin, SamWot, #llvm-amdgpu-spb

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D17271

llvm-svn: 261558
2016-02-22 19:17:53 +00:00
Nemanja Ivanovic a8ef3c9b86 Fix for PR26690 take 2
This is what was meant to be in the initial commit to fix this bug. The
parens were missing. This commit also adds a test case for the bug and
has undergone full testing on PPC and X86.

llvm-svn: 261546
2016-02-22 18:04:00 +00:00
Dan Gohman 595e8ab22d [WebAssembly] Properly ignore llvm.dbg.value instructions.
llvm-svn: 261538
2016-02-22 17:45:20 +00:00
Zoran Jovanovic d665a66b0f [mips] added support for trunc macro
Author: obucina
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D15745

llvm-svn: 261529
2016-02-22 16:00:23 +00:00
Nemanja Ivanovic 3361867470 Revert bad fix for PR26690.
llvm-svn: 261527
2016-02-22 15:06:32 +00:00
Nemanja Ivanovic d58b976bb7 Fix for PR26690
I mistook BitVector::empty() to mean BitVector::count() == 0 and it does
not. Corrected the issue with the fix for PR26500.

llvm-svn: 261525
2016-02-22 14:47:49 +00:00
Benjamin Kramer 451f54cf62 Fix some abuse of auto flagged by clang's -Wrange-loop-analysis.
llvm-svn: 261524
2016-02-22 13:11:58 +00:00
Igor Breger 252c2d9680 AVX512F: Add assembler Intel syntax tests for knl, fix minor bugs.
Differential Revision: http://reviews.llvm.org/D17498

llvm-svn: 261521
2016-02-22 12:37:41 +00:00
Igor Breger 4511e76e5c AVX512: Fix scalar mem operands.
Differential Revision: http://reviews.llvm.org/D17500

llvm-svn: 261520
2016-02-22 11:48:27 +00:00
Craig Topper 84f2f18cfd [X86] Minor formatting fix. NFC
llvm-svn: 261515
2016-02-22 08:00:04 +00:00
Duncan P. N. Exon Smith e59c8af705 Reapply "CodeGen: Use references in MachineTraceMetrics::Trace, NFC"
This reverts commit r261510, effectively reapplying r261509.  The
original commit missed a caller in AArch64ConditionalCompares.

Original commit message:

Pass non-null arguments by reference in MachineTraceMetrics::Trace,
simplifying future work to remove implicit iterator => pointer
conversions.

llvm-svn: 261511
2016-02-22 03:33:28 +00:00
Duncan P. N. Exon Smith d6de2a7612 Document assumption in X86FrameLowering::inlineStackProbe()
Resolve FIXME from r261504.  Apparently bundled instructions are illegal
here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160215/334146.html

llvm-svn: 261507
2016-02-22 02:32:35 +00:00
Duncan P. N. Exon Smith dc0848c029 CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.

- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
  that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator().  This matches the
  naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator().  This is explicitly called
  "bundle" (not matching MachineBasicBlock) to disintinguish it clearly
  from ilist_node::getIterator().
- Update all calls.  Some of these I switched to `auto` to remove
  boiler-plate, since the new name is clear about the type.

There was one call I updated that looked fishy, but it wasn't clear what
the right answer was.  This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp.  I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.

llvm-svn: 261504
2016-02-21 22:58:35 +00:00
Duncan P. N. Exon Smith e9bc579c37 ADT: Remove == and != comparisons between ilist iterators and pointers
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.

Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators.  (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)

There should be NFC here.

llvm-svn: 261498
2016-02-21 20:39:50 +00:00
Craig Topper 4ab89ea187 [X86] Remove unused encoding types from disassembler. NFC
llvm-svn: 261494
2016-02-21 19:49:16 +00:00
Simon Pilgrim e9093adae0 [X86][AVX] Add shuffle masking support for EltsFromConsecutiveLoads
Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements.

Differential Revision: http://reviews.llvm.org/D17297

llvm-svn: 261490
2016-02-21 19:15:48 +00:00
Sanjoy Das aa63dc0e9a Fix LLVM's handling and detection of skylake and cannonlake CPUs
Summary:
 - Rename `"skylake"` == SkylakeServerProc to `"skylake-avx512"`
 - Change `"skylake"` to denote SkylakeClientProc
 - Fix the detection of cpu family 6 and model 94 to be
   SkylakeClientProc instead of SkylakeServerProc
 - Remove the `"cnl"` for CannonLake

Reviewers: craig.topper, delena

Subscribers: zansari, echristo, qcolombet, RKSimon, spatel, DavidKreitzer, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17090

llvm-svn: 261482
2016-02-21 17:12:03 +00:00
JF Bastien 55afdfd384 WebAssembly: update expected torture test failures
r261457 handles CopyToReg nodes with flag results in LowerCopyToReg, which was causing the SelectionDAGNodes assert.

llvm-svn: 261479
2016-02-21 16:52:00 +00:00
Dan Gohman 27a11eefcc [WebAssembly] Support physical registers in the rewrite-to-discard optimization.
llvm-svn: 261465
2016-02-21 03:27:22 +00:00
David Majnemer 78f46bea08 Unbreak non-X86 targets from fallout caused by r261462
llvm-svn: 261463
2016-02-21 01:40:04 +00:00
David Majnemer a3ea407d48 [X86] Use the correct alignment for COMDAT constant pool entries
COFF doesn't have sections with mergeable contents.  Instead, each
constant pool entry ends up in a COMDAT section.  The linker, when
choosing between COMDAT sections, doesn't choose the max alignment of
the two sections.  You just get whatever alignment was on the section.

If one constant needed a higher alignment in one object file from
another one, then we will get into trouble if the linker chooses the
lower alignment one.

Instead, lets promote the alignment of the constant pool entry to make
sure we don't use an under aligned constant with an instruction which
assumed otherwise.

This fixes PR26680.

llvm-svn: 261462
2016-02-21 01:30:30 +00:00
Dan Gohman 3a71ccb607 [WebAssembly] Refine a README.txt entry.
The register coloring pass may also need to be involved in order to
optimally sort registers.

llvm-svn: 261458
2016-02-20 23:11:14 +00:00
Dan Gohman 02c0871abd [WebAssembly] Handle CopyToReg nodes with flag results in LowerCopyToReg.
llvm-svn: 261457
2016-02-20 23:09:44 +00:00
Derek Schuff 90dbb8cfc3 [WebAssembly] Write stack pointer back to memory when FP is used
The stack pointer is bumped when there is a frame pointer or when there
are static-size objects, but was only getting written back when there
were static-size objects.

llvm-svn: 261453
2016-02-20 22:18:47 +00:00
Derek Schuff dc5f6aa4bb [WebAssembly] Stackify function prologs and epilogs
The instructions are the same, but fewer locals are used.

Differential Revision: http://reviews.llvm.org/D17428

llvm-svn: 261452
2016-02-20 21:46:50 +00:00
Nemanja Ivanovic daf0ca2341 Fix the build bot break caused by rL261441.
The patch has a necessary call to a function inside an assert. Which is fine
when you have asserts turned on. Not so much when they're off. Sorry about
the regression.

llvm-svn: 261447
2016-02-20 20:45:37 +00:00
Nemanja Ivanovic ae22101c55 Fix for PR 26500
This patch corresponds to review:
http://reviews.llvm.org/D17294

It ensures that whatever block we are emitting the prologue/epilogue into, we
have the necessary scratch registers. It takes away the hard-coded register
numbers for use as scratch registers as registers that are guaranteed to be
available in the function prologue/epilogue are not guaranteed to be available
within the function body. Since we shrink-wrap, the prologue/epilogue may end
up in the function body.

llvm-svn: 261441
2016-02-20 18:16:25 +00:00
Simon Pilgrim ecb0433599 [X86][SSE] Fixed issue with commutation of 'faux unary' target shuffles (PR26667)
Fixed a bug introduced by D16683 when a binary shuffle is simplified to a unary shuffle (with undef/zero sentinel mask indices) - if this resulted in only the second input being used combineX86ShuffleChain failed to take this into account and still referenced the first input.

llvm-svn: 261434
2016-02-20 14:39:45 +00:00
Simon Pilgrim ccf2cce67c [X86][SSE] Move all undef/zero cases before target shuffle combining.
First small step towards fixing PR26667 - we need to ensure that combineX86ShuffleChain only gets called with a valid shuffle input node (a similar issue was found in D17041).

llvm-svn: 261433
2016-02-20 12:57:32 +00:00
Andrey Turetskiy 9994b8894a [X86] Enable the LEA optimization pass by default.
Differential Revision: http://reviews.llvm.org/D16877

llvm-svn: 261429
2016-02-20 11:11:55 +00:00
Andrey Turetskiy 0babd26626 [X86] PR26575: Fix LEA optimization pass (Part 2).
Handle address displacement operands of a type other than Immediate or Global in LEAs and load/stores.

Ref: https://llvm.org/bugs/show_bug.cgi?id=26575

Differential Revision: http://reviews.llvm.org/D17374

llvm-svn: 261428
2016-02-20 10:58:28 +00:00
David Majnemer 862c5ba302 Move some code from doInitialization to runOnFunction
This has no observable behavior change, it just makes the state
insertion pass look a little more like normal passes.

llvm-svn: 261420
2016-02-20 07:34:21 +00:00
Craig Topper 2bf0c0394d [X86] Add some missing reversed forms of XOP instructions.
llvm-svn: 261417
2016-02-20 06:20:17 +00:00
Davide Italiano 228978c0dc [X86ISelLowering] Fix TLSADDR lowering when shrink-wrapping is enabled.
TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent
shrink-wrapping from pushing prologue/epilogue past them (which result
in TLS variables being accessed before the stack frame is set up), we 
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!

llvm-svn: 261387
2016-02-20 00:44:47 +00:00
Tom Stellard 467b5b9024 AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.

This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.

This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.

Reviewers: arsenm, cfang, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17305

llvm-svn: 261385
2016-02-20 00:37:25 +00:00
Davide Italiano a8f1f2efaf [X86ISelLowering] Provide a more informative assert message.
I stumbled upon this while debugging a lowering bug.

llvm-svn: 261371
2016-02-19 22:18:49 +00:00
Davide Italiano 4cfe2a9e38 [X86ISelLowering] Merge two conditions inside a single if.
llvm-svn: 261370
2016-02-19 22:01:07 +00:00
Hans Wennborg 7c3077ca52 Revert r253557 "Alternative to long nops for X86 CPUs, by Andrey Turetsky"
Turns out the new nop sequences aren't actually nops on x86_64 (PR26554).

llvm-svn: 261365
2016-02-19 21:26:31 +00:00
Dimitry Andric db417b6d40 Fix incorrect selection of AVX512 sqrt when OptForSize is on
Summary:
When optimizing for size, sqrt calls can be incorrectly selected as
AVX512 VSQRT instructions.  This is because X86InstrAVX512.td has a
`Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
definition.  Even if the target does not support AVX512, the class can
apparently still be chosen, leading to an incorrect selection of
`vsqrtss`.

In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
requires an XMM register, which is not available on i686 CPUs.

Reviewers: grosbach, resistor, joker.eph

Subscribers: spatel, emaste, llvm-commits

Differential Revision: http://reviews.llvm.org/D17414

llvm-svn: 261360
2016-02-19 20:14:11 +00:00
Dan Gohman 87e368b7db [WebAssembly] Add another optimization idea to README.txt.
llvm-svn: 261354
2016-02-19 19:22:44 +00:00
Geoff Berry 7e4ba3dc02 [AArch64][ShrinkWrap] Fix bug in prolog clobbering live reg when shrink wrapping.
Summary: See bug https://llvm.org/bugs/show_bug.cgi?id=26642

Reviewers: qcolombet, t.p.northover

Subscribers: aemerson, rengolin, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D17350

llvm-svn: 261349
2016-02-19 18:27:32 +00:00
Tom Stellard 2d26fe7aa6 AMDGPU/SI: Fix s_waitcnt insertion for flat instructions
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions.  The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17366

llvm-svn: 261330
2016-02-19 15:33:13 +00:00
Ulrich Weigand cfa1d2b49d [SystemZ] Fix ABI for i128 argument and return types
According to the SystemZ ABI, 128-bit integer types should be
passed and returned via implicit reference.  However, this is
not currently implemented at the LLVM IR level for the i128
type.  This does not matter when compiling C/C++ code, since
clang will implement the implicit reference itself.

However, it turns out that when calling libgcc helper routines
operating on 128-bit integers, LLVM will use i128 argument and
return value types; the resulting code is not compatible with
the ABI used in libgcc, leading to crashes (see PR26559).

This should be simple to fix, except that i128 currently is not
even a legal type for the SystemZ back end.  Therefore, common
code will already split arguments and return values into multiple
parts.  The bulk of this patch therefore consists of detecting
such parts, and correctly handling passing via implicit reference
of a value split into multiple parts.  If at some time in the
future, i128 becomes a legal type, this code can be removed again.

This fixes PR26559.

llvm-svn: 261325
2016-02-19 14:10:21 +00:00
Craig Topper 5eeb41c173 [X86] Remove unused entries from the disassembler type enum.
llvm-svn: 261311
2016-02-19 06:57:40 +00:00
Junmo Park 1108ab059c Minor code cleanups. NFC.
llvm-svn: 261294
2016-02-19 01:46:04 +00:00
Sanjay Patel 0adbea4b5c [x86] fix initialization of PredictableSelectIsExpensive
This is effectively NFC because Atom is the only in-order x86 subtarget currently,
but the predicate would have become wrong if any other in-order CPU came along.

See related discussion in:
http://reviews.llvm.org/D16836

llvm-svn: 261275
2016-02-18 23:08:48 +00:00
Richard Trieu 7a08381403 Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Adam Nemet 9d9cb274ea [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

Obviously the pass still only used from PPC at this point.  Subsequent
patches will start driving this from ARM64 as well.

Due to the previous patch most lines should show up as moved lines.

llvm-svn: 261265
2016-02-18 21:38:19 +00:00
Adam Nemet 7cf9b1bf05 [PPCLoopDataPrefetch] Remove PPC from some of the names. NFC
This is done only to make the next patch that move the pass out PPC to
Transforms easier to read.  After this most line should show up as moved
lines in that patch.

This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

llvm-svn: 261264
2016-02-18 21:37:12 +00:00
David Majnemer a822c880a9 [WinEH] Hoist state stores from successors
If we know that all of our successors want to be in the exact same
state, it makes sense to hoist the state transition into their common
predecessor.

Differential Revision: http://reviews.llvm.org/D17391

llvm-svn: 261262
2016-02-18 21:13:35 +00:00
Davide Italiano 440a676136 [X86ISelLowering] Use isPowerof2 instead of rewriting it. NFC.
llvm-svn: 261255
2016-02-18 20:43:15 +00:00
Matthew Simpson 921ad01a1d [AArch64] Reduce vector insert/extract cost for Kryo
Differential Revision: http://reviews.llvm.org/D17379

llvm-svn: 261237
2016-02-18 18:35:45 +00:00
Hans Wennborg 23cdc643b9 Revert to extend i8/i16 return values on Darwin (PR26665)
In r260133, LLVM was changed to no longer extend i8/i16 return values,
as it's not required by the ABI. However, code was found in the wild
that relies on the old behaviour on Darwin, so this commit reverts
back to that old behaviour for Darwin.

On other platforms, it's less likely that code would be depending on
the old behaviour, as GCC and MSVC haven't been extending such return
values.

llvm-svn: 261235
2016-02-18 18:17:05 +00:00
Chad Rosier c00ab4f27d [Hexagon] Remove redundant check.
llvm-svn: 261232
2016-02-18 17:49:57 +00:00
Nicolai Haehnle f2c64db55a AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.

IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.

Differential Revision: http://reviews.llvm.org/D17276

llvm-svn: 261224
2016-02-18 16:44:18 +00:00
Krzysztof Parzyszek 754bad884d [Hexagon] Fix compilation error with GCC 6
Compiling Hexagon target with GCC 6 produces "error: should have been
declared inside" due to GCC PR c++/69657 which was merged.

Properly wrapping operator<<() definitions within the namespace llvm
fixes the issue.

Author: domagoj.stolfa

Differential Revision: http://reviews.llvm.org/D17281

llvm-svn: 261220
2016-02-18 16:10:27 +00:00
Krzysztof Parzyszek 7a737d1abb [Hexagon] Implement TLS support
Patch by Anand Kodnani.

llvm-svn: 261218
2016-02-18 15:42:57 +00:00
Zlatko Buljan f034021443 [mips][microMIPS] Implement TLBINV and TLBINVF instructions
Differential Revision: http://reviews.llvm.org/D16849

llvm-svn: 261211
2016-02-18 14:10:52 +00:00
Krzysztof Parzyszek 6895b2ceb2 [Hexagon] Add support for __builtin_prefetch
llvm-svn: 261210
2016-02-18 13:58:38 +00:00
Krzysztof Parzyszek 39686cf98e [Hexagon] Update the callee-saved register set for EH-aware functions
llvm-svn: 261208
2016-02-18 13:41:05 +00:00
Simon Pilgrim 05e48b95eb [X86][SSE] Improve PSHUFB shuffle mask decoding.
In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool.

The test case in question makes use of this to recognise the shuffle mask is an unary UNPCKL pattern and simplifies accordingly.

llvm-svn: 261201
2016-02-18 10:17:40 +00:00
Nikolay Haustov 10813a4efa Test commit access.
llvm-svn: 261199
2016-02-18 10:02:12 +00:00
Michael Zuckerman 724dc3b20c [AVX512][PRORQ][PRORD] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17024

llvm-svn: 261198
2016-02-18 09:52:12 +00:00
Dan Gohman d85ab7fc10 [WebAssembly] Don't use setRequiresStructuredCFG(true).
While we still do want reducible control flow, the RequiresStructuredCFG
flag imposes more strict structure constraints than WebAssembly wants.
Unsetting this flag enables critical edge splitting and tail merging.

Also, disable TailDuplication explicitly, as it doesn't support virtual
registers, and was previously only disabled by the RequiresStructuredCFG
flag.

llvm-svn: 261190
2016-02-18 06:32:53 +00:00
Tom Stellard e1818af8c5 [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target
Changes:

- Added disassembler project
- Fixed all decoding conflicts in .td files
- Added DecoderMethod=“NONE” option to Target.td that allows to
  disable decoder generation for an instruction.
- Created decoding functions for VS_32 and VReg_32 register classes.
- Added stubs for decoding all register classes.
- Added several tests for disassembler

Disassembler only supports:

- VI subtarget
- VOP1 instruction encoding
- 32-bit register operands and inline constants

[Valery]

One of the point that requires to pay attention to is how decoder
conflicts were resolved:

- Groups of target instructions were separated by using different
  DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate
  approach.

- There were conflicts in IMAGE_<> instructions caused by two
  different reasons:

1. dmask wasn’t specified for the output (fixed)
2. There are image instructions that differ only by the number of
   the address components but have the same encoding by the HW spec. The
   actual number of address components is determined by the HW at runtime
   using image resource descriptor starting from the VGPR encoded in an
   IMAGE instruction. This means that we should choose only one instruction
   from conflicting group to be the rule for decoder. I didn’t find the way
   to disable decoder generation for an arbitrary instruction and therefore
   made a onelinear fix to tablegen generator that would suppress decoder
   generation when DecoderMethod is set to “NONE”. This is a change that
   should be reviewed and submitted first. Otherwise I would need to
   specify different DecoderNamespace for every instruction in the
   conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not
   used in other targets.
3. IMAGE_GATHER decoder generation is for now disabled and to be
   done later.

[/Valery]

Patch By: Sam Kolton

Differential Revision: http://reviews.llvm.org/D16723

llvm-svn: 261185
2016-02-18 03:42:32 +00:00
Derek Schuff 71434ff642 [WebAssembly] Disable register stackification and coloring when not optimizing
These passes are optimizations, and should be disabled when not
optimizing.
Also create an MCCodeGenInfo so the opt level is correctly plumbed to
the backend pass manager.
Also remove the command line flag for disabling register coloring;
running llc with -O0 should now be useful for debugging, so it's not
necessary.

Differential Revision: http://reviews.llvm.org/D17327

llvm-svn: 261176
2016-02-17 23:20:43 +00:00
Tim Northover 7687bcee4a AArch64: always clear kill flags up to last eliminated copy
After r261154, we were only clearing flags if the known-zero register was
originally live-in to the basic block, but we have to do it even if not when
more than one COPY has been eliminated, otherwise the user of the first COPY
may still have <kill> marked.

E.g.

BB#N:
    %X0 = COPY %XZR
    STRXui %X0<kill>, <fi#0>
    %X0 = COPY %XZR
    STRXui %X0<kill>, <fi#1>

We can eliminate both copies, X0 is not live-in, but we must clear the kill on
the first store.

Unfortunately, I've been unable to come up with a non-fragile test for this.
I've only seen it in the wild with regalloc-created spills, and attempts to
reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile
asm clobbers were moved around. Should fix the aarch64 bot though.

llvm-svn: 261175
2016-02-17 23:07:04 +00:00
Amaury Sechet 22d2878399 Move LLVMCreateTargetData and LLVMDisposeTargetData together. NFC
llvm-svn: 261172
2016-02-17 22:41:09 +00:00
Tim Northover 3f2285615a AArch64: improve redundant copy elimination.
Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we
didn't sort out the use-def chain properly.

I've also made it check more than just the last instruction for a compatible
CBZ (so it can cope without fallthroughs). I'd have liked to do that
separately, but it's helps writing the test.

Finally, I removed some custom loops in favour of MachineInstr helpers and
refactored the control flow to flatten it and avoid possibly quadratic
iterations in blocks with many copies. NFC for these, just a general tidy-up.

llvm-svn: 261154
2016-02-17 21:16:53 +00:00
Colin LeMahieu 5e552d141f [Hexagon] Replacing reference/dereference with reference cast.
llvm-svn: 261133
2016-02-17 18:50:21 +00:00
Nico Weber 32ac273a91 Remove superfluous semicolon.
llvm-svn: 261128
2016-02-17 18:48:08 +00:00
David Majnemer 7e5937b775 [WinEH] Optimize WinEH state stores
32-bit x86 Windows targets use a linked-list of nodes allocated on the
stack, referenced to via thread-local storage.  The personality routine
interprets one of the fields in the node as a 'state number' which
indicates where the personality routine should transfer control.

State transitions are possible only before call-sites which may throw
exceptions.  Our previous scheme had us update the state number before
all call-sites which may throw.

Instead, we can try to minimize the number of times we need to store by
reasoning about the nearest store which dominates the current call-site.
If the last store agrees with the current call-site, then we know that
the state-update is redundant and can be elided.

This is largely straightforward: an RPO walk of the blocks allows us to
correctly forward propagate the information when the function is a DAG.
Currently, loops are not handled optimally and may trigger superfluous
state stores.

Differential Revision: http://reviews.llvm.org/D16763

llvm-svn: 261122
2016-02-17 18:37:11 +00:00
Colin LeMahieu 3d3ff650d6 [Hexagon] Loop instructions don't need special processing. Extension and fitting is performed by generic code and the comment is incorrect, loops don't have a separate extended opcode.
llvm-svn: 261118
2016-02-17 18:14:05 +00:00
Justin Lebar f9b5add6ad [NVPTX] Annotate convergent intrinsics as convergent.
Summary:
Previously the machine instructions for bar.sync &co. were not marked as
convergent.  This resulted in some MI passes (such as TailDuplication,
fixed in an upcoming patch) doing unsafe things to these instructions.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski, hfinkel

Differential Revision: http://reviews.llvm.org/D17318

llvm-svn: 261115
2016-02-17 17:46:54 +00:00
Justin Lebar d596ec93ce [NVPTX] Annotate call machine instructions as calls.
Summary:
Otherwise we'll try to do unsafe optimizations on these MIs, such as
sinking loads below calls.

(I suspect that this is not the only bug in the NVPTX instruction
tablegen files; I need to comb through them.)

Reviewers: jholewinski, tra

Subscribers: jingyue, jhen, llvm-commits

Differential Revision: http://reviews.llvm.org/D17315

llvm-svn: 261113
2016-02-17 17:46:50 +00:00
Krzysztof Parzyszek de697d4d40 [Hexagon] Fold object construction into map::insert
llvm-svn: 261096
2016-02-17 15:02:07 +00:00
Igor Breger ac02f1bb62 AVX512: Fix LowerMSCATTER() return value.
Bug description:
  The bug was discovered when test was compiled with -O0.
  In case scatter result is DAG root , VectorLegalizer failed (assert) due to LowerMSCATTER() return kmask as result.
Change LowerMSCATTER() to return chain as original node do.

Differential Revision: http://reviews.llvm.org/D17331

llvm-svn: 261090
2016-02-17 14:04:33 +00:00
Scott Egerton 219fae9e36 [mips] Removed the SHF_ALLOC flag and the SHT_REL flag from the .pdr section.
This section is used for debug information and has no need to be
in memory at runtime. This patch also fixes an error when compiling
the Linux kernel. The error is that there are relocations within the
.pdr section in a VDSO. SHT_REL was removed as it is a section type
and not a section flag, therefore it does not make sense for it to
be there. With this patch, LLVM now emits the same flags as
the GNU assembler.

llvm-svn: 261083
2016-02-17 11:15:16 +00:00
Simon Pilgrim c5b5dcb985 [X86][AVX] Support bit-blend integer shuffles for 256-bit integer vectors
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.

This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.

Part 2 of 2

Differential Revision: http://reviews.llvm.org/D17292

llvm-svn: 261082
2016-02-17 10:50:06 +00:00
Simon Pilgrim a50e8d3627 [X86][AVX] Support bit-mask integer shuffles for 256-bit integer vectors
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.

This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.

Part 1 of 2

Differential Revision: http://reviews.llvm.org/D17292

llvm-svn: 261081
2016-02-17 10:37:49 +00:00
Simon Pilgrim 9904924e6b [X86][SSE] Tidyup BUILD_VECTOR operand collection. NFCI.
Avoid reuse of operand variables, keep them local to a particular lowering - the operand collection is unique to each case anyhow.

Renamed from V to Ops to more closely match their purpose.

llvm-svn: 261078
2016-02-17 10:12:30 +00:00
Benjamin Kramer 98520ca73b [Hexagon] cast<> a reference instead of referencing + dereferencing.
llvm-svn: 261077
2016-02-17 09:28:45 +00:00
Hans Wennborg 84047896b9 Revert r260979 "[X86] Enable the LEA optimization pass by default."
Asserts are still firing in Chromium builds. PR26575.

llvm-svn: 261058
2016-02-17 02:49:59 +00:00
JF Bastien a0d5347ee9 WebAssembly: update expected failures
r261050 seems to inadvertently fix the assertion failure.

llvm-svn: 261051
2016-02-17 01:59:23 +00:00
Dan Gohman 476ffcec04 [WebAssembly] Call memcpy for large byval copies.
This fixes very slow compilation on
test/CodeGen/Generic/2010-11-04-BigByval.ll . Note that MaxStoresPerMemcpy
and friends are not yet carefully tuned so the cutoff point is currently
somewhat arbitrary. However, it's important that there be a cutoff point
so that we don't emit unbounded quantities of loads and stores.

llvm-svn: 261050
2016-02-17 01:43:37 +00:00
JF Bastien 188ca894c2 WebAssembly: update expected test failures
r261032 adds frame address support.

llvm-svn: 261044
2016-02-17 00:34:15 +00:00
Reid Kleckner 8de35fef3d [X86] Fix a shrink-wrapping miscompile around __chkstk
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.

llvm-svn: 261039
2016-02-17 00:17:33 +00:00
Dan Gohman 1d547bf566 [WebAssembly] Use SDValue::getConstantOperandVal. NFC.
llvm-svn: 261037
2016-02-17 00:14:03 +00:00
Dan Gohman 94c6566055 [WebAssembly] Implement __builtin_frame_address.
Differential Revision: http://reviews.llvm.org/D17307

llvm-svn: 261032
2016-02-16 23:48:04 +00:00
Ahmed Bougacha f3cccab1e0 [X86] Remove the now-unused X86ISD::PSIGN. NFC.
llvm-svn: 261025
2016-02-16 22:14:12 +00:00
Ahmed Bougacha af60a429c9 [X86] Generalize logic blend of (x, -x) combine to match (-x, x).
I suspect this is what let PR26110 lie dormant for so long.

llvm-svn: 261024
2016-02-16 22:14:07 +00:00
Ahmed Bougacha 132fbf5476 [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.
Currently, we sometimes miscompile this vector pattern:
    (c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
    (~c & v) | (c & -v)

When we have SSSE3, we incorrectly lower that to PSIGN, which does:
    (c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
    (c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.

Note that the PSIGN tests are also incorrect. Consider:
    %b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = and <4 x i32> %a, %0
    %2 = and <4 x i32> %b.lobit, %sub
    %cond = or <4 x i32> %1, %2
    ret <4 x i32> %cond
if %b is zero:
    %b.lobit = <4 x i32> zeroinitializer
    %sub = sub nsw <4 x i32> zeroinitializer, %a
    %0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
    %1 = <4 x i32> %a
    %2 = <4 x i32> zeroinitializer
    %cond = or <4 x i32> %a, zeroinitializer
    ret <4 x i32> %a
whereas we currently generate:
    psignd %xmm1, %xmm0
    retq
which returns 0, as %xmm1 is 0.

Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate

Fixes PR26110.

Differential Revision: http://reviews.llvm.org/D17181

llvm-svn: 261023
2016-02-16 22:14:03 +00:00
Ahmed Bougacha f211f685e4 [X86] Extract PSIGN/BLENDVP combine. NFC.
llvm-svn: 261021
2016-02-16 22:13:55 +00:00
Ahmed Bougacha 7502768c6d [X86] Extract ANDNP combine. NFC.
This makes it IMO more readable and reduces indentation.

llvm-svn: 261020
2016-02-16 22:13:49 +00:00
Derek Schuff 25628d3362 [WebAssembly] Update torture test expectations
These were fixed with r260978

llvm-svn: 261017
2016-02-16 21:52:06 +00:00
Derek Schuff f8f8f093aa [WebAssemly] Don't move calls or stores past intervening loads
The register stackifier currently checks for intervening stores (and
loads that may alias them) but doesn't account for the fact that the
instruction being moved may affect intervening loads.

Differential Revision: http://reviews.llvm.org/D17298

llvm-svn: 261014
2016-02-16 21:44:19 +00:00
Colin LeMahieu ecef1d9cbc [Hexagon] Adding relocation for code size, cold path optimization allowing a 23-bit 4-byte aligned relocation to be a valid instruction encoding.
The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes.

Another way is to put a .word32 and mix code and data within a function.  The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware.

This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits.  Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register.

llvm-svn: 261006
2016-02-16 20:38:17 +00:00
Jun Bum Lim b389d9b9af [AArch64] Add pass to remove redundant copy after RA
Summary:
This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :
  BB0:
    cbz x0, .BB1
  BB1:
    mov x0, xzr

Jun

Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier

Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin

Differential Revision: http://reviews.llvm.org/D16203

llvm-svn: 261004
2016-02-16 20:02:39 +00:00
Quentin Colombet ba2a01645b [GlobalISel] Re-apply r260922-260923 with MSVC-friendly code.
Original message:
Get rid of the ifdefs in TargetLowering.
Introduce a new API used only by GlobalISel: CallLowering.
This API will contain target hooks dedicated to call lowering.

llvm-svn: 260998
2016-02-16 19:26:02 +00:00
Derek Schuff aadc89c25d [WebAssembly] Insert COPY_LOCAL between CopyToReg and FrameIndex DAG nodes
CopyToReg nodes don't support FrameIndex operands. Other targets select
the FI to some LEA-like instruction, but since we don't have that, we
need to insert some kind of instruction that can take an FI operand and
produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy
copy_local between Op and its FI operand. This results in a redundant
copy which we should optimize away later (maybe in the post-FI-lowering
peephole pass).

Differential Revision: http://reviews.llvm.org/D17213

llvm-svn: 260987
2016-02-16 18:18:36 +00:00
Tom Stellard cc4c8718ed [AMDGPU] Rename $dst operand to $vdst for VOP instructions.
Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler.

Reviewers: vpykhtin, tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, nhaustov

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D16920

llvm-svn: 260986
2016-02-16 18:14:56 +00:00