llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	4108368a89	[X86][AVX2] Regenerated broadcast domain tests llvm-svn: 260010	2016-02-06 22:09:25 +00:00
Simon Pilgrim	672808a853	[X86][SSE] Add tests for MOVHLPS/MOVLHPS shuffle lowering. As raised in PR26491, we don't make use of these instructions at the moment. llvm-svn: 260008	2016-02-06 20:11:52 +00:00
Simon Pilgrim	0acc32a3b3	[X86][AVX512] Added support for VPMOVZX shuffle decoding. llvm-svn: 260007	2016-02-06 19:51:21 +00:00
Simon Pilgrim	83e04913e5	[X86][AVX512] Fixed prefix ordering for lzcnt tests. Let AVX512 targets share the same CHECKs. llvm-svn: 260000	2016-02-06 18:07:19 +00:00
Simon Pilgrim	a09a154c2e	[X86][SSE] Regenerate vector shift tests llvm-svn: 259999	2016-02-06 17:57:15 +00:00
Simon Pilgrim	63b1ecab7d	line endings fix llvm-svn: 259992	2016-02-06 15:38:25 +00:00
Simon Pilgrim	9e369f2a51	[X86][SSE] Don't replace an existing 32-bit load with its duplicate If we are already loading a single 32-bit float/integer then just reuse it. Fix for regression in D16729 llvm-svn: 259991	2016-02-06 15:37:09 +00:00
Matt Arsenault	7f83397d72	AMDGPU: Account for LDS alignment The current situation isn't great, because the amount of padding requires is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912	2016-02-05 19:47:29 +00:00
Matt Arsenault	cf84e26fb6	AMDGPU: Preserve alignments on new created globals Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911	2016-02-05 19:47:23 +00:00
Wei Mi	a62f058989	Some stackslots are allocated to vregs which have no real reference. LiveRangeEdit::eliminateDeadDef is used to remove dead define instructions after rematerialization. To remove a VNI for a vreg from its LiveInterval, LiveIntervals::removeVRegDefAt is used. However, after non-PHI VNIs are all removed, PHI VNI are still left in the LiveInterval. Such unused vregs will be kept in RegsToSpill[] at the end of InlineSpiller::reMaterializeAll and spiller will allocate stackslot for them. The fix is to get rid of unused reg by checking whether it has non-dbg reference instead of whether it has non-empty interval. llvm-svn: 259895	2016-02-05 18:14:24 +00:00
Dan Gohman	d46b09267b	[WebAssembly] Update the select instructions' operand orders to match the spec. llvm-svn: 259893	2016-02-05 17:14:59 +00:00
Nemanja Ivanovic	d05e072b74	Add the missing test case for PR26193 llvm-svn: 259888	2016-02-05 15:03:17 +00:00
Renato Golin	6274e5222d	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)." This reverts commit r259812 as it broke AArch64 self-hosting. llvm-svn: 259881	2016-02-05 12:14:30 +00:00
Nemanja Ivanovic	b6fdce4ca0	Fix for PR 26356 Using the load immediate only when the immediate (whether signed or unsigned) can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and 0 to 65535 for unsigned. This patch also ensures that we sign-extend under the right conditions. llvm-svn: 259840	2016-02-04 23:14:42 +00:00
Nemanja Ivanovic	220b4fe4a9	Provide a test case for rl259798 llvm-svn: 259835	2016-02-04 22:36:10 +00:00
Simon Pilgrim	7823fd2535	[X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type. This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain..... llvm-svn: 259816	2016-02-04 19:27:51 +00:00
Chad Rosier	05f8020cdf	[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3). This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769 and r259790. The tramp3d failure was caused by an incorrect refactoring in the patch. Specifically, we weren't always properly clearing the SExtIdx flag. llvm-svn: 259812	2016-02-04 18:59:49 +00:00
Silviu Baranga	33b3bd17dd	[AArch64] Multiply extended 32-bit ints with `[U\|S]MADDL' During instruction selection, the AArch64 backend can recognise the following pattern and generate an [U\|S]MADDL instruction, i.e. a multiply of two 32-bit operands with a 64-bit result: (mul (sext i32), (sext i32)) However, when one of the operands is constant, the sign extension gets folded into the constant in SelectionDAG::getNode(). This means that the instruction selection sees this: (mul (sext i32), i64) ...which doesn't match the pattern. Sign-extension and 64-bit multiply instructions are generated, which are slower than one 32-bit multiply. Add a pattern to match this and generate the correct instruction, for both signed and unsigned multiplies. Patch by Chris Diamand! llvm-svn: 259800	2016-02-04 16:47:09 +00:00
Benjamin Kramer	e4dff62f64	The canonical way to XFAIL a test for all targets is XFAIL: , not XFAIL: Fix the lit bug that enabled this "feature" (empty triple is substring of all possible target triples) and change the two outliers to use the documented syntax. llvm-svn: 259799	2016-02-04 16:21:38 +00:00
Renato Golin	c455e2f441	[PPC] Move PPC test to a PPC-specific dir llvm-svn: 259797	2016-02-04 16:14:59 +00:00
Simon Pilgrim	6788f33cf2	[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load. Differential Revision: http://reviews.llvm.org/D16729 llvm-svn: 259796	2016-02-04 16:12:56 +00:00
Chad Rosier	18896c0f5e	Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR." This reverts commit r259790. tramp3d-v4 is still having problems. llvm-svn: 259795	2016-02-04 16:01:40 +00:00
Simon Pilgrim	528e94e9a2	[X86][SSE] Added i686 target tests to make sure we are correctly loading consecutive entries as 64-bit integers llvm-svn: 259794	2016-02-04 15:51:55 +00:00
Elena Demikhovsky	86528270b9	AVX-512: Fixed a bug in FMA instruction selection on KNL The FMA instruction was selected from AVX2 set instead of AVX-512 Differential Revision: http://reviews.llvm.org/D16884 llvm-svn: 259792	2016-02-04 15:11:11 +00:00
Petar Jovanovic	23e44f5e39	[Power PC] softening long double type This patch implements softening of long double type (ppcf128) on ppc32 architecture and enables operations for this type for soft float. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D15811 llvm-svn: 259791	2016-02-04 14:43:50 +00:00
Chad Rosier	feec2aeb0f	[AArch64] Improve load/store optimizer to handle LDUR + LDR. This patch allows the mixing of scaled and unscaled load/stores to form load/store pairs. PR24465 http://reviews.llvm.org/D12116 Many thanks to Ahmed and Michael for fixes and code review. This is a reapplication of r246769, which was reverted in r246782 due to a test-suite failure. I'm unable to reproduce the issue at this time. llvm-svn: 259790	2016-02-04 14:42:55 +00:00
Michael Zuckerman	7d73360479	[AVX512] add vfmadd132ss and vfmadd132sd Intrinsic Differential Revision: http://reviews.llvm.org/D16589 llvm-svn: 259789	2016-02-04 14:41:08 +00:00
Simon Pilgrim	8159cf11bc	[X86] Add AVX512 vector zext tests llvm-svn: 259786	2016-02-04 14:06:19 +00:00
Jingyue Wu	f650441b04	[NVPTX] Disable performance optimizations when OptLevel==None Reviewers: jholewinski, tra, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16874 llvm-svn: 259749	2016-02-04 04:15:36 +00:00
Nemanja Ivanovic	155402c9c2	Test case for PR 26381 llvm-svn: 259740	2016-02-04 01:58:20 +00:00
Wei Mi	a49559befb	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some tranformation on the SCEV generated, then expands the transformed SCEV and insert the expanded value into new function. Because SCEV expansion may reuse value cached in ExprValueMap, the value in old function may be inserted into new function, which is wrong. In SCEVExpander::expand, there is a logic to check the cached value to be used should dominate the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function. However for unreachable node, DominatorTreeBase::dominates thinks it will be dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion should be in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal value cached. Although in the cached value set in ExprValueMap, there is a Constant type value, but it is not easy to find it out -- the cached Value set is not sorted according to the potential cost. Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, don't try the reuse logic. simply expand it. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259736	2016-02-04 01:27:38 +00:00
Tim Shen	f99f0d5a7e	[SelectionDAG] Fix CombineToPreIndexedLoadStore O(n^2) behavior This patch consists of two parts: a performance fix in DAGCombiner.cpp and a correctness fix in SelectionDAG.cpp. The test case tests the bug that's uncovered by the performance fix, and fixed by the correctness fix. The performance fix keeps the containers required by the hasPredecessorHelper (which is a lazy DFS) and reuse them. Since hasPredecessorHelper is called in a loop, the overall efficiency reduced from O(n^2) to O(n), where n is the number of SDNodes. The correctness fix keeps iterating the neighbor list even if it's time to early return. It will return after finishing adding all neighbors to Worklist, so that no neighbors are discarded due to the original early return. llvm-svn: 259691	2016-02-03 20:58:55 +00:00
Saleem Abdulrasool	f36005a358	ARM: support TLS for WoA Add support for TLS access for Windows on ARM. This generates a similar access to MSVC for ARM. The changes to the tablegen data is needed to support loading an external symbol global that is not for a call. The adjustments to the DAG to DAG transforms are needed to preserve the 32-bit move. llvm-svn: 259676	2016-02-03 18:21:59 +00:00
Wei Mi	97de385868	Revert r259662, which caused regressions on polly tests. llvm-svn: 259675	2016-02-03 18:05:57 +00:00
Jonas Paulsson	ac29f01788	[ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependecies rewritten. Recommited, after some fixing with test cases. Updated test cases: test/CodeGen/AArch64/arm64-misched-memdep-bug.ll test/CodeGen/AArch64/tailcall_misched_graph.ll Temporarily disabled test cases: test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated) test/CodeGen/PowerPC/vsx-fma-m.ll test/CodeGen/PowerPC/vsx-fma-sp.ll http://reviews.llvm.org/D8705 Reviewers: Hal Finkel, Andy Trick. llvm-svn: 259673	2016-02-03 17:52:29 +00:00
Wei Mi	ed133978a0	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259662	2016-02-03 17:05:12 +00:00
Renato Golin	6027dd38ef	[ARM] Move GNUEABI divmod to __aeabi_divmod* The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores which happens to be a lot faster than __divsi3/__modsi3 when the core has hardware divide instructions. Do the same here. Fixes PR26450. llvm-svn: 259657	2016-02-03 16:10:54 +00:00
Simon Atanasyan	e774126c96	[mips] Add SHF_MIPS_GPREL flag to the MIPS .sbss and .sdata sections MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL flag. See Figure 4–7 on page 69 in the following document: ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf. Differential Revision: http://reviews.llvm.org/D15740 llvm-svn: 259641	2016-02-03 11:50:22 +00:00
Simon Pilgrim	18bcf93efb	[X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads Follow up to D16217 and D16729 This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD Differential Revision: http://reviews.llvm.org/D16768 llvm-svn: 259635	2016-02-03 09:41:59 +00:00
Kyle Butt	d62d8b771d	Codegen: [PPC] Fix PPCVSXFMAMutate to handle duplicates. The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms on PPC. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7 ;v6 = v6 + v5 * v7 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96 ;v5 = v5 * v7 + v96 This was broken in the case where the target register was also used as a multiplicand. Fix this case by checking for it and replacing both uses with the copied register. %vreg6<def> = COPY %vreg96 %vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6 ;v6 = v6 + v5 * v6 is replaced by %vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96 ;v5 = v5 * v96 + v96 llvm-svn: 259617	2016-02-03 01:41:09 +00:00
Yunzhong Gao	eb959722a7	Revert r259576: Disable the vzeroupper insertion pass on PS4. Will re-implement based on review feedback. llvm-svn: 259615	2016-02-03 01:25:12 +00:00
Yunzhong Gao	b76ccacfb1	Disable the vzeroupper insertion pass on PS4. See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation. Original patch by: Sean Silva llvm-svn: 259576	2016-02-02 21:39:23 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	7e747f1a38	AMDGPU: Handle promoting memmove Also add missing tests for the others. llvm-svn: 259558	2016-02-02 20:28:10 +00:00
Quentin Colombet	b8fb2ba1bb	[X86] Fix the merging of SP updates in prologue/epilogue insertions. When the merging was involving LEAs, we were taking the wrong immediate from the list of operands. rdar://problem/24446069 llvm-svn: 259553	2016-02-02 20:11:17 +00:00
Matt Arsenault	8b175672cb	AMDGPU: Skip promote alloca with no optimizations llvm-svn: 259551	2016-02-02 19:32:42 +00:00
Matt Arsenault	ad1348459f	AMDGPU: Whitelist handled intrinsics We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546	2016-02-02 19:18:53 +00:00
Matt Arsenault	853a1fc6d9	AMDGPU: Use inbounds when calculating workitem offset When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545	2016-02-02 19:18:48 +00:00
Oliver Stannard	7e7d983a87	Refactor backend diagnostics for unsupported features Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498	2016-02-02 13:52:43 +00:00
Simon Pilgrim	96fe4ef5f7	[X86][AVX512] Add support for AVX512 VMOVQ (load) shuffle decoding llvm-svn: 259496	2016-02-02 13:32:56 +00:00

1 2 3 4 5 ...

14887 Commits