llvm-project

Author	SHA1	Message	Date
Craig Topper	24c3a2395f	[AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support. llvm-svn: 291747	2017-01-12 06:49:12 +00:00
Craig Topper	69ab67b279	[AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq. llvm-svn: 291746	2017-01-12 06:49:08 +00:00
Elad Cohen	c5ba925ef2	[X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask r289653 added a case where `vselect <cond> <vector1> <all-zeros>` is transformed to: `vselect xor(cond, DAG.getConstant(1, DL, CondVT)) <all-zeros> <vector1>` This was not intended to catch cases where Cond is not a vXi1 mask, but it does. Moreover, when the Cond type is vXiN (N > 1), xor(cond, DAG.getConstant(1, DL, CondVT)) != NOT(cond). This patch changes the above to xor with all-ones, and avoids entering the case for non-mask Conds. llvm-svn: 291745	2017-01-12 06:49:03 +00:00
Simon Pilgrim	0c1faf432b	Remove trailing whitespace. NFCI. llvm-svn: 291680	2017-01-11 16:38:20 +00:00
Elena Demikhovsky	9d0e7c33d3	X86 CodeGen: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512 and by VPACKUS* instructions on SSE* targets. Differential Revision: https://reviews.llvm.org/D28216 llvm-svn: 291670	2017-01-11 12:59:32 +00:00
Simon Pilgrim	5a81fefad3	[X86][AVX512BW] Vectorize v64i8 vector shifts Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665	2017-01-11 10:36:51 +00:00
Hans Wennborg	6573976f57	Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) This was reverted because it would miscompile code where the cmp had multiple uses. That was due to a deficiency in the existing code, which was fixed in r291630 (see the PR for details). This re-commit includes an extra test for the kind of code that got miscompiled: @test_sub_1_setcc_jcc. llvm-svn: 291640	2017-01-11 01:36:57 +00:00
Hans Wennborg	12de693747	[X86] Don't run combineSetCCAtomicArith() when the cmp has multiple uses We would miscompile the following: void g(int); int f(volatile long long *p) { bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0; g(b ? 12 : 34); return b ? 56 : 78; } into pushq %rax lock incq (%rdi) movl $12, %eax movl $34, %edi cmovlel %eax, %edi callq g(int) testq %rax, %rax <---- Bad. movl $56, %ecx movl $78, %eax cmovsl %ecx, %eax popq %rcx retq because the code failed to take into account that the cmp has multiple uses, replaced one of them, and left the other one comparing garbage. llvm-svn: 291630	2017-01-11 00:49:54 +00:00
Michael Zuckerman	bcd03e7f3b	[X86][AVX512] Improving shuffle lowering by using AVX-512 EXPAND* instructions This patch fixes PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351 1. This patch adds a new type of shuffle lowering. 2. We can use the expand instruction when the shuffle pattern is as follows: { 0a[0]0a[1]...0*a[n] , n >=0 where the a[] elements are in ascending order}. Reviewers: 1. igorb 2. guyblank 3. craig.topper 4. RKSimon Differential Revision: https://reviews.llvm.org/D28352 llvm-svn: 291584	2017-01-10 18:57:17 +00:00
Craig Topper	2ed461e5c4	[X86] When lowering uniform shifts, use X86ISD::VZEXT instead of using a ZERO_EXTEND_VECTOR_INREG. If we emit the ZERO_EXTEND_VECTOR_INREG too late it doesn't get lowered properly and makes it through to isel and fails. Fixes PR31593. llvm-svn: 291535	2017-01-10 04:12:24 +00:00
Michael Kuperstein	1559e8863e	Revert r291092 because it introduces a crash. See PR31589 for details. llvm-svn: 291478	2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov	d497d36083	X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB. Differential Revision: https://reviews.llvm.org/D28087 llvm-svn: 291473	2017-01-09 20:26:17 +00:00
Simon Pilgrim	0f23b2ba1a	[X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets. Cost model updates to follow. llvm-svn: 291451	2017-01-09 17:20:03 +00:00
Simon Pilgrim	d990cd371b	[X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will already have lowered with vpsravw). Cost model updates to follow. llvm-svn: 291445	2017-01-09 15:15:45 +00:00
Craig Topper	f51ba1e3da	[AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX. llvm-svn: 291402	2017-01-08 21:32:30 +00:00
Sanjay Patel	bf51c8a975	[x86] fix usage of stale operands when lowering select I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR. The debug output shows nodes like this: t4: i32 = xor t2, Constant:i32<-1> t21: i8 = setcc t4, Constant:i32<0>, setlt:ch t14: i32 = select t21, t4, Constant:i32<-1> And because the select is holding onto the t4 (xor) node while EmitTest creates a new x86-specific xor node, the lowering results in: t4: i32 = xor t2, Constant:i32<-1> t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1> t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1 Differential Revision: https://reviews.llvm.org/D28374 llvm-svn: 291392	2017-01-08 15:53:40 +00:00
Simon Pilgrim	a1b8e2c725	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470) llvm-svn: 291347	2017-01-07 15:37:50 +00:00
Simon Pilgrim	3128d6b520	[X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI. Early step towards ignoring domain above a certain shuffle depth. llvm-svn: 291248	2017-01-06 17:34:30 +00:00
Simon Pilgrim	bd3c6824d4	[X86][SSE] Simplify float domain requirement in unary shuffle matching. The AVX1-only limit is never actually required in matchUnaryVectorShuffle llvm-svn: 291244	2017-01-06 17:00:59 +00:00
Simon Pilgrim	a08d7b9913	Remove trailing whitespace. NFCI. llvm-svn: 291240	2017-01-06 15:31:52 +00:00
Simon Pilgrim	9b8c7caf4e	[X86] Add X86Subtarget argument. NFCI. All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it. llvm-svn: 291239	2017-01-06 15:29:17 +00:00
Craig Topper	e86fb932ea	[AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp. llvm-svn: 291214	2017-01-06 05:18:48 +00:00
Sanjay Patel	dea5a7bd53	less braces; NFC llvm-svn: 291126	2017-01-05 16:47:32 +00:00
Zvi Rackover	4b7d724d62	[X86] Optimize vector shifts with variable but uniform shift amounts Summary: For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register. The lower 64 bits of the register are evaluated to determine the shift amount. This patch improves the construction of the vector containing the shift amount. Reviewers: craig.topper, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28353 llvm-svn: 291120	2017-01-05 15:11:43 +00:00
Elena Demikhovsky	143cbc425b	AVX-512: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. Differential revision: https://reviews.llvm.org/D28216 llvm-svn: 291092	2017-01-05 08:21:09 +00:00
Eric Christopher	568c113ac0	Remove dead and unused variable NumSentinelElements. Fixes PR31529. llvm-svn: 290998	2017-01-04 20:05:18 +00:00
Simon Pilgrim	c76ea4b638	[X86] Attempt to pre-truncate arithmetic operations if useful In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 llvm-svn: 290947	2017-01-04 08:05:42 +00:00
Craig Topper	d0aa53b9ae	[AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources. These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946	2017-01-04 07:32:03 +00:00
Craig Topper	83115a809f	[AVX-512] Simplify code for creating 512-bit SHUF128 operations. We don't need two loops and we can safely assume and hardcode the size of the widened mask. llvm-svn: 290942	2017-01-04 07:31:51 +00:00
Craig Topper	48d232d3e7	[X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle. llvm-svn: 290872	2017-01-03 07:36:41 +00:00
Craig Topper	785e58fdc9	[AVX-512] Simplify the code added in r290870 to recognize 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask. llvm-svn: 290871	2017-01-03 07:36:39 +00:00
Craig Topper	9496e3f916	[AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts. llvm-svn: 290870	2017-01-03 07:00:40 +00:00
Craig Topper	c849172105	[AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation. llvm-svn: 290865	2017-01-03 05:46:02 +00:00
Craig Topper	0cda8bbf74	[AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864	2017-01-03 05:45:57 +00:00
Reid Kleckner	cd46c1df80	Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64" This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714	2016-12-29 17:07:10 +00:00
Reid Kleckner	c9e0a153cf	[COFF] Use 32-bit jump table entries in .rdata for Win64 Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694	2016-12-29 00:12:39 +00:00
Craig Topper	f56d985f77	[AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532	2016-12-26 01:40:17 +00:00
Michael Zuckerman	86602e85dd	revert commit 290516 llvm-svn: 290517	2016-12-25 12:45:18 +00:00
Michael Zuckerman	45aa420640	Test commit: added a new empty line llvm-svn: 290516	2016-12-25 12:01:34 +00:00
Simon Pilgrim	081abbb164	[X86][SSE] Improve lowering of vXi64 multiplies As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32) + psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267	2016-12-21 20:00:10 +00:00
Elena Demikhovsky	7c7bf1b432	Added a template for building a target-specific memory node in the DAG. I added an API for creating a target-specific memory node in the DAG. Today, all memory nodes are common to all targets, and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node: a truncation-with-saturation store or a float-to-half store. In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern. Differential Revision: https://reviews.llvm.org/D27899 llvm-svn: 290250	2016-12-21 10:43:36 +00:00
Oren Ben Simhon	cb692157b7	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support Fixing a warning. llvm-svn: 290248	2016-12-21 09:47:31 +00:00
Oren Ben Simhon	3b95157090	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it. This submission also includes additional lit tests to better cover HVA corner cases. Differential Revision: https://reviews.llvm.org/D27392 llvm-svn: 290240	2016-12-21 08:31:45 +00:00
Simon Pilgrim	688114d888	[X86][SSE] Ensure we're only combining shuffles with legal mask types. I haven't managed to get this to fail yet, but it's technically possible for the AND -> shuffle decomposition to result in illegal types. llvm-svn: 290183	2016-12-20 17:09:52 +00:00
Daniel Jasper	373f9a6a0c	Revert r289955 and r289962. This is causing lots of ASAN failures for us. Not sure whether it causes an ASAN false positive, whether it actually leads to incorrect code, or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066	2016-12-18 14:36:38 +00:00
Simon Pilgrim	e940daf532	[X86][SSE] Add support for combining target shuffles to SHUFPS. As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064	2016-12-18 14:26:02 +00:00
Craig Topper	7029db0eaa	[X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060	2016-12-18 07:54:23 +00:00
Hans Wennborg	ef57755427	Fix -Wself-assign from r289955 llvm-svn: 289962	2016-12-16 17:16:46 +00:00
Hans Wennborg	35f21cba13	[X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does. This targets a pattern that occurs frequently with reference counting pointers: void decrement(long volatile *ptr) { if (_InterlockedDecrement(ptr) == 0) release(); } Clang would previously compile it (for 32-bit at -Os) as: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: 31 c9 xor %ecx,%ecx 6: 49 dec %ecx 7: f0 0f c1 08 lock xadd %ecx,(%eax) b: 83 f9 01 cmp $0x1,%ecx e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14> 14: c3 ret and with this patch it becomes: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: f0 ff 08 lock decl (%eax) 7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd> d: c3 ret (Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.) Differential Revision: https://reviews.llvm.org/D27781 llvm-svn: 289955	2016-12-16 16:34:59 +00:00
Simon Pilgrim	4b73c3de50	[X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling DAG.getVectorShuffle (PR27885) We've already done the hard work of ensuring the mask is safe for 'SHUFPS'. llvm-svn: 289950	2016-12-16 15:23:32 +00:00

4344 Commits
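
Commit c5ba925ef2 (PR31515) hinges on the difference between a vXi1 mask and a wider condition vector. The following is a minimal C sketch. It assumes that one 32-bit lane of a vXi32 condition, where "true" is all ones, is a fair model. The helper names are hypothetical, and select_lane models the condition by its sign bit, the way blend instructions such as BLENDVPS do.

```c
#include <stdint.h>

/* Hypothetical helpers modelling one lane of a vXi32 vselect condition,
   where "true" is all ones (0xFFFFFFFF) and "false" is zero. */

/* Only a real NOT for a 1-bit (vXi1) lane. */
uint32_t invert_lane_with_one(uint32_t lane) { return lane ^ 1u; }

/* A real NOT for a lane of any width. */
uint32_t invert_lane_with_allones(uint32_t lane) { return lane ^ UINT32_MAX; }

/* vselect on one lane, reading the condition from its sign bit. */
uint32_t select_lane(uint32_t cond, uint32_t a, uint32_t b) {
    return (cond >> 31) ? a : b;
}
```

An xor with 1 turns an all-ones "true" lane into 0xFFFFFFFE. That value still reads as true, so the select is not actually flipped. An xor with all ones produces 0, as intended.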
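
Commits 9d0e7c33d3 and 143cbc425b match truncation with unsigned saturation. The scalar semantics can be sketched in C as an unsigned clamp followed by a plain truncate. The function names are hypothetical, and the commit messages do not spell out the precise DAG forms that the combine matches. The SSE PACKUS* instructions saturate from a signed source, and this unsigned-only sketch does not model that.

```c
#include <stddef.h>
#include <stdint.h>

/* Truncation with unsigned saturation, u32 -> u8:
   clamp to the destination maximum (an unsigned min), then truncate. */
uint8_t trunc_usat_u32_to_u8(uint32_t x) {
    uint32_t clamped = x < UINT8_MAX ? x : UINT8_MAX; /* umin(x, 255) */
    return (uint8_t)clamped;                          /* truncate */
}

/* Loop form of the same pattern; a vectorizer may turn this into a
   vector umin followed by a vector truncate. */
void trunc_usat_array(const uint32_t *src, uint8_t *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = trunc_usat_u32_to_u8(src[i]);
}
```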
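
The reproducer from 12de693747 can be made self-contained as follows. It is written in C rather than the original C++, uses a hypothetical stub g() that records its argument, and assumes GCC or Clang for the __atomic builtins. Both uses of b must see the same comparison result. The miscompiled code instead tested an unrelated value in %rax for the second use.

```c
#include <stdbool.h>

static int last_g_arg; /* what the stub g() last received */

void g(int x) { last_g_arg = x; } /* hypothetical stub for checking */

int f(volatile long long *p) {
    /* __atomic_fetch_add returns the value *before* the addition. */
    bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0;
    g(b ? 12 : 34);
    return b ? 56 : 78; /* must agree with the argument passed to g */
}
```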
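
Commit bcd03e7f3b lowers shuffles of the form {0, a[0], 0, a[1], ...} with EXPAND. The following is a scalar C model of a zero-masking 32-bit expand. It assumes that the mask bits mark the non-zero positions of the shuffle, and the function name is hypothetical.

```c
#include <stdint.h>

/* Zero-masking expand over up to 16 lanes: consecutive elements of src are
   written, in order, to the lanes whose mask bit is set; every other lane
   is zeroed. */
void expand_epi32_maskz(uint16_t mask, const int32_t *src,
                        int32_t *dst, int nlanes) {
    int next = 0; /* index of the next source element to consume */
    for (int i = 0; i < nlanes; ++i)
        dst[i] = ((mask >> i) & 1u) ? src[next++] : 0;
}
```

With mask 0b1010 and source {10, 20, 30, 40}, the result is {0, 10, 0, 20}. This is the {0, a[0], 0, a[1]} shape from the commit message.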
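
In d497d36083, the fused pattern is a multiply feeding ADDSUB, which subtracts in even lanes and adds in odd lanes. Here is a scalar C sketch of that lane pattern (the function name is hypothetical):

```c
/* MUL followed by ADDSUB, lane by lane: even lanes compute a*b - c,
   odd lanes compute a*b + c. FMADDSUB computes the same pattern with
   a single rounding per lane. */
void mul_addsub(const double *a, const double *b, const double *c,
                double *dst, int n) {
    for (int i = 0; i < n; ++i) {
        double prod = a[i] * b[i];           /* the MUL */
        dst[i] = (i % 2 == 0) ? prod - c[i]  /* even lanes subtract */
                              : prod + c[i]; /* odd lanes add */
    }
}
```

FMADDSUB rounds once per lane, while the separate MUL and ADDSUB round twice, so results can differ in the last bit. FMA formation in LLVM is generally gated on floating-point contraction being allowed. The commit message does not say how this patch handles that.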
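
Commits 0f23b2ba1a and d990cd371b shift narrow elements by widening them first. The following is a scalar C sketch of extend+shift+truncate for per-element i8 shifts. It assumes shift amounts below 8, uses hypothetical function names, and relies on >> of a negative int being arithmetic, as it is on GCC and Clang.

```c
#include <stdint.h>

/* Logical left shift: zero-extend to 32 bits, shift, truncate back. */
void shl_v16i8(const uint8_t x[16], const uint8_t amt[16], uint8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = (uint8_t)((uint32_t)x[i] << amt[i]);
}

/* Arithmetic right shift: sign-extend so the sign bit is replicated,
   shift at 32 bits, truncate back (the result always fits in int8_t). */
void sra_v16i8(const int8_t x[16], const uint8_t amt[16], int8_t out[16]) {
    for (int i = 0; i < 16; ++i)
        out[i] = (int8_t)((int32_t)x[i] >> amt[i]);
}
```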
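
Commit 4b7d724d62 concerns building the XMM operand for PSLLW/PSLLD/PSLLQ. The hardware reads the whole low 64 bits as one count, so splatting a 32-bit amount into every lane does not work. Any construction of that operand has to respect this constraint. The following C model of one PSLLD lane illustrates it (the helper names are hypothetical):

```c
#include <stdint.h>

/* One PSLLD lane: the count is the full low 64 bits of the XMM operand,
   and any count above 31 clears the lane. */
uint32_t pslld_lane(uint32_t x, uint64_t count64) {
    return count64 > 31 ? 0 : x << count64;
}

/* The low 64 bits of an XMM register whose first two 32-bit lanes
   hold lane0 and lane1. */
uint64_t low64(uint32_t lane0, uint32_t lane1) {
    return (uint64_t)lane1 << 32 | lane0;
}
```

A splatted amount of 3 reads as 0x0000000300000003, which is far above 31, so every lane is cleared. The upper 32 bits of the low quadword must therefore be zero.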
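
Commit c76ea4b638 relies on the following identity. For add, mul, and, or, and xor, the low bits of a result depend only on the low bits of the inputs, so truncating before the op gives the same value as truncating after it. Here is a C check for 64-to-32-bit truncation. The function names are hypothetical, and the sketch assumes a 32-bit int so that uint32_t arithmetic wraps.

```c
#include <stdint.h>

/* TRUNC(BINOP(X, Y)) versus BINOP(TRUNC(X), TRUNC(Y)) for mul and add. */
uint32_t mul_then_trunc(uint64_t x, uint64_t y) { return (uint32_t)(x * y); }
uint32_t trunc_then_mul(uint64_t x, uint64_t y) { return (uint32_t)x * (uint32_t)y; }

uint32_t add_then_trunc(uint64_t x, uint64_t y) { return (uint32_t)(x + y); }
uint32_t trunc_then_add(uint64_t x, uint64_t y) { return (uint32_t)x + (uint32_t)y; }
```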
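
The following models the vXi64 multiply from 081abbb164 on one 64-bit lane in C. pmuludq_lane is a hypothetical scalar stand-in for PMULUDQ, which multiplies the low 32 bits of each 64-bit lane into a full 64-bit product.

```c
#include <stdint.h>

/* Scalar model of one PMULUDQ lane. */
static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint32_t)b;
}

/* Low 64 bits of a * b, using the improved sequence: the two cross terms
   are summed first and shifted left only once. */
uint64_t mul64_via_pmuludq(uint64_t a, uint64_t b) {
    uint64_t alo_blo = pmuludq_lane(a, b);
    uint64_t alo_bhi = pmuludq_lane(a, b >> 32);
    uint64_t ahi_blo = pmuludq_lane(a >> 32, b);
    return alo_blo + ((alo_bhi + ahi_blo) << 32);
}
```

The a_hi*b_hi term would land at bit 64, so it vanishes modulo 2^64. Only the low 32 bits of the cross-term sum survive the shift, so adding the two cross terms before a single shift is exact.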
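
FABS, FNEG, FNABS, and FCOPYSIGN come down to sign-bit masks. That is why 7029db0eaa can run FAND/FOR/FXOR as integer AND/OR/XOR without changing results. The following is a C sketch on float bit patterns. The function names are hypothetical, and the operation-to-node mapping in the comments is the usual one, assumed rather than taken from the patch.

```c
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float float_of(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

float fabs_bits(float x)  { return float_of(bits_of(x) & 0x7FFFFFFFu); } /* FAND: clear sign */
float fneg_bits(float x)  { return float_of(bits_of(x) ^ 0x80000000u); } /* FXOR: flip sign  */
float fnabs_bits(float x) { return float_of(bits_of(x) | 0x80000000u); } /* FOR:  set sign   */

/* FCOPYSIGN: magnitude bits from mag, sign bit from sgn (FAND + FOR). */
float copysign_bits(float mag, float sgn) {
    return float_of((bits_of(mag) & 0x7FFFFFFFu) | (bits_of(sgn) & 0x80000000u));
}
```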
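
The following rewrites the reference-counting pattern targeted by 35f21cba13 with C11 <stdatomic.h>, as a portable stand-in for _InterlockedDecrement. Here, release() is a hypothetical stub that counts calls. Because fetch_sub returns the old value, testing "old == 1" is the same as testing "new == 0", and the flags from a single lock decl already provide that result.

```c
#include <stdatomic.h>

static int release_calls;                    /* counts calls to the stub */
static void release(void) { ++release_calls; } /* hypothetical stub */

void decrement(atomic_long *ptr) {
    /* atomic_fetch_sub returns the value before the subtraction,
       so old == 1 means the counter just reached zero. */
    if (atomic_fetch_sub(ptr, 1) == 1)
        release();
}
```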