llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	07cfc80186	Remove trailing whitespace. NFCI. llvm-svn: 305476	2017-06-15 16:20:27 +00:00
Simon Pilgrim	4d432b2c6b	[X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors We can use this with v16i16/v32i16 as well. Found during fuzz testing. llvm-svn: 305472	2017-06-15 14:52:30 +00:00
Simon Pilgrim	b98cb3808c	Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). This is causing windows buildbot failures llvm-svn: 305470	2017-06-15 14:39:34 +00:00
Ayman Musa	56912cda71	[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 305465	2017-06-15 13:02:37 +00:00
Sanjay Patel	83cb007940	[x86] avoid unnecessary shuffle mask math in combineX86ShufflesRecursively() This is a follow-up to https://reviews.llvm.org/D34174 / https://reviews.llvm.org/rL305398. We mentioned replacing the multiplies with shifts, but the real win seems to be in bypassing the extra ops in the common case when the RootRatio and OpRatio are one. This gives us another 1-2% overall win for the test in PR32037: https://bugs.llvm.org/show_bug.cgi?id=32037 llvm-svn: 305414	2017-06-14 20:37:11 +00:00
Sanjay Patel	ce0b99563a	[x86] replace div/rem with shift/mask for better shuffle combining perf We know that shuffle masks are power-of-2 sizes, but there's no way (?) for LLVM to know that, so hack combineX86ShufflesRecursively() to be much faster by replacing div/rem with shift/mask. This makes the motivating compile-time bug in PR32037 ( https://bugs.llvm.org/show_bug.cgi?id=32037 ) about 9% faster overall. Differential Revision: https://reviews.llvm.org/D34174 llvm-svn: 305398	2017-06-14 17:00:57 +00:00
Simon Pilgrim	9ff06a0c7e	Strip UTF8 BOM that got added in rL305091 Seems my recent move to VS2017 has resulted in a few text editor issues..... llvm-svn: 305285	2017-06-13 10:17:57 +00:00
Simon Pilgrim	2b3b717768	[X86][SSE] Refactor getTargetConstantBitsFromNode to avoid large APInts (PR32037) Much of PR32037's compile time regression is due to getTargetConstantBitsFromNode always creating large (>64bit) APInts during the bitcasting from the source data to the destination bitwidth. This commit avoids this bitcast stage if the data is already the correct bitwidth. llvm-svn: 305284	2017-06-13 10:13:48 +00:00
Sanjay Patel	d4765a38b4	[DAG] add helper to bind memop chains; NFCI This step is just intended to reduce code duplication rather than change any functionality. A follow-up would be to replace PPCTargetLowering::spliceIntoChain() usage with this new helper. Differential Revision: https://reviews.llvm.org/D33649 llvm-svn: 305192	2017-06-12 14:41:48 +00:00
Sanjay Patel	dcbfbb11d9	[x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a 32-byte load I was looking closer at the x86 test diffs in D33866, and the first change seems like it shouldn't happen in the first place. So this patch will resolve that. Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for any given CPU model, so we should be able to interchange those without affecting perf. But as we can see in some of the diffs here, using vperm2f128 allows load folding, so we should take that opportunity to reduce code size and register pressure. A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 was introduced with AVX1, we should be selecting it in all of the same situations that we would with AVX2. If there's some reason that an AVX1 CPU would not want to use this instruction, that should be fixed up in a later pass. Differential Revision: https://reviews.llvm.org/D33938 llvm-svn: 305171	2017-06-11 21:18:58 +00:00
Simon Pilgrim	3d37b1a277	[X86][SSE] Add support for PACKSS nodes to faux shuffle extraction If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle llvm-svn: 305091	2017-06-09 17:29:52 +00:00
Andrew V. Tischenko	e0531025f8	This patch closes PR28513: an optimization of multiplication by different constants. The initial patch was rejected: I fixed the issue and re-apply it. llvm-svn: 304972	2017-06-08 10:20:13 +00:00
Sanjay Patel	6e8e7cc70e	[x86] avoid flipping sign bits for vector icmp by using known bits If we know that both operands of an unsigned integer vector comparison are non-negative, then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality integer vector compare predicate provided by SSE/AVX). We're intentionally not changing the condition code to signed in order to preserve the existing transforms that use min/max/psubus below here. This should solve PR33276: https://bugs.llvm.org/show_bug.cgi?id=33276 Differential Revision: https://reviews.llvm.org/D33862 llvm-svn: 304909	2017-06-07 13:46:34 +00:00
Simon Pilgrim	58f5be2771	[X86][SSE] Fix an issue with PEXTRW/PEXTRB indices during shuffle combining We were checking that the index was in range of the destination vector type, not the (larger) source vector type llvm-svn: 304894	2017-06-07 10:30:35 +00:00
Simon Pilgrim	b2ef948628	[X86][AVX1] Split 256-bit vector non-temporal loads to keep it non-temporal (PR32744) Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304718	2017-06-05 16:02:01 +00:00
Simon Pilgrim	46dd55f1e1	[X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 1> Step 2: unpcklps X, Y ==> <3, 2, 1, 0> The issue is because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc. Instead, this patch unpacks progressively larger sequential vector elements together: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 1, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 2> Step 2: unpcklpd X, Y ==> <3, 2, 1, 0> This does mean that we are creating UNPCKL shuffle of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree. Differential Revision: https://reviews.llvm.org/D33864 llvm-svn: 304688	2017-06-04 20:12:04 +00:00
Simon Pilgrim	f93debb40c	[X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combining Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. llvm-svn: 304659	2017-06-03 11:12:57 +00:00
Sanjay Patel	e737cf8500	[x86] simplify code for vector icmp pred transforms; NFCI Organizing by transform is smaller and easier to read than a squashed switch with fall-throughs. llvm-svn: 304611	2017-06-02 23:21:53 +00:00
Ahmed Bougacha	018a68f9e4	[X86] Correctly broadcast NaN-like integers as float on AVX. Since r288804, we try to lower build_vectors on AVX using broadcasts of float/double. However, when we broadcast integer values that happen to have a NaN float bitpattern, we lose the NaN payload, thereby changing the integer value being broadcast. This is caused by ConstantFP::get, to which we pass the splat i32 as a float (by bitcasting it using bitsToFloat). ConstantFP::get takes a double parameter, so we end up lossily converting a single-precision NaN to double-precision. Instead, avoid any kinds of conversions by directly building an APFloat from the splatted APInt. Note that this also fixes another piece of code (broadcast of subvectors), that currently isn't susceptible to the same problem. Also note that we could really just use APInt and ConstantInt throughout: the constant pool type doesn't matter much. Still, for consistency, use the appropriate type. llvm-svn: 304590	2017-06-02 20:02:59 +00:00
Sanjay Patel	469014ada4	[x86] fix formatting; NFCI llvm-svn: 304576	2017-06-02 18:14:31 +00:00
David Blaikie	b6b42e018a	Tidy up a bit of r304516, use SmallVector::assign rather than for loop This might give a few better opportunities to optimize these to memcpy rather than loops - also a few minor cleanups (StringRef-izing, templating (to avoid std::function indirection), etc). The SmallVector::assign(iter, iter) could be improved with the use of SFINAE, but the (iter, iter) ctor and append(iter, iter) need it too and don't have it - so, work around it for now rather than bothering with the added complexity. (also, as noted in the added FIXME, these assign ops could potentially be optimized better at least for non-trivially-copyable types) llvm-svn: 304566	2017-06-02 17:24:26 +00:00
Amaury Sechet	2adb7bdbca	Remove ADDC, ADDE, SUBC, SUBE and SETCCE support from the X86 backend, use the CARRY ops instead. Summary: As per title. This cleans up some technical debt. Depends on D33374 Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33390 llvm-svn: 304435	2017-06-01 16:33:08 +00:00
Zvi Rackover	7693733e80	[X86] Match bitcast of vxi1 to pmovmsk Summary: Add an early combine to match patterns such as: (i16 bitcast (v16i1 x)) -> (i16 movmsk (v16i8 sext (v16i1 x))) This combine needs to happen early enough before type-legalization scalarizes the result of the setcc. Reviewers: igorb, craig.topper, RKSimon Subscribers: delena, llvm-commits Differential Revision: https://reviews.llvm.org/D33311 llvm-svn: 304406	2017-06-01 11:27:57 +00:00
Amaury Sechet	251ea8a4f8	Do not legalize large setcc with setcce, introduce setcccarry and do it with usubo/setcccarry. Summary: This is a continuation of the work started in D29872 . Passing the carry down as a value rather than as a glue allows for further optimizations. Introducing setcccarry makes the use of addc/subc unnecessary and we can start the removal process. This patch only introduces the optimization strictly required to get the same level of optimization as was available before, nothing more. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33374 llvm-svn: 304404	2017-06-01 11:14:17 +00:00
Amaury Sechet	6506a90a70	Remove ISD::SETCC match from combineX86ADD. It's done improperly and doesn't work. llvm-svn: 304403	2017-06-01 11:13:10 +00:00
Galina Kistanova	6ad77845e2	Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC. llvm-svn: 304312	2017-05-31 17:10:03 +00:00
Vedant Kumar	87aefe9042	Revert "This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level." This reverts commit r304209. I think this change is responsible for a tablegen failure in stage2 builds: http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/2171/ I reproduced the failure locally (without ThinLTO), reverted the commit, rebuilt the stage1 clang, rebuilt the stage2 llvm-tblgen tool, and found that the crash disappears when the commit is reverted. Here is the stack trace: FAILED: lib/Target/ARM/ARMGenRegisterBank.inc.tmp cd /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM && /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp 0 llvm-tblgen 0x0000000106fc9568 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40 1 llvm-tblgen 0x0000000106fc9be6 SignalHandler(int) + 422 2 libsystem_platform.dylib 0x00000001076a7fba _sigtramp + 26 3 libsystem_platform.dylib 0x00007fff58deb468 _sigtramp + 1366570184 4 llvm-tblgen 0x0000000106e89cc7 llvm::CodeGenRegBank::getCompositeSubRegIndex(llvm::CodeGenSubRegIndex, llvm::CodeGenSubRegIndex) + 615 5 llvm-tblgen 0x0000000106e88be6 llvm::CodeGenRegister::computeSubRegs(llvm::CodeGenRegBank&) + 2182 6 llvm-tblgen 0x0000000106e8e9f0 llvm::CodeGenRegBank::CodeGenRegBank(llvm::RecordKeeper&) + 2192 7 llvm-tblgen 0x0000000106f384a1 llvm::EmitRegisterBank(llvm::RecordKeeper&, llvm::raw_ostream&) + 65 8 llvm-tblgen 0x0000000106f72c64 (anonymous namespace)::LLVMTableGenMain(llvm::raw_ostream&, llvm::RecordKeeper&) + 1172 9 llvm-tblgen 0x0000000106fcb15f llvm::TableGenMain(char*, bool (*)(llvm::raw_ostream&, llvm::RecordKeeper&)) + 3599 10 llvm-tblgen 0x0000000106f727a6 main + 134 11 libdyld.dylib 0x000000010733c6a5 start + 1 Stack dump: 0. Program arguments: /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp /bin/sh: line 1: 41986 Segmentation fault: 11 /Volumes/Builds/pz-master-stage2-RA/bin/llvm-tblgen -gen-register-bank -I /Users/vk/llvm/lib/Target/ARM -I /Users/vk/llvm/include -I /Users/vk/llvm/lib/Target /Users/vk/llvm/lib/Target/ARM/ARM.td -o /Volumes/Builds/pz-master-stage2-RA/lib/Target/ARM/ARMGenRegisterBank.inc.tmp llvm-svn: 304231	2017-05-30 19:25:22 +00:00
Craig Topper	f6d4dc5b4a	[SelectionDAG] Set ISD::FPOWI to Expand by default Summary: Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie". This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default. Reviewers: spatel, RKSimon, efriedma Reviewed By: RKSimon Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits Differential Revision: https://reviews.llvm.org/D33530 llvm-svn: 304215	2017-05-30 15:27:55 +00:00
Andrew V. Tischenko	8b04826663	This patch closes PR28513: an optimization of multiplication by different constants. It's implemented on DAG combiner level. llvm-svn: 304209	2017-05-30 13:00:44 +00:00
Oren Ben Simhon	7bf27f03f2	[X86] Adding vpopcntd and vpopcntq instructions AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858	2017-05-25 13:45:23 +00:00
Guy Blank	548e22a1a7	[X86][AVX512] Make i1 illegal in the CodeGen This patch defines the i1 type as illegal in the X86 backend for AVX512. For DAG operations on <N x i1> types (build vector, extract vector element, ...) i8 is used, and should be truncated/extended. This should produce better scalar code for i1 types since GPRs will be used instead of mask registers. Differential Revision: https://reviews.llvm.org/D32273 llvm-svn: 303421	2017-05-19 12:35:15 +00:00
Michael Liao	ab12984634	Fix PR33028 - '-verify-machineinstrs' starts to complain about allocatable live-in physical registers on non-entry or non-landing-pad basic blocks. - Refactor the XBEGIN translation to define EAX on a dedicated fallback code path due to XABORT. Add a pseudo instruction to define EAX explicitly to avoid adding a physical register live-in. Differential Revision: https://reviews.llvm.org/D33168 llvm-svn: 303306	2017-05-17 21:48:00 +00:00
Peter Collingbourne	6f0ecca3b5	IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape(). This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name. Differential Revision: https://reviews.llvm.org/D33162 llvm-svn: 303134	2017-05-16 00:39:01 +00:00
Zvi Rackover	e6b278bc65	[X86] Utilize SelectionDAG::getSelect(). NFC. Replace SelectionDAG::getNode(ISD::SELECT, ...) and SelectionDAG::getNode(ISD::VSELECT, ...) with SelectionDAG::getSelect(...) Saves a few lines of code and in some cases saves the need to explicitly check the type of the desired node. llvm-svn: 303024	2017-05-14 21:30:38 +00:00
Craig Topper	ceea1a76a1	[X86] Remove unused value from IntrinsicType enum. NFC llvm-svn: 303018	2017-05-14 19:38:06 +00:00
Simon Pilgrim	f3ee9c6997	[X86][AVX] Allow 32-bit targets to peek through subvectors to extract constant splats for vXi64 shifts. llvm-svn: 303009	2017-05-14 11:46:26 +00:00
Craig Topper	8df66c602a	[KnownBits] Add bit counting methods to KnownBits struct and use them where possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925	2017-05-12 17:20:30 +00:00
Reid Kleckner	43bbeb4c9f	Issue diagnostics when returning FP values on x86_64 without SSE1/2 Avoid using report_fatal_error, because it will ask the user to file a bug. If the user attempts to disable SSE on x86_64 and then use floating point, that's a bug in their code, not a bug in the compiler. This is just a start. There are other ways to crash the backend in this configuration, but they should be updated to follow this pattern. Differential Revision: https://reviews.llvm.org/D27522 llvm-svn: 302835	2017-05-11 22:43:02 +00:00
Chandler Carruth	97500a9918	[x86] Fix a failure to select with AVX-512 when the type legalizer manages to form a VSELECT with a non-i1 element type condition. Those are technically allowed in SDAG (at least, the generic type legalization logic will form them and I wouldn't want to try to audit everything to preclude forming them) so we need to be able to lower them. This isn't too hard to implement. We mark VSELECT as custom so we get a chance in C++, add a fast path for i1 conditions to get directly handled by the patterns, and a fallback when we need to manually force the condition to be an i1 that uses the vptestm instruction to turn a non-mask into a mask. This, unsurprisingly, generates awful code. But it at least doesn't crash. This was actually impacting open source packages built with LLVM for AVX-512 in the wild, so quickly landing a patch that at least stops the immediate bleeding. I think I've found where to fix the codegen quality issue, but less confident of that change so separating it out from the thing that doesn't change the result of any existing test case but causes mine to not crash. llvm-svn: 302785	2017-05-11 10:52:16 +00:00
Simon Pilgrim	a4a13a0da0	Strip trailing whitespace. NFCI. llvm-svn: 302784	2017-05-11 10:03:05 +00:00
Serge Guelton	e38003f839	Suppress all uses of LLVM_END_WITH_NULL. NFC. Use variadic templates instead of relying on <cstdarg> + sentinel. This enforces better type checking and makes code more readable. Differential Revision: https://reviews.llvm.org/D32541 llvm-svn: 302571	2017-05-09 19:31:13 +00:00
Guy Blank	0c42d8c35b	[AVX512] Only look at lower bit in constant scalar masks For scalar masked instructions only the lower bit of the mask is relevant, so for constant masks we should either do an unmasked operation or no operation, depending on the value of the lower bit. This patch handles cases where the lower bit is '1'. Differential Revision: https://reviews.llvm.org/D32805 llvm-svn: 302546	2017-05-09 16:16:48 +00:00
Serge Pavlov	d526b13e61	Add extra operand to CALLSEQ_START to keep frame part set up previously Using arguments with attribute inalloca creates problems for verification of machine representation. This attribute instructs the backend that the argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size stored in CALLSEQ_START in this case does not count the size of this argument. However CALLSEQ_END still keeps total frame size, as caller can be responsible for cleanup of entire frame. So CALLSEQ_START and CALLSEQ_END keep different frame size and the difference is treated by MachineVerifier as stack error. Currently there is no way to distinguish this case from actual errors. This patch adds additional argument to CALLSEQ_START and its target-specific counterparts to keep size of stack that is set up prior to the call frame sequence. This argument allows MachineVerifier to calculate actual frame size associated with frame setup instruction and correctly process the case of inalloca arguments. The changes made by the patch are: - Frame setup instructions get the second mandatory argument. It affects all targets that use frame pseudo instructions and touched many files although the changes are uniform. - Access to frame properties are implemented using special instructions rather than calls getOperand(N).getImm(). For X86 and ARM such replacement was made previously. - Changes that reflect appearance of additional argument of frame setup instruction. These involve proper instruction initialization and methods that access instruction arguments. - MachineVerifier retrieves frame size using method, which reports sum of frame parts initialized inside frame instruction pair and outside it. The patch implements approach proposed by Quentin Colombet in https://bugs.llvm.org/show_bug.cgi?id=27481#c1. It fixes 9 tests failed with machine verifier enabled and listed in PR27481. Differential Revision: https://reviews.llvm.org/D32394 llvm-svn: 302527	2017-05-09 13:35:13 +00:00
Simon Pilgrim	ca3a63a849	[X86][SSE42] Lower v2i64/v4i64 ASHR(X, 63) as PCMPGTQ(0, X) Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD. Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal. Differential Revision: https://reviews.llvm.org/D32973 llvm-svn: 302525	2017-05-09 13:14:40 +00:00
Simon Pilgrim	df39b03f29	[X86][SSE] Improve combineLogicBlendIntoPBLENDV to use general masks. Currently combineLogicBlendIntoPBLENDV can only match ASHR to detect sign splatting of a bit mask, this patch generalises this to use computeNumSignBits instead. This is a first step in several things we can do to improve PBLENDV support: * Better matching of X86ISD::ANDNP patterns. * Handle floating point cases. * Better vector and bitcast support in computeNumSignBits. * Recognise that PBLENDV only uses the sign bit of the mask, we should be able strip away sign splats (ASHR, PCMPGT isNeg tests etc.). Differential Revision: https://reviews.llvm.org/D32953 llvm-svn: 302424	2017-05-08 14:16:39 +00:00
Dean Michael Berris	9bcaed867a	[XRay] Custom event logging intrinsic This patch introduces an LLVM intrinsic and a target opcode for custom event logging in XRay. Initially, its use case will be to allow users of XRay to log some type of string ("poor man's printf"). The target opcode compiles to a noop sled large enough to enable calling through to a runtime-determined relative function call. At runtime, when X-Ray is enabled, the sled is replaced by compiler-rt with a trampoline to the logic for creating the custom log entries. Future patches will implement the compiler-rt parts and clang-side support for emitting the IR corresponding to this intrinsic. Reviewers: timshen, dberris Subscribers: igorb, pelikan, rSerge, timshen, echristo, dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D27503 llvm-svn: 302405	2017-05-08 05:45:21 +00:00
Simon Pilgrim	33f7397cc0	[X86][AVX512] Relax assertion and just exit combine for unsupported types (PR32907) llvm-svn: 302361	2017-05-06 20:53:52 +00:00
Simon Pilgrim	fea153f341	[X86][AVX512] Move v2i64/v4i64 VPABS lowering to tablegen Extend NoVLX targets to use the 512-bit versions llvm-svn: 302359	2017-05-06 19:11:59 +00:00
Simon Pilgrim	f15a2f4d94	[X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI. llvm-svn: 302357	2017-05-06 18:17:56 +00:00
Simon Pilgrim	781cb10104	[X86][SSE] Break register dependencies on v16i8/v8i16 BUILD_VECTOR on SSE41 rL294581 broke unnecessary register dependencies on partial v16i8/v8i16 BUILD_VECTORs, but on SSE41 we (currently) use insertion for full BUILD_VECTORs as well. By allowing full insertion to occur on SSE41 targets we can break register dependencies here as well. llvm-svn: 302355	2017-05-06 17:30:39 +00:00
Simon Pilgrim	430a335b7b	[X86] Use SDValue::getConstantOperandVal helper. NFCI. llvm-svn: 302286	2017-05-05 20:53:52 +00:00
Craig Topper	f0aeee01c3	[KnownBits] Add wrapper methods for setting and clearing all bits in the underlying APInts in KnownBits. This adds routines for resetting KnownBits to unknown, making the value all zeros or all ones. It also adds methods for querying if the value is zero, all ones or unknown. Differential Revision: https://reviews.llvm.org/D32637 llvm-svn: 302262	2017-05-05 17:36:09 +00:00
Simon Pilgrim	ac3c4b6da4	[X86][AVX512] Improve support and testing for CTLZ of 512-bit vectors without CDI llvm-svn: 302233	2017-05-05 13:31:52 +00:00
Simon Pilgrim	e9c5d7b70b	[X86] Remove duplicate operation actions. NFCI. llvm-svn: 302230	2017-05-05 12:34:55 +00:00
Simon Pilgrim	c89aa0bee5	[X86][AVX512CDI] Move v2i64/v4i64 and v4i32/v8i32 VPLZCNT lowering to tablegen Extend NoVLX targets to use the 512-bit versions llvm-svn: 302229	2017-05-05 12:20:34 +00:00
Simon Pilgrim	73b88d5183	Remove unused variable llvm-svn: 302226	2017-05-05 11:55:38 +00:00
Simon Pilgrim	1d47a15d89	[X86][AVX] Add LowerIntUnary helpers to split unary vector ops in half. NFCI. Same as LowerIntArith helpers but for unary ops instead of binary. llvm-svn: 302222	2017-05-05 10:59:24 +00:00
Craig Topper	d938fd1397	[KnownBits] Add zext, sext, and trunc methods to KnownBits This patch adds zext, sext, and trunc methods to KnownBits and uses them where possible. Differential Revision: https://reviews.llvm.org/D32784 llvm-svn: 302088	2017-05-03 22:07:25 +00:00
Simon Pilgrim	eada39d050	Silence a 'enum and non-enum used in conditional' warning. llvm-svn: 302048	2017-05-03 16:43:57 +00:00
Simon Pilgrim	99b925bdf3	[X86][LWP] Add llvm support for LWP instructions (reapplied). This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041	2017-05-03 15:51:39 +00:00
Simon Pilgrim	a271c54324	Revert rL302028 due to accidental line ending changes. llvm-svn: 302038	2017-05-03 15:42:29 +00:00
Simon Pilgrim	b2e0464fde	[X86][LWP] Add llvm support for LWP instructions. This patch adds support for the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028	2017-05-03 15:18:34 +00:00
Guy Blank	d0baa524d0	[X86][AVX512] remove unnecessary case. NFC VFPCLASS is for vector types and not scalar, so it cannot get here. Differential Revision: https://reviews.llvm.org/D32694 llvm-svn: 302023	2017-05-03 13:34:05 +00:00
Oren Ben Simhon	dbd4bba1ec	[X86] Support of no_caller_saved_registers attribute This patch implements the LLVM part for no_caller_saved_registers attribute as appears here: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5ed3cc7b66af4758f7849ed6f65f4365be8223be. In order to implement the attribute, we use the dynamic CSR mechanism to remove returned/passed arguments from the function regmask/CSR list. Differential Revision: https://reviews.llvm.org/D31876 llvm-svn: 302020	2017-05-03 13:07:19 +00:00
Simon Pilgrim	05cfa83843	[X86] Refactored LowerINTRINSIC_W_CHAIN to use a switch statement. NFCI. Pre-commit as requested in D32769. llvm-svn: 302010	2017-05-03 10:40:18 +00:00
Simon Pilgrim	24d361f7bf	[X86] Tidyup subvector insert/extract helpers. NFCI. Use getConstantOperandVal where possible. llvm-svn: 301912	2017-05-02 11:08:15 +00:00
Simon Pilgrim	7aca5218b0	Fix typo in comment. NFCI. llvm-svn: 301911	2017-05-02 10:43:33 +00:00
Simon Pilgrim	8d196c88a6	[X86] Reduce code for setting operations actions by merging into loops across multiple types/ops. NFCI. llvm-svn: 301879	2017-05-01 23:09:01 +00:00
Simon Pilgrim	ab1a82764f	[X86][AVX] Rename LowerVectorBroadcast to lowerBuildVectorAsBroadcast. NFCI. Since the shuffle refactor, this is only used during BUILD_VECTOR lowering. llvm-svn: 301834	2017-05-01 20:56:35 +00:00
Amara Emerson	d28f0cd448	Generalize the specialized flag-carrying SDNodes by moving flags into SDNode. This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803	2017-05-01 15:17:51 +00:00
Amaury Sechet	8ac81f3924	Do not legalize large add with addc/adde, introduce addcarry and do it with uaddo/addcarry Summary: As per discussion on how to get better codegen on large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allows for more flexibility. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29872 llvm-svn: 301775	2017-04-30 19:24:09 +00:00
Craig Topper	778f57b4f1	[APInt] Replace calls to setBits with more specific calls to setBitsFrom and setLowBits where possible. llvm-svn: 301768	2017-04-30 07:44:58 +00:00
Craig Topper	d503644a4a	[X86] Clear KnownBits instead of reconstructing it. NFC llvm-svn: 301767	2017-04-30 07:44:55 +00:00
Matthias Braun	744c215e29	TargetLowering: Add finalizeLowering() function; NFC Adds a new method finalizeLowering to TargetLoweringBase. This is in preparation for an upcoming commit. This function is meant for target specific adjustments to MachineFrameInfo or register reservations. Move the freezeRegisters() and the hasCopyImplyingStackAdjustment() handling into the new function to prove the concept. As an added bonus GlobalISel no longer missed the hasCopyImplyingStackAdjustment() handling with this. Differential Revision: https://reviews.llvm.org/D32621 llvm-svn: 301679	2017-04-28 20:25:05 +00:00
Simon Pilgrim	cce5097ce4	Move variable local to where it's used. NFCI. llvm-svn: 301646	2017-04-28 14:42:15 +00:00
Craig Topper	d0af7e8ab8	[SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620	2017-04-28 05:31:46 +00:00
Craig Topper	0e03e74e95	[SelectionDAG] Use various APInt methods to reduce temporary APInt creation This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version. llvm-svn: 301618	2017-04-28 04:57:59 +00:00
Craig Topper	24e71017aa	[APInt] Use inplace shift methods where possible. NFCI llvm-svn: 301612	2017-04-28 03:36:24 +00:00
Simon Pilgrim	d68785803b	[SelectionDAG] Added getBuildVector(ArrayRef<SDUse>) helper. llvm-svn: 301322	2017-04-25 16:41:28 +00:00
Krzysztof Parzyszek	c8e8e2a046	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301234	2017-04-24 19:51:12 +00:00
Krzysztof Parzyszek	98ab4c64c4	Revert r301231: Accidentally committed stale files I forgot to commit local changes before commit. llvm-svn: 301232	2017-04-24 19:48:51 +00:00
Krzysztof Parzyszek	c0197066d7	Move value type list from TargetRegisterClass to TargetRegisterInfo Differential Revision: https://reviews.llvm.org/D31937 llvm-svn: 301231	2017-04-24 19:43:45 +00:00
Renato Golin	4abfb3d741	Revert "[APInt] Fix a few places that use APInt::getRawData to operate within the normal API." This reverts commit r301105, 4, 3 and 1, as a follow up of the previous revert, which broke even more bots. For reference: Revert "[APInt] Use operator<<= where possible. NFC" Revert "[APInt] Use operator<<= instead of shl where possible. NFC" Revert "[APInt] Use ashInPlace where possible." PR32754. llvm-svn: 301111	2017-04-23 12:15:30 +00:00
Craig Topper	cdd5ae6676	[APInt] Use operator<<= where possible. NFC llvm-svn: 301104	2017-04-23 05:43:02 +00:00
Craig Topper	5f68af0806	[APInt] Use operator<<= instead of shl where possible. NFC llvm-svn: 301103	2017-04-23 05:18:31 +00:00
Craig Topper	ae9672c96d	[APInt] Use ashInPlace where possible. llvm-svn: 301101	2017-04-23 03:45:59 +00:00
Akira Hatanaka	22e839f4b2	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300932 and r300930, which was causing dag-combine to loop forever. The problem was that optimizeLogicalImm was returning true even when there was no change to the immediate node (which happened when the immediate was all zeros or ones), which caused dag-combine to push and pop the same node to the work list over and over again without making any progress. This commit fixes the bug by returning false early in optimizeLogicalImm if the immediate is all zeros or ones. Also, it changes the code to compare the immediate with 0 or Mask rather than calling countPopulation. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 301019	2017-04-21 18:53:12 +00:00
Akira Hatanaka	78ccba6a20	Revert r300932 and r300930. It seems that r300930 was creating an infinite loop in dag-combine when compiling the following file: MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c llvm-svn: 300940	2017-04-21 01:31:50 +00:00
Akira Hatanaka	19077aaee0	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. This recommits r300913, which broke bots because I didn't fix a call to ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of TargetLoweringOpt and TargetLowering. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300930	2017-04-21 00:05:16 +00:00
Akira Hatanaka	7b06cebe73	Revert "[AArch64] Improve code generation for logical instructions taking" This reverts r300913. This broke bots. llvm-svn: 300916	2017-04-20 23:03:30 +00:00
Akira Hatanaka	e327f09832	[AArch64] Improve code generation for logical instructions taking immediate operands. This commit adds an AArch64 dag-combine that optimizes code generation for logical instructions taking immediate operands. The optimization uses demanded bits to change a logical instruction's immediate operand so that the immediate can be folded into the immediate field of the instruction. rdar://problem/18231627 Differential Revision: https://reviews.llvm.org/D5591 llvm-svn: 300913	2017-04-20 22:47:56 +00:00
Craig Topper	bcfd2d1789	[APInt] Rename getSignBit to getSignMask getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask. Differential Revision: https://reviews.llvm.org/D32108 llvm-svn: 300856	2017-04-20 16:56:25 +00:00
Dehao Chen	58601674d2	PR32710: Disable using PMADDWD for unsigned short. Summary: PMADDWD can only handle signed short. Reviewers: mkuper, wmi Reviewed By: mkuper Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D32236 llvm-svn: 300737	2017-04-19 19:50:34 +00:00
Sanjoy Das	f09c1e346e	Add a getPointerOperandType() helper to LoadInst and StoreInst; NFC I will use this in a later change. llvm-svn: 300613	2017-04-18 22:00:54 +00:00
Matt Arsenault	3138075dd4	DAG: Make mayBeEmittedAsTailCall parameter const llvm-svn: 300603	2017-04-18 21:16:46 +00:00
Simon Pilgrim	e8ad1da4e2	[X86] Use for-range loop. NFCI. llvm-svn: 300567	2017-04-18 17:18:54 +00:00
Craig Topper	fc947bcfba	[APInt] Use lshrInPlace to replace lshr where possible This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result. This adds an lshrInPlace(const APInt &) version as well. Differential Revision: https://reviews.llvm.org/D32155 llvm-svn: 300566	2017-04-18 17:14:21 +00:00
Benjamin Kramer	f5f593b674	[X86] Remove special handling for 16 bit for A asm constraints. Our 16 bit support is assembler-only + the terrible hack that is .code16gcc. Simply using 32 bit registers does the right thing for the latter. Fixes PR32681. llvm-svn: 300429	2017-04-16 20:13:08 +00:00
Dimitry Andric	909b3376ba	Use correct registers for "A" inline asm constraint Summary: In PR32594, inline assembly using the 'A' constraint on x86_64 causes llvm to crash with a "Cannot select" stack trace. This is because `X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A' means the EAX and EDX registers. However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86 (ia16?) it means the old AX and DX registers. Add new register classes in `X86RegisterInfo.td` to support these cases, and amend the logic in `getRegForInlineAsmConstraint` to cope with different subtargets. Also add a test case, derived from PR32594. Reviewers: craig.topper, qcolombet, RKSimon, ab Reviewed By: ab Subscribers: ab, emaste, royger, llvm-commits Differential Revision: https://reviews.llvm.org/D31902 llvm-svn: 300404	2017-04-15 22:15:01 +00:00
Davide Italiano	8455f7d623	[X86] Create the correct ADC/SBB SDNode when lowering add. Differential Revision: https://reviews.llvm.org/D31911 llvm-svn: 299973	2017-04-11 19:11:20 +00:00
Serge Guelton	59a2d7b909	Module::getOrInsertFunction is using C-style vararg instead of variadic templates. From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues. Differential Revision: https://reviews.llvm.org/D31070 llvm-svn: 299949	2017-04-11 15:01:18 +00:00
Diana Picus	b050c7fbe0	Revert "Turn some C-style vararg into variadic templates" This reverts commit r299925 because it broke the buildbots. See e.g. http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008 llvm-svn: 299928	2017-04-11 10:07:12 +00:00
Serge Guelton	5fd75fb72e	Turn some C-style vararg into variadic templates Module::getOrInsertFunction is using C-style vararg instead of variadic templates. From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues. llvm-svn: 299925	2017-04-11 08:36:52 +00:00
Dehao Chen	58fa724494	Use PMADDWD to expand reduction in a loop Summary: PMADDWD can help improve 8/16 bit integer multiply-add operation performance for cases like: for (int i = 0; i < count; i++) a += x[i] * y[i]; Reviewers: wmi, davidxl, hfinkel, RKSimon, zvi, mkuper Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31679 llvm-svn: 299776	2017-04-07 15:41:52 +00:00
Michael Kuperstein	6129887d21	[X86] Revert r299387 due to AVX legalization infinite loop. llvm-svn: 299720	2017-04-06 22:33:25 +00:00
Mehdi Amini	db11fdfda5	Revert "Turn some C-style vararg into variadic templates" This reverts commit r299699, the examples need to be updated. llvm-svn: 299702	2017-04-06 20:23:57 +00:00
Mehdi Amini	579540a8f7	Turn some C-style vararg into variadic templates Module::getOrInsertFunction is using C-style vararg instead of variadic templates. From a user perspective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's no type checking on the arguments. The variadic template is an obvious solution to both issues. Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu> Differential Revision: https://reviews.llvm.org/D31070 llvm-svn: 299699	2017-04-06 20:09:31 +00:00
Simon Pilgrim	5fbd93b21a	[X86][SSE] Renamed combine to make it clear that it only handles the vector shift by immediate opcodes. NFCI llvm-svn: 299532	2017-04-05 10:44:42 +00:00
Ahmed Bougacha	ec8b1fb539	[X86] Relax assert in broadcast-of-subvector lowering. Before r294774, there was a problem when lowering broadcasts to use 128-bit subvectors. When we looked through a bitcast to find the broadcast input, we'd keep using the original type, so you'd end up with things like: (v8f32 (broadcast (v4f32 (extract_subvector (v8i32 V), ...)) )) r294774 fixed it to always emit subvectors with the scalar type of the original source. It also introduced some asserts, to check that we use scalars with the same size, and vectors with the same number of elements. The scalar size equality is checked earlier when looking through bitcasts, and is a useful assert. However, the number of elements don't have to be identical: we're always going to extract a 128-bit subvector, and we can have different size inputs if we looked through a concat_vector to find a 256-bit source. Relax the overzealous assert. Replace it with a check of the original source vector being 256 or 512 bits. If it's 128 bits, we can't extract_subvector from it. Fixes PR32371. llvm-svn: 299490	2017-04-05 00:14:39 +00:00
Sanjay Patel	ac618383e3	[x86] remove dead select-of-constants transform; NFCI https://reviews.llvm.org/D30537 / https://reviews.llvm.org/rL296977 added these transforms and other related transforms to the generic DAGCombiner (with a hook that x86 sets to true), so these patterns should not exist by the time we reach the target-specific combiner hook. llvm-svn: 299448	2017-04-04 16:54:58 +00:00
Simon Pilgrim	448222d8ba	Strip trailing whitespace llvm-svn: 299438	2017-04-04 14:40:53 +00:00
Oren Ben Simhon	568fb197da	[X86] Add 64 bit pattern matching for PSADBW PSADBW pattern currently supports the 32 bit IR pattern and only GLT (greater than) comparison. The patch extends the pattern to catch also 64 bit IR pattern and includes all other comparison types (not only GLT). Differential Revision: https://reviews.llvm.org/D31577 llvm-svn: 299425	2017-04-04 10:23:18 +00:00
Simon Pilgrim	af33757b5d	[X86][SSE] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging. This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs a unary shuffle to duplicate the values. There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch. Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this. Differential Revision: https://reviews.llvm.org/D31373 llvm-svn: 299387	2017-04-03 21:06:51 +00:00
Amjad Aboud	0389f62879	x86 interrupt calling convention: re-align stack pointer on 64-bit if an error code was pushed The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated. This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue. Fixes Bug 26413 Patch by Philipp Oppermann. Differential Revision: https://reviews.llvm.org/D30049 llvm-svn: 299383	2017-04-03 20:28:45 +00:00
Craig Topper	d33ee1b960	[APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temporary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation. Differential Revision: https://reviews.llvm.org/D31565 llvm-svn: 299362	2017-04-03 16:34:59 +00:00
Simon Pilgrim	0e2f8cd875	[X86][MMX] Improve support for folding fptosi from XMM to MMX llvm-svn: 299338	2017-04-02 17:45:41 +00:00
Simon Pilgrim	ba28263b03	[X86][MMX] Simplify tablegen patterns by always combining MOVDQ2Q from v2i64 llvm-svn: 299336	2017-04-02 16:20:34 +00:00
Simon Pilgrim	e56a2d7b4c	[X86][MMX] Added support for subvector extraction to MMX register llvm-svn: 299335	2017-04-02 15:52:28 +00:00
Craig Topper	9601168670	[AVX-512] Update lowering for gather/scatter prefetch intrinsics to match the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers. Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too. Fixes PR32411. llvm-svn: 299234	2017-03-31 17:24:29 +00:00
Simon Pilgrim	3c81c34d8d	[DAGCombiner] Add vector demanded elements support to ComputeNumSignBits Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219	2017-03-31 13:54:09 +00:00
Simon Pilgrim	37b536e4b3	[DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201	2017-03-31 11:24:16 +00:00
Simon Pilgrim	ef4509b36e	Spelling mistakes in comments. NFCI. llvm-svn: 299069	2017-03-30 12:30:15 +00:00
Davide Italiano	e13920f407	[X86IselLowering] Remove extraneous semicolon. NFCI. Unbreaks the build with GCC -Werror. llvm-svn: 299030	2017-03-29 21:34:58 +00:00
Simon Pilgrim	8362c95257	[X86] Tidied up comment - we don't custom lower add/sub i64 on i686 anymore. NFCI. llvm-svn: 299004	2017-03-29 15:41:58 +00:00
Simon Pilgrim	fc97d5049f	Spelling mistakes in comments. NFCI. llvm-svn: 299000	2017-03-29 15:27:24 +00:00
Simon Pilgrim	2845189bd1	[X86][AVX2] Prevent unary interleaving patterns from calling lowerVectorShuffleAsSplitOrBlend (PR32453) llvm-svn: 298993	2017-03-29 13:00:00 +00:00
Simon Pilgrim	c7c5aa47cf	[X86][MMX] Match MMX fp_to_sint conversions from XMM registers We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack). This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc. Differential Revision: https://reviews.llvm.org/D30868 llvm-svn: 298943	2017-03-28 21:32:11 +00:00
Sanjay Patel	f01a1dad7f	[x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality Follow-up to: https://reviews.llvm.org/rL298775 llvm-svn: 298933	2017-03-28 17:23:49 +00:00
Simon Pilgrim	3e2aa7f40e	[X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDW llvm-svn: 298929	2017-03-28 16:40:38 +00:00
Simon Pilgrim	6b30172372	[X86][SSE] Refactored shuffle BLEND combining to make future 16i16 support easier. NFCI. Call the matchVectorShuffleAsBlend test as early as possible. llvm-svn: 298925	2017-03-28 15:50:23 +00:00
Simon Pilgrim	aa675ca77d	Fix signed/unsigned comparison warning llvm-svn: 298917	2017-03-28 13:40:09 +00:00
Simon Pilgrim	d48f47e25c	[X86][SSE] Begin merging vector shuffle to BLEND for lowering and combining. Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining. llvm-svn: 298914	2017-03-28 13:05:48 +00:00
Simon Pilgrim	6afe0e2833	[X86][SSE] Set second operand to undef instead of first operand in unary shuffle combines. Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier. llvm-svn: 298910	2017-03-28 12:16:42 +00:00
Simon Pilgrim	defee5683c	Strip trailing whitespace llvm-svn: 298909	2017-03-28 11:15:17 +00:00
Gadi Haber	89d5f9391a	[X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave in AVX2 This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1. The patch includes a fix for the AVX2 case where vector unpack instructions use fewer operations than the vector blend operations available in AVX2. In this case using vector unpack instructions is more efficient. Reviewers: zvi delena igorb craig.topper guyblank eladcohen m_zuckerman aymanmus RKSimon llvm-svn: 298840	2017-03-27 12:13:37 +00:00
Simon Pilgrim	92925ea701	[X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL instructions llvm-svn: 298806	2017-03-26 13:17:55 +00:00
Simon Pilgrim	bec234c970	[X86] Pull out repeated ScalarValueSizeInBits code. NFCI. llvm-svn: 298783	2017-03-25 21:22:12 +00:00
Simon Pilgrim	c0720a4052	[X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, (NumSignBits-1)) Part 3 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298782	2017-03-25 20:43:01 +00:00
Simon Pilgrim	6397963c81	[X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAI Part 2 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298780	2017-03-25 19:58:36 +00:00
Simon Pilgrim	5400a4d0af	[X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASK Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask. Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm. Part 1 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298779	2017-03-25 19:50:14 +00:00
Sanjay Patel	9ebb68843e	[x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, we can replace memcmp calls with inline code that is both smaller and faster. Differential Revision: https://reviews.llvm.org/D31290 llvm-svn: 298775	2017-03-25 16:05:33 +00:00
Simon Pilgrim	6aac646308	[X86][SSE] Generalised lowerTruncate by PACKSS to work with any 'zero/all bits' result, not just comparisons. Added vector compare opcodes to X86TargetLowering::ComputeNumSignBitsForTargetNode Covered by existing tests added for D22814. llvm-svn: 298704	2017-03-24 16:12:31 +00:00
Eric Christopher	cff8492492	Remove the subtarget argument from LowerFP_TO_INT since there's one stored on X86TargetLowering. llvm-svn: 298628	2017-03-23 17:35:08 +00:00
Eric Christopher	a19a14b42f	Remove unused X86Subtarget argument from getOnesVector. llvm-svn: 298627	2017-03-23 17:35:06 +00:00
Simon Pilgrim	1c048ab6ba	[X86][SSE] Extract elements from narrower shuffle masks. Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle. llvm-svn: 298616	2017-03-23 16:09:34 +00:00
Simon Pilgrim	8a18299f20	[X86][SSE] Tidyup canWidenShuffleElements. NFCI. Pull out mask elements at the start, allowing us to make the widening pattern matching more readable. llvm-svn: 298594	2017-03-23 13:33:03 +00:00
Michael Zuckerman	85436ece89	[X86][TD][vpmovm2] New TD pattern for the vpmovm2 instruction Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586	2017-03-23 09:57:01 +00:00
Eric Christopher	fd8510cfec	Clean up some Subtarget uses and casts in the X86 backend, removing unnecessary work or calls. llvm-svn: 298555	2017-03-22 22:44:52 +00:00
Reid Kleckner	b518054b87	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Sanjay Patel	79379cae15	[x86] use PMOVMSK for vector-sized equality comparisons We could do better by splitting any oversized type into whatever vector size the target supports, but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte structs, so I think we can wire that up with a TLI hook that feeds into this. Differential Revision: https://reviews.llvm.org/D31156 llvm-svn: 298376	2017-03-21 13:50:33 +00:00
Evgeniy Stepanov	e829eecc05	[Fuchsia] Use %gs for ABI slots under -mcmodel=kernel Make x86_64-fuchsia targets under -mcmodel=kernel use %gs rather than %fs to access ABI slots for stack-protector and safe-stack Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D30870 llvm-svn: 298302	2017-03-20 20:35:37 +00:00
Craig Topper	5992c8d1dc	[AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coalescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test breaks. But this patch should dodge the problem entirely. Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228	2017-03-19 17:11:09 +00:00
Oren Ben Simhon	0ef61ec32a	[MIR] Support Custom Register Mask and CSRs The MIR printer dumps a string that describes the register mask of a function. A static predefined list of register masks matches a static list of strings. However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails. This patch adds support to custom register mask printing and dumping. Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic. As such this data needs to be dumped and parsed back to the Machine Register Info. Differential Revision: https://reviews.llvm.org/D30971 llvm-svn: 298207	2017-03-19 08:14:18 +00:00
Matthias Braun	e9f8209e87	ExecutionDepsFix: Normalize names; NFC Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and ExecutionDepsFix to the last one. llvm-svn: 298183	2017-03-18 05:05:40 +00:00
Nirav Dave	ac6081cb67	Make library calls sensitive to regparm module flag (Fixes PR3997). Reviewers: mkuper, rnk Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D27050 llvm-svn: 298179	2017-03-18 00:44:07 +00:00
Nirav Dave	6de2c77944	Capitalize ArgListEntry fields. NFC. llvm-svn: 298178	2017-03-18 00:43:57 +00:00
Sanjay Patel	455703a0c6	[x86] clean up setcc with negated operand transform and add missing test; NFCI llvm-svn: 298118	2017-03-17 20:29:40 +00:00
Sanjay Patel	25bd713d33	[x86] avoid adc/sbb assert when both sides of add are zexted (PR32316) As noted in the comment, we might want to account for this case, but I didn't look at what that would mean for the asm. I'm also not sure why this only reproduces with avx512, but I'm putting a conservative fix in for now to avoid the crash. Also, if both sides of an add are zexted, shouldn't we shrink that add? https://bugs.llvm.org/show_bug.cgi?id=32316 llvm-svn: 298107	2017-03-17 17:27:31 +00:00
Simon Pilgrim	493f4462bf	[X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs Turns out it can happen, so the assertion was too harsh Found during fuzz testing llvm-svn: 297833	2017-03-15 13:16:46 +00:00
Simon Pilgrim	cf2da96c82	[SelectionDAG] Add a signed integer absolute ISD node Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if it's legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780	2017-03-14 21:26:58 +00:00
Oren Ben Simhon	fe34c5e429	Disable Callee Saved Registers Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use an additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also assist to implement future no_caller_saved_registers attribute intended for interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715	2017-03-14 09:09:26 +00:00
Craig Topper	616641632e	[X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652	2017-03-13 18:34:46 +00:00
Craig Topper	eb7ea28bdd	[AVX-512] If gather mask is all ones, force the input to a zero vector. We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651	2017-03-13 18:17:46 +00:00
Craig Topper	7d56c8315b	[AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics. The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591	2017-03-12 22:29:12 +00:00
Sanjay Patel	f06b963a2b	[x86] don't blindly transform SETB into SBB I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586	2017-03-12 18:28:48 +00:00
Simon Pilgrim	18debfa5b4	[X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41) Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte. This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte. Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate? Differential Revision: https://reviews.llvm.org/D29841 llvm-svn: 297568	2017-03-11 20:42:31 +00:00
Craig Topper	02b463270c	[X86] Remove unnecessary commented out code. NFC llvm-svn: 297563	2017-03-11 18:25:56 +00:00
Simon Pilgrim	bfe263352a	[X86] Fix Wunused-lambda-capture warning llvm-svn: 297521	2017-03-10 22:10:34 +00:00
Simon Pilgrim	b02667c469	[APInt] Add APInt::insertBits() method to insert an APInt into a larger APInt We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458	2017-03-10 13:44:32 +00:00
Simon Pilgrim	836bcc689f	[X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 elements wide By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage. llvm-svn: 297267	2017-03-08 09:36:39 +00:00
Sanjoy Das	c08a79fbf2	[X86] Add option to specify preferable loop alignment Summary: Loop alignment can cause a significant change of the performance for short loops. To be able to evaluate the impact of loop alignment this change introduces the new option x86-experimental-pref-loop-alignment. The alignment will be 2^Value bytes, the default value is 4. Patch by Serguei Katkov! Reviewers: craig.topper Reviewed By: craig.topper Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D30391 llvm-svn: 297178	2017-03-07 18:47:22 +00:00
Reid Kleckner	812191584f	[X86] Fix arg copy elision for illegal types Use the store size of the argument type, which will be a byte-sized quantity, rather than dividing the size in bits by 8. Fixes PR32136 and re-enables copy elision from i64 arguments. Reverts the workaround from r296950. llvm-svn: 297045	2017-03-06 18:39:39 +00:00
Benjamin Kramer	bb635e034c	[X86] Silence GCC enum compare warning. X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' [-Werror=enum-compare] llvm-svn: 296986	2017-03-05 12:53:20 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Sanjay Patel	b974be5ef4	[x86] don't require a zext when forming ADC/SBB The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978	2017-03-04 20:35:19 +00:00
Simon Pilgrim	40a0e66b37	[X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks. Differential Revision: https://reviews.llvm.org/D30213 llvm-svn: 296966	2017-03-04 12:50:47 +00:00
Matthias Braun	21f340fd25	X86ISelLowering: Only perform copy elision on legal types. This fixes cases where i1 types were not properly legalized yet and led to the creation of 0-sized stack slots. This fixes http://llvm.org/PR32136 llvm-svn: 296950	2017-03-04 01:40:40 +00:00
Sanjay Patel	a84fd041c6	[x86] check for commuted add pattern to find ADC/SBB llvm-svn: 296933	2017-03-04 00:18:31 +00:00
Sanjay Patel	7ee83b41e0	[x86] refactor combineAddOrSubToADCOrSBB(); NFCI The comments were wrong, and this is not an obvious transform. This hopefully makes it clearer that we're missing the commuted patterns for adds. It's less clear that this is actually a good transform for all micro-arch. This is prep work for trying to clean up the current adc/sbb codegen because it's definitely not happening optimally. llvm-svn: 296918	2017-03-03 22:35:11 +00:00
Sanjay Patel	58e241896d	[x86] clean up materializeSBB(); NFCI This is producing SBB where it is obviously not necessary, so it needs to be limited. llvm-svn: 296894	2017-03-03 17:58:39 +00:00
Sanjay Patel	e8674825fe	[x86] fix formatting; NFC llvm-svn: 296875	2017-03-03 15:17:41 +00:00
Simon Pilgrim	c37a32d2b9	Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask creation llvm-svn: 296874	2017-03-03 14:37:57 +00:00
Simon Pilgrim	b3067dc374	[X86][MMX] Fixed i32 extraction on 32-bit targets MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal llvm-svn: 296782	2017-03-02 18:56:06 +00:00
Reid Kleckner	f7c0980c10	Elide argument copies during instruction selection Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683	2017-03-01 21:42:00 +00:00
Simon Pilgrim	5c4efcdddf	[X86][SSE] Attempt to extract vector elements through target shuffles DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where it's worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381	2017-02-27 21:01:57 +00:00
Craig Topper	7502119ce8	[X86] Use APInt instead of SmallBitVector tracking undef elements from getTargetConstantBitsFromNode and getConstVector. Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than that number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30392 llvm-svn: 296355	2017-02-27 16:15:32 +00:00
Craig Topper	3917ca2af4	[X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in shuffle lowering Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than that number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30390 llvm-svn: 296354	2017-02-27 16:15:30 +00:00
Craig Topper	ed0101a0b9	[X86] Check for less than 0 rather than explicit compare with -1. NFC llvm-svn: 296321	2017-02-27 06:05:30 +00:00
Simon Pilgrim	0f5fb5f549	[APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied) The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296272	2017-02-25 20:01:58 +00:00
Simon Pilgrim	cdf2bd656a	Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296147	2017-02-24 18:31:04 +00:00
Simon Pilgrim	bd9fb2ae95	[APInt] Add APInt::extractBits() method to extract APInt subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296141	2017-02-24 17:46:18 +00:00
Simon Pilgrim	7f6a7c97a7	[X86][SSE] Target shuffle combine can try to combine up to 16 vectors Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth). llvm-svn: 296130	2017-02-24 15:35:52 +00:00
Sanjay Patel	9f0fa52aa2	[x86] use DAG.getAllOnesConstant(); NFCI llvm-svn: 296128	2017-02-24 15:09:59 +00:00
Simon Pilgrim	aed352273e	[APInt] Add APInt::setBits() method to set all bits in range The current pattern for setting bits in range is typically: Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is one of the key compile time issues identified in PR32037. This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible. I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial. Differential Revision: https://reviews.llvm.org/D30265 llvm-svn: 296102	2017-02-24 10:15:29 +00:00
Craig Topper	8783bbb598	[AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC llvm-svn: 296094	2017-02-24 07:21:10 +00:00
Petr Hosek	a7d5916308	[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081	2017-02-24 03:10:10 +00:00
Evgeniy Stepanov	ee2d77f6d6	Disable TLS for stack protector on Android API<17. The TLS slot did not exist back then. llvm-svn: 296014	2017-02-23 21:06:35 +00:00
Simon Pilgrim	13cdd57964	[X86][SSE] getTargetConstantBitsFromNode - insert constant bits directly into masks. Minor optimization, don't create temporary mask APInts that are just going to be OR'd into the accumulate masks - insert directly instead. llvm-svn: 295848	2017-02-22 15:38:13 +00:00
Simon Pilgrim	3a895c4873	[X86][SSE] Use APInt::getBitsSet() instead of APInt::getLowBitsSet().shl() separately. NFCI. llvm-svn: 295845	2017-02-22 15:04:55 +00:00
Craig Topper	56d4022997	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions when available This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential revision: https://reviews.llvm.org/D30186 llvm-svn: 295810	2017-02-22 06:54:18 +00:00
Geoff Berry	5d534b6a11	[CodeGenPrepare] Sink and duplicate more 'and' instructions. Summary: Rework the code that was sinking/duplicating (icmp and, 0) sequences into blocks where they were being used by conditional branches to form more tbz instructions on AArch64. The new code is more general in that it just looks for 'and's that have all icmp 0's as users, with a target hook used to select which subset of 'and' instructions to consider. This change also enables 'and' sinking for X86, where it is more widely beneficial than on AArch64. The 'and' sinking/duplicating code is moved into the optimizeInst phase of CodeGenPrepare, where it can take advantage of the fact the OptimizeCmpExpression has already sunk/duplicated any icmps into the blocks where they are used. One minor complication from this change is that optimizeLoadExt needed to be updated to always mark 'and's it has determined should be in the same block as their feeding load in the InsertedInsts set to avoid an infinite loop of hoisting and sinking the same 'and'. This change fixes a regression on X86 in the tsan runtime caused by moving GVNHoist to a later place in the optimization pipeline (see PR31382). Reviewers: t.p.northover, qcolombet, MatzeB Subscribers: aemerson, mcrosier, sebpop, llvm-commits Differential Revision: https://reviews.llvm.org/D28813 llvm-svn: 295746	2017-02-21 18:53:14 +00:00
Simon Pilgrim	8eb515d8c4	[X86] EltsFromConsecutiveLoads SDLoc argument should be const&. There appears never to have been a time that the reference was updated. llvm-svn: 295739	2017-02-21 17:42:28 +00:00
Simon Pilgrim	3546156122	[X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL. This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns. llvm-svn: 295723	2017-02-21 15:09:00 +00:00
Igor Breger	812f319794	[AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index. Differential Revision: https://reviews.llvm.org/D30189 llvm-svn: 295718	2017-02-21 14:01:25 +00:00
Craig Topper	16d9730b86	[X86] Fix formatting. NFC llvm-svn: 295695	2017-02-21 06:27:13 +00:00
Sanjoy Das	90208720e3	Add a wrapper around copy_if in STLExtras; NFC I will add one more use for this in a later change. llvm-svn: 295685	2017-02-21 00:38:44 +00:00
Simon Pilgrim	2967ed1c7e	[X86] Tidyup combineExtractVectorElt. NFCI. Pull out repeated code for extraction index operand and source vector value type. Use isNullConstant helper to check for zero extraction index. llvm-svn: 295670	2017-02-20 16:09:45 +00:00
Igor Breger	fda32d266a	[X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector. It's more profitable to go through memory (1 cycle throughput) than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with variable index. The IACA tool was used to get a performance estimate (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer). For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles. Removing the VINSERT node, we don't need it any more. Differential Revision: https://reviews.llvm.org/D29690 llvm-svn: 295660	2017-02-20 14:16:29 +00:00
Simon Pilgrim	5910ebe720	[X86][AVX512] Add support for ASHR v2i64/v4i64 without VLX Use v8i64 ASHR instructions if we don't have VLX. Differential Revision: https://reviews.llvm.org/D28537 llvm-svn: 295656	2017-02-20 12:16:38 +00:00
Simon Pilgrim	14a7eee0b4	[X86] Use peekThroughOneUseBitcasts helper. NFCI. llvm-svn: 295618	2017-02-19 21:40:51 +00:00
Simon Pilgrim	d590de2998	[X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements. Replaces existing approach that could only search BUILD_VECTOR nodes. Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. llvm-svn: 295613	2017-02-19 19:40:31 +00:00
Simon Pilgrim	4271186f9c	[X86][SSE] Enable initial support for domain crossing at high shuffle combine depths. As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths. We currently set the required depth at 4 or more combined shuffles, this is probably too high for most targets but is a good starting point and already helps avoid a number of costly variable shuffles. llvm-svn: 295608	2017-02-19 17:19:38 +00:00
Simon Pilgrim	6d07d514de	[X86][SSE] Generalize INSERTPS/SHUFPS/SHUFPD combines across domains. Relax the INSERTPS/SHUFPS/SHUFPD combines to support integer inputs if permitted. llvm-svn: 295606	2017-02-19 15:15:40 +00:00
Simon Pilgrim	b4460cf5a9	[X86][SSE] Add domain crossing support for target shuffle combines. Add the infrastructure to flag whether float and/or int domains are permitable. A future patch will enable domain crossing based off shuffle depth and the value types of the source vectors. llvm-svn: 295604	2017-02-19 14:12:25 +00:00
Simon Pilgrim	2f2d8dc630	Fix signed/unsigned comparison warning. llvm-svn: 295580	2017-02-18 22:56:17 +00:00
Simon Pilgrim	7a87eebcad	[X86] Fix enumeral/non-enumeral comparison warning. gcc only allows you to mix enums / ints if they have the same signedness. llvm-svn: 295576	2017-02-18 22:40:58 +00:00
Simon Pilgrim	2e78c94ea5	[X86][SSE] Avoid repeated calls to SDValue::getValueType. Added assertion to check input type of X86ISD::VZEXT during target known bits calculation. llvm-svn: 295575	2017-02-18 22:25:27 +00:00
Sanjay Patel	12c2093e1e	[x86] fold sext (xor Bool, -1) --> sub (zext Bool), 1 This is the same transform that is currently used for: select Bool, 0, -1 llvm-svn: 295568	2017-02-18 21:03:28 +00:00
Simon Pilgrim	7db8f42fe3	[X86] Simplify by pulling out valuetype. NFCI. llvm-svn: 295502	2017-02-17 22:10:10 +00:00
Simon Pilgrim	2fe568c95e	[X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead. llvm-svn: 295326	2017-02-16 15:11:49 +00:00
Simon Pilgrim	5b4c30fb32	[X86][SSE] Don't call EltsFromConsecutiveLoads if any element is missing. Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't bother calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow. llvm-svn: 295235	2017-02-15 21:09:00 +00:00
Simon Pilgrim	da25d5c7b6	[X86][SSE] Propagate undef upper elements from scalar_to_vector during shuffle combining Only do this for integer types currently - float types (in particular insertps) load folding often fails with this. llvm-svn: 295208	2017-02-15 17:41:33 +00:00
Simon Pilgrim	0f0e5bd3c6	[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise ZERO inputs Add support for specifying an UNPCK input as ZERO, particularly improves ZEXT cases with non-zero offsets llvm-svn: 295169	2017-02-15 11:46:15 +00:00
Craig Topper	fbc7805e25	[X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types Summary: We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs. As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast. I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable. This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused. Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases where we don't remove a useless insert into element 1 before broadcasting element 0. Reviewers: delena, RKSimon, zvi Reviewed By: zvi Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D28747 llvm-svn: 295155	2017-02-15 06:58:47 +00:00
Diego Novillo	8adfc8ef3a	Remove unused variable. llvm-svn: 295065	2017-02-14 16:39:54 +00:00
Simon Pilgrim	6f732e026d	[X86][SSE] Allow matchVectorShuffleWithUNPCK to recognise UNDEF inputs Add support for specifying an UNPCK input as UNDEF llvm-svn: 295061	2017-02-14 16:22:04 +00:00
Simon Pilgrim	a0878dea9e	[X86][SSE] Move unary inputs handling inside matchVectorShuffleWithUNPCK. llvm-svn: 295053	2017-02-14 13:47:17 +00:00
Simon Pilgrim	3efdffcb27	[X86][SSE] Tidyup matchVectorShuffleWithUNPCK helper function call. Don't bother setting the V1/V2 operands again for unary shuffles. Don't bother legalizing the value type unless the match succeeds. llvm-svn: 295051	2017-02-14 12:54:39 +00:00
Simon Pilgrim	fd6a84fbaa	Fix indentation. NFCI. llvm-svn: 294959	2017-02-13 15:31:08 +00:00
Simon Pilgrim	828dee1f70	[X86][SSE] Create matchVectorShuffleWithUNPCK helper function. Currently only used by target shuffle combining - will use it for lowering as well in a future patch. llvm-svn: 294943	2017-02-13 11:52:58 +00:00
Craig Topper	680c73e7ab	[X86] Genericize the handling of INSERT_SUBVECTOR from an EXTRACT_SUBVECTOR to support 512-bit vectors with 128-bit or 256-bit subvectors. We now detect that both the extract and insert indices are non-zero and convert to a shuffle. This will be lowered as a blend for 256-bit vectors or as a vshuf operation for 512-bit vectors. llvm-svn: 294931	2017-02-13 04:53:29 +00:00
Craig Topper	53eafa8ea4	[X86] Don't let LowerEXTRACT_SUBVECTOR call getNode for EXTRACT_SUBVECTOR. This results in the simplifications inside of getNode running while we're legalizing nodes popped off the worklist during the final DAG combine. This basically makes a DAG combine like operation occur during this legalize step, but we don't handle something quite the same way. I think we don't recursively add the removed nodes to the DAG combiner worklist. llvm-svn: 294929	2017-02-12 23:49:46 +00:00
Simon Pilgrim	cc9242bd1c	[X86] Fix typo in function name. NFCI. convertBitVectorToUnsiged - convertBitVectorToUnsigned llvm-svn: 294914	2017-02-12 20:53:44 +00:00
Simon Pilgrim	04ec0f2b2a	[X86][SSE] Update argument names to match function name. NFCI. The target shuffle match function arguments were using the term 'Ops' but the function names referred to them as 'Inputs' - use 'Inputs' consistently. llvm-svn: 294900	2017-02-12 16:46:41 +00:00
Simon Pilgrim	4cd841757a	[X86][AVX2] Add support for combining target shuffles to VPMOVZX Initial 256-bit vector support - 512-bit support requires extra checks for AVX512BW support (PMOVZXBW) that will be handled in a future patch. llvm-svn: 294896	2017-02-12 14:31:23 +00:00
Craig Topper	1c37e991e6	[X86] Move code for using blendi for insert_subvector out to an isel pattern. This gives the DAG combiner more opportunity to optimize without needing to dig through the blend. llvm-svn: 294876	2017-02-11 22:57:12 +00:00
Simon Pilgrim	755d9127f5	[X86][SSE] Use VSEXT/VZEXT constant folding for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG Preparatory step for PR31712 llvm-svn: 294874	2017-02-11 22:47:06 +00:00
Simon Pilgrim	437d64c49e	[X86][SSE] Improve VSEXT/VZEXT constant folding. Generalize VSEXT/VZEXT constant folding to work with any target constant bits source not just BUILD_VECTOR . llvm-svn: 294873	2017-02-11 21:55:24 +00:00
Simon Pilgrim	4ef9672f0f	[X86][SSE] Add early-out when trying to match blend shuffle. NFCI. llvm-svn: 294864	2017-02-11 18:06:24 +00:00
Amaury Sechet	58ce15aba1	Fix indentation in X86ISelLowering. NFC llvm-svn: 294859	2017-02-11 17:48:48 +00:00
Simon Pilgrim	0e6945e48a	[X86][SSE] Convert getTargetShuffleMaskIndices to use getTargetConstantBitsFromNode. Removes duplicate constant extraction code in getTargetShuffleMaskIndices. getTargetConstantBitsFromNode - adds support for VZEXT_MOVL(SCALAR_TO_VECTOR) and fail if the caller doesn't support undef bits. llvm-svn: 294856	2017-02-11 17:27:21 +00:00
Simon Pilgrim	d59fa0e38a	[X86] Merge repeated getScalarValueSizeInBits calls. NFCI. llvm-svn: 294852	2017-02-11 16:42:07 +00:00
Ahmed Bougacha	2e275e272f	[X86] Bitcast subvector before broadcasting it. Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back. However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type. llvm-svn: 294774	2017-02-10 19:51:47 +00:00
Simon Pilgrim	8c8b10389d	[X86][SSE] Use SDValue::getConstantOperandVal helper. NFCI. Also reordered an if statement to test low cost comparisons first llvm-svn: 294748	2017-02-10 14:27:59 +00:00
Simon Pilgrim	c371159aac	[X86][SSE] Add support for extracting target constants from BUILD_VECTOR In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode llvm-svn: 294746	2017-02-10 14:04:11 +00:00
Simon Pilgrim	1140281413	[X86][SSE] Add missing comment describing combining to SHUFPS. NFCI llvm-svn: 294745	2017-02-10 13:16:01 +00:00
Simon Pilgrim	7f0d7e08b2	[X86] Remove duplicate call to getValueType. NFCI. llvm-svn: 294640	2017-02-09 22:35:59 +00:00
Simon Pilgrim	e0b5c2acbd	Convert to for-range loop. NFCI. llvm-svn: 294610	2017-02-09 18:52:24 +00:00
Simon Pilgrim	6bf1bd3ed6	[X86][MMX] Remove the (long time) unused MMX_PINSRW ISD opcode. llvm-svn: 294596	2017-02-09 17:08:47 +00:00
Pierre Gousseau	6953b32475	[X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math. In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants. Differential Revision: https://reviews.llvm.org/D29756 llvm-svn: 294588	2017-02-09 14:43:58 +00:00
Simon Pilgrim	563e23e66e	[X86][SSE] Attempt to break register dependencies during lowerBuildVector LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register. This patch attempts to break the register dependency by either always zeroing the vector beforehand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD. On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.) Differential Revision: https://reviews.llvm.org/D29720 llvm-svn: 294581	2017-02-09 11:50:19 +00:00
Craig Topper	50f3d1452c	[X86] Clzero intrinsic and its addition under znver1 This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558	2017-02-09 04:27:34 +00:00
Simon Pilgrim	dcd10344a3	[X86][SSE] Tidyup LowerBuildVectorv16i8 and LowerBuildVectorv8i16. NFCI. Run clang-format and standardized variable names between functions. llvm-svn: 294456	2017-02-08 14:44:45 +00:00
Sanjay Patel	b0cee9b273	[x86] improve comments for SHRUNKBLEND node creation; NFC llvm-svn: 294344	2017-02-07 19:54:16 +00:00
Sanjay Patel	ef6d573f67	[x86] use range-for loops; NFCI llvm-svn: 294337	2017-02-07 19:18:25 +00:00
Sanjay Patel	633ecbf3c4	[x86] use getSignBit() for clarity; NFCI llvm-svn: 294333	2017-02-07 19:01:35 +00:00
Simon Pilgrim	8c0f62d293	[X86][SSE] Ensure that vector shift-by-immediate inputs are correctly bitcast to the result type vXi8/vXi64 vector shifts are often shifted as vYi16/vYi32 types but we weren't always remembering to bitcast the input. Tested with a new assert as we don't currently manipulate these shifts enough for test cases to catch them. llvm-svn: 294308	2017-02-07 14:22:25 +00:00
Simon Pilgrim	bfd4495512	[X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined. Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines. We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree. This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list. Differential Revision: https://reviews.llvm.org/D29399 llvm-svn: 294183	2017-02-06 13:44:45 +00:00
Simon Pilgrim	380ce75687	[X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so as to avoid an unnecessary gpr value and insertion into a vector llvm-svn: 294162	2017-02-05 22:50:29 +00:00
Craig Topper	6a35a81fc5	[X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later. Similar was already done for several other shuffles in this function. The test changes are because the old code used explicit zeroing for elements that could have been undef. While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us. llvm-svn: 294130	2017-02-05 18:33:14 +00:00
Craig Topper	978fdb75a4	[X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1). llvm-svn: 294112	2017-02-04 23:26:46 +00:00
Craig Topper	3d95228dbe	[X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI llvm-svn: 294111	2017-02-04 23:26:42 +00:00
Simon Pilgrim	034c1bd32c	[X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle. Correctly flagging upper elements as undef. llvm-svn: 294020	2017-02-03 17:59:58 +00:00
Craig Topper	bbb2b95ce5	[X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering. llvm-svn: 293969	2017-02-03 00:24:49 +00:00
Reid Kleckner	3c467e225e	[X86] Avoid sorted order check in release builds Effectively reverts r290248 and fixes the unused function warning with ifndef NDEBUG. llvm-svn: 293945	2017-02-02 22:06:30 +00:00
Craig Topper	c45657375b	[X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine. On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI. llvm-svn: 293944	2017-02-02 22:02:57 +00:00
Simon Pilgrim	20ab6b875a	[X86][SSE] Use MOVMSK for all_of/any_of reduction patterns This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain). So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc. Differential Revision: https://reviews.llvm.org/D28810 llvm-svn: 293880	2017-02-02 11:52:33 +00:00
Craig Topper	047a8be18a	[X86] Remove some unused DAGCombinerInfo parameters. NFC llvm-svn: 293873	2017-02-02 08:03:23 +00:00
Craig Topper	94ed54b49a	[X86] Move some INSERT_SUBVECTOR optimizations from legalize to DAG combine. This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together. This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases. llvm-svn: 293872	2017-02-02 08:03:20 +00:00
Simon Pilgrim	ca931efc21	[X86][SSE] Remove unused argument. NFCI. llvm-svn: 293777	2017-02-01 16:34:50 +00:00
Simon Pilgrim	55a9c79bd1	[X86][SSE] Merge SSE2 PINSRW lowering with SSE41 PINSRB/PINSRW lowering. NFCI. These are identical apart from the extra SSE41 guard for PINSRB. llvm-svn: 293766	2017-02-01 13:32:19 +00:00
Simon Pilgrim	1b39d5db7b	[X86][SSE] Add support for combining PINSRB into a target shuffle. llvm-svn: 293637	2017-01-31 14:59:44 +00:00
Benjamin Kramer	94a833962c	[X86] Silence unused variable warning in Release builds. llvm-svn: 293631	2017-01-31 14:13:53 +00:00
Simon Pilgrim	4eab18f6b8	[X86][SSE] Detect unary PBLEND shuffles. These can appear during shuffle combining. llvm-svn: 293628	2017-01-31 13:58:01 +00:00
Simon Pilgrim	c29eab52e8	[X86][SSE] Add support for combining PINSRW into a target shuffle. Also add the ability to recognise PINSR(Vex, 0, Idx). Target shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat. The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch. llvm-svn: 293627	2017-01-31 13:51:10 +00:00
Craig Topper	b76494e017	[X86] Remove 'else' after 'return'. NFC llvm-svn: 293589	2017-01-31 02:09:46 +00:00
Simon Pilgrim	3905e03a47	[X86][SSE] Fix unsigned <= 0 warning in assert. NFCI. Thanks to @mkuper llvm-svn: 293561	2017-01-30 22:58:44 +00:00
Simon Pilgrim	a80a47afef	[X86][SSE] Generalize the number of decoded shuffle inputs. NFCI. combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs. This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns. llvm-svn: 293560	2017-01-30 22:48:49 +00:00
Simon Pilgrim	098998aef0	[X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles llvm-svn: 293500	2017-01-30 16:58:34 +00:00
Asaf Badouh	e11d2d73bf	[X86][MCU] Minor bug fix for r293469 + test case llvm-svn: 293478	2017-01-30 13:14:37 +00:00
Asaf Badouh	53713df0c2	[X86][MCU] replace select with bit manipulation instead of branches Differential Revision: https://reviews.llvm.org/D28354 llvm-svn: 293469	2017-01-30 08:16:59 +00:00
Craig Topper	3b7e823f92	[AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSRLI shift within elements while KSHIFT moves whole elements. llvm-svn: 293448	2017-01-30 00:06:01 +00:00
Craig Topper	db919caf1b	[AVX-512] Fix lowering for mask register concatenation with undef in the lower half. Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width. llvm-svn: 293446	2017-01-29 22:53:33 +00:00
Simon Pilgrim	76073f8d22	[X86][SSE] Lower scalar_to_vector(0) to zero vector Replaces an xor+movd/movq with an xorps which will be shorter in codesize, avoid an int-fpu transfer, allow modern cores to fast path the result during decode and helps other combines recognise an all-zero vector. The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise the upper elts are undef but this doesn't seem to be a problem. Differential Revision: https://reviews.llvm.org/D29097 llvm-svn: 293438	2017-01-29 18:13:37 +00:00
Elena Demikhovsky	17fe27f1f2	[X86 Codegen] Fixed a bug in unsigned saturation PACKUSWB converts Signed word to Unsigned byte, (the same about DW) and it can't be used for umin+truncate pattern. AVX-512 VPMOVUS* instructions fit the pattern since they convert Unsigned to Unsigned. See https://llvm.org/bugs/show_bug.cgi?id=31773 Differential Revision: https://reviews.llvm.org/D29196 llvm-svn: 293431	2017-01-29 13:18:30 +00:00
Craig Topper	6533e40e9d	[X86] Fix vector ANDN matching to work correctly when both inputs to the AND are XORs. llvm-svn: 293403	2017-01-28 23:52:09 +00:00
Simon Pilgrim	027bb453d9	[X86][SSE] Add support for combining ANDNP byte masks with target shuffles llvm-svn: 293178	2017-01-26 14:31:12 +00:00
Simon Pilgrim	3057fd53f9	[X86][SSE] Pull out target shuffle resolve code into helper. NFCI. Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit. llvm-svn: 293175	2017-01-26 13:06:02 +00:00
Craig Topper	bad53cce26	[AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it. llvm-svn: 293157	2017-01-26 07:17:58 +00:00
Craig Topper	f0bab7b739	[X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand. llvm-svn: 293156	2017-01-26 07:17:53 +00:00
Martin Bohme	526299c81c	[X86][SSE] Add explicit braces to avoid -Wdangling-else warning. Reviewers: RKSimon Subscribers: llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D29076 llvm-svn: 292924	2017-01-24 12:31:30 +00:00
Simon Pilgrim	0c45338961	Fix unused variable warning llvm-svn: 292921	2017-01-24 11:54:27 +00:00
Simon Pilgrim	e1ec9072f6	[X86][SSE] Add support for constant folding vector arithmetic shift by immediates llvm-svn: 292919	2017-01-24 11:46:13 +00:00
Simon Pilgrim	6340e54861	[X86][SSE] Add support for constant folding vector logical shift by immediates llvm-svn: 292915	2017-01-24 11:21:57 +00:00
Craig Topper	fc8798fa1b	[X86] Remove unnecessary peakThroughBitcasts call that's already taken care of by the ISD::isBuildVectorAllOnes check below. llvm-svn: 292894	2017-01-24 06:57:29 +00:00
Craig Topper	993edc9db1	[X86] Don't split v8i32 all ones values if only AVX1 is available. Keep it intact and split it at isel. This allows us to remove the check in ANDN combining that had to look through the extraction. llvm-svn: 292881	2017-01-24 04:33:03 +00:00
Craig Topper	eb440a14a5	[X86] Remove Undef handling from extractSubVector. This is now handled inside getNode. llvm-svn: 292877	2017-01-24 02:43:54 +00:00
Simon Pilgrim	0218ce1080	[X86][SSE] Add missing X86ISD::ANDNP combines. llvm-svn: 292767	2017-01-22 22:45:23 +00:00
Simon Pilgrim	7e1cc97513	[X86][SSE] Improve shuffle combining with zero insertions Add support for handling shuffles with scalar_to_vector(0) llvm-svn: 292766	2017-01-22 22:21:44 +00:00
Sanjay Patel	8f49aede82	[x86] avoid crashing with illegal vector type (PR31672) https://llvm.org/bugs/show_bug.cgi?id=31672 llvm-svn: 292758	2017-01-22 17:06:12 +00:00
Craig Topper	8e0724d332	[X86] Don't allow commuting to form phsub operations. Fixes PR31714. llvm-svn: 292713	2017-01-21 06:59:38 +00:00
Simon Pilgrim	db101e4d57	[X86][SSE] Improve comments describing combineTruncatedArithmetic. NFCI. llvm-svn: 292502	2017-01-19 18:18:32 +00:00
Simon Pilgrim	5f2f53b106	[X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further llvm-svn: 292493	2017-01-19 16:25:02 +00:00
Elena Demikhovsky	e01512cecf	Recommitting unsigned saturation with a bugfix. A test case that crashed is added to avx512-trunc.ll. (PR31589) llvm-svn: 292479	2017-01-19 12:08:21 +00:00
Craig Topper	200ea31684	[AVX-512] Support ADD/SUB/MUL of mask vectors Summary: Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND. We already do this for scalar i1 operations so I just extended it to vectors of i1. Reviewers: zvi, delena Reviewed By: delena Subscribers: guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D28888 llvm-svn: 292474	2017-01-19 07:12:35 +00:00
Craig Topper	c227529105	[X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical. llvm-svn: 292469	2017-01-19 03:49:29 +00:00
Michael Kuperstein	d3d2925933	Revert r291670 because it introduces a crash. r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one. PR31589 has the new reproducer. llvm-svn: 292444	2017-01-18 23:05:58 +00:00
Kirill Bobyrev	6afbaf0944	Revert 292404 due to buildbot failures. llvm-svn: 292407	2017-01-18 16:34:25 +00:00
Kirill Bobyrev	9ad06dbe17	[X86] Minor code cleanup to fix several clang-tidy warnings. NFC llvm-svn: 292404	2017-01-18 16:15:47 +00:00
Michael Zuckerman	0c0240ce84	[X86] Improve mul combine for negative multiplier (2^c - 1) This patch improves the mul instruction combine function (combineMul) by adding a new layer of logic. In this patch, we are adding the ability to fold (mul x, -((1 << c) - 1)) or (mul x, -((1 << c) + 1)) into neg((x << c) - x) or neg((x << c) + x), respectively. Differential Revision: https://reviews.llvm.org/D28232 llvm-svn: 292358	2017-01-18 09:31:13 +00:00
Bob Wilson	f2d0b68b3b	Revert r291640 change to fold X86 comparison with atomic_load_add. Even with the fix from r291630, this still causes problems. I get widespread assertion failures in the Swift runtime's WeakRefCount::increment() function. I sent a reduced testcase in reply to the commit. llvm-svn: 292242	2017-01-17 19:18:57 +00:00
Craig Topper	729d30d0ae	[AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation. llvm-svn: 292201	2017-01-17 06:49:59 +00:00
Michael Zuckerman	6baa3838e9	Fix blend mask by switching the sides of the operands, since the Blend node uses the opposite mask from the Select node. llvm-svn: 292066	2017-01-15 16:43:14 +00:00
Craig Topper	9cc685a56e	[X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop. llvm-svn: 291996	2017-01-14 04:29:15 +00:00
Craig Topper	9850210d03	[AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there. This fixes an undefined sanitizer failure from r291888. llvm-svn: 291994	2017-01-14 04:19:35 +00:00
Benjamin Kramer	061f4a5fe6	Apply clang-tidy's performance-unnecessary-value-param to LLVM. With some minor manual fixes for using function_ref instead of std::function. No functional change intended. llvm-svn: 291904	2017-01-13 14:39:03 +00:00
Simon Pilgrim	7f2a6d5e8c	[X86][AVX512] Add support for variable ASHR v2i64/v4i64 without VLX Use v8i64 variable ASHR instructions if we don't have VLX. This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll. Differential Revision: https://reviews.llvm.org/D28604 llvm-svn: 291901	2017-01-13 13:16:19 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Michael Zuckerman	558a4d8419	[X86][AVX512] Adding missing shuffle lowering to blend mask instructions Some shuffles can be lowered to blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ) . In this patch, I added new pattern match for this case. Reviewers: 1. craig.topper 2. guyblank 3. RKSimon 4. igorb Differential Revision: https://reviews.llvm.org/D28483 llvm-svn: 291888	2017-01-13 09:06:00 +00:00
Nikolai Bozhenov	f02ac0eeb2	[X86] Replace AND+IMM64 with SRL/SHL Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result is unused and the mask has only higher/lower bits set. For example, with this patch LLVM emits shrq $41, %rdi je instead of movabsq $0xFFFFFE0000000000, %rcx testq %rcx, %rdi je This reduces number of instructions, code size and register pressure. The transformation is applied only for cases where the mask cannot be encoded as an immediate value within TESTQ instruction. Differential Revision: https://reviews.llvm.org/D28198 llvm-svn: 291806	2017-01-12 19:54:27 +00:00
Nikolai Bozhenov	6bdf92cec7	[X86] Tune bypassing of slow division for Intel CPUs 64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom where DIV8 is fastest. Because of that, the patch 1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores. 2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This doesn't make the shorter division slower but increases chances of taking it. Moreover, it's much more likely to prove at compile-time that a value fits 32 bits and doesn't require a run-time check (e.g. zext i32 to i64). Differential Revision: https://reviews.llvm.org/D28196 llvm-svn: 291800	2017-01-12 19:34:15 +00:00
Craig Topper	24c3a2395f	[AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support. llvm-svn: 291747	2017-01-12 06:49:12 +00:00
Craig Topper	69ab67b279	[AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq. llvm-svn: 291746	2017-01-12 06:49:08 +00:00
Elad Cohen	c5ba925ef2	[X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask r289653 added a case where `vselect <cond> <vector1> <all-zeros>` is transformed to: `vselect xor(cond, DAG.getConstant(1, DL, CondVT) <all-zeros> <vector1>` This was not aimed to catch cases where Cond is not a vXi1 mask but it does. Moreover, when Cond type is VxiN (N > 1) then xor(cond, DAG.getConstant(1, DL, CondVT) != NOT(cond). This patch changes the above to xor with allones, and avoids entering the case for non-mask Conds. llvm-svn: 291745	2017-01-12 06:49:03 +00:00
Simon Pilgrim	0c1faf432b	Remove trailing whitespace. NFCI. llvm-svn: 291680	2017-01-11 16:38:20 +00:00
Elena Demikhovsky	9d0e7c33d3	X86 CodeGen: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. And VPACKUS* instructions on SSE* targets. Differential Revision: https://reviews.llvm.org/D28216 llvm-svn: 291670	2017-01-11 12:59:32 +00:00
Simon Pilgrim	5a81fefad3	[X86][AVX512BW] Vectorize v64i8 vector shifts Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665	2017-01-11 10:36:51 +00:00
Hans Wennborg	6573976f57	Re-commit r289955: [X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) This was reverted because it would miscompile code where the cmp had multiple uses. That was due to a deficiency in the existing code, which was fixed in r291630 (see the PR for details). This re-commit includes an extra test for the kind of code that got miscompiled: @test_sub_1_setcc_jcc. llvm-svn: 291640	2017-01-11 01:36:57 +00:00
Hans Wennborg	12de693747	[X86] Don't run combineSetCCAtomicArith() when the cmp has multiple uses We would miscompile the following: void g(int); int f(volatile long long *p) { bool b = __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST) < 0; g(b ? 12 : 34); return b ? 56 : 78; } into pushq %rax lock incq (%rdi) movl $12, %eax movl $34, %edi cmovlel %eax, %edi callq g(int) testq %rax, %rax <---- Bad. movl $56, %ecx movl $78, %eax cmovsl %ecx, %eax popq %rcx retq because the code failed to take into account that the cmp has multiple uses, replaced one of them, and left the other one comparing garbage. llvm-svn: 291630	2017-01-11 00:49:54 +00:00
Michael Zuckerman	bcd03e7f3b	[X86][AVX512]Improving shuffle lowering by using AVX-512 EXPAND* instructions This patch fixes PR31351: https://llvm.org/bugs/show_bug.cgi?id=31351 1. This patch adds a new type of shuffle lowering 2. We can use the expand instruction when the shuffle pattern is as follows: { 0a[0]0a[1]...0*a[n] , n >=0 where a[] elements are in ascending order}. Reviewers: 1. igorb 2. guyblank 3. craig.topper 4. RKSimon Differential Revision: https://reviews.llvm.org/D28352 llvm-svn: 291584	2017-01-10 18:57:17 +00:00
Craig Topper	2ed461e5c4	[X86] When lowering uniform shifts, use X86ISD::VZEXT instead of using a ZERO_EXTEND_VECTOR_INREG. If we emit the ZERO_EXTEND_VECTOR_INREG too late it doesn't get lowered properly and makes it through to isel and fails. Fixes PR31593. llvm-svn: 291535	2017-01-10 04:12:24 +00:00
Michael Kuperstein	1559e8863e	Revert r291092 because it introduces a crash. See PR31589 for details. llvm-svn: 291478	2017-01-09 21:04:46 +00:00
Vyacheslav Klochkov	d497d36083	X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB. Differential Revision: https://reviews.llvm.org/D28087 llvm-svn: 291473	2017-01-09 20:26:17 +00:00
Simon Pilgrim	0f23b2ba1a	[X86][AVX512] Enable v16i8/v32i8 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i8 (extending to v16i32) on AVX512 targets and v32i8 (extending to v32i16) on AVX512BW targets. Cost model updates to follow. llvm-svn: 291451	2017-01-09 17:20:03 +00:00
Simon Pilgrim	d990cd371b	[X86][AVX512DQ] Enable v16i16 vector shifts to use an extend+shift+truncate pattern. Use the existing AVX2 v8i16 vector shift lowering for v16i16 on AVX512 targets (AVX512BW will already have lowered with vpsravw). Cost model updates to follow. llvm-svn: 291445	2017-01-09 15:15:45 +00:00
Craig Topper	f51ba1e3da	[AVX-512] If avx512dq is available use vpmovm2d/vpmovm2q instead of vselect of zeroes/ones when handling sign extends of i1 without VLX. llvm-svn: 291402	2017-01-08 21:32:30 +00:00
Sanjay Patel	bf51c8a975	[x86] fix usage of stale operands when lowering select I noticed this problem as part of the ongoing attempt to canonicalize min/max ops in IR. The debug output shows nodes like this: t4: i32 = xor t2, Constant:i32<-1> t21: i8 = setcc t4, Constant:i32<0>, setlt:ch t14: i32 = select t21, t4, Constant:i32<-1> And because the select is holding onto the t4 (xor) node while EmitTest creates a new x86-specific xor node, the lowering results in: t4: i32 = xor t2, Constant:i32<-1> t25: i32,i32 = X86ISD::XOR t2, Constant:i32<-1> t28: i32,glue = X86ISD::CMOV Constant:i32<-1>, t4, Constant:i8<15>, t25:1 Differential Revision: https://reviews.llvm.org/D28374 llvm-svn: 291392	2017-01-08 15:53:40 +00:00
Simon Pilgrim	a1b8e2c725	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470) llvm-svn: 291347	2017-01-07 15:37:50 +00:00
Simon Pilgrim	3128d6b520	[X86][SSE] Pass float domain flag to shuffle combine match functions. NFCI. Early step towards ignoring domain above a certain shuffle depth. llvm-svn: 291248	2017-01-06 17:34:30 +00:00
Simon Pilgrim	bd3c6824d4	[X86][SSE] Simplify float domain requirement in unary shuffle matching. The AVX1-only limit is never actually required in matchUnaryVectorShuffle llvm-svn: 291244	2017-01-06 17:00:59 +00:00
Simon Pilgrim	a08d7b9913	Remove trailing whitespace. NFCI. llvm-svn: 291240	2017-01-06 15:31:52 +00:00
Simon Pilgrim	9b8c7caf4e	[X86] Add X86Subtarget argument. NFCI. All callers of getTargetVShiftNode have access to X86Subtarget already so pass it along instead of re-extracting it. llvm-svn: 291239	2017-01-06 15:29:17 +00:00
Craig Topper	e86fb932ea	[AVX-512] Add EXTRACT_SUBVECTOR support to combineBitcastForMaskedOp. llvm-svn: 291214	2017-01-06 05:18:48 +00:00
Sanjay Patel	dea5a7bd53	less braces; NFC llvm-svn: 291126	2017-01-05 16:47:32 +00:00
Zvi Rackover	4b7d724d62	[X86] Optimize vector shifts with variable but uniform shift amounts Summary: For instructions such as PSLLW/PSLLD/PSLLQ a variable shift amount may be passed in an XMM register. The lower 64-bits of the register are evaluated to determine the shift amount. This patch improves the construction of the vector containing the shift amount. Reviewers: craig.topper, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28353 llvm-svn: 291120	2017-01-05 15:11:43 +00:00
Elena Demikhovsky	143cbc425b	AVX-512: Optimized pattern for truncate with unsigned saturation. DAG patterns optimization: truncate + unsigned saturation supported by VPMOVUS* instructions in AVX-512. Differential revision: https://reviews.llvm.org/D28216 llvm-svn: 291092	2017-01-05 08:21:09 +00:00
Eric Christopher	568c113ac0	Remove dead and unused variable NumSentinelElements. Fixes PR31529. llvm-svn: 290998	2017-01-04 20:05:18 +00:00
Simon Pilgrim	c76ea4b638	[X86] Attempt to pre-truncate arithmetic operations if useful In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types. This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold). Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.), other opcodes, and better analysis of when truncating the inputs reduces costs. I have considered implementing this for all targets within the DAGCombiner but wasn't sure we could devise a suitable cost model system that would give us the range we need. Differential Revision: https://reviews.llvm.org/D28219 llvm-svn: 290947	2017-01-04 08:05:42 +00:00
Craig Topper	d0aa53b9ae	[AVX-512] Add support for detecting 512-bit shuffles that contain a 128-bit subvector insertion from the lowest subvector of one of the sources. These are best handled with a vinsert32x4 or vinsert64x2 instruction. llvm-svn: 290946	2017-01-04 07:32:03 +00:00
Craig Topper	83115a809f	[AVX-512] Simplify code for creating 512-bit SHUF128 operations. We don't need two loops and we can safely assume and hardcode the size of the widened mask. llvm-svn: 290942	2017-01-04 07:31:51 +00:00
Craig Topper	48d232d3e7	[X86] Move 128-bit shuffle mask widening check into lowerV2X128VectorShuffle to reduce code duplication. Use the now available widened mask to simplify some code inside lowerV2X128VectorShuffle. llvm-svn: 290872	2017-01-03 07:36:41 +00:00
Craig Topper	785e58fdc9	[AVX-512] Simplify the code added in r290870 to recognize 256-bit subvector inserts and avoid calling isShuffleEquivalent on a widened mask. llvm-svn: 290871	2017-01-03 07:36:39 +00:00
Craig Topper	9496e3f916	[AVX-512] Teach shuffle lowering to use vinsert instructions for shuffles corresponding to 256-bit subvector inserts. llvm-svn: 290870	2017-01-03 07:00:40 +00:00
Craig Topper	c849172105	[AVX-512] Add support for pushing bitcasts through INSERT_SUBVEC in order to select a masked operation. llvm-svn: 290865	2017-01-03 05:46:02 +00:00
Craig Topper	0cda8bbf74	[AVX-512] Remove vinsert intrinsics and autoupgrade to native shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864	2017-01-03 05:45:57 +00:00
Reid Kleckner	cd46c1df80	Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64" This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714	2016-12-29 17:07:10 +00:00
Reid Kleckner	c9e0a153cf	[COFF] Use 32-bit jump table entries in .rdata for Win64 Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694	2016-12-29 00:12:39 +00:00
Craig Topper	f56d985f77	[AVX-512] Don't assume that the rounding mode argument to intrinsics is a constant. While clang will guarantee this, nothing in the backend will. A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering. llvm-svn: 290532	2016-12-26 01:40:17 +00:00
Michael Zuckerman	86602e85dd	revert commit 290516 llvm-svn: 290517	2016-12-25 12:45:18 +00:00
Michael Zuckerman	45aa420640	Commit try added new empty line llvm-svn: 290516	2016-12-25 12:01:34 +00:00
Simon Pilgrim	081abbb164	[X86][SSE] Improve lowering of vXi64 multiplies As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267	2016-12-21 20:00:10 +00:00
Elena Demikhovsky	7c7bf1b432	Added a template for building target specific memory node in DAG. I added an API for creating a target-specific memory node in the DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern. Differential Revision: https://reviews.llvm.org/D27899 llvm-svn: 290250	2016-12-21 10:43:36 +00:00
Oren Ben Simhon	cb692157b7	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support Fixing a warning. llvm-svn: 290248	2016-12-21 09:47:31 +00:00
Oren Ben Simhon	3b95157090	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use. The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it. This commit also includes additional lit tests to better cover HVA corner cases. Differential Revision: https://reviews.llvm.org/D27392 llvm-svn: 290240	2016-12-21 08:31:45 +00:00
Simon Pilgrim	688114d888	[X86][SSE] Ensure we're only combining shuffles with legal mask types. I haven't managed to get this to fail yet but it's technically possible for the AND -> shuffle decomposition to result in illegal types. llvm-svn: 290183	2016-12-20 17:09:52 +00:00
Daniel Jasper	373f9a6a0c	Revert r289955 and r289962. This is causing lots of ASAN failures for us. Not sure whether it causes an ASAN false positive or whether it actually leads to incorrect code or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066	2016-12-18 14:36:38 +00:00
Simon Pilgrim	e940daf532	[X86][SSE] Add support for combining target shuffles to SHUFPS. As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064	2016-12-18 14:26:02 +00:00
Craig Topper	7029db0eaa	[X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060	2016-12-18 07:54:23 +00:00
Hans Wennborg	ef57755427	Fix -Wself-assign from r289955 llvm-svn: 289962	2016-12-16 17:16:46 +00:00
Hans Wennborg	35f21cba13	[X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367) atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does. This targets a pattern that occurs frequently with reference counting pointers: void decrement(long volatile *ptr) { if (_InterlockedDecrement(ptr) == 0) release(); } Clang would previously compile it (for 32-bit at -Os) as: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: 31 c9 xor %ecx,%ecx 6: 49 dec %ecx 7: f0 0f c1 08 lock xadd %ecx,(%eax) b: 83 f9 01 cmp $0x1,%ecx e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14> 14: c3 ret and with this patch it becomes: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: f0 ff 08 lock decl (%eax) 7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd> d: c3 ret (Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.) Differential Revision: https://reviews.llvm.org/D27781 llvm-svn: 289955	2016-12-16 16:34:59 +00:00
Simon Pilgrim	4b73c3de50	[X86][AVX] Call lowerVectorShuffleWithSHUFPS directly instead of calling DAG.getVectorShuffle (PR27885) We've already done the hard work of ensuring the mask is safe for 'SHUFPS'. llvm-svn: 289950	2016-12-16 15:23:32 +00:00
Simon Pilgrim	9519bd9232	[X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946	2016-12-16 14:30:04 +00:00
Simon Pilgrim	f159a3414f	[X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain. We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead. llvm-svn: 289937	2016-12-16 11:48:51 +00:00
Sanjay Patel	a97358bc8e	[x86] use a single shufps for 256-bit vectors when it can save instructions This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846	2016-12-15 18:43:46 +00:00
Sanjay Patel	a0d8a278a7	[x86] use a single shufps when it can save instructions This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this: - vpshufd {{.#+}} xmm1 = xmm1[0,1,0,2] - vpshufd {{.#+}} xmm0 = xmm0[0,2,2,3] - vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7] + vshufps {{.#+}} xmm0 = xmm0[0,2],xmm1[0,2] And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837	2016-12-15 18:03:38 +00:00
Michael Zuckerman	1ce2a23a1e	Fix bug 30945 - [AVX512] Failure to flip vector comparison to remove not mask instruction. Adds a new optimization opportunity by adding a new X86ISelLowering pattern. The test case was shown in https://llvm.org/bugs/show_bug.cgi?id=30945. Test explanation: Select gets three arguments: mask, op and op2. In this case, the mask is a result of ICMP. The ICMP instruction compares (with equal operand) the zero initializer vector and the result of the first ICMP. In general, the result of "cmp eq, op1, zero initializers" is "not(op1)" where op1 is a mask. By rearranging the two arguments inside the Select instruction, we can get the same result without the need for the middle phase ("cmp eq, op1, zero initializers"). Missed optimization opportunity: vpcmpled %zmm0, %zmm1, %k0 knotw %k0, %k1 can be combined to vpcmpgtd %zmm0, %zmm2, %k1 Reviewers: 1. delena 2. igorb Committed after check all Differential Revision: https://reviews.llvm.org/D27160 llvm-svn: 289653	2016-12-14 14:57:10 +00:00
Stephan Bergmann	17c7f70362	Replace APFloatBase static fltSemantics data members with getter functions At least the plugin used by the LibreOffice build (<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly uses those members (through inline functions in LLVM/Clang include files in turn using them), but they are not exported by utils/extract_symbols.py on Windows, and accessing data across DLL/EXE boundaries on Windows is generally problematic. Differential Revision: https://reviews.llvm.org/D26671 llvm-svn: 289647	2016-12-14 11:57:17 +00:00
Sanjay Patel	62104ee6d9	[x86] fix formatting; NFC llvm-svn: 289476	2016-12-12 22:31:01 +00:00
Simon Pilgrim	4cbe1834e4	Update inline argument comment. NFCI. combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time. llvm-svn: 289430	2016-12-12 13:43:15 +00:00
Simon Pilgrim	5ebd2b542b	[X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts. Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift. llvm-svn: 289429	2016-12-12 13:33:58 +00:00
Simon Pilgrim	369cd349b9	[X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ PMULDQ returns the 64-bit result of the signed multiplication of the lower 32-bits of vXi64 vector inputs, we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 llvm-svn: 289426	2016-12-12 10:49:15 +00:00
Simon Pilgrim	831435cb14	[X86][SSE] Add support for combining target shuffles to SHUFPD. llvm-svn: 289407	2016-12-11 21:26:25 +00:00
Oren Ben Simhon	9683ecbff6	[X86] Regcall - Adding support for mask types Regcall calling convention passes mask types arguments in x86 GPR registers. The review includes the changes required in order to support v32i1, v16i1 and v8i1. Differential Revision: https://reviews.llvm.org/D27148 llvm-svn: 289383	2016-12-11 14:10:52 +00:00
Craig Topper	e7166ce237	[X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC llvm-svn: 289352	2016-12-11 01:28:08 +00:00
Simon Pilgrim	a03e350e69	[X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum llvm-svn: 289341	2016-12-10 21:16:45 +00:00
Simon Pilgrim	fb58550d73	[X86][SSE] Move ZeroVector creation into the shuffle pattern case where it's actually used. Also fix the ZeroVector's type - I've no idea how this hasn't caused problems........ llvm-svn: 289336	2016-12-10 19:49:55 +00:00
Craig Topper	18b57da491	[AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available. llvm-svn: 289335	2016-12-10 19:35:39 +00:00
Craig Topper	8e288e0b68	[X86] Clarify indentation. NFC llvm-svn: 289334	2016-12-10 19:35:36 +00:00
Craig Topper	85f0e57c33	[X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag. llvm-svn: 289333	2016-12-10 19:35:33 +00:00
Simon Pilgrim	017b7a71d8	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED) Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232	2016-12-09 17:53:11 +00:00
Craig Topper	a55b483bb5	[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics Summary: Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection. This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass thru its upper bits. We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version. We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that. This fixes PR30913. Reviewers: delena, zvi, v_klochkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27144 llvm-svn: 289190	2016-12-09 06:42:28 +00:00
Peter Collingbourne	235c275b20	IR, X86: Understand !absolute_symbol metadata on global variables. Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087	2016-12-08 19:01:00 +00:00
Simon Pilgrim	c3c6463ce0	[X86][SSE] Remove AND -> VZEXT combine This is now performed more generally by the target shuffle combine code. Already covered by tests that were originally added in D7666/rL229480 to support combineVectorZext (or VectorZextCombine as it was known then....). Differential Revision: https://reviews.llvm.org/D27510 llvm-svn: 288918	2016-12-07 17:02:41 +00:00
Zvi Rackover	8bc7e4da51	[X86] Prefer reduced width multiplication over pmulld on Silvermont Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844	2016-12-06 19:35:20 +00:00
Ayman Musa	86c00b799f	[X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting. Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern. For example: "build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>" Differential Revision: https://reviews.llvm.org/D26802 llvm-svn: 288804	2016-12-06 12:24:14 +00:00
Sanjay Patel	f807f6a05f	[x86] fold fand (fxor X, -1) Y --> fandn X, Y I noticed this gap in the scalar FP-logic matching with: D26712 and: rL287171 Differential Revision: https://reviews.llvm.org/D27385 llvm-svn: 288675	2016-12-05 15:45:27 +00:00
Simon Pilgrim	5e922eb0a3	Use range based for loop. NFCI. llvm-svn: 288671	2016-12-05 14:25:04 +00:00
Simon Pilgrim	b08c98f125	[X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH. llvm-svn: 288663	2016-12-05 11:25:13 +00:00
Simon Pilgrim	20b1409f35	[X86][SSE] Add helper function to create UNPCKL/UNPCKH shuffle masks. NFCI. llvm-svn: 288659	2016-12-05 11:00:25 +00:00
Simon Pilgrim	9cb74267ac	Tidyup code with indentation and clang-format. NFCI. llvm-svn: 288505	2016-12-02 15:44:30 +00:00
Simon Pilgrim	cbf5f97018	[X86][SSE] Add support for extracting constant bit data from broadcasted constants llvm-svn: 288499	2016-12-02 13:16:08 +00:00
Simon Pilgrim	b3ae416839	[X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI. getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well. Converted Constant bit extraction and Bitset splitting to helper lambda functions. llvm-svn: 288496	2016-12-02 11:58:05 +00:00
Matthias Braun	d0ee66c2e9	Move most EH from MachineModuleInfo to MachineFunction Recommitting r288293 with some extra fixes for GlobalISel code. Most of the exception handling members in MachineModuleInfo are actually per function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288405	2016-12-01 19:32:15 +00:00
Simon Pilgrim	17d5b6b493	[X86][SSE] Moved shuffle mask widening/narrowing helper functions earlier in the file. Will be necessary for a future patch. llvm-svn: 288395	2016-12-01 18:27:19 +00:00
Simon Pilgrim	5fe6236035	[X86][SSE] Classify AND bitmasks as variable shuffle masks They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask. llvm-svn: 288367	2016-12-01 16:00:14 +00:00
Simon Pilgrim	1e4d870999	[X86][SSE] Add support for combining AND bitmasks to shuffles. llvm-svn: 288365	2016-12-01 15:41:40 +00:00
Daniel Jasper	19b9284f1d	Silence GCC's -Wenum-compare after r288335 in the same way it is done in X86FastISel.cpp. llvm-svn: 288337	2016-12-01 14:33:50 +00:00
Simon Pilgrim	55066e5622	[X86][SSE] Add support for combining target shuffles to AND bitmasks. llvm-svn: 288335	2016-12-01 13:47:02 +00:00
Simon Pilgrim	947650e99d	[X86][SSE] Add support for combining ISD::AND with shuffles. Attempts to convert an AND with a vector of 255 or 0 values into a shuffle (blend) mask. llvm-svn: 288333	2016-12-01 11:52:37 +00:00
Eric Christopher	e70b7c3dfb	Temporarily Revert "Move most EH from MachineModuleInfo to MachineFunction" This appears to have broken the global isel bot: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-globalisel_build/5174/console This reverts commit r288293. llvm-svn: 288322	2016-12-01 07:50:12 +00:00
Matthias Braun	ed14cb0604	Move most EH from MachineModuleInfo to MachineFunction Most of the exception handling members in MachineModuleInfo are actually per function data (talks about the "current function") so it is better to keep it at the function instead of the module. This is a necessary step to have machine module passes work properly. Also: - Rename TidyLandingPads() to tidyLandingPads() - Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends. - I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()). Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288293	2016-11-30 23:49:01 +00:00
Simon Pilgrim	288c088c17	[X86][SSE] Add support for target shuffle constant folding Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future. I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction. Differential Revision: https://reviews.llvm.org/D27220 llvm-svn: 288250	2016-11-30 16:33:46 +00:00
Simon Pilgrim	edccc1254b	Avoid repeated calls to MVT getSizeInBits and getScalarSizeInBits(). NFCI. llvm-svn: 288170	2016-11-29 17:57:48 +00:00
Simon Pilgrim	001368abc8	[X86] Moved getTargetConstantFromNode function so a future patch is more understandable. NFCI. llvm-svn: 288147	2016-11-29 15:32:58 +00:00
Simon Pilgrim	35c47c494d	[X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX. We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output. llvm-svn: 288140	2016-11-29 14:18:51 +00:00
Simon Pilgrim	923020a652	Avoid repeated calls to MVT::getScalarSizeInBits(). NFCI. llvm-svn: 288138	2016-11-29 13:43:08 +00:00
Simon Pilgrim	2228f70a85	[X86][SSE] Add initial support for combining (V)PMOVZX with shuffles. llvm-svn: 288049	2016-11-28 17:58:19 +00:00
Sanjay Patel	100bc01a72	[x86] fix formatting; NFC llvm-svn: 288045	2016-11-28 17:39:21 +00:00
Simon Pilgrim	3f10e66981	[X86][SSE] Added support for combining bit-shifts with shuffles. Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining. Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations. llvm-svn: 288040	2016-11-28 16:25:01 +00:00
Simon Pilgrim	91d6f5fbc1	[X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts llvm-svn: 288006	2016-11-27 21:08:19 +00:00
Simon Pilgrim	cdb2ce661d	[X86][SSE] Split lowerVectorShuffleAsShift ready for combines. NFCI. Moved most of matching code into matchVectorShuffleAsShift to share with target shuffle combines (in a future commit). llvm-svn: 288003	2016-11-27 19:28:39 +00:00
Craig Topper	6677bb4e50	[AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change. llvm-svn: 287971	2016-11-26 07:20:57 +00:00
Simon Pilgrim	8e8ae7219f	Use SDValue helper instead of explicitly going via SDValue::getNode(). NFCI llvm-svn: 287940	2016-11-25 17:19:53 +00:00
Craig Topper	88071b37ab	[AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when it's feeding a vselect with 32-bit element size. Summary: Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations. This patch detects this and changes the element size to match the vselect. I don't handle changing integer to floating point or vice versa as it's not clear if it's better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now. Reviewers: delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27087 llvm-svn: 287939	2016-11-25 16:48:05 +00:00
Simon Pilgrim	f1ee930db0	Fix unused variable warning llvm-svn: 287889	2016-11-24 15:24:47 +00:00
Simon Pilgrim	9c71e07276	[X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64 Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit). The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32. Differential Revision: https://reviews.llvm.org/D26938 llvm-svn: 287886	2016-11-24 15:12:56 +00:00
Simon Pilgrim	841d7ca463	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882	2016-11-24 14:46:55 +00:00
Simon Pilgrim	ab323ec411	[X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering llvm-svn: 287877	2016-11-24 13:38:59 +00:00
Nikolai Bozhenov	3a8d108b2b	[x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B The bug arises during register allocation on i686 for CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address, plus ESI is reserved as the base pointer. With such constraints the only way register allocator would do its job successfully is when the addressing mode of the instruction requires only one register. If that is not the case - we are emitting additional LEA instruction to compute the address. It fixes PR28755. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25088 llvm-svn: 287875	2016-11-24 13:23:35 +00:00
Nikolai Bozhenov	bb64aa14a3	[x86] Minor refactoring of X86TargetLowering::EmitInstrWithCustomInserter Move the definitions of three variables out of the switch. Patch by Alexander Ivchenko <alexander.ivchenko@intel.com> Differential Revision: https://reviews.llvm.org/D25192 llvm-svn: 287874	2016-11-24 13:15:49 +00:00
Simon Pilgrim	a3af79678e	[X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions. This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types. Differential Revision: https://reviews.llvm.org/D27072 llvm-svn: 287870	2016-11-24 12:13:46 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Zvi Rackover	14aba43ea9	[X86] Simplify lowerVectorShuffleAsBitMask to handle only integer VT's Summary: This function is only called with integer VT arguments, so remove code that handles FP vectors. Reviewers: RKSimon, craig.topper, delena, andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26985 llvm-svn: 287743	2016-11-23 06:45:25 +00:00
Simon Pilgrim	4aa876ca7c	[X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles. This occurs during UINT_TO_FP v2f64 lowering. We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle llvm-svn: 287676	2016-11-22 17:50:06 +00:00
Zvi Rackover	9a355219d1	[X86] Change lowerBuildVectorToBitOp() to take a BuildVectorSDNode. NFC. llvm-svn: 287644	2016-11-22 15:33:28 +00:00
Zvi Rackover	0aa1c32d14	[X86] Remove dead code from LowerVectorBroadcast Summary: Splat vectors are canonicalized to BUILD_VECTOR's so the code can be simplified. NFC-ish. Reviewers: craig.topper, delena, RKSimon, andreadb Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D26678 llvm-svn: 287643	2016-11-22 15:17:52 +00:00
Craig Topper	da22267055	[AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type Summary: Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes helpful because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking. This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018. I plan to add support for more operations in future patches. Reviewers: RKSimon, zvi, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26902 llvm-svn: 287612	2016-11-22 03:51:53 +00:00
Simon Pilgrim	b7bbaa669b	[X86][SSE] Allow PACKSS to be used to truncate any type of all/none sign bits input At the moment we only use truncateVectorCompareWithPACKSS with direct vector comparison results (just one example of a known all/none signbits input). This change relaxes the direct matching of a SETCC opcode by moving the logic up into SelectionDAG::ComputeNumSignBits and accepting any input with a known splatted signbit. llvm-svn: 287535	2016-11-21 12:05:49 +00:00
Simon Pilgrim	5fadce4a3f	[X86][AVX512] Combine unary + zero target shuffles to VPERMV3 with a zero vector where possible llvm-svn: 287497	2016-11-20 16:11:36 +00:00
Simon Pilgrim	5401bae523	[X86][AVX512] Add support for VBMI VPERMV3 target shuffle combines llvm-svn: 287496	2016-11-20 15:24:38 +00:00
Simon Pilgrim	3f40412e0f	[X86][AVX512] Add support for VBMI VPERMV target shuffle combines llvm-svn: 287495	2016-11-20 15:05:45 +00:00
Simon Pilgrim	c17e1b74b8	[X86][AVX512VL] Removed duplicate operation action Basic AVX512F already declared uint_to_fp v4i32 as legal llvm-svn: 287493	2016-11-20 14:19:29 +00:00
Simon Pilgrim	096b6d4f81	[X86][AVX512F] Add support for uint_to_fp v2i32 to v2f64 on AVX512F-only targets Use 512-bit instructions (we already do something similar for uint_to_fp v4i32 to v4f64) llvm-svn: 287491	2016-11-20 14:03:23 +00:00
Oren Ben Simhon	c0f073b67f	[X86] RegCall - Handling long double arguments The change is part of RegCall calling convention support for LLVM. Long double (f80) requires special treatment as the first f80 parameter is saved in FP0 (floating point stack). This review presents the change and the corresponding tests. Differential Revision: https://reviews.llvm.org/D26151 llvm-svn: 287485	2016-11-20 11:06:07 +00:00
Simon Pilgrim	a14e0cb852	[X86][SSE] Improve PSHUFB lowering from either input Canonicalization may leave the zeroable vector in the first input. llvm-svn: 287461	2016-11-19 20:41:48 +00:00
Simon Pilgrim	623a7c57b5	[X86][AVX512] Add VPERMV/VPERMV3 v64i8 byte shuffles on avx512vbmi targets llvm-svn: 287459	2016-11-19 20:12:34 +00:00
Craig Topper	893ea9fb2c	[X86] Simplify some code a little by removing a duplicate variable and combining two if statements. NFCI llvm-svn: 287443	2016-11-19 17:33:17 +00:00
Simon Pilgrim	7938bd666e	Cleanup function with clang-format. NFCI. llvm-svn: 287340	2016-11-18 12:16:18 +00:00
Craig Topper	07f1c15995	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64 Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298	2016-11-18 02:25:34 +00:00
Simon Pilgrim	9d15fb3c10	Fix spelling mistakes in X86 target comments. NFC. Identified by Pedro Giffuni in PR27636. llvm-svn: 287247	2016-11-17 19:03:05 +00:00

5065 Commits