llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	8c099cbe7c	[X86][SSE] lowerUINT_TO_FP_v2i32 - explicitly cast half word to double Fixes MSVC analyzer extension->double warning. llvm-svn: 365027	2019-07-03 11:23:27 +00:00
Simon Pilgrim	8df90b843d	[X86][SSE] LowerINSERT_VECTOR_ELT - ensure insertion index correctness. NFCI. Assert that the insertion index is in range and use uint64_t for the index to fix MSVC/cppcheck truncation warning. llvm-svn: 365025	2019-07-03 10:59:52 +00:00
Simon Pilgrim	8853bd9592	[X86][SSE] LowerScalarImmediateShift - ensure shift amount correctness. NFCI. Assert that the shift amount is in range and create vXi8 shift masks in a way that doesn't cause MSVC/cppcheck shift result is truncated then extended warnings. llvm-svn: 365024	2019-07-03 10:47:33 +00:00
Simon Pilgrim	64e3a51534	Fix uninitialized variable warnings. NFCI. Both MSVC and cppcheck don't like the fact that the variables are initialized via references. llvm-svn: 365018	2019-07-03 10:22:08 +00:00
Simon Pilgrim	7b7b9b78a2	[X86] LowerFunnelShift - use modulo constant shift amount. This avoids the use of getZExtValue and uses the modulo shift amount which is whats expected for funnel shifts anyhow. llvm-svn: 365016	2019-07-03 10:04:16 +00:00
Craig Topper	b770d2c9d4	[X86] Add a DAG combine for turning *_extend_vector_inreg+load into an appropriate extload if the load isn't volatile. Remove the corresponding isel patterns that did the same thing without checking for volatile. This fixes another variation of PR42079 llvm-svn: 364977	2019-07-02 23:20:03 +00:00
Simon Pilgrim	5613874947	[X86] getTargetConstantBitsFromNode - remove unnecessary getZExtValue() (PR42486) Don't use APInt::getZExtValue() if you can avoid it - eventually someone will call it with i128 or something that doesn't fit into 64-bits. In this case it was completely superfluous as we'd moved the rest of the code to always use APInt. Fixes the <1 x i128> addition bug in PR42486 llvm-svn: 364953	2019-07-02 18:20:38 +00:00
Craig Topper	cffbaa93b7	[X86] Add patterns to select (scalar_to_vector (loadf32)) as (V)MOVSSrm instead of COPY_TO_REGCLASS + (V)MOVSSrm_alt. Similar for (V)MOVSD. Ultimately, I'd like to see about folding scalar_to_vector+load to vzload. Which would select as (V)MOVSSrm so this is closer to that. llvm-svn: 364948	2019-07-02 17:51:02 +00:00
Simon Pilgrim	9304168103	[X86][AVX] combineX86ShuffleChain - pull out CombineShuffleWithExtract lambda. NFCI. Pull out CombineShuffleWithExtract lambda to new combineX86ShuffleChainWithExtract wrapper and refactored it to handle more than 2 shuffle inputs - this will allow combineX86ShufflesRecursively to call this in a future patch. llvm-svn: 364924	2019-07-02 13:30:04 +00:00
Simon Pilgrim	d609ebb779	[X86] resolveTargetShuffleInputsAndMask - add repeated input handling. We were relying on combineX86ShufflesRecursively to handle this - this patch gets it done earlier which should make it easier for other code to use resolveTargetShuffleInputsAndMask. llvm-svn: 364906	2019-07-02 10:53:17 +00:00
Craig Topper	2d306b2d57	[X86] Add PreprocessISelDAG support for turning ISD::FP_TO_SINT/UINT into X86ISD::CVTTP2SI/CVTTP2UI and to reduce the number of isel patterns. llvm-svn: 364887	2019-07-02 05:53:37 +00:00
Craig Topper	3f722d40c5	[X86] Use v4i32 vzloads instead of v2i64 for vpmovzx/vpmovsx patterns where only 32-bits are loaded. v2i64 vzload defines a 64-bit memory access. It doesn't look like we have any coverage for this either way. Also remove some vzload usages where the instruction loads only 16-bits. llvm-svn: 364851	2019-07-01 21:25:11 +00:00
Craig Topper	328b24150e	[X86] Remove several bad load folding isel patterns for VPMOVZX/VPMOVSX. These patterns all matched a v2i64 vzload which only loads 64-bits to instructions that load a full 128-bits. llvm-svn: 364847	2019-07-01 21:23:38 +00:00
Craig Topper	5e7815b695	[X86] Correct v4f32->v2i64 cvt(t)ps2(u)qq memory isel patterns These instructions only read 64-bits of memory so we shouldn't allow a full vector width load to be pattern matched in case it is marked volatile. Instead allow vzload or scalar_to_vector+load. Also add a DAG combine to turn full vector loads into vzload when used by one of these instructions if the load isn't volatile. This fixes another case for PR42079 llvm-svn: 364838	2019-07-01 19:01:37 +00:00
Robert Lougher	e20030f612	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info(2) The function findPotentialBlockers may consider debug info instructions as potential blockers and may stop searching for a store-load pair prematurely. This patch corrects this and tests the cases where the store is separated from the load by more than InspectionLimit debug instructions. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D62408 llvm-svn: 364829	2019-07-01 18:28:21 +00:00
Simon Pilgrim	e3e38cce4a	[X86] Add widenSubVector to size in bits helper. NFCI. We can already widenSubVector to a specific type (of the same scalar type) - this variant just specifies the target vector size. This will be useful when CombineShuffleWithExtract relaxes the need to have the same scalar type for all shuffle operand subvector sources. llvm-svn: 364803	2019-07-01 16:20:47 +00:00
Simon Pilgrim	172fe5dd19	[X86] CombineShuffleWithExtract - updated description comments. NFCI. CombineShuffleWithExtract no longer requires that both shuffle ops are extract_subvectors, from the same type or from the same size. llvm-svn: 364745	2019-07-01 11:33:45 +00:00
Craig Topper	29fff0797b	[X86] Improve the type checking fast-isel handling of vector bitcasts. We had a bunch of vector size legality checks for the source type based on feature flags, but we didn't check the destination type at all beyond ensuring that it was a "simple" type. But this allowed the destination to be i128 which isn't legal. This commit changes the code to use TLI's isTypeLegal logic in place of the all the subtarget checks. Then additionally checks that the source and dest are vectors. Fixes 42452 llvm-svn: 364729	2019-07-01 07:09:34 +00:00
Craig Topper	4ca81a9b99	[X86] Add a DAG combine to replace vector loads feeding a v4i32->v2f64 CVTSI2FP/CVTUI2FP node with a vzload. But only when the load isn't volatile. This improves load folding during isel where we only have vzload and scalar_to_vector+load patterns. We can't have full vector load isel patterns for the same volatile load issue. Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns. llvm-svn: 364728	2019-07-01 07:09:31 +00:00
Craig Topper	d1728f8987	[X86] Add MOVHPDrm/MOVLPDrm patterns that use VZEXT_LOAD. We already had patterns that used scalar_to_vector+load. But we can also have a vzload. Found while investigating combining scalar_to_vector+load to vzload. llvm-svn: 364726	2019-07-01 07:09:23 +00:00
Fangrui Song	78ee2fbf98	Cleanup: llvm::bsearch -> llvm::partition_point after r364719 llvm-svn: 364720	2019-06-30 11:19:56 +00:00
Craig Topper	725a8a5dc4	[X86] Custom lower AVX masked loads to masked load and vselect instead of selecting a maskmov+vblend during isel. AVX masked loads only support 0 as the value for masked off elements. So we need an extra blend to support other values. Previously we expanded the masked load to two instructions with isel patterns. With this patch we now insert the vselect during lowering and it will be separately selected as a blend. llvm-svn: 364718	2019-06-30 06:46:37 +00:00
Sanjay Patel	9126c84f50	[x86] remove stale comment about cmov; NFC The cmov node used to sometimes return a glue result (and that's what 'flag' meant in this context), but that was removed with D38664. llvm-svn: 364687	2019-06-28 21:45:55 +00:00
Simon Pilgrim	978a08c885	[X86] CombineShuffleWithExtract - recurse through EXTRACT_SUBVECTOR chain llvm-svn: 364667	2019-06-28 17:57:32 +00:00
Simon Pilgrim	a54e1a0f01	[X86] CombineShuffleWithExtract - only require 1 source to be EXTRACT_SUBVECTOR We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be. Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all). llvm-svn: 364644	2019-06-28 12:24:49 +00:00
Craig Topper	cbb88a5169	[X86] Connect the output chain properly when combining vzext_movl+load into vzext_load. llvm-svn: 364625	2019-06-28 06:58:50 +00:00
Craig Topper	e832adea0f	[X86] Remove some duplicate patterns that already exist as part of their instruction definition. NFC llvm-svn: 364623	2019-06-28 05:03:47 +00:00
Sanjay Patel	a95ca2b5ff	[x86] prevent crashing from select narrowing with AVX512 llvm-svn: 364585	2019-06-27 20:16:58 +00:00
Simon Pilgrim	1fd1c60979	[X86] combineX86ShufflesRecursively - merge shuffles with more than 2 inputs We already had the infrastructure for this, but were waiting for the fix for a number of regressions which were handled by the recent shuffle(extract_subvector(),extract_subvector()) -> extract_subvector(shuffle()) shuffle combines llvm-svn: 364569	2019-06-27 17:30:51 +00:00
Simon Pilgrim	e9a2f4fe2c	Use getConstantOperandAPInt instead of getConstantOperandVal for comparisons. getConstantOperandAPInt avoids any large integer issues - these are unlikely but the fuzzers do like to mess around..... llvm-svn: 364564	2019-06-27 16:46:00 +00:00
Simon Pilgrim	74343eba37	[X86] getTargetVShiftByConstNode - reduce variable scope. NFCI. Fixes cppcheck warning. llvm-svn: 364561	2019-06-27 16:33:44 +00:00
Djordje Todorovic	71d3869f60	[Backend] Keep call site info valid through the backend Handle call instruction replacements and deletions in order to preserve valid state of the call site info of the MachineFunction. NOTE: If the call site info is enabled for a new target, the assertion from the MachineFunction::DeleteMachineInstr() should help to locate places where the updateCallSiteInfo() should be called in order to preserve valid state of the call site info. ([10/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D61062 llvm-svn: 364536	2019-06-27 13:10:29 +00:00
Simon Pilgrim	c5cff5d3d1	[X86] getFauxShuffle - add DemandedElts as a filter This is currently benign but will be used in the future based on the elements referenced by the parent shuffle(s). llvm-svn: 364530	2019-06-27 12:35:52 +00:00
Simon Pilgrim	90e121fbe6	[X86][AVX] SimplifyDemandedVectorElts - combine PERMPD(x) -> EXTRACTF128(X) If we only use the bottom lane, see if we can simplify this to extract_subvector - which is always at least as quick as PERMPD/PERMQ. llvm-svn: 364518	2019-06-27 11:16:03 +00:00
Djordje Todorovic	7eeeb5947e	[ISEL][X86] Tracking of registers that forward call arguments While lowering calls, collect info about registers that forward arguments into following function frame. We store such info into the MachineFunction of the call. This is used very late when dumping DWARF info about call site parameters. ([9/13] Introduce the debug entry values.) Co-authored-by: Ananth Sowda <asowda@cisco.com> Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com> Co-authored-by: Ivan Baev <ibaev@cisco.com> Differential Revision: https://reviews.llvm.org/D60715 llvm-svn: 364516	2019-06-27 10:51:15 +00:00
Diana Picus	43fb5ae50c	[GlobalISel] Accept multiple vregs for lowerCall's args Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63551 llvm-svn: 364512	2019-06-27 09:18:03 +00:00
Diana Picus	8138996128	[GlobalISel] Accept multiple vregs for lowerCall's result Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549. With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. NFCI for AMDGPU, Mips and X86. Differential Revision: https://reviews.llvm.org/D63550 llvm-svn: 364511	2019-06-27 09:15:53 +00:00
Diana Picus	c3dbe23977	[GlobalISel] Accept multiple vregs in lowerFormalArgs Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches. With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this. ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions. AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0. AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC. Mips doesn't support aggregates yet, so it's also NFC. x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument. Differential Revision: https://reviews.llvm.org/D63549 llvm-svn: 364510	2019-06-27 08:54:17 +00:00
Diana Picus	69ce1c1319	[GlobalISel] Allow multiple VRegs in ArgInfo. NFC Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches. Differential Revision: https://reviews.llvm.org/D63548 llvm-svn: 364509	2019-06-27 08:50:53 +00:00
Mikael Holmen	7b81b61368	Silence gcc warning after r364458 Without the fix gcc 7.4.0 complains with ../lib/Target/X86/X86ISelLowering.cpp: In function 'bool getFauxShuffleMask(llvm::SDValue, llvm::SmallVectorImpl<int>&, llvm::SmallVectorImpl<llvm::SDValue>&, llvm::SelectionDAG&)': ../lib/Target/X86/X86ISelLowering.cpp:6690:36: error: enumeral and non-enumeral type in conditional expression [-Werror=extra] int Idx = (ZeroMask[j] ? SM_SentinelZero : (i + j + Ofs)); ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors llvm-svn: 364507	2019-06-27 08:16:18 +00:00
Craig Topper	9153501f07	[X86] Remove (vzext_movl (scalar_to_vector (load))) matching code from selectScalarSSELoad. I think this will be turning into vzext_load during DAG combine. llvm-svn: 364499	2019-06-27 05:52:00 +00:00
Craig Topper	9ea5a32251	[X86] Teach selectScalarSSELoad to not narrow volatile loads. llvm-svn: 364498	2019-06-27 05:51:56 +00:00
Craig Topper	3d12971e1c	[X86] Rework the logic in LowerBuildVectorv16i8 to make better use of any_extend and break false dependencies. Other improvements This patch rewrites the loop iteration to only visit every other element starting with element 0. And we work on the "even" element and "next" element at the same time. The "First" logic has been moved to the bottom of the loop and doesn't run on every element. I believe it could create dangling nodes previously since we didn't check if we were going to use SCALAR_TO_VECTOR for the first insertion. I got rid of the "First" variable and just do a null check on V which should be equivalent. We also no longer use undef as the starting V for vectors with no zeroes to avoid false dependencies. This matches v8i16. I've changed all the extends and OR operations to use MVT::i32 since that's what they'll be promoted to anyway. I've tried to use zero_extend only when necessary and use any_extend otherwise. This resulted in some improvements in tests where we are now able to promote aligned (i32 (extload i8)) to a 32-bit load. Differential Revision: https://reviews.llvm.org/D63702 llvm-svn: 364469	2019-06-26 20:16:19 +00:00
Craig Topper	afa58b6ba1	[X86] Remove isTypePromotionOfi1ZeroUpBits and its helpers. This was trying to optimize concat_vectors with zero of setcc or kand instructions. But I think it produced the same code we produce for a concat_vectors with 0 even it it doesn't come from one of those operations. llvm-svn: 364463	2019-06-26 19:45:48 +00:00
Simon Pilgrim	dfe079ffbf	[X86][SSE] getFauxShuffleMask - handle OR(x,y) where x and y have no overlapping bits Create a per-byte shuffle mask based on the computeKnownBits from each operand - if for each byte we have a known zero (or both) then it can be safely blended. Fixes PR41545 llvm-svn: 364458	2019-06-26 18:21:26 +00:00
Simon Pilgrim	435ee9fb1f	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PMULDQ Allows narrowInsertExtractVectorBinOp to reduce vector size instead of the more restricted SimplifyDemandedVectorEltsForTargetNode llvm-svn: 364434	2019-06-26 14:58:11 +00:00
Simon Pilgrim	6b687bf681	[X86][SSE] X86TargetLowering::isCommutativeBinOp - add PCMPEQ Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364432	2019-06-26 14:40:49 +00:00
Simon Pilgrim	b13c6f1a9d	[X86][SSE] X86TargetLowering::isBinOp - add PCMPGT Allows narrowInsertExtractVectorBinOp to reduce vector size llvm-svn: 364431	2019-06-26 14:34:41 +00:00
Simon Pilgrim	24f96a0eee	[X86] shouldScalarizeBinop - never scalarize target opcodes. We have (almost) no target opcodes that have scalar/vector equivalents - for now assume we can't scalarize them (we can add exceptions if we need to). llvm-svn: 364429	2019-06-26 14:21:29 +00:00
Roman Lebedev	13889145f0	[X86][Codegen] X86DAGToDAGISel::matchBitExtract(): consistently capture lambdas by value llvm-svn: 364420	2019-06-26 12:19:52 +00:00
Roman Lebedev	fbb2e40d5c	[X86] X86DAGToDAGISel::matchBitExtract(): pattern c: truncation awareness Summary: The one thing of note here is that the 'bitwidth' constant (32/64) was previously pessimistic. Given `x & (-1 >> (C - z))`, we were taking `C` to be `bitwidth(x)`, but in reality we want `(-1 >> (C - z))` pattern to mean "low z bits must be all-ones". And for that, `C` should be `bitwidth(-1 >> (C - z))`, i.e. of the shift operation itself. Last pattern D does not seem to exhibit any of these truncation issues. Although it has the opposite problem - if we extract low bits (no shift) from i64, and then truncate to i32, then we fail to shrink this 64-bit extraction into 32-bit extraction. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62806 llvm-svn: 364419	2019-06-26 12:19:47 +00:00
Roman Lebedev	b0ecc1cc6b	[X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness Summary: (Not so) boringly identical to pattern a (D62786) Not yet sure how do deal with the last pattern c. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62793 llvm-svn: 364418	2019-06-26 12:19:39 +00:00
Roman Lebedev	8b9a03973a	[X86] X86DAGToDAGISel::matchBitExtract(): pattern a: truncation awareness Summary: Finally tying up loose ends here. The problem is quite simple: If we have pattern `(x >> start) & (1 << nbits) - 1`, and then truncate the result, that truncation will be propagated upwards, into the `and`. And that isn't currently handled. I'm only fixing pattern `a` here, the same fix will be needed for patterns `b`/`c` too. I think this isn't missing any extra legality checks, since we only look past truncations. Similary, i don't think we can get any other truncation there other than i64->i32. Reviewers: craig.topper, RKSimon, spatel Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62786 llvm-svn: 364417	2019-06-26 12:19:11 +00:00
Hans Wennborg	6876de90e8	Fix the build after r364401 It was failing with: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: error: call of overloaded 'makeArrayRef(<brace-enclosed initializer list>)' is ambiguous scaleShuffleMask<int>(Scale, makeArrayRef<int>({ 0, 2, 1, 3 }), Mask); ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:18772:66: note: candidates are: In file included from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/MachineFunction.h:20:0, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/CodeGen/CallingConvLower.h:19, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.h:17, from /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/Target/X86/X86ISelLowering.cpp:14: /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:480:15: note: llvm::ArrayRef<T> llvm::makeArrayRef(const std::vector<_RealType>&) [with T = int] ArrayRef<T> makeArrayRef(const std::vector<T> &Vec) { ^ /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include/llvm/ADT/ArrayRef.h:485:37: note: llvm::ArrayRef<T> llvm::makeArrayRef(const llvm::ArrayRef<T>&) [with T = int] template <typename T> ArrayRef<T> makeArrayRef(const ArrayRef<T> &Vec) { ^ llvm-svn: 364414	2019-06-26 11:56:38 +00:00
Simon Pilgrim	c0711af7f9	[X86][AVX] combineExtractSubvector - 'little to big' extract_subvector(bitcast()) support Ideally this needs to be a generic combine in DAGCombiner::visitEXTRACT_SUBVECTOR but there's some nasty regressions in aarch64 due to neon shuffles not handling bitcasts at all..... llvm-svn: 364407	2019-06-26 11:21:09 +00:00
Simon Pilgrim	3845a4f849	[X86][AVX] truncateVectorWithPACK - avoid bitcasted shuffles truncateVectorWithPACK is often used in conjunction with ComputeNumSignBits which struggles when peeking through bitcasts. This fix tries to avoid bitcast(shuffle(bitcast())) patterns in the 256-bit 64-bit sublane shuffles so we can still see through at least until lowering when the shuffles will need to be bitcasted to widen the shuffle type. llvm-svn: 364401	2019-06-26 09:50:11 +00:00
Clement Courbet	be98e0ab78	[ExpandMemCmp] Honor prefer-vector-width. Reviewers: gchatelet, echristo, spatel, atdt Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63769 llvm-svn: 364384	2019-06-26 07:06:49 +00:00
Matt Arsenault	8fcc70f141	Don't look for the TargetFrameLowering in the implementation The same oddity was apparently copy-pasted between multiple targets. llvm-svn: 364349	2019-06-25 20:53:35 +00:00
Craig Topper	4577b8c17c	[X86] Remove isel patterns that look for (vzext_movl (scalar_to_vector (load))) I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207. Differential Revision: https://reviews.llvm.org/D63701 llvm-svn: 364337	2019-06-25 17:31:52 +00:00
Craig Topper	14ea14ae85	[X86] Add a DAG combine to turn vzmovl+load into vzload if the load isn't volatile. Remove isel patterns for vzmovl+load We currently have some isel patterns for treating vzmovl+load the same as vzload, but that shrinks the load which we shouldn't do if the load is volatile. Rather than adding isel checks for volatile. This patch removes the patterns and teachs DAG combine to merge them into vzload when its legal to do so. Differential Revision: https://reviews.llvm.org/D63665 llvm-svn: 364333	2019-06-25 17:08:26 +00:00
Simon Pilgrim	aae4b68703	[X86] lowerShuffleAsSpecificZeroOrAnyExtend - add ANY_EXTEND TODO. lowerShuffleAsSpecificZeroOrAnyExtend should be able to lower to ANY_EXTEND_VECTOR_INREG as well as ZER_EXTEND_VECTOR_INREG. llvm-svn: 364313	2019-06-25 13:36:53 +00:00
Clement Courbet	3bc5ad551a	[ExpandMemCmp] Move all options to TargetTransformInfo. Split off from D60318. llvm-svn: 364281	2019-06-25 08:04:13 +00:00
Craig Topper	7fccb2ac5e	[X86] Don't a vzext_movl in LowerBuildVectorv16i8/LowerBuildVectorv8i16 if there are no zeroes in the vector we're building. In LowerBuildVectorv16i8 we took care to use an any_extend if the first pair is in the lower 16-bits of the vector and no elements are 0. So bits [31:16] will be undefined. But we still emitted a vzext_movl to ensure that bits [127:32] are 0. If we don't need any zeroes we should be consistent and make all of 127:16 undefined. In LowerBuildVectorv8i16 we can just delete the vzext_movl code because we only use the scalar_to_vector when there are no zeroes. So the vzext_movl is always unnecessary. Found while investigating whether (vzext_movl (scalar_to_vector (loadi32)) patterns are necessary. At least one of the cases where they were necessary was where the loadi32 matched 32-bit aligned 16-bit extload. Seemed weird that we required vzext_movl for that case. Differential Revision: https://reviews.llvm.org/D63700 llvm-svn: 364207	2019-06-24 17:28:41 +00:00
Craig Topper	033774e144	[X86] Cleanups and safety checks around the isFNEG This patch does a few things to start cleaning up the isFNEG function. -Remove the Op0/Op1 peekThroughBitcast calls that seem unnecessary. getTargetConstantBitsFromNode has its own peekThroughBitcast inside. And we have a separate peekThroughBitcast on the return value. -Add a check of the scalar size after the first peekThroughBitcast to ensure we haven't changed the element size and just did something like f32->i32 or f64->i64. -Remove an unnecessary check that Op1's type is floating point after the peekThroughBitcast. We're just going to look for a bit pattern from a constant. We don't care about its type. -Add VT checks on several places that consume the return value of isFNEG. Due to the peekThroughBitcasts inside, the type of the return value isn't guaranteed. So its not safe to use it to build other nodes without ensuring the type matches the type being used to build the node. We might be able to replace these checks with bitcasts instead, but I don't have a test case so a bail out check seemed better for now. Differential Revision: https://reviews.llvm.org/D63683 llvm-svn: 364206	2019-06-24 17:28:26 +00:00
Matt Arsenault	faeaedf8e9	GlobalISel: Remove unsigned variant of SrcOp Force using Register. One downside is the generated register enums require explicit conversion. llvm-svn: 364194	2019-06-24 16:16:12 +00:00
Matt Arsenault	e3a676e9ad	CodeGen: Introduce a class for registers Avoids using a plain unsigned for registers throughoug codegen. Doesn't attempt to change every register use, just something a little more than the set needed to build after changing the return type of MachineOperand::getReg(). llvm-svn: 364191	2019-06-24 15:50:29 +00:00
Craig Topper	e8da65c698	[X86] Turn v16i16->v16i8 truncate+store into a any_extend+truncstore if we avx512f, but not avx512bw. Ideally we'd be able to represent this truncate as a any_extend to v16i32 and a truncate, but SelectionDAG doens't know how to not fold those together. We have isel patterns to use a vpmovzxwd+vpdmovdb for the truncate, but we aren't able to simultaneously fold the load and the store from the isel pattern. By pulling the truncate into the store we can successfully hide it from the DAG combiner. Then we can isel pattern match the truncstore and load+any_extend separately. llvm-svn: 364163	2019-06-23 23:51:21 +00:00
Craig Topper	c8d94e7889	[X86] Fix isel pattern that was looking for a bitcasted load. Remove what appears to be a copy/paste mistake. DAG combine should ensure bitcasts of loads don't exist. Also remove 3 patterns that are identical to the block above them. llvm-svn: 364158	2019-06-23 19:17:50 +00:00
Craig Topper	cadd826d0a	[X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store. Rename masked_load/masked_store to masked_ld/masked_st to discourage their direct use. We need to check truncating/extending and compressing/expanding before using them. This revealed that our scalar masked load/store patterns were misusing these. With those out of the way, renamed masked_load_unaligned and masked_store_unaligned to remove the "_unaligned". We didn't check the alignment anyway so the name was somewhat misleading. Make the aligned versions inherit from masked_load/store instead from a separate identical version. Merge the 3 different alignments PatFrags into a single version that uses the VT from the SDNode to determine the size that the alignment needs to match. llvm-svn: 364150	2019-06-23 06:06:04 +00:00
Simon Pilgrim	a962c1bc0f	[X86][SSE] Fold extract_subvector(vselect(x,y,z),0) -> vselect(extract_subvector(x,0),extract_subvector(y,0),extract_subvector(z,0)) llvm-svn: 364136	2019-06-22 17:57:01 +00:00
Craig Topper	4649a051bf	[X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0) 128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that try to match this pattern being used by a vzmovl to use a 128-bit instruction and a subreg_to_reg. This patch detects the insert_subvector undef portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeroes. We can then match the insertsubvector into a subreg_to_reg operation by itself. Then we can fall back on existing (vzmovl (scalar_to_vector)) patterns. Note, while the scalar_to_vector case is the motivating case I didn't restrict to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros) but I fear that would have bad implications to shuffle combining. I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload. Differential Revision: https://reviews.llvm.org/D63512 llvm-svn: 364095	2019-06-21 19:10:21 +00:00
Craig Topper	4569cdbcf5	[X86] Don't mark v64i8/v32i16 ISD::SELECT as custom unless they are legal types. We don't have any Custom handling during type legalization. Only operation legalization. Fixes PR42355 llvm-svn: 364093	2019-06-21 18:50:00 +00:00
Craig Topper	ce6c06dfdd	[X86] Add a debug print of the node in the default case for unhandled opcodes in ReplaceNodeResults. This should be unreachable, but bugs can make it reachable. This adds a debug print so we can see the bad node in the output when the llvm_unreachable triggers. llvm-svn: 364091	2019-06-21 18:49:21 +00:00
Simon Pilgrim	5dba4ed208	[X86][AVX] Combine INSERT_SUBVECTOR(SRC0, EXTRACT_SUBVECTOR(SRC1)) as shuffle Subvector shuffling often ends up as insert/extract subvector. llvm-svn: 364090	2019-06-21 18:35:04 +00:00
Craig Topper	6af1be9664	[X86] Use vmovq for v4i64/v4f64/v8i64/v8f64 vzmovl. We already use vmovq for v2i64/v2f64 vzmovl. But we were using a blendpd+xorpd for v4i64/v4f64/v8i64/v8f64 under opt speed. Or movsd+xorpd under optsize. I think the blend with 0 or movss/d is only needed for vXi32 where we don't have an instruction that can move 32 bits from one xmm to another while zeroing upper bits. movq is no worse than blendpd on any known CPUs. llvm-svn: 364079	2019-06-21 17:24:21 +00:00
Simon Pilgrim	96e77ce626	[X86] isBinOp - move commutative ops to isCommutativeBinOp. NFCI. TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate. llvm-svn: 364072	2019-06-21 16:23:28 +00:00
Simon Pilgrim	36a999ffb8	[X86] X86ISD::ANDNP is a (non-commutative) binop The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0)), but its getting better. llvm-svn: 364038	2019-06-21 12:42:39 +00:00
Simon Pilgrim	9184b009cf	[X86] createMMXBuildVector - call with BuildVectorSDNode directly. NFCI. llvm-svn: 364030	2019-06-21 11:25:06 +00:00
Simon Pilgrim	c26b8f2afc	[X86] combineAndnp - use isNOT instead of manually checking for (XOR x, -1) llvm-svn: 364026	2019-06-21 11:13:15 +00:00
Simon Pilgrim	b5733581c4	[X86] foldVectorXorShiftIntoCmp - use isConstOrConstSplat. NFCI. Use the isConstOrConstSplat helper instead of inspecting the build vector manually. llvm-svn: 364024	2019-06-21 10:54:30 +00:00
Simon Pilgrim	771c33e375	[X86][AVX] isNOT - handle concat_vectors(xor X, -1, xor Y, -1) pattern llvm-svn: 364022	2019-06-21 10:44:15 +00:00
Fangrui Song	dc8de6037c	Simplify std::lower_bound with llvm::{bsearch,lower_bound}. NFC llvm-svn: 364006	2019-06-21 05:40:31 +00:00
Craig Topper	9e1665f2d6	[X86] Add BLSI to isUseDefConvertible. Summary: BLSI sets the C flag is the input is not zero. So if its followed by a TEST of the input where only the Z flag is consumed, we can replace it with the opposite check of the C flag. We should be able to do the same for BLSMSK and BLSR, but the naive test case for those is being optimized to a subo by CodeGenPrepare. Reviewers: spatel, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63589 llvm-svn: 363957	2019-06-20 17:52:53 +00:00
Simon Pilgrim	a4d705e0ef	[X86] LowerAVXExtend - handle ANY_EXTEND_VECTOR_INREG lowering as well. llvm-svn: 363922	2019-06-20 11:31:54 +00:00
Craig Topper	b4ea64570c	[X86] Remove memory instructions form isUseDefConvertible. The caller of this is looking for comparisons of the input to these instructions with 0. But the memory instructions input is an addess not a value input in a register. llvm-svn: 363907	2019-06-20 04:58:40 +00:00
Craig Topper	451f7feb64	[X86] Add v64i8/v32i16 to several places in X86CallingConv.td where they seemed obviously missing. llvm-svn: 363906	2019-06-20 04:29:00 +00:00
Sanjay Patel	b5640b6fe8	[x86] avoid vector load narrowing with extracted store uses (PR42305) This is an exception to the rule that we should prefer xmm ops to ymm ops. As shown in PR42305: https://bugs.llvm.org/show_bug.cgi?id=42305 ...the store folding opportunity with vextractf128 may result in better perf by reducing the instruction count. Differential Revision: https://reviews.llvm.org/D63517 llvm-svn: 363853	2019-06-19 18:13:47 +00:00
Simon Pilgrim	0018b78ef6	[X86][SSE] combineToExtendVectorInReg - add ANY_EXTEND support TODO. NFCI. So I don't forget - there's a load of yak shaving to do first. llvm-svn: 363847	2019-06-19 17:42:37 +00:00
Simon Pilgrim	34279db355	[X86][SSE] Combine shuffles to ANY_EXTEND/ANY_EXTEND_VECTOR_INREG. We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension. llvm-svn: 363841	2019-06-19 17:21:15 +00:00
Simon Pilgrim	cdc0236e3a	[X86] getExtendInVec - take a ISD::*_EXTEND opcode instead of a IsSigned bool flag. NFCI. Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag. llvm-svn: 363818	2019-06-19 15:18:24 +00:00
Simon Pilgrim	d4754cac89	[X86] Add _EXTEND -> _EXTEND_VECTOR_INREG opcode conversion helper. NFCI. Given a _EXTEND or _EXTEND_VECTOR_INREG opcode, convert it to *_EXTEND_VECTOR_INREG. llvm-svn: 363812	2019-06-19 14:54:02 +00:00
Simon Pilgrim	2b309027ed	[X86] Merge extract_subvector(_EXTEND) and extract_subvector(_EXTEND_VECTOR_INREG) handling. NFCI. llvm-svn: 363808	2019-06-19 14:25:27 +00:00
Clement Courbet	4ef7c2868a	[X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773	2019-06-19 08:44:31 +00:00
Matt Arsenault	9cac4e6d14	Rename ExpandISelPseudo->FinalizeISel, delay register reservation This allows targets to make more decisions about reserved registers after isel. For example, now it should be certain there are calls or stack objects in the frame or not, which could have been introduced by legalization. Patch by Matthias Braun llvm-svn: 363757	2019-06-19 00:25:39 +00:00
Craig Topper	10e6128c62	[X86] Remove unnecessary line that makes v4f32 FP_ROUND Legal. NFC FP_ROUND defaults to Legal for all MVT types and nothing changes the v4f32 entry way from this default. If we needed this line we'd also need one for v8f32 with AVX512 which we don't have. llvm-svn: 363719	2019-06-18 19:04:03 +00:00
Simon Pilgrim	9c8593934a	[X86][AVX] extract_subvector(any_extend(x)) -> any_extend_vector_inreg(x) Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here. llvm-svn: 363693	2019-06-18 15:30:50 +00:00
Simon Pilgrim	7dd529e54d	[X86] Replace any_extend* vector extensions with zero_extend* equivalents First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal. In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...... Differential Revision: https://reviews.llvm.org/D63326 llvm-svn: 363655	2019-06-18 09:50:13 +00:00
Craig Topper	f4284f8a9d	[X86] Move code that shrinks immediates for ((x << C1) op C2) into a helper function. NFCI Preliminary step for D59909 llvm-svn: 363645	2019-06-18 04:23:58 +00:00
Craig Topper	587427716c	[X86] Remove MOVDI2SSrm/MOV64toSDrm/MOVSS2DImr/MOVSDto64mr CodeGenOnly instructions. The isel patterns for these use a bitcast and load/store, but DAG combine should have canonicalized those away. For the purposes of the memory folding table these opcodes can be replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes. llvm-svn: 363644	2019-06-18 03:23:15 +00:00
Craig Topper	8582ecd8d9	[X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class. Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643	2019-06-18 03:23:11 +00:00
Craig Topper	971ad74ba2	Use VR128X instead of FR32X/FR64X for the register class in VMOVSSZmrk/VMOVSDZmrk. Removes COPY_TO_REGCLASS from some patterns. llvm-svn: 363630	2019-06-17 23:08:29 +00:00
Craig Topper	0e18300802	[X86] Make an assert in LowerSCALAR_TO_VECTOR stricter to make it clear what types are allowed here. NFC Make it clear that only integer type with i32 or smaller elements shoudl get to this part of the code. llvm-svn: 363629	2019-06-17 23:08:09 +00:00
Craig Topper	f3f968adcd	[X86] Add TB_NO_REVERSE to some memory folding table entries where the register form requires 64-bit mode, but the memory form does not. We don't know if its safe to unfold if we're in 32-bit mode. This is simlar to what was done to some load opcodes in r363523. I think its pretty unlikely we will try to unfold these anyway so I don't think this is testable. llvm-svn: 363595	2019-06-17 18:38:07 +00:00
Simon Pilgrim	835999e48a	[X86][SSE] Scalarize under-aligned XMM vector nt-stores (PR42026) If a XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64). llvm-svn: 363592	2019-06-17 18:20:04 +00:00
Simon Pilgrim	bb9adfdb4e	[X86][AVX] Split under-aligned vector nt-stores. If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either they will be satisfactorily aligned or will continue to be split until they are XMMs - at which point the legalizer will scalarize it. llvm-svn: 363582	2019-06-17 17:22:38 +00:00
Warren Ristow	6452bdd29b	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type DataType, unsigned Alignment) const; bool isLegalNTLoad(Type DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581	2019-06-17 17:20:08 +00:00
Simon Pilgrim	12cb792d7f	[X86] combineLoad - begun making the load split code more generic. NFCI. This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment. This is necessary for an upcoming patch to split under-aligned non-temporal vector loads. llvm-svn: 363570	2019-06-17 15:54:36 +00:00
Simon Pilgrim	454e6b9010	[X86][SSE] Prevent misaligned non-temporal vector load/store combines For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than for a xmm we'll just end up using the regular unaligned vector loads anyway. First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets. Differential Revision: https://reviews.llvm.org/D63246 llvm-svn: 363564	2019-06-17 14:26:10 +00:00
Craig Topper	9f2f127009	[X86] Add TB_NO_REVERSE to some folding table entries where the register from uses the REX prefix, but the memory form does not. It would not be safe to unfold the memory form the register form without checking that we are compiling for 64-bit mode. This probaby isn't a real functional issue since we are unlikely to unfold any of these instructions since they don't have any tied registers, aren't commutable, and don't have any inputs other than the address. llvm-svn: 363523	2019-06-16 22:33:09 +00:00
Sanjay Patel	d14389c0a5	[x86] split 256-bit vector selects if operands are vector concats This is similar logic/motivation to the select splitting in D62969. In D63233, the pattern changes so that we no longer have an extract_subvector of vselect, but the operands of the select are still being concatenated. The closest case is represented in either the first or last test diffs here - we have an extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions. I think that's the right trade-off for most AVX1 targets. In the example based on PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 ...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx). Differential Revision: https://reviews.llvm.org/D63364 llvm-svn: 363508	2019-06-16 14:04:49 +00:00
Simon Pilgrim	fcffc2facc	[X86] CombineShuffleWithExtract - handle cases with different vector extract sources Insert the shorter vector source into an undef vector of the longer vector source's type. llvm-svn: 363507	2019-06-16 08:00:41 +00:00
Simon Pilgrim	456ca5d7f7	[X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI. llvm-svn: 363501	2019-06-15 19:12:44 +00:00
Simon Pilgrim	90e87af303	[X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well. Fixes PR34380 llvm-svn: 363500	2019-06-15 18:30:43 +00:00
Simon Pilgrim	990f3ceb67	[X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3) This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0) llvm-svn: 363499	2019-06-15 17:05:24 +00:00
Kevin P. Neal	fece7c6c83	[FPEnv] Lower STRICT_FP_EXTEND and STRICT_FP_ROUND nodes in preprocess phase of ISelLowering to mirror non-strict nodes on x86. I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.5 Submitted by: Drew Wock <drew.wock@sas.com> Reviewed by: Craig Topper, Kevin P. Neal Approved by: Craig Topper Differential Revision: https://reviews.llvm.org/D63271 llvm-svn: 363417	2019-06-14 16:28:55 +00:00
Eric Christopher	5e83d8fff4	Move commentary on opcode translation for code16 mov instructions to segment registers closer to the segment register check for when we add further optimizations. llvm-svn: 363355	2019-06-14 04:51:55 +00:00
Craig Topper	cf34a2bd5d	[X86Disassembler] Unify the EVEX and VEX code in emitContextTable. Merge the ATTR_VEXL/ATTR_EVEXL bits. NFCI Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes. Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately. llvm-svn: 363330	2019-06-13 22:15:25 +00:00
Simon Pilgrim	757a2f13fd	[X86] Use fresh MemOps when emitting VAARG64 Previously it copied over MachineMemOperands verbatim which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on. Committed on behalf of @luke (Luke Lau) Differential Revision: https://reviews.llvm.org/D62726 llvm-svn: 363268	2019-06-13 14:05:37 +00:00
Simon Pilgrim	6b56ad164c	[CodeGen] Add getMachineMemOperand + MachineMemOperand::Flags allocator helper wrapper. NFCI. Pre-commit for D62726 on behalf of @luke (Luke Lau) llvm-svn: 363257	2019-06-13 12:58:55 +00:00
Simon Pilgrim	0baf136a4d	[X86][SSE] Avoid assert for broadcast(horiz-op()) cases for non-f64 cases. Based on fuzz test from @craig.topper llvm-svn: 363251	2019-06-13 11:26:21 +00:00
Tom Stellard	f335672218	X86: Clean up pass initialization Summary: - Remove redundant initializations from pass constructors that were already being initialized by LLVMInitializeX86Target(). - Add initialization function for the FPS pass. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63218 llvm-svn: 363221	2019-06-13 02:09:32 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Simon Pilgrim	5b0e0dd709	[X86][AVX] Fold concat(vpermilps(x,c),vpermilps(y,c)) -> vpermilps(concat(x,y),c) Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1). An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit. llvm-svn: 363178	2019-06-12 16:38:20 +00:00
Craig Topper	ed4cd44870	[X86] Add VCMPSSZrr_Intk and VCMPSDZrr_Intk to isNonFoldablePartialRegisterLoad. The non-masked versions are already in there. I'm having some trouble coming up with a way to test this right now. Most load folding should happen during isel so I'm not sure how to get peephole pass to do it. llvm-svn: 363125	2019-06-12 06:29:53 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Craig Topper	627d8168e7	[X86] Add load folding isel patterns to scalar_math_patterns and AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032	2019-06-11 04:30:53 +00:00
Tom Stellard	4b0b26199b	Revert CMake: Make most target symbols hidden by default This reverts r362990 (git commit `374571301d`) This was causing linker warnings on Darwin: ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)' from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol 'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&), std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)' means the weak symbol cannot be overridden at runtime. This was likely caused by different translation units being compiled with different visibility settings. llvm-svn: 363028	2019-06-11 03:21:13 +00:00
Tom Stellard	374571301d	CMake: Make most target symbols hidden by default Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990	2019-06-10 22:12:56 +00:00
Craig Topper	9000a72a4b	[X86] When promoting i16 compare with immediate to i32, try to use sign_extend for eq/ne if the input is truncated from a type with enough sign its. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920	2019-06-10 04:50:12 +00:00
Craig Topper	ceb807bbbc	[X86] Disable f32->f64 extload when sse2 is enabled Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize. This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62710 llvm-svn: 362919	2019-06-10 04:37:16 +00:00
Craig Topper	dd10099d5c	[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled. llvm-svn: 362915	2019-06-10 01:18:55 +00:00
Craig Topper	f7ba8b808a	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vetors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. I ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt. While an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914	2019-06-10 00:41:07 +00:00
Jatin Bhateja	2a30aeb010	[X86] NFCI : Comment updation for EVEX to VEX translation. Reviewers: llvm-commits, jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63055 llvm-svn: 362898	2019-06-09 09:59:26 +00:00
Craig Topper	2ba0e2518b	[X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant. We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently. llvm-svn: 362895	2019-06-09 03:21:33 +00:00
Craig Topper	7d8494c41c	[X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns. Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table. llvm-svn: 362894	2019-06-08 23:53:31 +00:00
Jonas Paulsson	fdc4ea34e3	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection. This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868	2019-06-08 06:19:15 +00:00
Craig Topper	bd03230cb0	[X86] Remove unnecessary new line escape from the end of a macro. NFC llvm-svn: 362837	2019-06-07 20:30:40 +00:00
Sanjay Patel	6880bceda2	[x86] narrow extract subvector of vector select This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA. On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops. I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match. Differential Revision: https://reviews.llvm.org/D62969 llvm-svn: 362797	2019-06-07 13:17:46 +00:00
Pengfei Wang	f8b28931a7	[X86] -march=cooperlake (llvm) Support intel -march=cooperlake in llvm Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62836 llvm-svn: 362776	2019-06-07 08:31:35 +00:00
Craig Topper	f320f26716	[X86] Make a bunch of merge masked binops commutable for loading folding. This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg. We already commuted the unmasked and zero masked versions. I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly. llvm-svn: 362746	2019-06-06 21:00:04 +00:00
Craig Topper	6b67dfa54c	[X86] Make masked floating point equality/ordered compares commutable for load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717	2019-06-06 16:39:04 +00:00
Craig Topper	9226ba6b37	[X86] Don't turn avx masked.load with constant mask into masked.load+vselect when passthru value is all zeroes. This is intended to enable the use of an immediate blend or more optimal instruction. But if the passthru is zero we don't need any additional instructions. llvm-svn: 362675	2019-06-06 05:41:27 +00:00
Craig Topper	3975b15dba	[X86] Fix mistake that marked VADDSSrrb_Int/VADDSDrrb_Int/VMULSSrrb_Int/VMULSDrrb_Int as commutable. One of the sources controls the pass through value for the upper bits of the result so we can't really commute it. In practice this problem isn't a functional issue because we would only try to commute this instruction in order to fold a load. But we can't do embedded rounding and fold a load at the same time. So the load fold would never succeed so I don't think we would ever commute or at least keep the version after commuting. llvm-svn: 362647	2019-06-05 21:00:31 +00:00
Craig Topper	d0fff89b81	[X86] Add the vector integer min/max instructions to isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The reduce test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629	2019-06-05 18:25:09 +00:00
Sanjay Patel	2bf82879bd	[x86] split more 256-bit stores of concatenated vectors As suggested in D62498 - collectConcatOps() matches both concat_vectors and insert_subvector patterns, and we see more test improvements by using the more general match. llvm-svn: 362620	2019-06-05 16:40:57 +00:00
Simon Pilgrim	de586bd1fd	[X86][AVX] Generalize split256BitStore to splitVectorStore. NFCI. Enables us to use this to split 512-bit vectors in future patches. llvm-svn: 362617	2019-06-05 16:14:14 +00:00
Simon Pilgrim	886a55eaa0	[X86][AVX] combineX86ShuffleChain - combine shuffle(extractsubvector(x),extractsubvector(y)) We already handle the case where we combine shuffle(extractsubvector(x),extractsubvector(x)), this relaxes the requirement to permit different sources as long as they have the same value type. This causes a couple of cases where the VPERMV3 binary shuffles occur at a wider width than before, which I intend to improve in future commits - but as only the subvector's mask indices are defined, these will broadcast so we don't see any increase in constant size. llvm-svn: 362599	2019-06-05 12:56:53 +00:00
Craig Topper	78fdce25a1	[X86] Cleanup convertIntLogicToFPLogic a little. NFCI -Use early returns to reduce indentation -Replace multipe ifs with a switch. -Replace an assert with an llvm_unreachable default in the switch. -Check that the FP type we're going to use for the X86ISD::FAND/FOR/FXOR is legal rather than checking that the integer type matches the width of a legal scalar fp type. This all runs after legalization so it shouldn't really matter, but making sure we're using a valid type in the X86ISD node is really whats important. llvm-svn: 362565	2019-06-05 01:00:34 +00:00
Craig Topper	137de38009	[X86] Mutate fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on pattern permutations We already need to have patterns for X86ISD::RNDSCALE to support software intrinsics. But we currently have 5 sets of patterns for the 5 rounding operations. For of these 6 patterns we have to support 3 vectors widths, 2 element sizes, sse/vex/evex encodings, load folding, and broadcast load folding. This results in a fair amount of bytes in the isel table. This patch adds code to PreProcessIselDAG to morph the fceil/ffloor/ftrunc/fnearbyint/frint to X86ISD::RNDSCALE. This way we can remove everything, but the intrinsic pattern while still allowing the operations to be considered Legal for DAGCombine and Legalization. This shrinks the DAGISel by somewhere between 9K and 10K. There is one complication to this, the STRICT versions of these nodes are currently mutated to their none strict equivalents at isel time when the node is visited. This won't be true in the future since that loses the chain ordering information. For now I've also added support for the non-STRICT nodes to Select so we can change the STRICT versions there after they've been mutated to their non-STRICT versions. We'll probably need a STRICT version of RNDSCALE or something to handle this in the future. Which will take us back to needing 2 sets of patterns for strict and non-strict, but that's still better than the 11 or 12 sets of patterns we'd need. We can probably do something similar for scalar, but I haven't looked at it yet. Differential Revision: https://reviews.llvm.org/D62757 llvm-svn: 362535	2019-06-04 18:03:07 +00:00
Benjamin Kramer	03ff1b3c30	[X86] Fold single-use variable into assert. NFC. Avoids an unused variable warning in Release builds. llvm-svn: 362534	2019-06-04 18:01:07 +00:00
Sanjay Patel	606eb2367f	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is a reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 362524	2019-06-04 16:40:04 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Simon Pilgrim	a6e289e9f8	[X86][SSE] Pulled out (sub (xor X, M), M) 'ConditionalNegate' out pattern match code. NFCI. As discussed on D62777 - we should be able to use this in more SSE41+ cases as well but that requires us to separate it from the OR(AND(),ANDN()) matcher. llvm-svn: 362504	2019-06-04 15:02:33 +00:00
Craig Topper	dcf865f0ca	[X86] Fix the pattern for merge masked vcvtps2pd. r362199 fixed it for zero masking, but not zero masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage. llvm-svn: 362440	2019-06-03 19:29:14 +00:00
Simon Pilgrim	8a32ca381d	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338	2019-06-02 20:37:02 +00:00
Simon Pilgrim	59a8db628b	[TTI][X86] Cleanup getMaskedMemoryOpCost. NFCI. Prep work before resurrecting D61257. llvm-svn: 362335	2019-06-02 18:06:42 +00:00
Simon Pilgrim	71a39bcf68	[X86] isHorizontalBinOp - add extract_subvector(shuffle(x)) handling (PR39921) Let's us match horizontal op patterns on fast-variable-shuffle targets (Haswell etc.) llvm-svn: 362327	2019-06-02 15:47:49 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Craig Topper	396a915c26	[X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative. llvm-svn: 362309	2019-06-02 00:42:58 +00:00
Craig Topper	c288a19bb7	[X86] Add AVX512BF16 and AVX512VP2INTERSECT instructions to the loading folding tables. llvm-svn: 362288	2019-06-01 06:20:59 +00:00
Craig Topper	48fdb61766	[X86] Make the X86FoldTablesEmitter functional again. Fix the spacing in the output to make it easier to diff. Fix a few other formatting issues in the manual table. And remove some old FIXMEs. llvm-svn: 362287	2019-06-01 06:20:55 +00:00
Craig Topper	31d00d80a2	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64. These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203	2019-05-31 07:38:26 +00:00
Craig Topper	b79cc5f802	[X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp extloads instead. DAG combine will usually fold fpextend+load to an fp extload anyway. So the 256 and 512 patterns were probably unnecessary. The 128 bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64-bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199	2019-05-31 06:21:53 +00:00
Craig Topper	23066033a1	[X86] Correct the ins operand order for MASKPAIR16STORE to match other store instructions. This makes the 5 address operands come first. And the data operand comes last. This matches the operand order the instruction is created with. It's also the expected order in X86MCInstLower. So everything appeared to work, but the operands didn't match their declared type. Fixes a -verify-machineinstrs failure. Also remove the isel patterns from these instructions since they should only be used for stack spills and reloads. I'm not even sure what types the patterns were looking for to match. llvm-svn: 362193	2019-05-31 05:20:27 +00:00
Pengfei Wang	2e67d0c842	[X86] Add VP2INTERSECT instructions Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188	2019-05-31 02:50:41 +00:00
Craig Topper	70dc2200a2	[X86] Remove result type constraints from the extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags. The users of these either have their own type constraint or rely on the type constranit system to realize the only legal extend would be to f64. llvm-svn: 362175	2019-05-30 23:35:24 +00:00
Craig Topper	d6b74cc859	[X86] Remove code that unnecessarily sets EXTLOAD with src type of v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC The LoadExt table defaults to all combinations being Legal. For vector types, only src VTs with an i1 element type were ever changed. So we don't need to mark them legal manually. llvm-svn: 362170	2019-05-30 22:29:06 +00:00
Simon Pilgrim	32aac1727a	[X86][SSE] Improve bool vector extload (PR26091) We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it. Differential Revision: https://reviews.llvm.org/D62449 llvm-svn: 362081	2019-05-30 10:25:20 +00:00
Pengfei Wang	1f67d94279	[X86] Add ENQCMD instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62281 llvm-svn: 362053	2019-05-30 03:59:16 +00:00
Pengfei Wang	72e3f9662b	Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to" This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b. llvm-svn: 361918	2019-05-29 02:49:59 +00:00
Pengfei Wang	818c652643	[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to avoid static check fail RegClassOrBank is an object of RegClassOrRegBank, which is defined as using llvm::RegClassOrRegBank = typedef PointerUnion<const TargetRegisterClass , const RegisterBank > so control flow can not get here. Use ""llvm_unreachable" here to avoid "null pointer" confusion. Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62006 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 361912	2019-05-29 02:20:37 +00:00
Fangrui Song	656afe370d	[X86] Fix x86-64 call foo@tlsdesc(%rax) and support R_386_TLSGOTDESC R_386_TLS_DESC_CALL D18885 emitted 5 bytes for call foo@tlsdesc(%rax). It should use the 2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning of the call instruction. The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work: 0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <.text+0x7> 3: R_X86_64_GOTPC32_TLSDESC a-0x4 7: ff 10 callq *(%rax) 7: R_X86_64_TLSDESC_CALL a => 0: 48 c7 c0 fc ff ff ff mov $0xfffffffffffffffc,%rax 7: 66 90 xchg %ax,%ax Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is seen. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D62512 llvm-svn: 361910	2019-05-29 02:02:59 +00:00
Adhemerval Zanella	6d7bf5e8df	[CodeGen] Add lrint/llrint builtins This patch add the ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875	2019-05-28 20:47:44 +00:00
Sanjay Patel	f7980e727f	Revert "[x86] split 256-bit store of concatenated vectors" This reverts commit `d5a8637072`. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850	2019-05-28 17:37:58 +00:00
David Greene	561fcc0d63	[X86-64] Fix 256-bit SET0 lowering for non-VLX targets If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843	2019-05-28 15:37:01 +00:00
Sanjay Patel	d5a8637072	[x86] split 256-bit store of concatenated vectors This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 361822	2019-05-28 13:54:17 +00:00
Sanjay Patel	6bf4ca9d2e	[x86] fix 256-bit vector store splitting to honor 'volatile' Forking this out of the discussion in D62498 (and assuming that will be committed later, so adding the helper function here). The LangRef says: "the backend should never split or merge target-legal volatile load/store instructions." Differential Revision: https://reviews.llvm.org/D62506 llvm-svn: 361815	2019-05-28 12:58:07 +00:00
Benjamin Kramer	57e267a2e9	[X86] Custom lower CONCAT_VECTORS of v2i1 The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814	2019-05-28 12:52:57 +00:00
Simon Pilgrim	4b48aa0e30	[X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI. llvm-svn: 361804	2019-05-28 10:53:23 +00:00
Simon Pilgrim	a044410f37	[X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG just with a different sentinel llvm-svn: 361734	2019-05-26 16:00:35 +00:00
Simon Pilgrim	58a8541dcc	[X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector We were only testing for direct SETCC results - this allows us to peek through AND/OR/XOR combinations of the comparison results as well. There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1 cases before I can enable it there as well. llvm-svn: 361716	2019-05-26 10:54:23 +00:00
Simon Pilgrim	40fa52b174	[X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C) Commonly occurs in sign-extension cases llvm-svn: 361706	2019-05-25 18:02:17 +00:00
Nikita Popov	d87eceda0e	[X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax If we have a known non-nan operand, place it in the second operand of fmin/fmax that is returned if either operand is nan. Differential Revision: https://reviews.llvm.org/D62448 llvm-svn: 361704	2019-05-25 16:44:29 +00:00
Craig Topper	46e5052b8e	[X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. INC/DEC is really a special case of a more generic issue. We should also turn leas into add reg/reg or add reg/imm regardless of the slow lea flags. This also supports LEA64_32 which has 64 bit input registers and 32 bit output registers. So we need to convert the 64 bit inputs to their 32 bit equivalents to check if they are equal to base reg. One thing to note, the original code preserved the kill flags by adding operands to the new instruction instead of using addReg. But I think tied operands aren't supposed to have the kill flag set. I dropped the kill flags, but I could probably try to preserve it in the add reg/reg case if we think its important. Not sure which operand its supposed to go on for the LEA64_32r instruction due to the super reg implicit uses. Though I'm also not sure those are needed since they were probably just created by an INSERT_SUBREG from a 32-bit input. Differential Revision: https://reviews.llvm.org/D61472 llvm-svn: 361691	2019-05-25 06:17:47 +00:00
Craig Topper	4b08fcdeb1	[X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate. Differential Revision: https://reviews.llvm.org/D62360 llvm-svn: 361690	2019-05-25 04:47:49 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Robert Lougher	170dfeb2ff	Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info" Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 361527	2019-05-23 18:15:12 +00:00
Fangrui Song	86c9ca48c3	[X86] Support -fno-plt __tls_get_addr calls In general dynamic/local dynamic TLS models, with -fno-plt, * x86: emit `calll ___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT` Note, on x86, if we can get rid of %ebx as the PIC register, it may be better to use a register not preserved across function calls. x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT` Reorganize the code by separating 32-bit and 64-bit. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62106 llvm-svn: 361453	2019-05-23 01:05:13 +00:00
Craig Topper	93f38e1f1a	[X86] Explcitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary. We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction. Or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index. This patch instaed just disables the immediate 0 matchin the VEXTRACT patterns. This each of the component pieces of the larger patterns will match by themselves. This found a bug of sorts were we didn't use 128-bit store for 512->128 extract on KNL. Its unclear what the right thing here should be. Using the vextract avoids constraining the register allocator to use xmm0-15. But it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway. llvm-svn: 361431	2019-05-22 21:00:18 +00:00
Craig Topper	9816d55776	[X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond to the libm functions. For the libm functions we need to disable the precision exception so the llvm.floor/ceil functions should always map to encodings 0x9 and 0xA. We had a mix of isel patterns where some used 0x9 and 0xA and others used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA. Since we have no way in isel of knowing where the llvm.ceil/floor came from, we can't map X86 specific intrinsics with encodings 1 or 2 to it. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like to see a use case and optimization advantage first. I've left the backend test cases to show the blend we now emit without the extra isel patterns. But I've removed the InstCombine tests completely. llvm-svn: 361425	2019-05-22 20:04:55 +00:00
Sjoerd Meijer	aa4f1ffca4	[TargetMachine] error message unsupported code model When the tiny code model is requested for a target machine that does not support this, we get an error message (which is nice) but also this diagnostic and request to submit a bug report: fatal error: error in backend: Target does not support the tiny CodeModel [Inferior 2 (process 31509) exited with code 0106] clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation) (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74) Target: arm-arm-none-eabi Thread model: posix clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script. clang-9: note: diagnostic msg: ******************** PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT: Preprocessed source(s) and associated run script(s) are located at: clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh clang-9: note: diagnostic msg: But this is not a bug, this is a feature. :-) Not only is this not a bug, this is also pretty confusing. This patch causes just to print the fatal error and not the diagnostic: fatal error: error in backend: Target does not support the tiny CodeModel Differential Revision: https://reviews.llvm.org/D62236 llvm-svn: 361370	2019-05-22 10:40:26 +00:00
Nikita Popov	15df05152d	[X86] Don't compare i128 through vector if construction not cheap (PR41971) Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the combineVectorSizedSetCCEquality() transform more conservative by checking that the bitcast to the vector type will be cheap/free for both operands. I'm considering it cheap if it's a constant, a load or already a vector. I've dropped the explicit check for f128 because it should fall out naturally (in the cases where it'd be detrimental). Differential Revision: https://reviews.llvm.org/D62220 llvm-svn: 361352	2019-05-22 06:47:06 +00:00
Pengfei Wang	6a0d432e9e	[X86] [CET] Deal with return-twice function such as vfork, setjmp when CET-IBT enabled Return-twice functions will indirectly jump after the caller's position. So when CET-IBT is enable, we should make sure these is endbr* instructions follow these Return-twice function caller. Like GCC does. Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D61881 llvm-svn: 361342	2019-05-22 00:50:21 +00:00
Craig Topper	ed6df47bae	[X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again. llvm-svn: 361288	2019-05-21 19:03:45 +00:00
Simon Pilgrim	4b82e50315	[X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692 llvm-svn: 361270	2019-05-21 15:20:24 +00:00
Petar Jovanovic	e85bbf564d	[DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC) Refactor DIExpression::With* into a flag enum in order to be less error-prone to use (as discussed on D60866). Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D61943 llvm-svn: 361137	2019-05-20 10:35:57 +00:00
Craig Topper	3164b50af7	[X86] Remove combineShift function. Just dispatch directly to the handler for each flavor from the main switch. NFC llvm-svn: 361108	2019-05-19 01:01:46 +00:00
Simon Pilgrim	065431c82b	[X86][SSE] Fold movmsk(not(x)) -> not(movmsk) Helps to improve folding of comparisons with movmsk results. llvm-svn: 361056	2019-05-17 17:56:25 +00:00
Simon Pilgrim	2c2f8e74b9	[X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp. Same as what we do for vector reductions in combineHorizontalPredicateResult, use movmsk+cmp for scalar (and(extract(x,0),extract(x,1)) reduction patterns. llvm-svn: 361052	2019-05-17 17:25:55 +00:00
Simon Pilgrim	279314e81b	[X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling. We can now rely on generic expansion to handle this. llvm-svn: 361038	2019-05-17 14:37:19 +00:00
Simon Pilgrim	62c7032c18	[X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold. Prep work for the removal of the remaining x86 CTTZ vector lowering. llvm-svn: 361035	2019-05-17 14:04:56 +00:00
Simon Pilgrim	a6d3bd486b	[X86] Pull out IsNOT helper. NFCI. Return the input value for the NOT pattern: (xor X, -1) -> X llvm-svn: 361012	2019-05-17 10:37:08 +00:00
Craig Topper	ae1597d360	[X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility. These are tuning flags and won't cause any codegen issue if we inline a function with a different value. llvm-svn: 360992	2019-05-17 06:40:21 +00:00
Fangrui Song	2463239777	[X86] Support .reloc , R_{386,X86_64}_NONE, This can be used to create references among sections. When --gc-sections is used, the referenced section will be retained if the origin section is retained. See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973) for similar changes. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D62014 llvm-svn: 360983	2019-05-17 03:25:39 +00:00
David L. Jones	4a5e01faa4	[X86][AsmParser] Add mnemonics missed in r360954. These are valid Jcc, but aren't based on the EFLAGS condition codes (Intel 64 and IA-32 Architetcures Software Developer's Manual Vol. 1, Appendix B). These are covered in clang/test, but not llvm/test. llvm-svn: 360960	2019-05-17 00:19:20 +00:00
David L. Jones	add7ed2281	[X86][AsmParser] Ignore "short" even harder in Intel syntax ASM. In Intel syntax, it's not uncommon to see a "short" modifier on Jcc conditional jumps, which indicates the offset should be a "short jump" (8-bit immediate offset from EIP, -128 to +127). This patch expands to all recognized Jcc condition codes, and removes the inline restriction. Clang already ignores "jmp short" in inline assembly. However, only "jmp" and a couple of Jcc are actually checked, and only inline (i.e., not when using the integrated assembler for asm sources). A quick search through asm-containing libraries at hand shows a pretty broad range of Jcc conditions spelled with "short." GAS ignores the "short" modifier, and instead uses an encoding based on the given immediate. MS inline seems to do the same, and I suspect MASM does, too. NASM will yield an error if presented with an out-of-range immediate value. Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do and do not fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY Differential Revision: https://reviews.llvm.org/D61990 llvm-svn: 360954	2019-05-16 23:27:07 +00:00
David L. Jones	11305984d0	[X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate". This better matches the verbiage in Intel documentation, and should help avoid confusion between these two different kinds of values, both of which are parsed from mnemonics. llvm-svn: 360953	2019-05-16 23:27:05 +00:00
Reid Kleckner	08c15df29f	[X86] Deduplicate symbol lowering logic, NFC Summary: This refactors four pieces of code that create SDNodes for references to symbols: - normal global address lowering (LEA, MOV, etc) - callee global address lowering (CALL) - external symbol address lowering (LEA, MOV, etc) - external symbol address lowering (CALL) Each of these pieces of code need to: - classify the reference - lower the symbol - emit a RIP wrapper if needed - emit a load if needed - add offsets if needed I think handling them all in one place will make the code easier to maintain in the future. Reviewers: craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61690 llvm-svn: 360952	2019-05-16 23:15:26 +00:00
Craig Topper	f09b9d419f	[X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil. This suppresses exceptions which is what we should be doing for ceil and floor. We already use the correct immediate in patterns without masking. llvm-svn: 360915	2019-05-16 16:53:50 +00:00
Adhemerval Zanella	73643b5041	[CodeGen] Add lround/llround builtins This patch add the ISD::LROUND and ISD::LLROUND along with new intrinsics. The changes are straightforward as for other floating-point rounding functions, with just some adjustments required to handle the return value being an interger. The idea is to optimize lround/llround generation for AArch64 in a subsequent patch. Current semantic is just route it to libm symbol. llvm-svn: 360889	2019-05-16 13:15:27 +00:00
Craig Topper	e43bdf144c	[X86] Delay creating index register negations during address matching until after we know for sure the match will succeed If we're trying to match an LEA, its possible the LEA match will be deemed unprofitable. In which case the negation we created in matchAddress would be left dangling in the SelectionDAG. This could artificially increase use counts for other nodes in the DAG. Though I don't have an example of that. But it just seems like bad form to have dangling nodes in isel. Differential Revision: https://reviews.llvm.org/D61047 llvm-svn: 360823	2019-05-15 21:59:53 +00:00
Craig Topper	439228727a	[X86] Strengthen type constraints on some specialized X86 ISD opcodes that don't have any flexibility. NFC These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the element size is always known. One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in X86ISelLowering and probably low value. llvm-svn: 360815	2019-05-15 21:16:28 +00:00
Craig Topper	384d46c0d5	[X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp. They encode the same way, but OR32mi8Locked sets hasUnmodeledSideEffects set which should be stronger than the mayLoad/mayStore on LOCK_OR32mi8. I think this makes sense since we are using it as a fence. This also seems to hide the operation from the speculative load hardening pass so I've reverted r360511. llvm-svn: 360747	2019-05-15 04:15:46 +00:00
Philip Reames	658cad1287	[NFC] Reuse a helper function to eliminate duplicate code llvm-svn: 360740	2019-05-15 01:39:07 +00:00
Richard Trieu	0116385452	[X86] Create a TargetInfo header. NFC Move the declarations of getThe<Name>Target() functions into a new header in TargetInfo and make users of these functions include this new header. This fixes a layering problem. llvm-svn: 360736	2019-05-15 01:17:58 +00:00
Philip Reames	445f942fc4	Use an offset from TOS for idempotent rmw locked op lowering This was the portion split off D58632 so that it could follow the redzone API cleanup. Note that I changed the offset preferred from -8 to -64. The difference should be very minor, but I thought it might help address one concern which had been previously raised. Differential Revision: https://reviews.llvm.org/D61862 llvm-svn: 360719	2019-05-14 22:32:42 +00:00
Simon Pilgrim	c2d9cfd925	[X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758) D61068 handled vector shifts, this patch does the same for scalars where there are similar number of pipes for shifts as bit ops - this is true almost entirely for AMD targets where the scalar ALUs are well balanced. This combine avoids AND immediate mask which usually means we reduce encoding size. Some tests show use of (slow, scaled) LEA instead of SHL in some cases, but thats due to particular shift immediates - shift+mask generate these just as easily. Differential Revision: https://reviews.llvm.org/D61830 llvm-svn: 360684	2019-05-14 15:21:28 +00:00
Simon Pilgrim	2747ee2c83	[X86] X86TargetLowering::LowerINTRINSIC_WO_CHAIN - ensure rounding control is initialized. NFCI. Fixes scan-build warnings llvm-svn: 360664	2019-05-14 11:30:39 +00:00
Philip Reames	3098e44daa	[X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets This is a follow on to D58632, with the same logic. Given a memory operation which needs ordering, but doesn't need to modify any particular address, prefer to use a locked stack op over an mfence. Differential Revision: https://reviews.llvm.org/D61863 llvm-svn: 360649	2019-05-14 04:43:37 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Craig Topper	e2966473dd	[X86] Use ISD::MERGE_VALUES to return from lowerAtomicArith instead of calling ReplaceAllUsesOfValueWith and returning SDValue(). Returning SDValue() makes the caller think that nothing happened and it will end up executing the Expand path. This generates extra nodes that will need to be pruned as dead code. Returning an ISD::MERGE_VALUES will tell the caller that we'd like to make a change and it will take care of replacing uses. This will prevent falling into the Expand path. llvm-svn: 360627	2019-05-13 22:17:13 +00:00
Craig Topper	5f999c2bea	[X86] Various type corrections to the code that creates LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence. These are updates to match how isel table would emit a LOCK_OR32mi8 node. -Use i32 for the immediate zero even though only 8 bits are encoded. -Use i16 for segment register. -Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to match 64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8 both exist. The only difference seems to be that OR32mi8Locked is marked as UnmodeledSideEffects=1. -Emit an extra i32 result for the flags output. I don't know if the types here really matter just noticed it was inconsistent with normal behavior. llvm-svn: 360619	2019-05-13 21:01:24 +00:00
Robert Lougher	91a9d4ef4b	Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info Revert r360436 as it is causing clang-x64-windows-msvc buildbot to fail. llvm-svn: 360606	2019-05-13 17:36:46 +00:00
Nick Desaulniers	c33f754e74	[TargetLowering] Handle multi depth GEPs w/ inline asm constraints Summary: X86TargetLowering::LowerAsmOperandForConstraint had better support than TargetLowering::LowerAsmOperandForConstraint for arbitrary depth getelementpointers for "i", "n", and "s" extended inline assembly constraints. Hoist its support from the derived class into the base class. Link: https://github.com/ClangBuiltLinux/linux/issues/469 Reviewers: echristo, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D61560 llvm-svn: 360604	2019-05-13 17:27:44 +00:00
Simon Pilgrim	73aee29095	[X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts Fixes the regression noted in D61782 where a VZEXT_MOVL was being inserted because we weren't discriminating between 'zeroable' and 'all undef' for the upper elts. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360596	2019-05-13 16:10:11 +00:00
Simon Pilgrim	cf5a8eb7cd	[X86][SSE] Relax use limits for lowerAddSubToHorizontalOp (PR32433) Now that we can use HADD/SUB for scalar additions from any pair of extracted elements (D61263), we can relax the one use limit as we will be able to merge multiple uses into using the same HADD/SUB op. This exposes a couple of missed opportunities in LowerBuildVectorv4x32 which will be committed separately. Differential Revision: https://reviews.llvm.org/D61782 llvm-svn: 360594	2019-05-13 16:02:45 +00:00
Simon Pilgrim	d9aa928603	[X86] Add SimplifyDemandedBits support for PEXTRB/PEXTRW (PR39709) Test case will be included in a followup - its being used but its tricky to show a case that isn't caught at a later stage anyway. llvm-svn: 360588	2019-05-13 15:31:27 +00:00
Craig Topper	61e556d2bd	Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" I've included a new fix in X86RegisterInfo to prevent PR41619 without reintroducing r359392. We might be able to improve that in the base class implementation of shouldRewriteCopySrc somehow. But this hopefully enables forward progress on SimplifyDemandedBits improvements for now. Original commit message: This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGComb but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. llvm-svn: 360552	2019-05-13 04:03:35 +00:00
Simon Pilgrim	a7fc763082	[X86][AVX] Split VZEXT_MOVL ymm/zmm if the upper elements are not demanded. Removes unnecessary vzeroupper noted in D61806 llvm-svn: 360543	2019-05-12 15:16:29 +00:00
Simon Pilgrim	fda6bffd3b	[X86][SSE] SimplifyDemandedBits - call PEXTRB/PEXTRW SimplifyDemandedVectorElts as well. See if we can simplify the demanded vector elts from the extraction before trying to simplify the demanded bits. This helps us with target shuffles and hops in particular. llvm-svn: 360535	2019-05-11 21:35:50 +00:00
Simon Pilgrim	6b10fde69b	[CostModel][X86] Add min/max reduction costs for all SSE targets The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528	2019-05-11 17:12:52 +00:00
Simon Pilgrim	e4c5b6d9bd	[X86][SSE] Add SimplifyDemandedVectorElts HADD/HSUB handling. Still missing PHADDW/PHSUBW tests because PEXTRW doesn't call SimplifyDemandedVectorElts llvm-svn: 360526	2019-05-11 16:07:12 +00:00
Simon Pilgrim	5e0f92acad	FixupLEAPass::fixupIncDec - non-LEA opcodes should not happen here. NFCI. Matches what we do in other functions and fixes scan-build warning about uninitialized NewOpcode variable. llvm-svn: 360525	2019-05-11 16:02:34 +00:00
Craig Topper	c9d7484aa3	[X86] Add CMOV_FR32X/CMOV_FR64X pseudo instructions. Use them in fast isel to fix a machine verifier error after adding test cases. Fast isel picks the FR32X/FR64X register classes when lowering pseudo select, but it didn't have the right opcode to go with it. llvm-svn: 360524	2019-05-11 16:00:28 +00:00
Craig Topper	74a436596d	[X86] Sink some fast isel code into the only if that uses it. NFC llvm-svn: 360523	2019-05-11 16:00:19 +00:00
Craig Topper	26f2b13a65	[X86] Use TLI.getRegClassFor to simplify some more fast isel code. NFCI llvm-svn: 360522	2019-05-11 16:00:13 +00:00
Craig Topper	682cc09675	[X86] Use getRegClassFor to simplify some code in fast isel. NFCI No need to select the register class based on type and features. It should already be setup by X86ISelLowering. llvm-svn: 360513	2019-05-11 05:18:58 +00:00
Craig Topper	31f7adb94f	[X86] Don't emit MOVNTDQA loads from fast-isel without SSE4.1. We were checking for SSE4.1 for FP types, but not integer 128-bit types. Fixes PR41837. llvm-svn: 360512	2019-05-11 04:19:33 +00:00
Craig Topper	bdef12df8d	[X86] Add a test case for idempotent atomic operations with speculative load hardening. Fix an additional issue found by the test. This test covers the fix from r360475 as well. llvm-svn: 360511	2019-05-11 04:00:27 +00:00
Richard Trieu	b28b8b7724	[X86] Move InstPrinter files to MCTargetDesc. NFC For some targets, there is a circular dependency between InstPrinter and MCTargetDesc. Merging them together will fix this. For the other targets, the merging is to maintain consistency so all targets will have the same structure. llvm-svn: 360484	2019-05-10 23:24:38 +00:00
Philip Reames	849ef823df	Factor out redzone ABI checks [NFCI] As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we use the redzone (via a particularly optimization?), but there's no common way to check whether the function has a red zone. I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated. Differential Revision: https://reviews.llvm.org/D61799 llvm-svn: 360479	2019-05-10 22:55:42 +00:00
Craig Topper	df10cc6068	[X86] Disable speculative load hardening for operations with an explicit RSP base. After D58632, we can create idempotent atomic operations to the top of stack. This confused speculative load hardening because it thinks accesses should have virtual register base except for the cases it already excluded. This commit adds a new exclusion for this case. I'll try to reduce a test case for this, but this fix was verified to work by the reporter. This should avoid needing to revert D58632. llvm-svn: 360475	2019-05-10 22:03:33 +00:00
Mircea Trofin	ff3bed0e61	Skip over prefetches Summary: Skip over prefetches when assigning debug info to instructions with memory operands. This way, the debug info is stable after instrumenting a binary with prefetches, allowing for iterative profiling and instrumentation. Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61789 llvm-svn: 360471	2019-05-10 21:27:55 +00:00
Robert Lougher	986b6b86bb	[X86] Avoid SFB - Fix inconsistent codegen with/without debug info Fixes https://bugs.llvm.org/show_bug.cgi?id=40969 The functions findPotentiallyBlockedCopies and buildCopy are currently not accounting for the presence of debug instructions. In the former this results in the optimization not being trigerred, and in the latter results in inconsistent codegen. This patch enables the optimization to be performed in a debug build and ensures the codegen is consistent with non-debug builds. Patch by Chris Dawson. Differential Revision: https://reviews.llvm.org/D61680 llvm-svn: 360436	2019-05-10 15:55:06 +00:00
Simon Pilgrim	a0b1518a4a	[X86][SSE] Add getHopForBuildVector vector splitting If we only use the lower xmm of a ymm hop, then extract the xmm's (for free), perform the xmm hop and then insert back into a ymm (for free). Fixes some of the regressions noted in D61782 llvm-svn: 360435	2019-05-10 15:46:04 +00:00
Mircea Trofin	5c31c05fbd	[llvm] X86DiscriminateMemOps: insert debug info when missing Reviewers: davidxl Reviewed By: davidxl Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61735 llvm-svn: 360396	2019-05-10 00:12:51 +00:00
Philip Reames	bd588dfd59	[X86] Improve lowering of idemptotent RMW operations The current lowering uses an mfence. mfences are substaintially higher latency than the locked operations originally requested, but we do want to avoid contention on the original cache line. As such, use a locked instruction on a cache line assumed to be thread local. Differential Revision: https://reviews.llvm.org/D58632 llvm-svn: 360393	2019-05-09 23:23:42 +00:00
Simon Pilgrim	93bfa5af48	[X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920) As reported on PR39920, "slow horizontal ops" targets tend to internally expand to 2shuffle+add/sub - so if we can reduce 2shuffle+add/sub to a hadd/sub then we should do it - similar port usage but reduced instruction count. This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying - going from 2shuffle+add+shuffle to hadd+2shuffle - I've opened PR41813 to cover this. Differential Revision: https://reviews.llvm.org/D61308 llvm-svn: 360360	2019-05-09 17:45:01 +00:00
Roman Lebedev	9db0e72570	[X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput) I've started this cleanup more several times now, but got sidetracked elsewhere, e.g. by llvm-exegesis problems. Not this time, finally! This is mainly cleaning up the inverse throughput values, and a few latencies/uops, based on the llvm-exegesis measured values. Though this is not complete by any means, there's certainly more cleanup to be done. The performance numbers (i've only checked by RawSpeed benchmark) aren't really surprising - overall this slightly (< -1%) improves perf. llvm-svn: 360341	2019-05-09 13:54:51 +00:00
Hans Wennborg	b1b09e5b55	X86WinAllocaExpander: Drop code looking through register copies (PR41786) This code was never covered by tests, in PR41786 it was pointed out that the deletion part doesn't work, and in a full Chrome build I was never able to hit the code path that looks through copies. It seems the situation it's supposed to handle doesn't actually come up in practice. Delete it to simplify the code. Differential revision: https://reviews.llvm.org/D61671 llvm-svn: 360320	2019-05-09 09:22:56 +00:00
Reid Kleckner	6bf108d77a	[COFF] Use COFF stubs for extern_weak functions Summary: A COFF stub indirects the reference to a symbol through memory. A .refptr.$sym global variable pointer is created to refer to $sym. Typically mingw uses these for external global variable declarations, but we can use them for weak function declarations as well. Updates the dso_local classification to add a special case for extern_weak symbols on COFF in both clang and LLVM. Fixes PR37598 Reviewers: smeenai, mstorsjo Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61615 llvm-svn: 360207	2019-05-07 23:06:21 +00:00
Eric Christopher	4727221734	Make sure that the DAG combiner doesn't merge stores that we explicitly asked not be greater than preferred vector width for the vectorizer. Test for both 128 and 256 with a skylake architecture. llvm-svn: 360183	2019-05-07 19:25:34 +00:00
Simon Pilgrim	debb2b2a1e	Fix local shadow variable warning. NFCI. llvm-svn: 360157	2019-05-07 14:56:34 +00:00
Simon Pilgrim	b0f51266b8	[X86][AVX] Fold concat(packus(),packus()) -> packus(concat(),concat()) (PR34773) Basic "revectorization" combine, we can probably do more opcodes here but it can be a tricky cost-benefit depending on where the subvectors came from - but this case helps shuffle combining. llvm-svn: 360134	2019-05-07 11:17:39 +00:00
Simon Pilgrim	a80abeea88	Fixed "Value stored to 'Opc' is never read" warning. NFCI. llvm-svn: 360133	2019-05-07 11:09:16 +00:00
Simon Pilgrim	3c975a0ab5	[X86] Reduce scope of variables where possible. NFCI. Fixes cppcheck warnings. llvm-svn: 360131	2019-05-07 10:50:11 +00:00
Craig Topper	a75630302d	[X86] Use extended vector register classes in getRegForInlineAsmConstraint to support x/y/zmm16-31 when the type is mismatched. The FR32/FR64/VR128/VR256 register classes don't contain the upper 16 registers. For most cases we use the default implementation which will find any register class that contains the register in question if the VT is legal for the register class. But if the VT is i32 or i64, we won't find a matching register class and will instead up in the code modified in this patch. If the requested register is x/y/zmm16-31 we weren't returning a register class that contains those registers and will hit an assertion in the caller. To fix this, I've changed to use the extended register class instead. I don't believe we need a subtarget check to see if avx512 is enabled. The default implementation just pick whatever register class it finds first. I checked and we currently pick FR32X for XMM0 with an f32 type using the default implementation regardless of whether avx512 is enabled. So I assume its it is ok to do the same for i32. Differential Revision: https://reviews.llvm.org/D61457 llvm-svn: 360102	2019-05-06 23:57:42 +00:00
Craig Topper	d10a200ceb	[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when its unneeded. llvm-svn: 360085	2019-05-06 21:39:51 +00:00
Craig Topper	55a71b575c	Revert r359392 and r358887 Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066	2019-05-06 19:29:24 +00:00
Guillaume Chatelet	edd69fca3e	Modernize repmovsb implementation of x86 memcpy and allow runtime sizes. Summary: This is a prerequisite to RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61593 Fix typo. Turn this patch into an NFC. Addressing comments llvm-svn: 360050	2019-05-06 15:10:19 +00:00
Simon Pilgrim	2a0ef0530b	[X86] Fix uninitialized members in constructor warnings. NFCI. Initialize all member variables in X86ATTInstPrinter and X86DAGToDAGISel constructors to fix cppcheck warning. llvm-svn: 360047	2019-05-06 14:48:02 +00:00
Simon Pilgrim	d672d0e246	X86DAGToDAGISel::tryVPTESTM - fix uninitialized variable warning. NFCI. findBroadcastedOp should always initialize the value if it returns true but static-analyzer isn't great at recognising this. llvm-svn: 360037	2019-05-06 11:52:16 +00:00
Simon Pilgrim	04dad8f66d	[X86] X86InstrInfo::findThreeSrcCommutedOpIndices - fix unread variable warning. scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there). llvm-svn: 360028	2019-05-06 10:15:34 +00:00
Simon Pilgrim	07d91cd98a	[X86] lowerVectorShuffle - use any_of to detect out of bounds shuffle indices. NFCI. Fixes cppcheck local shadow warning as well. llvm-svn: 360027	2019-05-06 10:11:24 +00:00
Luo, Yuanke	beec41c656	Enable AVX512_BF16 instructions, which are supported for BFLOAT16 in Cooper Lake Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data. VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data. VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision. For more details about BF16 isa, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Author: LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel Reviewed By: craig.topper Subscribers: kristina, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60550 llvm-svn: 360017	2019-05-06 08:22:37 +00:00
Simon Pilgrim	8462cc3c74	[X86] Pull out repeated Subtarget feature tests. NFCI. Avoids a scan-build "uninitialized value" warning in X86FastISel::X86SelectFPExtOrFPTrunc llvm-svn: 360001	2019-05-05 20:45:20 +00:00
Simon Pilgrim	addc90e4e8	[TTI][X86] Make getAddressComputationCost cost value const. NFCI. llvm-svn: 359999	2019-05-05 20:03:51 +00:00
Simon Pilgrim	5170c0e5fe	Move getOpcode() call into if statement. NFCI. Avoids a cppcheck "Local variable name shadows outer variable" warning. llvm-svn: 359991	2019-05-05 18:34:38 +00:00
Simon Pilgrim	70ee2def90	[X86] Make X86RegisterInfo(const Triple &TT) constructor explicit. Fixes cppcheck warning. llvm-svn: 359981	2019-05-05 12:51:47 +00:00
Simon Pilgrim	cbcd9b1b92	[X86] Fix some cppcheck "Local variable name shadows outer variable" warnings. NFCI. llvm-svn: 359976	2019-05-05 12:00:14 +00:00
Craig Topper	a8f3840c62	[X86] Allow assembly parser to accept x/y/z suffixes on non-memory vfpclassps/pd and on memory forms in intel syntax The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned. But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas. The printing code will still only use the suffix for the memory form where it is needed. llvm-svn: 359903	2019-05-03 16:15:15 +00:00
Simon Pilgrim	b323d5ec7c	[X86] LowerToHorizontalOp - Tidyup calls to getHopForBuildVector. NFCI. Merge the if() tests for the various HADD/SUB + Subtarget tests llvm-svn: 359901	2019-05-03 15:56:06 +00:00
Simon Pilgrim	bfdd0f75a8	[X86] Remove repeated variables. NFCI. llvm-svn: 359889	2019-05-03 14:37:00 +00:00
Simon Pilgrim	aa49be4926	Avoid cppcheck operator precedence warnings. NFCI. Prefer ((X & Y) ? A : B) to (X & Y ? A : B) llvm-svn: 359884	2019-05-03 13:50:38 +00:00
Simon Pilgrim	a359ef192b	[X86] LowerMULH - remove unused Lo/Hi vector indices. NFCI. Leftover from before we had the extract128BitVector helpers. llvm-svn: 359871	2019-05-03 10:32:07 +00:00
Simon Pilgrim	88f9117168	Reduce variable scope to just the if() block its actually used in. NFCI. llvm-svn: 359869	2019-05-03 10:13:41 +00:00
Craig Topper	d724360695	[X86] Add more one checks to masked compare patterns that were missed in r358358. This covers the patterns we use for widening 128/256 comparisons to 512-bit when AVX512VL isn't supported. llvm-svn: 359863	2019-05-03 07:14:05 +00:00
Craig Topper	bf29238e1a	[X86] Remove LEA16r references from X86FixupLEAs. NFCI As far as I know, we never emit LEA16r llvm-svn: 359840	2019-05-02 22:46:23 +00:00
Craig Topper	e1e38d4248	[X86] Correct the register class for specific mask register constraints in getRegForInlineAsmConstraint when the VT is a scalar type The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes. Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying. This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg. Fixes PR41678 Differential Revision: https://reviews.llvm.org/D61453 llvm-svn: 359837	2019-05-02 22:26:40 +00:00
Craig Topper	47d8865a38	[X86] Remove string literal from an if. NFC This if used to be an assert that got refactored into an if, but left the string literal behind. Fixes PR41718 llvm-svn: 359833	2019-05-02 21:57:18 +00:00
Simon Pilgrim	df8daf0ef4	[X86][SSE] lowerAddSubToHorizontalOp - enable ymm extraction+fold Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway. Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result: https://godbolt.org/z/0R-U-K Differential Revision: https://reviews.llvm.org/D61426 llvm-svn: 359786	2019-05-02 14:00:55 +00:00
Simon Pilgrim	9fa56f7829	[X86][SSE] Move shouldUseHorizontalOp inside isHorizontalBinOp. NFCI. Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921 llvm-svn: 359782	2019-05-02 12:18:24 +00:00
Craig Topper	b929a0062e	[X86] Remove the redundant suffix in vfpclassp[d,s]'s broadcasting variant The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template. Patch by Pengfei Wang Differential Revision: https://reviews.llvm.org/D61295 llvm-svn: 359753	2019-05-02 03:25:50 +00:00
Simon Pilgrim	9f04d97cd7	[X86][SSE] Fold scalar horizontal add/sub for non-0/1 element extractions We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector). Differential Revision: https://reviews.llvm.org/D61263 llvm-svn: 359707	2019-05-01 17:13:35 +00:00
Simon Pilgrim	f5bdff7747	Fix 80 column violation. NFCI. llvm-svn: 359694	2019-05-01 16:01:49 +00:00
Simon Pilgrim	6711b9699a	[X86][SSE] Add demanded elts support X86ISD::PMULDQ\PMULUDQ Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode llvm-svn: 359686	2019-05-01 14:50:50 +00:00
Simon Pilgrim	3d6899e369	[X86][SSE] Add SSE vector shift support to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359680	2019-05-01 13:51:09 +00:00
Simon Pilgrim	ba372c6e62	[X86][SSE] Split 512-bit -> 128-bit vector directly in SimplifyDemandedVectorEltsForTargetNode llvm-svn: 359678	2019-05-01 12:48:42 +00:00
Simon Pilgrim	951a6b4579	[X86][SSE] Add 512-bit vector support to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359677	2019-05-01 12:37:41 +00:00
Simon Pilgrim	37c2419cc7	[X86][SSE] Add X86ISD::PACKSS\PACKUS to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359673	2019-05-01 11:29:36 +00:00
Simon Pilgrim	3353cee06c	[X86][SSE] Add X86ISD::UNPCKL\UNPCK to SimplifyDemandedVectorEltsForTargetNode vector splitting llvm-svn: 359670	2019-05-01 11:08:03 +00:00
Simon Pilgrim	f7b978a71b	[X86][SSE] Move extract_subvector(pshufb) fold to SimplifyDemandedVectorEltsForTargetNode This lets us hit more cases than combineExtractSubvector and allows us reuse more code. llvm-svn: 359669	2019-05-01 10:58:38 +00:00
Simon Pilgrim	a7d107a3e0	[X86] SimplifyDemandedVectorEltsForTargetNode - pull out vector halving code. NFCI. Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes. llvm-svn: 359667	2019-05-01 10:38:10 +00:00
Simon Pilgrim	99eefe94b5	[X86][SSE] Extract i1 elements from vXi1 bool vectors This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK. Differential Revision: https://reviews.llvm.org/D61189 llvm-svn: 359666	2019-05-01 10:02:22 +00:00
Craig Topper	dd66acef96	[X86FixupLEAs] Hoist the calls to isLEA out of the 3 separate functions and put it in the basic block instruction loop. NFC Now need to check it 3 different times. Just do it once at the top of the loop. llvm-svn: 359658	2019-05-01 06:53:03 +00:00
Simon Pilgrim	07ab4e7db8	[X86][SSE] Fold extract_subvector(extend(x)) -> extend_vector_inreg(x) This adds any extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality Minor improvement for PR39709 llvm-svn: 359608	2019-04-30 20:31:07 +00:00
Craig Topper	cad318014e	[X86] Remove if that's always true It's been like this since it was added in a refactor of this code. Fixes PR41659 llvm-svn: 359597	2019-04-30 19:02:15 +00:00
Craig Topper	3958719dda	[X86] If PreprocessISelDAG reorders a load before a call, make sure we remove dead nodes from the graph The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue. This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it. Differential Revision: https://reviews.llvm.org/D61164 llvm-svn: 359582	2019-04-30 17:56:47 +00:00
Craig Topper	965d1306ae	[X86] Initial cleanups on the FixupLEAs pass. Separate Atom LEA creation from other LEA optimizations. This removes some of the class variables. Merge basic block processing into runOnMachineFunction to keep the flags local. Pass MachineBasicBlock around instead of an iterator. We can get the iterator in the few places that need it. Allows a range-based outer for loop. Separate the Atom optimization from the rest of the optimizations. This allows fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA when profitable by its heuristics. I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index register is equal to the destination register. This is profitable regardless of the various slow flags. But again we would want Atom to be able to undo that. Differential Revision: https://reviews.llvm.org/D60993 llvm-svn: 359581	2019-04-30 17:56:28 +00:00
Simon Pilgrim	22641cc194	Fix for bug 41512: lower INSERT_VECTOR_ELT(ZeroVec, 0, Elt) to SCALAR_TO_VECTOR(Elt) for all SSE flavors Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd. INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32. So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4. https://bugs.llvm.org/show_bug.cgi?id=41512 Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike. Committed on behalf of @Serge_Preis (Serge Preis) Differential Revision: https://reviews.llvm.org/D60852 llvm-svn: 359545	2019-04-30 10:18:25 +00:00
Sjoerd Meijer	180f1ae57c	[TargetLowering] Change getOptimalMemOpType to take a function attribute list The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537	2019-04-30 08:38:12 +00:00
Martin Storsjo	c0d138d147	[X86] Run CFIInstrInserter on Windows if Dwarf is used This is necessary since SVN r330706, as tail merging can include CFI instructions since then. This fixes PR40322 and PR40012. Differential Revision: https://reviews.llvm.org/D61252 llvm-svn: 359496	2019-04-29 20:25:51 +00:00
Simon Pilgrim	028485d7b9	[X86][SSE] isHorizontalBinOp - add support for target shuffles Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support. This does mean we can go through bitcasts so we need to bitcast the extracted args to ensure they are the correct type Fixes PR39936 and should help with PR39920/PR39921 Differential Revision: https://reviews.llvm.org/D61245 llvm-svn: 359491	2019-04-29 19:52:59 +00:00
Simon Pilgrim	0a5c2b2449	[X86] scaleShuffleMask - avoid potential signed overflow warning. Use size_t assignment to prevent a bad explicit type conversion warning. Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359479	2019-04-29 18:32:06 +00:00
Craig Topper	9202d5f8f1	[X86] Remove some intel syntax aliases on (v)cvtpd2(u)dq, (v)cvtpd2ps, (v)cvt(u)qq2ps. Add 'x' and'y' suffix aliases to masked version of the same in att syntax. The 128/256 bit version of these instructions require an 'x' or 'y' suffix to disambiguate the memory form in att syntax. We were allowing the same suffix in intel syntax, but it appears gas does not do that. gas does allow the 'x' and 'y' suffix on register and broadcast forms even though its not needed. We were allowing it on unmasked register form, but not on masked versions or on masked or unmasked broadcast form. While there fix some test coverage holes so they can be extended with the 'x' and 'y' suffix tests. llvm-svn: 359418	2019-04-29 06:13:41 +00:00
Simon Pilgrim	d5cc753b6d	[X86][SSE] combineExtractVectorElt - add early-out to return zero/undef for out-of-range extraction indices. llvm-svn: 359406	2019-04-28 19:12:58 +00:00
Simon Pilgrim	22d1476bfa	[X86][AVX] Combine non-lane crossing binary shuffles using X86ISD::VPERMV3 Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results. llvm-svn: 359400	2019-04-28 14:31:01 +00:00
Simon Pilgrim	93ad48210c	[X86][SSE] Optimize llvm.experimental.vector.reduce.xor.vXi1 parity reduction (PR38840) An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0. Differential Revision: https://reviews.llvm.org/D61230 llvm-svn: 359396	2019-04-28 10:46:17 +00:00
Craig Topper	bd35a30940	[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead Summary: The register form of these instructions are CodeGenOnly instructions that cover GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for the opposite bitcast. Due to the patterns using bitcasts these instructions get marked as "bitcast" machine instructions as well. The peephole pass is able to look through these as well as other copies to try to avoid register bank copies. Because FR32/FR64/VR128 are all coalescable to each other we can end up in a situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to GR32->GR64 which the copyPhysReg code can't handle. To prevent this, this patch removes one set of the 'bitcast' instructions. So now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that converts from GR32/GR64->VR128 has no special significance to the peephole pass and won't be looked through. I guess the other option would be to add support to copyPhysReg to just promote the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined anyway. But removing the CodeGenOnly instruction in favor of one that won't be optimized seemed safer. I deleted the peephole test because it couldn't be made to work with the bitcast instructions removed. The load version of the instructions were unnecessary as the pattern that selects them contains a bitcasted load which should never happen. Fixes PR41619. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61223 llvm-svn: 359392	2019-04-28 06:25:33 +00:00
Simon Pilgrim	03c4e2663c	Revert rL359389: [X86][SSE] Add support for <64 x i1> bool reduction Minor generalization of the existing <32 x i1> pre-AVX2 split code. ........ Causing irregular buildbot failures. llvm-svn: 359391	2019-04-27 20:44:08 +00:00
Simon Pilgrim	4118be3af6	[X86][SSE] Add support for <64 x i1> bool reduction Minor generalization of the existing <32 x i1> pre-AVX2 split code. llvm-svn: 359389	2019-04-27 20:04:44 +00:00
Simon Pilgrim	2a2d422400	[X86][AVX512] Improve vector bool reductions As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern. llvm-svn: 359386	2019-04-27 17:32:46 +00:00
Simon Pilgrim	acc1e6d1c6	[X86][AVX] Merge mask select with shuffles across extract_subvector (PR40332) Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector. We can extend this in the future to handle more opcodes and non-zero selections. llvm-svn: 359378	2019-04-27 13:35:32 +00:00
Craig Topper	063b471ff7	[X86] Use MOVQ for i64 atomic_stores when SSE2 is enabled Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store. Reviewers: spatel, RKSimon, reames, jfb, efriedma Reviewed By: RKSimon Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60546 llvm-svn: 359368	2019-04-27 03:38:15 +00:00
Nick Desaulniers	7ab164c4a4	[AsmPrinter] refactor to support %c w/ GlobalAddress' Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overriden method in each base class. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target independent %a, %c, and %n. The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter. It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints. Fixes: - https://bugs.llvm.org/show_bug.cgi?id=41402 - https://github.com/ClangBuiltLinux/linux/issues/449 Reviewers: echristo, void Reviewed By: void Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60887 llvm-svn: 359337	2019-04-26 18:45:04 +00:00
Simon Pilgrim	27e01e675c	[X86][AVX] Fold extract_subvector(broadcast(x)) -> broadcast(x) iff x has one use llvm-svn: 359332	2019-04-26 18:02:14 +00:00
Craig Topper	354247c08d	[X86] Sink NoRegister creation for unused Base/Index registers into getAddressOperands. NFCI llvm-svn: 359318	2019-04-26 16:39:38 +00:00
Craig Topper	ad662cf4c1	[X86] Segment registers should have i16 type not i32. Probably doesn't really matter, but was inconsistent with the rest of the code. llvm-svn: 359317	2019-04-26 16:39:35 +00:00
Simon Pilgrim	c3a34c3e07	Fix Wparentheses warning. NFCI. llvm-svn: 359299	2019-04-26 12:23:42 +00:00
Simon Pilgrim	bb230c5e79	[X86][SSE] Pull out OR(EXTRACTELT(X,0),OR(EXTRACTELT(X,1),...)) matching code from LowerVectorAllZeroTest Create a matchBitOpReduction helper that checks for the pattern with any opcode. First step towards reusing this code to recognize other scalar reduction patterns. llvm-svn: 359296	2019-04-26 11:45:54 +00:00
Simon Pilgrim	5d6ef94c36	[X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293	2019-04-26 10:49:13 +00:00
Simon Pilgrim	5e161df9f8	[X86][AVX] Combine shuffles extracted from a common vector A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380. Differential Revision: https://reviews.llvm.org/D60512 llvm-svn: 359292	2019-04-26 09:56:14 +00:00
David Blaikie	0c4dbf9ecd	Assigning to a local object in a return statement prevents copy elision. NFC. I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment. P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these. (Kona update: The bulk of P1155 is proceeding to CWG review, but specifically not the parts that explored the notion of permitting copy-elision in these specific cases.) Reviewed By: dblaikie Author: Arthur O'Dwyer Differential Revision: https://reviews.llvm.org/D54885 llvm-svn: 359236	2019-04-25 20:09:00 +00:00
Simon Pilgrim	0a7d1b3ce1	[X86][SSE] combineBitcastvxi1 - add support for bitcasting to non-scalar integers Truncate the movmsk scalar integer result to the equivalent scalar integer width as before but then bitcast to the requested type. We still have the issue identified in PR41594 but D61114 should handle this. llvm-svn: 359176	2019-04-25 09:34:36 +00:00
Craig Topper	013503c78d	[X86] Remove part of an if condition that should always be true. The IndexReg will always be non-null at this point. Earlier in the function, if IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it non-null. llvm-svn: 359170	2019-04-25 06:08:02 +00:00
Amy Huang	68c9199493	Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label). Summary: This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug info in codeview. Currently only changes FastISel, so emitting labels still needs to be implemented in SelectionDAG. Reviewers: rnk Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D61083 llvm-svn: 359149	2019-04-24 23:02:48 +00:00
Craig Topper	6932abee2c	[X86] Attempt to fix use-after-poison from r359121. llvm-svn: 359143	2019-04-24 21:48:24 +00:00
Craig Topper	af194e9380	[X86] Prevent folding a load into an AND if that AND is really a ZEXT_INREG that should use movzx. This can save a 32-bit immediate move. We would shrink the load and fold it if it was non-volatile, but that's trickier to check for. llvm-svn: 359129	2019-04-24 19:28:38 +00:00
Craig Topper	882ca6d484	[X86] Remove dead nodes left after ReplaceAllUsesWith calls during address matching ReplaceAllUsesWith doesn't remove the node that was replaced. So its left around in the graph messing up use counts on other nodes. One thing to note, is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive because the isel table walking code would still have a reference to it that its going to try to match next. I don't think that's the case here though because the nodes being deleted here should be "and", "srl", and "zero_extend" none of which can be the root node of an LEA match. Differential Revision: https://reviews.llvm.org/D61048 llvm-svn: 359121	2019-04-24 18:02:07 +00:00
Sanjay Patel	b1b3368907	[x86] make sure horizontal op and broadcast types match to simplify (PR41414) If the types don't match, we can't just remove the shuffle. There may be some other opportunity for optimization here, but this should prevent the crashing seen in: https://bugs.llvm.org/show_bug.cgi?id=41414 llvm-svn: 359095	2019-04-24 14:05:08 +00:00
Simon Pilgrim	d30745b2a0	[X86] Add shouldFoldConstantShiftPairToMask override placeholder. NFCI. Prep work toward fixing PR40758 llvm-svn: 359088	2019-04-24 12:34:08 +00:00
Sanjay Patel	12a561fa1b	[x86] use psubus for more vsetcc lowering (PR39859) Circling back to a leftover bit from PR39859: https://bugs.llvm.org/show_bug.cgi?id=39859#c1 ...we have this counter-intuitive (based on the test diffs) opportunity to use 'psubus'. This appears to be the better perf option for both Haswell and Jaguar based on llvm-mca. We already do this transform for the SETULT predicate, so this makes the code more symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect anything but pre-SSE4.1 subtargets. $ cat before.s movdqa -16(%rip), %xmm2 ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768] pxor %xmm0, %xmm2 pcmpgtw -32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255] pand %xmm2, %xmm0 pandn %xmm1, %xmm2 por %xmm2, %xmm0 $ cat after.s movdqa -16(%rip), %xmm2 ## xmm2 = [256,256,256,256,256,256,256,256] psubusw %xmm0, %xmm2 pxor %xmm3, %xmm3 pcmpeqw %xmm2, %xmm3 pand %xmm3, %xmm0 pandn %xmm1, %xmm3 por %xmm3, %xmm0 $ llvm-mca before.s -mcpu=haswell Iterations: 100 Instructions: 600 Total Cycles: 909 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 0.77 IPC: 0.66 Block RThroughput: 1.8 $ llvm-mca after.s -mcpu=haswell Iterations: 100 Instructions: 700 Total Cycles: 409 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.71 IPC: 1.71 Block RThroughput: 1.8 Differential Revision: https://reviews.llvm.org/D60838 llvm-svn: 358999	2019-04-23 15:20:17 +00:00
Fangrui Song	efd94c56ba	Use llvm::stable_sort While touching the code, simplify if feasible. llvm-svn: 358996	2019-04-23 14:51:27 +00:00
Simon Pilgrim	0e4992ce27	[X86] Pull out collectConcatOps helper. NFCI. Create collectConcatOps helper that returns all the subvector ops for CONCAT_VECTORS or a INSERT_SUBVECTOR series. llvm-svn: 358989	2019-04-23 14:07:49 +00:00
Sanjay Patel	bf8aacb715	[SelectionDAG] move splat util functions up from x86 lowering This was supposed to be NFC, but the change in SDLoc definitions causes instruction scheduling changes. There's nothing x86-specific in this code, and it can likely be used from DAGCombiner's simplifyVBinOp(). llvm-svn: 358930	2019-04-22 22:43:36 +00:00
Craig Topper	5c43ab337f	[X86] Reject 512-bit types in getRegForInlineAsmConstraint when AVX512 is not enabled. Same for 256 bit and AVX. llvm-svn: 358872	2019-04-22 06:12:02 +00:00
Craig Topper	df02beb416	[X86] Add the rounding control operand to the printing for some scalar FMA instructions. llvm-svn: 358844	2019-04-21 07:12:56 +00:00
Craig Topper	63db7e347b	[X86] Don't form masked vfpclass instruction from and+vfpclass unless the fpclass only has a single use. llvm-svn: 358841	2019-04-21 05:18:04 +00:00
Craig Topper	3980d1ca6b	[X86] Disable argument copy elision for arguments passed via pointers Summary: If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory. The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too. This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix. Reviewers: rnk Reviewed By: rnk Subscribers: dbabokin, llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D60801 llvm-svn: 358817	2019-04-20 15:26:44 +00:00
Nikita Popov	b75c8fc6fb	[X86] Fix stack probing on x32 (PR41477) Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI with stack probing a dynamic alloca will result in a WIN_ALLOCA_32 with a 32-bit size. The current implementation tries to copy it into RAX, resulting in a physreg copy error. Fix this by copying to EAX instead. Also fix incorrect opcodes or registers used in subs. llvm-svn: 358807	2019-04-20 07:25:46 +00:00
Craig Topper	4d4b5d952e	[X86] Don't turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the original AND can represented by MOVZX. The MOVZX doesn't require an immediate to be encoded at all. Though it does use a 2 byte opcode so its the same size as a 1 byte immediate. But it has a separate source and dest register so can help avoid copies. llvm-svn: 358805	2019-04-20 04:38:53 +00:00
Craig Topper	8b8264828c	[X86] Turn (and (anyextend (shl X, C1), C2)) into (shl (and (anyextend X), (C1 >> C2), C2) if the AND could match a movzx. There's one slight regression in here because we don't check that the immediate already allowed movzx before the shift. I'll fix that next. llvm-svn: 358804	2019-04-20 04:38:49 +00:00
Bjorn Pettersson	238c9d6308	[CodeGen] Add "const" to MachineInstr::mayAlias Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744	2019-04-19 09:08:38 +00:00
Craig Topper	bb769a2946	[X86] Turn (and (shl X, C1), C2) into (shl (and X, (C1 >> C2), C2) if the AND could match a movzx. Could get further improvements by recognizing (i64 and (anyext (i32 shl))). llvm-svn: 358737	2019-04-19 05:48:13 +00:00
Craig Topper	f73caae956	[X86] Make sure we copy the HandleSDNode back to N before executing the default code after the switch in matchAddressRecursively Summary: There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code. This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract, and onto the second subtract. The second subtract called matchAddressRecursively on its LHS which caused that subtract to CSE. We ultimately failed the match and ended up in the default code. But N was pointing at the old node that had been deleted, but the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg requiring the check for deleted node. With this patch we now use the CSE result as the base reg instead. matchAdd has been broken since sometime in 2015 when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference so the update didn't go anywhere. Reviewers: niravd, spatel, RKSimon, bkramer Reviewed By: niravd Subscribers: llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D60843 llvm-svn: 358735	2019-04-19 04:52:21 +00:00
Simon Pilgrim	4171a91e92	[X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask combineVectorTruncationWithPACKUS is currently splitting the upper bit bit masking into 128-bit subregs and then concatenating them back together. This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356). This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit. My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great. Differential Revision: https://reviews.llvm.org/D60375#inline-539623 llvm-svn: 358692	2019-04-18 17:23:09 +00:00
Simon Pilgrim	8f87e53462	[X86][SSE] Lower ICMP EQ(AND(X,C),C) -> SRA(SHL(X,LOG2(C)),BW-1) iff C is power-of-2. This replaces the MOVMSK combine introduced at D52121/rL342326 (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)) with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra. Differential Revision: https://reviews.llvm.org/D60625 llvm-svn: 358651	2019-04-18 09:58:59 +00:00
Sanjay Patel	fb363a778f	[x86] try to widen 'shl' as part of LEA formation The test file has pairs of tests that are logically equivalent: https://rise4fun.com/Alive/2zQ %t4 = and i8 %t1, 8 %t5 = zext i8 %t4 to i16 %sh = shl i16 %t5, 2 %t6 = add i16 %sh, %t0 => %t4 = and i8 %t1, 8 %sh2 = shl i8 %t4, 2 %z5 = zext i8 %sh2 to i16 %t6 = add i16 %z5, %t0 ...so if we can fold the shift op into LEA in the 1st pattern, then we should be able to do the same in the 2nd pattern (unnecessary 'movzbl' is a separate bug I think). We don't want to do this any sooner though because that would conflict with generic transforms that try to narrow the width of the shift. Differential Revision: https://reviews.llvm.org/D60789 llvm-svn: 358622	2019-04-17 22:38:51 +00:00
Simon Pilgrim	9daacec816	[CostModel][X86] Add bool anyof/allof reduction costs On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest...... Differential Revision: https://reviews.llvm.org/D60403 llvm-svn: 358574	2019-04-17 10:58:19 +00:00
Craig Topper	6bf0802738	[X86] In CopyToFromAsymmetricReg, use VR128 instead of FR32 instructions for GR32<->XMM register copies. We have two versions of some instructions, VR128 versions and FR32 versions that are marked as CodeGenOnly. This change switches to using the VR128 versions for these copies. It's after register allocation so the class size no longer matters. This matches how GR64 works. llvm-svn: 358555	2019-04-17 06:09:11 +00:00
Simon Pilgrim	e5573f4f4e	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526	2019-04-16 20:57:28 +00:00
Simon Pilgrim	d769bb1e58	[X86][AVX] X86ISD::PERMV/PERMV3 node types can never fold index ops Improves codegen demonstrated by D60512 - instructions represented by X86ISD::PERMV/PERMV3 can never memory fold the operand used for their index register. This patch updates the 'isUseOfShuffle' helper into the more capable 'isFoldableUseOfShuffle' that recognises that the op is used for a X86ISD::PERMV/PERMV3 index mask and can't be folded - allowing us to use broadcast/subvector-broadcast ops to reduce the size of the mask constant pool data. Differential Revision: https://reviews.llvm.org/D60562 llvm-svn: 358516	2019-04-16 19:18:53 +00:00
Craig Topper	0495f29e42	[X86] Limit the 'x' inline assembly constraint to zmm0-15 when used for a 512 type. The 'v' constraint is used to select zmm0-31. This makes 512 bit consistent with 128/256-bit.a llvm-svn: 358450	2019-04-15 21:06:32 +00:00
Craig Topper	3d9b47c770	[X86] Block i32/i64 for 'k' and 'Yk' in getRegForInlineAsmConstraint without avx512bw. 32 and 64 bit k-registers require avx512bw. If we don't block this properly, it leads to a crash. llvm-svn: 358436	2019-04-15 18:39:45 +00:00
Craig Topper	8e364c680f	[X86] Restore the pavg intrinsics. The pattern we replaced these with may be too hard to match as demonstrated by PR41496 and PR41316. This patch restores the intrinsics and then we can start focusing on the optimizing the intrinsics. I've mostly reverted the original patch that removed them. Though I modified the avx512 intrinsics to not have masking built in. Differential Revision: https://reviews.llvm.org/D60674 llvm-svn: 358427	2019-04-15 17:17:35 +00:00
Amara Emerson	d189680baa	[GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs. Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE configs that can be passed between TargetPassConfig and the targets' custom pass configs. This CSEConfigBase allows targets to create custom CSE configs which is then used by the GISel passes for the CSEMIRBuilder. This support will be used in a follow up commit to allow constant-only CSE for -O0 compiles in D60580. llvm-svn: 358368	2019-04-15 04:53:46 +00:00
Craig Topper	5b92eb007b	[X86] Redefine KUNPCK instructions to take a narrower source register class than destination register class. Remove copies from the isel output pattern. There's no reason for the inputs to be the destination register class. This just forces an unnecessary copy in the output patterns. llvm-svn: 358362	2019-04-14 20:52:42 +00:00
Craig Topper	96950f1fa9	[X86] Put the locked mi8 instrutions above the locked mi/mi32 so they will be prefered. We want 64mi8 to be prefered over 64mi32. The order for 16mi/32mi doesn't really matter. llvm-svn: 358361	2019-04-14 19:00:00 +00:00
Craig Topper	72b976e5d7	[X86] Change IMUL with immediate instruction order to ri8 instructions come before ri/ri32 instructions. This will ensure IMUL64ri8 is tried before IMUL64ri32. For IMUL32 and IMUL16 the order doesn't really matter because only the ri8 versions use a predicate. That automatically gives them priority. llvm-svn: 358360	2019-04-14 18:59:57 +00:00
Craig Topper	3c57976447	[X86] Move VPTESTM matching from the isel table to custom code in X86ISelDAGToDAG. We had many tablegen patterns for these instructions. And due to the commutability of the patterns, tablegen expands them to even more patterns. All together VPTESTMD patterns accounted for more the 50K of the 610K isel table. This had gotten bad when we stopped canonicalizing AND to vXi64. This required a pattern for every combination of bitcast input type. This change moves the matching to custom code where it is easier to look through the bitcasts without being concerned with the specific types. The test changes are because we are now stricter with one use checks as its required to make load folding legal. We now require the AND and any BITCAST to only have a single use. This prevents forming VPTESTM and a VPAND with the same inputs. We now support broadcast loads for 128/256 patterns without VLX. We'll widen to 512-bit like and still fold the broadcast since the amount of memory read doesn't change. There are a few tests that got slightly longer because are now prefering load + VPTESTM over XOR+VPCMPEQ for (seteq (load), allzeros). Previously we were able to share the XOR with multiple VPTESTM instructions. llvm-svn: 358359	2019-04-14 18:26:11 +00:00
Craig Topper	b17e5ec61b	[X86] Don't form masked vpcmp/vcmp/vptestm operations if the setcc node has more than one use. We're better of emitting a single compare + kand rather than a compare for the other use and a masked compare. I'm looking into using custom instruction selection for VPTESTM to reduce the ridiculous number of permutations of patterns in the isel table. Putting a one use check on all masked compare folding makes load fold matching in the custom code easier. llvm-svn: 358358	2019-04-14 18:26:06 +00:00
Craig Topper	fdcdf74b0e	[X86] Remove some unused tablegen multiclasses. NFC llvm-svn: 358345	2019-04-14 04:20:38 +00:00
Bill Wendling	191f1487b6	[X86] Use PC-relative mode for the kernel code model Summary: The Linux kernel uses PC-relative mode, so allow that when the code model is "kernel". Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits, kees, nickdesaulniers Tags: #llvm Differential Revision: https://reviews.llvm.org/D60643 llvm-svn: 358343	2019-04-13 21:39:28 +00:00
Craig Topper	55b0d987fd	[X86] Use int64_t and isInt<N> instead of APInt operations in foldLoadStoreIntoMemOperand. NFC We know all our values are limited to 64 bits here so we don't need an APInt. This should save some generated code checking between large and small size. llvm-svn: 358338	2019-04-13 18:57:41 +00:00
Simon Pilgrim	6c8f4ada36	[X86][SSE] Recognise vXi1 boolean anyof/allof reduction patterns Currently combineHorizontalPredicateResult only handles anyof/allof reduction patterns of legal types, which can be tricky to match as type legalization of bools can introduce bitcasts/truncs/extensions. This patch extends combineHorizontalPredicateResult to recognise vXi1 bool reductions as well and uses the existing combineBitcastvxi1 helper to create the MOVMSK necessary to then compare the signmask result. This ensures the accuracy of the reduction costs added in D60403 which assume the MOVMSK generation. Differential Revision: https://reviews.llvm.org/D60610 llvm-svn: 358286	2019-04-12 14:22:57 +00:00
Eric Christopher	6b06c6a5ef	Add explicit dependencies on MCSection.h and MCDwarf.h to the .cpp files rather than rely on transitive includes from MCStreamer.h. llvm-svn: 358263	2019-04-12 07:40:01 +00:00
Nick Desaulniers	8ec304c9fd	[X86AsmPrinter] refactor static functions into private methods. NFC Summary: A lot of the code for printing special cases of operands in this translation unit are static functions. While I too have suffered many years of abuse at the hands of C, we should prefer private methods, particularly when you start passing around *this as your first argument, which is a code smell. This will help make generic vs arch specific asm printing easier, as it brings X86AsmPrinter more in line with other arch's derived AsmPrinters. We will then be able to more easily move architecture generic code to the base class, and architecture specific code to the derived classes. Some other small refactorings while we're here: - the parameter Op is now consistently OpNo - add spaces around binary expressions. I know we're not millionaires but c'mon. Reviewers: echristo Reviewed By: echristo Subscribers: smeenai, hiraditya, llvm-commits, srhines, craig.topper Tags: #llvm Differential Revision: https://reviews.llvm.org/D60577 llvm-svn: 358236	2019-04-11 22:47:13 +00:00
Craig Topper	68a5d619a4	[X86] Restrict vselect handling in scalarizeExtEltFP to only case to pre type legalization where the setcc result type is vXi1. If the vector setcc has been legalized then we will need to convert a vector boolean of 0 or -1 to a scalar boolean of 0 or 1. The added test case previously crashed in 32-bit mode by creating a setcc with an i64 condition that type legalization couldn't expand. llvm-svn: 358218	2019-04-11 19:57:44 +00:00
Craig Topper	586fad50ac	[X86] Add patterns for using movss/movsd for atomic load/store of f32/64. Remove atomic fadd pseudos use isel patterns instead. This patch adds patterns for turning bitcasted atomic load/store into movss/sd. It also removes the pseudo instructions for atomic RMW fadd. Instead just adding isel patterns for folding an atomic load into addss/sd. And relying on the new movss/sd store pattern to handle the write part. This also makes the fadd patterns use VEX and EVEX instructions when AVX or AVX512F are enabled. Differential Revision: https://reviews.llvm.org/D60394 llvm-svn: 358215	2019-04-11 19:19:52 +00:00
Craig Topper	f7e548c076	Recommit r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" With correct test checks this time. If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integ This matches what gcc and icc do for this case and removes an existing FIXME. llvm-svn: 358214	2019-04-11 19:19:42 +00:00
Craig Topper	8200880c9a	Revert r358211 "[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2" I seem to have messed up the test checks. llvm-svn: 358212	2019-04-11 19:04:38 +00:00
Craig Topper	1c2dfc3100	[X86] Use FILD/FIST to implement i64 atomic load on 32-bit targets with X87, but no SSE2 If we have X87, but not SSE2 we can atomicaly load an i64 value into the significand of an 80-bit extended precision x87 register using fild. We can then use a fist instruction to convert it back to an i64 integer and store it to a stack temporary. From there we can do two 32-bit loads to get the value into integer registers without worrying about atomicness. This matches what gcc and icc do for this case and removes an existing FIXME. Differential Revision: https://reviews.llvm.org/D60156 llvm-svn: 358211	2019-04-11 18:40:21 +00:00
Simon Pilgrim	40b647ae8e	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV3 mask support Completes SimplifyDemandedVectorElts's basic variable shuffle mask support which should help D60512 + D60562 llvm-svn: 358186	2019-04-11 15:29:15 +00:00
Luo, Yuanke	a2b4d3fab6	[X86] Add MM register mapping from CodeView to MC register id Differential Revision: https://reviews.llvm.org/D60437 Change-Id: I2183a6d825d0284b22705d423b88882992b236c5 llvm-svn: 358179	2019-04-11 15:01:03 +00:00
Simon Pilgrim	8a25154fa7	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMV mask support llvm-svn: 358174	2019-04-11 14:35:45 +00:00
Simon Pilgrim	6f3866c6fb	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMILPV mask support llvm-svn: 358170	2019-04-11 14:15:01 +00:00
Simon Pilgrim	cb5218ad48	[X86] SimplifyDemandedVectorElts - add X86ISD::VPERMIL2 mask support llvm-svn: 358167	2019-04-11 14:04:19 +00:00
Simon Pilgrim	e468cc7f14	[X86] SimplifyDemandedVectorElts - add VPPERM support We need to add support for all variable shuffle mask ops, but VPPERM is the only one that already has test coverage. llvm-svn: 358165	2019-04-11 13:30:38 +00:00
Craig Topper	61f31cbcb2	[X86] Teach foldMaskedShiftToScaledMask to look through an any_extend from i32 to i64 between the and & shl foldMaskedShiftToScaledMask tries to reorder and & shl to enable the shl to fold into an LEA. But if there is an any_extend between them it doesn't work. This patch modifies the code to look through any_extend from i32 to i64 when the and mask only uses bits that weren't from the extended part. This will prevent a regression from D60358 caused by 64-bit SHL being narrowed to 32-bits when their upper bits aren't demanded. Differential Revision: https://reviews.llvm.org/D60532 llvm-svn: 358139	2019-04-10 21:42:08 +00:00
Craig Topper	4a32ce39b7	[X86] Make _Int instructions the preferred instructon for the assembly parser and disassembly parser to remove inconsistencies between VEX and EVEX. Many of our instructions have both a _Int form used by intrinsics and a form used by other IR constructs. In the EVEX space the _Int versions usually cover all the capabilities include broadcasting and rounding. While the other version only covers simple register/register or register/load forms. For this reason in EVEX, the non intrinsic form is usually marked isCodeGenOnly=1. In the VEX encoding space we were less consistent, but usually the _Int version was the isCodeGenOnly version. This commit makes the VEX instructions match the EVEX instructions. This was done by manually studying the AsmMatcher table so its possible I missed some cases, but we should be closer now. I'm thinking about using the isCodeGenOnly bit to simplify the EVEX2VEX tablegen code that disambiguates the _Int and non _Int versions. Currently it checks register class sizes and Record the memory operands come from. I have some other changes I was looking into for D59266 that may break the memory check. I had to make a few scheduler hacks to keep the _Int versions from being treated differently than the non _Int version. Differential Revision: https://reviews.llvm.org/D60441 llvm-svn: 358138	2019-04-10 21:29:41 +00:00
Craig Topper	87a8f9761e	[X86] Replace some if statements in isel address matching that should never be true with asserts. And move them earlier before we looked through operands that don't change size. NFC These ifs were ensuring we don't have to handle types larger than 64 bits probably because we use getZExtValue in several places below them. None of the callers of this code pass types larger than 64-bits so we can just assert instead of branching in release code. I've also moved them earlier since we're just looking through operations that don't effect bit width. This is prep work for some refactoring I plan to do to the (and (shl)) handling code. llvm-svn: 358123	2019-04-10 19:08:59 +00:00
Nick Desaulniers	ad8f3a1440	[X86AsmPrinter] refactor to limit use of Modifier. NFC Summary: The Modifier memory operands is used in 2 cases of memory references (H & P ExtraCodes). Rather than pass around the likely nullptr Modifier, refactor the handling of the Modifier out from printOperand(). The refactorings in this patch: - Don't forward declare printOperand, move its definition up. - The diff makes it look like there's a change to printPCRelImm (narrator: there's not). - Create printModifiedOperand() - Move logic for Modifier to there from printOperand - Use printModifiedOperand in 3 call sites that actually create Modifiers. - Remove now unused Modifier parameter from printOperand - Remove default parameter from printLeaMemReference as it only has 1 call site that explicitly passes a parameter. - Remove default parameter from printMemReference, make call lone call site explicitly pass nullptr. - Drop Modifier parameter from printIntelMemReference, as Intel style memory references don't support the Modifiers in question. This will allow future changes to printOperand() to make it a pure virtual method on the base AsmPrinter class, allowing for more generic handling of some architecture generic constraints. X86AsmPrinter was the only derived class of AsmPrinter to have additional parameters on its printOperand function. Reviewers: craig.topper, echristo Reviewed By: echristo Subscribers: hiraditya, llvm-commits, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60526 llvm-svn: 358122	2019-04-10 19:01:44 +00:00
Roman Lebedev	dc67659ba5	[X86] X86ScheduleBdVer2: use !listsplat operator to cleanup loadres calculation The problem is that one can't concatenate an empty list (implied all-ones) with non-empty list here. The result will be the non-empty list, and it won't match the length of the ExePorts list. The problems begin when LoadRes != 1 here, which is the case in PdWriteResYMMPair, and more importantly i think it will be the case for PdWriteResExPair. llvm-svn: 358118	2019-04-10 18:26:42 +00:00
David Green	0861c87b06	Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not seeing through to the constant in other blocks. Revert this patch while we come up with a better way to handle that. I will try to follow this up with some better tests. llvm-svn: 358113	2019-04-10 18:00:41 +00:00
Nick Desaulniers	5277b3ff25	[AsmPrinter] refactor to remove remove AsmVariant. NFC Summary: The InlineAsm::AsmDialect is only required for X86; no architecture makes use of it and as such it gets passed around between arch-specific and general code while being unused for all architectures but X86. Since the AsmDialect is queried from a MachineInstr, which we also pass around, remove the additional AsmDialect parameter and query for it deep in the X86AsmPrinter only when needed/as late as possible. This refactor should help later planned refactors to AsmPrinter, as this difference in the X86AsmPrinter makes it harder to make AsmPrinter more generic. Reviewers: craig.topper Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60488 llvm-svn: 358101	2019-04-10 16:38:43 +00:00
Simon Pilgrim	37d8d55823	[X86][AVX] getTargetConstantBitsFromNode - extract bits from X86ISD::SUBV_BROADCAST llvm-svn: 358096	2019-04-10 16:24:47 +00:00
Craig Topper	391d5caa10	[X86] Move the 2 byte VEX optimization for MOV instructions back to the X86AsmParser::processInstruction where it used to be. Block when {vex3} prefix is present. Years ago I moved this to an InstAlias using VR128H/VR128L. But now that we support {vex3} pseudo prefix, we need to block the optimization when it is set to match gas behavior. llvm-svn: 358046	2019-04-10 05:43:20 +00:00
Craig Topper	9ca3a95f79	[X86] Support the EVEX versions vcvt(t)ss2si and vcvt(t)sd2si with the {evex} pseudo prefix in the assembler. The EVEX versions are ambiguous with the VEX versions based on operands alone so we had explicitly dropped them from the AsmMatcher table. Unfortunately, when we add them they incorrectly show in the table before their VEX counterparts. This is different how the prioritization normally works. To fix this we have to explicitly reject the instructions unless the {evex} prefix has been seen. llvm-svn: 358041	2019-04-10 01:29:59 +00:00
Craig Topper	7143224272	[X86] Add VEX_LIG to scalar VEX/EVEX instructions that were missing it. Scalar VEX/EVEX instructions don't use the L bit and don't look at it for decoding either. So we should ignore it in our disassembler. The missing instructions here were found by grepping the raw tablegen class definitions in the tablegen debug output. llvm-svn: 358040	2019-04-09 23:30:36 +00:00
Craig Topper	60f83544bb	[X86] Fix a dangling StringRef issue introduced in r358029. I was attempting to convert mnemonics to lower case after processing a pseudo prefix. But the ParseOperands just hold a StringRef for tokens so there is no where to allocate the memory. Add FIXMEs for the lower case issue which also exists in the prefix parsing code. llvm-svn: 358036	2019-04-09 21:37:21 +00:00
Amara Emerson	2b523f8162	[GlobalISel][AArch64] Allow CallLowering to handle types which are normally required to be passed as different register types. E.g. <2 x i16> may need to be passed as a larger <2 x i32> type, so formal arg lowering needs to be able truncate it back. Likewise, when dealing with returns of these types, they need to be widened in the appropriate way back. Differential Revision: https://reviews.llvm.org/D60425 llvm-svn: 358032	2019-04-09 21:22:33 +00:00
Craig Topper	8e2871cd2c	[X86] Add support for {vex2}, {vex3}, and {evex} to the assembler to match gas. Use {evex} to improve the one our 32-bit AVX512 tests. These can be used to force the encoding used for instructions. {vex2} will fail if the instruction is not VEX encoded, but otherwise won't do anything since we prefer vex2 when possible. Might need to skip use of the _REV MOV instructions for this too, but I haven't done that yet. {vex3} will force the instruction to use the 3 byte VEX encoding or fail if there is no VEX form. {evex} will force the instruction to use the EVEX version or fail if there is no EVEX version. Differential Revision: https://reviews.llvm.org/D59266 llvm-svn: 358029	2019-04-09 18:45:15 +00:00
Craig Topper	53ee783c6e	[X86] Have EVEX2VEX tablegenerator use HasVEX_L and HasEVEX_L2 fields instead of the composite EVEX_LL field. Remove the EVEX_LL field. NFCI The composite existed to simplify some other tablegen code and not really in an important way. Remove the combined field and just calculate the vector size using two ifs. llvm-svn: 357972	2019-04-09 07:40:14 +00:00
Craig Topper	f19f991b7f	[X86] Use VEX_WIG for VPINSRB/W and VPEXTRB/W to match what is done for EVEX. The instruction's document this as W0 for the VEX encoding. But there's a footnote mentioning that VEX.W is ignored in 64-bit mode. And the main VEX encoding description says the VEX.W bit is ignored for instructions that are equivalent to a legacy SSE instruction that uses REX.W to select a GPR which would apply here. By making this match EVEX we can remove a special case of allowing EVEX2VEX to turn an EVEX.WIG instruction into VEX.W0. llvm-svn: 357971	2019-04-09 07:40:10 +00:00
Craig Topper	2f9c1732b8	[X86] Split the VEX_WPrefix in X86Inst tablegen class into 3 separate fields with clear meanings. llvm-svn: 357970	2019-04-09 07:40:06 +00:00
Craig Topper	6c11a31bce	[X86] Derive ssmem and sdmem from X86MemOperand. NFCI This changes the operand type from v4f32/v2f64 to iPTR which seems more correct. But that doesn't seem to do anything other than change the comments in X86GenDAGISel.inc. Probably because we use a ComplexPattern to do the matching so there's no autogenerated code to change. llvm-svn: 357959	2019-04-09 00:24:17 +00:00
Craig Topper	3a4c2192a4	[X86] Fix a couple lowering functions that called ReplaceAllUsesOfValueWith for the newly created code and then return SDValue(). Use MERGE_VALUES instead. Returning SDValue() makes the caller think custom lowering was unsuccessful and then it will fall back to trying to expand the original node. This expanded code will end up with no users and end up being pruned later. But it was useless unnecessary work to create it. Instead return a MERGE_VALUES with all the results so the caller knows something changed. The caller can handle the replacements. For one of the cases I had to use UNDEF has a dummy value for a result we know is unused. This should get pruned later. llvm-svn: 357935	2019-04-08 19:44:07 +00:00
Sanjay Patel	50c3b290ed	[x86] make 8-bit shl undesirable I was looking at a potential DAGCombiner fix for 1 of the regressions in D60278, and it caused severe regression test pain because x86 TLI lies about the desirability of 8-bit shift ops. We've hinted at making all 8-bit ops undesirable for the reason in the code comment: // TODO: Almost no 8-bit ops are desirable because they have no actual // size/speed advantages vs. 32-bit ops, but they do have a major // potential disadvantage by causing partial register stalls. ...but that leads to massive diffs and exposes all kinds of optimization holes itself. Differential Revision: https://reviews.llvm.org/D60286 llvm-svn: 357912	2019-04-08 13:58:50 +00:00
Craig Topper	6a6da233b9	[X86] Make LowerOperationWrapper more robust. Remove now unnecessary ReplaceAllUsesWith from LowerMSCATTER. Previously LowerOperationWrapper took the number of results from the original node and counted that many results from the new node. This was intended to drop chain operands from FP_TO_SINT lowering that uses X87 with memory operations to stack temporaries. The final load had an extra chain output that needs to be ignored. Unfortunately, it didn't work with scatter which has 2 result operands, the mask output which is discarded and a chain output. The chain output is the one that is needed but it comes second and it would be dropped by the previous logic here. To workaround this we were doing a ReplaceAllUses in the lowering code so that the generic legalization code wouldn't see any uses to replace since it had been given the wrong result/type. After this change we take the LowerOperation result directly if the original node has one result. This allows us to directly return the chain from scatter or the load data from the FP_TO_SINT case. When the original node has multiple results we'll ensure the returned node has the same number and copy them over. For cases where the original node has multiple results and the new code for some reason has even more results, MERGE_VALUES can be used to pass only the needed results. llvm-svn: 357887	2019-04-08 07:39:17 +00:00
Craig Topper	424417da79	[X86] Use (SUBREG_TO_REG (MOV32rm)) for extloadi64i8/extloadi64i16 when the load is 4 byte aligned or better and not volatile. Summary: Previously we would use MOVZXrm8/MOVZXrm16, but those are longer encodings. This is similar to what we do in the loadi32 predicate. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60341 llvm-svn: 357875	2019-04-07 19:19:44 +00:00

... 6 7 8 9 10 ...

19391 Commits