llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	f12a3da5a7	[X86] X86DAGToDAGISel::tryFoldLoad - assert root/parent pointers are non-null. NFCI. Silences a static analyzer warning. llvm-svn: 372118	2019-09-17 13:27:02 +00:00
Benjamin Kramer	df4b9a3f4f	Hide implementation details in namespaces. llvm-svn: 372113	2019-09-17 12:56:29 +00:00
Simon Pilgrim	0b10da7cc7	[X86] Use APInt::getLowBitsSet helper. NFCI. Also avoids a static analyzer warning about out of range shifts. llvm-svn: 372103	2019-09-17 10:51:30 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
Craig Topper	95aea74494	[X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets. Previously we tried to split them into narrower v64i1 or v16i1 pieces that each got promoted to vXi8 and then passed in a zmm or xmm register. But this crashes when you need to pass more pieces than available registers reserved for argument passing. The scalarizing done here generates much longer and slower code, but is consistent with the behavior of avx2 and earlier targets for these types. Fixes PR43323. llvm-svn: 372069	2019-09-17 04:41:14 +00:00
Craig Topper	769dd59a27	[X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a broadcast load to avoid a copy. The BLENDM instructions allow an 2 sources and an independent destination while masked VBROADCAST has the destination tied to the source. llvm-svn: 372068	2019-09-17 04:41:10 +00:00
Craig Topper	2cc57bedd5	[X86] Add support for commuting EVEX VCMP instructons with any immediate value. Previously we limited to the EQ/NE/TRUE/FALSE/ORD/UNORD immediates. llvm-svn: 372067	2019-09-17 04:41:05 +00:00
Craig Topper	359918dadf	[X86] Enable commuting of EVEX VCMP for all immediate values during isel. llvm-svn: 372065	2019-09-17 04:40:58 +00:00
Simon Pilgrim	3df0daddfd	[X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operands Determine if all of the uses of LHS/RHS operands can be replaced with a zero vector. llvm-svn: 372013	2019-09-16 17:30:33 +00:00
Clement Courbet	44bfbcc28e	[X86][NFC] Add a `use-aa` feature. Summary: This allows enabling useaa on the command-line and will allow enabling the feature on a per-CPU basis where benchmarking shows improvements. This is modelled after the ARM/AArch64 target. Reviewers: RKSimon, andreadb, craig.topper Subscribers: javed.absar, kristof.beyls, hiraditya, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67266 llvm-svn: 371989	2019-09-16 14:05:28 +00:00
Craig Topper	8e0f104916	[X86] Use incDecVectorConstant to simplify the min/max code in LowerVSETCC. incDecVectorConstant is used for a similar reason in LowerVSETCCWithSUBUS so we might as well share the code. llvm-svn: 371861	2019-09-13 14:59:08 +00:00
Simon Pilgrim	930ebc15a6	[X86] negateFMAOpcode - extend to support FMADDSUB/FMSUBADD and output negation. NFCI. Some prep work for PR42863, this change allows us to move all the FMA opcode mappings into the negateFMAOpcode helper. For the FMADDSUB/FMSUBADD cases, we can only negate the accumulator - any other negations will result in an error. llvm-svn: 371840	2019-09-13 11:22:40 +00:00
Matt Arsenault	b366329a34	DAG/GlobalISel: Correct type profile of bitcount ops The result integer does not need to be the same width as the input. AMDGPU, NVPTX, and Hexagon all have patterns working around the types matching. GlobalISel defines these as being different type indexes. llvm-svn: 371797	2019-09-13 00:11:14 +00:00
Philip Reames	0b4d67ca35	Rename nonvolatile_load/store to simple_load/store [NFC] Implement the TODO from D66318. llvm-svn: 371789	2019-09-12 23:03:39 +00:00
Craig Topper	efe6724b9f	[DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares. The X86 decision assumes the compare will produce a result in an XMM register, but that can't happen for an fp128 compare since those go to a libcall the returns an i32. Pass the VT so X86 can check the type. llvm-svn: 371775	2019-09-12 21:30:18 +00:00
Simon Pilgrim	d67661ee24	[X86] Move negateFMAOpcode helper earlier to help future patch. NFCI. llvm-svn: 371770	2019-09-12 20:39:56 +00:00
Tim Northover	f1c2892912	AArch64: support arm64_32, an ILP32 slice for watchOS. This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM. FastISel is mostly disabled for now since it would generate incorrect code for ILP32. llvm-svn: 371722	2019-09-12 10:22:23 +00:00
Craig Topper	635d383fad	[X86] Enable -mprefer-vector-width=256 by default for Skylake-avx512 and later Intel CPUs. AVX512 instructions can cause a frequency drop on these CPUs. This can negate the performance gains from using wider vectors. Enabling prefer-vector-width=256 will prevent generation of zmm registers unless explicit 512 bit operations are used in the original source code. I believe gcc and icc both do something similar to this by default. Differential Revision: https://reviews.llvm.org/D67259 llvm-svn: 371694	2019-09-11 23:54:36 +00:00
Reid Kleckner	ff45955fc8	[X86] Fix latent bugs in 32-bit CMPXCHG8B inserter I found three issues: 1. the loop over E[ABCD]X copies run over BB start 2. the direct address of cmpxchg8b could be a frame index 3. the displacement of cmpxchg8b could be a global instead of an immediate These were all introduced together in r287875, and should be fixed with this change. Issue reported by Zachary Turner. llvm-svn: 371678	2019-09-11 21:56:17 +00:00
Craig Topper	08474ca091	[X86] Move x86_64 fp128 conversion to libcalls from type legalization to DAG legalization fp128 is considered a legal type for a register, but has almost no legal operations so everything needs to be converted to a libcall. Previously this was implemented by tricking type legalization into softening the operations with various checks for "is legal in hardware register" to change the behavior to still use f128 as the resulting type instead of converting to i128. This patch abandons this approach and instead moves the libcall conversions to LegalizeDAG. This is the approach taken by AArch64 where they also have a legal fp128 type, but no legal operations. I think this is more in spirit with how SelectionDAG's phases are supposed to work. I had to make some hacks for STRICT_FP_ROUND because some of the strict FP handling checks if ISD::FP_ROUND is Legal for a given result type, but I had to make ISD::FP_ROUND Custom to allow making a libcall when the input is f128. For all other types the Custom handler just returns the original node. These hacks are incomplete and don't work for a strict truncate from f128, but I don't think it worked before either since LegalizeFloatTypes doesn't know about strict ops yet. I've also raised PR43209 against AArch64 which currently crashes on a strict ftrunc from f64->f32 because of FP_ROUND being marked Custom for the same reason there. Differential Revision: https://reviews.llvm.org/D67128 llvm-svn: 371672	2019-09-11 21:30:09 +00:00
Guillaume Chatelet	97264366fb	[Alignment][NFC] use llvm::Align for AsmPrinter::EmitAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: dschuff, sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67443 llvm-svn: 371616	2019-09-11 13:37:35 +00:00
Amy Huang	7b1d793713	Reland "Change the X86 datalayout to add three address spaces for 32 bit signed, 32 bit unsigned, and 64 bit pointers." This reverts `57076d3199`. Original review at https://reviews.llvm.org/D64931. Review for added fix at https://reviews.llvm.org/D66843. llvm-svn: 371568	2019-09-10 23:15:38 +00:00
Philip Reames	a9beacbac8	[X86] Updated target specific selection dag code to conservatively check for isAtomic in addition to isVolatile See D66309 for context. This is the first sweep of x86 target specific code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. Sorry for the lack of tests. As discussed in the review, most of these are vector tests (for which atomicity is not well defined) and I couldn't figure out to exercise the anyextend cases which aren't vector specific. Differential Revision: https://reviews.llvm.org/D66322 llvm-svn: 371547	2019-09-10 18:43:15 +00:00
Craig Topper	0e533ca4bb	[X86] Add broadcast load unfolding support for VCMPPS/PD. llvm-svn: 371487	2019-09-10 05:49:53 +00:00
Reid Kleckner	bf02399a85	[Windows] Replace TrapUnreachable with an int3 insertion pass This is an alternative to D66980, which was reverted. Instead of inserting a pseudo instruction that optionally expands to nothing, add a pass that inserts int3 when appropriate after basic block layout. Reviewers: hans Differential Revision: https://reviews.llvm.org/D67201 llvm-svn: 371466	2019-09-09 23:04:25 +00:00
Philip Reames	20aafa3156	Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, there handled w/instances of AtomicSDNodes. The result of which is that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding an transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context. Note that this patch is NFC unless the experimental flag is set. The basic strategy I plan on taking is: introduce infrastructure and a flag for testing (this patch) Audit uses of isVolatile, and apply isAtomic conservatively* piecemeal conservative* update generic code and x86 backedge code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection flip the flag at the end (with minimal diffs) Work through todo list identified in (2) and (3) exposing performance ops (*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there. We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL). Differential Revision: https://reviews.llvm.org/D66309 llvm-svn: 371441	2019-09-09 19:23:22 +00:00
Craig Topper	5ebd0a6e88	[SelectionDAG] Remove ISD::FP_ROUND_INREG I don't think anything in tree creates this node. So all of this code appears to be dead. Code coverage agrees http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html Differential Revision: https://reviews.llvm.org/D67312 llvm-svn: 371431	2019-09-09 17:54:44 +00:00
Craig Topper	ce2cb0f09e	[X86] Allow _MM_FROUND_CUR_DIRECTION and _MM_FROUND_NO_EXC to be used together on instructions that only support SAE and not embedded rounding. Current for SAE instructions we only allow _MM_FROUND_CUR_DIRECTION(bit 2) or _MM_FROUND_NO_EXC(bit 3) to be used as the immediate passed to the inrinsics. But these instructions don't perform rounding so _MM_FROUND_CUR_DIRECTION is just sort of a default placeholder when you don't want to suppress exceptions. Using _MM_FROUND_NO_EXC by itself is really bit equivalent to (_MM_FROUND_NO_EXC \| _MM_FROUND_TO_NEAREST_INT) since _MM_FROUND_TO_NEAREST_INT is 0. Since we aren't rounding on these instructions we should also accept (_MM_FROUND_CUR_DIRECTION \| _MM_FROUND_NO_EXC) as equivalent to (_MM_FROUND_NO_EXC). icc allows this, but gcc does not. Differential Revision: https://reviews.llvm.org/D67289 llvm-svn: 371430	2019-09-09 17:48:05 +00:00
Craig Topper	a88f58ff0e	[X86] Add broadcast load unfolding support for vpcmpeq/vpcmpgt/vpcmp/vpcmpu. llvm-svn: 371368	2019-09-09 07:46:11 +00:00
Craig Topper	8c2ab1c4cb	[X86] Add broadcast load unfold support for smin/umin/smax/umax. llvm-svn: 371366	2019-09-09 06:32:24 +00:00
Craig Topper	ad7822329f	[X86] Add broadcast load unfolding support for VMAXPS/PD and VMINPS/PD. llvm-svn: 371363	2019-09-09 04:25:01 +00:00
Craig Topper	72624b0e59	[X86] Use xorps to create fp128 +0.0 constants. This matches what we do for f32/f64. gcc also does this for fp128. llvm-svn: 371357	2019-09-09 01:35:00 +00:00
Simon Pilgrim	e0ea746215	[X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add faux shuffle support. This patch decodes target and faux shuffles with getTargetShuffleInputs - a reduced version of resolveTargetShuffleInputs that doesn't resolve SM_SentinelZero cases, so we can correctly remove zero vectors if they aren't demanded. llvm-svn: 371353	2019-09-08 21:38:33 +00:00
Craig Topper	77dd86ee4a	[X86] Add a hack to combineVSelectWithAllOnesOrZeros to turn selects with two zero/undef vector inputs into an all zeroes vector. If the two zero vectors have undefs in different places they won't get combined by simplifySelect. This fixes a regression from an earlier commit. llvm-svn: 371351	2019-09-08 20:56:09 +00:00
Craig Topper	9c11901256	[X86] Remove call to getZeroVector from materializeVectorConstant. Add isel patterns for zero vectors with all types. The change to avx512-vec-cmp.ll is a regression, but should be easy to fix. It occurs because the getZeroVector call was canonicalizing both sides to the same node, then SimplifySelect was able to simplify it. But since only called getZeroVector on some VTs this isn't a robust way to combine this. The change to vector-shuffle-combining-ssse3.ll is more instructions, but removes a constant pool load so its unclear if its a regression or not. llvm-svn: 371350	2019-09-08 20:56:05 +00:00
Roman Lebedev	94db67f0e1	[X86] X86DAGToDAGISel::combineIncDecVector(): call getSplatBuildVector() manually As reported in post-commit review of r370327, there is some case where the code crashes. As discussed with Craig Topper, the problem is that getConstant() internally calls getSplatBuildVector(), so we don't insert the constant itself. If we do that manually we're good. llvm-svn: 371346	2019-09-08 19:36:13 +00:00
Craig Topper	97d41b8917	[X86] Use DAG.getConstant instead of getZeroVector in combinePMULDQ. getZeroVector canonicalizes the type to vXi32, but that's a legalization action. We should use the most correct type if possible. llvm-svn: 371345	2019-09-08 19:24:42 +00:00
Craig Topper	30837abd96	[X86] Teach materializeVectorConstant to not call getZeroVector/getOnesVector on the types we already have isel patterns for. llvm-svn: 371343	2019-09-08 19:24:29 +00:00
David Stenberg	5a583665f4	[DebugInfo][X86] Describe call site values for zero-valued imms Summary: Add zero-materializing XORs to X86's describeLoadedValue() hook in order to produce call site values. I have had to change the defs logic in collectCallSiteParameters() a bit to be able to describe the XORs. The XORs implicitly define $eflags, which would cause them to never be considered, due to a guard condition that I->getNumDefs() is one. I have changed that condition so that we now only consider instructions where a forwarded register overlaps with the instruction's single explicit define. We still need to collect the implicit defines of other forwarded registers to remove them from the work list. I'm not sure how to move towards supporting instructions with multiple explicit defines, cases where forwarded register are implicitly defined, and/or cases where an instruction produces values for multiple forwarded registers. Perhaps the describeLoadedValue() hook should take a register argument, and we then leave it up to the hook to describe the loaded value in that register? I have not yet encountered a situation where that would be necessary though. Reviewers: aprantl, vsk, djtodoro, NikolaPrica Reviewed By: vsk Subscribers: ychen, hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D67225 llvm-svn: 371333	2019-09-08 14:22:06 +00:00
David Stenberg	8b70139e95	[NFC] Make the describeLoadedValue() hook return machine operand objects Summary: This changes the ParamLoadedValue pair which the describeLoadedValue() hook returns so that MachineOperand objects are returned instead of pointers. When describing call site values we may need to describe operands which are not part of the instruction. One such example is zero-materializing XORs on x86, which I have implemented support for in a child revision. Instead of having to return a pointer to an operand stored somewhere outside the instruction, start returning objects directly instead, as that simplifies the code. The MachineOperand class only holds POD members, and on x86-64 it is 32 bytes large. That combined with copy elision means that the overhead of returning a machine operand object from the hook does not become very large. I benchmarked this on a 8-thread i7-8650U machine with 32 GB RAM. The benchmark consisted of building a clang 8.0 binary configured with: -DCMAKE_BUILD_TYPE=RelWithDebInfo \ -DLLVM_TARGETS_TO_BUILD=X86 \ -DLLVM_USE_SANITIZER=Address \ -DCMAKE_CXX_FLAGS="-Xclang -femit-debug-entry-values -stdlib=libc++" The average wall clock time increased by 4 seconds, from 62:05 to 62:09, which is an 0.1% increase. Reviewers: aprantl, vsk, djtodoro, NikolaPrica Reviewed By: vsk Subscribers: hiraditya, ychen, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D67261 llvm-svn: 371332	2019-09-08 14:05:10 +00:00
Simon Pilgrim	178cd2cd3a	[X86][SSE] Fix out of range shift introduced in D67070/rL371328 Use APInt to create the comparison mask instead. llvm-svn: 371330	2019-09-08 12:44:22 +00:00
Simon Pilgrim	3262084384	[X86][SSE] Add support for <64 x i1> bool reduction This generalizes the existing <32 x i1> pre-AVX2 split code to support reductions from <64 x i1> as well, we can probably generalize to any larger pow2 case in the future if the (unlikely) need ever arises. We still need to tweak combineBitcastvxi1 to improve AVX512F codegen as its assumes vXi1 types should be handled on the mask registers even when they aren't legal. Differential Revision: https://reviews.llvm.org/D67070 llvm-svn: 371328	2019-09-08 11:46:21 +00:00
Craig Topper	37dd59298f	[X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later. isel used to require zero vectors to be canonicalized to a single type to minimize the number of patterns needed to match. This is no longer required. I plan to do this to integers too, but floating point was simpler to start with. Integer has a complication where v32i16/v64i8 aren't legal when the other 512-bit integer types are. llvm-svn: 371325	2019-09-08 00:43:52 +00:00
Craig Topper	1829a09bea	[X86] Add support for unfold broadcast loads from FMA instructions. llvm-svn: 371323	2019-09-07 21:54:40 +00:00
Craig Topper	8cfff1e1bc	[X86] Add prefer-128-bit subtarget feature. Summary: Similar to the previous prefer-256-bit flag. We might want to enable this by default some CPUs. This just starts the initial work to implement and prove that it effects TTI's vector width. Reviewers: RKSimon, echristo, spatel, atdt Reviewed By: RKSimon Subscribers: lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67311 llvm-svn: 371319	2019-09-07 19:54:22 +00:00
Simon Pilgrim	08692e5dd1	[X86] Avoid uses of getZextValue(). NFCI. Use getAPIntValue() directly - this is mainly a best practice style issue to help prevent fuzz tests blowing up when a i12345 (or whatever) is generated. Use getConstantOperandVal/getConstantOperandAPInt wrappers where possible. llvm-svn: 371315	2019-09-07 16:13:57 +00:00
Nikita Popov	314893cc4b	[X86] Fix pshuflw formation from repeated shuffle mask (PR43230) Fix for https://bugs.llvm.org/show_bug.cgi?id=43230. When creating PSHUFLW from a repeated shuffle mask, we have to apply the checks to the repeated mask, not the original one. For the test case from PR43230 the inspected part of the original mask is all undef. Differential Revision: https://reviews.llvm.org/D67314 llvm-svn: 371307	2019-09-07 12:13:44 +00:00
Simon Pilgrim	d7d8bb937a	Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. llvm-svn: 371302	2019-09-07 11:04:04 +00:00
Reid Kleckner	e0df2dce4c	Remove dead .seh_stackalloc parsing method in X86AsmParser The shared COFF asm parser code handles this directive, since it is shared with AArch64. Spotted by Alexandre Ganea in review. llvm-svn: 371251	2019-09-06 20:12:44 +00:00
Craig Topper	7bb433c87b	[X86] Use MOVSX by default instead of CBW to extend i8 to AX for i8 sdivrem. We can use a MOVSX16 here then rely on FixupBWInst to change to MOVSX32 if the upper bits are dead. With a special case to not promote if it could be turned into CBW. Then we can rely on X86MCInstLower to turn the MOVSX into CBW very late if register allocation worked out. Using MOVSX gives an opportunity to use the MOVSX as a both a copy and a sign extend since the input and output register aren't tied together. Differential Revision: https://reviews.llvm.org/D67192 llvm-svn: 371243	2019-09-06 19:17:02 +00:00

1 2 3 4 5 ...

19391 Commits