llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	75a184dacf	Revert rG9ba577eca2e339726bfaad4e615c6324a705b292 "[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI." Sorry this wasn't supposed to be committed yet (and certainly not tagged as NFCI....)	2021-03-15 12:23:44 +00:00
Simon Pilgrim	9ba577eca2	[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI. Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.	2021-03-15 11:59:25 +00:00
Simon Pilgrim	6878be5dc3	[X86][SSE] Attempt to merge single-op hops for slow targets. For slow-hop targets, see if any single-op hops are duplicating work already done on another (dual-op) hop, which can sometimes occur as isHorizontalBinOp tries to find potential duplicates (but can't merge them itself). If so, reuse the other hop and shuffle the result.	2021-03-15 09:30:20 +00:00
Simon Pilgrim	6cb7dddaf4	[X86][AVX] Insert zeros byte elements into 256/512-bit vectors using shuffle/and Avoid extracting/inserting subvectors which makes it more difficult for shuffle combining to merge them together.	2021-03-12 15:16:36 +00:00
Simon Pilgrim	33dcdd414c	[X86] Provide lighter weight getTargetShuffleMask wrapper. NFCI. Most callers to getTargetShuffleMask don't use the IsUnary flag.	2021-03-12 15:16:35 +00:00
Simon Pilgrim	bc5e9ec2dc	Revert rGcd938ab162b0ac560dd0e9fee290980c7e0e47e5 "[X86] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling." Investigating an issue reported by @bkramer, possibly when the PSHUFB mask generates zero elements.	2021-03-11 13:14:00 +00:00
Simon Pilgrim	77394c12a4	[X86] Don't attempt to fold sub(C1, xor(X, C2)) with opaque constants Fixes PR49451	2021-03-11 12:06:40 +00:00
Simon Pilgrim	d0884541cc	[X86] canonicalizeShuffleWithBinOps - add binary shuffle handling	2021-03-09 13:57:03 +00:00
Simon Pilgrim	f71cee136d	[X86] Break if-else chain. NFCI. Both if blocks affect control flow - we don't need the else. Fixes clang-tidy warning.	2021-03-08 11:44:31 +00:00
Simon Pilgrim	cd938ab162	[X86] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling.	2021-03-07 12:56:35 +00:00
Simon Pilgrim	772a501bf4	[X86] canonicalizeShuffleWithBinOps - shuffle oneuse constants. We can freely shuffle all ones/zeros constants but we can also freely shuffle other constants as long as they only have one use.	2021-03-07 11:17:03 +00:00
Alexey Lapshin	cf7cdaff64	[X86][VARARG] Avoid spilling xmm registers for va_start. This review is extracted from D69372. It fixes the bug https://bugs.llvm.org/show_bug.cgi?id=42219. In noimplicitfloat mode, the compiler mustn't generate floating-point code unless it was directly asked to do so. This rule currently does not hold for variable function arguments. Though the compiler correctly guards the block of code that copies xmm vararg parameters with a check for %al, it does not protect spills of xmm registers. Thus, such spills are generated in non-protected areas and could break code that does not expect floating-point data. The problem happens in -O0 optimization mode. At this optimization level FastRegisterAllocator is used, which spills virtual registers at basic block boundaries. The register allocator does not protect spills with additional control-flow modifications. To resolve the problem, it is suggested not to copy incoming physical registers into virtual registers. Instead, store incoming physical xmm registers into memory from scratch. Differential Revision: https://reviews.llvm.org/D80163	2021-03-06 15:25:47 +03:00
Simon Pilgrim	d7b8cb4d57	[X86] X86ISelLowering.cpp - try to use for-range loops. NFCI.	2021-03-05 11:09:14 +00:00
Simon Pilgrim	7cbc5df438	[X86] X86TargetLowering::isSafeMemOpType - break if-else chain. NFCI. All if-else blocks return - fixes clang-tidy warning.	2021-03-04 12:15:08 +00:00
Simon Pilgrim	1584e55a26	[X86] canonicalizeShuffleWithBinOps - handle general unaryshuffle(binop(x,c)) patterns not just xor(x,-1) Generalize the shuffle(not(x)) -> not(shuffle(x)) fold to handle any binop with 0/-1. Hopefully we can further generalize to help push target unary/binary shuffles through binops similar to what we do in DAGCombiner::visitVECTOR_SHUFFLE	2021-03-04 10:44:38 +00:00
Simon Pilgrim	aa4afebbf9	[X86] Fold scalar_to_vector(x) -> extract_subvector(broadcast(x),0) iff broadcast(x) exists Add handling for reusing an existing broadcast(x) to a wider vector.	2021-03-03 15:50:37 +00:00
Benjamin Kramer	10c256ccaf	Revert "[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef))" This reverts commit `925093d88a`. Causes an infinite loop when compiling some shuffles: $ cat bugpoint-reduced-simplified.ll target triple = "x86_64-unknown-linux-gnu" define void @foo() { entry: %0 = load i8, i8* undef, align 1 %broadcast.splatinsert = insertelement <16 x i8> poison, i8 %0, i32 0 %1 = icmp ne <16 x i8> %broadcast.splatinsert, zeroinitializer %2 = shufflevector <16 x i1> %1, <16 x i1> undef, <16 x i32> zeroinitializer %wide.load = load <16 x i8>, <16 x i8>* undef, align 1 %3 = icmp ne <16 x i8> %wide.load, zeroinitializer %4 = and <16 x i1> %3, %2 %5 = zext <16 x i1> %4 to <16 x i8> store <16 x i8> %5, <16 x i8>* undef, align 1 ret void } $ llc < bugpoint-reduced-simplified.ll <timeout>	2021-03-02 11:24:07 +01:00
Simon Pilgrim	925093d88a	[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef)) Move NOT out to expose more AND -> ANDN folds	2021-03-01 14:47:39 +00:00
Simon Pilgrim	ab3ea27b6f	[X86][AVX] Reuse existing VBROADCAST(x) for SCALAR_TO_VECTOR(x) Similar to what we already do for BROADCASTs of different vector sizes - if we're going to broadcast it anyway might as well reuse it.	2021-02-28 11:37:27 +00:00
Craig Topper	993f4d8ffa	[X86] Fix a couple comments that said LHS where they meant RHS. NFC	2021-02-27 17:14:17 -08:00
James Y Knight	6de6455752	Use getAlign() on atomicrmw/cmpxchg instructions, now that it's available. These locations were missed as part of adding alignment to the instructions, and were still making their own alignment assumptions.	2021-02-26 15:06:15 -05:00
Simon Pilgrim	ed1f45bce9	[X86][AVX] SimplifyDemandedBitsForTargetNode - add basic X86ISD::VBROADCAST handling. Simplify through to the scalar/vector source operand.	2021-02-26 16:13:14 +00:00
Simon Pilgrim	7ac4c956af	[X86] Remove unnecessary custom lowering of vXi1 SADDSAT/SSUBSAT/UADDSAT/USUBSAT As discussed on D97478. The removal of the custom tag causes some changes in the add/sub-overflow expansion as it no longer expands to sat-arith codegen.	2021-02-26 12:10:23 +00:00
Simon Pilgrim	aefe8f2f6c	[DAG] Fold vXi1 multiplies -> and This allows us to remove X86 custom lowering of vXi1 MUL, which helps simplify a load of mask math. Mentioned in D97478 post review.	2021-02-26 11:46:12 +00:00
Simon Pilgrim	40b8b4a466	[X86] Remove unnecessary custom lowering of v16i1/v32i1 ADD/SUB These were missed in D97478	2021-02-26 11:46:11 +00:00
Craig Topper	ceaedfb5fc	[X86] Remove custom lowering of vXi1 ADD/SUB now that they are canonicalized to XOR in getNode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D97478	2021-02-25 08:52:41 -08:00
Simon Pilgrim	8b82669d56	[X86][SSE] Move unaryshuffle(xor(x,-1)) -> xor(unaryshuffle(x),-1) fold into helper. NFCI. We should be able to extend this "canonicalizeShuffleWithBinOps" to handle more generic binop cases where either/both operands can be cheaply shuffled.	2021-02-25 10:56:23 +00:00
Simon Pilgrim	b568d3d6c9	[X86] Add vector support to sub(C1, xor(X, C2)) -> add(xor(X, ~C2), C1+1) fold.	2021-02-21 21:51:27 +00:00
Simon Pilgrim	3ab32c94a4	[X86] Replace explicit constant handling in sub(C1, xor(X, C2)) -> add(xor(X, ~C2), C1+1) fold. NFCI. NFC cleanup before adding vector support - rely on the SelectionDAG to handle everything for us.	2021-02-21 21:40:32 +00:00
Simon Pilgrim	bae04a3e2d	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - remove unnecessary BITCASTs. In conjunction with the 'vperm2x128(bitcast(x),bitcast(y),c) -> bitcast(vperm2x128(x,y,c))' fold in combineTargetShuffle, this should remove any unnecessary bitcasts around vperm2x128 lane shuffles.	2021-02-21 18:40:32 +00:00
Simon Pilgrim	a6a258f1da	[X86][AVX] Fold concat(extract_subvector(v0,c0), extract_subvector(v1,c1)) -> vperm2x128 Fixes regression exposed by removing bitcasts across logic-ops in D96206. Differential Revision: https://reviews.llvm.org/D96206	2021-02-21 14:50:43 +00:00
Simon Pilgrim	2885d1251f	[X86] Fold bitcast(logic(bitcast(X), Y)) --> logic'(X, bitcast(Y)) for int-int bitcasts Extend the existing combine that handles bitcasting for fp-logic ops to also help remove logic ops across bitcasts to/from the same integer types. This helps improve AVX512 predicate handling for D/Q logic ops and also allows DAGCombine's scalarizeExtractedBinop to remove some annoying gpr->simd->gpr transfers. The concat_vectors regression in pr40891.ll will be addressed in a followup commit on this patch. Differential Revision: https://reviews.llvm.org/D96206	2021-02-21 14:40:54 +00:00
Simon Pilgrim	761bbed264	[DAG] foldSubToUSubSat - fold sub(a,trunc(umin(zext(a),b))) -> usubsat(a,trunc(umin(b,SatLimit))) This moves the last custom x86 USUBSAT fold to generic DAGCombine. Completes PR40111 Differential Revision: https://reviews.llvm.org/D96703	2021-02-20 12:02:07 +00:00
Simon Pilgrim	2258b367db	[X86][AVX] getFauxShuffleMask - decode VBROADCAST(EXTRACT_VECTOR_ELT(V,0)) Handle the case where we're broadcasting a scalar extracted from another vector.	2021-02-19 11:06:53 +00:00
Wang, Pengfei	c98644c2ec	[X86] Fix a codegen crash in getSetCCResultType This patch fixes some crashes coming from X86ISelLowering::getSetCCResultType, which would occasionally return an EVT constructed from an invalid MVT, which has a null Type pointer. This patch refers to D95434. Differential Revision: https://reviews.llvm.org/D97036	2021-02-19 17:30:10 +08:00
Simon Pilgrim	05c64ea672	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) (REAPPLIED) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Fixes issue raised by @saugustine in rG5aa8f4c0843a where we were failing to replace null shuffle operands from MergeInnerShuffle to UNDEFs. Differential Revision: https://reviews.llvm.org/D96345	2021-02-17 11:42:43 +00:00
Sterling Augustine	5aa8f4c084	Revert "[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)))" This reverts commit `5dfba562dd`. That commit causes an assertion failure with the following repro: typedef long b __attribute__((__vector_size__(16))); b d; b e; b __attribute__((__always_inline__)) c(b h, b i) { return (__attribute__((__vector_size__(8 * sizeof(short)))) short)h + i; } j() { b k, l, m, n, o[6], p, q; m = d[5]; b r = m; b s = f(r, 8); q = s; l = d[1]; p = l; t(q); n = c(m, l); o[1] = c(s, f(p, 8)); k = __builtin_shufflevector(n, o[1], 0, 2); e = __builtin_ia32_psrlwi128(k, j); } ./bin/clang -cc1 -triple x86_64-grtev4-linux-gnu -emit-obj -O1 -std=c99 test.c	2021-02-16 12:48:15 -08:00
Simon Pilgrim	5dfba562dd	[DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d))) -> bop(shuffle(x,y),shuffle(z,w)),bop(shuffle(a,b),shuffle(c,d)) Attempt to fold from a shuffle of a pair of binops to a binop of shuffles, as long as one/both of the binop sources are also shuffles that can be merged with the outer shuffle. This should guarantee that we remove one binop without introducing any additional shuffles. Technically there's potential for a merged shuffle's lowering to be poorer than the original shuffle, but it could also be better, and I'm not seeing any regressions as long as we keep the 'don't merge splats' rule already present in MergeInnerShuffle. This expands and generalizes an existing X86 combine and attempts to merge either of each binop's sources (with an on-the-fly commutation of the shuffle mask) - we couldn't do that in the x86 version as it had to stay in a form that DAGCombine's MergeInnerShuffle would still recognise. Differential Revision: https://reviews.llvm.org/D96345	2021-02-16 15:46:34 +00:00
Simon Pilgrim	4841a225b7	[DAG] Move basic USUBSAT pattern matches from X86 to DAGCombine Begin transitioning the X86 vector code to recognise sub(umax(a,b) ,b) or sub(a,umin(a,b)) USUBSAT patterns to make it more generic and available to all targets. This initial patch just moves the basic umin/umax patterns to DAG, removing some vector-only checks on the way - these are some of the patterns that the legalizer will try to expand back to so we can be reasonably relaxed about matching these pre-legalization. We can handle the trunc(sub(..))) variants as well, which helps with patterns where we were promoting to a wider type to detect overflow/saturation. The remaining x86 code requires some cleanup first - some of it isn't actually tested etc. I also need to resurrect D25987. Differential Revision: https://reviews.llvm.org/D96413	2021-02-12 18:22:57 +00:00
Simon Pilgrim	eb31c3c5cb	Revert rGe1172959226689a "[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - merge VPERMILPD ops with different low/high masks." Revert this while I investigate a downstream breakage report.	2021-02-10 10:26:44 +00:00
Simon Pilgrim	89d9ff8229	[X86][SSE] foldShuffleOfHorizOp - add SHUFPS v4f32 handling Fold shufps(hop(x,y),hop(z,w)) -> permute(hop(x,z)) - this is very similar to the equivalent unpack fold. I did start trying to convert foldShuffleOfHorizOp to handle generic shuffle masks but we're relying on a lot of special cases at the moment.	2021-02-09 14:18:45 +00:00
Simon Pilgrim	598ceb25d4	[X86][AVX] Fold extract_subvector(splat, c) -> extract_subvector(splat, 0) We already do this for VBROADCASTs, extend this for any splat that SelectionDAG::isSplatValue recognises as well.	2021-02-07 11:42:41 +00:00
Craig Topper	6f4f0efd89	[X86] Don't pass a 1 to the second argument of ISD::FP_ROUND in LowerFCOPYSIGN. I don't think we have any reason to believe the FP_ROUND here doesn't change the value. Found while trying to see if we still need the fp128 block in CanCombineFCOPYSIGN_EXTEND_ROUND. Removing that check caused this FP_ROUND to fire for fp128 which introduced a libcall expansion that asserted for this being a 1. Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D96098	2021-02-06 10:29:01 -08:00
Simon Pilgrim	e117295922	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - merge VPERMILPD ops with different low/high masks. Now that PR48908 has been dealt with, we can handle v4f64 permute cases by extracting the low/high lane VPERMILPD masks and creating a new mask based on which lanes are referenced by the VPERM2F128 mask.	2021-02-06 15:58:02 +00:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Simon Pilgrim	31b85e1c0b	[X86] Use VT::changeVectorElementType helper where possible. NFCI.	2021-02-04 15:03:56 +00:00
Simon Pilgrim	fa2cdb8140	[X86] Remove stale TODO comment. NFC. We now handle implicit zero-extension shuffle mask cases.	2021-02-04 12:14:05 +00:00
Simon Pilgrim	32b7c2fa42	[X86][SSE] Support variable-index float/double vector insertion on SSE41+ targets (PR47924) Extends D95779 to permit insertion into float/doubles vectors while avoiding a lot of aliased memory traffic. The scalar value is already on the simd unit, so we only need to transfer and splat the index value, then perform the select. SSE4 codegen is a little bulky due to the tied register requirements of (non-VEX) BLENDPS/PD but the extra moves are cheap so shouldn't be an actual problem. Differential Revision: https://reviews.llvm.org/D95866	2021-02-03 14:14:35 +00:00
Simon Pilgrim	8c2e075c2c	[X86][SSE] LowerINSERT_VECTOR_ELT - pull out repeated EltSizeInBits calls. NFCI.	2021-02-02 13:45:18 +00:00
Simon Pilgrim	d46a6b3d55	[X86][AVX512] Support variable-index vector insertion on AVX512 targets (PR47924) With predicate masks, AVX512 can efficiently perform variable-index vector insertion with 2 broadcasts + 1 comparison, avoiding a lot of aliased memory traffic. Differential Revision: https://reviews.llvm.org/D95779	2021-02-02 11:41:18 +00:00
Philip Reames	46e764a628	[x86] introduce no_callee_saved_registers attribute This is directly analogous to the existing no_caller_saved_registers, but with the opposite intention. A function or call so marked shifts the responsibility of spilling the usual CSRs to its caller. An indirect call site and callee which don't agree on the attribute is ill-defined. The motivation for this change is that being able to prune callee saves (without modifying other details of the calling convention) is sometimes useful when generating stubs and adapters. There's no intention to expose this as a source language feature; this is expected to be used by frontends to implement adapters where warranted. Some specific examples of use cases: * GC compatible compiled code wants to call an externally defined library function without needing to track pointer values through CSRs. * debug enabled code wants to call a precompiled library which doesn't provide enough information to track CSRs while preserving debug quality in the caller. * adapter stub entering hand-written assembler which doesn't follow normal calling conventions.	2021-02-01 16:19:14 -08:00
Philip Reames	9d09db941f	[NFC][X86] Use CallBase interface to simplify code	2021-02-01 15:24:41 -08:00
Philip Reames	bb6c23b1f5	[NFC][X86] Avoid redundant work inspecting callee	2021-02-01 15:24:41 -08:00
Simon Pilgrim	e640b209b2	[X86][SSE] LowerScalarImmediateShift - use APInt::getLowBitsSet for vXi8 ISD::SRL mask generation. NFCI. Match what we do for ISD::SHL	2021-02-01 18:17:40 +00:00
Simon Pilgrim	5211af4818	[X86][AVX] combineExtractWithShuffle - combine extracts from 256/512-bit vector shuffles. We can only legally extract from the lowest 128-bit subvector, so extract the correct subvector to allow us to handle 256/512-bit vector element extracts.	2021-02-01 10:31:43 +00:00
Simon Pilgrim	d6b68d1344	[X86][SSE] combineExtractWithShuffle - support zero-extending to allow extracting from narrow shuffle masks If the shuffle mask can't be widened to match the original extracted element width, see if the upper bits are zeroable - which allows us to extract+zero-extend the smaller extraction.	2021-01-29 14:22:10 +00:00
Simon Pilgrim	f84efe97bc	[X86][AVX] combineHorizOpWithShuffle - fix valuetype comparison typo. Ensure we check the valuetypes of all the HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) shuffle input ops - there was a copy+paste typo (noticed by MSVC analyzer) that meant we were checking the same input from one of the shuffles twice. I haven't been able to create a test case for this yet - I don't think it's currently possible to create a target/faux binary shuffle that scales to a 2x128 shuffle mask from two different value types.	2021-01-28 16:36:23 +00:00
Simon Pilgrim	6663330bc8	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - don't merge VPERMILPD ops with different low/high masks. Unlike VPERMILPS, VPERMILPD can have non-repeating masks in each 128-bit subvector, we weren't accounting for this when folding vperm2f128(vpermilpd(x,c),vpermilpd(y,c)) -> vpermilpd(vperm2f128(x,y),c). I'm intending to add support for this but wanted to get a minimal fix in first for merging into 12.xx. Fixes PR48908	2021-01-28 12:11:31 +00:00
Freddy Ye	b3b0acdc6f	[NFC] Refine some uninitialized used variables. These warnings are reported by the static code analysis tool Klocwork. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D95421	2021-01-26 16:51:05 +08:00
Simon Pilgrim	13f2aee783	[X86][AVX] Generalize vperm2f128/vperm2i128 patterns to support all legal 256-bit vector types Remove bitcasts to/from v4x64 types through vperm2f128/vperm2i128 ops to help improve shuffle combining and demanded vector elts folding.	2021-01-25 15:35:36 +00:00
Simon Pilgrim	821a51a9ca	[X86][AVX] combineX86ShuffleChainWithExtract - widen to at least original root size. NFCI. We're relying on the source inputs for shuffle combining having already been widened to the root size (otherwise the offset logic falls over) - we're going to be supporting different sized shuffle inputs soon, so we need to explicitly make the minimum widened width the original root size.	2021-01-25 13:45:37 +00:00
Simon Pilgrim	1b780cf32e	[X86][AVX] LowerTRUNCATE - avoid bitcasts around extract_subvectors. We allow extract_subvector lowering of all legal types, so pre-bitcast the source type to try and reduce bitcast pollution.	2021-01-25 12:10:36 +00:00
Simon Pilgrim	f461e35cba	[X86][AVX] combineX86ShuffleChain - avoid bitcasts around insert_subvector() shuffle patterns. We allow insert_subvector lowering of all legal types, so don't always cast to the vXi64/vXf64 shuffle types - this is only necessary for X86ISD::SHUF128/X86ISD::VPERM2X128 patterns later.	2021-01-25 11:35:45 +00:00
Fangrui Song	d745b82de1	[XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering	2021-01-25 00:49:18 -08:00
Simon Pilgrim	bd122f6d21	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle vperm2x128(movddup(x),movddup(y)) cases Fold vperm2x128(movddup(x),movddup(y)) -> movddup(vperm2x128(x,y))	2021-01-22 16:05:19 +00:00
Simon Pilgrim	c33d36e066	[X86][AVX] canonicalizeLaneShuffleWithRepeatedOps - handle unary vperm2x128(permute/shift(x,c),undef) cases Fold vperm2x128(permute/shift(x,c),undef) -> permute/shift(vperm2x128(x,undef),c)	2021-01-22 15:47:23 +00:00
Simon Pilgrim	4846f6ab81	[X86][AVX] combineTargetShuffle - simplify the X86ISD::VPERM2X128 subvector matching Simplify vperm2x128(concat(X,Y),concat(Z,W)) folding. Use collectConcatOps / ISD::INSERT_SUBVECTOR to find the source subvectors instead of hardcoded immediate matching.	2021-01-22 15:47:22 +00:00
Simon Pilgrim	b1166e1317	[X86][AVX] combineX86ShufflesRecursively - attempt to constant fold before widening shuffle inputs combineX86ShufflesConstants/canonicalizeShuffleMaskWithHorizOp can both handle/earlyout shuffles with inputs of different widths, so delay widening as late as possible to make it easier to match constant folds etc. The plan is to eventually move the widening inside combineX86ShuffleChain so that we don't create any new nodes unless we successfully combine the shuffles.	2021-01-22 13:19:35 +00:00
Simon Pilgrim	ffe72f987f	[X86][SSE] Don't fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) if the shuffles are splats rGbe69e66b1cd8 added the fold, but DAGCombiner.visitVECTOR_SHUFFLE doesn't merge shuffles if the inner shuffle is a splat, so we need to bail. The non-fast-horiz-ops paths see some minor regressions, we might be able to improve on this after lowering to target shuffles. Fix PR48823	2021-01-22 11:31:38 +00:00
Simon Pilgrim	86021d98d3	[X86] Avoid a std::string copy by replacing auto with const auto&. NFC. Fixes msvc analyzer warning.	2021-01-21 11:04:07 +00:00
Max Kazantsev	d6bb96e677	[X86] Add experimental option to separately tune alignment of innermost loops We already have an experimental option to tune loop alignment. Its impact is very wide (and there is a suspicion that it's not always profitable). We want to have something more narrow to play with. This patch adds a similar option that overrides the preferred alignment for innermost loops. This is for experimental purposes, default values do not change the existing behavior. Differential Revision: https://reviews.llvm.org/D94895 Reviewed By: pengfei	2021-01-21 11:15:16 +07:00
Simon Pilgrim	b8b5e87e6b	[X86][AVX] Handle vperm2x128 shuffling of a subvector splat. We already handle "vperm2x128 (ins ?, X, C1), (ins ?, X, C1), 0x31" for shuffling of the upper subvectors, but we weren't dealing with the case when we were splatting the upper subvector from a single source.	2021-01-20 18:16:33 +00:00
Simon Pilgrim	19d02842ee	[X86][AVX] Fold extract_subvector(VSRLI/VSHLI(x,32)) -> VSRLI/VSHLI(extract_subvector(x),32) As discussed on D56387, if we're shifting to extract the upper/lower half of a vXi64 vector then we're actually better off performing this at the subvector level as it's very likely to fold into something. combineConcatVectorOps can perform this in reverse if necessary.	2021-01-20 14:34:54 +00:00
Simon Pilgrim	5626adcd6b	[X86][SSE] combineVectorSignBitsTruncation - fold trunc(srl(x,c)) -> packss(sra(x,c)) If a srl doesn't introduce any sign bits into the truncated result, then replace with a sra to let us use a PACKSS truncation - fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW.	2021-01-19 11:04:13 +00:00
Sanjay Patel	d27bb5c375	[x86] add cast to avoid compile-time warning; NFC	2021-01-18 17:47:04 -05:00
Simon Pilgrim	ce06475da9	[X86][AVX] IsElementEquivalent - add matchShuffleWithUNPCK + VBROADCAST/VBROADCAST_LOAD handling Specify LHS/RHS operands in matchShuffleWithUNPCK's calls to isTargetShuffleEquivalent, and handle VBROADCAST/VBROADCAST_LOAD matching in IsElementEquivalent	2021-01-18 15:55:00 +00:00
Simon Pilgrim	770d1e0a88	[X86][SSE] isHorizontalBinOp - reuse any existing horizontal ops. If we already have similar horizontal ops using the same args, then match that, even if we are on a target with slow horizontal ops.	2021-01-18 10:14:45 +00:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Simon Pilgrim	be69e66b1c	[X86][SSE] Attempt to fold shuffle(binop(),binop()) -> binop(shuffle(),shuffle()) If this will help us fold shuffles together, then push the shuffle through the merged binops. Ideally this would be performed in DAGCombiner::visitVECTOR_SHUFFLE but getting an efficient+legal merged shuffle can be tricky - on SSE we can be confident that for 32/64-bit elements vectors shuffles should easily fold.	2021-01-15 16:25:25 +00:00
Simon Pilgrim	1dfd5c9ad8	[X86][AVX] combineHorizOpWithShuffle - support target shuffles in HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y)) Be more aggressive on (AVX2+) folds of lane shuffles of 256-bit horizontal ops by working on target/faux shuffles as well.	2021-01-15 13:55:30 +00:00
Simon Pilgrim	cbbfc82586	[X86][SSE] canonicalizeShuffleMaskWithHorizOp - simplify shuffle(HOP(HOP(X,Y),HOP(Z,W))) style chains. See if we can remove the shuffle by resorting a HOP chain so that the HOP args are pre-shuffled. This initial version just handles (the most common) v4i32/v4f32 hadd/hsub reduction patterns - future work can extend this to v8i16 types plus PACK chains (2f64 HADD/HSUB should already be handled in the half-lane combine code later on).	2021-01-13 17:19:40 +00:00
Simon Pilgrim	0a0ee7f5a5	[X86] canonicalizeShuffleMaskWithHorizOp - minor refactor to support multiple src ops. NFCI. canonicalizeShuffleMaskWithHorizOp currently only supports shuffles with 1 or 2 sources, but PR41813 will require us to support higher numbers of sources. This patch just generalizes the initial setup stages to ensure all src ops are the same type and opcode and then will continue to early out if we have more than 2 sources.	2021-01-13 13:59:56 +00:00
Simon Pilgrim	0f59d09957	[X86][AVX] combineVectorSignBitsTruncation - limit AVX512 truncations to 128-bits (PR48727) rG73a44f437bf1 results in 256-bit packss/packus ops with additional shuffles that shuffle combining can sometimes try to convert back into a truncation.	2021-01-13 10:38:23 +00:00
Bevin Hansson	07605ea1f3	[X86] Improved lowering for saturating float to int. Adapted from D54696 by @nikic. This patch improves lowering of saturating float to int conversions, FP_TO_[SU]INT_SAT, for X86. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86079	2021-01-12 15:44:41 +01:00
Simon Pilgrim	2ed914cb7e	[X86][SSE] getFauxShuffleMask - handle PACKSS(SRAI(),SRAI()) shuffle patterns. We can't easily treat ASHR as a faux shuffle, but if it was just feeding a PACKSS then it was likely being used as sign-extension for a truncation, so just peek through and adjust the mask accordingly.	2021-01-12 14:07:53 +00:00
Simon Pilgrim	7e44208115	[X86][SSE] combineSubToSubus - add v16i32 handling on pre-AVX512BW targets. v16i32 -> v16i16/v8i16 truncation is now good enough using PACKSS/PACKUS + shuffle combining that it's no longer necessary to early-out on pre-AVX512BW targets. This was noticed while looking at completing PR40111 and moving combineSubToSubus to DAGCombine entirely.	2021-01-12 13:44:11 +00:00
Simon Pilgrim	a5212b5c91	[X86][SSE] combineSubToSubus - remove SSE2 early-out. SSE2 truncation codegen has improved over the past few years (mainly due to better shuffle lowering/combining and computeKnownBits) - it's no longer necessary to early-out from v8i32/v8i64 truncations. This was noticed while looking at completing PR40111 and moving combineSubToSubus to DAGCombine entirely.	2021-01-12 12:52:11 +00:00
Simon Pilgrim	4214ca9614	[X86][AVX] Attempt to fold vpermf128(op(x,i),op(y,i)) -> op(vpermf128(x,y),i) If vpermf128/vpermi128 is acting on 2 similar 'inlane' ops, then try to perform the vpermf128 first which will allow us to merge the ops. This will help us fix one of the regressions in D56387	2021-01-11 16:59:25 +00:00
Simon Pilgrim	41bf338dd1	Revert rGd43a264a5dd3 "Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())"" This reapplies commit rG80dee7965dffdfb866afa9d74f3a4a97453708b2. [X86][SSE] Fold unpack(hop(),hop()) -> permute(hop()) UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result. REAPPLIED with a fix for unary unpacks of HOP.	2021-01-11 11:29:04 +00:00
Nico Weber	d43a264a5d	Revert "[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())" This reverts commit `80dee7965d`. Makes clang sometimes hang forever. See https://bugs.chromium.org/p/chromium/issues/detail?id=1164786#c6 for a stand-alone repro.	2021-01-10 20:22:53 -05:00
Simon Pilgrim	80dee7965d	[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop()) UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result.	2021-01-08 15:22:17 +00:00
Simon Pilgrim	73a44f437b	[X86][AVX] combineVectorSignBitsTruncation - use PACKSS/PACKUS in more AVX cases AVX512 has fast truncation ops, but if the truncation source is a concatenation of subvectors then it's likely that we can use PACK more efficiently. This is only guaranteed to work for truncations to 128/256-bit vectors as the PACK works across 128-bit sub-lanes, for now I've just disabled 512-bit truncation cases but we need to get them working eventually for D61129.	2021-01-05 15:01:45 +00:00
Kazu Hirata	985f899bf2	[Target] Use llvm::append_range (NFC)	2021-01-03 09:57:43 -08:00
Fangrui Song	6be0b9a8dd	[X86] Don't fold negative offset into 32-bit absolute address (e.g. movl $foo-1, %eax) When building abseil-cpp `bin/absl_hash_test` with Clang in -fno-pic mode, an instruction like `movl $foo-2147483648, %eax` may be produced (subtracting a number from the address of a static variable). If foo's address is smaller than 2147483648, GNU ld/gold/LLD will error because R_X86_64_32 cannot represent a negative value. ``` using absl::Hash; struct NoOp { template < typename HashCode > friend HashCode AbslHashValue(HashCode , NoOp ); }; template <typename> class HashIntTest : public testing::Test {}; TYPED_TEST_SUITE_P(HashIntTest); TYPED_TEST_P(HashIntTest, BasicUsage) { if (std::numeric_limits< TypeParam >::min ) EXPECT_NE(Hash< NoOp >()({}), Hash< TypeParam >()(std::numeric_limits< TypeParam >::min())); } REGISTER_TYPED_TEST_CASE_P(HashIntTest, BasicUsage); using IntTypes = testing::Types< int32_t>; INSTANTIATE_TYPED_TEST_CASE_P(My, HashIntTest, IntTypes); ld: error: hash_test.cc:(function (anonymous namespace)::gtest_suite_HashIntTest_::BasicUsage<int>::TestBody(): .text+0x4E472): relocation R_X86_64_32 out of range: 18446744071564237392 is not in [0, 4294967295]; references absl::hash_internal::HashState::kSeed ``` Actually any negative offset is not allowed because the symbol address can be zero (e.g. set by `-Wl,--defsym=foo=0`). So disallow such folding. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D93931	2020-12-30 18:47:26 -08:00
Luo, Yuanke	981a0bd858	[X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrinsics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. It can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depends on https://reviews.llvm.org/D87981. Differential Revision: https://reviews.llvm.org/D91927	2020-12-30 13:52:13 +08:00
Simon Pilgrim	8767f3bb97	[X86][AVX] Remove X86ISD::SUBV_BROADCAST (PR38969) Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead. Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for if memory folding failed.	2020-12-18 15:49:53 +00:00
Simon Pilgrim	992fad03e2	[X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling. Simplifies a few more cases, notably shuffle demanded elts cases.	2020-12-18 11:51:10 +00:00
Simon Pilgrim	931e66bd89	[X86] Remove extract_subvector(subv_broadcast_load()) fold. This was needed in an earlier version of D92645, but isn't now - and I've just noticed that it was potentially flawed depending on the relevant widths of the broadcasted and extracted subvectors.	2020-12-17 11:02:49 +00:00
Simon Pilgrim	cdb692ee0c	[X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969) Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns. This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts. As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes. Later patches will continue to replace X86ISD::SUBV_BROADCAST usage. Differential Revision: https://reviews.llvm.org/D92645	2020-12-17 10:25:25 +00:00
QingShan Zhang	ebdd20f430	Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128 X86 and AArch64 expand it as libcall inside the target. PowerPC also wants to expand them as libcall for P8. So, propose an implementation in the legalizer to share the logic, and remove the X86/AArch64 code to avoid duplication. Reviewed By: Craig Topper Differential Revision: https://reviews.llvm.org/D91331	2020-12-17 07:59:30 +00:00
Simon Pilgrim	553808d456	[X86] Rename reduction combiners to make it clearer what's happening. NFCI. Since these are all working on reduction patterns, actually use that term in the function name to make them easier to search for. At some point we're likely to start working with the ISD::VECREDUCE_* opcodes directly in the x86 backend, but that is still some way off.	2020-12-16 14:48:21 +00:00
Simon Pilgrim	e55f7de946	[X86][SSE] combineReductionToHorizontal - don't rely on widenSubVector to handle illegal vector types. Thanks to @asbirlea for reporting the bug.	2020-12-16 11:24:40 +00:00
Simon Pilgrim	712117338a	[X86] Explicitly use SDValue instead of auto. NFCI. Fix static analyzer warning about not using a SDValue&	2020-12-15 17:27:25 +00:00
Simon Pilgrim	b0e5aea557	[X86] Remove unnecessary SUBV_BROADCAST combines. NFCI. Noticed while dealing with D92645 - these are now handled by getFauxShuffleMask + shuffle combining code.	2020-12-15 16:54:34 +00:00
Simon Pilgrim	bd07092669	[X86] Remove trailing whitespace. NFC.	2020-12-15 10:11:38 +00:00
Simon Pilgrim	15a31389b2	[X86][AVX] LowerBUILD_VECTOR - reduce 256/512-bit build vectors with zero/undef upper elements + pad. As discussed on D92645, we don't do a good job of recognising when we don't require the full width of a ymm/zmm build vector because the upper elements are undef/zero. This commit allows us to make use of implicit zeroing of upper elements with AVX instructions, which we emulate in DAG with a INSERT_SUBVECTOR into the bottom of a undef/zero vector of the original type. This exposed a limitation in getTargetConstantBitsFromNode which didn't extract bits from INSERT_SUBVECTORs of different element widths which I've included as well to prevent a couple of regressions.	2020-12-15 10:11:38 +00:00
Harald van Dijk	9eac818370	[X86] Fix variadic argument handling for x32 The X86-64 ABI defines va_list as typedef struct { unsigned int gp_offset; unsigned int fp_offset; void *overflow_arg_area; void *reg_save_area; } va_list[1]; This means the size, alignment, and reg_save_area offset will depend on whether we are in LP64 or in ILP32 mode, so this commit adds the checks. Additionally, the VAARG_64 pseudo-instruction assumed 64-bit pointers, so this commit adds a VAARG_X32 pseudo-instruction that behaves just like VAARG_64, except for assuming 32-bit pointers. Some of these changes were originally done by Michael Liao <michael.hliao@gmail.com>. Fixes https://bugs.llvm.org/show_bug.cgi?id=48428. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D93160	2020-12-14 23:47:27 +00:00
Simon Pilgrim	5f5a2547c1	[X86] LowerBUILD_VECTOR - track zero/nonzero elements with APInt masks. NFCI. Prep work for undef/zero 'upper elements' handling as proposed in D92645.	2020-12-14 16:28:45 +00:00
Kazu Hirata	913515e465	[Target] Use llvm::is_contained (NFC)	2020-12-13 19:35:10 -08:00
Simon Pilgrim	d5c434d7dd	[X86][SSE] combineX86ShufflesRecursively - add basic handling for combining shuffles of different widths (PR45974) If a faux shuffle uses smaller shuffle inputs, try to recursively combine with those inputs directly instead of widening them immediately. Then widen all smaller inputs at the bottom of the recursion. This will still mean we're generating nodes on the fly (PR45974) even if we don't combine to a new shuffle but it does help AVX2+ targets combine across xmm/ymm/zmm types, mainly as variable shuffles.	2020-12-13 17:18:07 +00:00
Simon Pilgrim	47321c311b	[X86][SSE] combineReductionToHorizontal - add vXi8 ISD::MUL reduction handling (PR39709) Default expansion leads to repeated extensions/truncations to/from vXi16 which shuffle combining and demanded elts can't completely unravel. Better just to promote (any_extend) the input and perform a vXi16 reduction. We'll be able to remove a lot of this if we ever get decent legalization support for reduction intrinsics in SelectionDAG.	2020-12-13 15:22:54 +00:00
Luo, Yuanke	f80b29878b	[X86] AMX programming model. This patch implements the AMX programming model that was discussed on llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet. This patch implements 7 components. 1. The C interface to end user. 2. The AMX intrinsics in LLVM IR. 3. Transform load/store <256 x i32> to AMX intrinsics or split the type into two <128 x i32>. 4. The lowering from AMX intrinsics to AMX pseudo instruction. 5. Insert pseudo ldtilecfg and build the def-use between ldtilecfg and AMX instruction. 6. The register allocation for tile register. 7. Morph AMX pseudo instruction to AMX real instruction. Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0 Differential Revision: https://reviews.llvm.org/D87981	2020-12-10 17:01:54 +08:00
Saleem Abdulrasool	ee74d1b420	X86: use a data driven configuration of Windows x86 libcalls (NFC) Rather than creating a series of associated calls and ensuring that everything is lined up, use a table driven approach that ensures that the two always stay in sync.	2020-12-09 22:49:11 +00:00
Simon Pilgrim	24184dbb82	[X86] Fold CONCAT(VPERMV3(X,Y,M0),VPERMV3(Z,W,M1)) -> VPERMV3(CONCAT(X,Z),CONCAT(Y,W),CONCAT(M0,M1)) Further prep work toward supporting different subvector sizes in combineX86ShufflesRecursively	2020-12-09 14:29:32 +00:00
Kerry McLaughlin	4519ff4b6f	[SVE][CodeGen] Add the ExtensionType flag to MGATHER Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode. Also updated SelectionDAGDumper::print_details so that details of the gather load (is signed, is scaled & extension type) are printed. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91084	2020-12-09 11:19:08 +00:00
Harald van Dijk	29c8ea6f1a	[X86] Handle localdynamic TLS model in x32 mode D92346 added TLS_(base_)addrX32 to handle TLS in x32 mode, but missed the different TLS models. This diff fixes the logic for the local dynamic model where `RAX` was used when `EAX` should be, and extends the tests to cover all four TLS models. Fixes https://bugs.llvm.org/show_bug.cgi?id=26472. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92737	2020-12-08 21:06:00 +00:00
Tim Northover	c5978f42ec	UBSAN: emit distinctive traps Sometimes people get minimal crash reports after a UBSAN incident. This change tags each trap with an integer representing the kind of failure encountered, which can aid in tracking down the root cause of the problem.	2020-12-08 10:28:26 +00:00
Simon Pilgrim	0101fb73de	[X86] Fold MOVMSK(ICMP_SGT(X,-1)) -> NOT(MOVMSK(X)) Noticed while triaging PR37506	2020-12-06 17:56:41 +00:00
Layton Kifer	ac522f8700	[DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1) Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets. Differential Revision: https://reviews.llvm.org/D91589	2020-12-06 11:52:10 -05:00
Simon Pilgrim	b96a521077	[X86] LowerRotate - enable custom lowering of ROTL/ROTR vXi16 on VBMI2 targets.	2020-12-04 12:16:59 +00:00
Simon Pilgrim	d073805be6	[X86] LowerRotate - VBMI2 targets can lower vXi16 rotates using funnel shifts. Ideally we'd do this inside DAGCombine but until we can make the FSHL/FSHR opcodes legal for VBMI2 it won't help us.	2020-12-04 11:29:23 +00:00
Simon Pilgrim	df1ddc4234	[X86] Let VBMI2 non-VLX targets still use funnel shifts instructions	2020-12-04 11:06:43 +00:00
Simon Pilgrim	8eedd18fcb	[X86] Remove unnecessary bitcast. NFC. The X86ISD::SUBV_BROADCAST node is already VT	2020-12-04 09:44:57 +00:00
Xiang1 Zhang	f2e2924463	[X86] Unbind the ebx with GOT address in regcall calling convention No register can be allocated for an indirect call when it uses the regcall calling convention and passes 5 or more args. For example: call vreg (ag1, ag2, ag3, ag4, ag5, ...) --> 5 regs (EAX, ECX, EDX, ESI, EDI) are used to pass args and 1 reg (EBX) is used to hold the GOT pointer, so no regs can be allocated to vreg. The Intel386 architecture provides 8 general purpose 32-bit registers. RA mostly uses 6 of them (EAX, EBX, ECX, EDX, ESI, EDI). 5 of these regs can be used to pass function arguments (EAX, ECX, EDX, ESI, EDI). EBX is used to hold the GOT pointer when making function calls via the PLT. ESP and EBP are usually "reserved" in register allocation. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D91020	2020-12-04 10:00:13 +08:00
Harald van Dijk	c9be4ef184	[X86] Add TLS_(base_)addrX32 for X32 mode LLVM has TLS_(base_)addr32 for 32-bit TLS addresses in 32-bit mode, and TLS_(base_)addr64 for 64-bit TLS addresses in 64-bit mode. x32 mode wants 32-bit TLS addresses in 64-bit mode, which were not yet handled. This adds TLS_(base_)addrX32 as copies of TLS_(base_)addr64, except that they use tls32(base)addr rather than tls64(base)addr, and then restricts TLS_(base_)addr64 to 64-bit LP64 mode, TLS_(base_)addrX32 to 64-bit ILP32 mode. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D92346	2020-12-02 22:20:36 +00:00
Simon Pilgrim	f019362329	[X86] EltsFromConsecutiveLoads - remove old FIXME comment. NFC. It's unlikely an undef element in a zero vector will be any use.	2020-12-02 17:21:41 +00:00
Simon Pilgrim	3900ec6f05	[X86] combineX86ShufflesRecursively - remove old FIXME comment. NFC. It's unlikely an undef element in a zero vector will be any use, and SimplifyDemandedVectorElts now calls combineX86ShufflesRecursively so it's unlikely we actually have a dependency on these specific elements.	2020-12-02 16:29:38 +00:00
Simon Pilgrim	0dab7ecc5d	[X86] EltsFromConsecutiveLoads - pull out repeated NumLoadedElts. NFCI.	2020-12-02 16:29:37 +00:00
Simon Pilgrim	1b209ff9e3	[DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.	2020-12-01 14:25:29 +00:00
Simon Pilgrim	6dbd0d36a1	[DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111) Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds. The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away. Differential Revision: https://reviews.llvm.org/D91876	2020-12-01 11:56:26 +00:00
Harald van Dijk	cdac34bd47	[X86] Zero-extend pointers to i64 for x86_64 For LP64 mode, this has no effect as pointers are already 64 bits. For ILP32 mode (x32), this extension is specified by the ABI. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D91338	2020-11-30 18:51:23 +00:00
Simon Pilgrim	83d79ca5bf	[X86][AVX512] Only lower to VPALIGNR if we have BWI (PR48322)	2020-11-30 10:51:24 +00:00
Simon Pilgrim	969918e177	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code. Differential Revision: https://reviews.llvm.org/D92183	2020-11-27 11:18:58 +00:00
Simon Pilgrim	8057ebf4a0	Revert rG12d59b696b330 "[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal" This reverts commit `12d59b696b`. Prematurely pushed this to trunk	2020-11-26 15:07:45 +00:00
Simon Pilgrim	12d59b696b	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code.	2020-11-26 14:47:28 +00:00
Simon Pilgrim	791040cd8b	[DAG] LowerMINMAX - move default expansion to generic TargetLowering::expandIntMINMAX This is part of the discussion on D91876 about trying to reduce custom lowering of MIN/MAX ops on older SSE targets - if we can improve generic vector expansion we should be able to relax the limitations in SelectionDAGBuilder when it will let MIN/MAX ops be generated, and avoid having to flag so many ops as 'custom'.	2020-11-22 13:02:27 +00:00
Simon Pilgrim	0341029bb4	[X86][AVX] LowerADDSAT_SUBSAT - avoid X86ISD::BLENDV in UADDSAT/USUBSAT v8i32/v4i64 lowering Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on SSE targets. Enable custom lowering for v8i32/v4i64 and generalize the 128-bit lowering code for any vector size - this also lets us use the slightly cheaper codegen for icmp_ugt instead of umin/umax.	2020-11-20 18:16:44 +00:00
Craig Topper	a7eae62a42	[SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version. The default version only works if the returned node has a single result. The X86 and PowerPC versions support multiple results and allow a single result to be returned from a node with multiple outputs. And allow a single result that is not result 0 of the node. Also replace the Mips version since the new version should work for it. The original version handled multiple results, but only if the new node and original node had the same number of results. Differential Revision: https://reviews.llvm.org/D91846	2020-11-20 10:06:53 -08:00
Simon Pilgrim	09a081f221	[X86][SSE] LowerADDSAT_SUBSAT - avoid X86ISD::BLENDV in UADDSAT/USUBSAT custom lowering Use the OR(CMP,ADD) / AND(CMP,SUB) patterns like we do on pre-SSE4 targets. We're still using X86ISD::BLENDV on some AVX targets as we don't do custom lowering for >= 256-bit vectors. Really this (and combineVSelectWithAllOnesOrZeros) needs moving to DAGCombiner, but pre-SSE42 we see the vXi64 comparison type as a 2 x 32-bits result so we can't just rely on ComputeNumSignBits to give us the 'all bits' result we need.	2020-11-20 16:53:01 +00:00
Simon Pilgrim	14ae02fb33	[X86][AVX] Only share broadcasts of different widths from the same SDValue of the same SDNode (PR48215) D57663 allowed us to reuse broadcasts of the same scalar value by extracting low subvectors from the widest type. Unfortunately we weren't ensuring the broadcasts were from the same SDValue, just the same SDNode - which failed on multiple-value nodes like ISD::SDIVREM FYI: I intend to request this be merged into the 11.x release branch. Differential Revision: https://reviews.llvm.org/D91709	2020-11-19 12:15:18 +00:00
Craig Topper	f0b0bab34d	[X86] Use GF2P8AFFINEQB to implement vector bitreverse. We can use GF2P8AFFINEQB to reverse bits in a byte. Shuffles are needed to reverse the bytes in elements larger than i8. LegalizeVectorOps takes care of inserting the shuffle for the larger element size. We already have Custom lowering for v16i8 with SSSE3, v32i8 with AVX, and v64i8 with AVX512BW. I think we might be able to use this for scalars too by moving into a vector and back. But I'll save that for a follow up as its a little more involved. Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D91515	2020-11-17 23:49:06 -08:00
Craig Topper	57c0c4a275	[X86] Fix crash with i64 bitreverse on 32-bit targets with XOP. We unconditionally marked i64 as Custom, but did not install a handler in ReplaceNodeResults when i64 isn't legal type. This leads to ReplaceNodeResults asserting. We have two options to fix this. Only mark i64 as Custom on 64-bit targets and let it expand to two i32 bitreverses which each need a VPPERM. Or the other option is to add the Custom handling to ReplaceNodeResults. This is what I went with.	2020-11-15 19:02:34 -08:00
Craig Topper	114f044640	[X86] Use EVT::getIntegerVT instead of MVT::getIntegerVT where the type can be i2 or i4. This was a mistake introduced in D91294. I'm not sure how to exercise this with the existing code, but I hit it while trying some follow up experiments.	2020-11-12 21:48:45 -08:00
Craig Topper	a4124e455e	[X86] When storing v1i1/v2i1/v4i1 to memory, make sure we store zeros in the rest of the byte We can't store garbage in the unused bits. It's possible that something like zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0. I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue. Should fix PR48147. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D91294	2020-11-12 21:28:18 -08:00
Simon Pilgrim	1a62ca65c1	[KnownBits] Add KnownBits::commonBits helper. NFCI. We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them........	2020-11-11 12:15:54 +00:00
Kerry McLaughlin	ffbbfc76ca	[SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter(). Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print the details of masked scatters (is truncating, signed or scaled). This is the first in a series of patches which adds support for scalable masked scatters Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D90939	2020-11-11 10:58:24 +00:00
Gaurav Jain	3726b14428	[NFC] Use [MC]Register for x86 target Differential Revision: https://reviews.llvm.org/D91161	2020-11-10 15:49:39 -08:00
Craig Topper	f40925aa8b	[X86] Improve lowering of fptoui Invert the select condition when masking in the sign bit of a fptoui operation. Also, rather than lowering the sign mask to select/xor and expecting the select to get cleaned up later, directly lower to shift/xor. Patch by Layton Kifer! Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D90658	2020-11-07 23:50:03 -08:00
Simon Pilgrim	9e406ee808	[X86] Make some basic VarArgsLoweringHelper helper methods const. NFCI. Fixes a number of cppcheck remarks.	2020-10-31 12:16:49 +00:00
serge-sans-paille	0f60bcc36c	[stack-clash] Fix probing of dynamic alloca - Perform the probing in the correct direction. Related to https://github.com/rust-lang/rust/pull/77885#issuecomment-711062924 - The first touch on a dynamic alloca cannot use a mov because it clobbers existing space. Use a xor 0 instead Differential Revision: https://reviews.llvm.org/D90216	2020-10-30 15:34:00 +01:00
Benjamin Kramer	35f7cbf9df	[X86] Don't crash on CVTPS2PH with wide vector inputs.	2020-10-27 14:42:02 +01:00
Craig Topper	63ba82ed00	[X86] Use TargetConstant for immediates for VASTART_SAVE_XMM_REGS.	2020-10-25 12:52:56 -07:00
Craig Topper	2ed16aa66f	[X86] Use TargetConstant instead of Constant for operands to X86vaarg64.	2020-10-25 12:24:59 -07:00
Craig Topper	a222d832d5	[X86] Use TargetConstant for FPDiff with X86::TC_RETURN. It's required to be a constant and can never be in a register so make it explicit.	2020-10-25 00:29:11 -07:00
Simon Pilgrim	ce356e1546	[DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method. I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns. Differential Revision: https://reviews.llvm.org/D87930	2020-10-24 12:23:09 +01:00
Simon Pilgrim	936ef89ebe	[X86] lowerShuffleWithPERMV - use MVT::changeTypeToInteger helper. NFCI.	2020-10-23 12:35:27 +01:00
Simon Pilgrim	794dc7ad26	[CodeGen] Split MVT::changeTypeToInteger() functionality from EVT::changeTypeToInteger(). Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger. All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.	2020-10-22 14:27:42 +01:00
Tianqing Wang	be39a6fe6f	[X86] Add User Interrupts(UINTR) instructions For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D89301	2020-10-22 17:33:07 +08:00
Xiang1 Zhang	7c3fea7721	[X86] Support customizing stack protector guard Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D88631	2020-10-22 10:08:14 +08:00
Craig Topper	9e884169a2	[FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes. There are still problems with unsigned i32 conversions too which I'll try to fix in another patch. Fixes part of PR47393 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87115	2020-10-21 18:12:54 -07:00
Gaurav Jain	4634ad6c0b	[NFC] Set return type of getStackPointerRegisterToSaveRestore to Register Differential Revision: https://reviews.llvm.org/D89858	2020-10-21 16:19:38 -07:00
Wang, Pengfei	3a85472af2	[X86] Fix assert fail when element type is i1. extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail. Since we don't really handle vxi1 cases in below code, we can just return from here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89096	2020-10-20 09:26:32 +08:00
David Sherwood	47f2dc7e5f	[SVE][NFC] Replace some TypeSize comparisons in non-AArch64 Targets In most of lib/Target we know that we are not dealing with scalable types so it's perfectly fine to replace TypeSize comparison operators with their fixed width equivalents, making use of getFixedSize() and so on. Differential Revision: https://reviews.llvm.org/D89101	2020-10-15 09:01:21 +01:00
Craig Topper	1687a8d83b	[X86][SelectionDAG] Add SADDO_CARRY and SSUBO_CARRY to support multipart signed add/sub overflow legalization. This passes existing X86 test but I'm not sure if it handles all type legalization cases it needs to. Alternative to D89200 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D89222	2020-10-12 23:18:29 -07:00
Simon Pilgrim	913d7a110e	[X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16. This is my first LLVM patch, so please tell me if there are any process issues. The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by turning the signbit of both inputs and the output which turns the unsigned minimum/maximum into a signed one. We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput this is the needs one large move instruction. It's just that the sign bit turning has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However due to the slight regression in the single use case, this patch no longer proposes this. Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16 which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However independent of that I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweighs the downsides in that specific case. Patch By: @TomHender (Tom Hender) ActuallyaDeviloper Differential Revision: https://reviews.llvm.org/D87236	2020-10-11 11:21:23 +01:00
Craig Topper	9895327914	[X86] Redefine X86ISD::PEXTRB/W and X86ISD::PINSRB/PINSRW to use a i8 TargetConstant for the immediate instead of a ptr constant. This is more consistent with other target specific ISD opcodes that require immediates.	2020-10-10 21:50:58 -07:00
Craig Topper	375849518d	[X86] Add a X86ISD::BEXTRI to distinguish the case where the control must be a constant. The bextri intrinsic has a ImmArg attribute which will be converted in SelectionDAG using TargetConstant. We previously converted this to a plain Constant to allow X86ISD::BEXTR to call SimplifyDemandedBits on it. But while trying to decide if D89178 was safe, I realized that this conversion of TargetConstant to Constant would be one case where that would break. So this patch adds a new opcode specifically for the immediate case. And then teaches computeKnownBits and SimplifyDemandedBits to also handle it, but not try to SimplifyDemandedBits on it. To make up for that, I immediately masked the constant to 16 bits when converting from the intrinsic node to the X86ISD node.	2020-10-10 19:18:06 -07:00
Joao Moreira	e0b89df2e0	[X86] Check if call is indirect before emitting NT_CALL The notrack prefix is a relaxation of CET policies which makes it possible to indirectly call targets which do not have an ENDBR instruction in the landing address. To emit a call with this prefix, the special attribute "nocf_check" is used. When used as a function attribute, a CallInst targeting the respective function will return true for the method "doesNoCfCheck()", no matter if it is a direct call (and such should remain like this, as the information that the to-be-called function won't perform control-flow checks is useful in other contexts). Yet, when emitting an X86ISD::NT_CALL, the respective CallInst should be verified for its indirection, allowing that the prefixed calls are only emitted in the right situations. Update the respective testing unit to also verify for direct calls to functions with ''nocf_check'' attributes. The bug can also be reproduced through compiling the following C code using the -fcf-protection=full flag. int __attribute__((nocf_check)) foo(int a) {}; int main() { foo(42); } Differential Revision: https://reviews.llvm.org/D87320	2020-10-09 15:54:23 -07:00
Craig Topper	f34bb06935	[X86] When expanding LCMPXCHG16B_NO_RBX in EmitInstrWithCustomInserter, directly copy address operands instead of going through X86AddressMode. I suspect getAddressFromInstr and addFullAddress are not handling all addresses cases properly based on a report from MaskRay. So just copy the operands directly. This should be more efficient anyway.	2020-10-09 11:55:24 -07:00
Fangrui Song	e36a41b3cf	[X86] Fix some clang-tidy bugprone-argument-comment issues	2020-10-08 15:26:50 -07:00
Craig Topper	68e1a8d207	[X86] Defer the creation of LCMPXCHG16B_SAVE_RBX until finalize-isel We need to use LCMPXCHG16B_SAVE_RBX if RBX/EBX is being used as the frame pointer. We previously checked for this during type legalization, but that's too early to know for sure if the base pointer is needed. This patch adds a new pseudo instruction to emit from isel that uses a virtual register for the RBX input. Then we use the custom inserter hook to emit LCMPXCHG16B if RBX isn't needed as a base pointer or LCMPXCHG16B_SAVE_RBX if it is. Fixes PR42064. Reviewed By: pengfei Differential Revision: https://reviews.llvm.org/D88808	2020-10-07 17:00:43 -07:00
Simon Pilgrim	6c7d713cf5	[X86][SSE] combineX86ShuffleChain add 'CanonicalizeShuffleInput' helper. NFCI. As part of PR45974, we're getting closer to not creating 'padded' vectors on-the-fly in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain. At the moment combineX86ShuffleChain just has to bitcast an input to the correct shuffle type, but eventually we'll need to pad them as well. So, move the bitcast into a 'CanonicalizeShuffleInput' helper for now, making the diff for future padding support a lot smaller.	2020-10-06 17:47:24 +01:00
Craig Topper	4da4e7cb20	[X86] Remove X86ISD::LCMPXCHG8_SAVE_EBX_DAG and LCMPXCHG8B_SAVE_EBX pseudo instruction This and its friend X86ISD::LCMPXCHG8_SAVE_RBX_DAG are used if we need to avoid clobbering the frame pointer in EBX/RBX. EBX/RBX are only used as a frame pointer in 64-bit mode. In 64-bit mode we don't use CMPXCHG8B since we have a GR64 cmpxchg available. So we don't need special handling for LCMPXCHG8B. Split from D88808 Differential Revision: https://reviews.llvm.org/D88853	2020-10-05 15:03:07 -07:00
Craig Topper	1127662c6d	[SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS. getNode handling for ISD::SETCC calls FoldSETCC which can canonicalize FP constants to the RHS. When this happens we should create the node with the FMF that was requested. By using FlagInserter we can ensure any calls to getNode/getSetcc during canonicalization will also get the flags. Differential Revision: https://reviews.llvm.org/D88063	2020-10-05 14:55:23 -07:00
Simon Pilgrim	0ac210e580	[X86] isTargetShuffleEquivalent - merge duplicate array accesses. NFCI.	2020-10-05 17:22:14 +01:00
Craig Topper	4b38ceb0eb	[X86] Remove MWAITX_SAVE_EBX pseudo instruction. Always save/restore the full %rbx register even in gnux32. ebx/rbx only needs to be saved when 64-bit registers are supported anyway. It should be fine to save/restore the whole rbx register even in gnux32 where the base is technically just ebx. This matches what we do for cmpxchg16b where rbx is saved/restored regardless of gnux32.	2020-10-04 16:28:15 -07:00
Simon Pilgrim	e4e5c42896	[X86][SSE] isTargetShuffleEquivalent - ensure shuffle inputs are the correct size. Preliminary patch for the next stage of PR45974 - we don't want to be creating 'padded' vectors on-the-fly at all in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain. This means that the inputs to combineX86ShuffleChain might soon be smaller than the final root value type, so we should ensure that isTargetShuffleEquivalent only matches with the inputs if they are the correct size.	2020-10-04 15:32:05 +01:00
Craig Topper	a7e45ea30d	[X86] Add memory operand to AESENC/AESDEC Key Locker instructions. This removes FIXMEs from selectAddr.	2020-10-03 21:42:16 -07:00
Craig Topper	39fc4a0b0a	[X86] Move ENCODEKEY128/256 handling from lowering to selection. We should avoid emitting MachineSDNodes from lowering. We can use the implicit def handling in InstrEmitter to avoid manually copying from each xmm result register. We only need to manually emit the copies for the implicit uses.	2020-10-03 18:44:53 -07:00
Craig Topper	7f3da48885	[X86] Remove X86ISD::MWAITX_DAG. Just match the intrinsic to the custom inserter pseudo instruction during isel.	2020-10-03 18:44:53 -07:00
Craig Topper	adccc0bfa3	[X86] Add X86ISD opcodes for the Key Locker AESENCKL and AESDECKL instructions Instead of emitting MachineSDNodes during lowering, emit X86ISD opcodes. These opcodes will either be selected by tablegen patterns or custom selection code. Emitting MachineSDNodes during lowering is uncommon so this makes things more consistent. It also allows selectAddr to be called to perform address matching during instruction selection. I had trouble getting tablegen to accept XMM0-XMM7 as results in an isel pattern for the WIDE instructions so I had to use custom instruction selection.	2020-10-03 16:55:19 -07:00
serge-sans-paille	9573c9f2a3	Fix limit behavior of dynamic alloca When the allocation size is 0, we shouldn't probe. Within [1, PAGE_SIZE], we should probe once etc. This fixes https://bugs.llvm.org/show_bug.cgi?id=47657 Differential Revision: https://reviews.llvm.org/D88548	2020-10-02 11:10:02 +02:00
Craig Topper	d1d7fc9832	[X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for signed and unsigned to enable the use of test instructions for the compare. This will be further canonicalized to a compare involving 0 which will enable the use of test instructions. Either using cmovg for signed or cmovne for unsigned. Fixes more cases for PR47049	2020-09-30 13:50:52 -07:00
Xiang1 Zhang	413577a879	[X86] Support Intel Key Locker Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access to the raw key value by converting AES keys into “handles”. These handles can be used to perform the same encryption and decryption operations as the original AES keys, but they only work on the current system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot), then any previous handles can no longer be used. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D88398	2020-09-30 18:08:45 +08:00
Craig Topper	618a890b72	[X86] Increase the depth threshold required to form VPERMI2W/VPERMI2B in shuffle combining These instructions are implemented with two port 5 uops and one port 015 uop so they are more complicated than most shuffles. This patch increases the depth threshold for when we form them during shuffle combining to try to limit increasing the number of uops especially on port 5. Differential Revision: https://reviews.llvm.org/D88503	2020-09-29 18:37:23 -07:00
Craig Topper	82da0cabb9	[X86] Add computeKnownBits support for PEXT. The number of zeros in the mask provides a lower bound on the number of leading zeros in the result.	2020-09-28 22:54:07 -07:00
Craig Topper	e53196b1e8	[X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask. We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits -If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it. -The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted. -The result will have zeros in any position that the mask is zero. -Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input. Differential Revision: https://reviews.llvm.org/D87883	2020-09-28 14:21:30 -07:00
Simon Pilgrim	e0820d87e3	[X86] Flip isShuffleEquivalent argument order to match isTargetShuffleEquivalent A while ago, we converted isShuffleEquivalent/isTargetShuffleEquivalent to both use IsElementEquivalent internally. This allows us to make the shuffle args optional like isTargetShuffleEquivalent and update foldShuffleOfHorizOp to use isShuffleEquivalent (which it should as it's using an ISD::VECTOR_SHUFFLE mask).	2020-09-28 12:53:56 +01:00
Simon Pilgrim	6b5198f06b	[X86] Simplify broadcast mask detection with isUndefOrEqual helper. Add an additional isUndefOrEqual variant that matches an entire mask, not just a single value.	2020-09-28 12:53:56 +01:00
Simon Pilgrim	283036394e	[X86][SSE] combineVectorTruncation - enable (pre-SSSE3) vXi16->vXi8 truncation. Shuffle combining can now handle this output, and by performing this early in combineVectorTruncation we avoid a scalarization that caused a regression on D87502.	2020-09-24 15:51:36 +01:00
Craig Topper	f21f835ee8	[X86] Improve demanded bits for X86ISD::BEXTR. If the control is constant we can figure out exactly which bits of the input are demanded. Differential Revision: https://reviews.llvm.org/D88072	2020-09-23 10:51:02 -07:00
Craig Topper	a74b1faba2	[X86] Make reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore work for avx512 after type legalization. The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant. Differential Revision: https://reviews.llvm.org/D87863	2020-09-20 13:54:20 -07:00
Craig Topper	4e8c028158	[X86] Stop reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore from creating scalar i64 load/stores in 32-bit mode If we emit a scalar i64 load/store it will get type legalized to two i32 load/stores. Differential Revision: https://reviews.llvm.org/D87862	2020-09-20 13:46:59 -07:00
Simon Pilgrim	0bfeede669	[X86][SSE] Fold EXTEND_VECTOR_INREG(EXTRACT_SUBVECTOR(EXTEND(X),0)) -> EXTEND_VECTOR_INREG(X)	2020-09-20 18:39:12 +01:00
Simon Pilgrim	bb0078e591	[X86][SSE] Fold SIGN_EXTEND(SIGN_EXTEND_VECTOR_INREG(X)) -> SIGN_EXTEND_VECTOR_INREG(X) It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp	2020-09-20 18:39:12 +01:00
Simon Pilgrim	15c8306056	[X86][SSE] Fold EXTEND_VECTOR_INREG(EXTEND_VECTOR_INREG(X)) -> EXTEND_VECTOR_INREG(X) It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp	2020-09-20 16:33:02 +01:00
Simon Pilgrim	a0c8793ce6	[X86][SSE] Enable ZERO_EXTEND_VECTOR_INREG shuffle combining on SSE41 targets. Allows ZERO_EXTEND_VECTOR_INREG to be shuffle combined on all targets where it is legal.	2020-09-20 16:05:10 +01:00
Simon Pilgrim	2b634a9d0e	[X86] Rename getExtendInVec to getEXTEND_VECTOR_INREG. NFCI. Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering/combining.	2020-09-20 15:19:39 +01:00
Simon Pilgrim	91720ee561	[X86] combineX86ShufflesRecursively - fix use after move warning. NFCI. After moving, WidenedMask is in an undefined state, so reduce scope of the variable so it's reinitialized every iteration - we should still retain any memory allocation savings.	2020-09-20 14:06:50 +01:00
Simon Pilgrim	e17686ae60	[X86] Rename combineExtInVec to combineEXTEND_VECTOR_INREG. NFCI. Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering.	2020-09-20 12:16:00 +01:00
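
The known-bits facts stated in the PEXT (82da0cabb9) and PDEP (e53196b1e8) entries above can be checked with a small software model. This is a minimal sketch in Python, not LLVM code: the helper names (pext, pdep, leading_zeros, trailing_zeros, zero_bits) and the 32-bit width are assumptions made for illustration only.

```python
WIDTH = 32  # assumed operand width for this sketch


def pext(src, mask, width=WIDTH):
    """Model of PEXT: pack the bits of src selected by mask into the low bits."""
    result, out = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << out
            out += 1
    return result


def pdep(src, mask, width=WIDTH):
    """Model of PDEP: scatter the low bits of src to the set positions of mask."""
    result, k = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> k) & 1) << i
            k += 1
    return result


def leading_zeros(x, width=WIDTH):
    return width - x.bit_length()


def trailing_zeros(x, width=WIDTH):
    # A zero value is treated as having `width` trailing zeros.
    return width if x == 0 else (x & -x).bit_length() - 1


def zero_bits(mask, width=WIDTH):
    return width - bin(mask).count("1")


def check_known_bits(src, mask):
    """Return True if the bounds quoted in the commit messages hold."""
    e = pext(src, mask)
    d = pdep(src, mask)
    return (
        # PEXT: zeros in the mask bound the leading zeros of the result.
        leading_zeros(e) >= zero_bits(mask)
        # PDEP: the result is zero wherever the mask is zero.
        and (d & ~mask) == 0
        # PDEP: at least as many trailing zeros as the non-mask input.
        and trailing_zeros(d) >= trailing_zeros(src)
    )
```

Calling check_known_bits(src, mask) on arbitrary 32-bit values exercises all three bounds at once. The PEXT bound follows because the result only occupies as many low bits as the mask has ones.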

7719 Commits