llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	8cefc37be5	[DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine. Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D63815	2019-12-23 10:11:45 -05:00
Carl Ritson	2791667d2e	[DAGCombiner] Check term use before applying aggressive FSUB optimisations Summary: Without this check unnecessary FMA instructions are generated when the FSUB terms are reused. This also has the side-effect that the same value is computed to different levels of precision, which can create undesirable effects if the results are used together in subsequent computation. Reviewers: arsenm, nhaehnle, foad, tpr, dstuttard, spatel Reviewed By: arsenm Subscribers: jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71656	2019-12-23 09:37:58 +09:00
Amaury Séchet	ff6567cc77	[DAGCombiner] Add node back in the worklist in topological order in CommitTargetLoweringOpt Summary: Right now, DAGCombiner processes the nodes in an implementation-defined order. This tends to be fragile as optimisation may or may not kick in depending on the traversal order. This is part of a larger effort to get the DAGCombiner to process its nodes in topological order. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70921	2019-12-17 18:26:16 +01:00
Alex Richardson	11448eeb72	[NFC] Use SelectionDAG::getMemBasePlusOffset() instead of getNode(ISD::ADD) Summary: To find potential opportunities to use getMemBasePlusOffset() I looked at all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert the files in the individual backends too. The motivation for this change is our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). We use a separate register type to store pointers (128-bit capabilities, which are effectively unforgeable and monotonic fat pointers). These capabilities permit a reduced set of operations and therefore use a separate ValueType (iFATPTR) to represent pointers implemented as capabilities. Therefore, we need to avoid using ISD::ADD for our patterns that operate on pointers and need to use a function that chooses ISD::ADD or a new ISD::PTRADD opcode depending on the value type. We originally added a new DAG.getPointerAdd() function, but after this patch series we can modify the implementation of getMemBasePlusOffset() instead. Avoiding direct uses of ISD::ADD for pointer types will significantly reduce the amount of assertion/instruction selection failures for us in future upstream merges. Reviewers: spatel Reviewed By: spatel Subscribers: merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71207	2019-12-13 21:40:03 +00:00
Sanjay Patel	2f0c7fd2db	[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc (2nd try) The initial attempt (rG89633320) botched the logic by reversing the source/dest types. Added x86 tests for additional coverage. The vector tests show a potential improvement (fold vector load instead of broadcasting), but that's a known/existing problem. This fold is done in IR by instcombine, and we have a special form of it already here in DAGCombiner, but we want the more general transform too: https://rise4fun.com/Alive/3jZm Name: general Pre: (C1 + zext(C2) < 64) %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %a = and i64 %s2, zext((1 << (16 - C2)) - 1) %r = trunc %a to i16 Name: special Pre: C1 == 48 %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %r = trunc %s2 to i16 ...because D58017 exposes a regression without this fold.	2019-12-13 14:03:54 -05:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sanjay Patel	9432937190	Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc" This reverts commit `8963332c33`. There was a logic bug typo in this code, but it wasn't visible in the asm for the tests.	2019-12-12 16:24:40 -05:00
Sanjay Patel	8963332c33	[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc This fold is done in IR by instcombine, and we have a special form of it already here in DAGCombiner, but we want the more general transform too: https://rise4fun.com/Alive/3jZm Name: general Pre: (C1 + zext(C2) < 64) %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %a = and i64 %s2, zext((1 << (16 - C2)) - 1) %r = trunc %a to i16 Name: special Pre: C1 == 48 %s = lshr i64 %x, C1 %t = trunc i64 %s to i16 %r = lshr i16 %t, C2 => %s2 = lshr i64 %x, C1 + zext(C2) %r = trunc %s2 to i16 ...because D58017 exposes a regression without this fold.	2019-12-12 15:44:13 -05:00
Sanjay Patel	b39009bf1d	[DAGCombiner] improve readability This is not quite NFC because I changed the SDLoc to use the more standard 'N' (the starting node for the fold). This transform is a special-case of a more general fold that we do in IR, but it seems like the general fold is needed here too to avoid a potential regression seen in D58017. https://rise4fun.com/Alive/3jZm	2019-12-12 13:16:50 -05:00
Amaury Séchet	c594d14d40	[DAGCombine] Factor oplist operations. NFC	2019-12-02 19:12:03 +01:00
Amaury Séchet	d8d5106225	[SelectionDAG] Reduce assumptions made about levels. NFC	2019-12-02 17:43:13 +01:00
Amaury Séchet	ca818f4550	[DAGCombiner] Peek through vector concats when trying to combine shuffles. Summary: This combine showed up as needed when exploring the regression when processing the DAG in topological order. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68195	2019-11-28 23:57:29 +01:00
David Green	b5315ae8ff	[Codegen][ARM] Add addressing modes from masked loads and stores MVE has a basic symmetry between its normal load/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176	2019-11-26 16:21:01 +00:00
Sanjay Patel	214683f3b2	[DAGCombiner] avoid crash on out-of-bounds insert index (PR44139) We already have this simplification at node-creation-time, but the test from: https://bugs.llvm.org/show_bug.cgi?id=44139 ...shows that we can combine our way to an assert/crash too.	2019-11-25 16:24:06 -05:00
Clement Courbet	cb15ba84fe	Reland "[DAGCombiner] Allow zextended load combines." Check that the generated type is simple.	2019-11-22 14:47:18 +01:00
Clement Courbet	88e205525c	Revert "[DAGCombiner] Allow zextended load combines." Breaks some bots.	2019-11-22 09:01:08 +01:00
Clement Courbet	036790f988	[DAGCombiner] Allow zextended load combines. Summary: or(zext(load8(base)), zext(load8(base+1)) -> zext(load16 base) Reviewers: apilipenko, RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70487	2019-11-22 08:40:19 +01:00
Hiroshi Yamauchi	52e377497d	[PGO][PGSO] DAG.shouldOptForSize part. Summary: (Split off of D67120) SelectionDAG::shouldOptForSize changes for profile guided size optimization. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70095	2019-11-21 14:16:00 -08:00
Clement Courbet	252567377c	[DAGCombine][NFC] Use ArrayRef and correctly size SmallVectors. In preparation for D70487.	2019-11-21 08:53:37 +01:00
David Zarzycki	257acbf6ae	[SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRY Summary: Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR. Work remaining: match ADD, etc. Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast Reviewed By: lebedev.ri Subscribers: chfast, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70079	2019-11-20 16:25:42 +02:00
Matt Arsenault	7fe9435dc8	Work on cleaning up denormal mode handling Cleanup handling of the denormal-fp-math attribute. Consolidate places checking the allowed names in one place. This is in preparation for introducing FP type specific variants of the denormal-fp-mode attribute. AMDGPU will switch to using this in place of the current hacky use of subtarget features for the denormal mode. Introduce a new header for dealing with FP modes. The constrained intrinsic classes define related enums that should also be moved into this header for uses in other contexts. The verifier could use a check to make sure the denorm-fp-mode attribute is sane, but there currently isn't one. Currently, DAGCombiner incorrectly assumes non-IEEE behavior by default in the one current user. Clang must be taught to start emitting this attribute by default to avoid regressions when this is switched to assume ieee behavior if the attribute isn't present.	2019-11-19 22:01:14 +05:30
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Paweł Bylica	1c247dd028	[DAGCombiner] Drop redundant DAG method param. NFC	2019-11-14 14:02:53 +01:00
Paweł Bylica	9b89bda517	[DAGCombiner] Use TLI field already available. NFC	2019-11-14 14:02:52 +01:00
joanlluch	d384ad6b63	[TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4) Summary: Replaces `unsigned getShiftAmountThreshold(EVT VT)` by `bool shouldAvoidTransformToShift(EVT VT, unsigned amount)` thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not. Updates the MSP430 target with a custom implementation. This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this. Existing tests apply, a few more have been added. Reviewers: asl, spatel Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70042	2019-11-13 09:23:08 +01:00
Philip Reames	db036ee0a4	[X86/Atomics] Correct a few transforms for new atomic lowering This is a partial fix for the issues described in commit message of `027aa27` (the revert of G24609). Unfortunately, I can't provide test coverage for it on its own as the only (known) wrong example is still wrong, but due to a separate issue. These fixes are cases where when performing unrelated DAG combines, we were dropping the atomicity flags entirely.	2019-11-05 13:20:08 -08:00
Thomas Preud'homme	646896a442	Fix PR40644: miscompile indexed FP constant store Summary: Functions replaceStoreOfFPConstant() and OptimizeFloatStore() both replace store of float by a store of an integer unconditionally. However this generates wrong code when the store that is replaced is an indexed or truncating store. This commit solves this issue by adding an early return in these functions when the store being considered is not a normal store. Bug was only observed on out of tree targets, hence the lack of testcase in this commit. Reviewers: efriedma Subscribers: hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68420	2019-11-05 11:07:52 +00:00
Sanjay Patel	113181e9bd	[DAGCombine][MSP430] use shift amount threshold in DAGCombine (2/2) Continuation of: D69116 Contributes to a fix for PR43559: https://bugs.llvm.org/show_bug.cgi?id=43559 See also D69099 and D69116 Use the TLI hook in DAGCombine.cpp to guard against creating shift nodes that are not optimal for a target. Patch by: @joanlluch (Joan LLuch) Differential Revision: https://reviews.llvm.org/D69120	2019-11-04 13:41:41 -05:00
Matt Arsenault	6221767055	DAG: Add DAG argument to isFPExtFoldable For AMDGPU this is dependent on the FP mode, which should eventually not be a property of the subtarget.	2019-10-31 22:32:45 -07:00
Matt Arsenault	1725f28841	DAG: Add new control for ISD::FMAD formation For AMDGPU this depends on whether denormals are enabled in the default FP mode for the function. Currently this is treated as a subtarget feature, so FMAD is selectively legal based on that. I want to move this out of the subtarget features so this can be controlled with a denormal mode attribute. Additionally, this will allow folding based on a future ftz fast math flag.	2019-10-31 07:51:38 -07:00
Sanjay Patel	1ebd4a2e3a	[DAGCombiner] widen any_ext of popcount based on target support This enhances D69127 (rGe6c145e0548e3b3de6eab27e44e1504387cf6b53) to handle the looser "any_extend" cast in addition to zext. This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688: https://bugs.llvm.org/show_bug.cgi?id=43688	2019-10-28 10:07:12 -04:00
Kerry McLaughlin	da720a38b9	[AArch64][SVE] Implement masked load intrinsics Summary: Adds support for codegen of masked loads, with non-extending, zero-extending and sign-extending variants. Reviewers: huntergr, rovka, greened, dmgreen Reviewed By: dmgreen Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68877	2019-10-28 10:06:14 +00:00
Sanjay Patel	85a2146c15	[SDAG] fold insert_vector_elt with undef index Similar to: rG4c47617627fb This makes the DAG behavior consistent with IR's insertelement. https://bugs.llvm.org/show_bug.cgi?id=42689 I've tried to maintain test intent for AArch64 and WebAssembly by replacing undef index operands with something else.	2019-10-27 15:28:43 -04:00
Sanjay Patel	e6c145e054	[DAGCombiner] widen zext of popcount based on target support zext (ctpop X) --> ctpop (zext X) This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688: https://bugs.llvm.org/show_bug.cgi?id=43688 I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that. The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this: t5: i8 = ctpop t4 t6: i32 = zero_extend t5 <-- created based on IR, but unused node? t7: i64 = zero_extend t5 Differential Revision: https://reviews.llvm.org/D69127	2019-10-25 14:10:51 -04:00
Simon Pilgrim	a18818207a	Fix cppcheck shadow variable warning. NFCI.	2019-10-24 22:14:36 +01:00
Graham Hunter	84da2596f9	[AArch64][SVE] Add SPLAT_VECTOR ISD Node Adds a new ISD node to replicate a scalar value across all elements of a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot be used. Fixes up default type legalization for scalable vectors after the new MVT type ranges were introduced. At present I only use this node for scalable vectors. A DAGCombine has been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all elements are the same, but only if the default operation action of Expand has been overridden by the target. I've only added result promotion legalization for scalable vector i8/i16/i32/i64 types in AArch64 for now. Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy Reviewed By: jmolloy Differential Revision: https://reviews.llvm.org/D47775 llvm-svn: 375222	2019-10-18 11:48:35 +00:00
Sam Parker	39af8a3a3b	[DAGCombine][ARM] Enable extending masked loads Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085	2019-10-17 07:55:55 +00:00
David Zarzycki	59390efef2	[X86] Make memcmp() use PTEST if possible and also enable AVX1 llvm-svn: 374922	2019-10-15 17:40:12 +00:00
Sanjay Patel	d545c9056e	[DAGCombiner] fold select-of-constants based on sign-bit test Examples: i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1 i8 X < 0 ? C1 : 0 --> (X >>s 7) & C1 This is a small generalization of a fold requested in PR43650: https://bugs.llvm.org/show_bug.cgi?id=43650 The sign-bit of the condition operand can be used as a mask for the true operand: https://rise4fun.com/Alive/paT Note that we already handle some of the patterns (isNegative + scalar) because there's an over-specialized, yet over-reaching fold for that in foldSelectCCToShiftAnd(). It doesn't use any TLI hooks, so I can't easily rip out that code even though we're duplicating part of it here. This fold is guarded by TLI.convertSelectOfConstantsToMath(), so it should not cause problems for targets that prefer select over shift. Also worth noting: I thought we could generalize this further to include the case where the true operand of the select is not constant, but Alive says that may allow poison to pass through where it does not in the original select form of the code. Differential Revision: https://reviews.llvm.org/D68949 llvm-svn: 374902	2019-10-15 15:23:57 +00:00
Sanjay Patel	3b581ac80f	[DAGCombiner] fold vselect-of-constants to shift The diffs suggest that we are missing some more basic analysis/transforms, but this keeps the vector path in sync with the scalar (rL374397). This is again a preliminary step for introducing the reverse transform in IR as proposed in D63382. llvm-svn: 374555	2019-10-11 14:17:56 +00:00
Sanjay Patel	7b904ce724	[DAGCombiner] fold select-of-constants to shift This reverses the scalar canonicalization proposed in D63382. Pre: isPowerOf2(C1) %r = select i1 %cond, i32 C1, i32 0 => %z = zext i1 %cond to i32 %r = shl i32 %z, log2(C1) https://rise4fun.com/Alive/Z50 x86 already tries to fold this pattern, but it isn't done uniformly, so we still see a diff. AArch64 probably should enable the TLI hook to benefit too, but that's a follow-on. llvm-svn: 374397	2019-10-10 17:52:02 +00:00
Sanjay Patel	7f0e7c0b1c	[DAGCombiner] reduce code duplication; NFC llvm-svn: 374370	2019-10-10 15:38:29 +00:00
Amaury Sechet	aaf0507896	[DAGCombine] Match more patterns for half word bswap Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 374340	2019-10-10 13:20:10 +00:00
Philip Reames	931120846e	Conservatively add volatility and atomic checks in a few places As background, starting in D66309, I'm working on supporting unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86. As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces in code look suspicious though I can't figure out how to exercise them. This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions. Differential Revision: https://reviews.llvm.org/D68419 llvm-svn: 374261	2019-10-09 23:43:33 +00:00
Simon Pilgrim	b4ba3cbda0	[X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217) If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element. This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted. Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper. Fixes PR43217 Differential Revision: https://reviews.llvm.org/D68544 llvm-svn: 373871	2019-10-06 21:11:45 +00:00
Sanjay Patel	f643fabb52	Revert [DAGCombine] Match more patterns for half word bswap This reverts r373850 (git commit `25ba49824d`) This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680 llvm-svn: 373853	2019-10-06 15:27:34 +00:00
Amaury Sechet	25ba49824d	[DAGCombine] Match more patterns for half word bswap Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 373850	2019-10-06 14:14:55 +00:00
Sanjay Patel	288079aafd	[DAGCombiner] add operation legality checks before creating shift ops (PR43542) As discussed on llvm-dev and: https://bugs.llvm.org/show_bug.cgi?id=43542 ...we have transforms that assume shift operations are legal and transforms to use them are profitable, but that may not hold for simple targets. In this case, the MSP430 target custom lowers shifts by repeating (many) simpler/fixed ops. That can be avoided by keeping this code as setcc/select. Differential Revision: https://reviews.llvm.org/D68397 llvm-svn: 373666	2019-10-03 21:34:04 +00:00
Simon Pilgrim	3c912c4abe	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it to work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.). Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 373343	2019-10-01 15:32:04 +00:00
Amaury Sechet	e6f98c0073	[DAGCombiner] Clang format MatchRotate. NFC llvm-svn: 373269	2019-09-30 21:41:52 +00:00
Amaury Sechet	496c0564f1	[DAGCombiner] Update MatchRotate so that it returns an SDValue. NFC llvm-svn: 373260	2019-09-30 20:47:23 +00:00
Thomas Raoux	3c8c667235	[TargetLowering] Make allowsMemoryAccess method virtual. Rename old function to explicitly show that it cares only about alignment. The new allowsMemoryAccess calls the function related to alignment by default and can be overridden by target to inform whether the memory access is legal or not. Differential Revision: https://reviews.llvm.org/D67121 llvm-svn: 372935	2019-09-26 00:16:01 +00:00
Sanjay Patel	831a7e7068	[DAGCombiner] add one-use restriction to vector transform with cheap extract We might be able to do better on the example in the test, but in general, we should not scalarize a splatted vector binop if there are other uses of the binop. Otherwise, we can end up with code as we had - a scalar op that is redundant with a vector op. llvm-svn: 372886	2019-09-25 15:08:33 +00:00
Ilya Biryukov	60e5e0b667	Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) Reason: this caused severe compile time regressions in JAX. See email thread of original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html llvm-svn: 372756	2019-09-24 13:48:02 +00:00
Simon Pilgrim	af6043557d	[DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation. The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing it to work with target specific combines and opcodes (e.g. X86's FMA variants). Unlike the SimplifyDemandedBits, we can't just handle target nodes through a Target callback, we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.). I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negation propagation). We can build on this in future patches. Differential Revision: https://reviews.llvm.org/D67557 llvm-svn: 372333	2019-09-19 15:02:47 +00:00
Amaury Sechet	9e94ef42ba	[DAGCombiner] Add node to the worklist in topological order in scalarizeExtractedVectorLoad Summary: As per title. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66661 llvm-svn: 372327	2019-09-19 14:22:11 +00:00
Simon Pilgrim	c65dd89804	[DAG] Add SelectionDAG::MaxRecursionDepth constant As commented on D67557 we have a lot of uses of depth checks all using magic numbers. This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly. Differential Revision: https://reviews.llvm.org/D67711 llvm-svn: 372315	2019-09-19 12:58:43 +00:00
Roman Lebedev	c00f318224	[DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold Summary: `DAGCombiner::visitADDLikeCommutative()` already has a sibling fold: `(add X, Carry) -> (addcarry X, 0, Carry)` This fold, as suggested by @efriedma, helps recover from //some// of the regressions of D62266 Reviewers: efriedma, deadalnix Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma Tags: #llvm Differential Revision: https://reviews.llvm.org/D62392 llvm-svn: 372259	2019-09-18 20:48:27 +00:00
Philip Reames	079e210463	[SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context. Differential Revision: https://reviews.llvm.org/D66318 llvm-svn: 371786	2019-09-12 22:49:17 +00:00
Craig Topper	efe6724b9f	[DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares. The X86 decision assumes the compare will produce a result in an XMM register, but that can't happen for an fp128 compare since those go to a libcall that returns an i32. Pass the VT so X86 can check the type. llvm-svn: 371775	2019-09-12 21:30:18 +00:00
Simon Pilgrim	da59a6bf7d	[DAGCombine] visitFDIV - Use isCheaperToUseNegatedFPOps helper for (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y). NFCI. Minor cleanup to use equivalent helper code. llvm-svn: 371724	2019-09-12 11:03:09 +00:00
Qiu Chaofan	b7fb5d0f6f	[DAGCombiner] Improve division estimation of floating points. Current implementation of estimating divisions loses precision since it estimates reciprocal first and does multiplication. This patch is to re-order arithmetic operations in the last iteration in DAGCombiner to improve the accuracy. Reviewed By: Sanjay Patel, Jinsong Ji Differential Revision: https://reviews.llvm.org/D66050 llvm-svn: 371713	2019-09-12 07:51:24 +00:00
Craig Topper	5ebd0a6e88	[SelectionDAG] Remove ISD::FP_ROUND_INREG I don't think anything in tree creates this node. So all of this code appears to be dead. Code coverage agrees http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html Differential Revision: https://reviews.llvm.org/D67312 llvm-svn: 371431	2019-09-09 17:54:44 +00:00
Craig Topper	dac34f52d3	[DAGCombiner][X86][ARM] Teach visitMULO to fold multiplies with 0 to 0 and no carry. I modified the ARM test to use two inputs instead of 0 so the test hopefully still tests what was intended. llvm-svn: 371344	2019-09-08 19:24:39 +00:00
Bjorn Pettersson	5e331e4ce8	[Intrinsic] Add the llvm.umul.fix.sat intrinsic Summary: Add an intrinsic that takes 2 unsigned integers with the scale of them provided as the third argument and performs fixed point multiplication on them. The result is saturated and clamped between the largest and smallest representable values of the first 2 operands. This is a part of implementing fixed point arithmetic in clang where some of the more complex operations will be implemented as intrinsics. Patch by: leonardchan, bjope Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel Reviewed By: leonardchan Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57836 llvm-svn: 371308	2019-09-07 12:16:14 +00:00
Sanjay Patel	4e54cf3e0e	[DAGCombiner] try to form test+set out of shift+mask patterns The motivating bugs are: https://bugs.llvm.org/show_bug.cgi?id=41340 https://bugs.llvm.org/show_bug.cgi?id=42697 As discussed there, we could view this as a failure of IR canonicalization, but then we would need to implement a backend fixup with target overrides to get this right in all cases. Instead, we can just view this as a codegen opportunity. It's not even clear for x86 exactly when we should favor test+set; some CPUs have better theoretical throughput for the ALU ops than bt/test. This patch is made more complicated than I expected because there's an early DAGCombine for 'and' that can change types of the intermediate ops via trunc+anyext. Differential Revision: https://reviews.llvm.org/D66687 llvm-svn: 370668	2019-09-02 14:52:09 +00:00
Sanjay Patel	c882208367	[DAGCombiner] improve throughput of shift+logic+shift The motivating case for this is a long way from here: https://bugs.llvm.org/show_bug.cgi?id=43146 ...but I think this is where we have to start. We need to canonicalize/optimize sequences of shift and logic to ease pattern matching for things like bswap and improve perf in general. But without the artificial limit of '!LegalTypes' (early combining), there are a lot of test diffs, and not all are good. In the minimal tests added for this proposal, x86 should have better throughput in all cases. AArch64 is neutral for scalar tests because it can fold shifts into bitwise logic ops. There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns: https://rise4fun.com/Alive/VlI https://rise4fun.com/Alive/n1m https://rise4fun.com/Alive/1Vn Differential Revision: https://reviews.llvm.org/D67021 llvm-svn: 370617	2019-09-01 18:38:15 +00:00
Sanjay Patel	9e57b49392	[DAGCombiner] clean up code in visitShiftByConstant() This is not quite NFC because the SDLoc propagation is changed, but there are no regression test diffs from that. llvm-svn: 370587	2019-08-31 15:08:58 +00:00
Amaury Sechet	82825ab882	[DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate. Summary: The combiner transforms (shl X, 1) into (add X, X). Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66882 llvm-svn: 370578	2019-08-31 11:40:02 +00:00
James Molloy	e62c509cd4	[DAGCombiner] Don't create illegal narrow stores Narrowing stores when the target doesn't support the narrow version forces the target to expand into a load-modify-store sequence, which is highly suboptimal. The information narrowing throws away (legality of the inverse transform) is hard to re-analyze. If the target doesn't support a store of the narrow type, don't narrow even in pre-legalize mode. No test as this is DAGCombiner and depends on target bits. llvm-svn: 370576	2019-08-31 10:46:16 +00:00
Simon Pilgrim	3be7081aa1	[DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI. SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent. llvm-svn: 370498	2019-08-30 18:19:02 +00:00
Simon Pilgrim	ab8cb1a3c5	[DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI. llvm-svn: 370489	2019-08-30 17:21:20 +00:00
Simon Pilgrim	c2fed1dc8a	[DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI. llvm-svn: 370478	2019-08-30 15:17:37 +00:00
Simon Pilgrim	3367669668	[DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts. llvm-svn: 370471	2019-08-30 13:30:37 +00:00
Simon Pilgrim	8e1989e79a	[DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types. This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total. llvm-svn: 370468	2019-08-30 12:22:06 +00:00
Simon Pilgrim	7cbf823f93	[DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros Return a proper zero vector, just in case some elements are undef. Noticed by inspection after dealing with a similar issue in PR43159. llvm-svn: 370460	2019-08-30 10:42:14 +00:00
Simon Pilgrim	ea67741899	[DAGCombine] Fix shadow variable warnings. NFCI. llvm-svn: 370365	2019-08-29 14:34:07 +00:00
Simon Pilgrim	6c2fc64edc	Fix signed/unsigned comparison warning. NFCI. llvm-svn: 370333	2019-08-29 11:18:53 +00:00
Amaury Sechet	8365e42010	[DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y) Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66718 llvm-svn: 370326	2019-08-29 10:35:51 +00:00
Simon Pilgrim	14e07d7f4b	[DAGCombine] Fix cppcheck shadow variable warning. NFCI. We already have an outer Ops variable. llvm-svn: 370197	2019-08-28 12:48:41 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing it. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
Simon Pilgrim	c5b38e2869	[DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI. These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors. llvm-svn: 370189	2019-08-28 11:50:36 +00:00
Sanjay Patel	b516f1afdd	[DAGCombiner] cancel fnegs from multiplied operands of FMA (-X) * (-Y) + Z --> X * Y + Z This is a missing optimization that shows up as a potential regression in D66050, so we should solve it first. We appear to be partly missing this fold in IR as well. We do handle the simpler case already: (-X) * (-Y) --> X * Y And it might be beneficial to make the constraint less conservative (eg, if both operands are cheap, but not necessarily cheaper), but that causes infinite looping for the existing fmul transform. Differential Revision: https://reviews.llvm.org/D66755 llvm-svn: 370071	2019-08-27 15:17:46 +00:00
Amaury Sechet	f28dee2cff	[DAGCombiner] Add node to the worklist in topological order in parallelizeChainedStores Summary: As per title. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66659 llvm-svn: 370056	2019-08-27 13:27:57 +00:00
Amaury Sechet	a1e5ef3fd4	[DAGCombiner] Add node to the worklist in topological order after relegalization. Summary: As per title. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66702 llvm-svn: 370040	2019-08-27 11:06:09 +00:00
Richard Trieu	58e67b8aa3	Revert r369927 - [DAGCombiner] Remove a bunch of redundant AddToWorklist calls. This change causes instrumented builds of Clang to have a fatal error in the backend. https://reviews.llvm.org/D66537 has the details. llvm-svn: 370006	2019-08-27 02:04:11 +00:00
Craig Topper	846429de74	[DAGCombiner][X86] Teach SimplifyVBinOp to fold VBinOp (concat X, undef/constant), (concat Y, undef/constant) -> concat (VBinOp X, Y), VecC This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it. Differential Revision: https://reviews.llvm.org/D66680 llvm-svn: 369937	2019-08-26 17:59:11 +00:00
Amaury Sechet	b7075e40f3	[DAGCombiner] Remove a bunch of redundant AddToWorklist calls. Summary: This comes as a first step toward processing the DAG nodes in topological order. Doing so ensures that arguments of a node are combined before the node itself is combined, which exposes more opportunities for optimization and/or reduces the number of patterns a node has to match for. DAGCombiner adding nodes to the worklist in various places causes the nodes to be in a different order from what is expected. In addition, this is redundant because these nodes end up being added to the worklist anyway due to the machinery at line 1621. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66537 llvm-svn: 369927	2019-08-26 17:02:12 +00:00
Craig Topper	b8b90ac1c5	[X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef Summary: Concat_vectors is more canonical during early DAG combine. For example, it's what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner that can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not. I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66504 llvm-svn: 369872	2019-08-25 17:59:49 +00:00
Nikita Popov	aa71c977ba	[SDAG] Fold umul_lohi with 0 or 1 multiplicand These can turn up during multiplication legalization. In principle these should also apply to smul_lohi, but I wasn't able to figure out how to produce those with the necessary operands. Differential Revision: https://reviews.llvm.org/D66380 llvm-svn: 369864	2019-08-25 08:04:22 +00:00
Simon Pilgrim	04906ef1f2	[DAGCombine] GetNegatedExpression - add FMA\FMAD support If the accumulator and either of the multiply operands are negatable then we can negate the entire expression. Differential Revision: https://reviews.llvm.org/D63141 llvm-svn: 369746	2019-08-23 10:49:46 +00:00
Amaury Sechet	95cf66de7c	[DAGCombiner] Remove explicit call to AddToWorklist in sqrt and reciprocal computations Summary: These nodes end up being processed regardless due to DAGCombiner ensuring arguments are processed. This changes the order in which nodes are processed, which fixes an issue on PowerPC. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri, mcberg2017, stefanp, hfinkel Subscribers: nemanjai, MaskRay, jsji, steven.zhang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66548 llvm-svn: 369662	2019-08-22 15:35:45 +00:00
Amaury Sechet	c0f190a048	[DAGCombiner] Remove mostly redundant calls to AddToWorklist Summary: These calls change the order in which some nodes are processed and so have an effect on codegen. The change in fixup-bw-copy.ll is due to (and (load anyext)) gets transformed into (load zext) while previously the and was removed by SimplifyDemandedBits, so the (load anyext) remained. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66543 llvm-svn: 369561	2019-08-21 18:51:08 +00:00
Amaury Sechet	045f33aec9	[DAGCombiner] Various nits. NFC llvm-svn: 369520	2019-08-21 12:01:37 +00:00
Craig Topper	ba375263e8	[DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef) I also had to add a new combine to X86's combineExtractSubvector to prevent a regression. This helps our vXi1 code see the full concat operation and allows it to optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation. I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that. Differential Revision: https://reviews.llvm.org/D66456 llvm-svn: 369459	2019-08-20 22:12:50 +00:00
Bjorn Pettersson	9dddd26e31	[DAGCombiner] Add simple folds for SMULFIX/UMULFIX/SMULFIXSAT Summary: Add the following DAGCombiner folds for mulfix being one of SMULFIX/UMULFIX/SMULFIXSAT: (mulfix x, undef, scale) -> 0 (mulfix x, 0, scale) -> 0 Also added canonicalization of constants to RHS. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66052 llvm-svn: 369103	2019-08-16 13:16:48 +00:00
Simon Pilgrim	d4df81f463	Remove SmallBitVector.h include. NFCI. SmallBitVector/BitVector types aren't used at all in the cpp file. llvm-svn: 369008	2019-08-15 14:40:37 +00:00
Simon Pilgrim	ed804dad1e	[DAGCombine] MergeConsecutiveStores - fix cppcheck/MSVC extension warning. NFCI. Set the StartIdx type to size_t so that it matches the StoreNodes SmallVector size() and index types. Silences the MSVC analyzer warning that unsigned increment might overflow before exceeding size_t on 64-bit targets - this isn't likely to happen but it means we use consistent types and reduces the warning "noise" a little. llvm-svn: 368998	2019-08-15 13:07:14 +00:00
Sanjay Patel	26b2c11451	[DAGCombiner] exclude x*2.0 from normal negation profitability rules This is the codegen part of fixing: https://bugs.llvm.org/show_bug.cgi?id=32939 Even with the optimal/canonical IR that is ideally created by D65954, we would reverse that transform in DAGCombiner and end up with the same asm on AArch64 or x86. I see 2 options for trying to correct this: 1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch). 2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul() that matches the transform done by DAGCombiner. This seems like the less intrusive patch, but if there's some other reason to prefer 1 option over the other, we can change to the other option. Differential Revision: https://reviews.llvm.org/D66016 llvm-svn: 368490	2019-08-09 21:37:32 +00:00
Sanjay Patel	0b4ae34c2f	[DAGCombiner] remove redundant fold for X*1.0; NFC This is handled at node creation time (similar to X/1.0) after: rL357029 (no fast-math-flags needed) llvm-svn: 368443	2019-08-09 14:30:59 +00:00
Craig Topper	9158e54270	[SelectionDAG][X86] Move setcc mask splitting for mload/mstore/mgather/mscatter from DAGCombiner to the type legalizer. We may be able to look to how VSELECT is handled to further improve this, but this appears to be neutral or an improvement on the test cases we have. llvm-svn: 368344	2019-08-08 21:14:08 +00:00
Cullen Rhodes	ced419f4d7	[SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER Summary: Before this patch MGATHER/MSCATTER is capable of representing all common addressing modes, but only when illegal types are used. This patch adds an IndexType property so more representations are available when using legal types only. Original modes: vector of bases base + vector of signed scaled offsets New modes: base + vector of signed unscaled offsets base + vector of unsigned scaled offsets base + vector of unsigned unscaled offsets The current behaviour of addressing modes for gather/scatter remains unchanged. Patch by Paul Walker. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D65636 llvm-svn: 368008	2019-08-06 09:46:13 +00:00
Sanjay Patel	eaf13044bd	[DAGCombiner][x86] prevent infinite loop from truncate/extend transforms The test case is based on the example from the post-commit thread for: https://reviews.llvm.org/rGc9171bd0a955 This replaces the x86-specific simple-type check from: rL367766 with a check in the DAGCombiner. Adding the check isn't strictly necessary after the fix from: rL367768 ...but it seems likely that we're heading for trouble if we are creating weird types in this transform. I combined the earlier legality check into the initial clause to simplify the code. So we should only try the trunc/sext transform at the earliest combine stage, but we limit the transform to simple types anyway because the TLI hook is probably too lax about what it considers a free truncate. llvm-svn: 367834	2019-08-05 11:27:07 +00:00
Craig Topper	2edeb8a11a	[DAGCombiner] Prevent the combine added in r367710 from creating illegal types after type legalization. This is further fix for PR42880. Sanjay already disabled the X86 TLI hook for non-simple types, but we should really call isTypeLegal here if we're after type legalization. llvm-svn: 367768	2019-08-03 23:09:13 +00:00
Sanjay Patel	68264558f9	[DAGCombiner] try to convert opposing shifts to casts This reverses a questionable IR canonicalization when a truncate is free: sra (add (shl X, N1C), AddC), N1C --> sext (add (trunc X to (width - N1C)), AddC') https://rise4fun.com/Alive/slRC More details in PR42644: https://bugs.llvm.org/show_bug.cgi?id=42644 I limited this to pre-legalization for code simplicity because that should be enough to reverse the IR patterns. I don't have any evidence (no regression test diffs) that we need to try this later. Differential Revision: https://reviews.llvm.org/D65607 llvm-svn: 367710	2019-08-02 19:33:46 +00:00
Craig Topper	a9ed5436bd	[X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, it's not any better to decompose it. This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false for illegal types and letting them get handled after type legalization, but then we can't recognize an i64 constant splat on 32-bit targets since it will be destroyed by type legalization. We could special case vectors of i64 to avoid that... Differential Revision: https://reviews.llvm.org/D65533 llvm-svn: 367601	2019-08-01 18:49:07 +00:00
Michael Berg	005d705d43	Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context. Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper Reviewed By: spatel Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji Differential Revision: https://reviews.llvm.org/D65170 llvm-svn: 367486	2019-07-31 21:57:28 +00:00
Wei Mi	f49c107f06	[DAGCombine] Limit the number of times for the same store and root nodes to bail out in store merging dependence check. We ran into a case where the dependence check in store merging bails out many times for the same store and root nodes in a huge basic block. That increases compile time by almost 100x. The patch adds a map to track how many times the bailing out happens for the same store and root, and if it is over a limit, stops considering the store with the same root as a merging candidate. Differential Revision: https://reviews.llvm.org/D65174 llvm-svn: 367472	2019-07-31 19:59:24 +00:00
Wei Mi	888efda280	[DAGCombiner] Add an option to control whether or not to enable store merging. Add an option to control whether or not to enable store merging in dag combiner so we can workaround some bugs more easily. Differential Revision: https://reviews.llvm.org/D65482 llvm-svn: 367365	2019-07-30 23:14:56 +00:00
Simon Pilgrim	f8a7e9de06	[DAGCombine] narrowInsertExtractVectorBinOp - early out for binops that change value type. NFCI. This is implicit in the value type checks in getSubVectorSrc - this just makes it upfront and obvious. llvm-svn: 367220	2019-07-29 11:34:45 +00:00
Simon Pilgrim	76f2f04d9d	[DAGCombine] narrowInsertExtractVectorBinOp - early out for illegal op. NFCI. If the subvector binop is illegal then early-out and avoid the subvector searches. llvm-svn: 367181	2019-07-27 19:42:58 +00:00
Craig Topper	a658cb0b12	[DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI The function was calling getNode() on an SDValue to return and the caller turned the result back into a SDValue. So just return the original SDValue to avoid this. llvm-svn: 366779	2019-07-23 05:13:39 +00:00
Craig Topper	f5247244f2	[DAGCombiner] Use SDNode::isOperandOf to simplify some code. NFCI llvm-svn: 366778	2019-07-23 05:13:35 +00:00
Simon Pilgrim	8b525e357f	[DAGCombine] Pull getSubVectorSrc helper out of narrowInsertExtractVectorBinOp. NFCI. NFC step towards reusing this in other EXTRACT_SUBVECTOR combines. llvm-svn: 366435	2019-07-18 13:45:53 +00:00
Amaury Sechet	f34a69c2e2	[DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and flip carry. Summary: As per title. DAGCombiner only matches the special case where b = 0; this patch extends the pattern to match any value of b. Depends on D57302 Reviewers: hfinkel, RKSimon, craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59208 llvm-svn: 366214	2019-07-16 15:17:00 +00:00
Simon Pilgrim	701e2c0d71	[DAGCombine] narrowExtractedVectorBinOp - wrap subvector extraction in helper. NFCI. First step towards supporting 'free' subvector extractions other than concat_vectors. llvm-svn: 365896	2019-07-12 13:00:35 +00:00
Simon Pilgrim	d0307f93a7	[DAGCombine] narrowInsertExtractVectorBinOp - add CONCAT_VECTORS support We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y). This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well. In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops. The fast-isel-store.ll load folding regression is annoying but I don't think is that critical. Differential Revision: https://reviews.llvm.org/D63653 llvm-svn: 365785	2019-07-11 14:45:03 +00:00
Michael Berg	f4572249d7	Move three folds for FADD, FSUB and FMUL in the DAG combiner away from Unsafe to more aligned checks that reflect context Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang. I have migrated the fold guards to reflect the expectations of handing nan and zero contexts directly (NoNan, NSZ) and some tests with it. Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context. Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm Reviewed By: arsenm Subscribers: michele.scandale, wdng, javed.absar Differential Revision: https://reviews.llvm.org/D64450 llvm-svn: 365679	2019-07-10 18:23:26 +00:00
Simon Pilgrim	94c84aca5d	[DAGCombine] visitINSERT_SUBVECTOR - use uint64_t subvector index. NFCI. Keep the uint64_t type from getZExtValue() to stop truncation/extension overflow warnings in MSVC in subvector index math. llvm-svn: 365621	2019-07-10 12:21:35 +00:00
Simon Pilgrim	bb1167a3a1	Fix const/non-const lambda return type warning. NFCI. llvm-svn: 365613	2019-07-10 10:45:09 +00:00
Craig Topper	84a1f07363	[X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it Basically the problem is that X86 doesn't set the Fast flag from allowsMemoryAccess on certain CPUs due to slow unaligned memory subtarget features. This prevents bitcasts from being folded into loads and stores. But all vector loads and stores of the same width are the same cost on X86. This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it. Differential Revision: https://reviews.llvm.org/D64295 llvm-svn: 365549	2019-07-09 19:55:28 +00:00
Simon Pilgrim	57603cbde8	[DAGCombine] LoadedSlice - keep getOffsetFromBase() uint64_t offset. NFCI. Keep the uint64_t type from getOffsetFromBase() to stop truncation/extension overflow warnings in MSVC in alignment math. llvm-svn: 365504	2019-07-09 15:28:57 +00:00
Simon Pilgrim	9c68aa33e3	[DAGCombine] convertBuildVecZextToZext - remove duplicate getOpcode() call. NFCI. llvm-svn: 365269	2019-07-06 18:32:15 +00:00
Craig Topper	e9aed963ce	[DAGCombiner] Don't combine (addcarry (uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) if the Carry comes from the uaddo. Summary: The uaddo won't be removed and the addcarry will still be dependent on the uaddo. So we'll just increase the use count of X and Y and potentially require a COPY. Reviewers: spatel, RKSimon, deadalnix Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64190 llvm-svn: 365149	2019-07-04 18:18:46 +00:00
Amaury Sechet	57dfacb32d	Use getAllOnesConstants instead of -1 in DAGCombiner. NFC llvm-svn: 365054	2019-07-03 16:34:36 +00:00
Amaury Sechet	bddb8c3597	[DAGCombine] More diamond carry pattern optimization. Summary: This diff improves the capability of DAGCombine to generate linear carry propagation in the presence of a diamond pattern. It is now able to match a large variety of different patterns rather than some hardcoded ones. Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation. Reviewers: hfinkel, RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57302 llvm-svn: 365051	2019-07-03 16:15:59 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Zi Xuan Wu	7ae536a1ce	[DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function For a given floating point load / store pair, if the load value isn't used by any other operations, then consider transforming the pair to integer load / store operations if the target deems the transformation profitable. We can exploit this much more when there are other operation nodes with a chain operand between the load/store pair, so long as we keep the original chain ordering. We only replace the register used to load/store from float to integer. I only add a testcase in ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in the ARM target. Differential Revision: https://reviews.llvm.org/D60601 llvm-svn: 364883	2019-07-02 02:54:52 +00:00
Simon Pilgrim	a6319e5f83	[DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for extract_subvector(bitcast()) support We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8))) llvm-svn: 364405	2019-06-26 11:17:38 +00:00
QingShan Zhang	e0e7d4c366	Teach the DAGCombine to fold this pattern (c1 and c2 are constants). // fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) // fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2) // fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2) Sign extend the operands if it is any_extend, to keep the signedness of the operands so that the other combine rules would apply. The any_extend is handled as zero extend for constants. i.e. t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0> t2: i64 = any_extend t1 --> t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0> --> t4: i64 = sign_extend_inreg t3 Differential Revision: https://reviews.llvm.org/D63318 llvm-svn: 364382	2019-06-26 05:12:53 +00:00
Simon Pilgrim	9762b26032	[DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ...... llvm-svn: 364326	2019-06-25 16:00:16 +00:00
Simon Pilgrim	69144a925e	[DAGCombine] visitMUL - allow shift by zero in MulByConstant. This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). It's better to let the shift by zero occur and perform any cleanup afterward. Fixes OSS Fuzz #15429 llvm-svn: 364179	2019-06-24 12:47:17 +00:00
Craig Topper	6ddc7912b0	[SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store. The code divides the alignment by 2 if the original alignment is equal to the original VT size. But this wouldn't be correct if the alignment was larger than the VT size. The memory operand object already takes care of calling MinAlign on the base alignment and the memory pointer offset. So we don't need any special code at all. llvm-svn: 364151	2019-06-23 07:00:46 +00:00
Simon Pilgrim	0da13ed1f6	[DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI. llvm-svn: 364076	2019-06-21 16:44:51 +00:00
Simon Pilgrim	ca9933c22d	[DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code. Move the "extract from insert detection code" into a lambda helper function. llvm-svn: 364059	2019-06-21 14:46:21 +00:00
Simon Pilgrim	801c0f12b0	[DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible. Better handling of out-of-i64-range values due to large integer types or from fuzz tests. llvm-svn: 363955	2019-06-20 17:36:23 +00:00
Jordan Rupprecht	02508decf4	[DAGCombiner][NFC] Remove unused var llvm-svn: 363954	2019-06-20 17:30:01 +00:00
Simon Pilgrim	1d8093249f	[DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363929	2019-06-20 14:42:27 +00:00
Simon Pilgrim	98a0ac5c0f	[DAGCombine] Add TODOs for some combines that should support non-uniform vectors We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc. llvm-svn: 363924	2019-06-20 12:48:49 +00:00
Simon Pilgrim	a487628270	[DAGCombine] Reduce scope of ShAmtVal variable. NFCI. Fixes cppcheck warning. Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here. llvm-svn: 363921	2019-06-20 10:56:37 +00:00
Simon Pilgrim	046d49a8dc	[DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue(). Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something..... llvm-svn: 363887	2019-06-19 22:14:24 +00:00
Simon Pilgrim	9eed5d2f78	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. llvm-svn: 363793	2019-06-19 12:41:37 +00:00
Simon Pilgrim	8c49366c9b	[DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds. Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths. llvm-svn: 363792	2019-06-19 12:25:29 +00:00
Simon Pilgrim	bb6b856183	[DAGCombiner] visitSHL - pull out repeated shift amount VT. NFCI. llvm-svn: 363789	2019-06-19 11:31:26 +00:00
Simon Pilgrim	d954a53633	[DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI. We pre-extend, not post. llvm-svn: 363787	2019-06-19 11:17:48 +00:00
Luis Marques	2e46312ffd	[DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates. This patch: - Makes the splits also occur in the cases where the base address and the GEP are in the same BB. - Ensures that the DAGCombiner doesn't reassociate them back again. Differential Revision: https://reviews.llvm.org/D60294 llvm-svn: 363544	2019-06-17 10:54:12 +00:00
Michael Berg	ad6bb86b2d	adding more fmf propagation for selects plus updated tests llvm-svn: 363484	2019-06-15 04:53:51 +00:00
Fangrui Song	968b5f84af	Revert "adding more fmf propagation for selects plus tests" This reverts rL363474. -debug-only=isel was added to some tests that don't specify `REQUIRES: asserts`. This causes failures on -DLLVM_ENABLE_ASSERTIONS=off builds. I chose to revert instead of fixing the tests because I'm not sure whether we should add `REQUIRES: asserts` to more tests. llvm-svn: 363482	2019-06-15 03:51:08 +00:00
Michael Berg	69394bedc5	adding more fmf propagation for selects plus tests llvm-svn: 363474	2019-06-14 23:30:52 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Simon Pilgrim	266f43964e	[TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI. As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075. llvm-svn: 363048	2019-06-11 11:00:23 +00:00
Simon Pilgrim	287e78c82b	[DAGCombine] GetNegatedExpression - constant float vector support (PR42105) Add support for negation of constant build vectors. Differential Revision: https://reviews.llvm.org/D62963 llvm-svn: 363040	2019-06-11 09:44:33 +00:00
QingShan Zhang	ab846da7e8	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res |= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the target supports it. Assuming little endian target: i8 *p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; => *((i32)p) = val; i8 *p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; => *((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921	2019-06-10 05:40:21 +00:00
Simon Pilgrim	5f337149fa	Use for-range loop. NFCI. llvm-svn: 362897	2019-06-09 09:07:30 +00:00
Simon Pilgrim	6bae6d5a5d	[DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI. Same codegen, only differ by the oneuse limit for the sextload case. llvm-svn: 362880	2019-06-08 17:02:00 +00:00
Simon Pilgrim	f0240ee76d	[DAGCombine] visitAND - fix local shadow variable warnings. NFCI. llvm-svn: 362825	2019-06-07 18:36:43 +00:00
Simon Pilgrim	4c9db2045a	[DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI. llvm-svn: 362820	2019-06-07 18:07:06 +00:00
Simon Pilgrim	842c7792aa	[DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123) This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores: 1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another. 2 - The merged load\store node must retain the non-temporal flag. Differential Revision: https://reviews.llvm.org/D62910 llvm-svn: 362723	2019-06-06 17:04:13 +00:00
Simon Pilgrim	da993d08c8	[DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI. Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s llvm-svn: 362695	2019-06-06 10:21:18 +00:00
Simon Pilgrim	77d6adc491	Fix shadow local variable warning. NFCI. llvm-svn: 362622	2019-06-05 17:26:29 +00:00
Nemanja Ivanovic	aed7227b71	Revert r362472 as it is breaking PPC build bots The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots. Reverting it to bring the bots back to green. llvm-svn: 362539	2019-06-04 18:48:43 +00:00
Craig Topper	09a4415803	[DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1) This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove were beneficial. Fixes PR42118 Differential Revision: https://reviews.llvm.org/D62828 llvm-svn: 362533	2019-06-04 17:44:18 +00:00
Sanjay Patel	1e63dd0b44	[SelectionDAG][x86] limit post-legalization store merging by type The proposal in D62498 showed that x86 would benefit from vector store splitting, but that may conflict with the generic DAG combiner's store merging transforms. Add memory type to the existing TLI hook that enables the merging transforms, so we can limit those changes to scalars only for x86. llvm-svn: 362507	2019-06-04 15:15:59 +00:00
Roman Lebedev	3dce0326fe	[DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952) Summary: This might be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet. As far as i can tell, there are no regressions here (ignoring x86-32), all changes are either good or neutral. This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`) `@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]]. https://rise4fun.com/Alive/vMd3 Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma Reviewed By: RKSimon Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62774 llvm-svn: 362488	2019-06-04 11:06:21 +00:00
Roman Lebedev	be6ce7b3f2	[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold Summary: All changes except ARM look great. https://rise4fun.com/Alive/R2M The regression `test/CodeGen/ARM/addsubcarry-promotion.ll` is recovered fully by D62392 + D62450. Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma Reviewed By: efriedma Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62266 llvm-svn: 362487	2019-06-04 11:06:08 +00:00
Simon Pilgrim	3018d505a3	[SelectionDAG] Add fpto[us]i(undef) --> undef constant fold Follow up to D62807. Differential Revision: https://reviews.llvm.org/D62811 llvm-svn: 362483	2019-06-04 10:04:55 +00:00
QingShan Zhang	11de0e71b0	[DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res |= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the target supports it. Assuming little endian target: i8 *p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; => *((i32)p) = val; i8 *p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; => *((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D61843 llvm-svn: 362472	2019-06-04 08:53:53 +00:00
Michael Berg	0b7f98da65	Propagate fmf for setcc/select folds Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG. Reviewers: qcolombet, spatel Reviewed By: qcolombet Subscribers: nemanjai, jsji Differential Revision: https://reviews.llvm.org/D62552 llvm-svn: 362439	2019-06-03 19:12:15 +00:00
Simon Pilgrim	cb7e4e8193	[SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205) We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction Differential Revision: https://reviews.llvm.org/D62807 llvm-svn: 362397	2019-06-03 13:02:07 +00:00
Florian Hahn	e71963c850	Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor. If we hit the limit, we do expand the outstanding tokenfactors. Otherwise, we might drop nodes with users in the unexpanded tokenfactors. This fixes the crashes reported by Jordan Rupprecht. Reviewers: niravd, spatel, craig.topper, rupprecht Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D62633 llvm-svn: 362350	2019-06-03 01:30:19 +00:00
Craig Topper	50b35caf30	[DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask. Similar to what was done for masked load and gather. llvm-svn: 362342	2019-06-02 22:52:38 +00:00
Craig Topper	a7bc31ebc6	[DAGCombiner] Replace masked loads with a zero mask with the passthru value Similar to what was recently done for gathers in r362015. llvm-svn: 362337	2019-06-02 18:58:46 +00:00
Simon Pilgrim	7a869e7036	[DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2) Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize. Differential Revision: https://reviews.llvm.org/D59188 llvm-svn: 362324	2019-06-02 14:42:11 +00:00
Craig Topper	f58ef87bb7	[DAGCombiner] Replace two unchecked dyn_casts with casts. The results of the dyn_casts were immediately dereferenced on the next line so they had better not be null. I don't think there's any way for these dyn_casts to fail, so use a cast instead of adding a null check. llvm-svn: 362315	2019-06-02 03:31:01 +00:00
Roman Lebedev	46511d75b5	[DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts I don't have a test case for these, but there is a test case for D62266 where, even after all the constant-folding patches, we still end up with endless combine loop. Which makes sense, since we don't constant fold for opaque constants. llvm-svn: 362156	2019-05-30 21:10:37 +00:00
Roman Lebedev	a4e3b50e26	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2 Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 362146	2019-05-30 20:37:49 +00:00
Roman Lebedev	57aa36ff91	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 362145	2019-05-30 20:37:39 +00:00
Roman Lebedev	63b4741534	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 362144	2019-05-30 20:37:29 +00:00
Roman Lebedev	05ad5fd213	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 362143	2019-05-30 20:37:18 +00:00
Roman Lebedev	1d9ec7a81b	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs, and then reverted in rL362109 to fix missing constant folds that were causing endless combine loops. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 362142	2019-05-30 20:36:54 +00:00
Roman Lebedev	7eb8b5b5dd	[DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold Summary: https://rise4fun.com/Alive/B0A Reviewers: t.p.northover, RKSimon, spatel, craig.topper Reviewed By: RKSimon Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62691 llvm-svn: 362135	2019-05-30 19:27:51 +00:00
Roman Lebedev	691b5e2ecc	[DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold Summary: https://rise4fun.com/Alive/Mb1M Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62689 llvm-svn: 362134	2019-05-30 19:27:42 +00:00
Roman Lebedev	0a3dbbcdfb	[DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold Summary: Direct sibling of D62662, the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62664 llvm-svn: 362133	2019-05-30 19:27:32 +00:00
Roman Lebedev	9ff3159b4a	[DAGCombine] Use FoldConstantArithmetic() to perform C2-(A+C1) -> (C2-C1)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62663 llvm-svn: 362132	2019-05-30 19:27:26 +00:00
Roman Lebedev	cc9a9cf237	[DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold Summary: This was the root cause of the endless combine loop in D62257 https://rise4fun.com/Alive/d3W Reviewers: RKSimon, spatel, craig.topper, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62662 llvm-svn: 362131	2019-05-30 19:27:19 +00:00
Roman Lebedev	ef95679741	[DAGCombine] Use FoldConstantArithmetic() to perform ((c1-A)+c2) -> (c1+c2)-A fold Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry. Reviewers: spatel, RKSimon, craig.topper, t.p.northover Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62661 llvm-svn: 362130	2019-05-30 19:27:10 +00:00
Roman Lebedev	019d270e43	[DAGCombine] Revert of recommit of "binop-with-const hoisting" patches I was looking into an endless combine loop the uncommitted follow-up patch was causing, and it appears even these patches can exhibit such an endless loop. The root cause is that we try to hoist one binop (add/sub) with constant operand, and if we get two such binops both of which are eligible for this hoisting, we get stuck. Some cases may highlight missing constant-folds. Reverts r361871,r361872,r361873,r361874. llvm-svn: 362109	2019-05-30 16:07:11 +00:00
Benjamin Kramer	107f8d9873	[DAGCombiner] Replace gathers with a zero mask with the passthru value These can be created by the legalizer when splitting a larger gather. See https://llvm.org/PR42055 for a motivating example. Differential Revision: https://reviews.llvm.org/D62613 llvm-svn: 362015	2019-05-29 19:24:19 +00:00
Roman Lebedev	dfc34f0211	[DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2 Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq This is a recommit, originally committed in rL361856, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361874	2019-05-28 20:40:10 +00:00
Roman Lebedev	d485c6bc9f	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2 Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl This is a recommit, originally committed in rL361855, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361873	2019-05-28 20:40:03 +00:00
Roman Lebedev	96c9986199	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2 Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh This is a recommit, originally committed in rL361853, but reverted to investigate test-suite compile-time hangs. Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361872	2019-05-28 20:39:55 +00:00
Roman Lebedev	2feb7e56e2	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2 Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) This is a recommit, originally committed in rL361852, but reverted to investigate test-suite compile-time hangs. Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361871	2019-05-28 20:39:39 +00:00
Roman Lebedev	272d70c366	Revert DAGCombine "hoist binop with const" folds Appear to introduce test-suite compile-time hang. http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825 This reverts r361852,r361853,r361854,r361855,r361856 llvm-svn: 361865	2019-05-28 19:04:21 +00:00
Roman Lebedev	7669665432	[DAGCombine] (x - C) - y -> (x - y) - C fold Summary: Again only vectors affected. Frustrating. Let me take a look into that.. https://rise4fun.com/Alive/AAq Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, JDevlieghere, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62294 llvm-svn: 361856	2019-05-28 17:54:21 +00:00
Roman Lebedev	8c9b3e4e4a	[DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold Summary: This prevents regressions in next patch, and somewhat recovers from the regression to AMDGPU test in D62223. It is indeed not great that we leave vector decrement, don't transform it into vector add all-ones.. https://rise4fun.com/Alive/ZRl Reviewers: RKSimon, craig.topper, spatel, arsenm Reviewed By: RKSimon, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62263 llvm-svn: 361855	2019-05-28 17:54:13 +00:00
Roman Lebedev	6a24c9b9ab	[DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold Summary: Only vector tests are being affected here, since subtraction by scalar constant is rewritten as addition by negated constant. No surprising test changes. https://rise4fun.com/Alive/pbT Reviewers: RKSimon, craig.topper, spatel Reviewed By: RKSimon Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62257 llvm-svn: 361854	2019-05-28 17:54:04 +00:00
Roman Lebedev	1499f65ac1	[DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold Summary: Direct sibling of D62223 patch. While i don't have a direct motivational pattern for this, it would seem to make sense to handle both patterns (or none), for symmetry? The aarch64 changes look neutral; sparc and systemz look like improvement (one less instruction each); x86 changes - 32bit case improves, 64bit case shows that LEA no longer gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea` https://rise4fun.com/Alive/ffh Reviewers: RKSimon, craig.topper, spatel, t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62252 llvm-svn: 361853	2019-05-28 17:53:54 +00:00
Roman Lebedev	19f51ec04a	[DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold Summary: The main motivation is shown by all these `neg` instructions that are now created. In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test. AArch64 test changes all look good (`neg` created), or neutral. X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created). I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill is now hoisted into preheader (which should still be good?), 2 4-byte reloads become 1 8-byte reload, and are elsewhere, but i'm not sure how that affects that loop. I'm unable to interpret AMDGPU change, looks neutral-ish? This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]]. https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later) Reviewers: craig.topper, RKSimon, spatel, arsenm Reviewed By: RKSimon Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62223 llvm-svn: 361852	2019-05-28 17:53:43 +00:00
Alexander Timofeev	ba447bae74	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 This commit was reverted because of the build failure. The reason was a malformed patch. Build failure fixed. llvm-svn: 361741	2019-05-26 20:33:26 +00:00
Peter Collingbourne	3b93737446	Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence." Broke sanitizer bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio llvm-svn: 361688	2019-05-25 01:52:38 +00:00
Alexander Timofeev	dffedea014	[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644	2019-05-24 15:32:18 +00:00
Simon Pilgrim	95b8d9bbf8	[SelectionDAG] computeKnownBits - support constant pool values from target This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets. computeKnownBits then uses this function to improve codegen, notably vector code after legalization. A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit. This required a couple of fixes: * SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR). * Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets. Differential Revision: https://reviews.llvm.org/D61887 llvm-svn: 361620	2019-05-24 10:03:11 +00:00
Sanjay Patel	7d6c0bce50	[DAGCombiner] make folds of binops safe for opcodes that produce >1 value This is no-functional-change-intended currently because the definition of isBinOp() only includes opcodes that produce 1 value. But if we share that implementation with isCommutativeBinOp() as proposed in D62191, then we need to make sure that the callers bail out for opcodes that they are not prepared to handle correctly. llvm-svn: 361547	2019-05-23 20:17:25 +00:00
Sanjay Patel	78c3f58122	[DAGCombiner] prevent unsafe reassociation of FP ops There are no FP callers of DAGCombiner::reassociateOps() currently, but we can add a fast-math check to make sure this API is not being misused. This was noted as a potential risk (and that risk might increase) with: D62191 llvm-svn: 361268	2019-05-21 14:47:38 +00:00
Craig Topper	203bfdd0f0	[DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC This changes the isShift variable to include the constant operand check that was previously in the if statement. While there fix an 80 column violation and an unnecessary use of getNode. Also fix variable name capitalization. llvm-svn: 361168	2019-05-20 16:26:55 +00:00
Roman Lebedev	64c756b991	[DAGCombiner] visitShiftByConstant(): drop bogus signbit check Summary: That check claims that the transform is illegal otherwise. That isn't true: 1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter https://rise4fun.com/Alive/K4A 2. For `ISD::AND`, there is no restriction on constants: https://rise4fun.com/Alive/Wy3 3. For `ISD::OR`, there is no restriction on constants: https://rise4fun.com/Alive/GOH 4. For `ISD::XOR`, there is no restriction on constants: https://rise4fun.com/Alive/ml6 So, why is it there then? This changes the testcase that was touched by @spatel in rL347478, but I'm not sure that test tests anything particular? Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin Reviewed By: spatel Subscribers: javed.absar, llvm-commits, spatel Tags: #llvm Differential Revision: https://reviews.llvm.org/D61918 llvm-svn: 361044	2019-05-17 15:52:58 +00:00
Clement Courbet	d9d0665d1c	[DAGCombiner][NFC] Add a comment. As suggested in D61846. llvm-svn: 360755	2019-05-15 08:21:18 +00:00
Sanjay Patel	99d6420a82	[SDAG] fix unused variable warning and unneeded indirection; NFC llvm-svn: 360640	2019-05-14 00:57:31 +00:00
Sanjay Patel	3a13d970aa	[SDAG, x86] allow targets to override test for binop opcodes This follows the pattern of the existing isCommutativeBinOp(). x86 shows improvements from vector narrowing for the min/max opcodes. llvm-svn: 360639	2019-05-14 00:39:40 +00:00
Sanjay Patel	05dafb1c97	[DAGCombiner] narrow vector binop with inserts/extract We catch most of these patterns (on x86 at least) by matching a concat vectors opcode early in combining, but the pattern may emerge later using insert subvector instead. The AVX1 diffs for add/sub overflow show another missed narrowing pattern. That one may be falling through the cracks because of combine ordering and multiple uses. llvm-svn: 360585	2019-05-13 14:31:14 +00:00
Clement Courbet	9afc4764dd	[DAGCombiner] Fix invalid alias analysis. Summary: When we know for sure whether two addresses do or do not alias, we should immediately return from DAGCombiner::isAlias(). I think this comes from a bad copy/paste. Sorry for not catching that during the code review. Fixes PR41855. Reviewers: niravd, gchatelet, EricWF Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61846 llvm-svn: 360566	2019-05-13 09:07:37 +00:00
Sanjay Patel	a09e686821	[DAGCombiner] try to move bitcast after extract_subvector I noticed that we were failing to narrow an x86 ymm math op in a case similar to the 'madd' test diff. That is because a bitcast is sitting between the math and the extract subvector and thwarting our pattern matching for narrowing: t56: v8i32 = add t59, t58 t68: v4i64 = bitcast t56 t73: v2i64 = extract_subvector t68, Constant:i64<2> t96: v4i32 = bitcast t73 There are a few wins and neutral diffs in the other tests. Differential Revision: https://reviews.llvm.org/D61806 llvm-svn: 360541	2019-05-12 14:43:20 +00:00
Jordan Rupprecht	16c7fbd112	Revert [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor This reverts r360171 (git commit `a9d6c32eaf`). A repro showing the asan/msan failures is forthcoming. llvm-svn: 360481	2019-05-10 23:20:02 +00:00
Sanjay Patel	b37ddeafc0	[DAGCombiner] reduce code duplication; NFC llvm-svn: 360462	2019-05-10 20:02:30 +00:00
Cameron McInally	156eb28289	[CodeGen] Add comment about FSUB <-> FNEG xforms Differential Revision: https://reviews.llvm.org/D61741 llvm-svn: 360366	2019-05-09 19:28:52 +00:00
Florian Hahn	be10bc71f9	[DAGCombiner] Limit number of nodes explored as store candidates. To find the candidates to merge stores we iterate over all nodes in a chain for each store, which leads to quadratic compile times for large basic blocks with a large number of stores. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61511 llvm-svn: 360357	2019-05-09 17:05:52 +00:00
QingShan Zhang	e065af6a42	[NFC] Add a static function to do the endian check Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. Differential Revision: https://reviews.llvm.org/D61236 llvm-svn: 360226	2019-05-08 07:21:37 +00:00
Florian Hahn	a9d6c32eaf	[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor When simplifying TokenFactors, we potentially iterate over all operands of a large number of TokenFactors. This causes quadratic compile times in some cases and the large token factors cause additional scalability problems elsewhere. This patch adds some limits to the number of nodes explored for the cases mentioned above. Reviewers: niravd, spatel, craig.topper Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D61397 llvm-svn: 360171	2019-05-07 16:47:27 +00:00
Philip Reames	2f53d79bff	Fix pr33010, a 2 year old crashing regression The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent its creation. Arguably, we shouldn't be using TargetFrameIndices in StatepointLowering at all, but that's a much deeper change. llvm-svn: 360090	2019-05-06 22:09:31 +00:00
Nikita Popov	cfe786a195	[SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635) This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635 by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is legal but former is not) for zero-or-all-ones boolean reductions (which are detected based on sign bits). Differential Revision: https://reviews.llvm.org/D61398 llvm-svn: 360054	2019-05-06 16:17:17 +00:00
Simon Pilgrim	5d3b100750	[DAGCombine] Remove repeated variables. NFCI. llvm-svn: 359915	2019-05-03 18:20:28 +00:00
Sanjay Patel	1972826178	[DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try) The original patch was committed at rL359398 and reverted at rL359695 because of infinite looping. This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop. Original commit message: This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359793	2019-05-02 15:02:08 +00:00
Sanjay Patel	64d5751254	Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate" This reverts commit `fb9a5307a9` (rL359398) because it can cause an infinite loop due to opposing combines. llvm-svn: 359695	2019-05-01 16:06:21 +00:00
Zi Xuan Wu	49d60fdc2e	[DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combining ISD::TRUNC node Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry) if adde is not legal for the target, even during the type-legalize phase, because adde is special and will not be legalized later in the operation-legalize phase. This fixes: PR40922 https://bugs.llvm.org/show_bug.cgi?id=40922 Differential Revision: https://reviews.llvm.org//D60854 llvm-svn: 359532	2019-04-30 03:01:14 +00:00
Bjorn Pettersson	820994572c	[DAG] Refactor DAGCombiner::ReassociateOps Summary: Extract the logic for doing reassociations from DAGCombiner::reassociateOps into a helper function DAGCombiner::reassociateOpsCommutative, and use that helper to trigger reassociation on the original operand order, or the commuted operand order. Codegen is not identical since the operand order will be different when doing the reassociations for the commuted case. That causes some unfortunate churn in some test cases. Apart from that this should be NFC. Reviewers: spatel, craig.topper, tstellar Reviewed By: spatel Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61199 llvm-svn: 359476	2019-04-29 17:50:10 +00:00
Sanjay Patel	fb9a5307a9	[DAGCombiner] try repeated fdiv divisor transform before building estimate This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359398	2019-04-28 12:23:43 +00:00
Simon Pilgrim	ef54b1dddf	[DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI. Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format. llvm-svn: 359326	2019-04-26 17:49:02 +00:00
Simon Pilgrim	5d6ef94c36	[X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293	2019-04-26 10:49:13 +00:00
Sanjay Patel	6f41bf948b	[DAGCombiner] scale repeated FP divisor by splat factor If we have a vector FP division with a splatted divisor, use the existing transform that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then potentially be converted into a scalar FP division by existing combines (rL358984) as seen in the tests here. That can be a potentially big perf difference if scalar fdiv has better timing (including avoiding possible frequency throttling for vector ops). Differential Revision: https://reviews.llvm.org/D61028 llvm-svn: 359147	2019-04-24 22:28:58 +00:00
Sanjay Patel	06ff5eae5b	[DAGCombiner] generalize binop-of-splats scalarization If we only match build vectors, we can miss some patterns that use shuffles as seen in the affected tests. Note that the underlying calls within getSplatSourceVector() have the potential for compile-time explosion because of exponential recursion looking through binop opcodes, but currently the list of supported opcodes is very limited. Both of those problems should be addressed in follow-up patches. llvm-svn: 358984	2019-04-23 13:16:41 +00:00
Bjorn Pettersson	f97b29be88	[DAGCombiner] Combine OR as ADD when no common bits are set Summary: The DAGCombiner is rewriting (canonicalizing) an ISD::ADD with no common bits set in the operands as an ISD::OR node. This could sometimes result in "missing out" on some combines that normally are performed for ADD. To be more specific this could happen if we already have rewritten an ADD into OR, and later (after legalizations or combines) we expose patterns that could have been optimized if we had seen the OR as an ADD (e.g. reassociations based on ADD). To make the DAG combiner less sensitive to if ADD or OR is used for these "no common bits set" ADD/OR operations we now apply most of the ADD combines also to an OR operation, when value tracking indicates that the operands have no common bits set. Reviewers: spatel, RKSimon, craig.topper, kparzysz Reviewed By: spatel Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59758 llvm-svn: 358965	2019-04-23 10:01:08 +00:00
Sanjay Patel	9bc6c77220	[DAGCombiner] make variable name less ambiguous; NFC llvm-svn: 358886	2019-04-22 13:42:50 +00:00
Sanjay Patel	d6989daae9	[DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC llvm-svn: 358884	2019-04-22 13:36:07 +00:00
Simon Pilgrim	e7fe6dd5ed	[DAGCombine] Add SimplifyDemandedBits helper that handles demanded elts mask as well The other SimplifyDemandedBits helpers become wrappers to this new demanded elts variant. llvm-svn: 358585	2019-04-17 15:45:44 +00:00
Simon Pilgrim	e5573f4f4e	[TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359) As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious. shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair llvm-svn: 358526	2019-04-16 20:57:28 +00:00
Luis Marques	eda370d4c8	[DAGCombiner] Add missing flag to addressing mode check The checks in `canFoldInAddressingMode` tested for addressing modes that have a base register but didn't set the `HasBaseReg` flag to true (it's false by default). This patch fixes that. Although the omission of the flag was technically incorrect it had no known observable impact, so no tests were changed by this patch. Differential Revision: https://reviews.llvm.org/D60314 llvm-svn: 358502	2019-04-16 15:09:18 +00:00
Sanjay Patel	5e4ad39af7	[DAGCombiner] narrow shuffle of concatenated vectors // shuffle (concat X, undef), (concat Y, undef), Mask --> // concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1) The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements. The x86 changes look neutral or better. There's one test with an extra instruction, but that could be reversed for a subtarget with the right attributes. But by default, we want to avoid the 256-bit op when possible (in my motivating benchmark, a handful of ymm ops sprinkled into a sequence of xmm ops are triggering frequency throttling on Haswell resulting in significantly worse perf). Differential Revision: https://reviews.llvm.org/D60545 llvm-svn: 358291	2019-04-12 16:31:56 +00:00
Sanjay Patel	fd314eca8f	[DAGCombiner] refactor narrowing of extracted vector binop; NFC There's a TODO comment about handling patterns with insert_subvector, and we do want to match that. llvm-svn: 358187	2019-04-11 15:59:47 +00:00
Sanjay Patel	c0f4a35e68	[DAGCombiner][x86] scalarize inserted vector FP ops // bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) --> // build_vec ...undef, (bo x, y), undef... The lifetime of the nodes in these examples is different for variables versus constants, but they are all build vectors briefly, so I'm proposing to catch them in this form to handle all of the leading examples in the motivating test file. Before we have build vectors, we might have insert_vector_element. After that, we might have scalar_to_vector and constant pool loads. It's going to take more work to ensure that FP vector operands are getting simplified with undef elements, so this transform can apply more widely. In a non-loose FP environment, we are likely simplifying FP elements to NaN values rather than undefs. We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors yet. Differential Revision: https://reviews.llvm.org/D60514 llvm-svn: 358172	2019-04-11 14:21:57 +00:00
Craig Topper	61e77b11d1	[DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate. This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate. Differential Revision: https://reviews.llvm.org/D60020 llvm-svn: 358027	2019-04-09 18:33:56 +00:00
Sanjay Patel	50a8652785	[DAGCombiner][x86] scalarize splatted vector FP ops There are a variety of vector patterns that may be profitably reduced to a scalar op when scalar ops are performed using a subset (typically, the first lane) of the vector register file. For x86, this is true for float/double ops and element 0 because insert/extract is just a sub-register rename. Other targets should likely enable the hook in a similar way. Differential Revision: https://reviews.llvm.org/D60150 llvm-svn: 357760	2019-04-05 13:32:17 +00:00
Evandro Menezes	85bd3978ae	[IR] Refactor attribute methods in Function class (NFC) Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731	2019-04-04 22:40:06 +00:00
Simon Pilgrim	8d248dbd77	[DAGCombiner] Rename variables Demanded -> DemandedBits/DemandedElts. NFCI. Use consistent variable names down the SimplifyDemanded* call stack so debugging isn't such an annoyance. llvm-svn: 357602	2019-04-03 16:00:59 +00:00
Sanjay Patel	00dae6b22d	[DAGCombiner] loosen restrictions for moving shuffles after vector binop There are 3 changes to make this correspond to the same transform in instcombine: 1. Remove the legality check - we can't create anything less legal than we started with. 2. Ease the use restriction, so we only bail out if both operands have >1 use. 3. Ease the use restriction for binops with a repeated operand (eg, mul x, x). As discussed in D60150, there's a scalarization opportunity that will be made easier by allowing this transform more generally. llvm-svn: 357580	2019-04-03 13:42:06 +00:00
Simon Pilgrim	02599de2e1	[DAGCombine] Don't use getZExtValue() until we know the constant is in range. Noticed during prep for a patch for PR40758. llvm-svn: 357571	2019-04-03 11:00:55 +00:00
Hans Wennborg	94b867dc7c	Revert r357256 "[DAGCombine] Improve Lifetime node chains." As it caused a pathological compile-time regression in V8, see PR41352. > Improve both start and end lifetime nodes chain dependencies. > > Reviewers: courbet > > Reviewed By: courbet > > Subscribers: hiraditya, llvm-commits > > Tags: #llvm > > Differential Revision: https://reviews.llvm.org/D59795 This also reverts the follow-up r357309: > [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. > > Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357563	2019-04-03 07:41:58 +00:00
Sanjay Patel	7cb7daabbb	[DAGCombiner] reduce code duplication; NFC llvm-svn: 357498	2019-04-02 17:20:54 +00:00
Nirav Dave	54f7118de5	[DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop. Avoid EXPENSIVE_CHECK failure. NFCI. llvm-svn: 357309	2019-03-29 20:26:23 +00:00
Nirav Dave	7e84cacdbd	[DAG] Avoid redundancy in StoreMerge TokenFactor generation. Avoid generating redundant TokenFactor when all merged stores have the same chain. llvm-svn: 357299	2019-03-29 18:50:22 +00:00
Nirav Dave	fe59e14031	[DAGCombine] Prune unused nodes. Summary: Nodes that have no uses are eventually pruned when they are selected from the worklist. Record nodes newly added to the worklist or DAG and perform pruning after every combine attempt. Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight Reviewed By: jyknight Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58070 llvm-svn: 357283	2019-03-29 17:35:56 +00:00
Nirav Dave	610036c506	[DAG] Set up infrastructure to avoid smart constructor-based dangling nodes Summary: Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations without fully pruning unused result values. This results in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter for the combiner to make sure such nodes have the chance of being pruned. This allows a number of additional peephole optimizations. Reviewers: efriedma, RKSimon, craig.topper, jyknight Reviewed By: jyknight Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58068 llvm-svn: 357279	2019-03-29 17:26:40 +00:00
Sanjay Patel	12685d0f7c	[DAGCombiner] simplify shuffle of shuffle After investigating the examples from D59777 targeting an SSE4.1 machine, it looks like a very different problem due to how we map illegal types (256-bit in these cases). We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand. We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that generality means it is limited to patterns with a one-use constraint, and the examples here have 2 uses. We don't need any uses or legality limitations for a simplification (no new value is created). It looks like we miss this pattern in IR too. In one of the zext examples here, we have shuffle masks like this: Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7> Shuf = vector_shuffle<4,u,6,7,u,u,u,u> ...so that's moving the high half of the 1st vector into the low half. But the high half of the 1st vector is already identical to the low half. Differential Revision: https://reviews.llvm.org/D59961 llvm-svn: 357258	2019-03-29 14:20:38 +00:00
Nirav Dave	9259de217e	[DAGCombine] Improve Lifetime node chains. Improve both start and end lifetime nodes chain dependencies. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59795 llvm-svn: 357256	2019-03-29 14:09:47 +00:00
Sanjay Patel	665a385035	[DAGCombiner] fold sext into decrement This is a sibling to rL357178 that I noticed we'd hit if we chose an alternate transform in D59818. %z = zext i8 %x to i32 %dec = add i32 %z, -1 %r = sext i32 %dec to i64 => %z2 = zext i8 %x to i64 %r = add i64 %z2, -1 https://rise4fun.com/Alive/kPP The x86 vector diffs show a slight regression, so there's a chance that we should limit this and the previous transform to scalars. But given that we allowed vectors before, I'm matching that behavior here. We should change both transforms together if that's the right thing to do. llvm-svn: 357254	2019-03-29 13:49:08 +00:00
Sanjay Patel	ffa8d3def7	[DAGCombiner] fold sext into negation As noted in D59818: %z = zext i8 %x to i32 %neg = sub i32 0, %z %r = sext i32 %neg to i64 => %z2 = zext i8 %x to i64 %r = sub i64 0, %z2 https://rise4fun.com/Alive/KzSR llvm-svn: 357178	2019-03-28 15:46:02 +00:00
Simon Pilgrim	38a0616c1d	[DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y)) If scalar truncates are free, attempt to pre-truncate build_vectors source operands. Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering. Differential Revision: https://reviews.llvm.org/D59654 llvm-svn: 357161	2019-03-28 11:34:21 +00:00
Nirav Dave	6b741a8038	[DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes Summary: Lifetime nodes were inhibiting TokenFactor simplification inhibiting chain-based optimizations. Reviewers: courbet, jyknight Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59897 llvm-svn: 357121	2019-03-27 20:37:08 +00:00
Nirav Dave	c6dfaa0e83	Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes." This patch appears to trigger very large compile time increases in halide builds. llvm-svn: 357116	2019-03-27 19:54:41 +00:00
Nirav Dave	b5630a2ab1	[DAGCombiner] Unify Lifetime and memory Op aliasing. Rework BaseIndexOffset and isAlias to fully work with lifetime nodes and fold in lifetime alias analysis. This is mostly NFC. Reviewers: courbet Reviewed By: courbet Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59794 llvm-svn: 357070	2019-03-27 14:14:46 +00:00
Nirav Dave	96a264e053	[DAGCombine] Refactor GatherAllAliases. NFCI. llvm-svn: 357069	2019-03-27 14:14:35 +00:00
Jonas Paulsson	38342a5185	[DAGCombiner] Don't allow addcarry if the carry producer is illegal. getAsCarry() checks that the input argument is a carry-producing node before allowing a transformation to addcarry. This patch adds a check to make sure that the carry-producing node is legal. If it is not, it may not remain in a form that is manageable by the target backend. The test case caused a compilation failure during instruction selection for this reason on SystemZ. Patch by Ulrich Weigand. Review: Sanjay Patel https://reviews.llvm.org/D59822 llvm-svn: 357052	2019-03-27 08:41:46 +00:00
Nirav Dave	a28c514581	[DAG] Avoid smart constructor-based dangling nodes. Various SelectionDAG non-combine operations (e.g. the getNode smart constructor and legalization) may leave dangling nodes by applying optimizations or not fully pruning unused result values. This can result in nodes that are never added to the worklist and therefore can not be pruned. Add a node inserter as the current node deleter to make sure such nodes have the chance of being pruned. Many minor changes, mostly positive. llvm-svn: 356996	2019-03-26 15:08:14 +00:00
Simon Pilgrim	e24441aab0	[TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT This helps us relax the extension of a lot of scalar elements before they are inserted into a vector. It exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions. Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes. Differential Revision: https://reviews.llvm.org/D59484 llvm-svn: 356989	2019-03-26 12:32:01 +00:00
Florian Hahn	71033f2987	[DAGCombiner] Use getTokenFactor in a few more cases. SDNodes can only have 64k operands and for some inputs (e.g. large number of stores), we can reach this limit when creating TokenFactor nodes. This patch is a follow up to D56740 and updates a few more places that potentially can create TokenFactors with too many operands. Reviewers: efriedma, craig.topper, aemerson, RKSimon Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D59156 llvm-svn: 356668	2019-03-21 14:32:09 +00:00
Simon Pilgrim	da4992bf8d	[DAGCombine] SimplifySelectCC - call FoldSetCC with the setcc result type We were calling FoldSetCC with the compare operand type instead of the result type. Found by OSS-Fuzz #13838 (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13838) llvm-svn: 356667	2019-03-21 14:07:18 +00:00
Simon Pilgrim	51f65171e9	Remove out of date comment. NFCI. DAGCombiner::convertBuildVecZextToZext just requires the extractions to be sequential, they don't have to start from 0'th index. llvm-svn: 356552	2019-03-20 12:24:15 +00:00
Justin Bogner	b353d6887e	[DAGCombine] Fix a miscompile when reducing BUILD_VECTORs to a shuffle In r311255 we added a case where we split vectors whose elements are all derived from the same input vector so that we could shuffle it more efficiently. In doing so, createBuildVecShuffle was taught to adjust for the fact that all indices would be based off of the first vector when this happens, but it's possible for the code that checked that to fire incorrectly if we happen to have a BUILD_VECTOR of extracts from subvectors and don't hit this new optimization. Instead of trying to detect if we've split the vector by checking if we have extracts from the same base vector, we can just pass that information into createBuildVecShuffle, avoiding the miscompile. Differential Revision: https://reviews.llvm.org/D59507 llvm-svn: 356476	2019-03-19 16:52:00 +00:00
Simon Pilgrim	a56f2822d0	[SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect These changes are related to PR37743 and include: SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node. Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner. Add promoting the integer ABS node in the LegalizeIntegerType. Expand-based legalization of integer result for the ABS nodes. Expand-based legalization of ABS vector operations. Add some integer abs testcases for different typesizes for Thumb arch Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to: tmp = (SRA, Hi, 31) Lo = (UADDO tmp, Lo) Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1)) Lo = (XOR tmp, Lo) The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given an ABS node, detect the following pattern: (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))). Change integer abs testcases for codegen with the ABS node support for AArch64. Indicate that the ABS is legal for the i64 type when the NEON is supported. Change the integer abs testcases to show changing of codegen. Add combine and legalization of ABS nodes for Thumb arch. Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition. For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743 Patch by: @ikulagin (Ivan Kulagin) Differential Revision: https://reviews.llvm.org/D49837 llvm-svn: 356468	2019-03-19 16:24:55 +00:00
Adhemerval Zanella	664c1ef528	[TargetLowering] Add code size information on isFPImmLegal. NFC This allows better code size for aarch64 floating point materialization in a future patch. Reviewers: evandro Differential Revision: https://reviews.llvm.org/D58690 llvm-svn: 356389	2019-03-18 18:40:07 +00:00
Nirav Dave	55c921f4bf	[DAG] Cleanup unused node in SimplifySelectCC. Delete temporarily constructed node uses for analysis after its use, holding onto original input nodes. Ideally this would be rewritten without making nodes, but this appears relatively complex. Reviewers: spatel, RKSimon, craig.topper Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57921 llvm-svn: 356382	2019-03-18 17:02:38 +00:00
Nikita Popov	9a4453592b	[DAGCombine] Fold (x & ~y) \| y patterns Fold (x & ~y) \| y and its four commuted variants to x \| y. This pattern can in particular appear when a vselect c, x, -1 is expanded to (x & ~c) \| (-1 & c) and combined to (x & ~c) \| c. This change has some overlap with D59066, which avoids creating a vselect of this form in the first place during uaddsat expansion. Differential Revision: https://reviews.llvm.org/D59174 llvm-svn: 356333	2019-03-17 15:45:38 +00:00
Simon Pilgrim	3b0a6c69ee	[DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097) rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef. llvm-svn: 356327	2019-03-16 17:36:26 +00:00
Nirav Dave	ee5183c796	[DAGCombiner] Fix Comment. NFC. llvm-svn: 356069	2019-03-13 17:44:40 +00:00
Nirav Dave	d6351340bb	[DAGCombiner] If a TokenFactor would be merged into its user, consider the user later. Summary: A number of optimizations are inhibited by single-use TokenFactors not being merged into the TokenFactor using it. This makes us consider whether we can do the merge immediately. Most tests changes here are due to the change in visitation causing minor reorderings and associated reassociation of paired memory operations. CodeGen tests with non-reordering changes: X86/aligned-variadic.ll -- memory-based add folded into stored leaq value. X86/constant-combiners.ll -- Optimizes out overlap between stores. X86/pr40631_deadstore_elision -- folds constant byte store into preceding quad word constant store. Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet Reviewed By: courbet Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59260 llvm-svn: 356068	2019-03-13 17:07:09 +00:00
Clement Courbet	3bb5d0bb9b	Re-land r354244 "[DAGCombiner] Eliminate dead stores to stack." Always check candidates for hasOtherUses(), not only stores. llvm-svn: 356050	2019-03-13 13:56:23 +00:00
Simon Pilgrim	9f0a5ca843	[DAGCombine] Pull out repeated demanded bitmask generation. NFCI. llvm-svn: 355932	2019-03-12 15:58:28 +00:00
Nikita Popov	aa7cfa75f9	[SDAG][AArch64] Legalize VECREDUCE Fixes https://bugs.llvm.org/show_bug.cgi?id=36796. Implement basic legalizations (PromoteIntRes, PromoteIntOp, ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes. There are more legalizations missing (esp float legalizations), but there's no way to test them right now, so I'm not adding them. This also includes a few more changes to make this work somewhat reasonably: * Add support for expanding VECREDUCE in SDAG. Usually experimental.vector.reduce is expanded prior to codegen, but if the target does have native vector reduce, it may of course still be necessary to expand due to legalization issues. This uses a shuffle reduction if possible, followed by a naive scalar reduction. * Allow the result type of integer VECREDUCE to be larger than the vector element type. For example we need to be able to reduce a v8i8 into a (nominally) i32 result type on AArch64. * Use the vector operand type rather than the scalar result type to determine the action, so we can control exactly which vector types are supported. Also change the legalize vector op code to handle operations that only have vector operands, but no vector results, as is the case for VECREDUCE. * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE), explicitly specify for which vector types the reductions are supported. This does not handle anything related to VECREDUCE_STRICT_*. Differential Revision: https://reviews.llvm.org/D58015 llvm-svn: 355860	2019-03-11 20:22:13 +00:00
Amaury Sechet	a135fd5562	Remove redundant extractBooleanFlip argument. NFC llvm-svn: 355794	2019-03-11 00:37:01 +00:00
Amaury Sechet	b62642a115	Refactor isBooleanFlip into extractBooleanFlip so that users do not depend on the pattern matched. NFC llvm-svn: 355769	2019-03-09 02:51:52 +00:00
Amaury Sechet	782ac933b5	[DAGCombiner] fold (add (add (xor a, -1), b), 1) -> (sub b, a) Summary: This pattern is sometimes created after legalization. Reviewers: efriedma, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58874 llvm-svn: 355716	2019-03-08 19:39:32 +00:00
Simon Pilgrim	04e8439f72	[DAGCombine] Merge visitSMULO+visitUMULO into visitMULO. NFCI. llvm-svn: 355690	2019-03-08 11:41:18 +00:00
Simon Pilgrim	c71d6d157f	[DAGCombine] Merge visitSADDO+visitUADDO into visitADDO. NFCI. llvm-svn: 355689	2019-03-08 11:30:33 +00:00
Simon Pilgrim	2c2e76a9e2	[DAGCombine] Merge visitSSUBO+visitUSUBO into visitSUBO. NFCI. llvm-svn: 355688	2019-03-08 11:16:55 +00:00
Simon Pilgrim	9d6347cfc1	[DAGCombine] Improve select (not Cond), N1, N2 -> select Cond, N2, N1 fold Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well. Requested by @spatel on D59006 llvm-svn: 355533	2019-03-06 18:52:52 +00:00
Simon Pilgrim	cdf95f8f07	[DAGCombiner] Enable UADDO/USUBO vector combine support Differential Revision: https://reviews.llvm.org/D58965 llvm-svn: 355517	2019-03-06 16:11:03 +00:00
Simon Pilgrim	1bdc2d1874	[DAGCombiner] Add SADDO/SSUBO combine support Basic constant handling folds, for both scalars and vectors Differential Revision: https://reviews.llvm.org/D58967 llvm-svn: 355506	2019-03-06 14:22:21 +00:00
Simon Pilgrim	642f53d292	[DAGCombiner] Enable SMULO/UMULO vector combine support (PR40442) Differential Revision: https://reviews.llvm.org/D58968 llvm-svn: 355495	2019-03-06 11:04:21 +00:00
Craig Topper	509a8a3cf1	[DAGCombiner][X86][SystemZ][AArch64] Combine some cases of (bitcast (build_vector constants)) between legalize types and legalize dag. This patch enables combining integer bitcasts of integer build vectors when the new scalar type is legal. I've avoided floating point because the implementation bitcasts float to int along the way and we would need to check the intermediate types for legality Differential Revision: https://reviews.llvm.org/D58884 llvm-svn: 355324	2019-03-04 19:12:16 +00:00
Nirav Dave	582d46328c	[DAG] Fix constant store folding to handle non-byte sizes. Avoid crashes from zero-byte values due to sub-byte store sizes. Reviewers: uabelho, courbet, rnk Reviewed By: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58626 llvm-svn: 354884	2019-02-26 15:02:32 +00:00
Andrea Di Biagio	4a1e59a6e0	Fix a sign compare warning breaking the -Werror build. The warning was introduced at r354793. llvm-svn: 354810	2019-02-25 19:33:58 +00:00
Simon Pilgrim	28441ac75f	[DAGCombine] Add undef shuffle elt support to partitionShuffleOfConcats Support undef shuffle mask indices in the shuffle(concat_vectors, concat_vectors) -> concat_vectors fold Differential Revision: https://reviews.llvm.org/D58585 llvm-svn: 354793	2019-02-25 16:02:01 +00:00
Jordan Rupprecht	6387fa2715	[NFC] Fix typos: preceeding -> preceding llvm-svn: 354715	2019-02-23 01:28:32 +00:00
Nirav Dave	46f939c118	Disable big-endian constant store merges from rL354676. llvm-svn: 354677	2019-02-22 16:20:34 +00:00
Nirav Dave	44037d7a63	[DAGCombine] Fold overlapping constant stores Fold a smaller constant store into larger constant stores immediately preceding it. Reviewers: rnk, courbet Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58468 llvm-svn: 354676	2019-02-22 16:00:19 +00:00
Sanjay Patel	ba5ee817e9	[DAGCombiner] prevent infinite looping by truncating 'and' (PR40793) This fold can occur during legalization, so it can fight with promotion to the larger type. It apparently takes a special sequence and subtarget to avoid more basic simplifications that would hide the problem. But there's a bigger question raised here: why does distributeTruncateThroughAnd() even exist? It duplicates functionality from a more minimal pattern that we already have. But getting rid of this function requires some preliminary steps. https://bugs.llvm.org/show_bug.cgi?id=40793 llvm-svn: 354594	2019-02-21 16:01:48 +00:00
Nirav Dave	48cf37b55c	[DAGCombine] Generalize Dead Store to overlapping stores. Summary: Remove stores that are immediately overwritten by larger stores. Reviewers: courbet, rnk Reviewed By: rnk Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58467 llvm-svn: 354518	2019-02-20 21:07:50 +00:00
Clement Courbet	62b3b91ab2	Re-land the refactoring part of r354244 "[DAGCombiner] Eliminate dead stores to stack." This is an NFC. llvm-svn: 354476	2019-02-20 15:45:58 +00:00
Clement Courbet	292291fb90	Revert r354244 "[DAGCombiner] Eliminate dead stores to stack." Breaks some bots. llvm-svn: 354245	2019-02-18 08:24:29 +00:00
Clement Courbet	57f34dbd3e	[DAGCombiner] Eliminate dead stores to stack. Summary: A store to an object whose lifetime is about to end can be removed. See PR40550 for motivation. Reviewers: niravd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57541 llvm-svn: 354244	2019-02-18 07:59:01 +00:00
Sanjay Patel	86fac11d5a	[DAGCombiner] convert logic-of-setcc into bit magic (PR40611) If we're comparing some value for equality against 2 constants and those constants have an absolute difference of just 1 bit, then we can offset and mask off that 1 bit and reduce to a single compare against zero: and/or (setcc X, C0, ne), (setcc X, C1, ne/eq) --> setcc ((add X, -C1), ~(C0 - C1)), 0, ne/eq https://rise4fun.com/Alive/XslKj This transform is disabled by default using a TLI hook ("convertSetCCLogicToBitwiseLogic()"). That should be overridden for AArch64, MIPS, Sparc and possibly others based on the asm shown in: https://bugs.llvm.org/show_bug.cgi?id=40611 llvm-svn: 353859	2019-02-12 17:07:47 +00:00
Benjamin Kramer	711950c116	Move some classes into anonymous namespaces. NFC. llvm-svn: 353710	2019-02-11 15:16:21 +00:00
Simon Pilgrim	c5744d4d69	[DAG] Add optional AllowUndefs to isNullOrNullSplat No change in default behaviour (AllowUndefs = false) llvm-svn: 353646	2019-02-10 17:42:15 +00:00
Simon Pilgrim	5a82a788a2	[DAGCombine] Simplify funnel shifts with undef/zero args to bitshifts Now that we have SimplifyDemandedBits support for funnel shifts (rL353539), we need to simplify funnel shifts back to bitshifts in cases where either argument has been folded to undef/zero. Differential Revision: https://reviews.llvm.org/D58009 llvm-svn: 353645	2019-02-10 17:04:00 +00:00
Nemanja Ivanovic	92a8c36735	[DAGCombine] Optimize pow(X, 0.75) to sqrt(X) * sqrt(sqrt(X)) The sqrt case is faster and we already do this for the case where the exponent is 0.25. This adds the 0.75 case which is also not sensitive to signed zeros. Patch by Whitney Tsang (Whitney) Differential revision: https://reviews.llvm.org/D57434 llvm-svn: 353557	2019-02-08 19:50:58 +00:00
Simon Pilgrim	478bb90779	[TargetLowering] Add SimplifyDemandedBits funnel shift support llvm-svn: 353539	2019-02-08 17:19:01 +00:00
Nirav Dave	97011ccce0	Revert r353416 "[DAG] Cleanup unused nodes on failed store-to-load forward combine." This cleanup causes out-of-tree crashes. llvm-svn: 353527	2019-02-08 15:21:13 +00:00
Simon Pilgrim	fe3ac70b18	[DAGCombiner] (add (umax X, C), -C) --> (usubsat X, C) (PR40111) Move the (add (umax X, C), -C) --> (usubsat X, C) X86 combine into generic DAGCombiner First of a number of saturated arithmetic folds that can be moved out of X86-specific code for PR40111. Differential Revision: https://reviews.llvm.org/D57754 llvm-svn: 353457	2019-02-07 20:14:43 +00:00
Nirav Dave	9332fc2e19	Revert "[DAG] Cleanup of unused node in SimplifySelectCC." Causes ASAN use-after-poison errors. llvm-svn: 353442	2019-02-07 18:31:05 +00:00
Sanjay Patel	2d4b186844	[DAGCombiner] fold add/sub with bool operand based on target's boolean contents I noticed that we are missing this canonicalization in IR: rL352515 ...and then realized that we don't get this right in SDAG either, so this has to be fixed first regardless of what we choose to do in IR. The existing fold was limited to scalars and using the wrong predicate to guard the transform. We have a boolean contents TLI query that can be used to decide which direction to fold. This may eventually lead back to the problems/question in: https://bugs.llvm.org/show_bug.cgi?id=40486 ...but it makes no difference to that yet. Differential Revision: https://reviews.llvm.org/D57401 llvm-svn: 353433	2019-02-07 17:43:34 +00:00
Nirav Dave	24e60819f6	[DAG] Cleanup of unused node in SimplifySelectCC. llvm-svn: 353428	2019-02-07 17:13:55 +00:00
Nirav Dave	4b12236f7d	[DAG] Cleanup unused node on failed SELECT Combine. llvm-svn: 353426	2019-02-07 16:57:50 +00:00
Nirav Dave	724b81087d	[DAG] Cleanup unused nodes on failed store-to-load forward combine. llvm-svn: 353416	2019-02-07 15:38:14 +00:00
Nirav Dave	b3506bf985	[DAG] Immediately cleanup unused nodes from extend-based combines. llvm-svn: 353338	2019-02-06 20:12:03 +00:00
Clement Courbet	5a6712b633	[DAGCombine][NFC] GatherAllAliases should take a LSBaseSDNode. GatherAllAliases only makes sense for LSBaseSDNode. Enforce it with static typing instead of runtime cast. llvm-svn: 353291	2019-02-06 12:36:17 +00:00
Craig Topper	d4e37afe45	[DAGCombiner] Discard pointer info when combining extract_vector_elt of a vector load when the index isn't constant Summary: If the index isn't constant, this transform inserts a multiply and an add on the index to calculate the base pointer for a scalar load. But we still create a memory operand with an offset of 0 and the size of the scalar access. But the access is really to an unknown offset within the original access size. This can cause the machine scheduler to incorrectly calculate dependencies between this load and other accesses. In the case we saw, there was a 32 byte vector store that was split into two 16 byte stores, one with offset 0 and one with offset 16. The size of the memory operand for both was 16. The scheduler correctly detected the alias with the offset 0 store, but not the offset 16 store. This patch discards the pointer info so we don't incorrectly detect aliasing. I wasn't sure if we could keep using the original offset and size without risking some other transform on the load changing the size. I tried to reduce a test case, but there's still a lot of memory operations needed to get the scheduler to do the bad reordering. So it looked pretty fragile to maintain. Reviewers: efriedma Reviewed By: efriedma Subscribers: arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57616 llvm-svn: 353124	2019-02-05 00:22:23 +00:00
Simon Pilgrim	a536b89fe0	[DAGCombine] Add ADD(SUB,SUB) combines Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case. We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage. llvm-svn: 353044	2019-02-04 13:44:49 +00:00
Simon Pilgrim	bd42f97946	[SDAG] Add SDNode/SDValue getConstantOperandAPInt helper. NFCI. We already have the getConstantOperandVal helper which returns a uint64_t, but along comes the fuzzer and inserts an i128 -1 constant or something and the whole thing asserts....... I've updated a few obvious cases, and tried to make use of the const reference where possible, but there's more to do. A number of existing oss-fuzz tickets should be fixed if we start using APInt and perform value clamping where necessary. llvm-svn: 352961	2019-02-02 17:35:06 +00:00
Guozhi Wei	0bed9e0453	[DAGCombine] Avoid CombineZExtLogicopShiftLoad if there is free ZEXT This patch fixes pr39098. For the attached test case, CombineZExtLogicopShiftLoad can optimize it to t25: i64 = Constant<1099511627775> t35: i64 = Constant<0> t0: ch = EntryToken t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64 t58: i64 = srl t57, Constant:i8<1> t60: i64 = and t58, Constant:i64<524287> t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64 But later visitANDLike transforms it to t25: i64 = Constant<1099511627775> t35: i64 = Constant<0> t0: ch = EntryToken t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64 t61: i32 = truncate t57 t63: i32 = srl t61, Constant:i8<1> t64: i32 = and t63, Constant:i32<524287> t65: i64 = zero_extend t64 t58: i64 = srl t57, Constant:i8<1> t60: i64 = and t58, Constant:i64<524287> t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64 And it triggers CombineZExtLogicopShiftLoad again, causing a dead loop. Both forms should generate the same instructions; the IR generated by CombineZExtLogicopShiftLoad looks cleaner. But it looks more difficult to prevent visitANDLike from doing the transform, so I prevent CombineZExtLogicopShiftLoad from doing the transform if the ZExt is free. Differential Revision: https://reviews.llvm.org/D57491 llvm-svn: 352792	2019-01-31 20:46:42 +00:00
Nirav Dave	4061b44057	[DAG] Aggressively cleanup dangling node in CombineZExtLogicopShiftLoad. While dangling nodes will eventually be pruned when they are considered, leaving them disables combines requiring single-use. Reviewers: Carrot, spatel, craig.topper, RKSimon, efriedma Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D57520 llvm-svn: 352784	2019-01-31 19:35:14 +00:00
Sanjay Patel	9ab23101a8	[DAGCombiner] sub X, 0/1 --> add X, 0/-1 This extends the existing transform for: add X, 0/1 --> sub X, 0/-1 ...to allow the sibling subtraction fold. This pattern could regress with the proposed change in D57401. llvm-svn: 352680	2019-01-30 22:41:35 +00:00
Sanjay Patel	a61d586f74	[DAGCombiner] fold extract_subvector of extract_subvector This is the sibling fold for insert-of-insert that was added with D56604. Now that we have x86 shuffle narrowing (D57156), this change shows improvements for lots of AVX512 reduction code (not sure that we would ever expect extract-of-extract otherwise). There's a small regression in some of the partial-permute tests (extracting followed by splat). That is tracked by PR40500: https://bugs.llvm.org/show_bug.cgi?id=40500 Differential Revision: https://reviews.llvm.org/D57336 llvm-svn: 352528	2019-01-29 19:13:39 +00:00
Michael Berg	685d5f675e	[NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target control llvm-svn: 352396	2019-01-28 18:03:08 +00:00
Sam Parker	9a2a89d58f	[DAGCombine] Enable more pre-indexed stores The current check in CombineToPreIndexedLoadStore is too conservative, preventing a pre-indexed store when the base pointer is a predecessor of the value being stored. Instead, we should check the pointer operand of the store. Differential Revision: https://reviews.llvm.org/D56719 llvm-svn: 351933	2019-01-23 09:11:49 +00:00
Sanjay Patel	effee52c59	[DAGCombiner] narrow vector binop with 2 insert subvector operands vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z This is another step in generic vector narrowing. It's also a step towards more horizontal op formation specifically for x86 (although we still failed to match those in the affected tests). The scalarization cases are also not optimal (we should be scalarizing those), but it's still an improvement to use a narrower vector op when we know part of the result must be constant because both inputs are undef in some vector lanes. I think a similar match but checking for a constant operand might help some of the cases in D51553. Differential Revision: https://reviews.llvm.org/D56875 llvm-svn: 351825	2019-01-22 14:24:13 +00:00
Sanjay Patel	e713c47d49	[DAGCombiner] fix crash when converting build vector to shuffle The regression test is reduced from the example shown in D56281. This does raise a question as noted in the test file: do we want to handle this pattern? I don't have a motivating example for that on x86 yet, but it seems like we could have that pattern there too, so we could avoid the back-and-forth using a shuffle. llvm-svn: 351753	2019-01-21 17:30:14 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Florian Hahn	dc4e154720	[SelectionDAG] Split very large token factors for chained stores to 64k chunks. Similar to D55073. Without this change, the DAG combiner crashes on code with more than 64k of stores in a single basic block that form parallelizable chains. No test case, as it would be a very large IR file. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D56740 llvm-svn: 351571	2019-01-18 18:37:38 +00:00
Sam Parker	dd8cd6d26b	[DAGCombine] Fix ReduceLoadWidth for shifted offsets ReduceLoadWidth can trigger when a shifted mask is used and this requires that the function return a shl node to correct for the offset. However, the way that this was implemented meant that the returned result could be an existing node, which would be incorrect. This fixes the method of inserting the new node and replacing uses. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 351310	2019-01-16 08:40:12 +00:00
Sanjay Patel	fad5bdaf95	[DAGCombiner] reduce buildvec of zexted extracted element to shuffle The motivating case for this is shown in the first regression test. We are transferring to scalar and back rather than just zero-extending with 'vpmovzxdq'. That's a special-case for a more general pattern as shown here. In all tests, we're avoiding the vector-scalar-vector moves in favor of vector ops. We aren't producing optimal shuffle code in some cases though, so the patch is limited to reduce regressions. Differential Revision: https://reviews.llvm.org/D56281 llvm-svn: 351198	2019-01-15 16:11:05 +00:00
Simon Pilgrim	a1bd4a6ba4	[DAGCombiner] Add (sub_sat x, x) -> 0 combine llvm-svn: 351073	2019-01-14 15:43:34 +00:00
Simon Pilgrim	fa1f518748	[DAGCombiner] Enable sub saturation constant folding llvm-svn: 351072	2019-01-14 15:28:53 +00:00
Simon Pilgrim	7fc6882374	[DAGCombiner] Add add/sub saturation undef handling Match ConstantFolding.cpp: (add_sat x, undef) -> -1 (sub_sat x, undef) -> 0 llvm-svn: 351070	2019-01-14 14:16:24 +00:00
Simon Pilgrim	cfa5f06dde	[DAGCombiner] Enable add saturation constant folding llvm-svn: 351060	2019-01-14 12:34:31 +00:00
Simon Pilgrim	67610926fc	[DAGCombiner] Add add saturation constant folding tests. Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now. llvm-svn: 351057	2019-01-14 12:12:42 +00:00
Simon Pilgrim	56ba1db933	[DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y) NOTE: We need more powerful signed overflow detection in computeOverflowKind llvm-svn: 351026	2019-01-13 22:08:26 +00:00
Simon Pilgrim	888fa8680c	Fix unused variable warning. NFCI. llvm-svn: 351025	2019-01-13 21:53:12 +00:00
Simon Pilgrim	897d4c6fe9	[DAGCombiner] Some very basic add/sub saturation combines. Handle combines with zero and constant canonicalization for adds. llvm-svn: 351024	2019-01-13 21:50:24 +00:00
Sanjay Patel	625d5aef62	[DAGCombiner] fold insert_subvector of insert_subvector This pattern: t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0> t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0> ...shows up in PR33758: https://bugs.llvm.org/show_bug.cgi?id=33758 ...although this patch doesn't make any difference to the final result on that yet. In the affected tests here, it looks like it just makes RA wiggle. But we might as well squash this to prevent it interfering with other pattern-matching. Differential Revision: https://reviews.llvm.org/D56604 llvm-svn: 351008	2019-01-12 15:12:28 +00:00
Sanjay Patel	9b368f39a9	[DAGCombiner] simplify code; NFC llvm-svn: 350844	2019-01-10 16:47:42 +00:00
Sanjay Patel	9633d76a40	[DAGCombiner][x86] scalarize binop followed by extractelement As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354	2019-01-03 21:31:16 +00:00
Craig Topper	8dd7bd2cd7	[DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists. Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239	2019-01-02 18:19:07 +00:00
Craig Topper	c562fae02b	[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits then it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235	2019-01-02 17:58:27 +00:00
Craig Topper	802c4979ae	[DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform. Found while trying out some other changes so I don't really have a test case. llvm-svn: 350172	2018-12-31 05:40:46 +00:00
Sanjay Patel	93f1074677	[DAGCombiner] limit shuffle to extend transform (PR40146) It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034	2018-12-23 20:48:31 +00:00
Sanjay Patel	9933574ac3	[DAGCombiner] allow hoisting vector bitwise logic ahead of extends llvm-svn: 350032	2018-12-23 19:58:16 +00:00
Sanjay Patel	4b537aaf6d	[DAGCombiner] allow narrowing of add followed by truncate trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006	2018-12-22 17:10:31 +00:00
Sanjay Patel	47a6129e26	[DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC llvm-svn: 349958	2018-12-21 21:26:30 +00:00
Simon Pilgrim	911dce2f30	[SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349907	2018-12-21 14:56:18 +00:00
Eli Friedman	b1bbd5dca3	[ARM] Complete the Thumb1 shift+and->shift+shift transforms. This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857	2018-12-20 23:39:54 +00:00
Craig Topper	bd788ce5db	[DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand. llvm-svn: 349726	2018-12-20 05:28:06 +00:00
Simon Pilgrim	2ae3a91656	[SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2) Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases. Differential Revision: https://reviews.llvm.org/D55822 llvm-svn: 349629	2018-12-19 14:09:38 +00:00
Simon Pilgrim	6c95bea072	[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091) As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to at least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625	2018-12-19 13:37:59 +00:00
Sanjay Patel	f24900b934	[DAGCombiner] allow hoisting vector bitwise logic ahead of truncates The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision: https://reviews.llvm.org/D55448 llvm-svn: 349303	2018-12-16 14:57:04 +00:00
Simon Pilgrim	0ef977b83d	[SelectionDAG] Add FSHL/FSHR support to computeKnownBits Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type). llvm-svn: 349298	2018-12-16 13:33:37 +00:00
Craig Topper	257ce3871e	[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137	2018-12-14 08:28:24 +00:00
Sanjay Patel	093ab45d4c	[DAGCombiner] clean up visitEXTRACT_VECTOR_ELT This isn't quite NFC, but I don't know how to expose any outward diffs from these changes. Mostly, this was confusing because it used 'VT' to refer to the operand type rather than the usual type of the input node. There's also a large block at the end that is dedicated solely to matching loads, but that wasn't obvious. This could probably be split up into separate functions to make it easier to see. It's still not clear to me when we make certain transforms because the legality and constant conditions are intertwined in a way that might be improved. llvm-svn: 349095	2018-12-14 00:09:08 +00:00
Sanjay Patel	791ae69afe	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from number of uses to an opcode test for DELETED_NODE based on existing similar code. Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349058	2018-12-13 17:05:01 +00:00
Sanjay Patel	c56f5728ee	revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract This causes an address sanitizer bot failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio llvm-svn: 349056	2018-12-13 16:32:44 +00:00
Sanjay Patel	a7b115b392	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349051	2018-12-13 15:44:26 +00:00
Simon Pilgrim	ab973a45b9	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028	2018-12-13 12:23:32 +00:00
Simon Pilgrim	c73a955370	[DAGCombiner] Remove unnecessary recursive DAGCombiner::visitINSERT_SUBVECTOR call. As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now. llvm-svn: 348781	2018-12-10 18:18:50 +00:00
Francis Visoiu Mistrih	753efe3584	[DAGCombiner] Use the result value type in visitCONCAT_VECTORS This triggers an assert when combining concat_vectors of a bitcast of merge_values. With asserts disabled, it fails to select: fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8 0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1 0x7ff19d000b50: f64 = Register %1 In function: d Differential Revision: https://reviews.llvm.org/D55507 llvm-svn: 348759	2018-12-10 14:31:34 +00:00
Sanjay Patel	e767bf4468	[DAGCombiner] re-enable truncation of binops This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. llvm-svn: 348706	2018-12-08 16:07:38 +00:00
Sanjay Patel	bc47ff86fe	[DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC This duplicates several shared checks, but we need to split this up to fix underlying bugs in smaller steps. llvm-svn: 348627	2018-12-07 18:51:08 +00:00
Sanjay Patel	3af4ae9735	[DAGCombiner] disable truncation of binops by default As discussed in the post-commit thread of r347917, this transform is fighting with an existing transform causing an infinite loop or out-of-memory, so this is effectively reverting r347917 and its follow-up r348195 while we investigate the bug. llvm-svn: 348604	2018-12-07 15:47:52 +00:00
Sanjay Patel	bb796cd61c	[DAGCombiner] remove explicit calls to AddToWorkList; NFCI As noted in the post-commit thread for rL347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html ...we don't need to repeat these calls because the combiner does it automatically. llvm-svn: 348597	2018-12-07 15:00:56 +00:00
Sanjay Patel	c6441c8547	[DAGCombiner] use root SDLoc for all nodes created by logic fold If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement. llvm-svn: 348552	2018-12-07 00:01:57 +00:00
Sanjay Patel	86cb679851	[DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold. llvm-svn: 348550	2018-12-06 23:53:58 +00:00
Sanjay Patel	276cef343c	[DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC This code can still misbehave. llvm-svn: 348547	2018-12-06 23:39:28 +00:00
Sanjay Patel	70af85b0ac	[DAGCombiner] don't group bswap with casts in logic hoisting fold This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions. Earlier commits in this series: rL348501 rL348508 rL348518 llvm-svn: 348534	2018-12-06 22:10:44 +00:00
Sanjay Patel	03a3ef2a0c	[DAGCombiner] reduce indent; NFC Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place. llvm-svn: 348523	2018-12-06 20:02:47 +00:00
Andrea Di Biagio	52a2bac583	[DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef. This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes: concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A) This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case posted by Craig; that particular case would probably require a more complicated approach (and knowledge about used bits). Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64 , -mattr=+avx): vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vxorps %xmm1, %xmm1, %xmm1 vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3] vmovaps %ymm0, (%rsi) vzeroupper retq Now we generate this: vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero vmovaps %ymm0, (%rsi) vzeroupper retq As a side note: that VZEROUPPER is completely redundant... I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because the vzeroupper insertion pass is disabled. Differential Revision: https://reviews.llvm.org/D55274 llvm-svn: 348522	2018-12-06 19:55:38 +00:00
Sanjay Patel	bfc7ffa40f	[DAGCombiner] don't hoist logic op if operands have other uses, part 2 The PPC test with 2 extra uses seems clearly better by avoiding this transform. With 1 extra use, we also prevent an extra register move (although that might be an RA problem). The general rule should be to only make a change here if it is always profitable. The x86 diffs are all neutral. llvm-svn: 348518	2018-12-06 19:18:56 +00:00
Sanjay Patel	c3717cd0d5	[DAGCombiner] don't hoist logic op if operands have other uses The AVX512 diffs are neutral, but the bswap test shows a clear overreach in hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can increase the instruction count. This could also fight with transforms trying to go in the opposite direction and possibly blow up/infinite loop. This might be enough to solve the bug noted here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html I did not add the hasOneUse() checks to all opcodes because I see a perf regression for at least one opcode. We may decide that's irrelevant in the face of potential compiler crashing, but I'll see if I can salvage that first. llvm-svn: 348508	2018-12-06 18:16:32 +00:00
Sanjay Patel	e9bf78fa23	[DAGCombiner] refactor function that hoists bitwise logic; NFCI Added FIXME and TODO comments for lack of safety checks. This function is a suspect in out-of-memory errors as discussed in the follow-up thread to r347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html llvm-svn: 348501	2018-12-06 17:08:03 +00:00
Simon Pilgrim	105a366254	DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI. llvm-svn: 348494	2018-12-06 15:39:25 +00:00
Sanjay Patel	33a448f935	[DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893) Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893 llvm-svn: 348383	2018-12-05 17:10:30 +00:00
Simon Pilgrim	180639afe5	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467) This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353	2018-12-05 11:12:12 +00:00
Simon Pilgrim	666261cdc8	[TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246	2018-12-04 10:41:06 +00:00
Sanjay Patel	d24f63477d	[DAGCombiner] narrow truncated vector binops when legal This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195	2018-12-03 21:57:35 +00:00
Craig Topper	e35b01f8ea	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Sanjay Patel	2daceedf92	[DAGCombiner] guard against an oversized shift crash This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089	2018-12-02 13:33:56 +00:00
Sanjay Patel	8d27144251	[DAGCombiner] narrow truncated binops The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917	2018-11-29 20:58:26 +00:00
Sanjay Patel	7336e7c67a	[x86] limit transform for select-of-fp-constants This should likely be adjusted to limit this transform further, but these diffs should be clear wins. If we have blendv/conditional move, then we should assume those are cheap ops. The loads become independent of the compare, so those can be speculated before we need to use the values in the blend/mov. llvm-svn: 347526	2018-11-25 17:27:02 +00:00
Sanjay Patel	04435677d0	[SelectionDAG] move constant or splat functions to common location rL347502 moved the null sibling, so we should group all of these together. I'm not sure why these aren't methods of the SDValue class itself, but that's another patch if that's possible. llvm-svn: 347523	2018-11-25 16:09:32 +00:00
Sanjay Patel	7e119c0400	[DAG] consolidate shift simplifications ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502	2018-11-23 20:05:12 +00:00
Sanjay Patel	3e80019275	[DAGCombiner] form 'not' ops ahead of shifts (PR39657) We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478	2018-11-22 19:24:10 +00:00
Sanjay Patel	20935e0ab5	[DAGCombiner] refactor select-of-FP-constants transform This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load. llvm-svn: 347424	2018-11-21 20:54:47 +00:00
Sanjay Patel	1c74747478	[DAGCombiner] reduce code duplication; NFC llvm-svn: 347410	2018-11-21 20:00:32 +00:00
Sanjay Patel	357053f289	[DAGCombiner] look through bitcasts when trying to narrow vector binops This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553). The x86 test diffs are wins, but the AArch64 diff is probably not. That problem already exists independent of this patch (see PR39722), but it went unnoticed in the previous patch because there were no regression tests that showed the possibility. The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling concerns with using wider vector ops, an extra extract to reduce vector width is the right trade-off at this level of codegen. Differential Revision: https://reviews.llvm.org/D54392 llvm-svn: 347356	2018-11-20 22:26:35 +00:00
Simon Pilgrim	3735105961	[DAGCombine] Add calls to SimplifyDemandedVectorElts from visitINSERT_SUBVECTOR (PR37989) This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices. llvm-svn: 347313	2018-11-20 15:23:50 +00:00
Sanjay Patel	a36c444471	[DAGCombiner] reduce code duplication in visitXOR; NFC llvm-svn: 347278	2018-11-20 00:51:45 +00:00
Simon Pilgrim	740122fb8c	[DAGCombine] SimplifyNodeWithTwoResults - ensure same legalization for LO/HI operands (PR21207) Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs. Differential Revision: https://reviews.llvm.org/D53478 llvm-svn: 347255	2018-11-19 19:37:59 +00:00
Sanjay Patel	b25adf5edb	[SelectionDAG] simplify vector select with undef operand(s) llvm-svn: 347227	2018-11-19 17:06:05 +00:00
Sanjay Patel	c036d844be	[SelectionDAG] add simplifySelect() to reduce code duplication; NFC This should be extended to handle FP and vectors in follow-up patches. llvm-svn: 347210	2018-11-19 14:35:22 +00:00
Sanjay Patel	8c0cd77bff	[DAG] add undef simplifications for select nodes Sadly, this duplicates (twice) the logic from InstSimplify. There might be some way to at least share the DAG versions of the code, but copying the folds seems to be the standard method to ensure that we don't miss these folds. Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no way to ensure that we do these kinds of simplifications unless the code is repeated at node creation time and during combines. There were other tests that would become worthless with this improvement that I changed as pre-commits: rL347161 rL347164 rL347165 rL347166 rL347167 I'm not sure how to salvage the remaining tests (diffs in this patch). So the x86 tests verify that the new code is working as intended. The AMDGPU test is actually similar to my motivating case: we have some undef value that has survived to machine IR in an x86 test, and then it gets folded in some weird way, or we crash if we don't transfer the undef flag. But we would have been better off never getting to that point by doing these simplifications. This will lead back to PR32023 someday... https://bugs.llvm.org/show_bug.cgi?id=32023 llvm-svn: 347170	2018-11-18 17:36:23 +00:00
Stanislav Mekhanoshin	0ff7c8309d	DAG combiner: fold (select, C, X, undef) -> X Differential Revision: https://reviews.llvm.org/D54646 llvm-svn: 347110	2018-11-16 23:13:38 +00:00
Sam Parker	ab99cfab21	[DAGCombine] Fix non-deterministic debug output PR37970 reported non-deterministic debug output; this was caused by iterating through a set and not a vector. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37970 Differential Revision: https://reviews.llvm.org/D54570 llvm-svn: 347037	2018-11-16 08:35:19 +00:00
Craig Topper	0b33b468a1	[DAGCombiner] Enable tryToFoldExtendOfConstant to run after legalize vector ops It should be ok to create a new build_vector after legal operations so long as it doesn't cause an infinite loop in DAG combiner. Unfortunately, X86's custom constant folding in combineVSZext is hiding any test changes from this. But I'm trying to get to a point where that X86 specific code isn't necessary at all. Differential Revision: https://reviews.llvm.org/D54285 llvm-svn: 346728	2018-11-13 01:59:32 +00:00
Nirav Dave	a395e2df56	[DAGCombiner] Fix load-store forwarding of indexed loads. Summary: Handle extra output from indexed loads in cases where we wish to forward a load value directly from a preceding store. Fixes PR39571. Reviewers: peter.smith, rengolin Subscribers: javed.absar, hiraditya, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54265 llvm-svn: 346654	2018-11-12 14:05:40 +00:00
Craig Topper	d23cdbbeb2	[DAGCombiner] Make tryToFoldExtendOfConstant return an SDValue instead of an SDNode*. NFC Removes the need to call getNode internally and to recreate an SDValue after the call. llvm-svn: 346600	2018-11-10 23:46:03 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Craig Topper	9a7e19b8f2	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on a undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530	2018-11-09 18:04:34 +00:00
Alexandros Lamprineas	e15c982f6d	[SelectionDAG] swap select_cc operands to enable folding The DAGCombiner tries to SimplifySelectCC as follows: select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4) It can't cope with the situation of reordered operands: select_cc(x, y, 0, 16, cc) In that case we just need to swap the operands and invert the Condition Code: select_cc(x, y, 16, 0, ~cc) Differential Revision: https://reviews.llvm.org/D53236 llvm-svn: 346484	2018-11-09 11:09:40 +00:00
Nirav Dave	6ce9f72f76	[DAGCombine] Improve alias analysis for chain of independent stores. FindBetterNeighborChains simultaneously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelizes the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432	2018-11-08 19:14:20 +00:00
Craig Topper	8f2f2a76b9	[DAGCombiner] Use tryFoldToZero to simplify some code and make it work correctly between LegalTypes and LegalOperations. The original code avoided creating a zero vector after type legalization, but if we're after type legalization the type we have is legal. The real hazard we need to avoid is creating a build vector after op legalization. tryFoldToZero takes care of checking for this. llvm-svn: 346119	2018-11-05 05:53:06 +00:00
Craig Topper	8d64abddd1	[DAGCombiner] Remove an unused argument from tryFoldToZero. NFC llvm-svn: 346118	2018-11-05 05:53:03 +00:00
Craig Topper	3292ea03d3	[DAGCombiner] Remove 'else' after return. NFC This makes this code consistent with the nearly identical code in visitZERO_EXTEND. llvm-svn: 346090	2018-11-04 06:56:32 +00:00
Craig Topper	1ba86188cf	[SelectionDAG] Remove special methods for creating *_EXTEND_VECTOR_INREG nodes. Move asserts into getNode. These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers. The rest of the patch is just changing all callers to use getNode directly. llvm-svn: 346087	2018-11-04 02:10:18 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why it's included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Simon Pilgrim	cdcbeb4997	[DAGCombiner] Remove reduceBuildVecConvertToConvertBuildVec and rely on the vectorizers instead (PR35732) reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this. While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing. Differential Revision: https://reviews.llvm.org/D53712 llvm-svn: 345964	2018-11-02 11:06:18 +00:00
Craig Topper	e2483020f2	[DAGCombiner] Make the isTruncateOf call from visitZERO_EXTEND work for vectors. Remove FIXME. I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover. llvm-svn: 345908	2018-11-01 23:21:45 +00:00
Sanjay Patel	c5fe3ce2ec	[DAGCombiner] make sure we have a whole-number extract before trying to narrow a vector op (PR39511) The test causes a crash because we were trying to extract v4f32 to v3f32, and the narrowing factor was then 4/3 = 1 producing a bogus narrow type. This should fix: https://bugs.llvm.org/show_bug.cgi?id=39511 llvm-svn: 345842	2018-11-01 15:41:12 +00:00
David Bolvansky	d0080c3a5f	[DAGCombiner] Fold 0 div/rem X to 0 Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover Reviewed By: RKSimon Subscribers: craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52504 llvm-svn: 345721	2018-10-31 14:18:57 +00:00
Bjorn Pettersson	fe09a20f09	[DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad Summary: Normalize the offset for endianness before checking if the store covers the load in ForwardStoreValueToDirectLoad. Without this we missed out on some optimizations for big endian targets. If for example having a 4 bytes store followed by a 1 byte load, loading the least significant byte from the store, the STCoversLD check would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll). This patch also fixes a problem seen in an out-of-tree target. The target has i40 as a legal type, it is big endian, and the StoreSize for i40 is 48 bits. So when normalizing the offset for endianness we need to take the StoreSize into account (assuming that padding added when storing into a larger StoreSize always is added at the most significant end). Reviewers: niravd Reviewed By: niravd Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho Differential Revision: https://reviews.llvm.org/D53776 llvm-svn: 345636	2018-10-30 20:16:39 +00:00
Sanjay Patel	8b207defea	[DAGCombiner] narrow vector binops when extraction is cheap Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602	2018-10-30 14:14:34 +00:00
David Bolvansky	dfdbb038e8	[DAGCombiner] Improve X div/rem Y fold if single bit element type Summary: Tests by @spatel, thanks Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: sdardis, atanasyan, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D52668 llvm-svn: 345575	2018-10-30 09:07:22 +00:00
Craig Topper	c4b785ae1e	[DAGCombiner] Better constant vector support for FCOPYSIGN. Enable constant folding when both operands are vectors of constants. Turn into FNEG/FABS when the RHS is a splat constant vector. llvm-svn: 345469	2018-10-28 01:32:49 +00:00
Sanjay Patel	0eddd4730f	[DAGCombiner] rearrange code in narrowExtractedVectorBinOp(); NFC We can extend this code to handle many more cases if an extract is cheap, so prepping for that change. llvm-svn: 345430	2018-10-26 21:32:04 +00:00
Thomas Lively	30f1d69115	[NFC] Rename minnan and maxnan to minimum and maximum Summary: Changes all uses of minnan/maxnan to minimum/maximum globally. These names emphasize that the semantic difference between these operations is more than just NaN-propagation. Reviewers: arsenm, aheejin, dschuff, javed.absar Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53112 llvm-svn: 345218	2018-10-24 22:49:55 +00:00
Thomas Lively	43bc46207a	[SelectionDAG] DAG combiner for fminnan and fmaxnan Summary: Depends on D52765. Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52768 llvm-svn: 345210	2018-10-24 22:18:54 +00:00
Tim Northover	05fe8f918b	[DAG] check more operands for cycles when merging stores. Until now, we've only checked whether merging stores would cause a cycle via the value argument, but the address and indexed offset arguments are also capable of creating cycles in some situations. The addresses are all base+offset with notionally the same base, but the base SDNode may still be different (e.g. via an indexed load in one case, and an ISD::ADD elsewhere). This allows cycles to creep in if one of these sources depends on another. The indexed offset is usually undef (representing a non-indexed store), but on some architectures (e.g. 32-bit ARM-mode ARM) it can be an arbitrary value, again allowing dependency cycles to creep in. llvm-svn: 345200	2018-10-24 21:36:34 +00:00
Matthias Braun	4f82406c46	SelectionDAG: Reuse bigger sized constants in memset expansion. When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstant)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102	2018-10-23 23:19:23 +00:00
Craig Topper	c8e183f9ee	Recommit r344877 "[X86] Stop promoting integer loads to vXi64" I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile. Original commit message: Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping. I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D53306 llvm-svn: 344965	2018-10-22 22:14:05 +00:00
Matt Arsenault	687ec75d10	DAG: Change behavior of fminnum/fmaxnum nodes Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914	2018-10-22 16:27:27 +00:00
Sanjay Patel	e439cc2745	[DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016) This is a late backend subset of the IR transform added with: D52439 We can confirm that the conversion to a 'trunc' is correct by running: $ opt -instcombine -data-layout="e" (assuming the IR transforms are correct; change "e" to "E" for big-endian) As discussed in PR39016: https://bugs.llvm.org/show_bug.cgi?id=39016 ...the pattern may emerge during legalization, so that's why we are waiting for an insertelement to become a scalar_to_vector in the pattern matching here. The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments). The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage. Differential Revision: https://reviews.llvm.org/D53201 llvm-svn: 344872	2018-10-21 20:13:29 +00:00
Sanjay Patel	8bd74785f0	[DAGCombiner] allow undef elts in vector fmul matching llvm-svn: 344534	2018-10-15 16:54:07 +00:00
Sanjay Patel	89e2197c33	[DAGCombiner] refactor folds for fadd (fmul X, -2.0), Y; NFCI The transform doesn't work if the vector constant has undef elements. llvm-svn: 344532	2018-10-15 16:47:01 +00:00
Sanjay Patel	9e7e0fd828	[DAGCombiner] allow undef elts in vector fma matching llvm-svn: 344528	2018-10-15 15:56:39 +00:00
Sanjay Patel	4e970ff022	[DAGCombiner] allow undef elts in vector fma matching llvm-svn: 344525	2018-10-15 15:38:38 +00:00
Sanjay Patel	56b6660d2e	[DAGCombiner] rearrange extract_element+bitcast fold; NFC I want to add another pattern here that includes scalar_to_vector, so this makes that patch smaller. I was hoping to remove the hasOneUse() check because it shouldn't be necessary for common codegen, but an AMDGPU test has a comment suggesting that the extra check makes things better on one of those targets. llvm-svn: 344320	2018-10-11 23:56:56 +00:00
Nirav Dave	f1f2a2a31a	[DAG] Fix Big Endian in Load-Store forwarding Summary: Correct offset calculation in load-store forwarding for big-endian targets. Reviewers: rnk, RKSimon, waltl Subscribers: sdardis, nemanjai, hiraditya, jrtc27, atanasyan, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53147 llvm-svn: 344272	2018-10-11 18:28:59 +00:00
Sanjay Patel	4875662e57	[DAGCombiner] move comment closer to the corresponding code; NFC llvm-svn: 344255	2018-10-11 16:07:25 +00:00
Nirav Dave	07acc992dc	[DAGCombine] Improve Load-Store Forwarding Summary: Extend analysis forwarding loads from preceding stores to work with extended loads and truncated stores to the same address so long as the load is fully subsumed by the store. Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll tests are deleted as they no longer seem to be relevant. Reviewers: RKSimon, rnk, kparzysz, javed.absar Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D49200 llvm-svn: 344142	2018-10-10 14:15:52 +00:00
Nemanja Ivanovic	72d4866e57	[DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops We already do the following combines: (bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X (bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X When the target has "bit preserving fp logic". This patch just extends it to also combine: (bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X) As some targets have fnabs and even those that don't can efficiently lower both the fabs and the fneg. Differential revision: https://reviews.llvm.org/D44548 llvm-svn: 344093	2018-10-09 23:20:11 +00:00
Sanjay Patel	b64c0d7b53	[DAGCombiner] simplify code for fmul with constant fold; NFCI llvm-svn: 343997	2018-10-08 21:17:20 +00:00
Sanjay Patel	ecc8af61e7	[DAGCombiner] allow undef elts in vector fadd matching llvm-svn: 343945	2018-10-07 16:30:42 +00:00
Sanjay Patel	ef76e27985	[DAGCombiner] allow undefs when matching vector splats for fmul folds llvm-svn: 343942	2018-10-07 16:05:37 +00:00
Sanjay Patel	0b74c840dd	[DAGCombiner] allow undef elts in vector fabs/fneg matching This change is proposed as a part of D44548, but we need this independently to avoid regressions from improved undef propagation in SimplifyDemandedVectorElts(). llvm-svn: 343940	2018-10-07 15:32:06 +00:00
Sanjay Patel	46a9dc2e3e	[DAGCombiner] shorten code for bitcast+fabs fold; NFC llvm-svn: 343939	2018-10-07 15:18:30 +00:00
Sanjay Patel	f6a160a102	[SelectionDAG] allow undefs when matching splat constants And use that to transform fsub with zero constant operands. The integer part isn't used yet, but it is proposed for use in D44548, so adding both enhancements here makes that patch simpler. llvm-svn: 343865	2018-10-05 17:42:19 +00:00
Matthias Braun	004fe6bf83	DAGCombiner: StoreMerging: Fix bad index calculating when adjusting mismatching vector types This fixes a case of bad index calculation when merging mismatching vector types. This changes the existing code to just use the existing extract_{subvector|element} and a bitcast (instead of bitcast first and then newly created extract_xxx) so we don't need to adjust any indices in the first place. rdar://44584718 Differential Revision: https://reviews.llvm.org/D52681 llvm-svn: 343493	2018-10-01 16:25:50 +00:00
Simon Pilgrim	818cfc40ff	[DAG] Don't perform SINT_TO_FP<->UINT_TO_FP custom conversion after legalization The SINT_TO_FP<->UINT_TO_FP combines for non-negative integers should only occur for legal ops once LegalOperations = true No test case to hand, noticed when investigating PR38226 + PR38970 llvm-svn: 343405	2018-09-30 12:46:42 +00:00
David Bolvansky	8e90bad63d	[DAGCombiner] [NFC] Improve X div/rem 1 fold Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52661 llvm-svn: 343349	2018-09-28 18:40:30 +00:00
Fangrui Song	0cac726a00	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...) Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163	2018-09-27 02:13:45 +00:00
Craig Topper	b2a00acb24	[DAGCombiner] Remove unnecessary check for visitSDIVLike/visitUDIVLike returning a UDIVREM or SDIVREM node. This shouldn't be possible and is a leftover from when we used to recursively call combine here. llvm-svn: 343049	2018-09-25 23:52:07 +00:00
Nirav Dave	f445a67be4	[DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI. Reuse search space bookkeeping across multiple predecessor checks qdone to avoid redundancy. This should cut search cost by ~4x. llvm-svn: 342984	2018-09-25 15:29:30 +00:00
Nirav Dave	7373d5e646	[DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. NFCI. llvm-svn: 342983	2018-09-25 15:29:04 +00:00
Nirav Dave	46ab89a0d0	[DAGCombine] Don't fold dependent loads across SELECT_CC. DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC after the select, resulting in a select of an address and a single load after. If either of the loads depend on the other, this is not legal as it could introduce cycles. However, it only checked this if the opcode was a SELECT, and not for a SELECT_CC. Unfortunately, the only reproducer I have for this is for our downstream target. I've tried getting it to trigger on an upstream one but haven't been successful. Patch thanks to Bevin Hansson. llvm-svn: 342980	2018-09-25 14:43:05 +00:00
Sanjay Patel	2c901742ca	[DAGCombiner] use UADDO to optimize saturated unsigned add This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886	2018-09-24 14:47:15 +00:00
Hans Wennborg	83d15dfe2d	Remove debug printf leftover from r342397 llvm-svn: 342863	2018-09-24 08:18:47 +00:00
Craig Topper	5bef27e808	[DAGCombiner] Remove some dead code from ConstantFoldBITCASTofBUILD_VECTOR This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015. llvm-svn: 342856	2018-09-24 02:03:11 +00:00
Craig Topper	b3b94a8e8b	[DAGCombiner] Clarify a comment. NFC This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed. llvm-svn: 342851	2018-09-23 21:17:56 +00:00
Sanjay Patel	0027946915	[DAGCombiner][x86] extend decompose of integer multiply into shift/add with negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844	2018-09-23 18:41:38 +00:00
Craig Topper	81f67f7afb	[DAGCombiner] Simplify some code in visitBITCAST. NFCI llvm-svn: 342826	2018-09-22 23:12:34 +00:00
Craig Topper	e79a588cac	[DAGCombiner] Rewrite r331896 in a different way to address a FIXME. NFCI llvm-svn: 342809	2018-09-22 18:03:14 +00:00
Sanjay Patel	8a1227ccc8	[SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant. Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. No functional change intended. I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps. Differential Revision: https://reviews.llvm.org/D52285 llvm-svn: 342669	2018-09-20 17:34:08 +00:00
Sanjay Patel	fdc0de19cb	[SelectionDAG] allow vector types with isBitwiseNot() The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits, and the test diff in add.ll is from a DAGCombiner transform. llvm-svn: 342594	2018-09-19 21:48:30 +00:00
Sanjay Patel	4fd2e2a498	[DAGCombiner][x86] add transform/hook to decompose integer multiply into shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554	2018-09-19 15:57:40 +00:00
Amara Emerson	91c2913522	Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index."" Fixed the assertion failure. llvm-svn: 342397	2018-09-17 14:40:13 +00:00
Sanjay Patel	3eaf500a6d	[DAGCombiner] try to convert pow(x, 1/3) to cbrt(x) This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64." I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms. For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp: // * cbrt(expN(X)) -> expN(x/3) // * cbrt(sqrt(x)) -> pow(x,1/6) // * cbrt(cbrt(x)) -> pow(x,1/9) Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()). Differential Revision: https://reviews.llvm.org/D51753 llvm-svn: 342348	2018-09-16 16:50:26 +00:00
Reid Kleckner	4d1b75c6b7	Revert r342183 "[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index." Causes 'isVector() && "Invalid vector type!"' assertion when building Skia in Chrome. llvm-svn: 342265	2018-09-14 19:39:40 +00:00
Amara Emerson	ef600cbd86	[DAGCombine] Fix crash when store merging created an extract_subvector with invalid index. Differential Revision: https://reviews.llvm.org/D51831 llvm-svn: 342183	2018-09-13 21:28:58 +00:00
Sanjay Patel	8a478b79dc	[DAGCombiner] improve formatting for select+setcc code; NFC llvm-svn: 342095	2018-09-12 23:03:50 +00:00
Simon Pilgrim	96d6b9c2e2	[DAGCombiner] foldBitcastedFPLogic - Add basic vector support Add support for bitcasts from float type to an integer type of the same element bitwidth. There may be cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet. llvm-svn: 341652	2018-09-07 12:13:45 +00:00
Sanjay Patel	dbf52837fe	[DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x)) This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code. This is the basic case. Some potential enhancements are in the TODO comments: 1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper). 2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF. 3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right). Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests). Differential Revision: https://reviews.llvm.org/D51630 llvm-svn: 341481	2018-09-05 17:01:56 +00:00
Craig Topper	6666861158	[DAGCombiner] Fix bad indentation. NFC llvm-svn: 341103	2018-08-30 19:35:40 +00:00
Simon Pilgrim	b49d5f3b53	[DAGCombiner] Add X / X -> 1 & X % X -> 0 folds Adds more divrem folds to try and get in sync with InstructionSimplify Differential Revision: https://reviews.llvm.org/D50636 llvm-svn: 340919	2018-08-29 11:30:16 +00:00
Nirav Dave	11e39fb6fb	[DAGCombine] Rework MERGE_VALUES to inline in single pass. NFCI. Avoid hyperlinear cost of inlining MERGE_VALUE node by constructing temporary vector and doing a single replacement. llvm-svn: 340853	2018-08-28 18:13:26 +00:00
Craig Topper	c7506b28c1	[DAGCombiner][AMDGPU][Mips] Fold bitcast with volatile loads if the resulting load is legal for the target. Summary: I'm not sure if this patch is correct or if it needs more qualifying somehow. Bitcast shouldn't change the size of the load so it should be ok? We already do something similar for stores. We'll change the type of a volatile store if the resulting store is Legal or Custom. I'm not sure we should be allowing Custom there... I was playing around with converting X86 atomic loads/stores(except seq_cst) into regular volatile loads and stores during lowering. This would allow some special RMW isel patterns in X86InstrCompiler.td to be removed. But there's some floating point patterns in there that didn't work because we don't fold (f64 (bitconvert (i64 volatile load))) or (f32 (bitconvert (i32 volatile load))). Reviewers: efriedma, atanasyan, arsenm Reviewed By: efriedma Subscribers: jvesely, arsenm, sdardis, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, arichardson, jrtc27, atanasyan, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50491 llvm-svn: 340797	2018-08-28 03:47:20 +00:00
Matt Arsenault	cea7c6969d	DAG: Check transformed type for forming fminnum/fmaxnum from vselect Follow up to r340655 to fix vector types which are split. llvm-svn: 340766	2018-08-27 18:11:31 +00:00
Sanjay Patel	f645927875	[SelectionDAG] add helper query for binops; NFC We will also use this in a planned enhancement for vector insertelement. llvm-svn: 340741	2018-08-27 14:20:15 +00:00
Sanjay Patel	113cac3b15	[SelectionDAG][x86] turn insertelement into undef with variable index into splat I noticed this along with the patterns in D51125, but when the index is variable, we don't convert insertelement into a build_vector. For x86, that means these get expanded at legalization time into the loading/spilling code that we see in the tests. I think it's always better to avoid going to memory on these, and we get the optimal 'broadcast' if it's available. I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have regression tests that would be affected (although I did not check what would happen in those cases). In the most basic cases shown here, AArch64 would probably do much better with a splat. Differential Revision: https://reviews.llvm.org/D51186 llvm-svn: 340705	2018-08-26 18:20:41 +00:00
Matt Arsenault	5b9ef39bdd	DAG: Allow matching fminnum/fmaxnum from vselect llvm-svn: 340655	2018-08-24 21:24:18 +00:00
Craig Topper	d8e91c3e8d	[DAGCombiner][Mips] Don't combine bitcast+store after LegalOperations when the store is volatile, if the resulting store isn't Legal Previously we allowed the store to be Custom. But without knowing for sure that the Custom handling won't split the store, we shouldn't convert a volatile store. We also probably shouldn't be creating a store that requires custom handling after LegalizeOps. This could lead to an infinite loop if the custom handling was to insert a bitcast. Though I guess isStoreBitCastBeneficial could be used to block such a loop. The test changes here are due to the volatile part of this. The stores in the test are all volatile and i32 stores are marked custom, so we are no longer converting them. This is related to D50491, where I was trying to allow some bitcasting of volatile loads. Differential Revision: https://reviews.llvm.org/D50578 llvm-svn: 340626	2018-08-24 17:48:25 +00:00
Sam Parker	597811e7a7	[DAGCombiner] Reduce load widths of shifted masks During combining, ReduceLoadWidth is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 340261	2018-08-21 10:26:59 +00:00
Craig Topper	cc5dbbf759	[DAGCombiner] Allow divide by constant optimization on opaque constants. Summary: I believe this restores the behavior we had before r339147. Fixes PR38622. Reviewers: RKSimon, chandlerc, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50936 llvm-svn: 340120	2018-08-18 05:52:42 +00:00
Simon Pilgrim	03e57521c0	[DAGCombiner] extractShiftForRotate - fix out of range shift issue Don't just check for negative shift amounts. Fixes OSS Fuzz #9935 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935 llvm-svn: 340015	2018-08-17 12:25:18 +00:00
Simon Pilgrim	5113b48798	[DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) folding Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly. Differential Revision: https://reviews.llvm.org/D35722 llvm-svn: 340010	2018-08-17 10:52:49 +00:00
Craig Topper	883ff69c93	[DAGCombiner] Don't reassociate operations that have the vector reduction flag set. When nodes are reassociated the vector-reduction flag gets lost. The test case here is what would happen if you had a sum of absolute differences loop that started with a non-zero but constant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree erasing the vector-reduction flag. Interestingly this moves constants the opposite direction of the reassociate IR pass. I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set. Differential Revision: https://reviews.llvm.org/D50827 llvm-svn: 339946	2018-08-16 21:54:05 +00:00
Simon Pilgrim	e8a906ba47	[DagCombiner] Don't bother adding to the work list if TLI.BuildSDIVPow2 failed. NFCI. Matches the code in BuildSDIV/BuildUDIV llvm-svn: 339757	2018-08-15 10:02:54 +00:00
Eli Friedman	0d12e90bf5	[ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly. Intentionally excluding nodes from the DAGCombine worklist is likely to lead to weird optimizations and infinite loops, so it's generally a bad idea. To avoid the infinite loops, fix DAGCombine to use the isDesirableToCommuteWithShift target hook before performing the transforms in question, and implement the target hook in the ARM backend to disable the transforms in question. Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a reduced testcase for that bug. But we should have sufficient test coverage for PerformSHLSimplify given that we're not playing weird tricks with the worklist. I can try to bugpoint it if necessary, though.) Differential Revision: https://reviews.llvm.org/D50667 llvm-svn: 339734	2018-08-14 22:10:25 +00:00
Nirav Dave	fbfe2ad9e0	[DAG] Avoid redundant chain traversal in store merge cycle check. NFCI. Patch by Henric Karlsson. llvm-svn: 339688	2018-08-14 16:20:43 +00:00
Simon Pilgrim	26e3d3f1c8	[DAGCombiner] simplifyDivRem - add comment describing divide by undef/zero combine. NFC. llvm-svn: 339561	2018-08-13 13:12:25 +00:00
Matt Arsenault	1201301b94	DAG: Check no-signed-zeros instead of unsafe-fp-math Addresses fixme, although this should still be checking individual operand flags. llvm-svn: 339525	2018-08-12 19:09:12 +00:00
Michael Berg	ca38254601	extend folding fsub/fadd to fneg for FMF Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold, adding reassociation, as it is the flag that most closely represents the translation. Reviewers: spatel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D50195 llvm-svn: 339357	2018-08-09 17:00:03 +00:00
Sanjay Patel	e47dc1a405	[DAGCombiner] loosen constraints for fsub+fadd fold isNegatibleForFree() should not matter here (as the test diffs show) because it's always a win to replace an fsub+fadd with fneg. The problem in D50195 persists because either (1) we are doing these folds in the wrong order or (2) we're missing another fold for fadd. llvm-svn: 339299	2018-08-08 23:04:43 +00:00
Sanjay Patel	e327266d45	[DAGCombiner] move fadd simplification ahead of other folds I don't know if it's possible to expose this diff in a test, but we should always try simplifications (no new nodes created) before more complicated transforms for efficiency (similar to what we do in IR). llvm-svn: 339298	2018-08-08 22:46:30 +00:00
Simon Pilgrim	4d4220fa2a	[DAG] DAGCombiner::visitSDIVLike - remove unnecessary isConstOrConstSplat call. NFCI. The isConstOrConstSplat result is only used in a ISD::matchUnaryPredicate call which can perform the equivalent iteration just as quickly. llvm-svn: 339262	2018-08-08 15:37:52 +00:00
Craig Topper	49ed49fcb1	[SelectionDAG] When splitting scatter nodes during DAGCombine, create a serial chain dependency. Scatter could have multiple identical indices. We need to maintain sequential order. We get this right in LegalizeVectorTypes, but not in this code. Differential Revision: https://reviews.llvm.org/D50374 llvm-svn: 339157	2018-08-07 17:35:02 +00:00
Simon Pilgrim	1bfadb0499	[DAG] Allow non-uniform constant vectors to call BuildSDIV This was missed in D50185. NFC until we add actual non-uniform support to BuildSDIV (similar BuildUDIV support in D49248) - for now it just early outs. llvm-svn: 339147	2018-08-07 14:50:39 +00:00
Simon Pilgrim	7e18938793	[TargetLowering] Add support for non-uniform vectors to BuildUDIV This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators. It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV. Differential Revision: https://reviews.llvm.org/D49248 llvm-svn: 339121	2018-08-07 09:51:34 +00:00
Craig Topper	9de1797c50	[SelectionDAG][X86] Rename MaskedLoadSDNode::getSrc0 to getPassThru. Src0 doesn't really convey any meaning to what the operand is. Passthru matches what's used in the documentation for the intrinsic this comes from. llvm-svn: 339101	2018-08-07 06:52:49 +00:00
Craig Topper	17989208a9	[SelectionDAG][X86] Rename getValue to getPassThru for gather SDNodes. getValue is more meaningful name for scatter than it is for gather. Split them and use getPassThru for gather. llvm-svn: 339096	2018-08-07 06:13:40 +00:00
Simon Pilgrim	94112ebc75	[TargetLowering] Generalise BuildSDIV function First step towards a BuildSDIV equivalent to D49248 for non-uniform vector support - this just pushes the splat detection down into TargetLowering::BuildSDIV where it's still used. Differential Revision: https://reviews.llvm.org/D50185 llvm-svn: 338838	2018-08-03 10:00:54 +00:00
Craig Topper	2f60ef2c78	[DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to BuildSDIV/BuildUDIV/etc. The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation. llvm-svn: 338329	2018-07-30 23:22:00 +00:00
Sanjay Patel	9f807f44b1	[DAGCombiner] transform sub-of-shifted-signbit to add This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317	2018-07-30 22:21:37 +00:00
Craig Topper	a568a27dfa	[DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to BuildSDIVPow2. llvm-svn: 338303	2018-07-30 21:04:34 +00:00
Craig Topper	b94d5f853b	Revert r338222 "[DAGCombiner] Remove unnecessary calls to AddToWorklist." Thinking about it more it might be possible for the later nodes to be folded in getNode in such a way that the other created nodes are left dead. This can cause use counts to be incorrect on nodes that aren't dead. So it's probably safer to leave this alone. llvm-svn: 338298	2018-07-30 20:27:10 +00:00
Fangrui Song	f78650a8de	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293	2018-07-30 19:41:25 +00:00
David Bolvansky	2fa7fb14ea	[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed Summary: Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op. Patch by: sameconrad (Sam Conrad) Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar Reviewed By: lebedev.ri Subscribers: efriedma, kparzysz, llvm-commits Differential Revision: https://reviews.llvm.org/D47681 llvm-svn: 338270	2018-07-30 16:50:00 +00:00
Craig Topper	e978d2ee4a	[DAGCombiner] Remove unnecessary calls to AddToWorklist. The DAGCombiner has a mechanism for ensuring all nodes have been visited at least once. Every time a node is visited, it makes sure its operands have been in the worklist at least once. This ensures that when multiple nodes are created by a combine, only the last node needs to be returned. The earlier nodes can all be found through this operand check. This means we don't need to explicitly add nodes to the worklist when a combine creates multiple nodes. I've removed the most obvious cases here. There are probably more that can be removed. llvm-svn: 338222	2018-07-29 18:39:26 +00:00
Craig Topper	9db3573d3a	[SelectionDAG] Pass std::vector by reference instead of by pointer to BuildSDIV/BuildUDIV. This removes the need for an assert to ensure the pointer isn't null. Years ago we had ifs that checked the pointer was non-null before every access to the vector. These checks were removed and replaced with a single assert. But a reference seems more suitable here. llvm-svn: 338205	2018-07-28 19:44:20 +00:00
Craig Topper	50b1d4303d	[DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B) This can be useful since addition is commutable, and subtraction is not. This matches a transform that is also done by InstCombine. llvm-svn: 338181	2018-07-28 00:27:25 +00:00
Sanjay Patel	c7abb416dc	[DAGCombiner] fold 'not' with signbit math This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132	2018-07-27 16:42:55 +00:00
Craig Topper	8b5a2f7aac	[DAGCombiner] Remove some calls to AddToWorklist that should be unnecessary. The DAGCombiner has a system for ensuring all nodes are visited. It doesn't require an AddToWorkList for every node that is created by a combine. llvm-svn: 338079	2018-07-26 22:40:22 +00:00
Nirav Dave	25802ac9fd	[DAG] Avoid Node Update assertion due to AND simplification Check for construction-time folding for incomplete AND nodes in BackwardsPropagateMask. Fixes PR38185. Reviewers: RKSimon, samparker Reviewed By: samparker Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D49444 llvm-svn: 337563	2018-07-20 15:27:24 +00:00
Nirav Dave	5a4e11ad9c	[DAG] Fix Memory ordering check in ReduceLoadOpStore. When merging through a TokenFactor we need to check that the load may be ordered such that no other aliasing memory operations may happen. It is not sufficient to just check that the load is a member of the chain token factor as there may be an indirect chain. Require the load's chain has only one use. This fixes PR37826. Reviewers: spatel, davide, efriedma, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49388 llvm-svn: 337560	2018-07-20 15:20:50 +00:00
Craig Topper	d8734450a2	[DAGCombiner] Fold X - (-Y * Z) -> X + (Y * Z) llvm-svn: 337518	2018-07-20 01:40:03 +00:00
Craig Topper	c12c5d421f	[DAGCombiner] Teach DAGCombiner that A-(-B) is A+B. We already knew A+(-B) is A-B in visitAdd. This does the opposite for visitSub. llvm-svn: 337502	2018-07-19 22:24:43 +00:00
Simon Pilgrim	e4d12bb2d6	[DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258	2018-07-17 09:45:35 +00:00
Fangrui Song	cb0bab86b3	[CodeGen] Fix inconsistent declaration parameter name llvm-svn: 337200	2018-07-16 18:51:40 +00:00
Sanjay Patel	79a423cfa2	[DAGCombiner] fix typo in comment; NFC llvm-svn: 337132	2018-07-15 17:09:35 +00:00
Sanjay Patel	a41c886c55	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X) This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130	2018-07-15 16:27:07 +00:00
Diogo N. Sampaio	b0d85ef975	[NFC][InstCombine] Converts isLegalNarrowLoad into isLegalNarrowLdSt Reuse this function to test the correctness and profitability of reducing the width of either load or store operations. Reviewers: samparker Differential Revision: https://reviews.llvm.org/D48624 llvm-svn: 336800	2018-07-11 12:59:42 +00:00
Simon Pilgrim	075b04a55f	[SelectionDAG] Add constant buildvector support to isKnownNeverZero This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants). llvm-svn: 336779	2018-07-11 09:56:41 +00:00
Simon Pilgrim	df9d59771b	[DAGCombiner] Support non-uniform X%C -> X-(X/C)*C folds First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic. We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit. Differential Revision: https://reviews.llvm.org/D48975 llvm-svn: 336774	2018-07-11 09:22:42 +00:00
Simon Pilgrim	97cf111689	[DAGCombiner] Add (urem X, -1) -> select(X == -1, 0, x) fold llvm-svn: 336773	2018-07-11 09:14:37 +00:00
Simon Pilgrim	4cb4609392	[DAGCombiner] Add special case fast paths for udiv x,1 and udiv x,-1 udiv x,-1 was going down the (slow) BuildUDIV route resulting in unnecessary shifts. llvm-svn: 336701	2018-07-10 16:33:07 +00:00
Simon Pilgrim	641097d561	[DAGCombiner] visitREM - call visitSDIVLike/visitUDIVLike directly to avoid recursive combining. As suggested by @efriedma on D48975 use the visitSDIVLike/visitUDIVLike functions introduced at rL336656. llvm-svn: 336664	2018-07-10 13:18:16 +00:00
Simon Pilgrim	ce5c19b623	[DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the combines. NFCI. As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM. llvm-svn: 336656	2018-07-10 11:38:00 +00:00
Roman Lebedev	5ccae1750b	[X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts. Summary: This adds a reverse transform for the instcombine canonicalizations that were added in D47980, D47981. As discussed later, that was worse at least for the code size, and potentially for the performance, too. https://rise4fun.com/Alive/Zmpl Reviewers: craig.topper, RKSimon, spatel Reviewed By: spatel Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D48768 llvm-svn: 336585	2018-07-09 19:06:42 +00:00
Simon Pilgrim	c1d1944053	[DAGCombiner] Add EXTRACT_SUBVECTOR to SimplifyDemandedVectorElts As discussed on PR37989, this patch adds EXTRACT_SUBVECTOR handling to TargetLowering::SimplifyDemandedVectorElts and calls it from DAGCombiner::visitEXTRACT_SUBVECTOR. Differential Revision: https://reviews.llvm.org/D48825 llvm-svn: 336490	2018-07-07 17:30:06 +00:00
Nico Weber	038dbf3c24	Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084. llvm-svn: 336453	2018-07-06 17:37:24 +00:00
Diogo N. Sampaio	17be994942	Added missing semicolon llvm-svn: 336428	2018-07-06 10:09:04 +00:00
Diogo N. Sampaio	742bf1a255	[SelectionDAG] https://reviews.llvm.org/D48278 D48278 Allow to reduce redundant shift masks. For example: x1 = x & 0xAB00 x2 = (x >> 8) & 0xAB can be reduced to: x1 = x & 0xAB00 x2 = x1 >> 8 It only allows folding when the masks and shift values are constants. llvm-svn: 336426	2018-07-06 09:42:25 +00:00
Diogo N. Sampaio	734cfd11fe	Testing commit permission llvm-svn: 336384	2018-07-05 18:49:32 +00:00
Simon Pilgrim	74cc4cfa94	[DAGCombiner] visitSDIV - Permit MIN_SIGNED_VALUE in pow2 vector codegen Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code llvm-svn: 336198	2018-07-03 14:11:32 +00:00
Simon Pilgrim	fae337704e	[DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119) The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1, -1 is the only power of two value for which the sdiv expansion recipe breaks. Thanks to @zvi for the original patch. Differential Revision: https://reviews.llvm.org/D45806 llvm-svn: 336048	2018-06-30 12:22:55 +00:00
Simon Pilgrim	9c70d48cb2	[DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED) We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. Reapply original commit rL335821 which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884. llvm-svn: 335886	2018-06-28 17:33:41 +00:00
Haojian Wu	2103990e63	Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV" This reverts commit r335821. This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce. llvm-svn: 335871	2018-06-28 16:25:57 +00:00
Simon Pilgrim	abebe4c746	[DAGCombiner] Ensure we use the correct CC result type in visitSDIV We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. llvm-svn: 335821	2018-06-28 09:54:28 +00:00
Simon Pilgrim	49cb65bb7b	[DAGCombiner] Remove unused variable. NFCI. Noticed in D45806 review. llvm-svn: 335817	2018-06-28 09:29:08 +00:00
Nirav Dave	7c57ae57a8	[DAGCombine] Disable TokenFactor simplifications when optnone. llvm-svn: 335773	2018-06-27 19:41:25 +00:00
Sanjay Patel	d052de856d	[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761	2018-06-27 18:16:40 +00:00
Simon Pilgrim	d3e583a52d	[DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs llvm-svn: 335727	2018-06-27 12:45:31 +00:00
Simon Pilgrim	e835f662fa	[DAGCombiner] visitSDIV - simplify pow2 handling. NFCI. Use the builtin constant folding of getNode() etc. instead of doing it manually. llvm-svn: 335720	2018-06-27 10:51:55 +00:00
Simon Pilgrim	dfbcc66adc	[DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0) Fixes PR37569. llvm-svn: 335719	2018-06-27 10:21:06 +00:00
Simon Pilgrim	0a566bc0ae	[DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569) llvm-svn: 335717	2018-06-27 09:41:22 +00:00
Sanjay Patel	fb9c440ba5	[DAGCombiner] use isBitwiseNot to simplify code; NFC llvm-svn: 335652	2018-06-26 19:46:56 +00:00
Simon Pilgrim	7f55af37f4	[DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119) Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported. llvm-svn: 335637	2018-06-26 17:46:51 +00:00
Simon Pilgrim	133b1cdf08	[DAGCombiner] Pull out VT bitwidth in visitSDIV. NFCI. llvm-svn: 335617	2018-06-26 15:39:16 +00:00
Simon Pilgrim	5b6b500687	Fix -Wparentheses gcc warning. NFCI. llvm-svn: 335451	2018-06-25 11:19:05 +00:00
Sanjay Patel	962ee178fa	[DAGCombiner] eliminate setcc bool math when input is low-bit of some value This patch has the same motivating example as D48466: define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) { %c.0282 = and i32 %c.0282.in, 268435455 %a16 = lshr i64 32508, %x %a17 = and i64 %a16, 1 %tobool = icmp eq i64 %a17, 0 %. = select i1 %tobool, i32 1, i32 2 %.286 = select i1 %tobool, i32 27, i32 26 %shr97 = lshr i32 %c.0282, %. %shl98 = shl i32 %c.0282.in, %.286 %or99 = or i32 %shr97, %shl98 %shr100 = lshr i32 %d.0280, %. %shl101 = shl i32 %d.0280, %.286 %or102 = or i32 %shr100, %shl101 store i32 %or99, i32* %ptr0 store i32 %or102, i32* %ptr1 ret void } ...but I'm trying to kill the setcc bool math sooner rather than later. By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, we can create a universally good fold because we always eliminate the condition code intermediate value. Here are Alive proofs for these (currently instcombine folds the 'add' variants, but misses the 'sub' patterns): https://rise4fun.com/Alive/Gsyp Name: sub of zext cmp mask %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %z = zext i1 %c to i32 %r = sub i32 C1, %z => %optional_cast = zext i8 %a to i32 %r = add i32 %optional_cast, C1-1 Name: add of zext cmp mask %a = and i32 %x, 1 %c = icmp eq i32 %a, 0 %z = zext i1 %c to i8 %r = add i8 %z, C1 => %optional_cast = trunc i32 %a to i8 %r = sub i8 C1+1, %optional_cast All of the tests look like improvements or neutral to me. But it is possible that x86 test+set+bitop is better than what we now show here. I suspect we could do better by adding another fold for the 'sub' variants. We start with select-of-constant in IR in the larger motivating test, so that's why I included tests with selects. 
Proofs for those variants: https://rise4fun.com/Alive/Bx1 Name: true const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C2, i64 C1 => %z = zext i8 %a to i64 %r = sub i64 C2, %z Name: false const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C1, i64 C2 => %z = zext i8 %a to i64 %r = add i64 C1, %z Differential Revision: https://reviews.llvm.org/D48466 llvm-svn: 335433	2018-06-24 14:37:30 +00:00
Stanislav Mekhanoshin	22ee191c3e	DAG combine "and|or (select c, -1, 0), x" -> "select c, x, 0|-1" Allowed folding for "and/or" binops with non-constant operand if arguments of select are 0/-1 values. Normally this code with the "and" opcode does not reach the DAG combiner because it is already simplified in InstCombine. However AMDGPU produces it during lowering and InstCombine has no chance to optimize it out. In turn the same pattern with "or" opcode can reach DAG. Differential Revision: https://reviews.llvm.org/D48301 llvm-svn: 335250	2018-06-21 16:02:05 +00:00
David Green	a465188500	[DAGCombine] Fix alignment for offset loads/stores The alignment parameter to getExtLoad is treated as a base alignment, not the alignment of the load (base + offset). When we infer a better alignment for a Ptr we need to ensure that it applies to the base to prevent the alignment on the load from being wrong. This fixes a bug where the alignment could then be used to incorrectly prove noalias between a load and a store, leading to a miscompile. Differential Revision: https://reviews.llvm.org/D48029 llvm-svn: 335210	2018-06-21 08:30:07 +00:00
Stanislav Mekhanoshin	20279dc025	Allow binop C1, (select cc, CF, CT) -> select folding Previously this folding was done only if select is a first operand. However, for non-commutative operations constant may go before select. Differential Revision: https://reviews.llvm.org/D48223 llvm-svn: 335167	2018-06-20 20:24:20 +00:00
Nirav Dave	cd558887d3	[DAG] Fix and-mask folding when narrowing loads. Summary: Check that and masks are strictly smaller than implicit mask from narrowed load. Fixes PR37820. Reviewers: samparker, RKSimon, nemanjai Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48335 llvm-svn: 335137	2018-06-20 15:36:29 +00:00
Craig Topper	ddd88a559f	[DAGCombiner] Add some comments to some true/false arguments to make it obvious what they are. NFC llvm-svn: 335095	2018-06-20 04:32:07 +00:00
Michael Berg	7b993d762f	Utilize new SDNode flag functionality to expand current support for fadd Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47909 llvm-svn: 334996	2018-06-18 23:44:59 +00:00
Michael Berg	932ba20af8	refactor of visitFADD for AllowNewConst cases Summary: Refactoring for all constant cases which require AllowNewConst and some staging for future fmf usage. Reviewers: spatel, hfinkel, wristow Reviewed By: spatel Subscribers: nhaehnle Differential Revision: https://reviews.llvm.org/D48289 llvm-svn: 334984	2018-06-18 21:12:21 +00:00
Michael Berg	8e570c3390	Utilize new SDNode flag functionality to expand current support for fma Summary: This patch originated from D47388 and is a proper subset of the originating changes, containing only the fmf optimization guard extensions. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar, rampitec, nhaehnle, nemanjai Reviewed By: rampitec, nhaehnle Subscribers: tpr, nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47918 llvm-svn: 334876	2018-06-16 00:03:06 +00:00
Michael Berg	02d1c6c0cf	Utilize new SDNode flag functionality to expand current support for fdiv Summary: This patch originated from D46562 and is a proper subset, with some issues addressed. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng, nhaehnle Differential Revision: https://reviews.llvm.org/D47954 llvm-svn: 334862	2018-06-15 20:44:55 +00:00
Matt Arsenault	df2f4ef29d	DAG: Fix creating concat_vectors with illegal type Test passes as is, but fails with future patch to make v4i16/v4f16 legal. llvm-svn: 334823	2018-06-15 12:09:15 +00:00
Michael Berg	0c20447a02	easing the constraint for isNegatibleForFree and GetNegatedExpression Summary: Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar Reviewed By: spatel Subscribers: efriedma, wdng, tpr Differential Revision: https://reviews.llvm.org/D48057 llvm-svn: 334769	2018-06-14 20:54:13 +00:00
Michael Berg	4663ceb63f	updating isNegatibleForFree and GetNegatedExpression with fmf for fadd Summary: A FMF constraint is added to FADD with unsafe still available as the fallback Reviewers: spatel, wristow, arsenm, hfinkel Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D48180 llvm-svn: 334753	2018-06-14 18:48:31 +00:00
Sanjay Patel	7d4929611c	[DAGCombiner] remove hasOneUse() check from fadd constants transform We're constant folding here, so we shouldn't check uses. This matches the IR optimizer behavior. The x86 test shows the expected win. The AArch64 test shows something else. This only seems to happen if the "generic" AArch64 CPU model is used by MachineCombiner, so I'll file a bug report to follow-up. llvm-svn: 334608	2018-06-13 15:22:48 +00:00
Krzysztof Parzyszek	82d284c1d2	[DAGCombiner] Recognize more patterns for ABS Differential Revision: https://reviews.llvm.org/D47831 llvm-svn: 334553	2018-06-12 21:51:49 +00:00
Michael Berg	5d49f66570	Utilize new SDNode flag functionality to expand current support for fmul Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: nhaehnle, wdng Differential Revision: https://reviews.llvm.org/D47911 llvm-svn: 334514	2018-06-12 16:13:11 +00:00
Krzysztof Parzyszek	3d671248ab	[SelectionDAG] Provide default expansion for rotates Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors. Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion. Differential Revision: https://reviews.llvm.org/D47725 llvm-svn: 334497	2018-06-12 12:49:36 +00:00
Matt Arsenault	5615fa0a87	DAG: Fix extract_subvector combine for a single element This would fail before because 1x vectors aren't legal, so instead just use the scalar type. Avoids regressions in a future AMDGPU commit to add v4i16/v4f16 as legal types. Test update is just the one test that this triggers on in tree now. It wasn't checking anything before. The result is completely changed since the selects are eliminated. Not sure if it's considered better or not. llvm-svn: 334440	2018-06-11 21:27:41 +00:00
Sanjay Patel	3e5c70cc1d	[DAGCombiner] match vector compare and select sizes with extload operand (PR37427) This patch started off much more general and ambitious, but it's been a nightmare seeing all the ways x86 vector codegen can go wrong. So the code is still structured to allow extending easily, but it's currently limited in several ways: 1. Only handle cases with an extending load. 2. Only handle cases with a zero constant compare. 3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected. The motivating case from PR37427: https://bugs.llvm.org/show_bug.cgi?id=37427 ...is the 1st test, and that shows the expected win - we eliminated the unnecessary intermediate cast. There's a clear regression in the last test (sgt_zero_fp_select) because we no longer recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit setcc from a sign-extended operand and remove it. Differential Revision: https://reviews.llvm.org/D47330 llvm-svn: 334378	2018-06-10 23:09:50 +00:00
Craig Topper	61998289f9	Use SmallPtrSet instead of SmallSet in places where we iterate over the set. SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work. These places were found by hiding the begin/end methods in the SmallSet forwarding llvm-svn: 334343	2018-06-09 05:04:20 +00:00
Sanjay Patel	498564e6fb	[DAGCombiner] clean up comments; NFC llvm-svn: 334312	2018-06-08 18:00:46 +00:00
Michael Berg	bf90d1f263	Utilize new SDNode flag functionality to expand current support for fsub Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub. Reviewers: spatel, hfinkel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D47910 llvm-svn: 334306	2018-06-08 17:39:50 +00:00
Sam Parker	16f963ba0d	[DAGCombine] Fix for PR37667 While trying to propagate AND masks back to loads, we currently allow one non-load node to be included as a leaf in chain. This fix now limits that node to produce only a single data value. Differential Revision: https://reviews.llvm.org/D47878 llvm-svn: 334268	2018-06-08 07:49:04 +00:00
Michael Berg	77b5be7ec6	propagate fast math flags via IR on fma and sub expressions Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode. Reviewers: spatel, arsenm, hfinkel, javed.absar Reviewed By: spatel Subscribers: nemanjai, wdng Differential Revision: https://reviews.llvm.org/D47388 llvm-svn: 334242	2018-06-07 22:49:09 +00:00
Matt Arsenault	e8eb567e17	DAG: Avoid bitcast/ext/build_vector combine This avoids regressions in a future AMDGPU change to make v4i16/v4f16 legal. For these types, build_vector is implemented as bitcasted operations on v2i32. This combine was creating v4i16s out of what would have already been a v2i32 build_vector, creating a mess of nodes that never get cleaned up. I'm not sure this is the right condition to check. I initially tried just checking for the legality of the new build_vector. This works for my case, but breaks dozens of x86 tests. A Mips test seems to show some improvement or at least a neutral change. I don't want to think about how long it would take to analyze the set of different x86 vector operations impacted. Test included in future commit. llvm-svn: 334218	2018-06-07 19:42:27 +00:00
Michael Berg	cc1c4b6912	guard fsqrt with fmf sub flags Summary: This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483. It contains only context for fsqrt. Reviewers: spatel, hfinkel, arsenm Reviewed By: spatel Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai Differential Revision: https://reviews.llvm.org/D47749 llvm-svn: 334113	2018-06-06 18:47:55 +00:00
Reid Kleckner	adcaddb6da	Fix -Wcovered-switch-default warning and clang-format it llvm-svn: 333967	2018-06-04 23:47:29 +00:00
Amaury Sechet	da661e9236	[DAGcombine] Teach the combiner about -a = ~a + 1 Summary: This includes variants for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantics, but we chose to do the transform anyway in that case so as to push the transform down the carry chain. Reviewers: efriedma, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46505 llvm-svn: 333943	2018-06-04 19:23:22 +00:00
Amaury Sechet	93a7d2aa3c	Get rid of SETCCE Summary: It has been deprecated in favor of SETCCCARRY for a year now and isn't used by any in tree backend. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47685 llvm-svn: 333939	2018-06-04 18:36:22 +00:00
Krzysztof Parzyszek	623eb54361	[SelectionDAG] Add missing closing parentheses in comments, NFC llvm-svn: 333907	2018-06-04 14:54:53 +00:00
Nirav Dave	fc9a700f94	[DAG] Avoid checking for consecutive stores in store merge. NFCI. llvm-svn: 333766	2018-06-01 15:05:55 +00:00
Nirav Dave	39ece11ae5	[DAG] Simplify Expression. NFC. llvm-svn: 333765	2018-06-01 15:05:30 +00:00
Nirav Dave	0fc27acaa2	[DAG] Remove untriggerable check. NFCI. Candidate check precludes this check. llvm-svn: 333764	2018-06-01 15:05:05 +00:00
Nirav Dave	a74921a696	[DAG] Prune store merge legal store check to stop invalid size. NFCI. Do not consider store sizes larger than the maximum legal store size. llvm-svn: 333763	2018-06-01 15:04:40 +00:00
Krzysztof Parzyszek	0b6187c1a9	[SelectionDAG] Expand UADDO/USUBO into ADD/SUBCARRY if legal for target Additionally, implement handling of ADD/SUBCARRY on Hexagon, utilizing the UADDO/USUBO expansion. Differential Revision: https://reviews.llvm.org/D47559 llvm-svn: 333751	2018-06-01 14:00:32 +00:00
Roman Lebedev	9f65d16d5d	[DAGCombiner] isAllOnesConstantOrAllOnesSplatConstant(): look through bitcasts Summary: As pointed out in D46528, we erroneously transform cases like `xor X, -1`, even though we use said function. It's because the `-1` is actually a bitcast there. So I think we can just look through it in the function. Differential Revision: https://reviews.llvm.org/D47156 llvm-svn: 332905	2018-05-21 21:41:10 +00:00
Roman Lebedev	7772de25d0	[DAGCombine][X86][AArch64] Masked merge unfolding: vector edition. Summary: This appears to be the last missing piece for the masked merge pattern handling in the backend. This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`), and we need to make sure that they are generated. Differential Revision: https://reviews.llvm.org/D46528 llvm-svn: 332904	2018-05-21 21:41:02 +00:00
Craig Topper	25444c852a	[DAGCombiner] Use computeKnownBits to match rotate patterns that have had their amount masking modified by simplifyDemandedBits SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates. This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work. As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this. I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it. Differential Revision: https://reviews.llvm.org/D47116 llvm-svn: 332895	2018-05-21 21:09:18 +00:00
Nirav Dave	11fd14c1ac	[DAG] Prune cycle check in store merge. As part of merging stores we check that fusing the nodes does not cause a cycle due to one candidate store being indirectly dependent on another store (this may happen via chained memory copies). This is done by searching if a store is a predecessor to another store's value. Prune the search at the candidate search's root node which is a predecessor to all candidate stores. This reduces the size of the subgraph searched in large basic blocks. Reviewers: jyknight Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D46955 llvm-svn: 332490	2018-05-16 16:48:20 +00:00
Nirav Dave	d9d86cb738	[DAG] Defer merge store cycle checking to just before merge. NFCI. llvm-svn: 332489	2018-05-16 16:47:54 +00:00
Nirav Dave	a87de7d846	[DAGCombine] Move load checks on store of loads into candidate search. NFCI. Migrate single-use and non-volatility, non-indexed requirements on stores of immediate store values to candidate collection pass from later stage. llvm-svn: 332392	2018-05-15 20:31:53 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually change DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Vedant Kumar	99d5c072f0	[DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N) ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to reflect a simplification to the load (ExtLoad). Based on my reading, ExtendSetCCUses may create new nodes to extend a constant attached to a SETCC. It also creates fresh SETCC nodes which refer to any updated operands. ISTM that the location applied to the new constant and SETCC nodes should be the same as the location of the ExtLoad. This was suggested by Adrian in https://reviews.llvm.org/D45995. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46216 llvm-svn: 332119	2018-05-11 18:40:10 +00:00
Vedant Kumar	fd340a4047	[DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N) This teaches tryToFoldExtOfLoad to set the right location on a newly-created extload. With that in place, the logic for performing a certain ([s\|z]ext (load ...)) combine becomes identical for sexts and zexts, and we can get rid of one copy of the logic. The test case churn is due to dependencies on IROrders inherited from the wrong SDLoc. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46158 llvm-svn: 332118	2018-05-11 18:40:08 +00:00
Vedant Kumar	f0e5f7c45e	[DAGCombiner] Factor out duplicated logic for an extload combine, NFC (5/N) Part of the logic for combining (zext (load ...)) and (sext (load ...)) is duplicated. This creates problems because bugs in one version have to be fixed again in the other version. To address this, as a first step, I've extracted the duplicate logic into a helper. I'll fix the debug location bug in the helper and eliminate the copy of its logic in a followup. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46157 llvm-svn: 332117	2018-05-11 18:40:02 +00:00
Nirav Dave	a5ad417589	[DAG] Avoid using deleted node in rebuildSetCC Summary: The combine in rebuildSetCC may be combined to another node leaving our references stale. Keep a handle on it to avoid stale references. Fixes PR36602. Reviewers: dbabokin, RKSimon, eli.friedman, davide Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D46404 llvm-svn: 331985	2018-05-10 14:28:54 +00:00
Craig Topper	176ec8506f	[DAGCombiner] In visitBITCAST when trying to constant fold the bitcast, only call getBitcast if its an fp->int or int->fp conversion even when before legalize ops. Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification. This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations. llvm-svn: 331896	2018-05-09 17:14:27 +00:00
Amara Emerson	4e66142f14	[DAGCombine] Change store merge candidates check cut off to 1024. The previous value of 8192 resulted in severe compile time hits in some pathological cases. rdar://39781410 Differential Revision: https://reviews.llvm.org/D46581 llvm-svn: 331888	2018-05-09 15:53:06 +00:00
Roman Lebedev	9bd6067db6	[DAGCombiner] Masked merge: enhance handling of 'andn' with immediates Summary: Split off from D46031. The previous patch, D46493, completely disabled unfolding in case of immediates. But we can do better: {F6120274} {F6120277} https://rise4fun.com/Alive/xJS Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D46494 llvm-svn: 331685	2018-05-07 21:52:22 +00:00
Roman Lebedev	cc42d08b1d	[DagCombiner] Not all 'andn''s work with immediates. Summary: Split off from D46031. In masked merge case, this degrades IPC by decreasing instruction count. {F6108777} The next patch should be able to recover and improve this. This also affects the transform @spatel have added in D27489 / rL289738, and the test coverage for X86 was missing. But after i have added it, and looked at the changes in MCA, i'm somewhat confused. {F6093591} {F6093592} {F6093593} I'd say this regression is an improvement, since `IPC` increased in that case? Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: andreadb, llvm-commits, spatel Differential Revision: https://reviews.llvm.org/D46493 llvm-svn: 331684	2018-05-07 21:52:11 +00:00
Roman Lebedev	cb1af9134a	[NFC][DAGCombine] unfoldMaskedMerge(): rename two variables The current names can be confused with the A and B sides of the canonical masked merge pattern. llvm-svn: 331609	2018-05-06 20:02:22 +00:00
Roman Lebedev	a3b0b59f54	[DAGCombiner] Masked merge: don't touch "not" xor's. Summary: Split off from D46031. It seems we don't want to transform the pattern if the `xor`'s are actually `not`'s. In vector case, this breaks `andnpd` / `vandnps` patterns. That being said, we may want to re-visit this `not` handling, maybe in D46073. Reviewers: spatel, craig.topper, javed.absar Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46492 llvm-svn: 331595	2018-05-05 15:45:40 +00:00
Roman Lebedev	49ada82fa7	[NFC][DagCombiner] unfoldMaskedMerge(): improve readability. llvm-svn: 331588	2018-05-05 10:39:54 +00:00
Craig Topper	781aa181ab	Fix a bunch of places where operator-> was used directly on the return from dyn_cast. Inspired by r331508, I did a grep and found these. Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa. llvm-svn: 331577	2018-05-05 01:57:00 +00:00
Michael Berg	7acc81b744	Fast Math Flag mapping into SDNode Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage. Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar Reviewed By: spatel Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng Differential Revision: https://reviews.llvm.org/D45710 llvm-svn: 331547	2018-05-04 18:48:20 +00:00
Vedant Kumar	e23173b677	[DAGCombiner] Fix SDLoc in a (zext (zextload x)) combine (4/N) The logic for this combine is almost identical to the logic for a (sext (sextload x)) combine. This commit factors out the logic so it can be shared by both combines, and corrects the SDLoc assigned in the zext version of the combine. Prior to this patch, for the given test case, we would apply the location associated with the udiv instruction to instructions which perform the load. Part of: llvm.org/PR37262 llvm-svn: 331303	2018-05-01 19:51:15 +00:00
Vedant Kumar	d7117ed0f9	[DAGCombiner] Fix SDLoc in a (sext (sextload x)) combine (3/N) Prior to this patch, for the given test case, we would apply the location associated with the sdiv instruction to instructions which perform the load. Part of: llvm.org/PR37262. Differential Revision: https://reviews.llvm.org/D46222 llvm-svn: 331302	2018-05-01 19:51:15 +00:00
Vedant Kumar	cc7b2a55c2	[DAGCombiner] Change the SDLoc on split extloads (2/N) In DAGCombiner, we try to simplify this pattern: ([s\|z]ext (load ...)) Conceptually, a new extload which is created while splitting the load should have the same debug location as the load. Making this change affects the IROrder of the new load, causing some test case churn. In practice, the new location is never different from the location of the [s\|z]ext, at least not during check-llvm or a stage2 build. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46156 llvm-svn: 331301	2018-05-01 19:29:15 +00:00
Vedant Kumar	ee4bfcaa5a	[DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N) Setting the right SDLoc on a newly-created zextload fixes a line table bug which resulted in non-linear stepping behavior. Several backend tests contained CHECK lines which relied on the IROrder inherited from the wrong SDLoc. This patch breaks that dependence where feasible and regenerates test cases where not. In some cases, changing a node's IROrder may alter register allocation and spill behavior. This can affect performance. I have chosen not to prevent this by applying a "known good" IROrder to SDLocs, as this may hide a more general bug in the scheduler, or cause regressions on other test inputs. rdar://33755881, Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D45995 llvm-svn: 331300	2018-05-01 19:26:15 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Sanjay Patel	1babf5ff32	[DAGCombiner] rename function attribute for disabling ftrunc transform This is the matching name change for the Clang patch at: D46236 rL331209 Differential Revision: https://reviews.llvm.org/D46237 llvm-svn: 331210	2018-04-30 18:20:33 +00:00
Heejin Ahn	d20d0648ed	[DAGCombiner] Fix a case of 1 in non-splat vector pow2 divisor Summary: D42479 (rL329525) enabled SDIV combine for pow2 non-splat vector dividers. But when there is a 1 in a vector, the instruction sequence to be generated involves shifting a value by the number of its bit widths, which is undefined (`c64f4dbfe3/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (L6000-L6006)`). Especially, in architectures that do not support vector instructions, each element in a vector will be computed separately using scalar operations, and then the resulting value will be undef for '1' values in a vector. (All 1's vector is fine; only vectors mixed with 1 and others will be affected.) Reviewers: RKSimon, jgravelle-google Subscribers: jfb, dschuff, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D46161 llvm-svn: 331092	2018-04-27 22:23:11 +00:00
Sanjay Patel	5a90285bd9	[DAGCombiner] limit ftrunc optimizations with function attribute As noted, the attribute name is subject to change once we have the clang side implemented, but it's clear that we need some kind of attribute-based predication here based on the discussion for: rL330437 llvm-svn: 330951	2018-04-26 16:04:44 +00:00
Sanjay Patel	a5da086386	[DAGCombiner] refactor FP->int->FP folds; NFC As discussed in the post-review comments for rL330437, we need to guard this fold to allow existing code to keep working with the undefined behavior that they've come to rely on. That would mean duplicating more code than we already have, so let's fix that first. llvm-svn: 330947	2018-04-26 15:20:18 +00:00
Craig Topper	f3cefad255	[DAGCombiner][X86] When promoting loads don't use ZEXTLOAD even if it's legal We were previously preferring ZEXTLOAD over EXTLOAD if it is legal. This triggers during X86's promotion of i16->i32. Not sure about other targets. Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue. This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here. Differential Revision: https://reviews.llvm.org/D45585#inline-402825 llvm-svn: 330781	2018-04-24 22:35:27 +00:00
Roman Lebedev	95c6eaf530	[DAGCombiner] Unfold scalar masked merge if profitable Summary: This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 \| PR37104 ]]. [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly. Previously, `andl`+`andn`/`andps`+`andnps` / `bic`/`bsl` would be generated. (see `@out`) Now, they would no longer be generated (see `@in`). So we need to make sure that they are still generated. If the mask is constant, we do nothing. InstCombine should have unfolded it. Else, i use `hasAndNot()` TLI hook. For now, only handle scalars. https://rise4fun.com/Alive/bO6 ---- I really don't like the code i wrote in `DAGCombiner::unfoldMaskedMerge()`. It is super fragile. Is there something like IR Pattern Matchers for this? Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: andreadb, courbet, kristof.beyls, javed.absar, rengolin, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D45733 llvm-svn: 330646	2018-04-23 20:38:49 +00:00
Sanjay Patel	3d453ad711	[DAGCombine] (float)((int) f) --> ftrunc (PR36617) This was originally committed at rL328921 and reverted at rL329920 to investigate failures in Chrome. This time I've added to the ReleaseNotes to warn users of the potential of exposing UB and let me repeat that here for more exposure: Optimization of floating-point casts is improved. This may cause surprising results for code that is relying on undefined behavior. Code sanitizers can be used to detect affected patterns such as this: int main() { float x = 4294967296.0f; x = (float)((int)x); printf("junk in the ftrunc: %f\n", x); return 0; } $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int' junk in the ftrunc: 0.000000 Original commit message: fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 330437	2018-04-20 15:07:55 +00:00
Gerolf Hoflehner	5b4a67af1b	[DAGCombiner] Fix for oss-fuzz bug llvm-svn: 330178	2018-04-17 07:22:34 +00:00
Sanjay Patel	6c3af659a2	[DAGCombiner, PowerPC] allow X - (fpext(-Y) --> X + fpext(Y) with multiple uses This is a transform that I limited in instcombine in rL329821 because it was creating more instructions in IR when the cast has multiple uses. But if the cast is free, then we can do the transform regardless of other uses because it improves the potential throughput of the calculation by removing a dependency on the fneg. Differential Revision: https://reviews.llvm.org/D45598 llvm-svn: 330098	2018-04-15 16:43:48 +00:00
Sanjay Patel	a54e7d1a6d	[DAGCombiner] simplify code; NFC llvm-svn: 329964	2018-04-12 22:14:58 +00:00
Sanjay Patel	5ace2b765a	revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617) This change is exposing UB in source code - as was warned/predicted. :) See D44909 for discussion. Reverting while we figure out how to fix things. llvm-svn: 329920	2018-04-12 15:27:01 +00:00
Sam Parker	1f4f4d9a08	[DAGCombine] Improve ReduceLoad for SRL Recommitting r329283, third time lucky... If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329551	2018-04-09 08:16:11 +00:00
Zvi Rackover	7a53f169f1	DAGCombiner: Combine SDIV with non-splat vector pow2 divisor Summary: Extend existing SDIV combine for pow2 constant divider to handle non-splat vectors of pow2 constants. Reviewers: RKSimon, craig.topper, spatel, hfinkel, efriedma Reviewed By: RKSimon Subscribers: magabari, llvm-commits Differential Revision: https://reviews.llvm.org/D42479 llvm-svn: 329525	2018-04-08 11:35:20 +00:00
Guozhi Wei	0eb86c8efc	[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst)) In our real world application, we found the following optimization is missed in DAGCombiner: (zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst)) If the user of original zext is an add, it may enable further lea optimization on x86. This patch adds a new function CombineZExtLogicopShiftLoad to do this optimization. Differential Revision: https://reviews.llvm.org/D44402 llvm-svn: 329516	2018-04-07 23:36:10 +00:00
Craig Topper	5b95eae1c3	[DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector. llvm-svn: 329509	2018-04-07 19:09:50 +00:00
Mandeep Singh Grang	e92f0cfe34	[CodeGen] Change std::sort to llvm::sort in response to r327219 Summary: r327219 added wrappers to std::sort which randomly shuffle the container before sorting. This will help in uncovering non-determinism caused due to undefined sorting order of objects having the same key. To make use of that infrastructure we need to invoke llvm::sort instead of std::sort. Note: This patch is one of a series of patches to replace all std::sort to llvm::sort. Refer the comments section in D44363 for a list of all the required patches. Reviewers: bogner, rnk, MatzeB, RKSimon Reviewed By: rnk Subscribers: JDevlieghere, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45133 llvm-svn: 329435	2018-04-06 18:08:42 +00:00
Sam Parker	0e7deb8104	[DAGCombine] Revert r329160 Again, broke the big endian stage 2 builders. llvm-svn: 329283	2018-04-05 13:46:17 +00:00
Sam Parker	7ec722d603	[DAGCombine] Improve ReduceLoadWidth for SRL Recommitting rL321259. Previously this caused an issue with PPCBE but I didn't receive a reproducer and didn't have the time to follow up. If the issue appears again, please provide a reproducer so I can fix it. Original commit message: If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329160	2018-04-04 09:26:56 +00:00
Sanjay Patel	6124cae8f7	[DAGCombine] (float)((int) f) --> ftrunc (PR36617) fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 328921	2018-03-31 17:55:44 +00:00
Sanjay Patel	e09b7dcf3d	[SelectionDAG] Removing FABS folding from DAGCombiner The code has bugs dealing with -0.0. Since D44550 introduced FABS pattern folding in InstCombine, this patch removes the now-redundant code that causes https://bugs.llvm.org/show_bug.cgi?id=36600. Patch by Mikhail Dvoretckii! Differential Revision: https://reviews.llvm.org/D44683 llvm-svn: 328872	2018-03-30 15:42:52 +00:00
Craig Topper	2fa1436206	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806	2018-03-29 17:21:10 +00:00
David Blaikie	36a0f226b1	Fix layering by moving ValueTypes.h from CodeGen to IR ValueTypes.h is implemented in IR already. llvm-svn: 328397	2018-03-23 23:58:31 +00:00
David Blaikie	13e77db2df	Fix layering of MachineValueType.h by moving it from CodeGen to Support This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395	2018-03-23 23:58:25 +00:00
Martin Storsjo	db75aa96d3	Revert "[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))" This reverts commit r328252. This change broke building a number of projects when targeting ARM and AArch64, see PR36873. llvm-svn: 328297	2018-03-23 08:36:47 +00:00
Guozhi Wei	17ff975eb1	[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst)) In our real world application, we found the following optimization is missed in DAGCombiner: (zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst)) If the user of original zext is an add, it may enable further lea optimization on x86. This patch adds a new function CombineZExtLogicopShiftLoad to do this optimization. Differential Revision: https://reviews.llvm.org/D44402 llvm-svn: 328252	2018-03-22 21:47:25 +00:00
Craig Topper	956fec2a4a	[DAGCombiner] Fix type in comment. NFC llvm-svn: 327916	2018-03-19 22:25:26 +00:00
Craig Topper	b36cb20ef9	[X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps create an and mask that can be matched as a zero extend. I had to modify the bswap recognition to allow unshrunk masks to make this work. Fixes PR36689. Differential Revision: https://reviews.llvm.org/D44442 llvm-svn: 327530	2018-03-14 16:55:15 +00:00
Craig Topper	4aeec51986	[DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS between LegalizeVectorOps and LegalizeDAG. BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling. llvm-svn: 327446	2018-03-13 20:36:28 +00:00
Simon Pilgrim	9855b39380	[DAGCombine] visitREM - Don't assume that one divrem isn't driving another Under some circumstances the divrems won't have been combined together before getting to this code. So replace the assertion with an if() guard to not expand to X-((X/C)*C) to give the other combine a chance to happen. Reduced from OSS-Fuzz #6883 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883 llvm-svn: 327424	2018-03-13 17:17:15 +00:00
Nirav Dave	775f07d121	Make early exit hasPredecessorHelper return true. NFCI. All uses conservatively assume in early exit case that it will be a predecessor. Changing default removes checking code in all uses. llvm-svn: 327169	2018-03-09 20:56:51 +00:00
Craig Topper	4196dd12a2	[DAGCombiner] Add a peekThroughBitcast to MergeStoresOfConstantsOrVecElts to fix a crash if we are storing a bitcast of a constant. Loading a constant into a k-register in AVX512 requires a bitcast from a scalar constant. In the test case here we have a k-register store that gets split into multiple parts of KNL. MergeConsecutiveStores sees each of these pieces as a consecutive store and looks through the bitcast to find the underlying scalar constant. But when we went to create the combined store we didn't look through the same bitcast. llvm-svn: 326677	2018-03-04 18:51:46 +00:00
Craig Topper	e7ca6f5456	[DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors. Masking first prevents the extend from being combined with loads. It's also interfering with some vXi1 extraction code. Differential Revision: https://reviews.llvm.org/D42679 llvm-svn: 326500	2018-03-01 22:32:25 +00:00
Amaury Sechet	893a6b89ff	[DAGCOmbine] Ensure that (brcond (setcc ...)) is handled in a canonical manner. Summary: There are transformations that change setcc into other constructs, and transforms that try to reconstruct a setcc from the brcond condition. Depending on what order these transforms are done, the end result differs. Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond tries to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond. Reviewers: spatel, hfinkel, niravd, craig.topper Subscribers: nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D41235 llvm-svn: 325892	2018-02-23 11:50:42 +00:00
Simon Pilgrim	be72fe1fda	[SelectionDAG] Move matchUnaryPredicate/matchBinaryPredicate into SelectionDAGNodes.h This allows us to improve vector constant matching in more DAG code (backends, TargetLowering etc.). Differential Revision: https://reviews.llvm.org/D43466 llvm-svn: 325815	2018-02-22 18:45:13 +00:00
Craig Topper	1d104b996a	[DAGCombiner] Add two calls to isVector before making calls to getVectorElementType/getVectorNumElements to avoid an assert. We looked through a BITCAST, but the bitcast might be from a scalar type rather than a vector. I don't have a test case. I stumbled onto it while prototyping another change that isn't ready yet. llvm-svn: 325750	2018-02-22 07:05:27 +00:00
Craig Topper	35801fa5ce	[SelectionDAG] Add LegalTypes flag to getShiftAmountTy. Use it to unify and simplify DAGCombiner and simplifySetCC code and fix a bug. DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places. This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner. Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplifySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too. Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future. Fixes PR36250. Differential Revision: https://reviews.llvm.org/D43449 llvm-svn: 325602	2018-02-20 17:41:05 +00:00
Simon Pilgrim	d6beac3b76	[DAGCombiner] Remove simplifyShuffleMask - now handled more generally by SimplifyDemandedVectorElts. llvm-svn: 325429	2018-02-17 12:36:56 +00:00
Craig Topper	dac3c1f5c8	[DAGCombiner] Call ExtendUsesToFormExtLoad in (zext (and (load)))->(and (zextload)) even when the and does not have multiple uses Same for the sign extend case. Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses. This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if it's not free? Differential Revision: https://reviews.llvm.org/D43063 llvm-svn: 325289	2018-02-15 20:20:32 +00:00
Simon Pilgrim	80663ee986	[SelectionDAG] Add initial implementation of TargetLowering::SimplifyDemandedVectorElts This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation. Further features can be moved/added in future patches. Differential Revision: https://reviews.llvm.org/D42896 llvm-svn: 325232	2018-02-15 12:14:15 +00:00
Alexander Ivchenko	7e5d525bd5	[SelectionDAG][X86] Fix incorrect offset generated for VMASKMOV When creating high MachineMemOperand for MSTORE/MLOAD we supply it with the original PointerInfo, while the pointer itself had been incremented. The patch adds the proper offset to the PointerInfo. llvm-svn: 325135	2018-02-14 15:55:24 +00:00
Craig Topper	f73ff612ca	[DAGCombiner] Add one use check to fold (not (and x, y)) -> (or (not x), (not y)) Summary: If the and has an additional use we shouldn't invert it. That creates an additional instruction. While there add a one use check to the transform above that looked similar. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43225 llvm-svn: 325019	2018-02-13 16:25:27 +00:00
Simon Pilgrim	0be5567a89	[X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT. Differential Revision: https://reviews.llvm.org/D43014 llvm-svn: 324837	2018-02-11 10:52:37 +00:00
Craig Topper	36f913ee80	[SelectionDAG] Remove TargetLowering::getConstTrueVal. Use SelectionDAG::getBoolConstant in the one place it was used. SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently. llvm-svn: 324832	2018-02-11 04:58:58 +00:00
Vedant Kumar	7fd9a58d8c	Revert "WIP: [DAGCombiner] Assert that debug info is preserved" This reverts commit r324648. It was committed accidentally. llvm-svn: 324650	2018-02-08 20:27:35 +00:00
Vedant Kumar	28323ff5a3	WIP: [DAGCombiner] Assert that debug info is preserved llvm-svn: 324648	2018-02-08 20:27:09 +00:00
Craig Topper	c19aed963e	[DAGCombiner] Fix a couple mistakes from r324311 by really passing the original load to ExtendSetCCUses. We're passing the binary op that uses the load instead of the load. Noticed by inspection. Not sure how to test this because this just prevents the introduction of an extend that will later be truncated and will probably be combined out. llvm-svn: 324568	2018-02-08 06:27:18 +00:00


3324 Commits