llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	def7cfb7ff	[InstCombine] Use is_contained (NFC)	2020-11-21 15:47:11 -08:00
Kazu Hirata	43c0e4f665	[Transforms] Use llvm::is_contained (NFC)	2020-11-18 20:42:22 -08:00
Benjamin Kramer	7b782062b4	[InstCombine] Simplify code. NFCI.	2020-09-27 19:11:07 +02:00
Sanjay Patel	6bad3caeb0	[InstCombine] use unary shuffle creator to reduce code duplication; NFC	2020-09-21 15:34:24 -04:00
Simon Pilgrim	48b510c4bc	[NFC] Fix compiler warnings due to integer comparison of different signedness Fix by directly using INT_MAX and INT32_MAX. Patch by: @nullptr.cpp (Yang Fan) Differential Revision: https://reviews.llvm.org/D87347	2020-09-11 15:32:03 +01:00
Christopher Tetreault	640f20b0c7	[SVE] Remove calls to VectorType::getNumElements from InstCombine Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D82237	2020-08-31 12:59:10 -07:00
Nikita Popov	6093b14c2c	[InstCombine] Return replaceInstUsesWith() result (NFC) Follow the usual usage pattern for this function and return the result.	2020-08-29 14:49:57 +02:00
Roman Lebedev	71ac9105cd	[InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW We really shouldn't use RAUW in InstCombine because we should consistently update Worklist to avoid extra iterations.	2020-08-29 15:10:14 +03:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Roman Lebedev	2f01785857	[NFC][InstCombine] Aggregate reconstruction: use plain map Now that we no longer require for this map to have stable iteration order, we no longer need to pay for keeping the iteration order stable, so switch from `SmallMapVector` to `SmallDenseMap`.	2020-08-19 01:09:25 +03:00
Roman Lebedev	78bd4231bf	[InstCombine] PHI-aware aggregate reconstruction: properly handle duplicate predecessors While it may seem like we can just "deduplicate" the case where some basic block happens to be a predecessor more than once, which happens for e.g. switches, that is not the correct thing to do. We must actually add a PHI operand for each predecessor. This was initially reported to me by David Major as a clang crash during gecko build for android.	2020-08-19 01:00:42 +03:00
Roman Lebedev	03127f795b	[InstCombine] PHI-aware aggregate reconstruction: correctly detect "use" basic block While the original implementation added in D85787 / `ae7f08812e` is not incorrect, it is known to be suboptimal. In particular, while it is not incorrect to use the basic block in which the original `insertvalue` instruction is located as the merge point, that is not necessarily optimal, as `@test6` shows. We should look at all the AggElts, and, if they are all defined in the same basic block, then that is the basic block we should use. On RawSpeed library, this catches +4% (+50) more cases. On vanilla LLVM test-suite, this catches +12% (+92) more cases.	2020-08-18 00:45:18 +03:00
Roman Lebedev	f4f673e0e3	[NFC][InstCombine] PHI-aware aggregate reconstruction: don't capture UseBB in lambdas, take it as argument In a following patch, UseBB will be detected later, so capturing it is potentially error-prone (capture by ref vs by val). Also, parametrized UseBB will likely be needed for multiple levels of PHI indirections later on anyways.	2020-08-18 00:45:18 +03:00
Roman Lebedev	4973ca3eac	[NFC][InstCombine] PHI-aware aggregate reconstruction: insert PHI node manually This is NFC at the moment, because right now we always insert the PHI into the same basic block in which the original `insertvalue` instruction is, but that will change. Also, fixes addition of the suffix to the value names.	2020-08-18 00:45:17 +03:00
Yonghong Song	aa61e43040	[InstCombine] Fix a compilation bug With gcc 6.3.0, I hit the following compilation bug. ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic] }; ^ cc1plus: all warnings being treated as errors The error is introduced by Commit `ae7f08812e` ("[InstCombine] Aggregate reconstruction simplification (PR47060)")	2020-08-16 21:56:42 -07:00
Roman Lebedev	0ec1f0f332	[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically	2020-08-16 23:37:36 +03:00
Roman Lebedev	ae7f08812e	[InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on unwind branch. We end up having a `landingpad` block after each `invoke`, where RAII cleanup is performed, and the elements of an aggregate `{i8, i32}` holding exception info are `extractvalue`'d, and we then branch to common block that takes extracted `i8` and `i32` elements (via `phi` nodes), form a new aggregate, and finally `resume`'s the exception.
The problem is that, if the cleanup block is effectively empty, it shouldn't be there, there shouldn't be that `landingpad` and `resume`, said `invoke` should be a `call`. Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`. But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft, while it is pointless, does not look like "empty cleanup block". So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception has higher cost than it could have on unwind branch :S
This doesn't happen that often, but it will basically happen once per C++ function with complex CFG that called more than one other function that isn't known to be `nounwind`.
I think, this is a missing fold in InstCombine, so i've implemented it. I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted. If it was extracted from the aggregate with identical type, from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate, then we can just use said original source aggregate directly, instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block, we need to be PHI-aware - we might have different source aggregate when coming from each predecessor.
I'm not sure if this already handles everything, and there are some FIXME's, i'll deal with all that later in followups. I'd be fine with going with post-commit review here code-wise, but just in case there are thoughts, i'm posting this.
On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 1253 | 1253 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 948 | 1355 | 407 | 42.93% | 42.93% |
| instcount.NumInsertValueInst | 4382 | 3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode | 574 | 458 | -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs | 1154 | 921 | -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst | 29017 | 26397 | -2620 | -9.03% | 9.03% |
| instcombine.NumDeadInst | 166618 | 174705 | 8087 | 4.85% | 4.85% |
| instcount.NumPHIInst | 51526 | 50678 | -848 | -1.65% | 1.65% |
| instcount.NumLandingPadInst | 20865 | 20609 | -256 | -1.23% | 1.23% |
| instcount.NumInvokeInst | 34023 | 33675 | -348 | -1.02% | 1.02% |
| simplifycfg.NumSimpl | 113634 | 114708 | 1074 | 0.95% | 0.95% |
| instcombine.NumSunkInst | 15030 | 14930 | -100 | -0.67% | 0.67% |
| instcount.TotalBlocks | 219544 | 219024 | -520 | -0.24% | 0.24% |
| instcombine.NumCombined | 644562 | 645805 | 1243 | 0.19% | 0.19% |
| instcount.TotalInsts | 2139506 | 2135377 | -4129 | -0.19% | 0.19% |
| instcount.NumBrInst | 156988 | 156821 | -167 | -0.11% | 0.11% |
| instcount.NumCallInst | 1206144 | 1207076 | 932 | 0.08% | 0.08% |
| instcount.NumResumeInst | 5193 | 5190 | -3 | -0.06% | 0.06% |
| asm-printer.EmittedInsts | 948580 | 948299 | -281 | -0.03% | 0.03% |
| instcount.TotalFuncs | 11509 | 11507 | -2 | -0.02% | 0.02% |
| inline.NumDeleted | 97595 | 97597 | 2 | 0.00% | 0.00% |
| inline.NumInlined | 210514 | 210522 | 8 | 0.00% | 0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half, and there is a very apparent decrease in instruction and basic block count.
On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified | 0 | 744 | 744 | 0.00% | 0.00% |
| instcount.NumInsertValueInst | 2705 | 2053 | -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes | 1212 | 1424 | 212 | 17.49% | 17.49% |
| instcount.NumExtractValueInst | 21681 | 20139 | -1542 | -7.11% | 7.11% |
| simplifycfg.NumSinkCommonInstrs | 14575 | 14361 | -214 | -1.47% | 1.47% |
| simplifycfg.NumSinkCommonCode | 6815 | 6743 | -72 | -1.06% | 1.06% |
| instcount.NumLandingPadInst | 14851 | 14712 | -139 | -0.94% | 0.94% |
| instcount.NumInvokeInst | 27510 | 27332 | -178 | -0.65% | 0.65% |
| instcombine.NumDeadInst | 1438173 | 1443371 | 5198 | 0.36% | 0.36% |
| instcount.NumResumeInst | 2880 | 2872 | -8 | -0.28% | 0.28% |
| instcombine.NumSunkInst | 55187 | 55076 | -111 | -0.20% | 0.20% |
| instcount.NumPHIInst | 321366 | 320916 | -450 | -0.14% | 0.14% |
| instcount.TotalBlocks | 886816 | 886493 | -323 | -0.04% | 0.04% |
| instcount.TotalInsts | 7663845 | 7661108 | -2737 | -0.04% | 0.04% |
| simplifycfg.NumSimpl | 886791 | 887171 | 380 | 0.04% | 0.04% |
| instcount.NumCallInst | 553552 | 553733 | 181 | 0.03% | 0.03% |
| instcombine.NumCombined | 3200512 | 3201202 | 690 | 0.02% | 0.02% |
| instcount.NumBrInst | 741794 | 741656 | -138 | -0.02% | 0.02% |
| simplifycfg.NumHoistCommonInstrs | 14443 | 14445 | 2 | 0.01% | 0.01% |
| asm-printer.EmittedInsts | 7978085 | 7977916 | -169 | 0.00% | 0.00% |
| inline.NumDeleted | 73188 | 73189 | 1 | 0.00% | 0.00% |
| inline.NumInlined | 291959 | 291968 | 9 | 0.00% | 0.00% |
```
Roughly similar effect, fewer instructions and blocks total. See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.
Compile-time wise, this appears to be roughly geomean-neutral: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions
And this is a win size-wise in general: http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text
See https://bugs.llvm.org/show_bug.cgi?id=47060 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85787	2020-08-16 23:27:56 +03:00
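Steps 1 to 3 of the algorithm described in `ae7f08812e` can be illustrated with a small toy model. This is only a sketch and not the InstCombine implementation: `ExtractValue`, `InsertValue`, and `find_reusable_aggregate` are hypothetical names invented here, aggregates are represented by plain strings, and the PHI-aware step 4 is left out entirely.

```python
from typing import NamedTuple, Optional, Sequence, Union


class ExtractValue(NamedTuple):
    """Toy stand-in for `extractvalue <aggregate>, <index>` (hypothetical)."""
    aggregate: str  # name of the source aggregate, e.g. "%lp"
    agg_type: str   # type of the source aggregate, e.g. "{i8, i32}"
    index: int      # element index that was extracted


class InsertValue(NamedTuple):
    """Toy stand-in for one link of an `insertvalue` chain (hypothetical)."""
    index: int                          # element index being written
    element: Union[ExtractValue, str]   # a str models any unrelated scalar


def find_reusable_aggregate(
    result_type: str, num_elements: int, chain: Sequence[InsertValue]
) -> Optional[str]:
    """Return the source aggregate that the chain merely re-creates, if any."""
    # Step 1: the chain must fully describe the initializer of the aggregate.
    elements = {}
    for insert in chain:
        elements[insert.index] = insert.element  # a later insert wins
    if set(elements) != set(range(num_elements)):
        return None

    source = None
    for index in range(num_elements):
        element = elements[index]
        # Step 2: each element must be extracted from an aggregate of the
        # identical type, at the identical element index.
        if not isinstance(element, ExtractValue):
            return None
        if element.agg_type != result_type or element.index != index:
            return None
        # Step 3: all elements must come from the same source aggregate.
        if source is None:
            source = element.aggregate
        elif source != element.aggregate:
            return None
    return source
```

In this model, a chain that re-inserts both elements extracted from `%lp` yields `%lp`, while a chain mixing elements from two different aggregates yields `None`; the latter is the situation where the real fold has to become PHI-aware.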
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows moving about 3000 lines out of InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Christopher Tetreault	8f8029b458	[SVE] Eliminate calls to default-false VectorType::get() from InstCombine Reviewers: efriedma, david-arm, fpetrogalli, spatel Reviewed By: david-arm Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80334	2020-05-29 15:31:31 -07:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Sanjay Patel	856cc60bc1	[InstCombine] canonicalize bitcast after insertelement into undef We have a transform in the opposite direction only for the x86 MMX type. Other types are not handled either way before this patch. The motivating case from PR45748: https://bugs.llvm.org/show_bug.cgi?id=45748 ...is the last test diff. In that example, we are triggering an existing bitcast transform, so we reduce the number of casts, and that should give us the ideal x86 codegen. Differential Revision: https://reviews.llvm.org/D79171	2020-05-10 11:37:47 -04:00
Huihui Zhang	1ec0cc0f02	[InstCombine][SVE] Fix visitExtractElementInst for scalable type. Summary: This patch fixes the following issues with visitExtractElementInst: 1. Restrict VectorUtils::findScalarElement to fixed-length vector. For scalable type, the number of elements in shuffle mask is unknown at compile-time. 2. Fix out-of-range calculation for fixed-length vector. 3. Skip scalable type when analysis relies on fixed number of elements. 4. Add unit tests to check functionality of extractelement for scalable type. Reviewers: sdesmalen, efriedma, spatel, nikic Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78267	2020-05-07 13:03:52 -07:00
Huihui Zhang	08c9c13749	[InstCombine][SVE] Fix visitInsertElementInst for scalable type. Summary: This patch fixes the following issues in visitInsertElementInst: 1. Bail out for scalable type when analysis requires fixed size number of vector elements. 2. Use cast<FixedVectorType> to get vector number of elements. This ensures an assertion on scalable vector type. 3. For scalable type, avoid folding a chain of insertelement into splat: insertelt(insertelt(insertelt(insertelt X, %k, 0), %k, 1), %k, 2) ... -> shufflevector(insertelt(X, %k, 0), undef, zero) The length of scalable vector is unknown at compile-time, therefore we don't know if given insertelement sequence is valid for splat. Reviewers: sdesmalen, efriedma, spatel, nikic Reviewed By: sdesmalen, efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78895	2020-05-07 12:44:52 -07:00
Benjamin Kramer	cc035d475f	Upgrade users of 'new ShuffleVectorInst' to pass indices as an int array No functionality change intended.	2020-04-15 14:29:43 +02:00
Benjamin Kramer	6f64daca8f	Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints No functionality change intended.	2020-04-15 12:51:38 +02:00
Christopher Tetreault	8226d599ff	[SVE] Remove calls to getBitWidth from Transforms Reviewers: efriedma, sdesmalen, spatel, eugenis, chandlerc Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77896	2020-04-14 14:31:42 -07:00
Sanjay Patel	6a7e958a42	[InstCombine] try to reduce more shuffles with bitcasted operand This is the widen mask element sibling to D76844. shuf (bitcast X), undef, Mask --> bitcast X' http://volta.cs.utah.edu:8080/z/4dt3V8	2020-04-14 15:03:59 -04:00
Benjamin Kramer	ec228d722c	[InstCombine] Use SmallBitVector for conveniently checking if all bits are set	2020-04-13 20:37:15 +02:00
Sanjay Patel	1318ddbc14	[VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC As proposed in D77881, we'll have the related widening operation, so this name becomes too vague. While here, change the function signature to take an 'int' rather than 'size_t' for the scaling factor, add an assert for overflow of 32-bits, and improve the documentation comments.	2020-04-11 10:05:49 -04:00
Christopher Tetreault	155740cc33	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: sdesmalen, rriddle, efriedma Reviewed By: sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77263	2020-04-08 15:15:41 -07:00
Sanjay Patel	538a8f0227	[InstCombine] convert bitcast-shuffle to vector trunc As discussed in D76983, that patch can turn a chain of insert/extract with scalar trunc ops into bitcast+extract and existing instcombine vector transforms end up creating a shuffle out of that (see the PhaseOrdering test for an example). Currently, that process requires at least this sequence: -instcombine -early-cse -instcombine. Before D76983, the sequence of insert/extract would reach the SLP vectorizer and become a vector trunc there. Based on a small sampling of public targets/types, converting the shuffle to a trunc is better for codegen in most cases (and a regression of that form is the reason this was noticed). The trunc is clearly better for IR-level analysis as well. This means that we can induce "spontaneous vectorization" without invoking any explicit vectorizer passes (at least a vector cast op may be created out of scalar casts), but that seems to be the right choice given that we started with a chain of insert/extract, and the backend would expand back to that chain if a target does not support the op. Differential Revision: https://reviews.llvm.org/D77299	2020-04-05 09:48:02 -04:00
Sanjay Patel	f4448063cc	[InstCombine] try to reduce shuffle with bitcasted operand shuf (bitcast X), undef, Mask --> bitcast X' The 'inverse shuffles' test (shuf_bitcast_operand) is a pattern in the motivating examples from PR35454: https://bugs.llvm.org/show_bug.cgi?id=35454 (see also D76727) We can deal with this class of patterns in generic instcombine because we are not creating any new shuffles, just a bitcast. Alive2 proof: http://volta.cs.utah.edu:8080/z/mwDUZf Differential Revision: https://reviews.llvm.org/D76844	2020-04-02 13:44:50 -04:00
Nikita Popov	50a3e8738a	Revert "[InstCombine] Erase old instruction when replacing extractelements" This reverts commit `d40368fdb5`. llvm-clang-x86_64-expensive-checks-debian failure looks related.	2020-04-01 20:10:11 +02:00
Nikita Popov	d40368fdb5	[InstCombine] Erase old instruction when replacing extractelements As we are not returning the result of replaceInstUsesWith(), we need to clean up ourselves. NFC apart from worklist order.	2020-04-01 19:55:28 +02:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Nikita Popov	53d209076a	[InstCombine] Use replaceOperand() in demanded elements simplification To make sure that dead operands get DCEd. This fixes the largest source of leftover dead operands we see in tests. NFC apart from worklist changes.	2020-03-29 20:43:19 +02:00
Simon Moll	d871ef4e6a	[instcombine] remove fsub to fneg hacks; only emit fneg Summary: Rewrite the fsub-0.0 idiom to fneg and always emit fneg for fp negation. This also extends the scalarization cost in instcombine for unary operators to result in the same IR rewrites for fneg as for the idiom. Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D75467	2020-03-10 16:57:02 +01:00
Nikita Popov	5a8819b216	[InstCombine] Use replaceOperand() in more places This is a followup to D73803, which uses the replaceOperand() helper in more places. This should be NFC apart from changes to worklist order. Differential Revision: https://reviews.llvm.org/D73919	2020-02-11 17:38:23 +01:00
Nikita Popov	d4627b90a0	[InstCombine] Avoid modifying instructions in-place As discussed on D73919, this replaces a few cases where we were modifying multiple operands of instructions in-place with the creation of a new instruction, which we generally prefer nowadays. This tends to be more readable and less prone to worklist management bugs. Test changes are only superficial (instruction naming and order).	2020-02-08 17:05:56 +01:00
Nikita Popov	878cb38a5c	[InstCombine] Add replaceOperand() helper Adds a replaceOperand() helper, which is like Instruction.setOperand() but adds the old operand to the worklist. This reduces the amount of missing or incorrect worklist management. This only applies the helper to a relatively small subset of setOperand() calls in InstCombine, namely those of the pattern `I.setOperand(); return &I;`, where it is most obviously applicable. Differential Revision: https://reviews.llvm.org/D73803	2020-02-03 19:00:17 +01:00
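The idea behind the `replaceOperand()` helper from `878cb38a5c` (set the operand, and queue the old operand for revisiting) can be modelled in a few lines. This is a toy sketch under stated assumptions, not LLVM code: the `Instruction` class, the `replace_operand` function, and the use of a `deque` as the worklist are all hypothetical stand-ins.

```python
from collections import deque


class Instruction:
    """Toy instruction: a name plus a mutable operand list (hypothetical)."""

    def __init__(self, name, operands):
        self.name = name
        self.operands = list(operands)


def replace_operand(inst, index, new_value, worklist):
    """Like a plain operand assignment, but also queue the old operand.

    The old operand may have just lost its last use, so it is pushed onto
    the worklist to be revisited (and possibly removed as dead) later.
    """
    old_value = inst.operands[index]
    inst.operands[index] = new_value
    worklist.append(old_value)
    return inst  # mirrors the `I.setOperand(); return &I;` pattern


# Usage: %r stops using %t, so %t is queued for another look.
worklist = deque()
temp = Instruction("%t", ["%a", "%b"])
user = Instruction("%r", [temp, "%y"])
replace_operand(user, 0, "%x", worklist)
```

A bare operand assignment would leave `%t` unqueued in this model, which is the kind of missing worklist management the commit message says the helper reduces.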
Nikita Popov	e6c9ab4fb7	[InstCombine] Rename worklist methods; NFC This renames Worklist.AddDeferred() to Worklist.add() and Worklist.Add() to Worklist.push(). The intention here is that Worklist.add() should be the go-to method for explicit worklist management, while the raw Worklist.push() is mostly for InstCombine internals. I will then migrate uses of Worklist.push() to Worklist.add() in followup changes. As suggested by spatel on D73411 I'm also changing the remaining method names to lowercase first character, in line with current coding standards. Differential Revision: https://reviews.llvm.org/D73745	2020-02-03 18:56:51 +01:00
Nikita Popov	480391035c	[InstCombine] Remove unnecessary worklist add; NFCI Again, this will already be added by IRBuilder.	2020-01-30 23:24:59 +01:00
Sanjay Patel	396d18aeb6	[InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded This pattern is noted as a regression from: D70246 ...where we removed an over-aggressive shuffle simplification. SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses, so I'm proposing to pattern match the minimal sequence directly. This fold does not conflict with any of our current shuffle undef/poison semantics. Differential Revision: https://reviews.llvm.org/D71220	2019-12-10 10:10:05 -05:00
Craig Topper	5ebbabc1af	[InstCombine] Revert `aafde063aa` and `6749dc3446` related to bitcast handling of x86_mmx This reverts these two commits [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double. [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement We're seeing at least one internal test failure related to a bitcast that was previously before an inline assembly block containing emms being placed after it. This leads to the mmx state ending up not empty after the emms. IR has no way to make any specific guarantees about this. Reverting these patches to get back to previous behavior which at least worked for this test.	2019-12-03 14:02:22 -08:00
Sanjay Patel	35827164c4	[InstCombine] remove shuffle mask canonicalization that creates undef elements This is NFC-intended because SimplifyDemandedVectorElts() does the same transform later. As discussed in D70641, we may want to change that behavior, so we need to isolate where it happens.	2019-11-25 13:33:56 -05:00
Sanjay Patel	e85d2e4981	[InstCombine] prevent infinite loop from conflicting shuffle mask transforms The pattern in question is currently not possible because we aggressively (wrongly) transform mask elements to undef values if they choose from an undef operand. That, however, would change if we tighten our semantics for shuffles as discussed in D70641. Adding this check gives us the flexibility to make that change with minimal overhead for current definitions.	2019-11-25 12:00:41 -05:00
Sanjay Patel	fc31b58eff	[InstCombine] simplify code for shuffle mask canonicalization; NFC We never use the local 'Mask' before returning, so that was dead code.	2019-11-25 11:11:12 -05:00
Sanjay Patel	847aabf11f	[InstCombine] remove dead code from shuffle mask canonicalization; NFC	2019-11-25 10:54:18 -05:00
Sanjay Patel	20684092ab	[InstCombine] simplify loop for shuffle mask canonicalization; NFC	2019-11-25 10:41:50 -05:00
Sanjay Patel	f575f12c64	[InstCombine] remove identity shuffle simplification for mask with undefs And simultaneously enhance SimplifyDemandedVectorElts() to recognize that pattern. That preserves some of the old optimizations in IR. Given a shuffle that includes undef elements in an otherwise identity mask like: define <4 x float> @shuffle(<4 x float> %arg) { %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3> ret <4 x float> %shuf } We were simplifying that to the input operand. But as discussed in PR43958: https://bugs.llvm.org/show_bug.cgi?id=43958 ...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result. Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison. Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug). Differential Revision: https://reviews.llvm.org/D70246	2019-11-24 10:06:26 -05:00
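The poison-leak argument in `f575f12c64` can be made concrete with a lane-by-lane toy model. This sketch only assumes what the commit message states: a mask lane of undef yields an undef result lane, which stops poison in the corresponding input lane. The `"poison"` and `"undef"` string markers and the helper names are illustrative, not LLVM semantics in full.

```python
POISON = "poison"
UNDEF = "undef"


def shufflevector(lhs, rhs, mask):
    """Toy shuffle: `None` in the mask models an undef mask element."""
    lanes = list(lhs) + list(rhs)
    return [UNDEF if m is None else lanes[m] for m in mask]


def leaks_poison(before, after):
    """True if some lane is poison after a rewrite but was not before."""
    return any(a == POISON and b != POISON for b, a in zip(before, after))


# %arg has poison in lane 0; the mask is <undef, 1, 2, 3>.
arg = [POISON, 1.0, 2.0, 3.0]
shuf = shufflevector(arg, [UNDEF] * 4, [None, 1, 2, 3])

# Replacing %shuf with %arg would expose the poison lane the shuffle hid.
leak_with_undef_mask = leaks_poison(shuf, arg)

# With the fully-defined identity mask <0, 1, 2, 3>, nothing changes.
identity = shufflevector(arg, [UNDEF] * 4, [0, 1, 2, 3])
leak_with_identity_mask = leaks_poison(identity, arg)
```

In this model `leak_with_undef_mask` is `True` and `leak_with_identity_mask` is `False`, matching the distinction the commit draws between the removed simplification and the fully-defined identity case that is kept.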

220 Commits