BasicAA knows how to analyze phis, but to control compile time, we're fairly limited in doing so. This patch loosens that restriction just slightly when there is exactly one phi input (after discounting induction variable increments). The result of this is that we can handle more cases around nested and sibling loops with pointer induction variables.
A few points to note.
* This is deliberately extremely restrictive about recursing through at most one input of the phi. There's a known general problem with BasicAA sometimes hitting exponential compile time already, and this patch makes every effort not to compound the problem. Once the root issue is fixed, we can probably loosen the restrictions here a bit.
* As seen in the test file, we're still missing cases which aren't *directly* based on phis (e.g. using the indvar increment). I believe this to be a separate problem and am going to explore this in another patch once this one lands.
* As seen in the test file, this has the unfortunate consequence that using phivalues sometimes produces worse quality results. I believe this comes down to an oversight in how recursive phi detection was implemented for phivalues. I'm happy to tackle this in a follow-up change.
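As a hedged illustration (my own example, not taken from the test file), the kind of sibling-loop pattern with pointer induction variables this helps with looks roughly like:
```
// Hypothetical example: p and q are pointer induction variables. Each
// loop-header phi has two inputs: the common base and the increment.
// After discounting the induction-variable increment, exactly one input
// remains, which is the case BasicAA may now recurse through.
void sibling_loops(int *base, int n) {
  int *p = base;
  for (int i = 0; i < n; ++i)
    *p++ = 1;
  int *q = base;
  for (int i = 0; i < n; ++i)
    *q++ += 2;
}
```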
Differential Revision: https://reviews.llvm.org/D97401
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
We are tracking an FP instruction that does *not* have FMF (reassoc)
properties, so calling that "Unsafe" seems the opposite of the common
reading.
I also removed one getter method by rolling the null check into
the access. Further simplification seems possible.
The motivation is to clean up the interactions between FMF and
function-level attributes in these classes and their callers.
This is a mess, but this is hopefully no-functional-change.
The 'Prev' descriptor is only used for min/max recurrences
or when starting a match from a phi, so it should not be a
factor when propagating FMF for fmul/fadd.
The API is confusing (and should be reduced in subsequent steps)
because the "UnsafeAlgebraInst" appears to actually be a placeholder
for a recurrence that does NOT have FMF, but we still want to
treat it as reassociative.
This is almost purely NFC; it just fits more obviously in the flow of the code now that we've standardized on the index difference approach. The non-NFC bit is that, because the VariableOffsets cancel in the subtract, we can now handle the case where both sides involve a common variable offset. This isn't an "interesting" improvement; it just happens to fall out of the natural code structure.
One subtle point - the placement of this above the BaseAlias check is important in the original code as this can return NoAlias even when we can't find a relation between the bases otherwise.
Also added some enhancement TODOs noticed while understanding the existing code.
Note: This is slightly different than the LGTMed version. I fixed the "inbounds" issue Nikita noticed with the original code in e6e5ef4 and rebased this to include the same fix.
Differential Revision: https://reviews.llvm.org/D97520
This was pointed out in review of D97520 by Nikita, but existed in the original code as well.
The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr. The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" geps. Otherwise, as can be seen in the recently added test and the changes in this patch, we can end up with a large cumulative offset of which only a small sub-offset is actually "inbounds". If that small sub-offset lies within the object, the result was unsound.
We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.
This caused miscompiles of Chromium tests for iOS due to clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.
The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.
If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.
This is much faster in practice, because we only need to perform
O(NumPromotable^2 + NumPromotable*NumNonPromotable) AA queries
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.
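A standalone sketch of the filtering idea, with stand-in types and a trivial mayAlias; this is a simplified model, not the actual LICM code:
```
#include <algorithm>
#include <vector>

struct AccessModel {
  const void *Ptr;       // stand-in for the pointer operand
  bool LoopInvariantPtr; // whether the pointer is loop-invariant
};

// Trivial stand-in for an expensive alias-analysis query.
static bool mayAlias(const AccessModel &A, const AccessModel &B) {
  return A.Ptr == B.Ptr;
}

// Seed the candidates only with loop-invariant accesses (everything else
// is definitely not promotable), then discard any candidate aliasing a
// non-promotable access. Together with the must-alias grouping of the
// survivors (not shown), this needs roughly O(P^2 + P*N) queries instead
// of O((P+N)^2), with P (the promotable count) usually small.
static std::vector<AccessModel>
promotableCandidates(const std::vector<AccessModel> &All) {
  std::vector<AccessModel> Candidates, NonPromotable;
  for (const AccessModel &A : All)
    (A.LoopInvariantPtr ? Candidates : NonPromotable).push_back(A);
  std::erase_if(Candidates, [&](const AccessModel &C) {
    return std::any_of(NonPromotable.begin(), NonPromotable.end(),
                       [&](const AccessModel &O) { return mayAlias(C, O); });
  });
  return Candidates;
}
```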
This has a significant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.
Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.
Differential Revision: https://reviews.llvm.org/D89264
For the case of two clobbering loads where one loaded object is fully contained
in the second, `BasicAAResult::aliasGEP` returns just `PartialAlias`, which
actually covers the more common case of partial overlap; it doesn't say
anything about the actual overlapping sizes.
AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
with non-constant offsets. The change stores estimated relative offsets so they
can be used further.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93529
This clarifies the interface of the matchSimpleRecurrence helper introduced in 8020be0b8 for non-commutative operators. After ebd3aeba, I realized the original way I framed the routine was inconsistent. For shifts, we only matched the LHS form, but for sub we matched both and the caller wanted that information. So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed. I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks. (It was for me.)
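To illustrate the two forms (a hypothetical C-level example, not from the patch; `iv` plays the role of the phi):
```
int lhsForm(int start, int step, int n) {
  int iv = start;
  for (int i = 0; i < n; ++i)
    iv = iv - step; // LHS form: the phi is the first operand
  return iv;
}

int rhsForm(int start, int step, int n) {
  int iv = start;
  for (int i = 0; i < n; ++i)
    iv = step - iv; // RHS form: the phi is the second operand; now also matched
  return iv;
}
```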
Followup to D72573 - as detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.
Stop ValueTracking returning zero for poison shift patterns and use the KnownBits shift helpers directly.
Extend KnownBits::shl to combine all possible shifted combinations if both min/max shift amount values are in range.
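A simplified 8-bit model of that last idea (assumed names; not the actual KnownBits implementation):
```
#include <cstdint>

struct KnownBits8 {
  uint8_t Zero = 0, One = 0; // bits known to be 0 / known to be 1
  void intersectWith(KnownBits8 O) { Zero &= O.Zero; One &= O.One; }
};

// When the shift amount is only known to lie in [MinSh, MaxSh], compute
// the known bits for every feasible amount and keep only what all of
// them agree on.
KnownBits8 shlWithRange(KnownBits8 V, unsigned MinSh, unsigned MaxSh) {
  // Contradictory all-known start value: the identity for intersection.
  // (A shift amount >= the bit width is poison, so if the whole range is
  // out of bounds, any answer is permitted.)
  KnownBits8 Result{0xFF, 0xFF};
  for (unsigned S = MinSh; S <= MaxSh && S < 8; ++S) {
    KnownBits8 K;
    K.Zero = (uint8_t)((V.Zero << S) | ~(0xFFu << S)); // low S bits are 0
    K.One = (uint8_t)(V.One << S);
    Result.intersectWith(K);
  }
  return Result;
}
```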
Differential Revision: https://reviews.llvm.org/D90479
Blocks that contain only a single branch instruction to the next block can be skipped in analyzing the loop-nest structure.
This is currently done by `getSingleSuccessor()`.
However, the branch instruction might have multiple targets which happen to all be the same.
In this case, the block should still be considered as empty and skipped.
An example is `test/Transforms/LoopInterchange/update-condbranch-duplicate-successors.ll` (the LIT test for this patch is modified from it as well).
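LLVM's `BasicBlock` also has a `getUniqueSuccessor()` whose semantic is exactly this all-edges-agree check, presumably what the patch switches to; a standalone model of the difference:
```
#include <vector>

struct Block {
  std::vector<Block *> Succs;

  // getSingleSuccessor-style: exactly one successor edge.
  Block *getSingleSuccessor() const {
    return Succs.size() == 1 ? Succs.front() : nullptr;
  }

  // getUniqueSuccessor-style: all successor edges target the same block,
  // e.g. a conditional branch whose two targets happen to be identical.
  Block *getUniqueSuccessor() const {
    if (Succs.empty())
      return nullptr;
    for (Block *S : Succs)
      if (S != Succs.front())
        return nullptr;
    return Succs.front();
  }
};
```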
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D97286
And delete the SmallPtrSetImpl overload.
While here, decrease inline element counts from 8 to 4. See D97128 for the choice.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D97257
Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.
Differential Revision: https://reviews.llvm.org/D97305
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.
This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructionCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.
Differential Revision: https://reviews.llvm.org/D96287
getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing, many backends end up getting it wrong.
Instead of trying to work with those two values separately, this removes the
VF parameter, widening the RetTy/ArgTys by VF when called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.
For most backends this looks like it will be an improvement (or they were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.
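A toy model of the resulting contract (stand-in types and names, not the actual TTI API):
```
#include <vector>

struct TypeModel {
  unsigned ScalarBits;
  unsigned Lanes; // 1 for scalar types
};

// The vectorizer widens the return and argument types by VF before asking
// for a cost, so the costing hook only ever sees one canonical form (the
// types themselves), never the ambiguous scalar-types-plus-VF combination.
static TypeModel widenByVF(TypeModel T, unsigned VF) {
  T.Lanes *= VF;
  return T;
}

static std::vector<TypeModel> widenAllByVF(std::vector<TypeModel> Tys,
                                           unsigned VF) {
  for (TypeModel &T : Tys)
    T = widenByVF(T, VF);
  return Tys;
}
```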
Differential Revision: https://reviews.llvm.org/D95291
These verify calls are causing a lot of slowdown on some files, up to 8x.
The LazyCallGraph infra has been tested a lot over the years, so I'm fairly confident that we don't always need to run the verifies.
These verifies took >90% of total time in one of the compilations I looked at.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D97225
The result will have the same sign as the dividend unless the
result is 0. The magnitude of the result will always be less
than or equal to the dividend. So the result will have at least
as many sign bits as the dividend.
Previously we would do this if the divisor was a positive constant,
but that isn't required.
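Since the claim is exhaustively checkable at 8 bits, here is a hedged standalone verification sketch (my own brute-force check, not ValueTracking code):
```
#include <cassert>
#include <cstdint>

// Count sign bits of an 8-bit value: the sign bit plus however many
// leading bits match it.
static int numSignBits(int8_t X) {
  int N = 1;
  for (int I = 6; I >= 0 && ((X >> I) & 1) == ((X >> 7) & 1); --I)
    ++N;
  return N;
}

int main() {
  // The remainder keeps the dividend's sign (unless it is 0) and never
  // exceeds the dividend in magnitude, so it has at least as many sign
  // bits. Division by zero and the overflowing -128 % -1 are skipped.
  for (int A = -128; A <= 127; ++A)
    for (int B = -128; B <= 127; ++B)
      if (B != 0 && !(A == -128 && B == -1))
        assert(numSignBits((int8_t)(A % B)) >= numSignBits((int8_t)A));
  return 0;
}
```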
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97170
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.
If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
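A standalone model of the two-pass scheme (stand-in types; a simplified sketch, not the actual InstCombine code):
```
#include <cstddef>
#include <optional>
#include <vector>

struct InstModel {
  bool ProvidesValue; // would satisfy the load
  bool MayClobber;    // an expensive AA-backed property in reality
};

// Scan backwards over at most ScanLimit instructions for one that provides
// the loaded value; only when a provider is found do we pay for the
// clobber checks on the instructions between it and the load.
std::optional<size_t> findAvailable(const std::vector<InstModel> &Before,
                                    size_t ScanLimit = 6) {
  size_t N = Before.size();
  size_t End = N > ScanLimit ? N - ScanLimit : 0;
  for (size_t I = N; I-- > End;) {
    if (!Before[I].ProvidesValue)
      continue;
    for (size_t J = I + 1; J < N; ++J) // clobber checks: rarely reached
      if (Before[J].MayClobber)
        return std::nullopt;
    return I;
  }
  return std::nullopt; // common case: no provider, no clobber checks done
}
```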
The existing implementation was relying on order of evaluation to achieve a particular result. This got really confusing when I wanted to change the handling of arguments in a later patch.
This is a fix for https://llvm.org/PR49215 either before/after
we make a verifier enhancement for vector reductions with D96904.
I'm not sure what the current thinking is for pointer math/logic
in IR. We allow icmp on pointer values. Therefore, we match min/max
patterns, so without this patch, the vectorizer could form a vector
reduction from that sequence.
But the LangRef definitions for min/max and vector reduction
intrinsics do not allow pointer types:
https://llvm.org/docs/LangRef.html#llvm-smax-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-umax-intrinsic
So we would crash/assert at some point - either in IR verification,
in the cost model, or in codegen. If we do want to allow this kind
of transform, we will need to update the LangRef and all of those
parts of the compiler.
Differential Revision: https://reviews.llvm.org/D97047
When computing a range for a SCEVUnknown, today we use computeKnownBits for unsigned ranges, and computeNumSignBits for signed ranges. This means we miss opportunities to improve range results.
One common missed pattern is that we have a signed range of a value which CKB can determine is positive, but CNSB doesn't convey that information. The current range includes the negative part, and is thus double the size.
Per the removed comment, the original concern which delayed using both (after some code merging years back) was a compile time concern. CTMark results (provided by Nikita, thanks!) showed a geomean impact of about 0.1%. This doesn't seem large enough to avoid higher quality results.
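A simplified 8-bit model of why combining helps (my own illustration, not the SCEV code):
```
#include <cstdint>
#include <utility>

using Range = std::pair<int, int>; // inclusive signed 8-bit [Lo, Hi]

// Signed range implied by known bits (Zero/One masks).
Range signedFromKnownBits(uint8_t Zero, uint8_t One) {
  uint8_t Unknown = (uint8_t)~(Zero | One);
  uint8_t Min = One | (Unknown & 0x80); // unknown sign bit -> 1, rest -> 0
  uint8_t Max = One | (Unknown & 0x7F); // unknown sign bit -> 0, rest -> 1
  return {(int8_t)Min, (int8_t)Max};
}

// Signed range implied by a sign-bit count N: value in [-2^(8-N), 2^(8-N)-1].
Range signedFromSignBits(int N) {
  int Mag = 1 << (8 - N);
  return {-Mag, Mag - 1};
}

Range intersect(Range A, Range B) {
  return {A.first > B.first ? A.first : B.first,
          A.second < B.second ? A.second : B.second};
}

// The missed pattern from the text: known bits prove the top bit is 0
// (value is non-negative), giving [0, 127], while one sign bit alone only
// gives [-128, 127]; intersecting halves the range.
```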
Differential Revision: https://reviews.llvm.org/D96534
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.
Differential Revision: https://reviews.llvm.org/D96993
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).
I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)
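A hedged sketch of the safety condition the DCE passes need (a simplified model, not the actual ADCE/BDCE code):
```
// An unused, side-effect-free instruction may still be observable if it
// is not guaranteed to return (e.g. a call that may loop forever), so
// DCE must keep it unless willReturn holds.
struct InstModel {
  unsigned NumUses;
  bool HasSideEffects;
  bool WillReturn; // the notion now available on Instruction itself
};

static bool isTriviallyRemovable(const InstModel &I) {
  return I.NumUses == 0 && !I.HasSideEffects && I.WillReturn;
}
```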
Differential Revision: https://reviews.llvm.org/D96992
This is a simpler variant of D96647. It just adds a straightforward
depth limit with a high cutoff, without introducing complex logic
for BatchAA consistency. It accepts that we may cache a sub-optimal
result if the depth limit is hit.
Eventually this should be more fully addressed by D96647 or similar,
but in the meantime this avoids stack overflows in a cheap way.
Differential Revision: https://reviews.llvm.org/D96996
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundant, which can harm performance and can cause
excessive compile time in some extreme cases.
The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.
The fix brings a 0.1~0.2% performance improvement on our search benchmark.
Differential Revision: https://reviews.llvm.org/D96806
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.
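A standalone model of the stripping rule (stand-in types; the real code operates on llvm::PHINode):
```
#include <vector>

struct Value {
  virtual ~Value() = default;
};
struct PhiNode : Value {
  std::vector<Value *> Incoming;
};

// A single-argument (e.g. LCSSA) phi is just a rename of its one incoming
// value, so it can always be looked through, without the value-equivalence
// restrictions that arbitrary phis require.
static Value *stripSingleEntryPhis(Value *V) {
  while (auto *PN = dynamic_cast<PhiNode *>(V)) {
    if (PN->Incoming.size() != 1)
      break;
    V = PN->Incoming.front();
  }
  return V;
}
```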
With this patch we get marginally better results:
aa.NumMayAlias | 5010069 | 5009861
aa.NumMustAlias | 347518 | 347674
aa.NumNoAlias | 27201336 | 27201528
...
licm.NumPromoted | 1293 | 1296
I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.
Differential Revision: https://reviews.llvm.org/D96668
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.
Differential Revision: https://reviews.llvm.org/D95826
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:
```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```
A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.
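A simplified model of the described fallback (names and types are assumptions, not the exact LoopVectorize code):
```
#include <vector>

struct ElementCountModel {
  unsigned MinLanes;
  bool Scalable;
};
struct ReductionModel { int Kind; };

// Stand-in for the new TTI hook; a real target would reject reduction
// kinds it cannot implement on scalable vectors.
static bool isLegalToVectorizeReduction(const ReductionModel &R,
                                        ElementCountModel VF) {
  return !VF.Scalable || R.Kind == 0;
}

// If any loop reduction is illegal at the scalable VF, fall back to the
// fixed-width VF instead.
static ElementCountModel
chooseMaxVF(ElementCountModel ScalableVF, ElementCountModel FixedVF,
            const std::vector<ReductionModel> &Reductions) {
  for (const ReductionModel &R : Reductions)
    if (!isLegalToVectorizeReduction(R, ScalableVF))
      return FixedVF;
  return ScalableVF;
}
```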
Reviewed By: david-arm, fhahn, dmgreen
Differential Revision: https://reviews.llvm.org/D95245
The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis"
since there is no conflict with LegacyDivergenceAnalysis. In the
legacy PM, this analysis can only be used through the legacy DA
serving as a wrapper. It is now made available as a pass in the new
PM, and has no relation with the legacy DA.
The new DA currently cannot handle irreducible control flow; its
presence can cause the analysis to run indefinitely. The analysis is
now modified to detect this and report all instructions in the
function as divergent. This is super conservative, but allows the
analysis to be used without hanging the compiler.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D96615
In the motivating example from https://llvm.org/PR49171 and
reduced test here, we would unroll and clone assumes so much
that compile-time effectively became infinite while analyzing
all of those assumes.
Currently, setting the `no-nans-fp-math` attribute to true will allow
loops with fmin/fmax to vectorize, though we should be requiring that
`no-signed-zeros-fp-math` is also set.
This patch adds the check for no-signed-zeros at the function level and includes
tests to make sure we don't vectorize functions with only one of the two
attributes set.
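A hedged illustration (my own example) of why the extra attribute matters:
```
// The fcmp+select min idiom does not distinguish -0.0 from +0.0: with
// inputs {-0.0f, +0.0f} the result is whichever zero the comparison order
// favors. Vectorizing the reduction reorders the compares, so it may flip
// which zero is returned; that is only acceptable when
// no-signed-zeros-fp-math is in effect (in addition to no-nans-fp-math,
// since the idiom also mishandles NaNs).
float fmin_idiom(const float *a, int n) {
  float m = a[0];
  for (int i = 1; i < n; ++i)
    m = a[i] < m ? a[i] : m;
  return m;
}
```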
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96604
This patch adds a new intrinsic, experimental.vector.reverse, that takes a single
vector and returns a vector of matching type but with the original lane order
reversed. For example:
```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```
The new intrinsic supports fixed and scalable vector types.
The fixed-width form relies on shufflevector to maintain existing behaviour;
the scalable form uses the new ISD node VECTOR_REVERSE.
This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].
Patch by Paul Walker (@paulwalker-arm).
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html
Differential Revision: https://reviews.llvm.org/D94883
This refactors shouldFavorPostInc() and shouldFavorBackedgeIndex() into
getPreferredAddressingMode() so that we have one interface to steer LSR in
generating the preferred addressing mode.
Differential Revision: https://reviews.llvm.org/D96600
At this point, we can treat the case of GEP/GEP aliasing and
GEP/non-GEP aliasing in essentially the same way. The only
differences are that we need to do an additional negative GEP base
check, and that we perform a bailout on unknown sizes for the
GEP/non-GEP case (the latter exists only to limit compile-time).
This change is not quite NFC due to the peculiar effect that
the DecomposedGEP for V2 can actually be non-trivial even if V2
is not a GEP. The reason for this is that getUnderlyingObject()
can look through LCSSA phi nodes, while stripPointerCasts() doesn't.
This can lead to slightly better results if single-entry phi nodes
occur inside a loop, where looking through the phi node via aliasPhi()
would subject it to phi cycle equivalence restrictions. It would
probably make sense to adjust pointer cast stripping (for AA) to
handle this case, and ensure consistent results.
For two GEPs with identical offsets, we currently first perform
a base address query without size information, and then if it is
MayAlias, perform another with size information. This is pointless,
as the latter query should produce strictly better results.
This was not quite true historically due to the way that NoAlias
assumptions were handled, but that issue has since been resolved.
We currently detect GEPs that have exactly the same indexes by
comparing the Offsets and VarIndices. However, the latter implicitly
performs equality comparisons between two values, which is not
generally legal inside BasicAA, due to the possibility of comparisons
across phi cycles.
I believe that in this particular instance this actually ends up being
unproblematic, at least I wasn't able to come up with any cases that
could result in an incorrect root query result.
In the interest of being defensive, compute GetIndexDifference earlier
(which knows how to handle phi cycles properly) and use the result of
that to determine whether the offsets are identical.
This is a follow-up of D95238's LangRef update.
This patch updates `programUndefinedIfUndefOrPoison(V)` to return true if
`V` is used by any memory-accessing instruction.
Interestingly, this affected many Attributor tests, mainly by adding noundef attributes.
The tests are updated using llvm/utils/update_test_checks.py. I checked that the diffs
are about updating noundefs.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96642
Instcombine will convert nonnull and alignment assumptions that use a boolean condition
into assumptions that use operand bundles when knowledge retention is enabled.
Differential Revision: https://reviews.llvm.org/D82703
Rather than storing the query depth in AAResults, store it in AAQI.
This makes more sense, as it is a property of the query. This
sidesteps the issue of D94363, fixing slightly inaccurate AA
statistics. Additionally, I plan to use the Depth from BasicAA in
the future, where fetching it from AAResults would be unreliable.
This change is not quite as straightforward as it seems, because
we need to preserve the depth when creating a new AAQI for recursive
queries across phis. I'm adding a new method for this, as we may
need to preserve additional information here in the future.
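A minimal sketch of the shape of this change (assumed names, not the actual AAQueryInfo definition):
```
// The recursion depth is a property of the query, so it lives on the
// query-info object and is carried over when a fresh cache is created
// for a recursive cross-phi query.
struct AAQueryInfoModel {
  unsigned Depth = 0;
  // ... assumption/result caches elided ...

  AAQueryInfoModel withEmptyCache() const {
    AAQueryInfoModel Fresh; // empty caches
    Fresh.Depth = Depth;    // but the depth is preserved
    return Fresh;
  }
};
```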
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs
and adds some simple tests for loops containing a function for which
there is a vectorized variant available.
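A simplified standalone model of the guard, with std::nullopt standing in for LLVM's invalid InstructionCost:
```
#include <optional>

using CostModel = std::optional<unsigned>;

// A scalable VF has no compile-time-known lane count, so per-lane
// scalarization overhead cannot be enumerated.
static CostModel scalarizationOverhead(bool ScalableVF, unsigned KnownLanes,
                                       unsigned PerLaneCost) {
  if (ScalableVF)
    return std::nullopt; // invalid: this choice will not be selected
  return KnownLanes * PerLaneCost;
}
```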
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D96356
The vector reduction intrinsics started life as experimental ops, so backend support
was lacking. As part of promoting them to 1st-class intrinsics, however, codegen
support was added/improved:
D58015
D90247
So I think it is safe to now remove this complication from IR.
Note that we still have an IR-level codegen expansion pass for these as discussed
in D95690. Removing that is another step in simplifying the logic. Also note that
x86 was already unconditionally forming reductions in IR, so there should be no
difference for x86.
I spot checked a couple of the tests here by running them through opt+llc and did
not see any asm diffs.
If we do find functional differences for other targets, it should be possible
to (at least temporarily) restore the shuffle IR with the ExpandReductions IR
pass.
Differential Revision: https://reviews.llvm.org/D96552
This patch changes the VecDesc struct to use ElementCount
instead of an unsigned VF value, in preparation for
future work that adds support for vectorized versions of
math functions using scalable vectors. Since all I'm doing
in this patch is switching the type I believe it's a
non-functional change. I changed getWidestVF to now return
both the widest fixed-width and scalable VF values, but
currently the widest scalable value will be zero.
Differential Revision: https://reviews.llvm.org/D96011
The motivation for this is that I'm looking at an example that uses shifts as induction variables. There are lots of other omissions, but one of the first I noticed is that we can't compute tight known bits. (This indirectly causes SCEV's range analysis to produce very poor results as well.)
Differential Revision: https://reviews.llvm.org/D96440
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).
It caused non-determinism in the output, such that e.g. the
polly-x86_64-linux buildbot failed occasionally.
This will be needed in the loop-vectorizer where the minimum VF
requested may be a scalable VF. getMinimumVF now takes an additional
operand 'IsScalableVF' that indicates whether a scalable VF is required.
Reviewed By: kparzysz, rampitec
Differential Revision: https://reviews.llvm.org/D96020
The AssumptionCache mechanism is used to feed assumes into known bits computations. Most places in SCEV passed it in, but one place appears to have been missed.
Spotted via inspection; I don't have a test case which actually exercises this, but it seemed like an obvious fixit.
Instcombine will convert nonnull and alignment assumptions that use a boolean condition
into assumptions that use operand bundles when knowledge retention is enabled.
Differential Revision: https://reviews.llvm.org/D82703
During some recent debugging of the IROutliner with the similarity pass, just having the basic block and function wasn't really enough information. This adds the first and last instruction to the output of the IRSimilarityPrinting pass to give better information to a user.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D94304
This is based on the example/comments in:
https://llvm.org/PR48984
I tried just lifting the restriction in computeKnownBitsFromShiftOperator()
as suggested in the bug report, but that doesn't catch all of the cases
shown here. I didn't step through to see exactly why that happened. But it
seems like a reasonable compromise to cheaply check the special-case of
shifting a constant.
There's a slight regression on a cmp transform as noted, but this is likely
the more important/common pattern, so we can fix that icmp pattern later if
needed.
Differential Revision: https://reviews.llvm.org/D95959
This reverts commit 502a67dd7f.
This exposes a failure in the test-suite build on PowerPC;
reverting to unblock the buildbot first.
Dave will re-commit in https://reviews.llvm.org/D96287.
Thanks Dave.
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems to be a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.
Fixes PR49043.
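A standalone model of the handle-based scheme (LLVM implements this with a callback value handle; this sketch only shows the shape, with assumed names):
```
#include <algorithm>
#include <vector>

struct AssumptionCacheModel;

// Each registered llvm.assume is held via a handle that knows its owning
// cache and removes itself when the underlying call is deleted.
struct AssumeHandle {
  AssumptionCacheModel *Owner;
  const void *AssumeCall;
  void onDeleted(); // invoked when the tracked call is erased
};

struct AssumptionCacheModel {
  std::vector<AssumeHandle> Assumes;
  void unregister(const void *Call) {
    std::erase_if(Assumes, [&](const AssumeHandle &H) {
      return H.AssumeCall == Call;
    });
  }
};

void AssumeHandle::onDeleted() { Owner->unregister(AssumeCall); }
```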
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D96168
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which were violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing, many backends end up getting it wrong.
Instead of trying to work with those two values separately, this removes the
VF parameter, widening the RetTy/ArgTys by VF when called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.
For most backends this looks like it will be an improvement (or they were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.
Differential Revision: https://reviews.llvm.org/D95291
MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.
I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.
This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.
Differential Revision: https://reviews.llvm.org/D95763
Extend applyLoopGuards() to take into account conditions/assumes proving some
value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the
loop unroller and the loop vectorizer identify more loops as not requiring
remainder loops.
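A hedged C-level illustration (my own example) of the pattern this enables:
```
// The guard proves n is divisible by 4, so applyLoopGuards() can rewrite
// n as (n / 4) * 4, and a 4-way unrolled or vectorized loop provably
// needs no scalar remainder loop.
void guarded(int *a, unsigned n) {
  if (n % 4 != 0)
    return;
  for (unsigned i = 0; i < n; ++i)
    a[i] += 1;
}
```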
Differential Revision: https://reviews.llvm.org/D95521
This is another step (see D95452) towards correcting fast-math-flags
bugs in vector reductions.
There are multiple bugs visible in the test diffs, and this is still
not working as it should. We still use function attributes (rather
than FMF) to drive part of the logic, but we are not checking for
the correct FP function attributes.
Note that FMF may not be propagated optimally on selects (example
in https://llvm.org/PR35607 ). That's why I'm proposing to union the
FMF of a fcmp+select pair and avoid regressions on existing vectorizer
tests.
Differential Revision: https://reviews.llvm.org/D95690
This is a (rather delayed) follow-up to commit 0129cd5. This commit is entirely NFC; the semantic change to leverage the new information will be submitted separately with a test case.
We use `EquivalenceClasses` to cache the notion that two SCEVs are equivalent,
to save time in the situation when `A` is equivalent to `B` and `B` is equivalent to `C`,
making the check "is `A` equivalent to `C`?" cheaper.
We also return `0` in the comparator when we reach max analysis depth to save
compile time. After doing this, we also cache them as being equivalent.
Now, imagine the following situation:
- `A` is proved equivalent to `B`;
- `C` is proved equivalent to `D`;
- Comparison of `A` against `D` is proved non-zero;
- Comparison of `B` against `C` reaches max depth (and gets cached as equivalence).
Now, before the invocation of compare(`B`, `C`), `A` and `D` belonged
to different equivalence classes, and their comparison returned non-zero.
After the invocation of compare(`B`, `C`), the equivalence classes get merged
and `A`, `B`, `C` and `D` all fall into the same equivalence class. So the comparator
will change its behavior for the pair `A` and `D`, with weird consequences.
This comparator is finally used in `std::stable_sort`, and this behavior change
makes it crash (it looks like it causes memory corruption).
Solution: this patch changes `CompareSCEVComplexity` to return `None`
when the max depth is reached. So in this case, we do not cache these SCEVs
(and their parents in the tree) as being equivalent.
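A standalone model of the fix (stand-in types, not the actual CompareSCEVComplexity code):
```
#include <optional>

struct ExprModel { unsigned Complexity; };

// When the depth budget is exhausted, return "unknown" (std::nullopt)
// rather than 0, so the caller does not record the two operands (and
// their parents) as equivalent on the basis of a bailout.
std::optional<int> compareComplexity(const ExprModel &A, const ExprModel &B,
                                     unsigned Depth, unsigned MaxDepth = 32) {
  if (Depth > MaxDepth)
    return std::nullopt; // must NOT be cached as an equivalence
  if (A.Complexity != B.Complexity)
    return A.Complexity < B.Complexity ? -1 : 1;
  return 0; // genuinely proven equal: safe to cache
}
```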
Differential Revision: https://reviews.llvm.org/D94654
Reviewed By: lebedev.ri