llvm-project

Author	SHA1	Message	Date
Nikita Popov	c1b9667148	[InstCombine] Support opaque pointers in callee bitcast fold To make this actually trigger, we also need to check whether the function types differ, which is a hidden cast under opaque pointers. The transform is somewhat less relevant there because it is primarily about pointer bitcasts, but it can also happen with other bit- or pointer-castable types. Byval handling is easier with opaque pointers because there is no need to adjust the byval type, we only need to make sure that it's still a pointer.	2022-03-03 11:07:39 +01:00
Nikita Popov	6c8adc5054	[InstCombine] Remove unnecessary byval check in callee cast fold The logic for handling this was fixed in `8d7f118ab2`, but the check for byval on the callee was retained. This resulted in a weird situation where the transform would work depending on whether the byval was only on the call or on both the call and the function.	2022-03-03 10:55:14 +01:00
Nikita Popov	c262ba2aab	[Scalarizer] Avoid pointer element type accesses Pass through the load/store type to the Scatterer instead.	2022-03-03 10:28:58 +01:00
serge-sans-paille	f90a66a544	Add missing include under -DEXPENSIVE_CHECKS This is a follow-up to `59630917d6`	2022-03-03 10:19:39 +01:00
Nikita Popov	b214f550f7	[DSE] Drop redundant WalkerStepLimit adjustment There is a general WalkerStepLimit adjustment higher up in the loop, and I don't see any reason why this particular case would need additional adjustment. Furthermore, this could underflow.	2022-03-03 09:42:38 +01:00
serge-sans-paille	59630917d6	Cleanup includes: Transform/Scalar Estimated impact on preprocessor output line: before: 1062981579 after: 1062494547 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120817	2022-03-03 07:56:34 +01:00
spupyrev	f2ade65fb2	[CSSPGO] Even flow distribution Differential Revision: https://reviews.llvm.org/D118640	2022-03-02 13:12:05 -08:00
Philip Reames	738042711b	Reapply "[SLP] Schedule only sub-graph of vectorizable instructions" Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch. Original commit message follows: SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-03-02 10:47:20 -08:00
Philip Reames	689babdf68	[SLP] Don't try to vectorize allocas While a collection of allocas are technically vectorizeable - by forming a wider alloca - this was not a transform SLP actually knows how to do. Instead, we were forming a bundle with missing dependencies, and then relying on the scheduling code to preserve program order if multiple instructions were scheduleable at once. I haven't been able to write a test case, but I'm 99% sure this was wrong in some edge case. The unknown op case was flowing down the shufflevector path. This did result in some splat handling being lost with this change, but the same lack of splat handling is visible in a whole bunch of simple examples for the gather path. I didn't consider this interesting to fix given how narrow the splat of allocas case is.	2022-03-02 10:08:43 -08:00
Stephen Long	2f6c14816a	[LoopPeel] Add EXPENSIVE_CHECKS ifdef guard around domtree verify call The verify call was taking 50% of the compile time in our internal LLVM fork when trying to unroll many loops. Differential Revision: https://reviews.llvm.org/D113028	2022-03-02 09:56:20 -08:00
Florian Hahn	8777cb66a8	[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI). Instead of relying on underlying instructions, this patch updates VPScalarIVStepsRecipe to only store the required type information. This removes access to unrelated information, as well as avoiding issues with the same underlying instruction being shared by multiple recipes. This change should only change the debug output and not cause any codegen changes, hence NFCI.	2022-03-02 16:23:19 +00:00
Nikita Popov	61580d0949	Reapply [InstCombine] Remove one-use limitation from X-Y==0 fold This is a recommit without changes. I originally reverted this due to a significant code-size regression on tramp3d-v4, however further investigation showed that in the tramp3d-v4 case this change enables additional optimizations (in particular more jump threading), which happens to reduce the size of a function just enough to be eligible for inlining at hot callsites, which results in the code size increase. As such, this was just bad luck. ----- This one-use limitation is artificial, we do not increase instruction count if we perform the fold with multiple uses. The motivating case is shown in @sub_eq_zero_select, where the one-use limitation causes us to miss a subsequent select fold. I believe the backend is pretty good about reusing flag-producing subs for cmps with same operands, so I think doing this is fine. Differential Revision: https://reviews.llvm.org/D120337	2022-03-02 16:43:33 +01:00
spupyrev	bcdc047731	speeding up ext-tsp for huge instances Differential Revision: https://reviews.llvm.org/D120780	2022-03-02 07:17:48 -08:00
Florian Hahn	9e46866c0c	[LV] Remove dead EntryVal argument from buildScalarSteps (NFC). The EntryVal argument is not needed after recent refactoring. Remove it.	2022-03-02 14:59:22 +00:00
Nikita Popov	5cf06d10f8	Revert "[InstCombine] Support switch in phi to cond fold" This reverts commit `0817ce86b5`. Seeing some ppc64le stage2 failures, reverting to investigate.	2022-03-02 12:49:47 +01:00
Nikita Popov	0817ce86b5	[InstCombine] Support switch in phi to cond fold For conditional branches, we know the value is i1 0 or i1 1 along the outgoing edges. For switches we can apply exactly the same optimization, just with the known values determined by the switch cases.	2022-03-02 12:16:32 +01:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Hongtao Yu	07846e3387	[CSSPGO][PriorityInliner] Do not use block weight to drive callsite inlining. The priority-based inliner currently uses block count combined with callee entry count to drive callsite inlining. This doesn't work well with LTO where postlink inlining is driven by prelink-annotated block count which could be based on the merge of all context profiles. I'm fixing it by using callee profile entry count only which should be context-sensitive. I'm seeing 0.2% perf improvement for one of our internal large benchmarks with probe-based non-CS profile. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D120784	2022-03-01 18:43:19 -08:00
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Arthur Eubanks	9c6250ee41	Revert "[SLP] Schedule only sub-graph of vectorizable instructions" This reverts commit `0539a26d91`. Causes a miscompile, see comments on D118538. Required updating bottom-to-top-reorder.ll.	2022-03-01 17:31:16 -08:00
Arthur Eubanks	6987ac7903	Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]" This reverts commit `a3e9b32c00`. Required for reverting D118538.	2022-03-01 17:28:52 -08:00
Florian Mayer	1d730d80ce	[HWASAN] erase lifetime intrinsics if tag is outside. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D120437	2022-03-01 14:47:33 -08:00
Joseph Huber	6632180745	[OpenMP][NFC] Add an option to print the module before in OpenMPOpt Previously there was a debug flag to print the module after optimizations. Sometimes we wanted to print the module before optimizations so this is being split into two flags. `-openmp-opt-print-module` is now `-openmp-opt-print-module-after`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120768	2022-03-01 17:09:09 -05:00
serge-sans-paille	a494ae43be	Cleanup includes: TransformsUtils Estimation on the impact on preprocessor output: before: 1065307662 after: 1064800684 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120741	2022-03-01 21:00:07 +01:00
Craig Topper	7bc6667845	[Analysis] Simplify the interface to llvm::getICmpCode. NFC Instead of passing an InstCmpInt * and a bool just pass the predicate from the caller. I'm considering moving the similar FCmp functions from InstCombine over here and this makes the interface consistent with what is used for FCmp. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120609	2022-03-01 09:53:27 -08:00
Tong Zhang	17ce89fa80	[SanitizerBounds] Add support for NoSanitizeBounds function Currently adding attribute no_sanitize("bounds") isn't disabling -fsanitize=local-bounds (also enabled in -fsanitize=bounds). The Clang frontend handles fsanitize=array-bounds which can already be disabled by no_sanitize("bounds"). However, instrumentation added by the BoundsChecking pass in the middle-end cannot be disabled by the attribute. The fix is very similar to D102772 that added the ability to selectively disable sanitizer pass on certain functions. In this patch, if no_sanitize("bounds") is provided, an additional function attribute (NoSanitizeBounds) is attached to IR to let the BoundsChecking pass know we want to disable local-bounds checking. In order to support this feature, the IR is extended (similar to D102772) to make Clang able to preserve the information and let BoundsChecking pass know bounds checking is disabled for certain function. Reviewed By: melver Differential Revision: https://reviews.llvm.org/D119816	2022-03-01 18:47:02 +01:00
serge-sans-paille	71c3a5519d	Cleanup includes: LLVMAnalysis Number of lines output by preprocessor: before: 1065940348 after: 1065307662 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D120659	2022-03-01 18:01:54 +01:00
Nikita Popov	a1f442b278	[InstCombine] Support phi to cond fold with more than two preds This transform can still be applied if there are more than two phi inputs, as long as phi inputs with the same value are dominated by the same idom edge.	2022-03-01 16:31:49 +01:00
Nikita Popov	26748bb15a	[InstCombine] Slightly relax one-use check in abs canonicalization Treat the icmp and sub symmetrically, and require that one of them has one use, not the icmp in particular. This could be further relaxed in the abs (but not nabs) case to not check one-use at all.	2022-03-01 15:06:41 +01:00
Sanjay Patel	84812b9b07	[InstCombine] drop FMF in select->copysign transform It is not correct to propagate flags from the select to the new instructions: https://alive2.llvm.org/ce/z/tNATrd https://alive2.llvm.org/ce/z/VwcVzn Fixes #54077	2022-03-01 08:51:41 -05:00
Nikita Popov	c2428a4fad	[InstCombine] Remove SPF min/max check from select demanded bits (NFCI) This should no longer be necessary now that we canonicalize to intrinsics. This may not be entirely NFC in practice if worklist order gets inverted and we perform demanded bits simplification of a select user before the select is canonicalized.	2022-03-01 14:50:37 +01:00
Alexandros Lamprineas	33830326aa	[FuncSpec] Remove definitions of fully specialized functions. A function is basically dead when: * it has no uses * it has only self-referencing uses (it's recursive) Differential Revision: https://reviews.llvm.org/D119878	2022-03-01 11:57:08 +00:00
Alexandros Lamprineas	b803aee67b	[FuncSpec][NFC] Improve debug messages. Adds diagnostic messages when debugging the pass. Differential Revision: https://reviews.llvm.org/D119875	2022-03-01 11:55:08 +00:00
Alexandros Lamprineas	7b74123a3d	[FuncSpec][NFC] Variable renaming. Just preparing the ground for follow up patches to make the reviews easier. Differential Revision: https://reviews.llvm.org/D119874	2022-03-01 11:38:57 +00:00
Kirill Stoimenov	b7fd30eac3	[ASan] Removed unused AddressSanitizerPass functional pass. This is a clean-up patch. The functional pass was rolled into the module pass in D112732. Reviewed By: vitalybuka, aeubanks Differential Revision: https://reviews.llvm.org/D120674	2022-03-01 00:41:29 +00:00
Philip Reames	8cb0ac5825	[SLP] Check invariant that all instructions in bundle are in same block [NFC]	2022-02-28 13:17:44 -08:00
Sanjay Patel	278b407a30	[InstCombine] fold mul-with-overflow intrinsic with -1 operand extractvalue (any_mul_with_overflow X, -1), 0 --> -X There are similar other potential transforms that we could do as noted by the last TODO in the test diffs. Fixes #54053	2022-02-28 14:13:48 -05:00
Sanjay Patel	f422c5d871	[InstCombine] fold select-of-zero-or-ones with negated op (X u< 2) ? -X : -1 --> sext (X != 0) (X u> 1) ? -1 : -X --> sext (X != 0) https://alive2.llvm.org/ce/z/U3y5Bb https://alive2.llvm.org/ce/z/hgi-4p This is part of solving:	2022-02-28 12:07:49 -05:00
Alexey Bataev	e4b9640867	[SLP]Improve bottom-to-top reordering. Currently bottom-to-top reordering analysis counts orders of the operands and then adds natural order counts for the operand users. It is very conservative, since the user nodes themselves may require reordering. Patch improves bottom-to-top analysis by checking for the user nodes if they require/allow the reordering. If the user node must be reordered, has reused scalars, is an alternate op vectorization node, is a non-ordered gather node or may allow reordering because of the reordered operands, such node is considered as the node that allows reordering and is not counted as a node with the natural order. Differential Revision: https://reviews.llvm.org/D120492	2022-02-28 06:48:46 -08:00
Florian Hahn	b3e8ace198	Recommit "[VPlan] Introduce recipe to build scalar steps." This reverts the revert commit `ff93260bf6`. The underlying issue causing the PPC bot failures has been fixed in `cbaac14734` and a corresponding test case has been added in `ad2cad1c52`. Original message: This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-28 14:12:20 +00:00
Florian Hahn	cbaac14734	[LV] Remove induction recipes only used outside vector loop. Exit values of vector inductions are generated completely independent of the induction recipes. Consider them for removal, if they are not used in loop. This fixes a crash exposed by `49b23f451c`.	2022-02-28 11:14:22 +00:00
Nikita Popov	5423b0a525	[InstCombine] Remove not of SPF min/max fold (NFCI) This should no longer be necessary now that we canonicalize to intrinsics. Might not be strictly NFC due to worklist order.	2022-02-28 11:02:31 +01:00
Nikita Popov	d5ea3b2f33	[InstCombine] Remove sub of SPF min/max fold (NFCI) This isn't necessary anymore, now that we canonicalize SPF min/max to intrinsics. Might not be strictly NFC due to worklist order changes.	2022-02-28 10:57:24 +01:00
Nikita Popov	9353ed6a53	[InstCombine] Don't call matchSAddSubSat() for SPF (NFC) Only call it for intrinsic min/max. The moved implementation is unchanged apart from the one-use check: It is now hardcoded to one-use, without the two-use special case for SPF.	2022-02-28 10:41:56 +01:00
Nikita Popov	53602e4c70	[InstCombine] Remove SPF moveAddAfterMinMax() (NFC) As SPF min/max is canonicalized to intrinsics before this point, this change should be entirely NFC.	2022-02-28 10:28:16 +01:00
Nikita Popov	ee62dcdb34	[InstCombine] Remove SPF moveNotAfterMinMax() (NFC) This happens after SPF -> intrinsic canonicalization, and as such should be entirely NFC.	2022-02-28 10:23:07 +01:00
Nikita Popov	0bc3e233d7	[InstCombine] Remove SPF factorizeMinMaxTree() (NFC) SPF integer min/max is canonicalized to min/max intrinsics before this code is reached, so this should be entirely NFC.	2022-02-28 10:22:05 +01:00
Philip Reames	319265328c	[SLP] Remove field unused after `33ce97f` to silence buildbots [NFC]	2022-02-27 10:18:10 -08:00
Florian Hahn	ff93260bf6	Revert "[VPlan] Introduce recipe to build scalar steps." This reverts commit `49b23f451c`. This appears to break some PPC build bots. Revert while I investigate.	2022-02-27 17:51:19 +00:00
Philip Reames	33ce97f413	[SLP] Use BatchAA to reduce capture analysis cost [NFC] SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes. AA internally uses pointerMayBeCaptured to prove some noalias results. In a local profile, we were spending about 4% of total O2 time in capture tracking. By using BatchAA interface - which caches capture results - this drops to 2%. Note that there is no invalidation of BatchAA here. This assumes that no transformation done by SLP invalidates alias or capture results. This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.	2022-02-27 09:47:24 -08:00
Florian Hahn	49b23f451c	[VPlan] Introduce recipe to build scalar steps. This patch adds a new VPScalarIVStepsRecipe to handle building scalar steps. In the first patch, it only handles the case where there is no vector induction variable needed. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D115953	2022-02-27 17:32:41 +00:00
Florian Hahn	9bc866cc6f	[VPlan] Add recipe to handle SCEV expansion (NFC). This can be used to explicitly model VPValues that depend on SCEV expansion, like the step for inductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116288	2022-02-27 12:47:02 +00:00
Florian Hahn	da740492b0	[VPlan] Remove dead header-phi recipes. This patch adds a new transform to remove dead recipes. For now, it only removes dead recipes in the header, to keep the number tests that require updating manageable. Future patches will extend this to remove dead recipes across the whole plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118051	2022-02-26 16:26:39 +00:00
Craig Topper	1b1f8d6eff	[SeparateConstOffsetFromGEP] Remove TargetMachine.h include. NFC This doesn't appear to be used and it would be a layering violation if it was.	2022-02-25 21:40:00 -08:00
Evgeniy Brevnov	10e99eb7e4	[SLP] "Normal" instructions should not go between PHI and Lading pad Currently, SLP can insert a "shuffle" instruction between the PHI and Landing pad instructions. The problem is demonstrated by a LIT test. The solution is to adjust the insertion point once we are done with PHI generation. Differential Revision: https://reviews.llvm.org/D120552	2022-02-26 11:44:26 +07:00
Nikita Popov	e7fb1c15cb	[MergeICmps] Don't require GEP With opaque pointers, the zero-offset load will generally not use a GEP. Allow a direct load without GEP, which is treated the same way as a zero-offset GEP.	2022-02-25 17:38:02 +01:00
Simon Pilgrim	3b422455dd	[IPO] AAFunctionReachabilityFunction.updateImpl - reduce AAReachability scope. NFCI. We already have a check for !InstQueries.empty(), so move the for-range over InstQueries inside to avoid the AAReachability uninitialized variable static analysis warnings.	2022-02-25 14:42:31 +00:00
Nikita Popov	4736e57199	[IndVars] Use phis() (NFC)	2022-02-25 12:08:12 +01:00
Nikita Popov	e1608a9df8	[InstCombine] Remove SPF min/max canonicalization Now that we canonicalize SPF min/max to intrinsics, there's no need to canonicalize the structure of the SPF min/max itself anymore. This is conceptually NFC, but in practice does slightly impact results due to folding order differences.	2022-02-25 11:24:09 +01:00
Nikita Popov	16a2d5f885	[SCEVExpander] Use early returns in FindValueInExprValueMap() (NFC)	2022-02-25 10:09:16 +01:00
Nikita Popov	2d0fc3e46f	[SCEV] Return ArrayRef from getSCEVValues() (NFC) Return a read-only view on this set. For the one internal use, directly access ExprValueMap.	2022-02-25 09:32:22 +01:00
Nikita Popov	d9715a7266	[SCEV] Don't try to reuse expressions with offset SCEVs ExprValueMap currently tracks not only which IR Values correspond to a given SCEV expression, but additionally stores that it may be expanded in the form X+Offset. In theory, this allows reusing existing IR Values in more cases. In practice, this doesn't seem to be particularly useful (the test changes are rather underwhelming) and adds a good bit of complexity. Per https://github.com/llvm/llvm-project/issues/53905, we have an invalidation issue with these offseted expressions. Differential Revision: https://reviews.llvm.org/D120311	2022-02-25 09:16:48 +01:00
Anton Afanasyev	904a00d17a	[AggressiveInstCombine] Fix `TruncInstCombine` (fix `f84d732f`) Erase phi-nodes from `InstInfoMap` before erasing themselves	2022-02-25 08:04:11 +03:00
Anton Afanasyev	0dd8401371	[AggressiveInstCombine] Add `phi` nodes support to `TruncInstCombine` Expand `TruncInstCombine` to handle loops by adding `phi` nodes to expression graph. Reviewed by: RKSimon, lebedev.ri (recommit of fixed `f84d732f`, reverted by `8ad6d5e` after sanitizer breakage) Differential Revision: https://reviews.llvm.org/D109817	2022-02-25 07:57:35 +03:00
Vasileios Porpodas	4bbc3290a2	[SLP] Fix for the min/max intrinsic cost. The min/max intrinsic cost is currently too low because in the cost calculation we subtract the cost of the vector compare as we will not emit it. For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE which returns 3, the worst case cost. I think we should be passing VecPred instead, since we know the predicates of the compare instr. I think this is related to commit `b3b993a7ad` which introduced the predicate argument to getCmpSelInstrCost(). https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83 Differential Revision: https://reviews.llvm.org/D120439	2022-02-24 18:08:40 -08:00
Joseph Huber	7aef8b3754	[OpenMP] Make section variable external to prevent collisions Summary: We use a section to embed offloading code into the host for later linking. This is normally unique to the translation unit as it is thrown away during linking. However, if the user performs a relocatable link the sections will be merged and we won't be able to access the files stored inside. This patch changes the section variables to have external linkage and a name defined by the section name, so if two sections are combined during linking we get an error.	2022-02-24 10:57:09 -05:00
Sanjay Patel	5379f76e63	[InstCombine] try harder to preserve 'nsz' in fneg-of-select transform The corner case where 'nsz' needs to be removed is very narrow as discussed here: https://reviews.llvm.org/rG3cdd05e519dd If the select condition is not undef, there's no problem with propagating 'nsz': https://alive2.llvm.org/ce/z/4GWJdq	2022-02-24 10:43:53 -05:00
Nikita Popov	a266af7211	[InstCombine] Canonicalize SPF to min/max intrinsics Now that integer min/max intrinsics have good support in both InstCombine and other passes, start canonicalizing SPF min/max to intrinsic min/max. Once this sticks, we can stop matching SPF min/max in various places, and can remove hacks we have for preventing infinite loops and breaking of SPF canonicalization. Differential Revision: https://reviews.llvm.org/D98152	2022-02-24 09:01:20 +01:00
Nikita Popov	aa551ad198	Revert "[InstCombine] Remove one-use limitation from X-Y==0 fold" This reverts commit `65dc78d63e`. This caused a major code-size regression on tramp3d-v4, revert until I can investigate.	2022-02-24 08:50:40 +01:00
Matthias Braun	6a383369f9	PGOInstrumentation, GCOVProfiling: Split indirectbr critical edges regardless of PHIs The `SplitIndirectBrCriticalEdges` function was originally designed for `CodeGenPrepare` and skipped splitting of edges when the destination block didn't contain any `PHI` instructions. This only makes sense when reducing COPYs like `CodeGenPrepare`. In the case of `PGOInstrumentation` or `GCOVProfiling` it would result in missed counters and wrong result in functions with computed goto. Differential Revision: https://reviews.llvm.org/D120096	2022-02-23 16:27:37 -08:00
minglotus-6	142cedc283	[SampleProf][Inliner] Add an option to turn off inliner in sample-profile pass. Use case is offline evaluation (for inliner effectiveness) or debugging. Differential Revision: https://reviews.llvm.org/D120344	2022-02-23 14:21:33 -08:00
Philip Reames	ed54296ea3	[SLP] Fastpath instructions not in block being scheduled [nfc]	2022-02-23 13:51:36 -08:00
Philip Reames	a4541fdfe4	[SLP] Replace a impossible branch condition with an assert [NFC] An entire bundle must be inside the scheduling window. Assert that this property holds as opposed to checking it at runtime.	2022-02-23 13:43:45 -08:00
Philip Reames	9a40f9f681	{SLP] Make it clear ScheduleDataMap is keyed by instructions [NFC]	2022-02-23 13:31:36 -08:00
Philip Reames	9392c0d4ef	Revert "[SLP] Remove cap on schedule window size" This reverts commit `6adf4b039e`. Reverting while investigating https://github.com/llvm/llvm-project/issues/54029	2022-02-23 13:12:07 -08:00
Philip Reames	a83441e8cd	Revert "[SLP] Simplify extendSchedulingRegion" This reverts commit `8c85f3a052`.	2022-02-23 13:12:07 -08:00
Philip Reames	222e8610f1	[SLP] Rearrange fields in ScheduleData for density [NFC]	2022-02-23 12:33:43 -08:00
Philip Reames	a3e9b32c00	[SLP] Remove SchedulingPriority from ScheduleData [NFC] First step in trying to shrink the memory footprint of ScheduleData to improve cache locality.	2022-02-23 11:43:46 -08:00
Philip Reames	8c85f3a052	[SLP] Simplify extendSchedulingRegion This change uses instruction's comesBefore method to simplify the code significantly. There's little compile time concern here because getSpillCost already calls comesBefore on every basic block which contains a vectorization candidate. The only additional times we'll build basic block ordering is when we can't schedule a vector candidate anywhere in the containing block. Differential Revision: https://reviews.llvm.org/D120364	2022-02-23 11:23:38 -08:00
Augie Fackler	95f3cc222a	AttributorAttributes: avoid a crashing on bad alignments Prior to this change, LLVM would attempt to optimize an aligned_alloc(33, ...) call to the stack. This flunked an assertion when trying to emit the alloca, which crashed LLVM. Avoid that with extra checks. Differential Revision: https://reviews.llvm.org/D119604	2022-02-23 14:21:02 -05:00
Arthur Eubanks	1fd980de04	Revert "AttributorAttributes: avoid a crashing on bad alignments" This reverts commit `70ff6fbeb9`. Breaks bots, e.g. http://45.33.8.238/linux/69375/step_12.txt.	2022-02-23 09:08:03 -08:00
Augie Fackler	70ff6fbeb9	AttributorAttributes: avoid a crashing on bad alignments Prior to this change, LLVM would attempt to optimize an aligned_alloc(33, ...) call to the stack. This flunked an assertion when trying to emit the alloca, which crashed LLVM. Avoid that with extra checks. Differential Revision: https://reviews.llvm.org/D119604	2022-02-23 11:46:15 -05:00
Philip Reames	6adf4b039e	[SLP] Remove cap on schedule window size This cap was first added in `848c1aa45` (back in 2015). Per the original commit message, the purpose was to avoid a compile time explosion in long basic blocks. The algorithmic problem in scheduling has now been fixed in `0539a26d`. In the meantime, the code has rotten fairly badly. Some intermediate refactoring caused the size to only be incremented if both iterators advance in the window search. This causes the size to be badly undercounted when near one end of a basic block. We no longer have any test which exercises the logic in an intentional way; there's one test which differs with this change, but the changes appear fairly orthogonal to the purpose of the test file. Unfortunately, we no longer have the original motivating example, so it's possible that it also hits some other issue. I tested locally with a large example, but even at its worst, that one doesn't demonstrate anything too extreme even without the algorithmic fix. It's clearly faster with, but only by ~20% which doesn't seem in line with the original commit message. If regressions with this patch are seen, please file a bug and I'll try to fix any other algorithmic problems which fall out.	2022-02-23 08:27:45 -08:00
Nikita Popov	587c7ff15c	[InstCombine] Support min/max intrinsics in udiv->lshr fold This complements the existing fold for selects. This fold is a bit more conservative, requiring one-use. The other folds here should probably also be subjected to a one-use restriction. https://alive2.llvm.org/ce/z/Q9eCDU https://alive2.llvm.org/ce/z/8YK2CJ	2022-02-23 15:51:36 +01:00
Nikita Popov	03e6efb8c2	[InstCombine] Further simplify udiv -> lshr folding Rather than queuing up actions, have one function that does the log2() fold in the obvious way, but with a flag that allows us to check whether the fold will succeed without actually performing it.	2022-02-23 15:29:21 +01:00
Nikita Popov	5ccb0582c2	[InstCombine] Simplify udiv -> lshr folding What we're really doing here is converting Op0 udiv Op1 into Op0 lshr log2(Op1), so phrase it in that way. Actually pushing the lshr into the log2(Op1) expression should be seen as a separate transform.	2022-02-23 14:55:23 +01:00
Anton Afanasyev	8ad6d5e465	Revert "[AggressiveInstCombine] Add `phi` nodes support to `TruncInstCombine`" This reverts commit `f84d732f8c`. Breakage of "sanitizer-x86_64-linux-fast"	2022-02-23 15:56:11 +03:00
Nikita Popov	5fb65557e3	[InstCombine] Remove unused visitUDivOperand() argument (NFC) This function only works on the RHS operand.	2022-02-23 13:16:44 +01:00
Anton Afanasyev	f84d732f8c	[AggressiveInstCombine] Add `phi` nodes support to `TruncInstCombine` Expand `TruncInstCombine` to handle loops by adding `phi` nodes to expression graph. Reviewed by: RKSimon, lebedev.ri Differential Revision: https://reviews.llvm.org/D109817	2022-02-23 14:01:55 +03:00
Nikita Popov	e2f627e5e3	[InstCombine] Fold sub of umin to usub.sat We were handling sub of umax, but not the conjugated umin case. https://alive2.llvm.org/ce/z/4fdZfy https://alive2.llvm.org/ce/z/BhUQBM	2022-02-23 12:00:34 +01:00
Bill Wendling	a5bbc6ef99	[NFC] Remove unnecessary "#include"s from header files	2022-02-23 01:20:48 -08:00
Nikita Popov	65dc78d63e	[InstCombine] Remove one-use limitation from X-Y==0 fold This one-use limitation is artificial, we do not increase instruction count if we perform the fold with multiple uses. The motivating case is shown in @sub_eq_zero_select, where the one-use limitation causes us to miss a subsequent select fold. I believe the backend is pretty good about reusing flag-producing subs for cmps with same operands, so I think doing this is fine. Differential Revision: https://reviews.llvm.org/D120337	2022-02-23 09:37:30 +01:00
minglotus-6	f415d74d1d	[SampleProfile] Handle the case when the option `MaxNumPromotions` is zero. In places where `MaxNumPromotions` is used to allocate an array, bail out early to prevent allocating an array of length 0. Differential Revision: https://reviews.llvm.org/D120295	2022-02-22 21:44:32 -08:00
Brendon Cahoon	3cc15e2cb6	[SLP] Fix assert from non-constant index in insertelement A call to getInsertIndex() in getTreeCost() is returning None, which causes an assert because a non-constant index value for insertelement was not expected. This case occurs when the insertelement index value is defined with a PHI. Differential Revision: https://reviews.llvm.org/D120223	2022-02-22 15:57:14 -06:00
Dmitry Vassiliev	90a3b31091	[Transforms] Enhance CorrelatedValuePropagation to handle both values of select The "Correlated Value Propagation" pass was missing a case when handling select instructions. It was only handling the "false" constant value, while in NVPTX the select may have the condition (and thus the branches) inverted, for example: ``` loop: %phi = phi i32* [ %sel, %loop ], [ %x, %entry ] %f = tail call i32* @f(i32* %phi) %cmp1 = icmp ne i32* %f, %y %sel = select i1 %cmp1, i32* %f, i32* null %cmp2 = icmp eq i32* %sel, null br i1 %cmp2, label %return, label %loop ``` But the select condition can be inverted: ``` %cmp1 = icmp eq i32* %f, %y %sel = select i1 %cmp1, i32* null, i32* %f ``` The fix is to enhance "Correlated Value Propagation" to handle both branches of the select instruction. Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D119643	2022-02-23 00:11:20 +04:00
Philip Reames	8612b11c86	[SLP] Use isInSchedulingRegion consistently [NFC]	2022-02-22 10:27:16 -08:00
Philip Reames	0539a26d91	[SLP] Schedule only sub-graph of vectorizable instructions SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-02-22 10:15:55 -08:00
Jay Foad	0e74d75a29	[StructurizeCFG] Fix boolean not bug D118623 added code to fold not-of-compare into a compare with the inverted predicate, if the compare had no other uses. This relies on accurate use lists in the IR but it was run before setPhiValues, when some phi inputs are still stored in a data structure on the side, instead of being real uses in the IR. The effect was that a phi that should be using the original compare result would now get an inverted result instead. Fix this by moving simplifyConditions after setPhiValues. Differential Revision: https://reviews.llvm.org/D120312	2022-02-22 17:36:20 +00:00
Egor Zhdan	3a1cb36237	Add DriverKit support This patch is the first in a series of patches to upstream the support for Apple's DriverKit. Once complete, it will allow targeting DriverKit platform with Clang similarly to AppleClang. This code was originally authored by JF Bastien. Differential Revision: https://reviews.llvm.org/D118046	2022-02-22 13:42:53 +00:00
Kerry McLaughlin	12fb133eba	[LoopVectorize] Support conditional in-loop vector reductions Extends getReductionOpChain to look through Phis which may be part of the reduction chain. adjustRecipesForReductions will now also create a CondOp for VPReductionRecipe if the block is predicated and not only if foldTailByMasking is true. Changes were required in tryToBlend to ensure that we don't attempt to convert the reduction Phi into a select by returning a VPBlendRecipe. The VPReductionRecipe will create a select between the Phi and the reduction. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117580	2022-02-22 12:04:35 +00:00
Nikita Popov	3c0096a1d4	[MergeICmps] Don't call comesBefore() if in different blocks (PR53959) Only call comesBefore() if the instructions are in the same block. Otherwise make a conservative assumption. Fixes https://github.com/llvm/llvm-project/issues/53959.	2022-02-22 12:27:20 +01:00
Nikita Popov	f8d7210032	[GlobalStatus] Keep Visited set in isSafeToDestroyConstant() Constants cannot be cyclic, but they can be tree-like. Keep a visited set to ensure we do not degenerate to exponential run-time. This fixes the problem reported in https://reviews.llvm.org/D117223#3335482, though I haven't been able to construct a concise test case for the issue. This requires a combination of dead constants and the kind of constant expression tree that textual IR cannot represent (because the textual representation, unlike the in-memory representation, is also exponential in size).	2022-02-22 10:02:37 +01:00
Florian Hahn	7662d1687b	[MemCpyOpt] Check all access for MemoryUses in writtenBetween. Currently writtenBetween can miss clobbers of Loc between End and Start, if End is a MemoryUse. To guarantee we see all write clobbers of Loc between Start and End for MemoryUses, restrict to Start and End being in the same block and check all accesses between them. This fixes 2 mis-compiles illustrated in llvm/test/Transforms/MemCpyOpt/memcpy-byval-forwarding-clobbers.ll Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119929	2022-02-21 16:54:30 +00:00
Arthur Eubanks	053c2a0020	[SimplifyCFG][OpaquePtr] Check store type when merging conditional store	2022-02-20 11:29:54 -08:00
Florian Hahn	c141d158e5	[VectorCombine] Remove redundant checks (NFC). The removed conditions are already checked by the if above. Fixes #53761.	2022-02-19 21:05:32 +00:00
Philip Reames	6f9d557e08	[instcombine] Cleanup foldAllocaCmp slightly [NFC]	2022-02-18 18:49:39 -08:00
Philip Reames	3ad0bdae8f	[SLP] Address post commit comment from `2e50760`	2022-02-18 10:57:15 -08:00
Simon Pilgrim	be1ffda0a5	[InstCombine] visitCallInst - pull out repeated bswap scalar type bitwidth. NFC.	2022-02-18 17:33:11 +00:00
Florian Hahn	00ab91b70d	[ConstraintElimination] Remove ConstraintListTy (NFCI). This patch simplifies constraint handling by removing the ConstraintListTy wrapper struct and moving the Preconditions directly into ConstraintTy. This reduces the amount of memory needed for managing constraints. The only use case for ConstraintListTy was adding 2 constraints to model ICMP_EQ conditions. But this can be handled by adding an IsEq flag. When adding an equality constraint, we need to add the constraint and the inverted constraint.	2022-02-18 14:35:01 +00:00
Joseph Huber	0136a4401f	[OpenMP] Add an option to limit shared memory usage in OpenMPOpt One of the optimizations performed in OpenMPOpt pushes globalized variables to static shared memory. This is preferable to keeping the runtime call in all cases, however if too many variables are pushed to shared memory the kernel will crash. Since this is an optimization and not something the user specified explicitly, there should be an option to limit this optimization in those cases. This patch introduces the `-openmp-opt-shared-limit=` option to limit the number of bytes that will be placed in shared memory from HeapToShared. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120079	2022-02-18 08:35:26 -05:00
Alexey Bataev	b0a0df9809	[SLP]Fix vectorization of the alternate cmp instruction with swapped predicates. If the alternate cmp instruction is a swapped predicate of the main cmp instruction, need to generate alternate instruction, not the one with the swapped predicate. Also, the lane with the alternate opcode should be selected only, if the corresponding operands are not compatible. Correctness confirmed: https://alive2.llvm.org/ce/z/94BG66 Differential Revision: https://reviews.llvm.org/D119855	2022-02-18 04:27:45 -08:00
Alexander Potapenko	c85a26454d	[asan] Add support for disable_sanitizer_instrumentation attribute For ASan this will effectively serve as a synonym for __attribute__((no_sanitize("address"))). Adding the disable_sanitizer_instrumentation to functions will drop the sanitize_XXX attributes on the IR level. This is the third reland of https://reviews.llvm.org/D114421. Now that TSan test is fixed (https://reviews.llvm.org/D120050) there should be no deadlocks. Differential Revision: https://reviews.llvm.org/D120055	2022-02-18 09:51:54 +01:00
Kuba Mracek	6b53ad298e	[GlobalDCE] [VFE] Avoid dropping vfunc dependencies when an invalid vtable entry is present When we scan vtables for a particular vload in ScanVTableLoad and an entry in one possible vtable is invalid (null or non-fptr), we bail in a wrong way -- we completely stop the scanning of vtables and this results in dropped dependencies and incorrectly removed vfuncs from vtables. Let's fix that by correcting the bailing logic to keep iterating and only skip the invalid entries. Differential Revision: https://reviews.llvm.org/D120006	2022-02-17 19:41:46 -08:00
William S. Moses	d9da6a535f	[LICM][PhaseOrder] Don't speculate in LICM until after running loop rotate LICM will speculatively hoist code outside of loops. This requires removing information, like alias analysis (https://github.com/llvm/llvm-project/issues/53794), range information (https://bugs.llvm.org/show_bug.cgi?id=50550), among others. Prior to https://reviews.llvm.org/D99249 , LICM would only be run after LoopRotate. Running Loop Rotate prior to LICM prevents an instruction hoist from being speculative, if it was conditionally executed by the iteration (as is commonly emitted by clang and other frontends). Adding the additional LICM pass first, however, forces all of these instructions to be considered speculative, even if they are not speculative after LoopRotate. This destroys information, resulting in performance losses for discarding this additional information. This PR modifies LICM to accept a ``speculative'' parameter which allows LICM to be set to perform information-loss speculative hoists or not. Phase ordering is then modified to not perform the information-losing speculative hoists until after loop rotate is performed, preserving this additional information. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D119965	2022-02-17 20:13:07 -05:00
Arthur Eubanks	af6b9939aa	[EarlyCSE][OpaquePtr] Check access type when performing DSE This will bail out on target specific intrinsics. If those are deemed important enough for EarlyCSE to handle, we can augment MemIntrinsicInfo with an access type for TargetTransformInfo::getTgtMemIntrinsic() to handle. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120077	2022-02-17 11:58:53 -08:00
Joseph Huber	74cacf212b	[OpenMP] Add RTL function to externalization RAII This patch adds the '_kmpc_get_hardware_num_threads_in_block' OpenMP RTL function to the externalization RAII struct. This was getting optimized out and then being replaced with an undefined value once added back in, causing bugs for complex reductions. Fixes #53909. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120076	2022-02-17 14:30:58 -05:00
Johannes Doerfert	254d6da020	[Attributor][FIX] Ensure stable iteration order With `668c5c688b` we introduced an ordering issue revealed by the reverse iteration buildbot. Depending on the order of the map that tracks the AAIsDead AAs we ended up with slightly different attributes. This is not totally unexpected and can happen. We should however be deterministic in our orderings to avoid such issues.	2022-02-17 12:53:10 -06:00
Daniil Suchkov	7c3e2b92cf	[RewriteStatepointsForGC] Fix an incorrect assertion The assertion verifying that a newly computed value matches what is already cached used stripPointerCasts() to strip bitcasts, however the values can be not only pointers, but also vectors of pointers. That is problematic because stripPointerCasts() doesn't handle vectors of pointers. This patch introduces an ad-hoc utility function to strip all bitcasts regardless of the value type. Reviewed By: skatkov, reames Differential Revision: https://reviews.llvm.org/D119994	2022-02-17 18:44:57 +00:00
Arthur Eubanks	4a26abc0b9	[InstCombine][OpaquePtr] Check store type in DSE implementation	2022-02-17 10:01:14 -08:00
Arthur Eubanks	129af4daa7	[SCEVExpander][OpaquePtr] Check GEP source type when finding identical GEP Fixes an opaque pointers miscompile. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120004	2022-02-17 08:48:11 -08:00
Jay Foad	9071393c18	[GlobalDCE] Simplify and return Changed = true less often Removing dead constants should not count as making a change to the module. This means that RemoveUnusedGlobalValue simplifies to just calling removeDeadConstantUsers, so inline it. Differential Revision: https://reviews.llvm.org/D120052	2022-02-17 16:03:13 +00:00
Sanjay Patel	58df2da054	[InstCombine] push constant operand down/outside in sequence of min/max intrinsics A generalization like this was suggested in D119754. This is the inverse direction of D119851, and we get all of the folds there plus the one that was missed. There is precedent for this kind of transform in instcombine with "or" instructions (but strangely only with that one opcode AFAICT). Similar justification as in the other patch: The line between instcombine and reassociate for these kinds of folds is blurry. This doesn't appear to have much cost and gives us the expected wins from repeated folds as seen in the last set of test diffs. Differential Revision: https://reviews.llvm.org/D119955	2022-02-17 10:36:37 -05:00
Alexey Bataev	d1cd64ffdd	[SLP][NFC]Fix misprint in function name, NFC.	2022-02-17 05:57:51 -08:00
Nikita Popov	36fdfaba19	[RelLookupTableConverter] Ensure that GV, GEP and load types match This code could be generalized to be type-independent, but for now just ensure that the same type constraints are enforced with opaque pointers as with typed pointers.	2022-02-17 12:05:05 +01:00
Roman Lebedev	371fcb720e	[SimplifyCFG][PhaseOrdering] Defer lowering switch into an integer range comparison and branch until after at least the IPSCCP That transformation is lossy, as discussed in https://github.com/llvm/llvm-project/issues/53853 and https://github.com/rust-lang/rust/issues/85133#issuecomment-904185574 This is an alternative to D119839, which would add a limited IPSCCP into SimplifyCFG. Unlike lowering switch to lookup, we still want this transformation to happen relatively early, but after giving a chance for the things like CVP to do their thing. It seems like deferring it just until the IPSCCP is enough for the tests at hand, but perhaps we need to be more aggressive and disable it until CVP. Fixes https://github.com/llvm/llvm-project/issues/53853 Refs. https://github.com/rust-lang/rust/issues/85133 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119854	2022-02-17 12:13:55 +03:00
Florian Mayer	c195addb60	[NFC] [MTE] [HWASan] Remove unnecessary member of AllocaInfo Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119981	2022-02-16 15:19:30 -08:00
Arthur Eubanks	826fae51d2	[SLPVectorizer][OpaquePtrs] Check GEP source element type Fixes a miscompile with opaque pointers. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D119980	2022-02-16 14:47:20 -08:00
Johannes Doerfert	8ad39fbaf2	[Attributor][FIX] Heap2Stack needs to use the alloca AS When we move an allocation from the heap to the stack we need to allocate it in the alloca AS and then cast the result. This also prevents us from inserting the alloca after the allocation call but rather right before. Fixes https://github.com/llvm/llvm-project/issues/53858	2022-02-16 15:58:32 -06:00
Johannes Doerfert	668c5c688b	[Attributor][FIX] Use liveness information of the right function When we use liveness for edges during the `genericValueTraversal` we need to make sure to use the AAIsDead of the correct function. This patch adds the proper logic and some simple caching scheme. We also add an assertion to the `isEdgeDead` call to make sure future misuse is detected earlier. Fixes https://github.com/llvm/llvm-project/issues/53872	2022-02-16 15:58:32 -06:00
Johannes Doerfert	6ed1ef0643	[Attributor][FIX] Pipe UsedAssumedInformation through more interfaces `UsedAssumedInformation` is a return argument utilized to determine what information is known. Most APIs used it already but `genericValueTraversal` did not. This adds it to `genericValueTraversal` and replaces `AllCallSitesKnown` of `checkForAllCallSites` with the commonly used `UsedAssumedInformation`. This was supposed to be a NFC commit, then the test change appeared. Turns out, we had one user of `AllCallSitesKnown` (AANoReturn) and the way we set `AllCallSitesKnown` was wrong as we ignored the fact some call sites were optimistically assumed dead. Included a dedicated test for this as well now. Fixes https://github.com/llvm/llvm-project/issues/53884	2022-02-16 14:44:20 -06:00
Nikita Popov	c9032f1a69	[LowerMemIntrinsics] Explicitly use i8 type in memmove lowering By convention, memcpy/memmove intrinsics are always used with i8 pointers (though this is not enforced), so in practice this code was always using an i8 type. Make that explicit. Of course, i8 is not a very profitable choice, and this code could be more performant by picking an appropriate larger type. But that would require additional test coverage and correctness review, and certainly shouldn't be a decision based on the pointer element type.	2022-02-16 16:31:55 +01:00
Florian Hahn	d03d3d7966	[DSE] Fall back to CFG scan for unreachable terminators. Blocks with UnreachableInst terminators are considered as root nodes in the PDT. This pessimizes DSE, if there are no aliasing reads from the potentially dead store and the block with the unreachable terminator. If any of the root nodes of the PDT has UnreachableInst as terminator, fall back to the CFG scan, even if the common dominator of all killing blocks does not post-dominate the block with potentially dead store. It looks like the compile-time impact for the extra scans is negligible. https://llvm-compile-time-tracker.com/compare.php?from=779bbbf27fe631154bdfaac7a443f198d4654688&to=ac59945f1bec1c6a7d7f5590c8c69fd9c5369c53&stat=instructions Fixes #53800. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119760	2022-02-16 14:06:40 +00:00
Bin Cheng	dfec0b3053	[FuncSpec] Save compilation time by caching uses for propagation We only need to do propagation on use instructions of the original value, rather than the replacing const value which might have lots of irrelevant uses. This is done by caching uses before replacing. Differential Revision: https://reviews.llvm.org/D119815	2022-02-16 10:46:26 +08:00
Philip Reames	2e50760775	[SLP] Add assert that entities are scheduled as expected Requested in D118538	2022-02-15 12:21:49 -08:00
Florian Mayer	59e7de26aa	[HWASan] remove replacement of DbgVariableIntrinsics. This code was dead because we AI->replaceUsesWithIf above. I verified this doesn't actually get run by applying https://gist.github.com/fmayer/aea7cbb4700cfe2c9d932591ae1073c3 to the Android toolchain and building AOSP, without any crash. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119802	2022-02-15 11:40:58 -08:00
Max Kazantsev	bfc1217119	[NFC] Introduce option to switch off compatible invokes merge Does not affect default behavior (transform is on).	2022-02-15 21:51:03 +07:00
Alexander Potapenko	05ee1f4af8	Revert "[asan] Add support for disable_sanitizer_instrumentation attribute" This reverts commit `dd145f953d`. https://reviews.llvm.org/D119726, like https://reviews.llvm.org/D114421, still causes TSan to fail, see https://lab.llvm.org/buildbot/#/builders/70/builds/18020 Differential Revision: https://reviews.llvm.org/D119838	2022-02-15 15:04:53 +01:00
Sanjay Patel	6357ccf57f	[InstCombine] reassociate min/max intrinsics with constant operands Integer min/max operations are associative: max (max X, C0), C1 --> max X, (max C0, C1) --> max X, NewC https://alive2.llvm.org/ce/z/wW5HVM This would avoid a regression when we canonicalize to min/max intrinsics (see D98152 ). Differential Revision: https://reviews.llvm.org/D119754	2022-02-15 08:31:23 -05:00
Simon Pilgrim	9606c69087	[InstCombine] Fold sub(Y,and(lshr(X,C),1)) --> add(ashr(shl(X,(BW-1)-C),BW-1),Y) (PR53610) As noted on PR53610, we can fold a 'bit splat' negation of a shifted bitmask pattern into a pair of shifts. https://alive2.llvm.org/ce/z/eGrsoN Differential Revision: https://reviews.llvm.org/D119715	2022-02-15 13:24:20 +00:00
Anton Afanasyev	b7574b092a	[SLP] Don't try to vectorize pair with insertelement Particularly this breaks vectorization of insertelements where some of intermediate (i.e. not last) insertelements are used externally. Fixes PR52275 Fixes #51617 Differential Revision: https://reviews.llvm.org/D119679	2022-02-15 16:12:59 +03:00
Alexander Potapenko	dd145f953d	[asan] Add support for disable_sanitizer_instrumentation attribute For ASan this will effectively serve as a synonym for __attribute__((no_sanitize("address"))) This is a reland of https://reviews.llvm.org/D114421 Reviewed By: melver, eugenis Differential Revision: https://reviews.llvm.org/D119726	2022-02-15 14:06:12 +01:00
Nikita Popov	2460a2ce47	[DSE] Extract a common PDT check (NFC)	2022-02-15 13:05:45 +01:00
Hongtao Yu	62ef77ca63	[CSSPGO] Do not merge a context that is already duplicated into the base profile. Do not merge a context that is already duplicated into the base profile. Also fixing a typo caused by previous refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119735	2022-02-14 18:07:11 -08:00
Florian Mayer	8de457eafc	[HWASAN] use common alignAndPadAlloca Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119614	2022-02-14 15:28:32 -08:00
Florian Mayer	205308de6b	[NFC] [MTE] Move alignAndPadAlloca to MemoryTaggingSupport. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119610	2022-02-14 14:54:04 -08:00
Nick Desaulniers	9dcb006165	[funcattrs] check reachability to improve noreturn There was a fixme in the code pertaining to attributing functions as noreturn. By using reachability, if none of the blocks that are reachable from the entry return, then the function is noreturn. Previously, the code only checked if any blocks returned. If they're unreachable, then they don't matter. This improves codegen for the Linux kernel. Fixes: https://github.com/ClangBuiltLinux/linux/issues/1563 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D119571	2022-02-14 14:01:59 -08:00
Ahmed Bougacha	c703f852c9	[IR] Define "ptrauth" operand bundle. This introduces a new "ptrauth" operand bundle to be used in call/invoke. At the IR level, it's semantically equivalent to an @llvm.ptrauth.auth followed by an indirect call, but it provides additional hardening, by preventing the intermediate raw pointer from being exposed. This mostly adds the IR definition, verifier checks, and support in a couple of general helper functions. Clang IRGen and backend support will come separately. Note that we'll eventually want to support this bundle in indirectbr as well, for similar reasons. indirectbr currently doesn't support bundles at all, and the IR data structures need to be updated to allow that. Differential Revision: https://reviews.llvm.org/D113685	2022-02-14 11:27:35 -08:00
Nikita Popov	41c5a762e5	[DeadArgElim] Check that function type is the same If the function types differ, the call arguments don't necessarily correspond to the function arguments. It's likely not worthwhile to handle this more precisely, but at least we shouldn't crash.	2022-02-14 14:08:42 +01:00
Anton Afanasyev	954ea0f044	[SLP] Simplify indices processing for insertelements Get rid of non-constant and undef indices of insertelements at `buildTree()` stage. Fix bugs. Differential Revision: https://reviews.llvm.org/D119623	2022-02-14 14:50:44 +03:00
Nikita Popov	7c83f8c45d	[InstCombine] Check GEP source type in select of gep fold This is no longer implicitly checked through the pointer type with opaque pointers.	2022-02-14 11:46:45 +01:00
Nikita Popov	efece08ae2	[InstCombine] Remove manual debug loc transfer While this might be marginally more precise, we generally don't bother with this in InstCombine, and let the IRBuilder assign the debug location. I don't see why this one fold, out of the thousands done in InstCombine, should be treated specially.	2022-02-14 11:07:05 +01:00
Nikita Popov	18bf42c0a6	[CVP] Extract helper from phi processing (NFC) So we can use early returns and avoid those awkward !V checks.	2022-02-14 10:51:34 +01:00
Dávid Bolvanský	1be1fd735d	[AlwaysInliner] Check for callsite noinline attribute simplified	2022-02-14 09:33:30 +01:00
Kazu Hirata	befeb5acf6	[Transforms] Use default member initialization in MemmoveVerifier (NFC)	2022-02-13 10:34:03 -08:00
Kazu Hirata	fd3e8044cd	[Transforms] Use default member initialization in Prefetch (NFC)	2022-02-13 10:34:02 -08:00
Kazu Hirata	0b9a610a75	[Transforms] Use default member initialization in ConditionInfo (NFC)	2022-02-13 10:34:00 -08:00
Kazu Hirata	fda6a1ad42	[Transforms] Use default member initialization in CHRStats (NFC)	2022-02-13 10:33:56 -08:00
Florian Hahn	2cd22ce0d0	[LV] Pass start value directly to emitTransformedIndex (NFC).	2022-02-12 19:03:32 +00:00
Florian Mayer	6759cdd829	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 16:01:46 -08:00
Florian Mayer	bf2f72fa10	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 16:01:02 -08:00
Michael Gottesman	19279ffc77	[debug-info] If one sees a spill with a dbg.addr use, salvageDebugInfo upon it and don't hoist it. This ensures that if we have a dbg.addr in a coroutine funclet that is on one of our function arguments, that the dbg.addr is not mapped to undef and also that later it isn't hoisted to the front of the basic block. Instead it remains at its original cloned location. rdar://83957028 Differential Revision: https://reviews.llvm.org/D119576	2022-02-11 15:15:13 -08:00
Florian Mayer	26dbc47468	Revert "[hwasan] keep debug intrinsicts in AllocaInfo." This reverts commit `19fdf85f58`.	2022-02-11 14:41:24 -08:00
Florian Mayer	b1bd64aeee	Revert "[NFC] [MTE] Use helpers for stack tagging." This reverts commit `8f0e5b4e26`.	2022-02-11 14:41:24 -08:00
Florian Hahn	66400fc2dd	[ConstraintElimination] Support add with precondition. If we can prove that an addition without wrap flags won't wrap, decompose the operation. Issue #48253	2022-02-11 20:26:25 +00:00
Arthur Eubanks	b59a402237	[MSan][OpaquePtr] Use inline asm elementtype instead of getPointerElementType()	2022-02-11 11:50:35 -08:00
Florian Mayer	8f0e5b4e26	[NFC] [MTE] Use helpers for stack tagging. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119503	2022-02-11 10:59:09 -08:00
Florian Mayer	19fdf85f58	[hwasan] keep debug intrinsicts in AllocaInfo. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D119498	2022-02-11 10:56:53 -08:00
Florian Mayer	e7356fb3e2	[nfc] [hwasan] factor out logic to collect info about stack this is the first step in unifying some of the logic between hwasan and mte stack tagging. this only moves around code, changes to converge different implementations of the same logic follow later. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D118947	2022-02-11 10:54:12 -08:00
Johannes Doerfert	ede248e614	[OpenMP][FIX] The `llvm.amdgcn.s.barrier` is actually not aligned If we assume `llvm.amdgcn.s.barrier` is aligned we may remove it and cause OpenMP GPU applications on the AMD GPU to be stuck or wrongly synchronized. Reported by Carlo Bertolli.	2022-02-11 12:42:50 -06:00
Dávid Bolvanský	d828281e78	[AlwaysInliner] Respect noinline call site attribute ``` always_inline foo() { } bar () { noinline foo(); } ``` We should prefer the call site attribute over the attribute on the decl. This is a fix for AlwaysInliner; a similar fix is needed for the normal Inliner (follow up). Related to https://reviews.llvm.org/D119061 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D119553	2022-02-11 19:23:11 +01:00
Austin Kerbow	0bb25b4603	[InferAddressSpaces] Fix assert on invalid cast ordering If a cast is needed when replacing uses with newly created values, the cast must be inserted after the instruction that defines the new value. Fixes: SWDEV-321215 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D119524	2022-02-11 10:02:30 -08:00
Arthur Eubanks	22f4f94256	[CoroFrame][OpaquePtr] Remove getPointerElementType() call Get it from the byval type instead.	2022-02-11 09:53:20 -08:00
Sameer Sahasrabuddhe	d8f99bb6e0	[AMDGPU] replace hostcall module flag with function attribute The module flag to indicate use of hostcall is insufficient to catch all cases where hostcall might be in use by a kernel. This is now replaced by a function attribute that gets propagated to top-level kernel functions via their respective call-graph. If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the default behaviour is to emit kernel metadata indicating that the kernel uses the hostcall buffer pointer passed as an implicit argument. The attribute may be placed explicitly by the user, or inferred by the AMDGPU attributor by examining the call-graph. The attribute is inferred only if the function is not being sanitized, and the implictarg_ptr does not result in a load of any byte in the hostcall pointer argument. Reviewed By: jdoerfert, arsenm, kpyzhov Differential Revision: https://reviews.llvm.org/D119216	2022-02-11 22:51:56 +05:30
Nikita Popov	4c6289c369	[InstCombine] Check source element type in gep of phi of gep fold	2022-02-11 17:10:48 +01:00
Matt Arsenault	52fbb786a6	InferAddressSpaces: Fix assert on inferred source for inttoptr/ptrtoint If we had some source value we could infer an address space from that went through a ptrtoint/inttoptr pair, this would fail since bitcast can't change the address space. Fixes issue 53665.	2022-02-11 10:35:29 -05:00
Anton Afanasyev	cd685f5736	[NFC][SLP] Set default parameter for Offset equal to zero	2022-02-11 17:22:33 +03:00
Nikita Popov	5450963085	[InstCombine] Check source element type in phi of gep fold Rather than checking that the type is the same (which is always the case, given how these are part of the same phi) check that the source element type is the same. With opaque pointers, this is no longer implied.	2022-02-11 14:26:18 +01:00
Nikita Popov	2a1b1f1b1b	[GVN] Store source element type for GEP expressions To avoid incorrectly merging GEPs with different source types under opaque pointers. To avoid increasing the Expression structure size, this reuses the existing type member. The code does not rely on this to be the expression result type, it's only used as a disambiguator.	2022-02-11 13:03:30 +01:00
Simon Pilgrim	a5d6851489	LoopReroll::isLoopControlIV - use cast<> instead of dyn_cast<> to avoid dereference of nullptr The pointer is always dereferenced by isCompareUsedByBranch, so assert the cast is correct instead of returning nullptr	2022-02-11 10:19:25 +00:00
Nikita Popov	e714b98fff	[InstCombine] Check type compatibility in indexed load fold This fold could use a rewrite to an offset-based implementation, but for now make sure it doesn't crash with opaque pointers.	2022-02-11 10:16:27 +01:00
Nikita Popov	3571bdb4f3	[InstCombine] Require equal source element type in icmp of gep fold Without opaque pointers, this is implicitly enforced. This previously resulted in a miscompile.	2022-02-11 09:38:28 +01:00
Nikita Popov	e24067819f	[ArgPromotion] Protect harder against recursive promotion (PR42028) In addition to the self-recursion check, also check whether there is more than one node in the SCC, which implies that there is a larger cycle. I believe checking SCC structure (rather than something like norecurse) is the right thing to do here, because this is specifically about preventing infinite loops over the SCC. Fixes https://github.com/llvm/llvm-project/issues/42028. Differential Revision: https://reviews.llvm.org/D119418	2022-02-11 09:30:39 +01:00
Nico Weber	e76037db44	[llvm] Remove unused file MaximumSpanningTree.h The last use of this file was removed in late 2013 in `ea56494625`. The last use was in PathProfiling.cpp, which had an overview comment of the overall approach. Similar functionality lives in the slightly more cryptically named CFGMST.h in this same directory. A similar overview comment is in PGOInstrumentation.cpp. No behavior change.	2022-02-10 21:01:24 -05:00
Philip Reames	5ba115031d	[PSE] Remove assumption that top level predicate is union from public interface [NFC] Note that this doesn't actually cause the top level predicate to become a non-union just yet. The above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.	2022-02-10 16:14:52 -08:00
Teresa Johnson	dd3f483335	[ThinLTO][WPD] LICM set lookup (NFC) Minor efficiency fix. There is no reason to perform the same set lookup repeatedly in the inner loop as it is invariant there. Differential Revision: https://reviews.llvm.org/D119474	2022-02-10 13:16:31 -08:00
Simon Pilgrim	6af7c1371a	[LoopVectorize] getStepVector - reduce scope of local variable. NFC.	2022-02-10 20:44:25 +00:00
Johannes Doerfert	dd75c0ea64	[Attributor][NFC] Expose new API in AAPointerInfo New users might want to check bins without a load or store instruction at hand. Since we use those instructions only to find the offset and size of the access anyway, we can expose an offset and size interface to the outside world as well. This commit mainly moves code around and exposes a class (OffsetAndSize) as well as a method forallInterferingAccesses in AAPointerInfo. Differential Revision: https://reviews.llvm.org/D119249	2022-02-10 13:52:24 -06:00
Johannes Doerfert	d1387a26a5	[Attributor][FIX] Reachability needs to account for readonly callees The oversight caused us to ignore call sites that are effectively dead when we computed reachability (or, more precisely, the call edges of a function). The problem is that loads in the readonly callee might depend on stores prior to the callee. If we do not track the call edge we mistakenly assume the store before the call cannot reach the load. The problem is nicely visible in: `llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll` Caused by D118673. Fixes https://github.com/llvm/llvm-project/issues/53726	2022-02-10 13:52:24 -06:00
Johannes Doerfert	e39b419312	[Attributor][FIX] Honor alloca address space in AAPrivatizablePtr When we privatize a pointer (~argument promotion) we introduce new private allocas as replacement. These need to be placed in the alloca address space as later passes cannot properly deal with them otherwise. Fixes https://github.com/llvm/llvm-project/issues/53725	2022-02-10 13:52:24 -06:00
Simon Pilgrim	aca355a3bb	[InstCombine] Extend fold (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0) to support smin intrinsic Replace matchSelectPattern pattern match with the more general m_SMin so that it can handle smin intrinsics as well as the icmp+select pattern Noticed while reviewing regressions from D98152	2022-02-10 13:28:15 +00:00
Sanjay Patel	995d400f3a	[InstCombine] reduce mul operands based on undemanded high bits We already do this in SDAG, but mul was left out of the fold for unused high bits in IR. The high bits of a mul's operands do not change the low bits of the result: https://alive2.llvm.org/ce/z/XRj5Ek Verify some test diffs to confirm that they are correct: https://alive2.llvm.org/ce/z/y_W8DW https://alive2.llvm.org/ce/z/7DM5uf https://alive2.llvm.org/ce/z/GDiHCK This gets a fold that was presumed not possible in D114272: https://alive2.llvm.org/ce/z/tAN-WY Removing nsw/nuw is needed for general correctness (and is also done in the codegen version), but we might be able to recover more of those with better analysis. Differential Revision: https://reviews.llvm.org/D119369	2022-02-10 08:10:22 -05:00
Florian Hahn	80eea38d8d	[ConstraintElimination] Remove unnecessary recursion (NFC). Perform predicate normalization in a single switch, rather than going through recursions.	2022-02-10 12:26:35 +00:00
Nikita Popov	8018d6be34	[ArgPromotion] Transfer metadata to promoted loads Also transfer selected non-AA metadata to the promoted load. Only metadata from guaranteed to execute loads is transferred.	2022-02-10 11:28:07 +01:00
Florian Hahn	79d60b93b4	[ConstraintElimination] Skip floating point compares. (NFC) The solver only supports integer conditions. Adding floating point compares to the worklist only adds extra work. Just skip them.	2022-02-09 21:16:49 +00:00
Philip Reames	d39f4ac494	[SCEV] Unwind SCEVUnionPredicate from getPredicatedBackedgeTakenCount [NFC] For those curious, the whole reason for tracking the predicate set separately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable if this justifies the complexity, but since we can preserve it with local ugliness, I did so.	2022-02-09 12:55:40 -08:00
David Green	b55d4c2ad8	Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`" This reverts commit `77a0da926c` as we've received multiple reports of this significantly impacting performance, in ways that don't seem to just be target specific cost models going wrong. I would offer some reproducers, but the test changes here seem to be full of them! Reverting for now and hopefully we can remove the "hack" more carefully as we go.	2022-02-09 20:02:54 +00:00
Florian Hahn	b71eed7e8f	[ConstraintElimination] Remove redundant lookup (NFC).	2022-02-09 18:00:03 +00:00
Florian Hahn	902db4ec1c	[ConstraintElimination] Move some definitions closer to uses (NFC).	2022-02-09 17:29:49 +00:00
Arthur Eubanks	1bdc6eacba	[LoopLoadElim] Support opaque pointers With typed pointers the pointer operand type checks the address space and the load/store type. With opaque pointers we have to check the load/store type separately.	2022-02-09 09:22:21 -08:00
Alexey Bataev	370ea1a199	[SLP][NFC]Fix comment, NFC.	2022-02-09 07:14:14 -08:00
Florian Hahn	8aa122081f	[LV] Pass step to emitTransformedIndex (NFC). Move out the induction step creation from emitTransformedIndex to the callers. In some places (e.g. widenIntOrFpInduction) the step is already created. Passing the step in ensures the steps are kept in sync.	2022-02-09 11:12:45 +00:00
Nikita Popov	68c1eeb4ba	[ArgPromotion] Make implementation offset based This rewrites ArgPromotion to be based on offsets rather than GEP structure. We inspect all loads at constant offsets and remember which types are loaded at which offsets. Then we promote based on those types. This generalizes ArgPromotion to work with bitcasted loads, and is compatible with opaque pointers. This patch also fixes incorrect handling of alignment during argument promotion. Previously, the implementation only checked that the pointer is dereferenceable, but was happy to speculate overaligned loads. (I would have fixed this separately in advance, but I found this hard to do with the previous implementation approach). Differential Revision: https://reviews.llvm.org/D118685	2022-02-09 09:35:01 +01:00
Florian Hahn	c9e6678b56	[LV] Move buildScalarSteps out of ILV (NFC). This makes the function independent of shared state in ILV (ensures no new dependencies on things like the cost model are introduced) and allows for use directly in recipe's ::execute functions.	2022-02-08 21:18:40 +00:00
Sylvestre Ledru	f2c2e924e7	Fix a typo (occured => occurred) Reported: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1005195	2022-02-08 21:35:26 +01:00
Roman Lebedev	c8ba2b67a0	[SimplifyCFG] 'merge compatible invokes': fully support indirect invokes As long as all the invokes in the set are indirect, we can merge them, but don't merge direct invokes into the set, even though it would be legal to do.	2022-02-08 21:29:38 +03:00
Roman Lebedev	414b47645d	[SimplifyCFG] 'merge compatible invokes': don't create trivial PHI's with all-identical incoming values	2022-02-08 21:29:38 +03:00
Joseph Huber	caf7f05c1c	[Attributor] Emit fixed-point remark on function list This patch replaces the function we emit the remark on when we run into the fix-point limit. Previously we got a function to emit a remark on from the worklist's associated function. However, the worklist may not always have an associated function in the case of global variables. Replace this with the function set, and if there are no functions don't emit the remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D119248	2022-02-08 12:10:21 -05:00
Philip Reames	c302f1e677	[SCEV] Generalize SCEVEqualsPredicate to any compare [NFC] PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent a A <pred> B. This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.	2022-02-08 08:18:09 -08:00
Nikita Popov	074561a4a2	[Mem2Reg] Check that load type matches alloca type Alloca promotion can only deal with cases where the load/store types match the alloca type (it explicitly does not support bitcasted load/stores). With opaque pointers this is no longer enforced through the pointer type, so add an explicit check.	2022-02-08 17:16:15 +01:00
Roman Lebedev	42ca7cc889	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ uses If the original invokes had uses, the uses must have been in PHI's, but that immediately results in the incoming values being incompatible. But we'll replace uses of the original invokes with the use of the merged invoke, so as long as the incoming values become compatible after that, we can merge.	2022-02-08 17:49:38 +03:00
Roman Lebedev	9986d60224	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ PHIs but no uses As long as the incoming values for all the invokes in the set are identical, we can merge the invokes.	2022-02-08 17:49:38 +03:00
Roman Lebedev	8411560fd0	[SimplifyCFG] 'merge compatible invokes': support normal destination w/ no uses, no PHI's Even if the invokes have normal destination, iff it's the same block, we can merge them. For now, require that there are no PHI nodes, and the returned values of invokes aren't used.	2022-02-08 17:49:38 +03:00
Nikita Popov	b896334834	[ArgPromotion] Check dereferenceability on argument as well Before walking all the callers, check whether we have a dereferenceable attribute directly on the argument. Also make it clearer that the code currently does not treat alignment correctly.	2022-02-08 10:29:51 +01:00
Johannes Doerfert	dd101c808b	[Attributor][FIX] Do not use assumed information for UB detection The helper `Attributor::checkForAllReturnedValuesAndReturnInsts` simplifies the returned value optimistically. In `AAUndefinedBehavior` we cannot use such optimistic values when deducing UB. As a result, we assumed UB for the return value of a function because we initially (=optimistically) thought the function return is `undef`. While we later adjusted this properly, the `AAUndefinedBehavior` was under the impression the return value is "known" (=fix) and could never change. To correct this we use `Attributor::checkForAllInstructions` and then manually perform simplification of the return value, only allowing known values to be used. This actually matches the other UB deductions. Fixes #53647	2022-02-07 20:19:19 -06:00
David Green	b4c6d1bb37	[LoopVectorizer] Don't perform interleaving of predicated scalar loops The vectorizer will choose at times to "vectorize" loops with a scalar factor (VF=1) with interleaving (IC > 1). This can occasionally produce better code than the unroller (notable for reductions where it can produce independent reduction chains that are combined after the loop). At times this is not very beneficial though, for example when runtime checks are needed or when the scalar code requires predication. This addresses the second point, preventing the vectorizer from interleaving when the scalar loop will require predication. This prevents it from making a bit of a mess, that is worse than the original and better left for the unroller to unroll if beneficial. It helps reverse some of the regressions from D118090. Differential Revision: https://reviews.llvm.org/D118566	2022-02-07 19:34:28 +00:00
Florian Hahn	5a72357697	[LV] Use IRBuilderBase in VPlan.h, remove IRBuilder.h include (NFC). By using IRBuilderBase instead of IRBuilder<> a forward declaration can be used instead of including IRBuilder.h	2022-02-07 17:46:16 +00:00
Sanjay Patel	897d92faef	[InstCombine] generalize 2 LSB of demanded bits for X*X This is a follow-up suggested in D119060. Instead of checking each of the bottom 2 bits individually, we can check them together and handle the possibility that we demand both together. https://alive2.llvm.org/ce/z/C2ihC2 Differential Revision: https://reviews.llvm.org/D119139	2022-02-07 11:33:55 -05:00
Nikita Popov	cdc0573f75	[MatrixBuilder] Remove unnecessary IRBuilder template (NFC) IRBuilderBase exists specifically to avoid the need for this.	2022-02-07 16:42:38 +01:00
Sanjay Patel	79b3fe8070	[InstCombine] SimplifyDemandedBits - mul(x,x) is odd iff x is odd https://alive2.llvm.org/ce/z/AXPr3k	2022-02-07 08:43:12 -05:00
Roman Lebedev	77a0da926c	[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. What it essentially does is prevents scalarized vectorization of masked memory operations: ``` // TODO: Cost model for emulated masked load/store is completely // broken. This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. ``` While i don't really understand about what specifically `is completely broken` was talking about, i believe that at least on X86 with AVX2-or-later, this is no longer true. (or at least, i would like to know what is still broken). So i would like to follow suit after D111460, and like wise disable that hack for AVX2+. But since this was added for X86 specifically, let's just instead completely remove this hack. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114779	2022-02-07 16:08:31 +03:00
Djordje Todorovic	afd54e1ed1	[SLPVectorizer] Fix "unused variable" build warning	2022-02-07 10:38:19 +01:00
Kazu Hirata	3a3cb929ab	[llvm] Use = default (NFC)	2022-02-06 22:18:35 -08:00
Kazu Hirata	a1a8d10a17	[Transforms] Use default member initialization in LibCallSimplifier (NFC)	2022-02-06 16:36:27 -08:00
Kazu Hirata	3fce5bb7b0	[Transforms] Use default member initialization in LoopVersioning (NFC)	2022-02-06 16:36:25 -08:00
Congzhe Cao	1ef04326ec	[LoopInterchange] Support loop interchange with floating point reductions Enabled loop interchange support for floating point reductions if it is allowed to reorder floating point operations. Previously when we encountered a floating point PHI node in the outer loop exit block, we bailed out since we could not detect floating point reductions in the early days. Now we remove this limitation since we are able to detect floating point reductions. Reviewed By: #loopoptwg, Meinersbur Differential Revision: https://reviews.llvm.org/D117450	2022-02-06 17:04:47 -05:00
Florian Hahn	541ca12dcd	[LV] Use VPReplicateRecipe::isUniform instead isUniformAfterVec (NFCI). In scalarizeInstruction(), isUniformAfterVectorization is used to detect cases where it is sufficient to always access the first lane. This should map directly to checking whether the operand is a uniform replicate recipe. Differential Revision: https://reviews.llvm.org/D116654	2022-02-06 16:37:20 +00:00
Kazu Hirata	2d650ee03e	[Transforms] Use default member initialization in SCEVFindUnsafe (NFC)	2022-02-05 21:39:27 -08:00
Kazu Hirata	cb13ebbf46	[Transforms] Use default member initialization in AAIsDeadCallSiteReturned (NFC)	2022-02-05 21:39:25 -08:00
Kazu Hirata	31d72f0e45	[Transforms] Use default member initialization in TruncInstCombine (NFC)	2022-02-05 21:39:23 -08:00
Kazu Hirata	9ed6800ef9	[Transforms] Use default member initialization in MaskOps (NFC)	2022-02-05 21:39:21 -08:00
Kazu Hirata	e24384b506	[Transforms] Use default member initialization in SimplifyIndvar (NFC)	2022-02-05 16:29:22 -08:00
Benjamin Kramer	ce9417348e	[SLP] Skip a DenseSet<unsigned> -> bit vector conversion. NFCI.	2022-02-06 00:57:47 +01:00
Benjamin Kramer	a40dc4eaf8	Simplify mask creation with llvm::seq. NFCI.	2022-02-05 23:35:41 +01:00
Sanjay Patel	5372160a18	[InstCombine] SimplifyDemandedBits - mul(x,x) - if only demand bit[1] then fold to zero This is a translation of the fold added to codegen with: `2d1390efbe` Part of solving issue #48027	2022-02-05 09:51:38 -05:00
Bill Wendling	c6f0940d99	[NFC] Remove unnecessary #includes An attempt to reduce the number of files that are recompiled due to a change. Differential Revision: https://reviews.llvm.org/D119055	2022-02-04 21:22:41 -08:00
Hongtao Yu	dee058c670	[CSSPGO] Turn on ext-tsp by default for CSSPGO. I'm seeing ext-tsp helps CSSPGO for our internal large benchmarks so I'm turning it on for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D119048	2022-02-04 19:46:44 -08:00
Roman Lebedev	18ff1ec3c3	Reland [SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. Annotated functions may still raise an exception, i.a., nounwind is not implied. ``` So if we `invoke` a `noreturn` function, and the normal destination of an invoke is not an `unreachable`, point it at the new `unreachable` block. The change/fix from the original commit is that we now actually create the new block, and don't just repurpose the original block, because said normal destination block could have other users. This reverts commit `db1176ce66`, relanding commit `598833c987`.	2022-02-05 02:58:19 +03:00
Roman Lebedev	db1176ce66	Revert "[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable`" The normal destination may have other uses. This reverts commit `598833c987`.	2022-02-05 02:30:20 +03:00
Roman Lebedev	598833c987	[SimplifyCFG] `markAliveBlocks()`: recognize that normal dest of `invoke`d `noreturn` function is `unreachable` As per LangRef's definition of `noreturn` attribute: ``` noreturn This function attribute indicates that the function never returns normally, hence through a return instruction. This produces undefined behavior at runtime if the function ever does dynamically return. Annotated functions may still raise an exception, i.a., nounwind is not implied. ```	2022-02-05 02:15:07 +03:00
Roman Lebedev	cd9e6a9c10	[NFC][InstCombine] `visitCallInst()`: make comment more understandable	2022-02-05 02:15:07 +03:00
Joseph Huber	6b78526b1b	[OpenMP] Emit remark on the captured call instead of the variable Changes the remark to emit on the function call that captures the globalized variable instead of the globalized variable itself. The user should be able to see which variable it was in the argument list of the function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106980	2022-02-04 17:50:53 -05:00
Philip Reames	0cc6165d05	[SLP] Strengthen internal asserts about scheduled node state [NFC] All members of a scheduled bundle must have valid dependencies, with no unscheduled ones, and only the lead element gets marked scheduled.	2022-02-04 12:22:52 -08:00
Philip Reames	f3f8e3da9f	[SLP] Remove ScheduleData::UnscheduledDepsInBundle field [NFC-ish] We can simply compute the value of this field on demand. Doing so clarifies the behavior when one of the instructions within a bundle doesn't have valid dependencies. I vaguely think this could change behavior slightly, but none of the test cases are affected, and my attempts to write one by hand have failed. This also slightly reduces memory usage, but that's a secondary value at best.	2022-02-04 10:12:09 -08:00
Roman Lebedev	55cd727c9a	[SimplifyCFG] 'merge compatible invokes': allow PHI nodes in landing pads ... iff the incoming values for the invokes-to-be-merged are compatible (identical).	2022-02-04 20:26:44 +03:00
Roman Lebedev	0d384e9228	[NFC][SimplifyCFG] Extract `IncomingValuesAreCompatible()` out of `SafeToMergeTerminators()`	2022-02-04 20:26:44 +03:00
Sanjay Patel	0236c57181	[InstCombine] try to fold one-demanded-bit-of-multiply This is a generalization of the icmp fold in D118061 (and that can be abandoned). We're looking for a disguised form of "odd * odd must be odd". Some Alive2 proofs to show correctness: https://alive2.llvm.org/ce/z/60Y8hz https://alive2.llvm.org/ce/z/HfAP6R Differential Revision: https://reviews.llvm.org/D118539	2022-02-04 11:40:54 -05:00
Benjamin Kramer	85243124cf	Tweak some uses of std::iota to skip initializing the underlying storage. NFCI.	2022-02-04 17:00:50 +01:00
Roman Lebedev	36df803dfd	[SimplifyCFG] Merge compatible `invoke`s of a `landingpad` While nowadays SimplifyCFG knows how to hoist code from then-else blocks, sink code from unconditional predecessors, and even promote the latter by tail-merging `ret`/`resume` function terminators, that isn't everything. While i (& others) have been trying to deal with merging/sinking `unreachable`, apparently perhaps the more impactful remaining problem is merging the `throw` calls. If we start at the `landingpad`, all the predecessors are unwind edges of `invoke`s, and in some cases some of the `invoke`s are mergeable. ``` /// This is a weird mix of hoisting and sinking. Visually, it goes from: /// [...] [...] /// \| \| /// [invoke0] [invoke1] /// / \ / \ /// [cont0] [landingpad] [cont1] /// to: /// [...] [...] /// \ / /// [invoke] /// / \ /// [cont] [landingpad] ``` This simplifies the IR/CFG, at the cost of debug info and extra PHI nodes. Note that we don't require for all the `invokes` of the `landingpad` to be mergeable, they can form more than a single set, we gracefully handle that. For now, i completely disallowed normal destination, PHI nodes and indirect invokes but that can be supported. Out of all the CTMark projects, only 7zip is C++, so there isn't much impact: https://llvm-compile-time-tracker.com/compare.php?from=ba8eb31bd9542828f6424e15a3014f80f14522c8&to=722fc871c84f14157d45c2159bc9c8c7e2825785&stat=size-total ... but there it currently causes size-total decrease. Differential Revision: https://reviews.llvm.org/D117805	2022-02-04 17:04:21 +03:00
Florian Hahn	0a781d98fb	[ConstraintElimination] Add initial signed support. This patch adds initial support for signed conditions. To do so, ConstraintElimination maintains two separate systems, one with facts from signed and one for unsigned conditions. To start with this means information from signed and unsigned conditions is kept completely separate. When it is safe to do so, information from signed conditions may be also transferred to the unsigned system and vice versa. That's left for follow-ups. In the initial version, de-composition of signed values just handles constants and otherwise just uses the value, without trying to decompose the operation. Again this can be extended in follow-up changes. The main benefit of this limited signed support is proving >=s 0 pre-conditions added in D118799. But even this initial version also fixes PR53273. Depends on D118799. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D118806	2022-02-04 14:02:48 +00:00
Florian Hahn	06f3ef6626	[ConstraintElimination] Allow adding pre-conditions for constraints. With this patch pre-conditions can be added to a list of constraints. Constraints with pre-conditions can only be used if all pre-conditions are satisfied when the constraint is used. The pre-conditions at the moment are specified as a list of (Predicate, Value, Value) tuples. This allows easily checking them like any other condition, using the existing infrastructure. This then is used to limit GEP decomposition to cases where we can prove that offsets are signed positive. This fixes a couple of incorrect transforms where GEP offsets were assumed to be signed positive, but they were not. Note that this effectively disables GEP decomposition, as there's no support for reasoning about signed predicates. D118806 adds initial signed support. Fixes PR49624. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D118799	2022-02-04 11:45:07 +00:00

30067 Commits