llvm-project


Author	SHA1	Message	Date
modimo	ce7f9cdb50	[InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks This change leverages the work done in D83743 for replay in the SampleProfile inliner so that it can also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only. The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics by separating them from inlining behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining. In LTO compilation, since inlining is done twice, you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places. Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further, though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay. Testing: ninja check-llvm; the newly added test correctly replays CGSCC inlining decisions. Reviewed By: mtrofin, wenlei Differential Revision: https://reviews.llvm.org/D94334	2021-01-25 15:38:57 -08:00
Kazu Hirata	16baad8f4e	[llvm] Use pop_back_val (NFC)	2021-01-24 12:18:57 -08:00
Nikita Popov	5d12b976b0	[ValueTracking] Don't assume readonly function will return This is similar to D94106, but for the isGuaranteedToTransferExecutionToSuccessor() helper. We should not assume that readonly functions will return, as this is only true for mustprogress functions (in which case we already infer willreturn). As with the DCE change, for now continue assuming that readonly intrinsics will return, as not all target intrinsics have been annotated yet. Differential Revision: https://reviews.llvm.org/D95288	2021-01-24 10:40:21 +01:00
Kazu Hirata	a3254904b2	[Analysis] Use llvm::append_range (NFC)	2021-01-22 23:25:01 -08:00
Shimin Cui	99a0aa07e9	[Analysis] Support AIX vec_malloc routines This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned and are only available on AIX. Differential Revision: https://reviews.llvm.org/D94710	2021-01-22 16:03:01 -05:00
Arthur Eubanks	6699029b67	[NewPM][opt] Run the "default" AA pipeline by default We tend to assume that the AA pipeline is by default the default AA pipeline and it's confusing when it's empty instead. PR48779 Initially reverted due to BasicAA running analyses in an unspecified order (multiple function calls as parameters), fixed by fetching analyses before the call to construct BasicAA. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D95117	2021-01-21 21:08:54 -08:00
Arthur Eubanks	a11bf9a7fb	[AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook Having a custom inliner doesn't really fit in with the new PM's pipeline. It's also extra technical debt. amdgpu-inline only does a couple of custom things compared to the normal inliner: 1) It disables inlining if the number of BBs in a function would exceed some limit 2) It increases the threshold if there are pointers to private arrays(?) These can all be handled as TTI inliner hooks. There already exists a hook for backends to multiply the inlining threshold. This way we can remove the custom amdgpu-inline pass. This caused inline-hint.ll to fail, and after some investigation, it looks like getInliningThresholdMultiplier() was previously getting applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it not applying at all, so some later inliner change must have fixed something), so I had to change the threshold in the test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D94153	2021-01-21 20:29:17 -08:00
Kazu Hirata	cfa241680f	[llvm] Don't include StringSwitch.h where unnecessary (NFC)	2021-01-21 19:59:48 -08:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to work OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: `%sa = sext <16 x i8> A to <16 x i32>` `%sb = sext <16 x i8> B to <16 x i32>` `%m = mul <16 x i32> %sa, %sb` `%r = vecreduce.add(%m)` -> `R = VMLADAV A, B` There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of an sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in and outer loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that an MLA reduction as a single instruction is fairly common. This patch attempts to improve the cost modelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match extended reduction patterns that are optionally extended and/or MLA patterns. This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do). - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow in-loop reductions larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Put together this can greatly improve performance for reduction loops under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
Kazu Hirata	8f5da41c4d	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-20 21:35:52 -08:00
Mircea Trofin	ccec2cf1d9	Reland "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `d97f776be5`. The original problem was due to build failures in shared lib builds. D95079 moved ImportedFunctionsInliningStatistics under Analysis, unblocking this.	2021-01-20 13:33:43 -08:00
Mircea Trofin	95ce32c787	[NFC] Move ImportedFunctionsInliningStatistics to Analysis This is related to D94982. We want to call these APIs from the Analysis component, so we can't leave them under Transforms. Differential Revision: https://reviews.llvm.org/D95079	2021-01-20 13:18:03 -08:00
Mircea Trofin	d97f776be5	Revert "[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor" This reverts commit `e8aec763a5`.	2021-01-20 11:19:34 -08:00
Mircea Trofin	e8aec763a5	[NPM][Inliner] Factor ImportedFunctionStats in the InlineAdvisor When using 2 InlinePass instances in the same CGSCC - one for other mandatory inlinings, the other for the heuristic-driven ones - the order in which the ImportedFunctionStats were output would depend on the destruction order of the inline passes, which is not deterministic. This patch moves the ImportedFunctionStats responsibility to the InlineAdvisor to address this problem. Differential Revision: https://reviews.llvm.org/D94982	2021-01-20 11:07:36 -08:00
Kazu Hirata	b023cdeacc	[llvm] Use llvm::all_of (NFC)	2021-01-19 20:19:17 -08:00
Juneyoung Lee	4479c0c2c0	Allow nonnull/align attribute to accept poison Currently LLVM is relying on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison. To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this changes the semantics of `nonnull` to accept poison, and to yield poison if the input pointer is null. This makes many transformations like the one below legal: ``` %p = gep inbounds %x, 1 ; %p is a non-null pointer or poison call void @f(%p) ; instcombine converts this to call void @f(nonnull %p) ``` In exchange, these semantics make propagation of `nonnull` to the caller illegal. The reason is that passing poison to `nonnull` no longer immediately raises UB, so such a program is still well-defined if the callee does not use the argument. Having the `noundef` attribute there re-allows this. ``` define void @f(i8* %p) { ; functionattr cannot mark %p nonnull here anymore call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p. ret void } ``` Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D90529	2021-01-20 11:31:23 +09:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
Nikita Popov	051ec9f5f4	[ValueTracking] Strengthen impliesPoison reasoning Split impliesPoison into two recursive walks, one over V, the other over ValAssumedPoison. This allows us to reason about poison implications in a number of additional cases that are important in practice. This is a generalized form of D94859, which handles the cmp to cmp implication in particular. Differential Revision: https://reviews.llvm.org/D94866	2021-01-19 18:04:23 +01:00
Florian Hahn	3747b69b53	[LoopRotate] Calls not lowered to calls should not block rotation. `83daa49758` made loop-rotate more conservative in the presence of function calls in the prepare-for-lto stage. The code did not properly account for calls that are not actual function calls, like calls to intrinsics. This patch updates the code to ensure only calls that are lowered to actual calls are considered inline candidates.	2021-01-19 14:37:36 +00:00
Florian Hahn	83daa49758	[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands. D84108 exposed a bad interaction between inlining and loop-rotation during regular LTO, which is causing notable regressions in at least CINT2006/473.astar. The problem boils down to: we now rotate a loop just before the vectorizer which requires duplicating a function call in the preheader when compiling the individual files ('prepare for LTO'). But this then prevents further inlining of the function during LTO. This patch tries to resolve this issue by making LoopRotate more conservative with respect to rotating loops that have inline-able calls during the 'prepare for LTO' stage. I think this change intuitively improves the current situation in general. Loop-rotate tries hard to avoid creating headers that are 'too big'. At the moment, it assumes all inlining already happened and the cost of duplicating a call is equal to just doing the call. But with LTO, inlining also happens during full LTO and it is possible that a previously duplicated call is actually a huge function which gets inlined during LTO. From the perspective of LV, not much should change overall. Most loops calling user-provided functions won't get vectorized to start with (unless we can infer that the function does not touch memory and has no other side effects). If we do not inline the 'inline-able' call during the LTO stage, we merely delay loop-rotation & vectorization. If we inline during LTO, chances should be very high that the inlined code is itself vectorizable or the user call was not vectorizable to start with. There could of course be scenarios where we inline a sufficiently large function with code not profitable to vectorize, which would have been vectorized earlier (by scalarizing the call). But even in that case, there probably is no big performance impact, because it should be mostly down to the cost-model to reject vectorization in that case. And then the version with scalarized calls should also not be beneficial. In a way, LV should have strictly more information after inlining and make more accurate decisions (barring cost-model issues). There is of course plenty of room for things to go wrong unexpectedly, so we need to keep a close eye on actual performance and address any follow-up issues. I took a look at the impact on statistics for MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer loops rotated, but no change to the number of loops vectorized. Reviewed By: sanwou01 Differential Revision: https://reviews.llvm.org/D94232	2021-01-19 10:15:29 +00:00
Juneyoung Lee	0441df94ad	[InstCombine,InstSimplify] Optimize select followed by and/or/xor This patch adds `A & (A && B)` -> `A && B` (similarly for or + logical or) Also, this patch adds `~(select C, (icmp pred X, Y), const)` -> `select C, (icmp pred' X, Y), ~const`. Alive2 proof: merge_and: https://alive2.llvm.org/ce/z/teMR97 merge_or: https://alive2.llvm.org/ce/z/b4yZUp xor_and: https://alive2.llvm.org/ce/z/_-TXHi xor_or: https://alive2.llvm.org/ce/z/2uYx_a Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94861	2021-01-19 09:14:17 +09:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Nikita Popov	4229b87ed3	[ValueTracking] Fix isSafeToSpeculativelyExecute for sdiv (PR48778) The != -1 check does not work correctly for all bitwidths. Use isAllOnesValue() instead.	2021-01-17 20:06:17 +01:00
Nikita Popov	a13c0f62c3	[InstSimplify] Fold `x*C1/C2 <= x` (PR48744) We can fold `x*C1/C2 <= x` to true if C1 <= C2. This is valid even if the multiplication is not nuw: https://alive2.llvm.org/ce/z/vULors The multiplication or division can be replaced by shifts. We don't handle the case where both are shifts, as that should get folded away by InstCombine.	2021-01-17 16:02:55 +01:00
Nikita Popov	0b84afa5fc	Reapply [BasicAA] Handle recursive queries more efficiently There are no changes relative to the original commit. However, an issue this exposed in BasicAA assumption tracking has been fixed in the previous commit. ----- An alias query currently works out roughly like this: * Look up location pair in cache. * Perform BasicAA logic (including cache lookup and insertion...) * Perform a recursive query using BestAAResults. * Look up location pair in cache (and thus do not recurse into BasicAA) * Query all the other AA providers. * Query all the other AA providers. This is a lot of unnecessary work, all ultimately caused by the BestAAResults query at the end of aliasCheck(). The reason we perform it, is that aliasCheck() is getting called recursively, and we of course want those recursive queries to also make use of other AA providers, not just BasicAA. We can solve this by making the recursive queries directly use BestAAResults (which will check both BasicAA and other providers), rather than recursing into aliasCheck(). There are some tradeoffs: * We can no longer pass through the precomputed underlying object to aliasCheck(). This is not a major concern, because nowadays getUnderlyingObject() is quite cheap. * Results from other AA providers are no longer cached inside BasicAA. The way this worked was already a bit iffy, in that a result could be cached, but if it was MayAlias, we'd still end up re-querying other providers anyway. If we want to cache non-BasicAA results, we should do that in a more principled manner. In any case, despite those tradeoffs, this works out to be a decent compile-time improvement. I think it also simplifies the mental model of how BasicAA works. It took me quite a while to fully understand how these things interact. Differential Revision: https://reviews.llvm.org/D90094	2021-01-17 10:34:35 +01:00
Nikita Popov	b1c2f1282a	[BasicAA] Move assumption tracking into AAQI D91936 placed the tracking for the assumptions into BasicAA. However, when recursing over phis, we may use fresh AAQI instances. In this case AssumptionBasedResults from an inner AAQI can result in a removal of an element from the outer AAQI. To avoid this, move the tracking into AAQI. This generally makes more sense, as the NoAlias assumptions themselves are also stored in AAQI. The test case only produces an assertion failure with D90094 reapplied. I think the issue exists independently of that change as well, but I wasn't able to come up with a reproducer.	2021-01-17 10:34:35 +01:00
Dávid Bolvanský	bfd75bdf3f	[NFC] Removed extra text in comments	2021-01-16 22:48:56 +01:00
Dávid Bolvanský	63bedc80da	[InstSimplify] Handle commutativity for 'and' and 'outer or' for (~A & B) | ~(A | B) --> ~A Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94870	2021-01-16 19:42:50 +01:00
Kazu Hirata	2082b10d10	[llvm] Use *::empty (NFC)	2021-01-16 09:40:55 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Dávid Bolvanský	bdd4dda58b	[InstSimplify] Update comments, remove redundant tests	2021-01-16 16:31:23 +01:00
Dávid Bolvanský	a4e2a5145a	[InstSimplify] Add (~A & B) | ~(A | B) --> ~A	2021-01-16 15:43:34 +01:00
Mircea Trofin	e8049dc3c8	[NewPM][Inliner] Move the 'always inliner' case in the same CGSCC pass as 'regular' inliner Expanding from D94808 - we ensure the same InlineAdvisor is used by both InlinerPass instances. The notion of mandatory inlining is moved into the core InlineAdvisor: advisors anyway have to handle that case, so this change also factors out that a bit better. Differential Revision: https://reviews.llvm.org/D94825	2021-01-15 17:59:38 -08:00
Reid Kleckner	64db296e5a	Revert "[BasicAA] Handle recursive queries more efficiently" This reverts commit `a3904cc77f`. It causes the compiler to crash while building Harfbuzz for ARM in Chromium, reduced reproducer forthcoming: https://crbug.com/1167305	2021-01-15 12:29:57 -08:00
Kazu Hirata	2efcbe24a7	[llvm] Use llvm::drop_begin (NFC)	2021-01-14 20:30:33 -08:00
Nikita Popov	a3904cc77f	[BasicAA] Handle recursive queries more efficiently An alias query currently works out roughly like this: * Look up location pair in cache. * Perform BasicAA logic (including cache lookup and insertion...) * Perform a recursive query using BestAAResults. * Look up location pair in cache (and thus do not recurse into BasicAA) * Query all the other AA providers. * Query all the other AA providers. This is a lot of unnecessary work, all ultimately caused by the BestAAResults query at the end of aliasCheck(). The reason we perform it, is that aliasCheck() is getting called recursively, and we of course want those recursive queries to also make use of other AA providers, not just BasicAA. We can solve this by making the recursive queries directly use BestAAResults (which will check both BasicAA and other providers), rather than recursing into aliasCheck(). There are some tradeoffs: * We can no longer pass through the precomputed underlying object to aliasCheck(). This is not a major concern, because nowadays getUnderlyingObject() is quite cheap. * Results from other AA providers are no longer cached inside BasicAA. The way this worked was already a bit iffy, in that a result could be cached, but if it was MayAlias, we'd still end up re-querying other providers anyway. If we want to cache non-BasicAA results, we should do that in a more principled manner. In any case, despite those tradeoffs, this works out to be a decent compile-time improvement. I think it also simplifies the mental model of how BasicAA works. It took me quite a while to fully understand how these things interact. Differential Revision: https://reviews.llvm.org/D90094	2021-01-14 20:32:41 +01:00
Jay Foad	517196e569	[Analysis,CodeGen] Make use of KnownBits::makeConstant. NFC. Differential Revision: https://reviews.llvm.org/D94588	2021-01-14 14:02:43 +00:00
Kazu Hirata	5c1c39e8d8	[llvm] Use *Set::contains (NFC)	2021-01-13 19:14:41 -08:00
Markus Lavin	f8cece1863	[ValueTracking] Fix one s/dyn_cast/dyn_cast_or_null/ Handle if Constant::getAggregateElement() returns nullptr in canCreateUndefOrPoison(). Differential Revision: https://reviews.llvm.org/D94494	2021-01-13 13:39:53 +01:00
Kazu Hirata	8a20e2b3d3	[llvm] Use Optional::getValueOr (NFC)	2021-01-12 21:43:50 -08:00
Kazu Hirata	12fc9ca3a4	[llvm] Remove redundant string initialization (NFC) Identified with readability-redundant-string-init.	2021-01-12 21:43:46 -08:00
modimo	2a49b7c64a	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Nikita Popov	7ecad2e4ce	[InstSimplify] Don't fold gep p, -p to null This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=44403. Folding gep p, q-p to q is only legal if p and q have the same provenance. This fold should probably be guarded by something like getUnderlyingObject(p) == getUnderlyingObject(q). This patch is a partial fix that removes the special handling for gep p, 0-p, which will fold to a null pointer, which would certainly not pass an underlying object check (unless p is also null, in which case this would fold trivially anyway). Folding to a null pointer is particularly problematic due to the special handling it receives in many places, making end-to-end miscompiles more likely. Differential Revision: https://reviews.llvm.org/D93820	2021-01-12 20:24:23 +01:00
Bjorn Pettersson	675be65106	Require chained analyses in BasicAA and AAResults to be transitive This patch fixes a bug that could result in miscompiles (at least in an OOT target). The problem could be seen by adding checks that the DominatorTree used in BasicAliasAnalysis and ValueTracking was valid (e.g. by adding DT->verify() call before every DT dereference and then running all tests in test/CodeGen). Problem was that the LegacyPassManager calculated "last user" incorrectly for passes such as the DominatorTree when not telling the pass manager that there was a transitive dependency between the different analyses. And then it could happen that an incorrect dominator tree was used when doing alias analysis (which was a pretty serious bug as the alias analysis result could be invalid). Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D94138	2021-01-11 11:50:07 +01:00
Kazu Hirata	e3d3dbd339	[llvm] Ensure newlines at the end of files (NFC) This patch eliminates pesky "No newline at end of file" messages from git diff.	2021-01-10 09:24:57 -08:00
Kazu Hirata	1d10a1d5b1	[MemorySSA] Remove unused dominatesUse (NFC) The function was introduced without a use on Feb 2, 2016 in commit `e1100f533f`.	2021-01-10 09:24:55 -08:00
Nikita Popov	1ecae1e62a	[ConstantFold] Fold fptoi.sat intrinsics The APFloat::convertToInteger() API already implements the desired saturation semantics.	2021-01-10 17:37:27 +01:00
Florian Hahn	c701f85c45	[STLExtras] Use return type from operator* of the wrapped iter. Currently make_early_inc_range cannot be used with iterators with operator* implementations that do not return a reference. Most notably in the LLVM codebase, this means the User iterator ranges cannot be used with make_early_inc_range, which slightly simplifies iterating over ranges while elements are removed. Instead of directly using BaseT::reference as return type of operator*, this patch uses decltype to get the actual return type of the operator implementation in WrappedIteratorT. This patch also updates a few places to make use of make_early_inc_range. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D93992	2021-01-10 14:41:13 +00:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Kazu Hirata	b7c5e0b02c	[Target, Transforms] Use *Set::contains (NFC)	2021-01-08 18:39:54 -08:00
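
The `4229b87ed3` entry in the table above replaces a `!= -1` test on the `sdiv` divisor with `isAllOnesValue()`, because the former does not work for all bitwidths. The sketch below is a standalone model of that pitfall, not LLVM's code. It assumes a hypothetical `FixedInt` type that stores an N-bit constant zero-extended in a 64-bit word, so an `i32` -1 is `0xFFFFFFFF` and never compares equal to 64-bit all-ones. The speculation rule is also simplified to "the divisor is a constant other than 0 and -1", ignoring cases where the dividend is known not to be the minimum signed value.

```cpp
#include <cstdint>

// Hypothetical model: an N-bit integer constant (1 <= N <= 64) whose bits
// are stored zero-extended in a 64-bit word.
struct FixedInt {
  unsigned Bits;
  uint64_t ZExt;
};

// Fragile: compares against 64-bit all-ones, so -1 is only recognized when
// the constant happens to be exactly 64 bits wide.
bool isMinusOneNaive(const FixedInt &V) {
  return V.ZExt == static_cast<uint64_t>(-1);
}

// Width-aware: an N-bit -1 is the value with all N low bits set.
bool isAllOnes(const FixedInt &V) {
  uint64_t Mask = V.Bits >= 64 ? ~uint64_t(0) : (uint64_t(1) << V.Bits) - 1;
  return V.ZExt == Mask;
}

// Simplified rule: `sdiv X, C` is UB when C == 0, or when C == -1 and X is the
// minimum signed value, so only other constant divisors are treated as safe.
bool isSDivSafeToSpeculate(const FixedInt &Divisor) {
  return Divisor.ZExt != 0 && !isAllOnes(Divisor);
}
```

Comparing against a width-dependent mask, as `isAllOnes` does here, gives the same answer for every width from 1 to 64, while the naive check silently misses -1 for anything narrower than 64 bits.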
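
The `a4e2a5145a` and `63bedc80da` entries add the InstSimplify fold `(~A & B) | ~(A | B) --> ~A`. By De Morgan, `~(A | B)` is `~A & ~B`, so the left-hand side is `~A & (B | ~B)`, which is `~A`. The sketch below is only a sanity check of that identity, not LLVM's matcher: it exhaustively compares both sides for every pair of 8-bit values. (In C++ the commuted operand orders handled by `63bedc80da` are trivially equivalent; the commit is about the pattern matcher recognizing them.)

```cpp
#include <cstdint>

// Exhaustively checks (~A & B) | ~(A | B) == ~A for all 8-bit A and B.
// Operands are promoted to int, so results are truncated back to 8 bits
// before comparing.
bool notAIdentityHoldsForAllI8() {
  for (unsigned I = 0; I < 256; ++I) {
    for (unsigned J = 0; J < 256; ++J) {
      uint8_t A = static_cast<uint8_t>(I);
      uint8_t B = static_cast<uint8_t>(J);
      uint8_t Lhs = static_cast<uint8_t>((~A & B) | ~(A | B));
      uint8_t NotA = static_cast<uint8_t>(~A);
      if (Lhs != NotA)
        return false;
    }
  }
  return true;
}
```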
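
The `a13c0f62c3` entry folds `x*C1/C2 <= x` to true when `C1 <= C2`, even if the multiplication can wrap. The sketch below assumes unsigned semantics (wrapping `mul`, `udiv`, unsigned `<=`) and checks every 8-bit case exhaustively; it is a small-width sanity check, not a proof for all widths (the commit links an Alive2 proof for that). The intuition: without wrapping, `x*C1 <= x*C2`, so the quotient is at most `x`; with wrapping, the wrapped product is below `2^8 <= x*C1 <= x*C2`, so the quotient is still at most `x`.

```cpp
#include <cstdint>

// Exhaustively checks, for 8-bit unsigned values, that
//   ((x * C1) mod 2^8) / C2 <= x   whenever 0 <= C1 <= C2 and C2 != 0.
bool mulDivFoldHoldsForI8() {
  for (unsigned C2 = 1; C2 < 256; ++C2)
    for (unsigned C1 = 0; C1 <= C2; ++C1)
      for (unsigned X = 0; X < 256; ++X) {
        uint8_t Prod = static_cast<uint8_t>(X * C1); // wraps modulo 2^8
        uint8_t Quot = static_cast<uint8_t>(Prod / C2);
        if (Quot > X)
          return false;
      }
  return true;
}
```

The `C1 <= C2` precondition matters: with `x = 1`, `C1 = 2`, `C2 = 1`, the quotient is 2, which is greater than `x`.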
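
The `1ecae1e62a` entry constant-folds the saturating float-to-integer intrinsics, relying on `APFloat::convertToInteger()` for the saturation semantics. As a rough plain-C++ illustration of those semantics (not the APFloat code path), the sketch below models `i8` results from a `double`: NaN becomes 0, out-of-range values clamp to the destination range, and in-range values truncate toward zero. The function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Models a saturating signed conversion to i8.
int8_t fptosiSatI8(double X) {
  if (std::isnan(X))
    return 0;
  if (X <= -128.0)
    return INT8_MIN;
  if (X >= 127.0)
    return INT8_MAX;
  return static_cast<int8_t>(X); // in range: truncates toward zero
}

// Models a saturating unsigned conversion to i8.
uint8_t fptouiSatI8(double X) {
  if (std::isnan(X) || X <= 0.0)
    return 0;
  if (X >= 255.0)
    return UINT8_MAX;
  return static_cast<uint8_t>(X); // in range: truncates toward zero
}
```

The range checks come before the cast because a plain C++ float-to-integer conversion of an out-of-range value is undefined behavior, which is exactly what the saturating intrinsics are designed to avoid.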
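
The `c701f85c45` entry widens where `make_early_inc_range` can be used. The idiom it packages is "advance the iterator before using the current element", so the loop body may erase that element without invalidating the iterator the loop continues from. The sketch below shows the idiom by hand on a `std::list`; it is a generic illustration and does not exercise the `decltype` return-type change itself.

```cpp
#include <list>

// Removes every even element while iterating. The iterator is advanced
// before the current element is examined, so erasing the current element
// leaves the loop iterator valid (std::list::erase invalidates only the
// erased position).
void eraseEvens(std::list<int> &Values) {
  for (auto It = Values.begin(), End = Values.end(); It != End;) {
    auto Cur = It++; // early increment
    if (*Cur % 2 == 0)
      Values.erase(Cur);
  }
}
```

In LLVM code the same loop is typically written as a range-based `for` over `make_early_inc_range(...)`, which hides the manual `Cur`/`It` bookkeeping.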


10150 Commits