llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	9bd22595ba	[unroll] Prune all but first copy of invariant exit If we have an exit which is controlled by a loop invariant condition and which dominates the latch, we know only the copy in the first unrolled iteration can be taken. All other copies are dead. The change itself is pretty straight forward, but let me add two points of context: * I'd have expected other transform passes to catch this after unrolling, but I'm seeing multiple examples where we get to the end of O2/O3 without simplifying. * I'd like to do a stronger change which did CSE during unroll and accounted for invariant expressions (as defined by SCEV instead of trivial ones from LoopInfo), but that doesn't fit cleanly into the current code structure. Differential Revision: https://reviews.llvm.org/D116496	2022-01-03 09:55:19 -08:00
Nikita Popov	730414b341	[CodeExtractor] Remove unnecessary explicit attribute handling (NFC) The nounwind and uwtable attributes will get handled as part of the loop below as well, there is no need to special-case them here.	2022-01-03 14:23:25 +01:00
Nikita Popov	587495ffa1	[CodeExtractor] Separate function from param/ret attributes (NFC) This list is confusing because it conflates functions attributes (which are either extractable or not) and other attribute kinds, which are simply irrelevant for this code.	2022-01-03 14:07:12 +01:00
Benjamin Kramer	4683ce2cd8	[InferAttrs] Give strnlen the same attributes as strlen This moves the only string function out of the big list of math funcs. And let's us CSE strnlen calls.	2021-12-30 20:43:43 +01:00
Kazu Hirata	e7774f499b	Use static_assert instead of assert (NFC) Identified with misc-static-assert.	2021-12-26 14:26:44 -08:00
Djordje Todorovic	93615b88f5	[Debugify] Use WeakWH map collected before Pass when checking loc drop This fixes a typo/bug when checking for pointer reuse when testing DI location preservation in the Debugify original mode (when checking -g generated Debug Info). Differential Revision: https://reviews.llvm.org/D115621	2021-12-21 15:54:09 +01:00
Sami Tolvanen	5dc8aaac39	[llvm][IR] Add no_cfi constant With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces function references with CFI jump table references, which is a problem for low-level code that needs the address of the actual function body. For example, in the Linux kernel, the code that sets up interrupt handlers needs to take the address of the interrupt handler function instead of the CFI jump table, as the jump table may not even be mapped into memory when an interrupt is triggered. This change adds the no_cfi constant type, which wraps function references in a value that LowerTypeTestsModule::replaceCfiUses does not replace. Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Reviewed By: nickdesaulniers, pcc Differential Revision: https://reviews.llvm.org/D108478	2021-12-20 12:55:32 -08:00
Kazu Hirata	26bd534a79	[llvm] Use none_of instead of \!any_of (NFC)	2021-12-17 13:48:57 -08:00
Kazu Hirata	2b7be47b22	[llvm] Strip redundant lambda (NFC)	2021-12-17 10:51:40 -08:00
Philip Reames	f632c49478	Extract a helper function for computing estimate trip count of an exiting branch Plan to use this in following change to support estimated trip counts derived from multiple loop exits.	2021-12-16 17:29:32 -08:00
eopXD	6734be290b	Revert "[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified" This reverts commit `fbf6c8ac15`.	2021-12-16 02:06:11 -08:00
Yueh-Ting Chen	fbf6c8ac15	[LoopVersioning] Allow versionLoop to create plain branch inst when no runtime check is specified After this function call, the LLVM IR would look like the following: ``` if (true) /* NonVersionedLoop / else / VersionedLoop */ ``` Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D104631	2021-12-16 00:41:29 -08:00
Arthur Eubanks	d5583366ba	[FunctionComparator] Use getAlign() instead of getAlignment() getAlignment() is deprecated.	2021-12-15 14:40:56 -08:00
Kazu Hirata	7787a8f1b7	[llvm] Use llvm::reverse (NFC)	2021-12-13 21:54:51 -08:00
Nick Desaulniers	95ba0e4563	[SimplifyLibCalls] propagate tail flags on CallInsts I noticed we weren't propagating tail flags on calls when FortifiedLibCallSimplifier.optimizeCall() was replacing calls to runtime checked calls to the non-checked routines (when safe to do so). Make sure to check this before replacing the original calls! Also, avoid any libcall transforms when notail/musttail is present. PR46734 Fixes: https://github.com/llvm/llvm-project/issues/46079 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107872	2021-12-13 11:18:30 -08:00
Gulfem Savrun Yeniceri	e5a8af7a90	[Passes] Fix relative lookup table converter pass This patch fixes the relative table converter pass for the lookup table accesses that are resulted in an instruction sequence, where gep is not immediately followed by a load, such as gep being hoisted outside the loop or another instruction is inserted in between them. The fix inserts the call to load.relative.instrinsic in the original place of load instead of gep. Issue is reported by FreeBSD via https://bugs.freebsd.org/259921. Differential Revision: https://reviews.llvm.org/D115571	2021-12-12 04:40:17 +00:00
Philip Reames	0d13f94c1d	[reductions] Delete another piece of dead flag handling [NFC] The code claimed to handle nsw/nuw, but those aren't passed via builder state and the explicit IR construction just above never sets them. The only case this bit of code is actually relevant for is FMF flags. However, dropPoisonGeneratingFlags currently doesn't know about FMF at all, so this was a noop. It's also unneeded, as the caller explicitly configures the flags on the builder before this call, and the flags on the individual ops should be controled by the intrinsic flags anyways. If any of the flags aren't safe to propagate, the caller needs to make that change.	2021-12-09 10:56:55 -08:00
Philip Reames	b24db85c0b	[recurrence] Delete dead flag/fmf handling [NFC] The recurrence lowering code has handling which claims to be about flag intersection, but all the callers pass empty arrays to the arguments. The sole exception is a caller of a method which has the argument, but no implementation. I don't know what the intent was here, but it certaintly doesn't actually do anything today.	2021-12-09 10:43:53 -08:00
Philip Reames	2d31b02517	Compute estimated trip counts for multiple exit loops This change allows us to estimate trip count from profile metadata for all multiple exit loops. We still do the estimate only from the latch, but that's fine as it causes us to over estimate the trip count at worst. Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll, and vectorize respectively) when we know the trip count is short enough. So, as a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we'd previously have runtime unrolled a loop which ran fewer iterations than the unroll factor. This is definitely unprofitable. The one case where an upper bound on estimate trip count could drive a more aggressive transform is peeling, and I duplicated the logic being removed from the generic estimation there to keep it the same. The resulting heuristic makes no sense and should probably be immediately removed, but we can do that in a separate change. This was noticed when analyzing regressions on D113939. I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately. Differential Revision: https://reviews.llvm.org/D115362	2021-12-09 09:53:49 -08:00
Dmitry Makogon	0b533c1833	[MetaRenamer] Add command line options to disable renaming name with specified prefixes This patch adds 4 options for specifying functions, aliases, globals and structs name prefixes hat don't need to be renamed by MetaRenamer pass. This is useful if one has some downstream logic that depends directly on an entity name. MetaRenamer can break this logic, but with the patch you can tell it not to rename certain names. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D115323	2021-12-09 18:45:06 +07:00
spupyrev	f573f6866e	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-07 07:31:10 -08:00
Nico Weber	3678326d28	Revert "ext-tsp basic block layout" This reverts commit `c68f71eb37`. Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424	2021-12-06 19:08:20 -05:00
spupyrev	c68f71eb37	ext-tsp basic block layout A new basic block ordering improving existing MachineBlockPlacement. The algorithm tries to find a layout of nodes (basic blocks) of a given CFG optimizing jump locality and thus processor I-cache utilization. This is achieved via increasing the number of fall-through jumps and co-locating frequently executed nodes together. The name follows the underlying optimization problem, Extended-TSP, which is a generalization of classical (maximum) Traveling Salesmen Problem. The algorithm is a greedy heuristic that works with chains (ordered lists) of basic blocks. Initially all chains are isolated basic blocks. On every iteration, we pick a pair of chains whose merging yields the biggest increase in the ExtTSP value, which models how i-cache "friendly" a specific chain is. A pair of chains giving the maximum gain is merged into a new chain. The procedure stops when there is only one chain left, or when merging does not increase ExtTSP. In the latter case, the remaining chains are sorted by density in decreasing order. An important aspect is the way two chains are merged. Unlike earlier algorithms (e.g., based on the approach of Pettis-Hansen), two chains, X and Y, are first split into three, X1, X2, and Y. Then we consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y, X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score. This improves the quality of the final result (the search space is larger) while keeping the implementation sufficiently fast. Differential Revision: https://reviews.llvm.org/D113424	2021-12-06 08:56:39 -08:00
Anna Thomas	72750f0012	[TrivialDeadness] Introduce API separating two different usages The earlier usage of wouldInstructionBeTriviallyDead is based on the assumption that the use_count of that instruction being checked will be zero. This patch separates the API into two different ones: 1. The strictly conservative one where the instruction is trivially dead iff the uses are dead. 2. The slightly relaxed form, where an instruction is dead along paths where it is not used. The second form can be used in identifying instructions that are valid to sink down to uses (D109917). Reviewed-By: reames Differential Revision: https://reviews.llvm.org/D114647	2021-12-03 10:09:52 -05:00
spupyrev	93a2c2919f	profi - a flow-based profile inference algorithm: Part III (out of 3) This is a continuation of D109860 and D109903. An important challenge for profile inference is caused by the fact that the sample profile is collected on a fully optimized binary, while the block and edge frequencies are consumed on an early stage of the compilation that operates with a non-optimized IR. As a result, some of the basic blocks may not have associated sample counts, and it is up to the algorithm to deduce missing frequencies. The problem is illustrated in the figure where three basic blocks are not present in the optimized binary and hence, receive no samples during profiling. We found that it is beneficial to treat all such blocks equally. Otherwise the compiler may decide that some blocks are “cold” and apply undesirable optimizations (e.g., hot-cold splitting) regressing the performance. Therefore, we want to distribute the counts evenly along the blocks with missing samples. This is achieved by a post-processing step that identifies "dangling" subgraphs consisting of basic blocks with no sampled counts; once the subgraphs are found, we rebalance the flow so as every branch probability is 50:50 within the subgraphs. Our experiments indicate up to 1% performance win using the optimization on some binaries and a significant improvement in the quality of profile counts (when compared to ground-truth instrumentation-based counts) {F19093045} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109980	2021-12-02 12:01:30 -08:00
spupyrev	98dd2f9ed3	profi - a flow-based profile inference algorithm: Part II (out of 3) This is a continuation of D109860. Traditional flow-based algorithms cannot guarantee that the resulting edge frequencies correspond to a connected flow in the control-flow graph. For example, for an instance in the attached figure, a flow-based (or any other) inference algorithm may produce an output in which the hot loop is disconnected from the entry block (refer to the rightmost graph in the figure). Furthermore, creating a connected minimum-cost maximum flow is a computationally NP-hard problem. Hence, we apply a post-processing adjustments to the computed flow by connecting all isolated flow components ("islands"). This feature helps to keep all blocks with sample counts connected and results in significant performance wins for some binaries. {F19077343} Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D109903	2021-12-02 11:04:21 -08:00
Kazu Hirata	22d82949b0	[llvm] Fix "unused variable" warnings	2021-12-02 09:20:17 -08:00
Djordje Todorovic	2cdc6f2ca6	Reland "[LICM] Hoist LOAD without sinking the STORE" When doing load/store promotion within LICM, if we cannot prove that it is safe to sink the store we won't hoist the load, even though we can prove the load could be dereferenced and moved outside the loop. This patch implements the load promotion by moving it in the loop preheader by inserting proper PHI in the loop. The store is kept as is in the loop. By doing this, we avoid doing the load from a memory location in each iteration. Please consider this small example: loop { var = ptr; if (var) break; ptr= var + 1; } After this patch, it will be: var0 = ptr; loop { var1 = phi (var0, var2); if (var1) break; var2 = var1 + 1; ptr = var2; } This addresses some problems from [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=51193 Differential revision: https://reviews.llvm.org/D113289	2021-12-02 03:53:50 -08:00
Florian Hahn	2de5f39e54	[BuildLibCalls] Add support for memset_pattern{4,8}. Add support for memset_pattern{4,8} similar to the existing memset_pattern16 handling. Reviewed By: ab Differential Revision: https://reviews.llvm.org/D114883	2021-12-02 11:04:25 +00:00
Nikita Popov	a0ff26e08c	[GlobalOpt] Fix assertion failure during instruction deletion This fixes the assertion failure reported in https://reviews.llvm.org/D114889#3166417, by making RecursivelyDeleteTriviallyDeadInstructionsPermissive() more permissive. As the function accepts a WeakTrackingVH, even if originally only Instructions were inserted, we may end up with different Value types after a RAUW operation. As such, we should not assume that the vector only contains instructions. Notably this matches the behavior of the RecursivelyDeleteTriviallyDeadInstructions() function variant which accepts a single value rather than vector.	2021-12-02 11:58:39 +01:00
Florian Hahn	0496edad49	[BuildLibCalls] Add additional attrs to memcpy_chk. `memcpy_chk` can be treated like `memcpy`, with the exception that it may not return (if it aborts the program). See D114793 for a similar patch for `memset_chk`. Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D114863	2021-12-02 09:50:14 +00:00
Arthur Eubanks	512534bc16	[Cloning] Clone metadata on function declarations Previously we missed cloning metadata on function declarations because we don't call CloneFunctionInto() on declarations in CloneModule(). Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D113812	2021-12-01 15:40:05 -08:00
spupyrev	7cc2493daa	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. UPD Dec 1st 2021: - synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`; - added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-12-01 15:30:38 -08:00
Djordje Todorovic	72f9f066df	Revert "[LICM] Hoist LOAD without sinking the STORE" This reverts commit `ecb9d8e4e3`. I'll reland this as soon as the failing tests are fixed/updated.	2021-12-01 04:39:26 -08:00
Djordje Todorovic	ecb9d8e4e3	[LICM] Hoist LOAD without sinking the STORE When doing load/store promotion within LICM, if we cannot prove that it is safe to sink the store we won't hoist the load, even though we can prove the load could be dereferenced and moved outside the loop. This patch implements the load promotion by moving it in the loop preheader by inserting proper PHI in the loop. The store is kept as is in the loop. By doing this, we avoid doing the load from a memory location in each iteration. Please consider this small example: loop { var = ptr; if (var) break; ptr= var + 1; } After this patch, it will be: var0 = ptr; loop { var1 = phi (var0, var2); if (var1) break; var2 = var1 + 1; ptr = var2; } This addresses some problems from [0]. [0] https://bugs.llvm.org/show_bug.cgi?id=51193 Differential revision: https://reviews.llvm.org/D113289	2021-12-01 04:27:50 -08:00
Florian Hahn	6a5e29d13f	[BuildLibCalls] Add argmemonly, writeonly, nounwind to memset_chk. The memset_chk library function should match memset's attributes with respect of memory effects (argmemonly, writeonly). It also does not raise exceptions. It may not return, in case it aborts the program. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D114793	2021-12-01 10:09:52 +00:00
Nikita Popov	84b978da3b	[LoopUnrollRuntime] Remove unnecessary pointer BECount check (NFC) BECounts are guaranteed to be integers nowadays.	2021-12-01 10:32:37 +01:00
Philip Reames	8906a0fe64	[SCEVExpander] Drop poison generating flags when reusing instructions The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct. This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid. In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled. On the test changes, most are pretty straight forward. We loose some flags (in some cases, they'd have been dropped on the next CSE pass anyways). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE. The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB. Differential Revision: https://reviews.llvm.org/D112734	2021-11-29 15:23:34 -08:00
Bjorn Pettersson	297fb66484	Use a deterministic order when updating the DominatorTree This solves a problem with non-deterministic output from opt due to not performing dominator tree updates in a deterministic order. The problem that was analysed indicated that JumpThreading was using the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When preparing the list of updates to send to DomTreeUpdater::applyUpdates we iterated over a SmallPtrSet, which didn't give a well-defined order of updates to perform. The added domtree-updates.ll test case is an example that would result in non-deterministic printouts of the domtree. Semantically those domtree:s are equivalent, but it show the fact that when we use the domtree iterator the order in which nodes are visited depend on the order in which dominator tree updates are performed. Since some passes (at least EarlyCSE) are iterating over nodes in the dominator tree in a similar fashion as the domtree printer, then the order in which transforms are applied by such passes, transitively, also depend on the order in which dominator tree updates are performed. And taking EarlyCSE as an example the end result could be different depending on in which order the transforms are applied. Reviewed By: nikic, kuhar Differential Revision: https://reviews.llvm.org/D110292	2021-11-29 13:14:50 +01:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Jun Ma	07333810ca	Revert "Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""""" This reverts commit `c93f93b2e3`.	2021-11-24 10:26:37 +08:00
Mehdi Amini	1392b654ff	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `884b6dd311`. The windows build is broken with a linker error.	2021-11-23 20:10:36 +00:00
spupyrev	884b6dd311	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 11:02:40 -08:00
Zarko Todorovski	0d3add216f	[llvm][NFC] Inclusive language: Reword replace uses of sanity in llvm/lib/Transform comments and asserts Reworded some comments and asserts to avoid usage of `sanity check/test` Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D114372	2021-11-23 13:22:55 -05:00
Philip Reames	065f777d27	Revert "profi - a flow-based profile inference algorithm: Part I (out of 3)" This reverts commit `b00fc19822`. This change fails to build (link) on ubuntu x86,	2021-11-23 09:18:28 -08:00
spupyrev	b00fc19822	profi - a flow-based profile inference algorithm: Part I (out of 3) The benefits of sampling-based PGO crucially depends on the quality of profile data. This diff implements a flow-based algorithm, called profi, that helps to overcome the inaccuracies in a profile after it is collected. Profi is an extended and significantly re-engineered classic MCMF (min-cost max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing missing and inaccurate profiling using a minimum cost circulation algorithm]. It models profile inference as an optimization problem on a control-flow graph with the objectives and constraints capturing the desired properties of profile data. Three important challenges that are being solved by profi: - "fixing" errors in profiles caused by sampling; - converting basic block counts to edge frequencies (branch probabilities); - dealing with "dangling" blocks having no samples in the profile. The main implementation (and required docs) are in SampleProfileInference.cpp. The worst-time complexity is quadratic in the number of blocks in a function, O(\|V\|^2). However a careful engineering and extensive evaluation shows that the running time is (slightly) super-linear. In particular, instances with 1000 blocks are solved within 0.1 second. The algorithm has been extensively tested internally on prod workloads, significantly improving the quality of generated profile data and providing speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it generally improves the performance (with a few outliers) but extra work in the compiler might be needed to re-tune existing optimization passes relying on profile counts. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109860	2021-11-23 09:08:30 -08:00
Kazu Hirata	d1abf481da	[llvm] Use range-based for loops (NFC)	2021-11-19 21:12:13 -08:00
ksyx	97b9e8438e	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. The values are converted to signed types to avoid unsigned operation with negative offsets. Part of revision D100179 Reapply commit `c35e8185d8` with fixing problem reported by mstorsjo	2021-11-19 20:24:36 -05:00
Senran Zhang	0425ea4621	[NFC][OpaquePtr][Evaluator] Remove call to PointerType::getElementType There are still another 2 uses of PointerType::getElementType in Evaluator when evaluating BitCast's on pointers. BitCast's on pointers should be removed when opaque ptr is ready, so I just keep them as is. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D114131	2021-11-19 10:32:55 +08:00
Philip Reames	8f95e915cd	[unroll-runtime] Relax two profitability limitations on multi-exit unrolling This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change. If anyone sees actually interesting code differences out of this, please let me know. I'm not expecting this to have much impact at all. Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block. We can only hit this with a poorly documented debug flag. Case 2 - Why should we treat epilog cases differently from prolog cases? Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled? Sorry for the lack of tests here. These are both exceedingly narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags. Note that the legality cases for each are already exercised with force flags.	2021-11-15 13:00:14 -08:00
Philip Reames	423da61835	[runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC] All of the interesting logic from this routine has been removed, inline the single check into the sole non-assert caller. The assert use has little value with the restructured code and is simply dropped.	2021-11-15 11:39:07 -08:00
Philip Reames	e99902a872	[runtime-unroll] Restructure if-clause to improve readability [NFC]	2021-11-15 11:13:27 -08:00
ksyx	72b5138d37	Revert "[GVN][NFC] Remove redundant check" This reverts commit `c35e8185d8`. mstorsjo reported in the revision thread that one VNCoercion assertion is violated and seemly in relate to this commit. As per "If a test case that demonstrates a problem is reported in the commit thread, please revert and investigate offline", this commit is reverted.	2021-11-15 09:14:13 -05:00
Mircea Trofin	a32c2c3808	[NFC] Use Optional<ProfileCount> to model invalid counts ProfileCount could model invalid values, but a user had no indication that the getCount method could return bogus data. Optional<ProfileCount> addresses that, because the user must dereference the optional. In addition, the patch removes concept duplication. Differential Revision: https://reviews.llvm.org/D113839	2021-11-14 19:03:30 -08:00
Kazu Hirata	098e935174	[llvm] Use range-based for loops with CallBase::args (NFC)	2021-11-14 09:32:36 -08:00
Mircea Trofin	0662a3612c	[NFC][InlineFunction] Renamed some vars to conform to coding style	2021-11-14 07:26:44 -08:00
ksyx	c35e8185d8	[GVN][NFC] Remove redundant check The if-check above deleted part guarantees that StoreOffset <= LoadOffset and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0, while it could be meaningless to have a type with nonpositive size, so that the check could be removed. Part of revision D100179 Reviewed By: nikic	2021-11-13 15:59:43 -05:00
Philip Reames	37ead201e6	[runtime-unroll] Use incrementing IVs instead of decrementing ones This is one of those wonderful "in theory X doesn't matter, but in practice is does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp iteration count of the loops* from decrementing to incrementing. Why does this matter? A couple of reasons: * SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.) * Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.) Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen. * Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simple use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.	2021-11-12 15:44:58 -08:00
Philip Reames	de2fed6152	[unroll] Keep unrolled iterations with initial iteration The unrolling code was previously inserting new cloned blocks at the end of the function. The result of this with typical loop structures is that the new iterations are placed far from the initial iteration. With unrolling, the general assumption is that the a) the loop is reasonable hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting. As such, placing Count-1 copies out of line is a fairly poor code placement choice. We'd much rather fall through into the hot (non-exiting) path. For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code. However, the real motivation for this change isn't performance. Its readability and human understanding. Having to jump around long distances in an IR file to trace an unrolled loop structure is error prone and tedious.	2021-11-12 11:40:50 -08:00
Florian Hahn	2ead34716a	[SimplifyCFG] Add early bailout if Use is not in same BB. Without this patch, passingValueIsAlwaysUndefined will iterate over all instructions from I to the end of the basic block, even if the use is outside the block. This patch adds an early bail out, if the use instruction is outside I's BB. This can greatly reduce compile-time in cases where very large basic blocks are involved, with a large number of PHI nodes and incoming values. Note that the refactoring makes the handling of the case where I is a phi and Use is in PHI more explicit as well: for phi nodes, we can also directly bail out. In the existing code, we would iterate until we reach the end and return false. Based on an earlier patch by Matt Wala. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D113293	2021-11-09 12:57:03 +00:00
Max Kazantsev	cb728cb8a9	[NFC] Get rid of hardcoded magical constant and use Optionals instead Refactor calculateIterationsToInvariance so that it doesn't need a magical constant to signify unknown answer.	2021-11-09 18:13:19 +07:00
Dmitry Makogon	ae14fae0ff	[SCEVExpander] Use stable_sort to sort loop Phis in SCEVExpander::replaceCongruentIVs This is a fix for test failures on expensive checks build caused by `db289340c8`. With LLVM_ENABLE_EXPENSIVE_CHECKS enabled the llvm::sort shuffles the given container. However, the sort is only called when the TTI is passed to replaceCongruentIVs. In the mentioned patch we pass it TTI, so the sort happens. But due to shuffling equivalent Phis may appear in different order from run to run. With the stable_sort instead of sort this is impossible - the order of sorted Phis is preserved.	2021-11-09 16:29:57 +07:00
Anton Afanasyev	ce4fa93db8	[SCCP] Tune cast instruction handling for overdefined operand Extended value is known to be inside range smaller than full one. Prevent SCCP to mark such value as overdefined. Fixes PR52253 Differential Revision: https://reviews.llvm.org/D112721	2021-11-08 18:34:30 +03:00
Kazu Hirata	0d182d9d1e	[Transforms] Use make_early_inc_range (NFC)	2021-11-07 17:03:15 -08:00
Benjamin Kramer	9b8b16457c	Put implementation details into anonymous namespaces. NFCI.	2021-11-07 15:18:30 +01:00
Kazu Hirata	843d1eda18	[llvm] Use llvm::reverse (NFC)	2021-11-06 19:31:18 -07:00
Kazu Hirata	1b108ab975	[Transforms] Use make_early_inc_range (NFC)	2021-11-02 18:13:23 -07:00
Dmitry Makogon	e09958d5eb	[LoopPeel] Peel loops with exits followed by an unreachable or deopt block Added support for peeling loops with exits that are followed either by an unreachable-terminated block or block that has a terminatnig deoptimize call. All blocks in the sequence must have an unique successor, maybe except for the last one. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D110922	2021-11-02 23:12:04 +07:00
Jun Ma	c93f93b2e3	Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."""" This reverts commit `3a998c06a8`.	2021-11-01 15:31:59 +08:00
Kazu Hirata	c714da2ceb	[Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC)	2021-10-31 07:57:32 -07:00
Roman Lebedev	156f10c840	[IR] `SCEVExpander::generateOverflowCheck()`: short-circuit `umul_with_overflow`-by-one It's a no-op, no overflow happens ever: https://alive2.llvm.org/ce/z/Zw89rZ While generally i don't like such hacks, we have a very good reason to do this: here we are expanding a run-time correctness check for the vectorization, and said `umul_with_overflow` will not be optimized out before we query the cost of the checks we've generated. Which means, the cost of run-time checks would be artificially inflated, and after https://reviews.llvm.org/D109368 that will affect the minimal trip count for which these checks are even evaluated. And if they aren't even evaluated, then the vectorized code certainly won't be run. We could consider doing this in IRBuilder, but then we'd need to also teach `CreateExtractValue()` to look into chain of `insertvalue`'s, and i'm not sure there's precedent for that. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 19:45:55 +03:00
Nikita Popov	11a8423dab	[SCEV] Use reverse() (NFC)	2021-10-26 11:08:58 +02:00
Max Kazantsev	9bbfe0f72c	[NFC] Remove obsolete simplifyOnceImpl function The function simplifyOnce only calls simplifyOnceImpl and does nothing else. Having this separate helper makes no sense. Removing it. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D112517 Reviewed By: mkazantsev	2021-10-26 13:51:42 +07:00
Max Kazantsev	d4c74cd4e8	[NFC] [LoopPeel] Update IDoms of non-loop blocks dominated by the loop When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` are loop exits, and each of them unconditionally branches to the same `unreachable` terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. Previously we stored exits' IDoms in a map before peeling a loop and then, after peeling off one iteration, we changed their IDoms. Now we use the same logic not only for exits but for all non-loop blocks dominated by the loop. So when we add logic to support peeling loops with exits which branch, for example, to an unreachable-terminated block, we would update the IDoms not only for exits, but for their successors. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev, nikic	2021-10-26 13:09:07 +07:00
Nikita Popov	3a995c918e	[SCEV] Move SCEVLostPoisonFlags() check into SCEVExpander Always insert values into ExprValueMap, and instead skip using them in SCEVExpander if poison-generating flags have been lost. This ensures that all values that are in ValueExprMap are also in ExprValueMap, so we can use the latter to invalidate the former. This change is probably not entirely NFC for the case where originally the SCEV had no nowrap flags but they were inferred later, in which case that would now allow reusing the existing value for expansion. Differential Revision: https://reviews.llvm.org/D112389	2021-10-25 22:37:20 +02:00
Nikita Popov	477551fd09	[SCEVExpander] Minor cleanup in value reuse (NFC) Use dyn_cast_or_null and convert one of the checks into an assertion. SCEV is a per-function analysis.	2021-10-25 10:32:17 +02:00
Nikita Popov	710596a1e1	[ConstantFolding] Accept offset in ConstantFoldLoadFromConstPtr (NFCI) As this API is now internally offset-based, we can accept a starting offset and remove the need to create a temporary bitcast+gep sequence to perform an offset load. The API now mirrors the ConstantFoldLoadFromConst() API.	2021-10-23 17:59:39 +02:00
Kazu Hirata	6fe949c4ed	[Target, Transforms] Use StringRef::contains (NFC)	2021-10-22 08:52:33 -07:00
Nikita Popov	1848525842	[CodeMetrics] Don't require speculatability for ephemeral values As discussed in D112016, our current requirement of speculatability for ephemeral is overly strict: What we really care about is that the instruction will be DCEd once the assume is dropped. For that it is sufficient that the instruction is side-effect free and not a terminator. In particular, this allows non-dereferenceable loads to be ephemeral values. Differential Revision: https://reviews.llvm.org/D112179	2021-10-21 20:30:01 +02:00
Florian Hahn	8977bd5806	[IndVars] Invalidate SCEV when IR is changed in rewriteLoopExitValue. At the moment, rewriteLoopExitValue forgets the current phi node in the loop that collects phis to rewrite. A few lines after the value is forgotten, SCEV is used again to analyze incoming values and potentially expand SCEV expression. This means that another SCEV is created for PN, before the IR is actually updated in the next loop. This leads to accessing invalid cached expression in combination with D71539. PN should only be changed once the actual incoming exit value is set in the next loop. Moving invalidation there should ensure that PN is invalidated in all relevant cases. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D111495	2021-10-20 20:48:33 +01:00
Itay Bookstein	08ed216000	[IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol As discussed in: * https://reviews.llvm.org/D94166 * https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html The GlobalIndirectSymbol class lost most of its meaning in https://reviews.llvm.org/D109792, which disambiguated getBaseObject (now getAliaseeObject) between GlobalIFunc and everything else. In addition, as long as GlobalIFunc is not a GlobalObject and getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee is a GlobalIFunc cannot currently be modeled properly. Creating aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition, calling getAliaseeObject on a GlobalIFunc will currently return nullptr, which is undesirable because it should return the object itself for non-aliases. This patch refactors the GlobalIFunc class to inherit directly from GlobalObject, and removes GlobalIndirectSymbol (while inlining the relevant parts into GlobalAlias and GlobalIFunc). This allows for calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc itself, making getAliaseeObject() more consistent and enabling alias-to-ifunc to be properly modeled in the IR. I exercised some judgement in the API clients of GlobalIndirectSymbol: some were 'monomorphized' for GlobalAlias and GlobalIFunc, and some remained shared (with the type adapted to become GlobalValue). Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D108872	2021-10-20 10:29:47 -07:00
Florian Hahn	e844f05397	[LoopUtils] Simplify addRuntimeCheck to return a single value. This simplifies the return value of addRuntimeCheck from a pair of instructions to a single `Value `. The existing users of addRuntimeChecks were ignoring the first element of the pair, hence there is not reason to track FirstInst and return it. Additionally all users of addRuntimeChecks use the second returned `Instruction ` just as `Value `, so there is no need to return an `Instruction `. Therefore there is no need to create a redundant dummy `and X, true` instruction any longer. Effectively this change should not impact the generated code because the redundant AND will be folded by later optimizations. But it is easy to avoid creating it in the first place and it allows more accurately estimating the cost of the runtime checks.	2021-10-18 18:03:09 +01:00
Peter Waller	c4603a8a43	[InstCombine][DebugInfo] Remove superflous assertion, add test When this code was added, an unnecessary assertion slipped in which we now hit in real code. Add a test to defend against it firing again.	2021-10-18 11:00:01 +00:00
Max Kazantsev	baad10c09e	Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits" This reverts commit `fa16329ae0`. See comments in discussion. Merged by mistake, not entirely getting what the problem was.	2021-10-18 17:14:11 +07:00
Max Kazantsev	fa16329ae0	[NFC] [LoopPeel] Change the way DT is updated for loop exits When peeling a loop, we assume that the latch has a `br` terminator and that all loop exits are either terminated with an `unreachable` or have a terminating deoptimize call. So when we peel off the 1st iteration, we change the IDom of all loop exits to the peeled copy of `NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support loops with exits that are followed by a block with an `unreachable` or a terminating deoptimize call, changing the exit's idom wouldn't be enough and DT would be broken. For example, let `Exit1` and `Exit2` are loop exits, and each of them unconditionally branches to the same `unreachable` terminated block. So neither of the exits dominates this unreachable block. If we change the IDoms of the exits to some peeled loop block, we don't update the dominators of the unreachable block. Currently we just don't get to the peeling logic, saying that we can't peel such loops. With this NFC we just insert edges from cloned exiting blocks to their exits after peeling each iteration (we accumulate the insertion updates and then after peeling apply the updates to DT). This patch was a part of D110922. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D111611 Reviewed By: mkazantsev	2021-10-18 10:23:05 +07:00
Stephen Tozer	f5ed223b0f	[DebugInfo] Limit the size of DIExpressions that we will salvage up to Fixes: https://bugs.llvm.org/show_bug.cgi?id=51841 This patch places an arbitrary limit on the size of DIExpressions that we will produce via salvaging, for performance reasons. This helps to fix a performance issue observed in the bug above, in which debug values would be salvaged hundreds of times, producing expressions with over 1000 elements and causing the compiler to hang. Limiting the size of debug values that we will produce to 128 largely fixes this issue. Reviewed By: dblaikie, jmorse Differential Revision: https://reviews.llvm.org/D110332	2021-10-15 17:34:21 +01:00
Kazu Hirata	81e9c90686	[llvm] Use llvm::is_contained (NFC)	2021-10-14 22:44:09 -07:00
Nikita Popov	69853f9920	[IVUsers] Move preheader check into SCEVExpander Rather than checking for loop nest preheaders upfront in IVUsers, move this requirement into isSafeToExpand() from SCEVExpander. Historically, LSR did not check whether SCEVs are safe to expand and fully relied on IVUsers to validate this. Later, support for non-expandable SCEVs was added via rigid formulas. Checking this in isSafeToExpand() makes it more obvious what exactly this check is guarding against, and avoids the awkward loop nest scan. This is a followup to https://reviews.llvm.org/D111493#3055286. Differential Revision: https://reviews.llvm.org/D111681	2021-10-14 21:52:31 +02:00
Arthur Eubanks	3628bb7436	Make various assume bundle data structures use uint64_t Following D110451, we need to make sure to support 64 bit values.	2021-10-13 10:38:41 -07:00
Hongtao Yu	098a0d8fbc	[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3. This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation. Not exactly like DbgIntrinsics, PseudoProbe intrinsic has other attributes (such as mayread, maywrite, mayhaveSideEffect) that can block optimizations. The issues fixed are: - Flipped default param of getFirstNonPHIOrDbg API to skip pseudo probes - Unblocked CSE by avoiding pseudo probe from clobbering memory SSA - Unblocked induction variable simpliciation - Allow empty loop deletion by treating probe intrinsic isDroppable - Some refactoring. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D110847	2021-10-12 09:44:12 -07:00
Florian Hahn	40d85f16c4	[LoopPeel] Use any_of & contains instead of for & find. Using contains was suggested in D108114, but I forgot to include it when landing the patch.	2021-10-12 12:18:01 +01:00
Florian Hahn	cd0ba9dc58	[LoopPeel] Peel if it turns invariant loads dereferenceable. This patch adds a new cost heuristic that allows peeling a single iteration off read-only loops, if the loop contains a load that 1. is feeding an exit condition, 2. dominates the latch, 3. is not already known to be dereferenceable, 4. and has a loop invariant address. If all non-latch exits are terminated with unreachable, such loads in the loop are guaranteed to be dereferenceable after peeling, enabling hoisting/CSE'ing them. This enables vectorization of loops with certain runtime-checks, like multiple calls to `std::vector::at` if the vector is passed as pointer. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D108114	2021-10-12 11:42:28 +01:00
David Sherwood	26b7d9d622	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-11 09:41:38 +01:00
Nikita Popov	ea12adc169	[CanonicalizeFreeze] Drop IVUsers.h include (NFC) Looking for users of IVUsers, this was a false positive. Only LSR uses IVUsers.	2021-10-09 17:01:26 +02:00
Arthur Eubanks	9405217999	Revert "Recommit "[LoopPeel] Peel loops with deoptimizing exits"" This reverts commit `d68b59f3eb`. This is causing crashes, see D110922 for details.	2021-10-08 10:53:23 -07:00
Philip Reames	d694dd0f0d	Add iterator range variants of isGuaranteedToTransferExecutionToSuccessor [mostly-nfc] This factors out utilities for scanning a bounded block of instructions since we have this code repeated in a bunch of places. The change to InlineFunction isn't strictly NFC as the limit mechanism there didn't handle debug instructions correctly.	2021-10-08 09:50:10 -07:00
Max Kazantsev	d68b59f3eb	Recommit "[LoopPeel] Peel loops with deoptimizing exits" Removed obsolete DT verification that should not be there because the strategy of DT updates has changed. Differential Revision: https://reviews.llvm.org/D110922	2021-10-08 17:54:27 +07:00
Max Kazantsev	48a5a2d1af	Revert "[LoopPeel] Peel loops with deoptimizing exits" This reverts commit `8a959625c4`. Reported failures with LLVM_ENABLE_EXPENSIVE_CHECKS, need to investigate.	2021-10-08 16:07:59 +07:00
Max Kazantsev	8a959625c4	[LoopPeel] Peel loops with deoptimizing exits Added support for peeling loops with "deoptimizing" exits - such exits that it or any of its children (or any of their children, etc) either has a @llvm.experimental.deoptimize call prior to the terminating return instruction of this basic block or is terminated with unreachable. All blocks in the the sequence must have a single successor, maybe except for the last one. Previously we only checked the exit block for being deoptimizing. Now we check if the last reachable block from the exit is deoptimizing. Patch by Dmitry Makogon! Differential Revision: https://reviews.llvm.org/D110922 Reviewed By: mkazantsev	2021-10-08 10:32:13 +07:00
Bjorn Pettersson	7f93bb4a58	[LoopRotate] Forget SCEV values in RewriteUsesOfClonedInstructions This patch fixes problems reported in PR51981. When rotating a loop it isn't enough to just forget SCEV for that loop nest. When rotating we might clone some instructions from the old header into the preheader, and insert new PHI nodes to merge values together. There could be users of the original value that are updated to use the PHI result. And those users were not necessarily depending on a PHI node earlier, so they weren't cleaned up when just forgetting all SCEV:s for the loop nest. So we need to explicitly forget those values to avoid invalid cached SCEV expressions. Reviewed By: fhahn, mkazantsev Differential Revision: https://reviews.llvm.org/D110813	2021-10-07 19:36:30 +02:00
Itay Bookstein	40ec1c0f16	[IR][NFC] Rename getBaseObject to getAliaseeObject To better reflect the meaning of the now-disambiguated {GlobalValue, GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction (D109792), the function is renamed to getAliaseeObject.	2021-10-06 19:33:10 -07:00
Arthur Eubanks	72dddce652	More size_t -> uint64_t fixes after `05392466` Fixes some bots where the two differ.	2021-10-06 15:13:47 -07:00
Arthur Eubanks	ab7d421869	size_t -> uint64_t after `05392466` Fixes some bots where the two differ.	2021-10-06 13:52:04 -07:00
Arthur Eubanks	569346f274	Revert "Reland [IR] Increase max alignment to 4GB" This reverts commit `8d64314ffe`.	2021-10-06 11:38:11 -07:00
Arthur Eubanks	1b76312e98	Update some types after D110451 To fix mismatched size_t vs uint64_t on some platforms.	2021-10-06 11:27:48 -07:00
Simon Pilgrim	0dcd2b40e6	[TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument. This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known. Differential Revision: https://reviews.llvm.org/D111024	2021-10-06 15:40:35 +01:00
Simon Pilgrim	21661607ca	[llvm] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine) As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.	2021-10-06 12:04:30 +01:00
kpyzhov	7e390dfea7	[AMDGPU] Correction to `095c48fdf3`. Differential Revision: https://reviews.llvm.org/D110337	2021-10-05 21:47:25 -04:00
kpyzhov	095c48fdf3	[AMDGPU] Use "hostcall" module flag instead of searching for ockl_hostcall_internal() declaration. The current way to detect hostcalls by looking for "ockl_hostcall_internal()" function in the module seems to be not reliable enough. The LTO may rename the "ockl_hostcall_internal()" function when an application is compiled with "-fgpu-rdc", and MetadataStreamer pass to fail to detect hostcalls, therefore it does not set the "hidden_hostcall_buffer" kernel argument. This change adds a new module flag: hostcall that can be used to detect whether GPU functions use host calls for printf. Differential revision: https://reviews.llvm.org/D110337	2021-10-05 09:56:04 -04:00
Sjoerd Meijer	cdfc678572	[SCCPSolver] Fix use-after-free in markArgInFuncSpecialization In SCCPSolver::markArgInFuncSpecialization, the ValueState map may be reallocated after the initial ValueLatticeElement reference is grabbed, but before its use in copy initialization. This causes a use-after-free. To fix this, this commit changes the behavior to create the new ValueLatticeElement before assigning the old one to it. Patch by: https://github.com/duck-37/ Differential Revision: https://reviews.llvm.org/D111112	2021-10-05 12:56:32 +01:00
Bjorn Pettersson	7f84fa4ad4	[TargetLibraryInfo] Refactor size_t checks in isValidProtoForLibFunc. NFC In TargetLibraryInfoImpl::isValidProtoForLibFunc we no longer need the IsSizeTTy lambda function and the SizeTTy object. Instead we just follow the regular structure of checking for integer types given an exepected number of bits.	2021-10-04 15:46:39 +02:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Dávid Bolvanský	5f2f611880	Fixed more warnings in LLVM produced by -Wbitwise-instead-of-logical	2021-10-03 13:58:10 +02:00
Arthur Eubanks	a7b4ce9cfd	[NFC][AttributeList] Replace index_begin/end with an iterator We expose the fact that we rely on unsigned wrapping to iterate through all indexes. This can be confusing. Rather, keeping it as an implementation detail through an iterator is less confusing and is less code. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D110885	2021-10-01 10:17:41 -07:00
Kazu Hirata	4f0225f6d2	[Transforms] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-01 09:57:40 -07:00
Krasimir Georgiev	685f1bfd0a	Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns" It appears to cause stage2 clang build failures, e.g., https://lab.llvm.org/buildbot/#/builders/74/builds/7145. This reverts commit `1fb37334bd`.	2021-10-01 11:39:43 +02:00
David Sherwood	1fb37334bd	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-01 08:41:03 +01:00
Kazu Hirata	f631173d80	[llvm] Migrate from arg_operands to args (NFC) Note that arg_operands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-09-30 08:51:21 -07:00
Anna Thomas	452714f8f8	[BPI] Keep BPI available in loop passes through LoopStandardAnalysisResults This is analogous to D86156 (which preserves "lossy" BFI in loop passes). Lossy means that the analysis preserved may not be up to date with regards to new blocks that are added in loop passes, but BPI will not contain stale pointers to basic blocks that are deleted by the loop passes. This is achieved through BasicBlockCallbackVH in BPI, which calls eraseBlock that updates the data structures in BPI whenever a basic block is deleted. This patch does not have any changes in the upstream pipeline, since none of the loop passes in the pipeline use BPI currently. However, since BPI wasn't previously preserved in loop passes, the loop predication pass was invoking BPI on the entire function every time it ran in an LPM. This caused massive compile time in our downstream LPM invocation which contained loop predication. See updated test with an invocation of a loop-pipeline containing loop predication and -debug-pass turned ON. Reviewed-By: asbirlea, modimo Differential Revision: https://reviews.llvm.org/D110438	2021-09-30 10:27:05 -04:00
Djordje Todorovic	f8dfc35256	NFC: [Debugify] Fix a typo when checking variables in the original mode	2021-09-29 04:35:10 -07:00
Adrian Prantl	1b998a5f0c	Add salvageDebugInfo support for truncating/extending ptr/int conversions. This patch enables debug info salvaging for truncating/extending ptr int conversions. The testcase uncovered a bug in adce, which is addressed separately. rdar://80227769 Differential Revision: https://reviews.llvm.org/D110461	2021-09-28 10:24:50 -07:00
Congzhe Cao	c42772752a	[CodeMoverUtils] Enhance isSafeToMoveBefore() when control flow equivalence is satisfied With improved analysis in determining CFG equivalence that does not require strict dominance and post-dominance conditions, we now relax isSafeToMoveBefore() such that an instruction I can be moved before InsertPoint even if they do not strictly dominate each other, as long as they follow the same control flow path. For example, we can move Instruction 0 before Instruction 1, and vice versa. ``` if (cond1) // Instruction 0: %add = add i32 1, 2 if (cond1) // Instruction 1: %add2 = add i32 2, 1 ``` Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110456	2021-09-27 18:37:36 -04:00
Jun Ma	3a998c06a8	Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""" This reverts commit `8ba2adcf9e`.	2021-09-27 20:39:05 +08:00
Congzhe Cao	751be2a064	[CodeMoverUtils] Enhance isSafeToMoveBefore() when moving BBs When moving an entire basic block BB before InsertPoint, currently we check for all instructions whether the operands dominates InsertPoint, however, this can be improved such that even an operand does not dominate InsertPoint, as long as it appears as a previous instruction in the same BB, it is safe to move. Reviewed By: Whitney Differential Revision: https://reviews.llvm.org/D110378	2021-09-24 05:48:15 -04:00
Fangrui Song	1a6e1ee42a	Resolve {GlobalValue,GloalIndirectSymol}::getBaseObject confusion While both GlobalAlias and GlobalIFunc are GlobalIndirectSymbol, their `getIndirectSymbol()` usage is quite different (GlobalIFunc's resolver is an entity different from GlobalIFunc itself). As discussed on https://lists.llvm.org/pipermail/llvm-dev/2020-September/144904.html ("[IR] Modelling of GlobalIFunc"), the name `getBaseObject` is confusing when used with GlobalIFunc. To resolve the confusion: * Move GloalIndirectSymol::getBaseObject to GlobalAlias:: (GlobalIFunc should use `getResolver` instead) * Change GlobalValue::getBaseObject not to inspect GlobalIFunc. Note: the function has 7 references. * Add GlobalIFunc::getResolverFunction to peel off potential ConstantExpr indirection (`strlen` in `test/LTO/Resolution/X86/ifunc.ll`) Note: GlobalIFunc::getResolver (like GlobalAlias::getAliasee which does not peel off ConstantExpr indirection) is kept to be used by ValueEnumerator. Reviewed By: ibookstein Differential Revision: https://reviews.llvm.org/D109792	2021-09-23 09:23:35 -07:00
Simon Pilgrim	5f2c53bdf4	Pass some DataLayout arguments by const-ref Avoid unnecessary copies, reported by MSVC static analyzer.	2021-09-23 15:50:31 +01:00
Bjorn Pettersson	85a586501b	[BasicBlockUtils] Fixup of an assumed typo in MergeBlockIntoPredecessor The NFC commit `e5692a564a` changed the logic for DomTreeUpdates to use the range [succ_begin, succ_begin) when looking for SuccsOfPredBB rather than using [succ_begin, succ_end). As the commit was NFC this is identified as a typo (it has been discussed briefly in phabricator). The typo was found when inspecting the code, so I've got no idea if changing back to the old range has any significant impact (such as solving any PR:s or causing some new problems). But at least this restores the code to the originally indented behavior.	2021-09-23 13:03:26 +02:00
Arthur Eubanks	e7249e4acf	[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest When determining whether to fold branches to a common destination by merging two blocks, SimplifyCFG will count the number of instructions to be moved into the first basic block. However, there's no reason to count free instructions like bitcasts and other similar instructions. This resolves missed branch foldings with -fstrict-vtable-pointers in llvm-test-suite's lambda benchmark. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108837	2021-09-22 09:52:37 -07:00
Max Kazantsev	073b254cff	[SimplifyCFG] Redirect switch cases that lead to UB into an unreachable block When following a case of a switch instruction is guaranteed to lead to UB, we can safely break these edges and redirect those cases into a newly created unreachable block. As result, CFG will become simpler and we can remove some of Phi inputs to make further analyzes easier. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109428 Reviewed By: lebedev.ri	2021-09-21 10:45:19 +07:00
Nikita Popov	0fc624f029	[IR] Return AAMDNodes from Instruction::getMetadata() (NFC) getMetadata() currently uses a weird API where it populates a structure passed to it, and optionally merges into it. Instead, we can return the AAMDNodes and provide a separate merge() API. This makes usages more compact. Differential Revision: https://reviews.llvm.org/D109852	2021-09-16 21:06:57 +02:00
Arthur Eubanks	d49cb5b303	[SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest This makes some tests in vector-reductions-logical.ll more stable when applying D108837. The cost of branching is higher when vector ops are involved due to potential SLP transformations. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D108935	2021-09-16 10:50:36 -07:00
Kazu Hirata	24c8eaec94	[Transforms] Use make_early_inc_range (NFC)	2021-09-15 19:55:24 -07:00
Owen Anderson	68079ef0eb	Teach SimplifyCFG to fold switches into lookup tables in more cases. In particular, it couldn't handle cases where lookup table constant expressions involved bitcasts. This does not seem to come up frequently in C++, but comes up reasonably often in Rust via `#[derive(Debug)]`. Originally reported by pcwalton. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D109565	2021-09-15 22:07:08 +00:00
Anna Thomas	b6cb03e6b9	Revert use of getUniqueUndroppableUser in AssumeBundleBuilder Fix build bot failure in rG4ac4e521 caused due to assumeBundleBuilder using new API (getUniqueUndroppableUser). We now continue using the existing API for AssumeBundleBuilder (getSingleUndroppableUser). Sorry for the noise here. Tests-Run: failing testcase passes.	2021-09-15 17:45:09 -04:00
Anna Thomas	4ac4e52189	[InstCombine] Improve TryToSinkInstruction with multiple uses This patch allows sinking an instruction which can have multiple uses in a single user. We were previously over-restrictive by looking for exactly one use, rather than one user. Also, the API for retrieving undroppable user has been updated accordingly since in both usecases (Attributor and InstCombine), we seem to care about the user, rather than the use. Reviewed-By: nikic Differential Revision: https://reviews.llvm.org/D109700	2021-09-15 20:39:38 +00:00
Markus Lavin	1ac209ed76	[NPM] Added -print-pipeline-passes print params for a few passes. Added '-print-pipeline-passes' printing of parameters for those passes declared with _WITH_PARAMS macro in PassRegistry.def. Note that it only prints the parameters declared inside _WITH_PARAMS as in a few cases there appear to be additional parameters not parsable. The following passes are now covered (i.e. all of those with *_WITH_PARAMS in PassRegistry.def). LoopExtractorPass - loop-extract HWAddressSanitizerPass - hwsan EarlyCSEPass - early-cse EntryExitInstrumenterPass - ee-instrument LowerMatrixIntrinsicsPass - lower-matrix-intrinsics LoopUnrollPass - loop-unroll AddressSanitizerPass - asan MemorySanitizerPass - msan SimplifyCFGPass - simplifycfg LoopVectorizePass - loop-vectorize MergedLoadStoreMotionPass - mldst-motion GVN - gvn StackLifetimePrinterPass - print<stack-lifetime> SimpleLoopUnswitchPass - simple-loop-unswitch Differential Revision: https://reviews.llvm.org/D109310	2021-09-15 08:34:04 +02:00
Kazu Hirata	abca4c012f	[Utils] Use make_early_inc_range (NFC)	2021-09-13 08:57:23 -07:00
Johannes Doerfert	c09fbbdcfb	Reapply "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals"" This reapplies commit `7dbba3376f`, or, put differently, this reverts commit `d9a8d20827`. The test now requires the amdgpu and nvptx backend explicitly as it won't work without properly.	2021-09-10 15:22:56 -05:00
Johannes Doerfert	d9a8d20827	Revert "[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals" This reverts commit `7dbba3376f`. There seems to be a problem with the tests, investigating now: https://lab.llvm.org/buildbot/#/builders/61/builds/14574	2021-09-10 12:23:08 -05:00
Johannes Doerfert	7dbba3376f	[GlobalOpt][FIX] Do not embed initializers into AS!=0 globals Not all address spaces support initializers for globals and we can therefore not set them without checking if they are allowed. This patch adds a hook into TTI to check if an AS allows non-undef initializers. We disable it for all but address space 0 by default, NVPTX and AMDGPU targets allow all but address space 3. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109337	2021-09-10 12:08:50 -05:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Kazu Hirata	92c9ff6d5f	[IR, Transforms] Use arg_empty (NFC)	2021-09-09 08:50:10 -07:00
Roman Lebedev	909cba9699	[SimplifyCFG] performBranchToCommonDestFolding(): require block-closed SSA form for bonus instructions (PR51125) I can't seem to wrap my head around the proper fix here, we should be fine without this requirement, iff we can form this form, but the naive attempt (https://reviews.llvm.org/D106317) has failed. So just to unblock the release, put up a restriction. Fixes https://bugs.llvm.org/show_bug.cgi?id=51125	2021-09-09 12:28:09 +03:00
Jun Ma	8ba2adcf9e	Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values."" Differential Revision: https://reviews.llvm.org/D106056	2021-09-09 16:53:33 +08:00
Andrew Litteken	144cd22bae	[CodeExtractor] Creating exit stubs based off original order branch instructions. Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on the order of out-of-region blocks after splitting any phi nodes, and collecting the blocks to be outlined. This could cause differences in order if there was a difference of exit block phi nodes between the two regions. This patch moves the collection of the output target blocks to be before this occurs, so that the assignment of target block to output value will be the same, regardless of the contents of the output block. Reviewers: paquette, roelofs Differential Revision: https://reviews.llvm.org/D108657	2021-09-08 15:15:15 -07:00
Nikita Popov	6dfdc6bfd2	[SROA] Support opaque pointers Make the following changes in order to support opaque pointers in SROA: * Generate i8 GEPs for opaque pointers. * Explicitly enforce that promotable allocas only have stores of the alloca type -- previously this was implicitly enforced. * Replace a check for pointer element type with load/store type. Differential Revision: https://reviews.llvm.org/D109259	2021-09-08 22:25:44 +02:00
Akira Hatanaka	dea6f71af0	[ObjC][ARC] Use the addresses of the ARC runtime functions instead of integer 0/1 for the operand of bundle "clang.arc.attachedcall" https://reviews.llvm.org/D102996 changes the operand of bundle "clang.arc.attachedcall". This patch makes changes to llvm that are needed to handle the new IR. This should make it easier to understand what the IR is doing and also simplify some of the passes as they no longer have to translate the integer values to the runtime functions. Differential Revision: https://reviews.llvm.org/D103000	2021-09-08 11:58:03 -07:00
Max Kazantsev	29d054bf12	[SimplifyCFG] Preserve knowledge about guarding condition by adding assume This improvement adds "assume" after removal of branch basing on UB in successor block. Consider the following example: ``` pred: x = ... cond = x > 10 br cond, bb, other.succ bb: phi [nullptr, pred], ... // other possible preds load(phi) // UB if we came from pred other.succ: // here we know that x <= 10, but this knowledge is lost // after the branch is turned to unconditional unless we // preserve it with assume. ``` If we remove the branch basing on knowledge about UB in a successor block, then the fact that x <= 10 is other.succ might be lost if this condition is not inferrable from any dominating condition. To preserve this knowledge, we can add assume intrinsic with (possibly inverted) branch condition. Patch by Dmitry Bakunevich! Differential Revision: https://reviews.llvm.org/D109054 Reviewed By: lebedev.ri	2021-09-08 14:05:17 +07:00
Andy Kaylor	34528c32d2	Copy Elementtype Attribute to IR at Link step Copying IR during linking causes a type mismatch due to the field being missing in IRMover/Valuemapper. Adds the full range of typed attributes including elementtype attribute in the copy functions. Patch by Chenyang Liu Differential Revision: https://reviews.llvm.org/D108796	2021-09-07 11:41:43 -07:00
Kazu Hirata	5648f7170e	[Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC)	2021-09-07 09:19:33 -07:00
Dávid Bolvanský	9c476172b9	[InstCombine] stpcpy(d,s) -> strcpy(d,s) if the result is not used	2021-09-05 12:12:07 +02:00
Philip Reames	fa82a3d016	[runtimeunroll] Support epilogue unrolling with a parent loop This patch adds support for unrolling inner loops using epilogue unrolling. The basic issue is that the original latch exit block of the inner loop could be outside the outer loop. When we clone the inner loop and split the latch exit, the cloned blocks need to be in the outer loop. Differential Revision: https://reviews.llvm.org/D108476	2021-09-02 16:29:20 -07:00
Philip Reames	45c672e20d	[runtimeunroll] Under EXPENSIVE_CHECKS, validate loop info Requested in review comment on D108476	2021-09-02 16:28:46 -07:00
Nikita Popov	c86e1ce73b	[SCEVExpander] Simplify pointer overflow check This is a followup to D104662 to generate slightly nicer code for pointer overflow checks. Bypass expandAddToGEP and instead explicitly generate i8 GEPs. This saves some bitcasts and negates the value in a more obvious way. In particular, this prevents SCEV from looking through the umul.with.overflow, same as in the integer case. The wrapping-pointer-ni.ll test deserves a comment: Previously, this generated a typed GEP which used the umulo argument rather than the multiplication result. This results in more compact IR in that case, but effectively does the multiplication twice, the second one is just hidden in the GEP. Reusing the umulo result seems pretty reasonable to me. Differential Revision: https://reviews.llvm.org/D109093	2021-09-02 20:15:59 +02:00
Philip Reames	c3b3aa277a	Fix a missing MemorySSA update in breakLoopBackedge This is a case I'd missed in 6a8237. The odd bit here is that missing the edge removal update seems to produce MemorySSA which verifies, but is still corrupt in a way which bothers following passes. I wasn't able to reduce a single pass test case, which is why the reported test case is taken as is. Differential Revision: https://reviews.llvm.org/D109068	2021-09-01 16:59:01 -07:00
Philip Reames	e735f2bf37	[SCEVExpander] Prefer pointer expansion for overflow checks We'd special cased this logic to use pointer types for non-integral pointers, but there's no reason we can't do that for all pointer types. Doing it this was has a few advantages: a) The code itself becomes more straight forward, and easier to test. b) We avoid introducing ptrtoint into programs which didn't have them in the source. c) The resulting codegen is easier to analyze and simplify (mostly due to lack of ptrtoint). Note that there are some test diffs, but a) running them through instcombine helps a ton, and b) there's enough missing obvious transforms on both before and after IR that it's clear this isn't performance sensitive. This is mostly motivated by cleaning up mentions of non-integrals to have a clearer idea of what we actually need to support. Differential Revision: https://reviews.llvm.org/D104662	2021-09-01 13:11:25 -07:00
Arthur Eubanks	52e6d70c40	[NFC] Use newly introduced *AtIndex methods Introduced in D108788. These are clearer.	2021-09-01 11:18:41 -07:00
Philip Reames	b604fcb7bc	[runtime] Move prolog/epilog block to a post-simplify strategy The runtime unroller will try to produce a non-loop if the unroll count is 2 and thus the prolog/epilog loop would only run at most one iteration. The old implementation did this by avoiding loop construction entirely. This patches instead constructs the trivial loop and then explicitly breaks the backedge and simplifies. This does result in some additional code churn when triggered, but a) results in better quality code and b) removes a codepath which didn't work properly for multiple exit epilogs. One oddity that I want to draw to reviewer attention is that this somehow changes revisit order. The new order looks equivalent to me, but I don't understand how creating and erasing an extra loop here creates this effect. Differential Revision: https://reviews.llvm.org/D108521	2021-08-31 09:29:36 -07:00
Doug Beck	ed6cff667e	Fix typo s/beloinging/belonging Differential Revision: https://reviews.llvm.org/D107099	2021-08-31 12:01:50 +05:30
Roman Lebedev	795d142d23	[NFCI][IndVars] rewriteLoopExitValues(): don't expand SCEV's until needed Previously, we'd expand ALL the SCEV's eagerly, because we needed to check with `isValidRewrite()`, and discard bad rewrite candidates, but now that we do not do that, we also don't need to always expand. In particular, this avoids expanding potentially-huge SCEV's that we would discard anyways because they are high-cost and we aren't rewriting aggressively.	2021-08-30 12:28:24 +03:00
Roman Lebedev	7b0d59da9a	[IndVars] Drop check for the validity of rewrite `isValidRewrite()` checks that the both the original SCEV, and the rewrite SCEV have the same base pointer. I //believe//, after all the recent SCEV improvements, this invariant is already enforced by SCEV itself. I originally tried changing it into an assert in D108043, but that showed that it triggers on e.g. https://reviews.llvm.org/D108043#2946621, where SCEV manages to forward the store to load, test added. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D108655	2021-08-30 12:06:58 +03:00
Nikita Popov	9f7873784d	[SCEVExpander] Reuse removePointerBase() for canonical addrecs ExposePointerBase() in SCEVExpander implements basically the same functionality as removePointerBase() in SCEV, so reuse it. The SCEVExpander code assumes that the pointer operand on adds is the last one -- I'm not sure that always holds. As such this might not be strictly NFC.	2021-08-29 21:12:35 +02:00
Nikita Popov	0886fd5b3a	[SCEVExpander] Remove unnecessary mul/udiv check (NFC) Pointer-typed SCEV expressions can no longer be mul or udiv, so we do not need to specially handle them here.	2021-08-29 20:47:00 +02:00
Nikita Popov	3f162e8e6d	[SCEVExpander] Assert single pointer op in add (NFC) There can only be one pointer operand in an add expression, and we have sorted operands to guarantee that it is the first. As such, the pointer check for other operands is dead code.	2021-08-29 20:30:56 +02:00
Philip Reames	6a82376012	Special case common branch patterns in breakLoopBackedge (try 2) Changes since aec08e: * Adjust placement of a closing brace so that the general case actually runs. Turns out we had no coverage of the switch case. I added one in `eae90fd`. * Drop .llvm.loop.* metadata from the new branch as there is no longer a loop to annotate. Original commit message: This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.	2021-08-27 10:27:16 -07:00
Andrew Litteken	9d2c859ebb	[CodeExtractor] Making the arguments outlined easier to access from the outside The Code Extractor does not provide an easy mechanism for determining the inputs and outputs after extraction has occurred, this patch gives the ability to pass in empty SetVectors to be filled with the inputs and outputs if they need to be analyzed. Added Tests: - InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp Reviewers: paquette Differential Revision: https://reviews.llvm.org/D106991	2021-08-26 09:47:53 -07:00
Vyacheslav Zakharin	2e192ab1f4	[CodeExtractor] Preserve topological order for the return blocks. Differential Revision: https://reviews.llvm.org/D108673	2021-08-25 08:09:01 -07:00
Florian Hahn	90d09eb300	[LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks. Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6. So far it has only been enabled for loops where all non-latch exits are 'de-optimizing' exits (D63923). But peeling of multi-exit loops can be highly beneficial in other cases too, like if all non-latch exiting blocks are unreachable. The motivating case are loops with runtime checks, like the C++ example below. The main issue preventing vectorization is that the invariant accesses to load the bounds of B is conditionally executed in the loop and cannot be hoisted out. If we peel off the first iteration, they become dereferenceable in the loop, because they must execute before the loop is executed, as all non-latch exits are terminated with unreachable. This subsequently allows hoisting the loads and runtime checks out of the loop, allowing vectorization of the loop. int sum(std::vector<int> A, std::vector<int> B, int N) { int cost = 0; for (int i = 0; i < N; ++i) cost += A->at(i) + B->at(i); return cost; } This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64. Note that this requires a follow-up improvement to the peeling cost model to actually peel iterations off loops as above. I will share that shortly. Also, peeling of multi-exits might be beneficial for exit blocks with other terminators, but I would like to keep the scope limited to known high-reward cases for now. I removed the option to disable peeling for multi-deopt exits because the code is more general now. Alternatively, the option could also be generalized, but I am not sure if there's much value in the option? Reviewed By: reames Differential Revision: https://reviews.llvm.org/D108108	2021-08-25 13:26:40 +01:00
Philip Reames	1e07f19bfc	Revert "Special case common branch patterns in breakLoopBackedge" This reverts commit `aec08e8600`. Several problems have been reported with malformed loopinfo after this change, see discussion on https://reviews.llvm.org/rGaec08e86004b.	2021-08-24 08:53:42 -07:00
Philip Reames	d8d84c9df8	[runtimeunroll] Use early return to reduce nesting [nfc]	2021-08-22 11:34:50 -07:00
Philip Reames	aec08e8600	Special case common branch patterns in breakLoopBackedge This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability. I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.	2021-08-22 10:42:23 -07:00
Alexander Potapenko	b0391dfc73	[clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute The purpose of __attribute__((disable_sanitizer_instrumentation)) is to prevent all kinds of sanitizer instrumentation applied to a certain function, Objective-C method, or global variable. The no_sanitize(...) attribute drops instrumentation checks, but may still insert code preventing false positive reports. In some cases though (e.g. when building Linux kernel with -fsanitize=kernel-memory or -fsanitize=thread) the users may want to avoid any kind of instrumentation. Differential Revision: https://reviews.llvm.org/D108029	2021-08-20 14:01:06 +02:00
Roman Lebedev	5d4f37e895	[NFCI][SimplifyCFG] Rewrite `createUnreachableSwitchDefault()` The only thing that function should do as per it's semantic, is to ensure that the switch's default is a block consisting only of an `unreachable` terminator. So let's just create such a block and update switch's default to point to it. There should be no need for all this weird dance around predecessors/successors.	2021-08-20 13:28:08 +03:00
Akira Hatanaka	898dc4590c	Refactor inlineRetainOrClaimRVCalls. NFC This is in preparation for committing https://reviews.llvm.org/D103000.	2021-08-19 14:55:45 -07:00
Arthur Eubanks	44a3241f10	[NFC] Replace some attribute methods that use confusing indexes	2021-08-19 14:10:26 -07:00
Philip Reames	17b9cb1817	[runtimeunroll] Support multiple exits to latch exit w/prolog loop This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used. This is the prolog companion to D107381. Since this was LGTMed, a problem with DT updating was reported against that patch. I roled in the analogous fix here as it seemed obvious, and not worth re-review. As an aside, our prolog form leaves a lot of potential value on the floor when there is an invariant load or invariant condition in the loop being runtime unrolled. We should probably consider a "required prolog" heuristic. (Alternatively, maybe we should be peeling these cases more aggressively?) Differential Revision: https://reviews.llvm.org/D108262	2021-08-19 11:43:52 -07:00
Philip Reames	447256f22b	[runtimeunroll] Fix reported DT verification error after `94d0914` In `94d0914`, I added support for unrolling of multiple exit loops which have multiple exits reaching the latch. Per reports on the review post commit, I'd missed updating the domtree for one case. This fix addresses that ommission. There's no new test as this is covered by existing tests with expensive verification turned on.	2021-08-19 11:06:17 -07:00
Arthur Eubanks	33d44b762e	[OpaquePtr][Inline] Use byval type instead of pointee type Reviewed By: #opaque-pointers, dblaikie Differential Revision: https://reviews.llvm.org/D105711	2021-08-19 09:56:08 -07:00
Sanjay Patel	ec54e275f5	Revert "[CVP] processSwitch: Remove default case when switch cover all possible values." This reverts commit `9934a5b2ed`. This patch may cause miscompiles because it missed a constraint as shown in the examples from: https://llvm.org/PR51531	2021-08-19 08:43:51 -04:00
Arthur Eubanks	fde0eb1f9a	[NFC] A couple more removeAttribute() cleanups	2021-08-18 11:15:20 -07:00
Arthur Eubanks	3f4d00bc3b	[NFC] More get/removeAttribute() cleanup	2021-08-17 21:05:41 -07:00
Arthur Eubanks	de0ae9e89e	[NFC] Cleanup more AttributeList::addAttribute()	2021-08-17 21:05:41 -07:00
Arthur Eubanks	ad727ab7d9	[NFC] Migrate some callers away from Function/AttributeLists methods that take an index These methods can be confusing.	2021-08-17 21:05:40 -07:00
Arthur Eubanks	46cf82532c	[NFC] Replace Function handling of attributes with less confusing calls To avoid magic constants and confusing indexes.	2021-08-17 21:05:40 -07:00
Jun Ma	9934a5b2ed	[CVP] processSwitch: Remove default case when switch cover all possible values. Differential Revision: https://reviews.llvm.org/D106056	2021-08-18 10:23:13 +08:00
Philip Reames	94d0914292	[runtimeunroll] Support multiple exits to latch exit w/epilogue loop This patch extends the runtime unrolling infrastructure to support unrolling a loop with multiple exiting blocks branching to the same exit block used by the latch. It intentionally does not include a cost model change to enable this functionality unless appropriate force flags are used. I decided to restrict this to the epilogue case. Given the changes ended up being pretty generic, we may be able to unblock the prolog case too, but I want to do that in a separate change to reduce the amount of code we all have to understand at one time. Differential Revision: https://reviews.llvm.org/D107381	2021-08-17 17:52:04 -07:00
Philip Reames	982da7a20c	[SCEVExpander] Stop hoisting IR when reusing phis his is a fix for PR43678, and is an alternate patch to D105723. The basic issue we're running into is that LSR + SCEVExpander are moving the very instruction whose operand we're in the process of expanding. This breaks the subtle and ill-documented invariant which let LSR work. (Full story can be found here: https://reviews.llvm.org/D105723#2878473) Rather than attempting a fix, this change just removes the optimization entirely. The code is entirely untested, and removing it appears to have no impact I can find. This code was added back in 2014 by `1e12f8563d` with a single test which does not seem to actually test the hoisting logic. From a philosophical standpoint, it also seems very strange to have the expander implementing optimizations which should live in a dedicated transform pass. Differential Revision: https://reviews.llvm.org/D106178	2021-08-17 09:38:32 -07:00
Arthur Eubanks	0d822da2bd	[NFC] Remove/replace some confusing attribute getters on Function	2021-08-16 16:12:37 -07:00
Nikita Popov	735a590471	[MemorySSA] Remove -enable-mssa-loop-dependency option This option has been enabled by default for quite a while now. The practical impact of removing the option is that MSSA use cannot be disabled in default pipelines (both LPM and NPM) and in manual LPM invocations. NPM can still choose to enable/disable MSSA using loop vs loop-mssa. The next step will be to require MSSA for LICM and drop the AST-based implementation entirely. Differential Revision: https://reviews.llvm.org/D108075	2021-08-16 20:59:37 +02:00
Nikita Popov	570c9beb8e	[MemorySSA] Remove unnecessary MSSA dependencies LoopLoadElimination, LoopVersioning and LoopVectorize currently fetch MemorySSA when construction LoopAccessAnalysis. However, LoopAccessAnalysis does not actually use MemorySSA and we can pass nullptr instead. This saves one MemorySSA calculation in the default pipeline, and thus improves compile-time. Differential Revision: https://reviews.llvm.org/D108074	2021-08-16 20:40:55 +02:00
Roman Lebedev	febcedf18c	Revert "[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer" https://bugs.llvm.org/show_bug.cgi?id=51490 was filed. This reverts commit `35a8bdc775`.	2021-08-16 14:30:29 +03:00
David Sherwood	9b19b77883	[NFC] Remove unused code in llvm::createSimpleTargetReduction	2021-08-16 09:50:45 +01:00
Roman Lebedev	2eb554a9fe	Revert "Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" This is still wrong, as failing bots suggest. This reverts commit `3d9beefc7d`.	2021-08-16 11:07:42 +03:00
Roman Lebedev	3d9beefc7d	Reland [SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) ... with test change this time. LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR, and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What i believe we need to do, is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses. ``` $ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll ---------------------------------------- @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: br label %L %L: %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 br i1 %iszero, label %exit, label %L2 %L2: store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp eq i32 %ld, 4294967295 br i1 %cmp, label %L, label %exit %exit: %r = phi i32 [ %ld, %L2 ], [ %ld, %L ] ret i32 %r } => @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: %ld.old = load i32, * @global_pr51125, align 4 %iszero.old = icmp eq i32 %ld.old, 0 br i1 %iszero.old, label %exit, label %L2 %L2: %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ] store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp ne i32 %ld2, 4294967295 %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 %or.cond = select i1 %cmp, i1 1, i1 %iszero br i1 %or.cond, label %exit, label %L2 %exit: %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ] %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ] ret i32 %r } Transformation seems to be correct! ``` Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:16:04 +03:00
Roman Lebedev	60dd0121c9	Revert "[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125)" Forgot to stage the test change. This reverts commit `78af5cb213`.	2021-08-15 19:15:09 +03:00
Roman Lebedev	78af5cb213	[SimplifyCFG] performBranchToCommonDestFolding(): form block-closed SSA form before cloning instructions (PR51125) LLVM IR SSA form is "implicit" in `@pr51125`. While is a valid LLVM IR, and does not require any PHI nodes, that completely breaks the further logic in `CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses()` that updates the live-out uses of the bonus instructions. What i believe we need to do, is to first make the SSA form explicit, by inserting tautological PHI nodes, and rewriting the offending uses. ``` $ /builddirs/llvm-project/build-Clang12/bin/opt -load /repositories/alive2/build-Clang-release/tv/tv.so -load-pass-plugin /repositories/alive2/build-Clang-release/tv/tv.so -tv -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -bonus-inst-threshold=10 -tv -o /dev/null /tmp/test.ll ---------------------------------------- @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: br label %L %L: %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 br i1 %iszero, label %exit, label %L2 %L2: store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp eq i32 %ld, 4294967295 br i1 %cmp, label %L, label %exit %exit: %r = phi i32 [ %ld, %L2 ], [ %ld, %L ] ret i32 %r } => @global_pr51125 = global 4 bytes, align 4 define i32 @pr51125() { %entry: %ld.old = load i32, * @global_pr51125, align 4 %iszero.old = icmp eq i32 %ld.old, 0 br i1 %iszero.old, label %exit, label %L2 %L2: %ld2 = phi i32 [ %ld.old, %entry ], [ %ld, %L2 ] store i32 4294967295, * @global_pr51125, align 4 %cmp = icmp ne i32 %ld2, 4294967295 %ld = load i32, * @global_pr51125, align 4 %iszero = icmp eq i32 %ld, 0 %or.cond = select i1 %cmp, i1 1, i1 %iszero br i1 %or.cond, label %exit, label %L2 %exit: %ld1 = phi i32 [ poison, %L2 ], [ %ld.old, %entry ] %r = phi i32 [ %ld2, %L2 ], [ %ld.old, %entry ] ret i32 %r } Transformation seems to be correct! ``` Fixes https://bugs.llvm.org/show_bug.cgi?id=51125 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D106317	2021-08-15 19:02:34 +03:00
Roman Lebedev	35a8bdc775	[NFCI][IndVars] rewriteLoopExitValues(): nowadays SCEV should not change `GEP` base pointer Currently/previously, while SCEV guaranteed that it produces the same value, the way it was produced may be illegal IR, so we have an ugly check that the replacement is valid. But now that the SCEV strictness wrt the pointer/integer types has been improved, i believe this invariant is already upheld by the SCEV itself, natively. I think we should add an assertion, wait for a week, and then, if all is good, rip out all this checking. Or we could just do the latter directly i guess. This reverts commit rL127839. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D108043	2021-08-15 18:59:32 +03:00
Arthur Eubanks	c19d7f8af0	[CallPromotion] Check for inalloca/byval mismatch Previously we would allow promotion even if the byval/inalloca attributes on the call and the callee didn't match. It's ok if the byval/inalloca types aren't the same. For example, LTO importing may rename types. Fixes PR51397. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D107998	2021-08-13 16:52:04 -07:00
Arthur Eubanks	a9831cce1e	[NFC] Remove public uses of AttributeList::getAttributes() Use methods that better convey the intent.	2021-08-13 11:38:12 -07:00
Arthur Eubanks	80ea2bb574	[NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes() This is more consistent with similar methods.	2021-08-13 11:16:52 -07:00

... 2 3 4 5 6 ...

6198 Commits