llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	fc4455891c	[VectorCombine] refactor matching code to reduce duplication; NFC cmp/binop were already diverging even though they are largely the same logic.	2020-02-21 12:06:51 -05:00
Florian Hahn	134bab7cd5	[DSE,MSSA] Add debug counter. Can be used like -debug-counter=dse-memoryssa-skip=10,dse-memoryssa-counter-count=20 Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D72147	2020-02-21 17:04:37 +00:00
Bill Wendling	2fe457690d	Filter callbr insts from critical edge splitting Similarly to how splitting predecessors with an indirectbr isn't handled in the generic way, we also shouldn't split callbrs, for similar reasons.	2020-02-20 16:24:42 -08:00
Florian Hahn	99809f98d7	[SCCP] Do not mark unknown loads as overdefined. For tracked globals that are unknown after solving, we expect all non-store uses to be replaced. This is a follow-up to `f8045b250d`, which removed forcedconstant. We should not mark unknown loads as overdefined, as they either load from an unknown pointer or an undef global. Restore the original logic for loads.	2020-02-20 22:48:58 +01:00
dfukalov	dbfc682e2b	SpeculativeExecution: fixed ingoring free execution Summary: After updating cost model in AMDGPU target (`47a5c36b37`) the pass started to ignore some BBs since they got all instructions estimated as free. Reviewers: arsenm, chandlerc, nhaehnle Reviewed By: nhaehnle Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74825	2020-02-20 14:45:02 +03:00
Johannes Doerfert	d95cb56649	[Attributor] Make sure abstract attributes are properly initialized	2020-02-20 02:46:40 -06:00
Johannes Doerfert	6185fb13d6	[Attributor][NFC] Refactor interface	2020-02-20 02:46:40 -06:00
Johannes Doerfert	8e76fec0ae	[Attributor][NFC] Improve the debug output & add a TODO	2020-02-19 23:46:08 -06:00
Johannes Doerfert	f8ad735729	[Attributor] Use existing `returned` information better We can look through calls with `returned` argument attributes when we collect subsuming positions. This allows us to get existing attributes from more places.	2020-02-19 23:46:07 -06:00
Johannes Doerfert	a801ee869d	[Attributor][FIX] Avoid setting wrong load/store alignments	2020-02-19 23:46:07 -06:00
Johannes Doerfert	e1eed6c5b9	[Attributor] Generalize `getAssumedConstantInt` interface We are often interested in an assumed constant and sometimes it has to be an integer constant. Before we only looked for the latter, now we can ask for either.	2020-02-19 22:33:51 -06:00
Johannes Doerfert	16188f9d70	[Attributor][FIX] Do not create new calls edge we cannot handle If we propagate function pointers across function boundaries we can create new call edges. These need to be represented in the CG if we run as a CGSCC pass. In the new pass manager that is currently not handled by the CallGraphUpdater so we need to prevent the situation for now.	2020-02-19 22:33:51 -06:00
Johannes Doerfert	1e99fc9d58	[Attributor] Add initial AAIsDead for arguments We usually will ask for liveness of an argument anyway so we ended up lazily creating the attribute anyway. However, that is not always the case and even if it is we should go the eager route here. Various tests show how this can improve the outcome. One test exposed a problem with type mismatches between argument and call site argument, a fix is included. For liveness various more tests were added as well.	2020-02-19 21:39:45 -06:00
Johannes Doerfert	c6ac717aa7	[Attributor] Allow multiple uses of a casted function pointer If a function pointer is casted into a different type the resulting expression can be a constant. If so, it can be used multiple times which cannot be handled by the AbstractCallSite constructor alone. Instead, we follow the cast expression uses now explicitly during the call site traversal.	2020-02-19 20:43:38 -06:00
Michael Kruse	e4d20ec8ad	[IndVarSimply] Fix assert/release build difference. In builds with assertions enabled (!NDEBUG), IndVarSimplify does an additional query to ScalarEvolution which may change future SCEV queries since it fills the internal cache differently. The result is actually only used with the -verify-indvars command line option. We fix the issue by only calling SE->getBackedgeTakenCount(L) if -verify-indvars is enabled such that only -verify-indvars shows the behavior, but not debug builds themselves. Also add a remark to the description of -verify-indvars about this behavior. Fixes llvm.org/PR44815 Differential Revision: https://reviews.llvm.org/D74810	2020-02-19 14:36:22 -06:00
Florian Hahn	c7fc0e5da6	Revert "[PatternMatch] Match XOR variant of unsigned-add overflow check." This reverts commit `e01a3d49c2`. and commit `a6a585b803`. This causes a failure on GreenDragon: http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597	2020-02-19 19:37:08 +01:00
Florian Hahn	e01a3d49c2	[PatternMatch] Match XOR variant of unsigned-add overflow check. Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match the expected pattern in CodeGenPerpare via UAddWithOverflow. This causes a regression over Clang 7 on both X86 and AArch64: https://gcc.godbolt.org/z/juhXYV This patch extends UAddWithOverflow to also catch the XOR case, if the XOR is only used in the ICMP. This covers just a single case, but I'd like to make sure I am not missing anything before tackling the other cases. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: nikic, lebedev.ri Differential Revision: https://reviews.llvm.org/D74228	2020-02-19 15:25:18 +01:00
Brian Gesiak	5a187d8ed1	[Coroutines][4/6] New pass manager: coro-cleanup Summary: Depends on https://reviews.llvm.org/D71900. The fourth in a series of patches that ports the LLVM coroutines passes to the new pass manager infrastructure. This patch implements 'coro-cleanup'. No existing regression tests check the behavior of coro-cleanup on its own, so this patch adds one. (A test named 'coro-cleanup.ll' exists, but it relies on the entire coroutines pipeline being run. It's updated to test the new pass manager in the 5th patch of this series.) Reviewers: GorNishanov, lewissbaker, chandlerc, junparser, deadalnix, wenlei Reviewed By: wenlei Subscribers: wenlei, EricWF, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71901	2020-02-19 00:30:27 -05:00
Brian Gesiak	2365238b9d	Re-land new pass manager coro-split and coro-elide This re-applies patches https://reviews.llvm.org/D71899 and https://reviews.llvm.org/D71900, which were reverted in https://reviews.llvm.org/rG11053a1cc61 and https://reviews.llvm.org/rGe999aa38d16. The underlying problem that caused two buildbots to fail with these patches is explained in https://reviews.llvm.org/rG26f356350bd -- older compliers disagree with the order in which the left- and right-hand side of an assignment in LazyCallGraph ought to be evaluated, which caused an assertion in SmallVector::operator[] to fire when the test suite was run.	2020-02-19 00:11:23 -05:00
Reid Kleckner	0c2b09a9b6	[IR] Lazily number instructions for local dominance queries Essentially, fold OrderedBasicBlock into BasicBlock, and make it auto-invalidate the instruction ordering when new instructions are added. Notably, we don't need to invalidate it when removing instructions, which is helpful when a pass mostly delete dead instructions rather than transforming them. The downside is that Instruction grows from 56 bytes to 64 bytes. The resulting LLVM code is substantially simpler and automatically handles invalidation, which makes me think that this is the right speed and size tradeoff. The important change is in SymbolTableTraitsImpl.h, where the numbering is invalidated. Everything else should be straightforward. We probably want to implement a fancier re-numbering scheme so that local updates don't invalidate the ordering, but I plan for that to be future work, maybe for someone else. Reviewed By: lattner, vsk, fhahn, dexonsmith Differential Revision: https://reviews.llvm.org/D51664	2020-02-18 14:44:24 -08:00
Fangrui Song	13a97305ba	[JumpThreading] Skip unconditional PredBB when threading jumps through two basic blocks Fixes https://bugs.llvm.org/show_bug.cgi?id=44922 (caused by `4698bf145d`) ThreadThroughTwoBasicBlocks assumes PredBBBranch is conditional. The following code can segfault. AddPHINodeEntriesForMappedBlock(PredBBBranch->getSuccessor(1), PredBB, NewBB, ValueMapping); We can also allow unconditional PredBB, but the produced code is not better. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D74747	2020-02-18 11:01:46 -08:00
Tyker	c9e93c84f6	Add Query API for llvm.assume holding attributes Reviewers: jdoerfert, sstefan1, uenoku Reviewed By: jdoerfert Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72885	2020-02-18 19:42:07 +01:00
Huihui Zhang	8ee0e1dc02	[NFC] Silence compiler warning [-Wmissing-braces].	2020-02-18 10:37:12 -08:00
Florian Hahn	e32522ca17	[SLPVectorizer] Do not assume extracelement idx is a ConstantInt. The index of an ExtractElementInst is not guaranteed to be a ConstantInt. It can be any integer value. Check explicitly for ConstantInts. The new test cases illustrate scenarios where we crash without this patch. I've also added another test case to check the matching of extractelement vector ops works. Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D74758	2020-02-18 18:16:06 +01:00
Nikita Popov	ec6c623ff9	[SimplifyLibCalls] Accept IRBuilderBase; NFC	2020-02-18 17:59:07 +01:00
Nikita Popov	28ffe38bba	[LoopUtils] Accept IRBuilderBase; NFC	2020-02-18 17:58:46 +01:00
Nikita Popov	ed6d30b517	[BuildLibCalls] Accept IRBuilderBase; NFC Accept IRBuilderBase instead of IRBuilder<>. Remove dependency on IRBuilder from header.	2020-02-18 17:58:16 +01:00
Nikita Popov	1ab37fad61	[InstCombine] Fix worklist management when simplifying demanded bits When simplifying demanded bits, we currently only report the instruction on which SimplifyDemandedBits was called as changed. However, this is a recursive call, and the actually modified instruction will usually be further up the chain. Additionally, all the intermediate instructions should also be revisited, as additional combines may be possible after the demanded bits simplification. We fix this by explicitly adding them back to the worklist. Differential Revision: https://reviews.llvm.org/D72944	2020-02-18 17:55:40 +01:00
Nikita Popov	c9540fe59b	[InstCombine] Fix multi-use handling in cttz transform The select-of-cttz transform can currently duplicate cttz intrinsics and zext/trunc ops. The cause is that it unnecessarily duplicates the intrinsic and the zext/trunc when setting the "undef_on_zero" flag to false. However, it's always legal to set the flag from true to false, so we can make this replacement even if there are extra users. Differential Revision: https://reviews.llvm.org/D74685	2020-02-18 17:55:00 +01:00
Nikita Popov	9adedd146d	[InstCombine] Relax preconditions for ashr+and+icmp fold (PR44754) Fix for https://bugs.llvm.org/show_bug.cgi?id=44754. We already have a fold that converts icmp (and (ashr X, C3), C2), C1 into icmp (and C2'), C1', but it imposed overly strict requirements on the transform. Relax this by checking that both C2 and C1 don't shift out bits (in a signed sense) when forming the new constants. Alive proofs (https://rise4fun.com/Alive/PTz0): Name: ashr_legal Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) == C1 %a = ashr i16 %x, C3 %b = and i16 %a, C2 %c = icmp i16 %b, C1 => %d = and i16 %x, C2 << C3 %c = icmp i16 %d, C1 << C3 Name: ashr_shiftout_eq Pre: ((C2 << C3) >> C3) == C2 && ((C1 << C3) >> C3) != C1 %a = ashr i16 %x, C3 %b = and i16 %a, C2 %c = icmp eq i16 %b, C1 => %c = false Note that >> corresponds to ashr here. The case of an equality comparison has some special handling in this transform, because it will form to a true/false result if the condition on the comparison constant it violated. Differential Revision: https://reviews.llvm.org/D74294	2020-02-18 17:49:46 +01:00
Florian Hahn	9063022573	[InstCombin] Avoid nested Create calls, to guarantee order. The original code allowed creating the != checks in unpredictable order, causing http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/34014 to fail.	2020-02-18 09:44:11 +01:00
Florian Hahn	6c85e92bcf	[InstCombine] Simplify a umul overflow check to a != 0 && b != 0. This patch adds a simplification if an OR weakens the overflow condition for umul.with.overflow by treating any non-zero result as overflow. In that case, we overflow if both umul.with.overflow operands are != 0, as in that case the result can only be 0, iff the multiplication overflows. Code like this is generated by code using __builtin_mul_overflow with negative integer constants, e.g. bool test(unsigned long long v, unsigned long long *res) { return __builtin_mul_overflow(v, -4775807LL, res); } ``` ---------------------------------------- Name: D74141 %res = umul_overflow {i8, i1} %a, %b %mul = extractvalue {i8, i1} %res, 0 %overflow = extractvalue {i8, i1} %res, 1 %cmp = icmp ne %mul, 0 %ret = or i1 %overflow, %cmp ret i1 %ret => %t0 = icmp ne i8 %a, 0 %t1 = icmp ne i8 %b, 0 %ret = and i1 %t0, %t1 ret i1 %ret %res = umul_overflow {i8, i1} %a, %b %mul = extractvalue {i8, i1} %res, 0 %cmp = icmp ne %mul, 0 %overflow = extractvalue {i8, i1} %res, 1 Done: 1 Optimization is correct! ``` Reviewers: nikic, lebedev.ri, spatel, Bigcheese, dexonsmith, aemerson Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D74141	2020-02-18 09:11:55 +01:00
Brian Gesiak	11053a1cc6	Revert new pass manager coro-split and coro-elide This reverts https://reviews.llvm.org/rG7125d66f9969605d886b5286780101a45b5bed67 and https://reviews.llvm.org/rG00fec8004aca6588d8d695a2c3827c3754c380a0 due to buildbot failures: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/34004	2020-02-17 23:55:10 -05:00
Brian Gesiak	00fec8004a	[Coroutines][3/6] New pass manager: coro-elide Summary: Depends on https://reviews.llvm.org/D71899. The third in a series of patches that ports the LLVM coroutines passes to the new pass manager infrastructure. This patch implements 'coro-elide'. The new pass manager infrastructure does not implicitly repeat CGSCC pass pipelines when a function is devirtualized, and so the tests for the new pass manager that rely on that behavior now explicitly specify `repeat<2>`. Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei Reviewed By: wenlei Subscribers: wenlei, EricWF, Prazek, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71900	2020-02-17 23:41:57 -05:00
Brian Gesiak	7125d66f99	[Coroutines][2/6] New pass manager: coro-split Summary: This patch has four dependencies: 1. The first in this series of patches that implement coroutine passes in the new pass manager: https://reviews.llvm.org/D71898. 2. A patch that introduces an API for CGSCC passes to add new reference edges to a `LazyCallGraph`, `updateCGAndAnalysisManagerForCGSCCPass`: https://reviews.llvm.org/D72025. 3. A patch that introduces a `CallGraphUpdater` helper class that is capable of mutating internal `LazyCallGraph` state in order to insert new function nodes into a specific SCC: https://reviews.llvm.org/D70927. 4. And finally, a small edge case fix for updating `LazyCallGraph` that patch 3 above happens to run into: https://reviews.llvm.org/D72226. This is the second in a series of patches that ports the LLVM coroutines passes to the new pass manager infrastructure. This patch implements 'coro-split'. Some notes: * Using the new CGSCC pass manager resulted in IR being printed in the reverse order in some tests. To prevent FileCheck checks from failing due to these reversed orders, this patch splits up test files that test multiple different coroutine functions: specifically coro-alloc-with-param.ll, coro-split-eh.ll, and coro-eh-aware-edge-split.ll. * CoroSplit.cpp contained 2 overloads of `splitCoroutine`, one of which dispatched to the other based on the coroutine ABI being used (C++20 switch-based versus Swift returned-continuation-based). I found this confusing, especially with the additional branching based on `CallGraph` vs. `LazyCallGraph`, so I removed the ABI-checking overload of `splitCoroutine`. Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei Reviewed By: wenlei Subscribers: wenlei, qcolombet, EricWF, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71899	2020-02-17 23:35:27 -05:00
Vedant Kumar	c74026daf3	[HotColdSplit] Mark entire function cold when entry block is cold rdar://58855712	2020-02-17 15:57:50 -08:00
Nicolai Hähnle	58297e4d8f	LowerMatrixIntrinsics: Avoid use of deprecated CreateCall methods Reviewers: t.p.northover Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74675	2020-02-18 00:24:09 +01:00
Tim Northover	464d4cf7e6	Coroutines: avoid use of deprecated CreateLoad and CreateCall methods Summary: Patch originally by Tim Northover Reviewers: t.p.northover Subscribers: EricWF, hiraditya, modocache, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74674	2020-02-18 00:24:09 +01:00
Brian Gesiak	e9849d5195	[Coroutines][1/6] New pass manager: coro-early Summary: The first in a series of patches that ports the LLVM coroutines passes to the new pass manager infrastructure. This patch implements 'coro-early'. NB: All coroutines passes begin by checking that coroutine intrinsics are declared within the LLVM IR module they're operating on. To do so, they call `coro::declaresIntrinsics`. The next 3 patches in this series, which add new pass manager implementations of the 'coro-split', 'coro-elide', and 'coro-cleanup' passes, use a similar pattern as the one used here: a static function is shared across both old and new passes to detect if relevant coroutine intrinsics are delcared. To make this pattern easier to read, this patch adds `const` keywords to the parameters of `coro::declaresIntrinsics`. Reviewers: GorNishanov, lewissbaker, junparser, chandlerc, deadalnix, wenlei Reviewed By: wenlei Subscribers: ychen, wenlei, EricWF, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71898	2020-02-17 13:27:48 -05:00
Nikita Popov	3eaa53e805	Reapply "[IRBuilder] Virtualize IRBuilder" Relative to the original commit, this fixes some warnings, and is based on the deletion of the IRBuilder copy constructor in D74693. The automatic copy constructor would no longer be safe. ----- Related llvm-dev thread: http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html This patch moves the IRBuilder from templating over the constant folder and inserter towards making both of these virtual. There are a couple of motivations for this: 1. It's not possible to share code between use-sites that use different IRBuilder folders/inserters (short of templating the code and moving it into headers). 2. Methods currently defined on IRBuilderBase (which is not templated) do not use the custom inserter, resulting in subtle bugs (e.g. incorrect InstCombine worklist management). It would be possible to move those into the templated IRBuilder, but... 3. The vast majority of the IRBuilder implementation has to live in the header, because it depends on the template arguments. 4. We have many unnecessary dependencies on IRBuilder.h, because it is not easy to forward-declare. (Significant parts of the backend depend on it via TargetLowering.h, for example.) This patch addresses the issue by making the following changes: * IRBuilderDefaultInserter::InsertHelper becomes virtual. IRBuilderBase accepts a reference to it. * IRBuilderFolder is introduced as a virtual base class. It is implemented by ConstantFolder (default), NoFolder and TargetFolder. IRBuilderBase has a reference to this as well. * All the logic is moved from IRBuilder to IRBuilderBase. This means that methods can in the future replace their IRBuilder<> & uses (or other specific IRBuilder types) with IRBuilderBase & and thus be usable with different IRBuilders. * The IRBuilder class is now a thin wrapper around IRBuilderBase. Essentially it only stores the folder and inserter and takes care of constructing the base builder. What this patch doesn't do, but should be simple followups after this change: * Fixing use of the inserter for creation methods originally defined on IRBuilderBase. * Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful. * Moving code from the IRBuilder header to the source file. From the user perspective, these changes should be mostly transparent: The only thing that consumers using a custom inserted may need to do is inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper as public. Differential Revision: https://reviews.llvm.org/D73835	2020-02-17 19:04:11 +01:00
Nikita Popov	80397d2d12	[IRBuilder] Delete copy constructor D73835 will make IRBuilder no longer trivially copyable. This patch deletes the copy constructor in advance, to separate out the breakage. Currently, the IRBuilder copy constructor is usually used by accident, not by intention. In rG7c362b25d7a9 I've fixed a number of cases where functions accepted IRBuilder rather than IRBuilder &, thus performing an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an IRBuilder was copied, while an InsertPointGuard should have been used instead. The only non-trivial use of the copy constructor is the getIRBForDbgInsertion() helper, for which I separated construction and setting of the insertion point in this patch. Differential Revision: https://reviews.llvm.org/D74693	2020-02-17 18:14:48 +01:00
Benjamin Kramer	564a9de28e	Hide implementation details. NFC>	2020-02-17 17:55:23 +01:00
Benjamin Kramer	5fc5c7db38	Strength reduce vectors into arrays. NFCI.	2020-02-17 15:37:35 +01:00
Nikita Popov	5f7b92b1b4	[IRBuilder] Prefer InsertPointGuard over full copy; NFC Don't copy the IRBuilder when an InsertPointGuard would also do.	2020-02-16 18:02:29 +01:00
Nikita Popov	7c362b25d7	[IRBuilder] Fix unnecessary IRBuilder copies; NFC Fix a few cases where an IRBuilder is passed to a helper function by value, while a by reference pass was intended.	2020-02-16 17:57:18 +01:00
Nikita Popov	af480e8c63	Revert "[IRBuilder] Virtualize IRBuilder" This reverts commit `0765d3824d`. This reverts commit `1b04866a3d`. Relevant looking crashes observed on: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win	2020-02-16 17:01:10 +01:00
Sanjay Patel	62dd44d76d	[VectorCombine] fix cost calc for extract-cmp getOperationCost() is not the cost we wanted; that's not the throughput value that the rest of the calculation uses. We may want to switch everything in this code to use the getInstructionThroughput() wrapper to avoid these kinds of problems, but I'll look at that as a follow-up because that can create other logical diffs via using optional parameters (we'd need to speculatively create the vector instruction to make a fair(er) comparison).	2020-02-16 10:40:28 -05:00
Nikita Popov	893c630fbe	[InstCombine] Create new log2 intrinsic; NFCI Rather than mixing creation of new instructions and in-place modification here, create a new log2 intrinsic. This should be NFC apart from worklist order changes.	2020-02-16 15:52:09 +01:00
Nikita Popov	1b04866a3d	[IRBuilder] Try to fix warnings Try to fix -Wnon-virtual-dtor warnings that cause build failure on clang-pcc64le-rhel.	2020-02-16 15:32:11 +01:00
Nikita Popov	0765d3824d	[IRBuilder] Virtualize IRBuilder Related llvm-dev thread: http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html This patch moves the IRBuilder from templating over the constant folder and inserter towards making both of these virtual. There are a couple of motivations for this: 1. It's not possible to share code between use-sites that use different IRBuilder folders/inserters (short of templating the code and moving it into headers). 2. Methods currently defined on IRBuilderBase (which is not templated) do not use the custom inserter, resulting in subtle bugs (e.g. incorrect InstCombine worklist management). It would be possible to move those into the templated IRBuilder, but... 3. The vast majority of the IRBuilder implementation has to live in the header, because it depends on the template arguments. 4. We have many unnecessary dependencies on IRBuilder.h, because it is not easy to forward-declare. (Significant parts of the backend depend on it via TargetLowering.h, for example.) This patch addresses the issue by making the following changes: * IRBuilderDefaultInserter::InsertHelper becomes virtual. IRBuilderBase accepts a reference to it. * IRBuilderFolder is introduced as a virtual base class. It is implemented by ConstantFolder (default), NoFolder and TargetFolder. IRBuilderBase has a reference to this as well. * All the logic is moved from IRBuilder to IRBuilderBase. This means that methods can in the future replace their IRBuilder<> & uses (or other specific IRBuilder types) with IRBuilderBase & and thus be usable with different IRBuilders. * The IRBuilder class is now a thin wrapper around IRBuilderBase. Essentially it only stores the folder and inserter and takes care of constructing the base builder. What this patch doesn't do, but should be simple followups after this change: * Fixing use of the inserter for creation methods originally defined on IRBuilderBase. * Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful. * Moving code from the IRBuilder header to the source file. From the user perspective, these changes should be mostly transparent: The only thing that consumers using a custom inserted may need to do is inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper as public. Differential Revision: https://reviews.llvm.org/D73835	2020-02-16 13:48:55 +01:00
Johannes Doerfert	1d5da8cd30	[Attributor][FIX] Use pointer not reference as it can be null	2020-02-15 20:38:49 -06:00
Florian Hahn	f8045b250d	Recommit "[SCCP] Remove forcedconstant, go to overdefined instead" This includes a fix for cases where things get marked as overdefined in ResolvedUndefsIn, but we later discover a constant. To avoid crashing, we consistently bail out on overdefined values in the visitors. This is similar to the previous behavior with forcedconstant. This reverts the revert commit `02b72f564c`.	2020-02-15 18:36:44 +01:00
Simon Pilgrim	8a48c4a97c	Fix boolean/bitwise operator precedence warnings. NFCI.	2020-02-15 13:53:18 +00:00
Johannes Doerfert	ef746aa11f	[Attributor] Collect memory accesses with their respective kind and location In addition to a single bit per memory locations, e.g., globals and arguments, we now collect more information about the actual accesses, e.g., what instruction caused it, was it a read/write/read+write, and what the underlying base pointer was. Follow up patches will make explicit use of this. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D73527	2020-02-15 02:12:04 -06:00
Fangrui Song	fd5665af2c	[Attributor] Fix -Wunused-variable for -DLLVM_ENABLE_ASSERTIONS=off builds after `b4352e43d8`	2020-02-14 21:47:19 -08:00
Johannes Doerfert	b70297a39a	[Attributor][FIX] Ensure abstract attributes are existing before manifest While the function return updateImpl did only look at call sites the manifest method looked at return values. If we don't do this during the updateImpl we might create new abstract attributes during manifest. This is a problem when it comes to liveness information.	2020-02-14 21:44:46 -06:00
Johannes Doerfert	ad121ea14d	[Attributor] Manifest simplified (return) values properly If we simplify a function return value we have to modify the return instructions.	2020-02-14 21:44:46 -06:00
Johannes Doerfert	b53af0e7f9	[Attributor][FIX] Collapse `undef` to a proper value If we see an undef we cannot assume it's the same as "no value". For now we just collapse it to 0.	2020-02-14 21:44:46 -06:00
Johannes Doerfert	137c99a6a5	[Attributor][FIX] Restrict cross-SCC call deletion If we know a call was not needed we might have ended up deleting it even if it was in a different SCC. This prevents us from doing so.	2020-02-14 21:44:46 -06:00
Johannes Doerfert	32e98a7089	[Attributor][FIX] Carefully strip casts in AANoAlias We can strip casts in AANoAlias but that might cause us to end up with a non-pointer type. We do properly handle that case now.	2020-02-14 21:44:46 -06:00
Johannes Doerfert	b4352e43d8	[Attributor][FIX] Do not RAUW void values This caused an error when passes iterated over cached assumptions in the tracker and assumed them to be `null` or an instruction. I failed to create a test case so far.	2020-02-14 21:44:46 -06:00
Johannes Doerfert	282f5d7ad1	[Attributor] Derive memory location attributes (argmemonly, ...) In addition to memory behavior attributes (readonly/writeonly) we now derive memory location attributes (argmemonly/inaccessiblememonly/...). The former is part of AAMemoryBehavior and the latter part of AAMemoryLocation. While they are similar in nature it got messy when they were put in a single AA. Location attributes for arguments and floating values will follow later. Note that both memory attributes kinds can derive readnone. If there are no accesses AAMemoryBehavior will derive readnone. If there are accesses but only to stack (=local) locations AAMemoryLocation will derive readnone. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D73426	2020-02-14 19:05:51 -06:00
Johannes Doerfert	7cbb107feb	[Attributor][FIX] Validate the type for AAValueConstantRange as needed Due to the genericValueTraversal we might visit values for which we did not create an AAValueConstantRange object, e.g., as they are behind a PHI or select or call with `returned` argument. As a consequence we need to validate the types as we are about to query AAValueConstantRange for operands.	2020-02-14 17:22:40 -06:00
Alina Sbirlea	1326a5a4cf	[LoopRotate] Get and update MSSA only if available in legacy pass manager. Summary: Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408 In the legacy pass manager, loop rotate need not compute MemorySSA when not being in the same loop pass manager with other loop passes. There isn't currently a way to differentiate between the two cases, so this attempts to limit the usage in LoopRotate to only update MemorySSA when the analysis is already available. The side-effect of this is that it will split the Loop pipeline. This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA. Reviewers: dmgreen, fedor.sergeev, nikic Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74574	2020-02-14 10:47:26 -08:00
Kadir Cetinkaya	1674f772b4	[VecotrCombine] Fix unused variable for assertion disabled builds	2020-02-14 09:30:29 +01:00
Vedant Kumar	8e77b33b3c	[Local] Do not move around dbg.declares during replaceDbgDeclare replaceDbgDeclare is used to update the descriptions of stack variables when they are moved (e.g. by ASan or SafeStack). A side effect of replaceDbgDeclare is that it moves dbg.declares around in the instruction stream (typically by hoisting them into the entry block). This behavior was introduced in llvm/r227544 to fix an assertion failure (llvm.org/PR22386), but no longer appears to be necessary. Hoisting a dbg.declare generally does not create problems. Usually, dbg.declare either describes an argument or an alloca in the entry block, and backends have special handling to emit locations for these. In optimized builds, LowerDbgDeclare places dbg.values in the right spots regardless of where the dbg.declare is. And no one uses replaceDbgDeclare to handle things like VLAs. However, there doesn't seem to be a positive case for moving dbg.declares around anymore, and this reordering can get in the way of understanding other bugs. I propose getting rid of it. Testing: stage2 RelWithDebInfo sanitized build, check-llvm rdar://59397340 Differential Revision: https://reviews.llvm.org/D74517	2020-02-13 14:35:02 -08:00
Sanjay Patel	19b62b79db	[VectorCombine] try to form vector binop to eliminate an extract element binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C This is a transform that has been considered for canonicalization (instcombine) in the past because it reduces instruction count. But as shown in the x86 tests, it's impossible to know if it's profitable without a cost model. There are many potential target constraints to consider. We have implemented similar transforms in the backend (DAGCombiner and target-specific), but I don't think we have this exact fold there either (and if we did it in SDAG, it wouldn't work across blocks). Note: this patch was intended to handle the more general case where the extract indexes do not match, but it got too big, so I scaled it back to this pattern for now. Differential Revision: https://reviews.llvm.org/D74495	2020-02-13 17:23:27 -05:00
Vedant Kumar	02b72f564c	Revert "Recommit "[SCCP] Remove forcedconstant, go to overdefined instead"" This reverts commit `bb310b3f73`. This breaks the stage2 ASan build, see: https://bugs.llvm.org/show_bug.cgi?id=44898 rdar://59431448	2020-02-13 11:55:18 -08:00
stozer	9bda7ab835	Re-revert: Recover debug intrinsics when killing duplicated/empty blocks This reverts commit `61b35e4111`. This commit causes a timeout in chromium builds; likely to have a similar cause to the previous timeout issue caused by this commit (see `6ded69f294` for more details). It is possible that there is no way to fix this bug that will not cause this issue; further investigations as to the efficiency of handling large amounts of debug info will be necessary.	2020-02-13 11:48:19 +00:00
Johannes Doerfert	70cac41a2b	Reapply "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late" Reapply `8a56d64d76` with minor fixes. The problem was that cancellation can cause new edges to the parallel region exit block which is not outlined. The CodeExtractor will encode the information which "exit" was taken as a return value. The fix is to ensure we do not return any value from the outlined function, to prevent control to value conversion we ensure a single exit block for the outlined region. This reverts commit `3aac953afa`.	2020-02-12 22:29:07 -06:00
Johannes Doerfert	3aac953afa	Revert "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late" This reverts commit `8a56d64d76`. Will be recommitted once the clang test problem is addressed.	2020-02-12 18:50:43 -06:00
Johannes Doerfert	8a56d64d76	[OpenMP][IRBuilder] Perform finalization (incl. outlining) late In order to fix PR44560 and to prepare for loop transformations we now finalize a function late, which will also do the outlining late. The logic is as before but the actual outlining step happens now after the function was fully constructed. Once we have loop transformations we can apply them in the finalize step before the outlining. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D74372	2020-02-12 17:55:01 -06:00
Johannes Doerfert	23f41f16d4	[Attributor] Use fine-grained liveness in all helpers We used coarse-grained liveness before, thus we looked if the instruction was executed, but we did not use fine-grained liveness, hence if the instruction was needed or could be deleted even if the surrounding ones are live. This patches introduces this level of liveness checks together with other liveness queries, e.g., for uses. For more control we enforce that all liveness queries go through the Attributor. Test have been adjusted to reflect the changes or augmented to prevent deletion of the parts we want to check. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D73313	2020-02-12 17:36:38 -06:00
Johannes Doerfert	b2c76002ca	[Attributor] Ignore uses if a value is simplified If we have a replacement for a value, via AAValueSimplify, the original value will lose all its uses. Thus, as long as a value is simplified we can skip the uses in checkForAllUses, given that these uses are transitive uses for the simplified version and will therefore affect the simplified version as necessary. Since this allowed us to remove calls without side-effects and a known return value, we need to make sure not to eliminate `musttail` calls. Those we keep around, or later remove the entire `musttail` call chain.	2020-02-12 17:36:38 -06:00
Johannes Doerfert	86509e8c3b	[Attributor] Use assumed information to determine side-effects We relied on wouldInstructionBeTriviallyDead before but that functions does not take assumed information, especially for calls, into account. The replacement, AAIsDead::isAssumeSideEffectFree, does. This change makes AAIsDeadCallSiteReturn more complex as we can have a dead call or only dead users. The test have been modified to include a side effect where there was none in order to keep the coverage. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D73311	2020-02-12 17:36:38 -06:00
Ehud Katz	d8a2ea9fd5	[LoopExtractor] Fix legacy pass dependencies Fixes a memory leak of allocating `LoopInfoWrapperPass` and `DominatorTreeWrapperPass`.	2020-02-12 22:39:21 +02:00
Vedant Kumar	34d9f93977	[AddressSanitizer] Ensure only AllocaInst is passed to dbg.declare Various parts of the LLVM code generator assume that the address argument of a dbg.declare is not a `ptrtoint`-of-alloca. ASan breaks this assumption, and this results in local variables sometimes being unavailable at -O0. GlobalISel, SelectionDAG, and FastISel all do not appear to expect dbg.declares to have a `ptrtoint` as an operand. This means that they do not place entry block allocas in the usual side table reserved for local variables available in the whole function scope. This isn't always a problem, as LLVM can try to lower the dbg.declare to a DBG_VALUE, but those DBG_VALUEs can get dropped for all the usual reasons DBG_VALUEs get dropped. In the ObjC test case I'm looking at, the cause happens to be that `replaceDbgDeclare` has hoisted dbg.declares into the entry block, causing LiveDebugValues to "kill" the DBG_VALUEs because the lexical dominance check fails. To address this, I propose: 1) Have ASan (always) pass an alloca to dbg.declares (this patch). This is a narrow bugfix for -O0 debugging. 2) Make replaceDbgDeclare not move dbg.declares around. This should be a generic improvement for optimized debug info, as it would prevent the lexical dominance check in LiveDebugValues from killing as many variables. This means reverting llvm/r227544, which fixed an assertion failure (llvm.org/PR22386) but no longer seems to be necessary. I was able to complete a stage2 build with the revert in place. rdar://54688991 Differential Revision: https://reviews.llvm.org/D74369	2020-02-12 11:24:02 -08:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Florian Hahn	bb310b3f73	Recommit "[SCCP] Remove forcedconstant, go to overdefined instead" This version includes a fix for a set of crashes caused by marking values depending on a yet unknown & tracked call as overdefined. In some cases, we would later discover that the call has a constant result and try to mark a user of it as constant, although it was already marked as overdefined. Most instruction handlers bail out early if the instruction is already overdefined. But that is not necessary for CastInsts for example. By skipping values that depend on skipped calls, we resolve the crashes and also improve the precision in some cases (see resolvedundefsin-tracked-fn.ll). Note that we may not skip PHI nodes that may depend on a skipped call, but they can be safely marked as overdefined, as we bail out early if the PHI node is overdefined. This reverts the revert commit a74b31a3e9cd844c7ce2087978568e3f5ec8519.	2020-02-12 18:02:18 +00:00
Anh Tuyen Tran	a5b6480d05	[NFC] Remove extra headers included in Loop Unroll and LoopUnrollAndJam files Summary: This refactor patch removes some header files which are not needed and also add some to meet IWYU principles. Reviewers: rnk (Reid Kleckner), Meinersbur (Michael Kruse), dmgreen (Dave Green) Reviewed By: dmgreen (Dave Green), rnk (Reid Kleckner), Meinersbur (Michael Kruse) Subscribers: dmgreen (Dave Green), Whitney (Whitney Tsang), hiraditya (Aditya Kumar), zzheng (Z. Zheng), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D73498	2020-02-12 17:57:56 +00:00
Alina Sbirlea	4f33a68973	Compute ORE, BPI, BFI in Loop passes. Summary: Passes ORE, BPI, BFI are not being preserved by Loop passes, hence it is incorrect to retrieve these passes as cached. This patch makes the loop passes in question compute a new instance. In some of these cases, however, it may be beneficial to change the Loop pass to a Function pass instead, similar to the change for LoopUnrollAndJam. Reviewers: chandlerc, dmgreen, jdoerfert, reames Subscribers: mehdi_amini, hiraditya, zzheng, steven_wu, dexonsmith, Whitney, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72891	2020-02-12 09:15:18 -08:00
Sven van Haastregt	665dcdacc0	Add missing newlines at EOF; NFC	2020-02-12 15:57:25 +00:00
stozer	61b35e4111	Re-reapply: Recover debug intrinsics when killing duplicated/empty blocks This reverts commit `636c93ed11`. The original patch caused build failures on TSan buildbots. Commit `6ded69f294` fixes this issue by reducing the rate at which empty debug intrinsics propagate, reducing the memory footprint and preventing a fatal spike.	2020-02-12 14:36:30 +00:00
Florian Hahn	81dbb6aec6	Recommit "[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk)." This includes a fix for the santizier failures. This reverts the revert commit `42f8b915eb`.	2020-02-12 14:17:50 +00:00
Ayman Musa	35f02aa021	Revert "[AggressiveInstCombine] Add support for ICmp instr that feeds a select intsr's condition operand." This reverts commit `cf155150f9`.	2020-02-12 15:04:49 +02:00
Ayman Musa	cf155150f9	[AggressiveInstCombine] Add support for ICmp instr that feeds a select intsr's condition operand.	2020-02-12 15:01:27 +02:00
stozer	ffeb64db35	Reapply "[DebugInfo] Prevent explosion of debug intrinsics during jump threading" This reverts commit `6ded69f294`.	2020-02-12 12:39:54 +00:00
Ayman Musa	3bda9059b8	[AggressiveInstCombine] Add support for select instruction. Differential Revision: https://reviews.llvm.org/D72837	2020-02-12 13:59:34 +02:00
stozer	6ded69f294	Revert "[DebugInfo] Prevent explosion of debug intrinsics during jump threading" This reverts commit `fe6f6cd6b8`. Found test failure on several buildbots.	2020-02-12 11:48:00 +00:00
Ayman Musa	49a4d85f6d	[NFC][AggressiveInstCombine] Remove redundant std::max. Differential Revision: https://reviews.llvm.org/D74476	2020-02-12 13:47:40 +02:00
stozer	fe6f6cd6b8	[DebugInfo] Prevent explosion of debug intrinsics during jump threading This patch is a fix following the revert of `72ce759` (https://reviews.llvm.org/rG72ce759928e6dfee6a9efa310b966c19722352ba) and fixes the failure that it caused. The above patch failed on the Thread Sanitizer buildbot with an out of memory error. After an investigation, the cause was identified as an explosion in debug intrinsics while running the Jump Threading pass on ModuleMap.ll. The above patched prevented debug intrinsics from being dropped when their Basic Block was deleted due to being "empty". In this case, one of the functions in ModuleMap.ll had (after many optimization passes) a very large number of debug intrinsics representing a set of repeatedly inlined variables. Previously the vast majority of these were silently dropped during Jump Threading when their blocks were deleted, but as of the above patch they survived for longer, causing a large increase in the number of debug intrinsics. These intrinsics were then repeatedly cloned by the Jump Threading pass as edges were threaded, multiplying the intrinsic count further. The memory consumed by this process spiralled out of control, crashing the buildbot that uses TSan (which has an estimated 5-10x memory overhead compared to non-sanitized builds). This patch adds RemoveRedundantDbgInstrs to the Jump Threading pass, in order to reduce the number of debug intrinsics down to a manageable amount in cases where many intrinsics for the same variable end up bunched together contiguously, as in this case. Differential Revision: https://reviews.llvm.org/D73054	2020-02-12 11:22:54 +00:00
Florian Hahn	fa74b31a3e	Revert "[SCCP] Remove forcedconstant, go to overdefined instead" This causes a crash for the reproducer below enum { a }; enum b { c, d }; e; static _Bool g(struct f *h, enum b i) { i &&j(); return a; } static k(char h, enum b i) { _Bool l = g(e, i); l; } m(h) { k(h, c); g(h, d); } This reverts commit `aadb635e04`.	2020-02-12 09:41:19 +00:00
Matt Arsenault	86f9117d47	AMDGPU: Don't report 2-byte alignment as fast This is apparently worse than 1-byte alignment. This does not attempt to decompose 2-byte aligned wide stores, but will stop trying to produce them. Also fix bug in LoadStoreVectorizer which was decreasing the alignment and vectorizing stack accesses. It was assuming a stack object was an alloca that could have its base alignment changed, which is not true if the pointer is derived from a function argument.	2020-02-11 18:35:00 -05:00
Johannes Doerfert	52aec3221f	[Attributor][NFC] Clarify the documentation a bit more	2020-02-11 15:11:55 -06:00
Johannes Doerfert	8e62968d45	[Attributor] Identify dead uses in PHIs (almost) based on dead edges As an approximation to a dead edge we can check if the terminator is dead. If so, the corresponding operand use in a PHI node is dead even if the PHI node itself is not.	2020-02-11 15:11:55 -06:00
Teresa Johnson	80d0a137a5	Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This restores commit `748bb5a0f1`, along with a fix for a Chromium test suite build issue (and a new test for that case). Differential Revision: https://reviews.llvm.org/D73242	2020-02-11 10:48:05 -08:00
Johannes Doerfert	185e9b083e	[Attributor][NFC] Improve documentation	2020-02-11 11:19:34 -06:00
Johannes Doerfert	f95553923f	[Attributor] Return uses do not free pointers If a pointer is returned that does not mean it is freed in the current (function) scope. We can ignore such uses in AANoFree.	2020-02-11 11:02:59 -06:00
Johannes Doerfert	4c62a35860	[Attributor][FIX] Remove duplicate, half-broken functionality The changeXXXAfterManifest functions are better suited to deal with changes so we should prefer them. These functions also recursively delete dead instructions which is why we see test changes.	2020-02-11 11:02:59 -06:00
Johannes Doerfert	77a9e61c9a	[Attributor][NFC] Improve debug message	2020-02-11 11:02:59 -06:00
Nikita Popov	5a8819b216	[InstCombine] Use replaceOperand() in more places This is a followup to D73803, which uses the replaceOperand() helper in more places. This should be NFC apart from changes to worklist order. Differential Revision: https://reviews.llvm.org/D73919	2020-02-11 17:38:23 +01:00
Florian Hahn	aadb635e04	[SCCP] Remove forcedconstant, go to overdefined instead This patch removes forcedconstant to simplify things for the move to ValueLattice, which includes constant ranges, but no forced constants. This patch removes forcedconstant and changes ResolvedUndefsIn to mark instructions with unknown operands as overdefined. This means we do not do simplifications based on undef directly in SCCP any longer, but this seems to hardly come up in practice (see stats below), presumably because InstCombine & others take care of most of the relevant folds already. It is still beneficial to keep ResolvedUndefIn, as it allows us delaying going to overdefined until we propagated all known information. I also built MultiSource, SPEC2000 and SPEC2006 and compared sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact is quite low: Tests: 244 Same hash: 238 (filtered out) Remaining: 6 Metric: sccp.IPNumInstRemoved Program base patch diff test-suite...arks/VersaBench/dbms/dbms.test 4.00 3.00 -25.0% test-suite...TimberWolfMC/timberwolfmc.test 38.00 34.00 -10.5% test-suite...006/453.povray/453.povray.test 158.00 155.00 -1.9% test-suite.../CINT2000/176.gcc/176.gcc.test 668.00 668.00 0.0% test-suite.../CINT2006/403.gcc/403.gcc.test 1209.00 1209.00 0.0% test-suite...arks/mafft/pairlocalalign.test 76.00 76.00 0.0% Tests: 244 Same hash: 238 (filtered out) Remaining: 6 Metric: sccp.NumInstRemoved Program base patch diff test-suite...arks/mafft/pairlocalalign.test 185.00 175.00 -5.4% test-suite.../CINT2006/403.gcc/403.gcc.test 2059.00 2056.00 -0.1% test-suite.../CINT2000/176.gcc/176.gcc.test 2358.00 2357.00 -0.0% test-suite...006/453.povray/453.povray.test 317.00 317.00 0.0% test-suite...TimberWolfMC/timberwolfmc.test 12.00 12.00 0.0% Reviewers: davide, efriedma, mssimpso Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61314	2020-02-11 15:24:15 +00:00
Kadir Cetinkaya	42f8b915eb	Revert "[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk)." This reverts commit `d0c4d4fe09`. Revert "[DSE,MSSA] Move more passing test cases from todo to simple.ll." This reverts commit `02266e64bb`. Revert "[DSE,MSSA] Adjust mda-with-dbg-values.ll to MSSA backed DSE." This reverts commit `74f03e4ff0`.	2020-02-11 15:34:48 +01:00
Sanjay Patel	a2a0f9a43a	[VectorCombine] remove unused debug counter; NFC The variable was added to the initial commit via copy/paste of existing code, but it wasn't actually used in the code. We can add it back with the proper usage if/when that is needed.	2020-02-11 08:24:07 -05:00
Sanjay Patel	b8ebc11f03	[EarlyCSE] avoid crashing when detecting min/max/abs patterns (PR41083) As discussed in PR41083: https://bugs.llvm.org/show_bug.cgi?id=41083 ...we can assert/crash in EarlyCSE using the current hashing scheme and instructions with flags. ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or other flags when detecting patterns such as min/max/abs composed of compare+select. But the value numbering / hashing mechanism used by EarlyCSE intersects those flags to allow more CSE. Several alternatives to solve this are discussed in the bug report. This patch avoids the issue by doing simple matching of min/max/abs patterns that never requires instruction flags. We give up some CSE power because of that, but that is not expected to result in much actual performance difference because InstCombine will canonicalize these patterns when possible. It even has this comment for abs/nabs: /// Canonicalize all these variants to 1 pattern. /// This makes CSE more likely. (And this patch adds PhaseOrdering tests to verify that the expected transforms are still happening in the standard optimization pipelines. I left this code to use ValueTracking's "flavor" enum values, so we don't have to change the callers' code. If we decide to go back to using the ValueTracking call (by changing the hashing algorithm instead), it should be obvious how to replace this chunk. Differential Revision: https://reviews.llvm.org/D74285	2020-02-10 17:25:34 -05:00
Hiroshi Yamauchi	bb383ae612	[CallPromotionUtils] Add tryPromoteCall. Summary: It attempts to devirtualize a call on alloca through vtable loads. Reviewers: davidxl Subscribers: mgorny, Prazek, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71308	2020-02-10 13:43:16 -08:00
Sanjay Patel	62ce7e650a	[InstCombine] fix use check when canonicalizing abs/nabs We were checking for extra uses of the negated operand even if we were not going to create it as part of this canonicalization. This was showing up as a regression when we limit EarlyCSE as proposed in D74285.	2020-02-10 14:57:37 -05:00
David Stenberg	982944525c	Revert "[InstCombine][DebugInfo] Fold constants wrapped in metadata" This reverts commit `b54a8ec1bc`. The commit triggered debug invariance (different output with/without -g). The patch seems to have exposed a pre-existing invariance problem in GlobalOpt, which I'll write a bug report for.	2020-02-10 17:58:33 +01:00
Bill Wendling	c55cf4afa9	Revert "Remove redundant "std::move"s in return statements" The build failed with error: call to deleted constructor of 'llvm::Error' errors. This reverts commit `1c2241a793`.	2020-02-10 07:07:40 -08:00
Bill Wendling	1c2241a793	Remove redundant "std::move"s in return statements	2020-02-10 06:39:44 -08:00
Mikael Holmen	a50c0b0df7	Fix compiler warning when compiling without asserts [NFC]	2020-02-10 13:55:52 +01:00
Florian Hahn	d0c4d4fe09	[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk). This patch adds a first version of a MemorySSA based DSE. It is missing a lot of features, which will get added as follow-ups, to help to keep the review manageable. The patch uses the following general approach: given a MemoryDef, walk upwards to find clobbering MemoryDefs that may be killed by the starting def. Then check that there are no uses that may read the location of the original MemoryDef in between both MemoryDefs. A bit more concretely: For all MemoryDefs StartDef: 1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards. 2. Check that there no reads between DomAccess and the StartDef by checking all uses starting at DomAccess and walking until we see StartDef. 3. For each found DomDef, check that: 1. There are no barrier instructions between DomDef and StartDef (like throws or stores with ordering constraints). 2. StartDef is executed whenever DomDef is executed. 3. StartDef completely overwrites DomDef. 4. Erase DomDef from the function and MemorySSA. The patch uses a very simple approach to guarantee that no throwing instructions are between 2 stores: We only allow accesses to stack objects, access that are in the same basic block if the block does not contain any throwing instructions or accesses in functions that do not contain any throwing instructions. This will get lifted later. Besides adding support for the missing cases, there is plenty of additional potential for improvements as follow-up work, e.g. the way we visit stores (could be just a traversal of the MemorySSA, rather than collecting them up-front), using the alias information discovered during walking to optimize the MemorySSA. This is loosely based on D40480 by Dave Green. Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D72700	2020-02-10 11:52:11 +00:00
Florian Hahn	da52b9c118	[DSE] Add tests for MemorySSA based DSE. This copies the DSE tests into a MSSA subdirectory to test the MemorySSA backed DSE implementation, without disturbing the original tests. Differential Revision: https://reviews.llvm.org/D72145	2020-02-10 10:28:43 +00:00
Johannes Doerfert	87ddf1f4fa	[Attributor] Simple casts preserve no-alias property This is a minimal but important advancement over the existing code. A cast with an operand that is only used in the cast retains the no-alias property of the operand.	2020-02-10 01:11:32 -06:00
Johannes Doerfert	8155439331	[Attributor] Allow PHI nodes in AAValueConstantRangeFloating Traversing PHI nodes is natural with the genericValueTraversal but also a bit tricky. The problem is similar to the ones we have seen in AAAlign and AADereferenceable, namely that we continue to increase the range in each iteration. We use a pessimistic approach here to stop the iterations. Nevertheless, optimistic information can now be propagated through a PHI node.	2020-02-10 00:55:10 -06:00
Johannes Doerfert	63adbb9a0e	[Attributor][FIX] Remove FIXME that seems outdated The change is performed as stated by the FIXME and the tests are adjusted. All changes look fine to me and values can be inferred as undef without it being an error.	2020-02-10 00:55:10 -06:00
Johannes Doerfert	7e7e6594b3	[Attributor] Allow SelectInst in AAValueConstantRangeFloating The genericValueTraversal will already handle SelectInst properly and we just needed to allow them in the initialize method.	2020-02-10 00:55:09 -06:00
Johannes Doerfert	ffdbd2a06c	[Attributor] Look through (some) casts in AAValueConstantRangeFloating Casts can be handled natively by the ConstantRange class. We do limit it to extends for now as we assume an integer type in different locations. A TODO and a test case with a FIXME was added to remove that restriction in the future.	2020-02-10 00:38:01 -06:00
Johannes Doerfert	028db8c490	[Attributor][FIX] Call right base method in AAValueConstantRangeFloating We now call the base class method as we should.	2020-02-10 00:38:01 -06:00
Michael Liao	ab3da5dd66	Fix `-Wparentheses` warning. NFC.	2020-02-10 00:45:02 -05:00
Sanjay Patel	a17f03bd93	[VectorCombine] new IR transform pass for partial vector ops We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html ...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes. There are 4 alternate options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws: InstCombine - can't happen without TTI, but we don't want target-specific folds there. SDAG - too late to assist other vectorization passes; TLI is not equipped for these kind of cost queries; limited to a single basic block. CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine. SLP - doesn't fit with existing transforms; limited to a single basic block. This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable. We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with a most basic case and evolve from there, so I didn't try to do anything fancy with this initial implementation. Differential Revision: https://reviews.llvm.org/D73480	2020-02-09 10:04:41 -05:00
Ehud Katz	3b70ee27a5	[LoopExtractor] Convert LoopExtractor from LoopPass to ModulePass The LoopExtractor created new functions (by definition), which violates the restrictions of a LoopPass. The correct implementation of this pass should be as a ModulePass. Includes reverting rL82990 implications on the LoopExtractor. Fixes PR3082 and PR8929. Differential Revision: https://reviews.llvm.org/D69069	2020-02-09 12:25:21 +02:00
Johannes Doerfert	b0c77c36d2	[Attributor] Add an Attributor CGSCC pass and run it In addition to the module pass, this patch introduces a CGSCC pass that runs the Attributor on a strongly connected component of the call graph (both old and new PM). The Attributor was always design to be used on a subset of functions which makes this patch mostly mechanical. The one change is that we give up `norecurse` deduction in the module pass in favor of doing it during the CGSCC pass. This makes the interfaces simpler but can be revisited if needed. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D70767	2020-02-08 21:27:34 -06:00
Johannes Doerfert	e565db49c6	[OpenMP][Opt] Delete terminating and read-only parallel regions Parallel regions known to be read-only, e.g., after we removed all dead write accesses, and terminating (`willreturn`) can be removed. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D69954	2020-02-08 18:52:04 -06:00
Johannes Doerfert	e28936f613	[OpenMP][Opt] Annotate known runtime functions and deduplicate more This adds ~27 more runtime calls to the OpenMPKinds.def file, all with attributes. We deduplicate 16 of those automatically in function = thread scope. And we annotate all of them automatically during the OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to track annotation coverage is included. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D69984	2020-02-08 18:35:39 -06:00
Nikita Popov	a05932931c	[InstCombine] Refactor foldICmpAndShift(); NFCI Separate out handling for shl, lshr and ashr. The combined handling obscured some overly pessimistic requirements for the transform.	2020-02-08 22:27:43 +01:00
Johannes Doerfert	9548b74a83	[OpenMP] Introduce the OpenMPOpt transformation pass The OpenMPOpt pass is a CGSCC pass in which OpenMP specific optimizations can reside. The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime calls and their uses. This allows targeted transformations and eases their implementation. This initial patch deduplicates `__kmpc_global_thread_num` and `omp_get_thread_num` calls. We can also identify arguments that are equivalent to such a call result and use it instead. Later we can determine "gtid" arguments based on the use in kernel functions etc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D69930	2020-02-08 14:47:03 -06:00
Johannes Doerfert	72277ecd62	Introduce a CallGraph updater helper class The CallGraphUpdater is a helper that simplifies the process of updating the call graph, both old and new style, while running an CGSCC pass. The uses are contained in different commits, e.g. D70767. More functionality is added as we need it. Reviewed By: modocache, hfinkel Differential Revision: https://reviews.llvm.org/D70927	2020-02-08 14:16:48 -06:00
George Burgess IV	f8c9ceb1ce	[SimplifyLibCalls] Add __strlen_chk. Bionic has had `__strlen_chk` for a while. Optimizing that into a constant is quite profitable, when possible. Differential Revision: https://reviews.llvm.org/D74079	2020-02-08 11:51:00 -08:00
Nikita Popov	a148b9e990	[InstCombine] Fix infinite min/max canonicalization loop (PR44541) While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541, it does so in a more roundabout manner and there might be other loopholes to trigger the same issue. This is a more direct fix, that prevents the transform if the min/max is based on a non-canonical sub X, 0 instruction. Differential Revision: https://reviews.llvm.org/D73849	2020-02-08 20:42:17 +01:00
Nikita Popov	5b2b67be8e	[InstCombine] Remove unnecessary worklist push; NFCI This is no longer needed after `d4627b90a0`, should have dropped it there...	2020-02-08 17:09:28 +01:00
Nikita Popov	d4627b90a0	[InstCombine] Avoid modifying instructions in-place As discussed on D73919, this replaces a few cases where we were modifying multiple operands of instructions in-place with the creation of a new instruction, which we generally prefer nowadays. This tends to be more readable and less prone to worklist management bugs. Test changes are only superficial (instruction naming and order).	2020-02-08 17:05:56 +01:00
Nikita Popov	9d03b7d0d0	[InstCombine] Use swapValues(); NFC Less code, and makes it more obvious that these operands do not need to be added back to the worklist.	2020-02-08 16:57:28 +01:00
Nikita Popov	23db9724d0	[InstCombine] Fix infinite loop in min/max load/store bitcast combine (PR44835) Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform if it wouldn't actually do anything (apart from removing and reinserting the same instructions). Note that the test case doesn't loop on current master anymore, only on the LLVM 10 release branch. The issue is already mitigated on master due to worklist order fixes, but we should fix the root cause there as well. As a side note, we should probably assert in combineLoadToNewType() that it does not combine to the same type. Not doing this here, because this assertion would also be triggered in another place right now. Differential Revision: https://reviews.llvm.org/D74278	2020-02-08 16:55:22 +01:00
Akira Hatanaka	4dcc029edb	[ObjC][ARC] Keep track of phis that have been discovered to avoid an infinite loop This fixes a bug introduced in `6770fbb314`. rdar://problem/59137105	2020-02-07 20:33:11 -08:00
Akira Hatanaka	6770fbb314	[ObjC][ARC] Delete ARC runtime calls that take inert phi values This improves on the following patch, which removed ARC runtime calls taking inert global variables: https://reviews.llvm.org/D62433 rdar://problem/59137105	2020-02-07 16:31:36 -08:00
Hiroshi Yamauchi	4ed205c816	[PGO][PGSO] Enable profile guided size optimization for non-cold code under instrumentation PGO. Summary: This enables it for large working set size cases only. This does not enable it under sample PGO. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74073	2020-02-06 10:29:01 -08:00
Denis Antrushin	99a6e405ed	[IRCE] Use SCEVExpander to modify loop bound IRCE pass checks that it can calculate loop bounds by checking SCEV availability at loop entry. However it is possible that loop bound SCEV is loop invariant, but instruction used to compute it resides within loop. In such case adjusting loop bound in preheader using IRBuilder leads to malformed SSA. Use SCEVExpander instead to generate proper instructions. Reviewed-by: mkazantsev Differential Revision: https://reviews.llvm.org/D73496	2020-02-06 12:44:43 +03:00
Teresa Johnson	25aa2eef99	Revert "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This reverts commit `748bb5a0f1`. Due to Chromium CFI+ThinLTO test crashes reported on patch.	2020-02-05 19:27:32 -08:00
Juneyoung Lee	5687acf431	[MemCpyOpt] Simplify find*Alignment	2020-02-06 06:42:07 +09:00
Juneyoung Lee	ad9ae6ee2b	MemCpyOpt cannot use ABI alignment even if it was not given Summary: This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44388 which incorrectly assigns an ABI alignment to memset when there was no explicit alignment given. Reviewers: gchatelet, lenary, nikic Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74083	2020-02-06 06:21:55 +09:00
Hiroshi Yamauchi	b70f23f599	[PGO][PGSO] Tune flags for profile guided size optimization. Summary: Tune the profile threshold flag value for instrumentation PGO based on internal benchmarks. Also, add flags to allow profile guided size optimizations for non-cold code to be enabled separately for instrumentation and sample PGSO. Neither changes the default behavior (yet) as it's disabled for non-cold code. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72937	2020-02-05 09:37:32 -08:00
Kazu Hirata	4698bf145d	Resubmit^2: [JumpThreading] Thread jumps through two basic blocks This reverts commit `41784bed01`. Since the original revision `ead815924e`, this revision fixes three issues: - This revision fixes the Windows build. My original patch improperly copied EH pads on Windows. This patch disregards jump threading opportunities having to do with EH pads. - This revision fixes jump threading to a wrong destination. Specifically, my original patch treated any Constant other than 0 as 1 while evaluating the branch condition. This bug led to treating constant expressions like: icmp ugt i8* null, inttoptr (i64 4 to i8) to "true". This patch fixes the bug by calling isOneValue. - This revision fixes the cost calculation of two basic blocks being threaded through. Note that getJumpThreadDuplicationCost returns "(unsigned)~0" for those basic blocks that cannot be duplicated. If we sum of two return values from getJumpThreadDuplicationCost, we could have an unsigned overflow like: (unsigned)~0 + 5 = 4 and mistakenly determine that it's safe and profitable to proceed with the jump threading opportunity. The patch fixes the bug by checking each return value before summing them up. [JumpThreading] Thread jumps through two basic blocks Summary: This patch teaches JumpThreading.cpp to thread through two basic blocks like: bb3: %var = phi i32 [ null, %bb1 ], [ @a, %bb2 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb4: %cmp = icmp eq i32* %var, null br i1 %cmp, label bb5, label bb6 by duplicating basic blocks like bb3 above. Once we duplicate bb3 as bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have: bb3: %var = phi i32* [ @a, %bb2 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb3.dup: %var = phi i32* [ null, %bb1 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb4: %cmp = icmp eq i32* %var, null br i1 %cmp, label bb5, label bb6 Then the existing code in JumpThreading.cpp can thread edge bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70247	2020-02-05 09:23:37 -08:00
Alina Sbirlea	67904db23c	[IRCE] Make IRCE a Function pass. Summary: Make InductiveRangeCheckElimination a FunctionPass. Reviewers: reames, mkazantsev Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73592	2020-02-05 09:22:41 -08:00
Teresa Johnson	748bb5a0f1	[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP Summary: Currently type test assume sequences inserted for devirtualization are removed during WPD. This patch delays their removal until later in the optimization pipeline. This is an enabler for upcoming enhancements to indirect call promotion, for example streamlined promotion guard sequences that compare against vtable address instead of the target function, when there are small number of possible vtables (either determined via WPD or by in-progress type profiling). We need the type tests to correlate the callsites with the address point offset needed in the compare sequence, and optionally to associated type summary info computed during WPD. This depends on work in D71913 to enable invocation of LowerTypeTests to drop type test assume sequences, which will now be invoked following ICP in the ThinLTO post-LTO link pipelines, and also after the existing export phase LowerTypeTests invocation in regular LTO (which is already after ICP). We cannot simply move the existing import phase LowerTypeTests pass later in the ThinLTO post link pipelines, as the comment in PassBuilder.cpp notes (it must run early because when performing CFI other passes may disturb the sequences it looks for). This necessitated adding a new type test resolution "Unknown" that we can use on the type test assume sequences previously removed by WPD, that we now want LTT to ignore. Depends on D71913. Reviewers: pcc, evgeny777 Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73242	2020-02-05 08:59:48 -08:00
Alina Sbirlea	1c03cc5a39	[NFCI] Update according to style. clang-tidy + clang-format	2020-02-04 17:11:36 -08:00
Sanjay Patel	dc42ff6697	[InstCombine] add FIXME comment to shuffle transform; NFC Existing tests: rG5d04e008f708 rG2a191cf8500f ...should verify that the underlying analysis doesn't improve too much without updating this user code.	2020-02-04 13:02:06 -05:00
Matt Arsenault	a3c814d234	Separately track input and output denormal mode AMDGPU and x86 at least both have separate controls for whether denormal results are flushed on output, and for whether denormals are implicitly treated as 0 as an input. The current DAGCombiner use only really cares about the input treatment of denormals.	2020-02-04 12:59:21 -05:00
Sanjay Patel	0cf0be993c	[InstCombine] fix operands of shouldChangeType() for casted phi transform This is a bug noted in the recent D72733 and seen in the similar transform just above the changed source code. I added tests with illegal types and zexts to show the bug - we could transform legal phi ops to illegal, etc. I did not add tests with trunc because we won't see any diffs on those patterns. That is because InstCombiner::SliceUpIllegalIntegerPHI() appears to do those transforms independently of datalayout. It can also create more casts than are present in existing code. There are some existing regression tests that do not include a datalayout that would be altered by this fix. I assumed that the lack of a datalayout in those regression files is an oversight, so I added the minimal layout (make i32 legal) necessary to preserve behavior on those tests. Differential Revision: https://reviews.llvm.org/D73907	2020-02-04 07:45:48 -05:00
Thomas Raoux	e53bbf1213	[GVN] Add GVNOption to control load-pre more fine-grained. Adds the global (cl::opt) GVNOption enable-load-in-loop-pre in order to control whether the optimization will be performed if the load is part of a loop. Patch by Hendrik Greving! Differential Revision: https://reviews.llvm.org/D73804	2020-02-03 23:00:58 -08:00
Tyker	15f54d348b	[NFC] Factor out function to detect if an attribute has an argument.	2020-02-03 22:27:24 +01:00
Alina Sbirlea	388de9dfcd	[LoopUtils] Make duplicate method a utility. [NFCI] Summary: Method appendLoopsToWorklist is duplicate in LoopUnroll and in the LoopPassManager as an internal method. Make it an utility. Reviewers: dmgreen, chandlerc, fedor.sergeev, yamauchi Subscribers: mehdi_amini, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73569	2020-02-03 10:24:18 -08:00
Nikita Popov	575a975afd	[SimplifyLibCalls] Remove unused IRBuilder argument; NFC isLocallyOpenedFile() does not use IRBuilder.	2020-02-03 19:12:57 +01:00
Nikita Popov	878cb38a5c	[InstCombine] Add replaceOperand() helper Adds a replaceOperand() helper, which is like Instruction.setOperand() but adds the old operand to the worklist. This reduces the amount of missing or incorrect worklist management. This only applies the helper to a relatively small subset of setOperand() calls in InstCombine, namely those of the pattern `I.setOperand(); return &I;`, where it is most obviously applicable. Differential Revision: https://reviews.llvm.org/D73803	2020-02-03 19:00:17 +01:00
Nikita Popov	e6c9ab4fb7	[InstCombine] Rename worklist methods; NFC This renames Worklist.AddDeferred() to Worklist.add() and Worklist.Add() to Worklist.push(). The intention here is that Worklist.add() should be the go-to method for explicit worklist management, while the raw Worklist.push() is mostly for InstCombine internals. I will then migrate uses of Worklist.push() to Worklist.add() in followup changes. As suggested by spatel on D73411 I'm also changing the remaining method names to lowercase first character, in line with current coding standards. Differential Revision: https://reviews.llvm.org/D73745	2020-02-03 18:56:51 +01:00
Nikita Popov	a59954051e	[InstCombine] Fix unused variable warning; NFC	2020-02-03 18:47:38 +01:00
Teresa Johnson	bed4d9c897	[ThinLTO] More efficient export computation (NFC) Summary: A recent change to enable more importing of global variables with references exposed some efficiency issues with export computation. See D73724 for more information and detailed analysis. The first was specific to variable importing. The code was marking every copy of a referenced value (from possibly thousands of files in the case of linkonce_odr) as exported, and we only need to mark the copy in the module containing the variable def being imported as exported. The reason is that this is tracking what values are newly exported as a result of importing. Anything that was defined in another module and simply used in the exporting module is already exported, and would have been identified by the caller (e.g. the LTO API implementations). The second issue is that the code was re-adding previously exported values (along with all references). It is easy to identify when a variable was already imported into the same module (via the import list insert call return value), and we already did this for function importing. However, what we weren't doing for either function or variable importing was avoiding a re-insertion when it was previously exported into a different importing module. The reason we couldn't do this is there was no way of telling from the export list whether it was previously inserted there because its definition was exported (in which case we already marked all its references as exported) from when it was inserted there because it was referenced by another exported value (in which case we haven't yet inserted its own references). To address this we can restructure the way the export list is constructed. This patch only adds the actual imported definitions (variable or function) to the export list for its module during the import computation. After import computation is complete, where we were already post-processing the export list we go ahead and add all references made by those exported values to the export list. These changes speed up the thin link not only with constant variable importing enabled, but also without (due to the efficiency improvement in function importing). Some thin link user time measurements for one large application, average of 5 runs: With constant variable importing enabled: - without this patch: 479.5s - with this patch: 74.6s Without constant variable importing enabled: - without this patch: 80.6s - with this patch: 70.3s Note I have not re-enabled constant variable importing here, as I would like to do additional compile time measurements with these fixes first. Reviewers: evgeny777 Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73851	2020-02-03 09:15:33 -08:00
Sanjay Patel	e78fb556c5	[InstCombine] reassociate splatted vector ops bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp This patch depends on the splat analysis enhancement in D73549. See the test with comment: ; Negative test - mismatched splat elements ...as the motivation for that first patch. The motivating case for reassociating splatted ops is shown in PR42174: https://bugs.llvm.org/show_bug.cgi?id=42174 In that example, a slight change in order-of-associative math results in a big difference in IR and codegen. This patch gets all of the unnecessary shuffles out of the way, but doesn't address the potential scalarization (see D50992 or D73480 for that). Differential Revision: https://reviews.llvm.org/D73703	2020-02-03 09:08:36 -05:00
Sam Parker	2663a25fad	[JumpThreading] Half the duplicate threshold at Oz Duplicating instructions can lead to code size increases but using a threshold of 3 is good for reducing code size. Differential Revision: https://reviews.llvm.org/D72916	2020-02-03 08:40:20 +00:00
Johannes Doerfert	26d02b0f28	[Attributor] AANoRecurse check all call sites for `norecurse` If all call sites are in `norecurse` functions we can derive `norecurse` as the ReversePostOrderFunctionAttrsPass does. This should make ReversePostOrderFunctionAttrsLegacyPass obsolete once the Attributor is enabled. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D72017	2020-02-02 23:57:17 -06:00
Johannes Doerfert	368f7ee7a5	[Attributor] Propagate known information from `checkForAllCallSites` If we know that all call sites have been processed we can derive an early fixpoint. The use in this patch is likely not to trigger right now but a follow up patch will make use of it. Reviewed By: uenoku, baziotis Differential Revision: https://reviews.llvm.org/D72016	2020-02-02 23:57:17 -06:00
Juneyoung Lee	578d2e2cb1	[llvm-extract] Add -keep-const-init commandline option Summary: This adds -keep-const-init option to llvm-extract which preserves initializers of used global constants. For example: ``` $ cat a.ll @g = constant i32 0 define i32 @f() { %v = load i32, i32* @g ret i32 %v } $ llvm-extract --func=f a.ll -S -o - @g = external constant i32 define i32 @f() { .. } $ llvm-extract --func=f a.ll -keep-const-init -S -o - @g = constant i32 0 define i32 @f() { .. } ``` This option is useful in checking whether a function that uses a constant global is optimized correctly. Reviewers: jsji, MaskRay, david2050 Reviewed By: MaskRay Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73833	2020-02-03 14:30:28 +09:00
Johannes Doerfert	342357c568	[Inliner][NoAlias] Use call site attributes too If we had `noalias` on an argument the inliner created alias scope metadata already. However, the call site `noalias` annotation was not considered. Since the Attributor can derive such call site `noalias` annotation we should treat them the same as argument annotations. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D73528	2020-02-02 23:21:29 -06:00
Tyker	a7bbe45a3e	Build assume from call Fix attempt this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call.	2020-02-02 19:43:36 +01:00
Tyker	7cb5d96fbe	Revert "[WIP] Build assume from call" casued buildbot failure This reverts commit `8ebe001553`.	2020-02-02 18:35:19 +01:00
Tyker	8ebe001553	[WIP] Build assume from call Summary: this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72475	2020-02-02 18:15:50 +01:00
Tyker	c2d0336208	Revert "[WIP] Build assume from call" caused build bot failure This reverts commit `780d2c532f`.	2020-02-02 18:09:06 +01:00
Tyker	780d2c532f	[WIP] Build assume from call Summary: this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72475	2020-02-02 17:54:31 +01:00
Tyker	ad8ffc5010	Revert "[WIP] Build assume from call" This reverts commit `355e4bfd78`.	2020-02-02 17:49:23 +01:00
Tyker	355e4bfd78	[WIP] Build assume from call Summary: this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72475	2020-02-02 17:17:46 +01:00
Tyker	0adda3df92	Revert "[WIP] Build assume from call" This reverts commit `2ff5602cb5`.	2020-02-02 15:05:33 +01:00
Tyker	d431c5d9af	Revert "[NFC] Factor out function to detect if an attribute has an argument." This reverts commit `ff1b9add2f`.	2020-02-02 15:03:06 +01:00
Tyker	ff1b9add2f	[NFC] Factor out function to detect if an attribute has an argument. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72884	2020-02-02 14:50:31 +01:00
Tyker	2ff5602cb5	[WIP] Build assume from call Summary: this is part of the implementation of http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html this patch gives the basis of building an assume to preserve all information from an instruction and add support for building an assume that preserve the information from a call. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: mgrang, fhahn, mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72475	2020-02-02 14:50:31 +01:00
Fangrui Song	ba3a1774a9	[Transforms] Simplify with make_early_inc_range	2020-02-02 00:54:32 -08:00
Artur Pilipenko	34547ac959	NFC. Comments cleanup in DSE::memoryIsNotModifiedBetween Separated from https://reviews.llvm.org/D68006 review.	2020-01-31 15:22:33 -08:00
Nikita Popov	ff17da3f75	[InstCombine] Push negation through multiply (PR44234) Fixes https://bugs.llvm.org/show_bug.cgi?id=44234 by adding multiply support to freelyNegateValue(). Only one of the operands needs to be negatible, so this still fits within the framework. Differential Revision: https://reviews.llvm.org/D73410	2020-01-31 20:58:55 +01:00
Hiroshi Yamauchi	ac8da31a0f	[PGO][PGSO] Handle MBFIWrapper Some code gen passes use MBFIWrapper to keep track of the frequency of new blocks. This was not taken into account and could lead to incorrect frequencies as MBFI silently returns zero frequency for unknown/new blocks. Add a variant for MBFIWrapper in the PGSO query interface. Depends on D73494.	2020-01-31 09:36:55 -08:00
Sanjay Patel	bc1148e7bc	[PATCH] D73727: [SLP] drop poison-generating flags for shuffle reduction ops (PR44536) We may calculate reassociable math ops in arbitrary order when creating a shuffle reduction, so there's no guarantee that things like 'nsw' hold on those intermediate values. Drop all poison-generating flags for safety. This change is limited to shuffle reductions because I don't think we have a problem in the general case (where we intersect flags of each scalar op that goes into a vector op), but if there's evidence of other cases being wrong, we can extend this fix to cover those cases. https://bugs.llvm.org/show_bug.cgi?id=44536 Differential Revision: https://reviews.llvm.org/D73727	2020-01-31 09:54:35 -05:00
Nikita Popov	480391035c	[InstCombine] Remove unnecessary worklist add; NFCI Again, this will already be added by IRBuilder.	2020-01-30 23:24:59 +01:00
Nikita Popov	90b5ed996b	[InstCombine] Remove unnecessary worklist add; NFCI The IRBuilder will automatically add instructions to the worklist. Adding it manually is unnecessary, but may mess up worklist order.	2020-01-30 23:06:28 +01:00
Nikita Popov	cad91074a6	[InstCombine] Create new insts in foldICmpEqIntrinsicWithConstant; NFCI In line with current conventions, create new instructions rather than modify two operands in place and performing manual worklist management. This should be NFC apart from possible worklist order changes.	2020-01-30 23:03:16 +01:00
Whitney Tsang	e44f4a8a54	[LoopFusion] Move instructions from FC1.GuardBlock to FC0.GuardBlock and from FC0.ExitBlock to FC1.ExitBlock when proven safe. Summary: Currently LoopFusion give up when the second loop nest guard block or the first loop nest exit block is not empty. For example: if (0 < N) { for (int i = 0; i < N; ++i) {} x+=1; } y+=1; if (0 < N) { for (int i = 0; i < N; ++i) {} } The above example should be safe to fuse. This PR moves instructions in FC1 guard block (e.g. y+=1;) to FC0 guard block, or instructions in FC0 exit block (e.g. x+=1;) to FC1 exit block, which then LoopFusion is able to fuse them. Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel, bmahjour, etiotto Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D73641	2020-01-30 18:02:22 +00:00
David Stenberg	b54a8ec1bc	[InstCombine][DebugInfo] Fold constants wrapped in metadata Summary: When constant folding, constants that are wrapped in metadata were not folded. This could lead to dbg.values being the only user of a constant expression, due to the non-dbg uses having been rewritten, resulting in the constant later on being removed by some other pass. This occurred with the attached test case, in which the non-rewritten GEP in the dbg.value intrinsic was later on removed by globalopt. This patch makes the code look through metadata and fold such constants. I guess that we in the future may want to allow dbg.values using GEPs and other constant expressions to be emittable even if there are no non-dbg uses, but for example SelectionDAG does not support that. Reviewers: jmorse, aprantl, vsk, davide Reviewed By: aprantl, vsk, davide Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D73630	2020-01-30 15:50:16 +01:00
Piotr Sobczak	dd7148822b	[InstCombine][AMDGPU] Trim components of s_buffer_load Summary: Add trimming of unused components of s_buffer_load. For s_buffer_load and unformatted buffer_load also trim unused components at the beginning of vector and update offset accordingly. Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71785	2020-01-30 10:48:25 +01:00
Michael Forster	676c29694c	Inline debug variable. Summary: In a release build this variable becomes unused and may break the build with `-Werror,-Wunused-variable`. Reviewers: gribozavr2, jdoerfert, sstefan1 Reviewed By: gribozavr2 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73683	2020-01-30 10:29:24 +01:00
Nikita Popov	8058196677	[InstCombine] Process newly inserted instructions in the correct order InstCombine operates on the basic premise that the operands of the currently processed instruction have already been simplified. It achieves this by pushing instructions to the worklist in reverse program order, so that instructions are popped off in program order. The worklist management in the main combining loop also makes sure to uphold this invariant. However, the same is not true for all the code that is performing manual worklist management. The largest problem (addressed in this patch) are instructions inserted by InstCombine's IRBuilder. These will be pushed onto the worklist in order of insertion (generally matching program order), which means that a) the users of the original instruction will be visited first, as they are pushed later in the main loop and b) the newly inserted instructions will be visited in reverse program order. This causes a number of problems: First, folds operate on instructions that have not had their operands simplified, which may result in optimizations being missed (ran into this in https://reviews.llvm.org/D72048#1800424, which was the original motivation for this patch). Additionally, this increases the amount of folds InstCombine has to perform, both within one iteration, and by increasing the number of total iterations. This patch addresses the issue by adding a Worklist.AddDeferred() method, which is used for instructions inserted by IRBuilder. These will only be added to the real worklist after the combine finished, and in reverse order, so they will end up processed in program order. I should note that the same should also be done to nearly all other uses of Worklist.Add(), but I'm starting with just this occurrence, which has by far the largest test fallout. Most of the test changes are due to https://bugs.llvm.org/show_bug.cgi?id=44521 or other cases where we don't canonicalize something. These are neutral. One regression has been addressed in D73575 and D73647. The remaining regression in an shl+sdiv fold can't really be fixed without dropping another transform, but does not seem particularly problematic in the first place. Differential Revision: https://reviews.llvm.org/D73411	2020-01-30 09:40:10 +01:00
Francesco Petrogalli	623cff81fe	[llvm][VectorUtils] Tweak VFShape for scalable vector functions. Summary: This patch makes sure that the field VFShape.VF is greater than zero when demangling the vector function name of scalable vector functions encoded in the "vector-function-abi-variant" attribute. This change is required to be able to provide instances of VFShape that can be used to query the VFDatabase for the vectorization passes, as such passes always require a positive value for the Vectorization Factor (VF) needed by the vectorization process. It is not possible to extract the value of VFShape.VF from the mangled name of scalable vector functions, because it is encoded as `x`. Therefore, the VFABI demangling function has been modified to extract such information from the IR declaration of the vector function, under the assumption that _all_ vectors in the signature of the vector function have the same number of lanes. Such assumption is valid because it is also assumed by the Vector Function ABI specifications supported by the demangling function (x86, AArch64, and LLVM internal one). The unit tests that demangle scalable names have been modified by adding the IR module that carries the declaration of the vector function name being demangled. In particular, the demangling function fails in the following cases: 1. When the declaration of the scalable vector function is not present in the module. 2. When the value of VFSHape.VF is not greater than 0. Reviewers: jdoerfert, sdesmalen, andwar Reviewed By: jdoerfert Subscribers: mgorny, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73286	2020-01-30 05:53:56 +00:00
Johannes Doerfert	89c2e733e8	[Attributor] Pointer privatization attribute (argument promotion) A pointer is privatizeable if it can be replaced by a new, private one. Privatizing pointer reduces the use count, interaction between unrelated code parts. This is a first step towards replacing argument promotion. While we can already handle recursion (unlike argument promotion!) we are restricted to stack allocations for now because we do not analyze the uses in the callee. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68852	2020-01-29 21:31:04 -06:00
Johannes Doerfert	791c9f1145	[Attributor] Fix TODO to avoid recomputation of results The helpers AAReturnedFromReturnedValues and AACallSiteReturnedFromReturned are useful not only to avoid code duplication but also to avoid recomputation of results. If we have N call sites we should not recompute the function return information N times but once. These are mostly straightforward usages with some minor improvements on the helpers and addition of a new one (IRPosition::getAssociatedType) that knows about function return types.	2020-01-29 19:24:34 -06:00
Nikita Popov	e086e23024	[InstCombine] Support non-splat vectors in icmp eq + add/sub fold For the icmp eq (add X, C1), C2 => icmp eq X, C2-C1 icmp eq (sub C1, X), C2 => icmp eq X, C1-C2 folds, this allows C1 to be non-splat and contain undefs. C2 is still splat, due to the structure of the code. This is to address the remaining part of the regression in D73411, where demanded element analysis replaces some elements with undef. Differential Revision: https://reviews.llvm.org/D73647	2020-01-29 20:56:58 +01:00
Elia Geretto	ab2300bc15	[PassManagerBuilder] Remove global extension when a plugin is unloaded This commit fixes PR39321. GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction. This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet. Differential Revision: https://reviews.llvm.org/D71959	2020-01-29 16:15:45 +00:00
Simon Pilgrim	79748add70	Fix MSVC lamdba default capture mode warning. NFCI.	2020-01-29 15:47:04 +00:00
Whitney Tsang	da58e68fdf	[LoopFusion] Move instructions from FC1.Preheader to FC0.Preheader when proven safe. Summary: Currently LoopFusion give up when the second loop nest preheader is not empty. For example: for (int i = 0; i < 100; ++i) {} x+=1; for (int i = 0; i < 100; ++i) {} The above example should be safe to fuse. This PR moves instructions in FC1 preheader (e.g. x+=1; ) to FC0 preheader, which then LoopFusion is able to fuse them. Reviewer: kbarton, Meinersbur, jdoerfert, dmgreen, fhahn, hfinkel, bmahjour, etiotto Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D71821	2020-01-29 15:06:11 +00:00
Sanjay Patel	87f6314f8c	[InstCombine] canonicalize splat shuffle after cmp cmp (splat V1, M), SplatC --> splat (cmp V1, SplatC'), M As discussed in PR44588: https://bugs.llvm.org/show_bug.cgi?id=44588 ...we try harder to push shuffles after binops than after compares. This patch handles the special (but presumably most common case) of splat shuffles. If both operands are splats, then we can do the comparison on the non-splat inputs followed by splat of the compare. That should take care of the regression noted in D73411. There's another potential fold requested in PR37463 to scalarize the compare, but that's another patch (and it's not clear if we can do that without the ability to undo it later): https://bugs.llvm.org/show_bug.cgi?id=37463 Differential Revision: https://reviews.llvm.org/D73575	2020-01-29 08:34:29 -05:00
Johannes Doerfert	76843ba37f	[Attributor][Fix] Initialize unused but loaded variable This hopefully un-breaks: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/38333	2020-01-28 23:52:16 -06:00
Johannes Doerfert	ea5fabe60c	[Attributor] Reuse existing logic to avoid duplication There was a TODO in AAValueConstantRangeArgument to reuse AAArgumentFromCallSiteArguments. We now do this by allowing new States to be build from the bestState.	2020-01-28 23:45:59 -06:00
Johannes Doerfert	224085409d	[Attributor][FIX] Treat invalidated attributes as changed If we invalidate an attribute we need to inform all dependent ones even if the fixpoint state is not invalid. Before we only continued invalidation if the fixpoint state was invalid, now we signal a change in case the fixpoint state is valid. The test case was already included in D71620 but the problem was hiding because it only manifested with the old PM (for that input).	2020-01-28 23:40:41 -06:00
Johannes Doerfert	53992c7bf7	[Attributor] Modularize AANoAliasCallSiteArgument to simplify extensions This patch modularizes the way we check for no-alias call site arguments by putting the existing logic into helper functions. The reasoning was not changed but special cases for readonly/readnone were added.	2020-01-28 23:39:29 -06:00
Johannes Doerfert	24ae77eebf	[Attributor] Mark a non-defined `null` pointer as `noalias` If `null` is not defined we cannot access it, hence the pointer is `noalias`. While this is not helpful on it's own it simplifies later deductions that can skip over already known `noalias` pointers in certain situations.	2020-01-28 23:09:37 -06:00
Johannes Doerfert	6626d1b7c0	[Attributor][NFC] Remove ugly and unneeded cast	2020-01-28 22:54:31 -06:00
Johannes Doerfert	02bd8180fc	[Attributor][NFC] Improve debug messages	2020-01-28 22:53:19 -06:00
Johannes Doerfert	b6dbd0f71f	[Attributor][NFC] Internalize helper function	2020-01-28 22:50:34 -06:00
Vedant Kumar	8359511c62	[CodeExtractor] Remove stale llvm.assume calls from extracted region During extraction, stale llvm.assume handles may be retained in the original function. The setup is: 1) CodeExtractor unregisters assumptions in the blocks that are to be extracted. 2) Extraction happens. There are now two functions: f1 and f1.extracted. 3) Leftover assumptions in f1 (/not/ removed as they were not in the set of blocks to be extracted) now have affected-value llvm.assume handles in f1.extracted. When assumptions for a value used in f1 are looked up, ValueTracking can assert as some of the handles are in the wrong function. To fix this, simply erase the llvm.assume calls in the extracted function. Alternatives include flushing the assumption cache in the original function, or walking all values used in the original function to prune stale affected-value handles. Both seem more expensive. Testing: check-llvm, LNT run with -mllvm -hot-cold-split enabled rdar://58460728	2020-01-28 17:18:01 -08:00
Eli Friedman	2f6b9edfa8	[AliasAnalysis] Add missing FMRB_* enums. Previously, the enums didn't account for all the possible cases, which could cause misleading results (particularly for a "switch" on FunctionModRefBehavior). Fixes regression in polly from recent patch to add writeonly to memset. While I'm here, also fix a few dubious uses of the FMRB_* enum values. Differential Revision: https://reviews.llvm.org/D73154	2020-01-28 15:47:08 -08:00
Benjamin Kramer	adcd026838	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Whitney Tsang	cd0cff4392	[NFCI][LoopUnrollAndJam] Minor changes. Summary: 1. Add assertions. 2. Verify more analyses. These changes are moved out of https://reviews.llvm.org/D73129 to simplify that review. Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto Reviewed By: dmgreen Subscribers: fhahn, hiraditya, zzheng, llvm-commits, prithayan, anhtuyen Tag: LLVM Differential Revision: https://reviews.llvm.org/D73204	2020-01-28 20:24:23 +00:00
Petr Hosek	127d3abf25	[Instrumentation] Set hidden visibility for the bias variable We have to avoid using a GOT relocation to access the bias variable, setting the hidden visibility achieves that. Differential Revision: https://reviews.llvm.org/D73529	2020-01-28 12:07:03 -08:00
Sanjay Patel	7a717d82ff	[InstCombine] refactor foldVectorCmp(); NFC We can handle other patterns here as shown in PR44588.	2020-01-28 14:40:48 -05:00
Florian Hahn	5d0ffbeb4d	[Matrix] Mark expressions shared between multiple remarks. This patch adds support for explicitly highlighting sub-expressions shared by multiple leaf nodes. For example consider the following code %shared.load = tail call <8 x double> @llvm.matrix.columnwise.load.v8f64.p0f64(double* %arg1, i32 %stride, i32 2, i32 4), !dbg !10, !noalias !10 %trans = tail call <8 x double> @llvm.matrix.transpose.v8f64(<8 x double> %shared.load, i32 2, i32 4), !dbg !10 tail call void @llvm.matrix.columnwise.store.v8f64.p0f64(<8 x double> %trans, double* %arg3, i32 10, i32 4, i32 2), !dbg !10 %load.2 = tail call <30 x double> @llvm.matrix.columnwise.load.v30f64.p0f64(double* %arg3, i32 %stride, i32 2, i32 15), !dbg !10, !noalias !10 %mult = tail call <60 x double> @llvm.matrix.multiply.v60f64.v8f64.v30f64(<8 x double> %trans, <30 x double> %load.2, i32 4, i32 2, i32 15), !dbg !11 tail call void @llvm.matrix.columnwise.store.v60f64.p0f64(<60 x double> %mult, double* %arg2, i32 10, i32 4, i32 15), !dbg !11 We have two leaf nodes (the 2 stores) and the first store stores %trans which is also used by the matrix multiply %mult. We generate separate remarks for each leaf (stores). To denote that parts are shared, the shared expressions are marked as shared (), with a reference to the other remark that shares it. The operation summary also denotes the shared operations separately. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D72526	2020-01-28 09:27:55 -08:00
Florian Hahn	d1f849a284	[LV] Hoist code to mark conditional assumes as dead to caller (NFC). This is a follow-up suggested in D73423. It is sufficient to just add the conditional assumes to DeadInstructions once.	2020-01-28 08:50:44 -08:00
Michael Liao	9c54b42338	Fix warning of `-Wcast-qual`. NFC.	2020-01-28 11:36:43 -05:00
Florian Hahn	a911fef3dd	[LV] Do not try to sink dead instructions. Dead instructions do not need to be sunk. Currently we try and record the recipies for them, but there are no recipes emitted for them and there's nothing to sink. They can be removed from SinkAfter while marking them for recording. Fixes PR44634. Reviewers: rengolin, hsaito, fhahn, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D73423	2020-01-28 08:28:03 -08:00
Whitney Tsang	78dc64989c	[CodeMoverUtils] Improve IsControlFlowEquivalent. Summary: Currently IsControlFlowEquivalent determine if two blocks are control flow equivalent by checking if A dominates B and B post dominates A. There exists blocks that are control flow equivalent even if they don't satisfy the A dominates B and B post dominates A condition. For example, if (cond) A if (cond) B In the PR, we determine if two blocks are control flow equivalent by also checking if the two sets of conditions A and B depends on are equivalent. Reviewer: jdoerfert, Meinersbur, dmgreen, etiotto, bmahjour, fhahn, hfinkel, kbarton Reviewed By: fhahn Subscribers: hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D71578	2020-01-28 14:18:00 +00:00
Florian Hahn	62e228f8fd	[Matrix] Add info about number of operations to remarks. This patch updates the remark to also include a summary of the number of vector operations generated for each matrix expression. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D72480	2020-01-27 17:43:39 -08:00
Florian Hahn	949294f396	[Matrix] Add optimization remarks for matrix expression. Generate remarks for matrix operations in a function. To generate remarks for matrix expressions, the following approach is used: 1. Collect leafs of matrix expressions (done in RemarkGenerator::getExpressionLeafs). Leafs are lowered matrix instructions without other matrix users (like stores). 2. For each leaf, create a remark containing a linearizied version of the matrix expression. The following improvements will be submitted as follow-ups: * Summarize number of vector instructions generated for each expression. * Account for shared sub-expressions. * Propagate matrix remarks up the inlining chain. The information provided by the matrix remarks helps users to spot cases where matrix expression got split up, e.g. due to inlining not happening. The remarks allow users to address those issues, ensuring best performance. Reviewers: anemet, Gerolf, thegameg, hfinkel, andrew.w.kaylor, LuoYuanke Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D72453	2020-01-27 16:39:29 -08:00
Sanjay Patel	747242af8d	[InstCombine] allow more narrowing of casted select D47163 created a rule that we should not change the casted type of a select when we have matching types in its compare condition. That was intended to help vector codegen, but it also could create situations where we miss subsequent folds as shown in PR44545: https://bugs.llvm.org/show_bug.cgi?id=44545 By using shouldChangeType(), we can continue to get the vector folds (because we always return false for vector types). But we also solve the motivating bug because it's ok to narrow the scalar select in that example. Our canonicalization rules around select are a mess, but AFAICT, this will not induce any infinite looping from the reverse transform (but we'll need to watch for that possibility if committed). Side note: there's a similar use of shouldChangeType() for phi ops just below this diff, and the source and destination types appear to be reversed. Differential Revision: https://reviews.llvm.org/D72733	2020-01-27 16:35:50 -05:00
Sanjay Patel	242fed9d7f	[InstCombine] convert fsub nsz with fneg operand to -(X + Y) This was noted in D72521 - we need to match fneg specifically to consistently handle that pattern along with (-0.0 - X).	2020-01-27 14:49:15 -05:00
Nikita Popov	bcfa0f592f	[InstCombine] Move negation handling into freelyNegateValue() Followup to D72978. This moves existing negation handling in InstCombine into freelyNegateValue(), which make it composable. In particular, root negations of div/zext/sext/ashr/lshr/sub can now always be performed through a shl/trunc as well. Differential Revision: https://reviews.llvm.org/D73288	2020-01-27 20:46:23 +01:00
Teresa Johnson	2f63d549f1	Restore "[LTO/WPD] Enable aggressive WPD under LTO option" This restores `59733525d3` (D71913), along with bot fix `19c76989bb`. The bot failure should be fixed by D73418, committed as `af954e441a`. I also added a fix for non-x86 bot failures by requiring x86 in new test lld/test/ELF/lto/devirt_vcall_vis_public.ll.	2020-01-27 07:55:05 -08:00
Whitney Tsang	2b335e9aae	[LoopUnroll] Remove remapInstruction(). Summary: LoopUnroll can reuse the RemapInstruction() in ValueMapper, or remapInstructionsInBlocks() in CloneFunction, depending on the needs. There is no need to have its own version in LoopUnroll. By calling RemapInstruction() without TypeMapper or Materializer and with Flags (RF_NoModuleLevelChanges \| RF_IgnoreMissingLocals), it does the same as remapInstruction(). remapInstructionsInBlocks() calls RemapInstruction() exactly as described. Looking at the history, I cannot find any obvious reason to have its own version. Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto, foad, aprantl Reviewed By: jdoerfert Subscribers: hiraditya, zzheng, llvm-commits, prithayan, anhtuyen Tag: LLVM Differential Revision: https://reviews.llvm.org/D73277	2020-01-27 15:42:13 +00:00
Guillaume Chatelet	07c9d53266	[Alignment][NFC] Use Align with CreateAlignedLoad Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, bollu Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73449	2020-01-27 10:58:36 +01:00
Guillaume Chatelet	d0a7cc7177	[Alignment][NFC] Use Align with CreateMaskedScatter/Gather Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 This patch shows that CreateMaskedScatter/CreateMaskedGather can only take positive non zero alignment values. Reviewers: courbet Subscribers: hiraditya, llvm-commits, delena Tags: #llvm Differential Revision: https://reviews.llvm.org/D73361	2020-01-27 10:17:14 +01:00
Evgenii Stepanov	1df8549b26	[msan] Instrument x86.pclmulqdq* intrinsics. Summary: These instructions ignore parts of the input vectors which makes the default MSan handling too strict and causes false positive reports. Reviewers: vitalybuka, RKSimon, thakis Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73374	2020-01-24 14:31:06 -08:00
Andy Kaylor	b35b7da460	[PGO] Attach appropriate funclet operand bundles to value profiling instrumentation calls Patch by Chris Chrulski When generating value profiling instrumentation, ensure the call gets the correct funclet token, otherwise WinEHPrepare will turn the call (and all subsequent instructions) into unreachable. Differential Revision: https://reviews.llvm.org/D73221	2020-01-24 11:20:53 -08:00
Simon Pilgrim	abd1927d44	Fix some comment typos. NFC.	2020-01-24 18:18:42 +00:00
Alina Sbirlea	0d90d2457c	[LoopStrengthReduce] Teach LoopStrengthReduce to preserve MemorySSA is available.	2020-01-24 10:13:52 -08:00
Andy Kaylor	a33accde95	[PGO] Early detection regarding whether pgo counter promotion is possible Patch by Chris Chrulski This fixes a problem with the current behavior when assertions are enabled. A loop that exits to a catchswitch instruction is skipped for the counter promotion, however this check was being done after the PGOCounterPromoter tried to collect an insertion point for the exit block. A call to getFirstInsertionPt() on a block that begins with a catchswitch instruction triggers an assertion. This change performs a check whether the counter promotion is possible prior to collecting the ExitBlocks and InsertPts. Differential Revision: https://reviews.llvm.org/D73222	2020-01-24 09:55:41 -08:00
Guillaume Chatelet	805c157e8a	[Alignment][NFC] Deprecate Align::None() Summary: This is a follow up on https://reviews.llvm.org/D71473#inline-647262. There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case. Reviewers: xbolva00, courbet, bollu Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73099	2020-01-24 12:53:58 +01:00
Evgeny Leviant	8973fae195	[WPD] Allow load/save bitcoded index when running opt -wholeprogramdevirt Differential revision: https://reviews.llvm.org/D73094	2020-01-24 00:31:39 -08:00
Andy Kaylor	c467faf23c	[WinEH] Ignore lifetime.end PHI nodes in empty cleanuppads This fixes a bug where a PHI node that is only referenced by a lifetime.end intrinsic in an otherwise empty cleanuppad can cause SimplyCFG to create an SSA violation while removing the empty cleanuppad. Theoretically the same problem can occur with debug intrinsics. Differential Revision: https://reviews.llvm.org/D72540	2020-01-23 18:18:50 -08:00
Teresa Johnson	90e630a95e	Revert "[LTO/WPD] Enable aggressive WPD under LTO option" This reverts commit `59733525d3`. There is a windows sanitizer bot failure in one of the cfi tests that I will need some time to figure out: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/57155/steps/stage%201%20check/logs/stdio	2020-01-23 17:29:24 -08:00
Johannes Doerfert	7ad17e008b	[Attributor] Avoid REQUIRED dependences in favor of OPTIONAL ones When we use information only to short-cut deduction or improve it, we can use OPTIONAL dependences instead of REQUIRED ones to avoid cascading pessimistic fixpoints. We also need to track dependences only when we use assumed information, e.g., we act on assumed liveness information.	2020-01-23 18:42:46 -06:00
Johannes Doerfert	214ed3f676	[Attributor] Record dependences only when necessary If we use assumed information from AAValueSimplify we need to record an OPTIONAL dependence, otherwise we do not.	2020-01-23 18:42:45 -06:00
Johannes Doerfert	5429c82db2	[Attributor][FIX] Avoid dangling pointers during code deletion It can happen that we have instructions in the ToBeDeletedInsts set which are deleted earlier already. To avoid dangling pointers we use weak tracking handles.	2020-01-23 18:42:45 -06:00
Johannes Doerfert	ff6254dc26	[Attributor][FIX] Handle non-pointers when following uses When we follow uses, e.g., in AAMemoryBehavior or AANoCapture, we need to make sure the value is a pointer before we ask for abstract attributes only valid for pointers. This happens because we follow pointers through calls that do not capture but may return the value.	2020-01-23 18:42:45 -06:00
Johannes Doerfert	9dcf889d15	[Attributor][NFC] Do not (try to) simplify void values We might accidentally ask AAValueSimplify to simplify a void value. That can lead to very interesting, and very wrong, results. We now handle this case gracefully.	2020-01-23 18:42:45 -06:00
Alina Sbirlea	1d09174290	[LoopStrengthReduce] Reuse utility method to clean dead instructions. [NFCI] Create a utility wrapper for the RecursivelyDeleteTriviallyDeadInstructions utility method, which sets to nullptr the instructions that are not trivially dead. Use the new method in LoopStrengthReduce. Alternative: add a bool to the same method; this option adds a marginal amount of overhead to the other callers, and the method needs to be updated to return a bool status when it removes/doesn't remove instructions.	2020-01-23 16:27:32 -08:00
Johannes Doerfert	30179d7ecf	[Attributor][FIX][Alignment] Do not report a change if there was none If alignment was manifested but it is actually only as good as the data-layout provided one we should not report it as a change. For testing purposes we still manifest the information.	2020-01-23 18:13:52 -06:00
Johannes Doerfert	e273ac4d88	[Attributor][NFC] Add an assertion	2020-01-23 18:13:52 -06:00
Johannes Doerfert	d07b5a5525	[Attributor][NFC] Fix spelling	2020-01-23 18:13:52 -06:00
Johannes Doerfert	2baf000ecc	[Attributor] `byval` arguments are always `noalias` `byval` introduces a local copy of the argument. That copy cannot alias anything.	2020-01-23 18:13:52 -06:00
Johannes Doerfert	30ae859c69	[Attributor][FIX] Store alignment only holds for the pointer value We accidentally used the store alignment for the value operand as well, which is incorrect and crashed the SPASS application in the test suite.	2020-01-23 18:13:52 -06:00
Teresa Johnson	59733525d3	[LTO/WPD] Enable aggressive WPD under LTO option Summary: Third part in series to support Safe Whole Program Devirtualization Enablement, see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html This patch adds type test metadata under -fwhole-program-vtables, even for classes without hidden visibility. It then changes WPD to skip devirtualization for a virtual function call when any of the compatible vtables has public vcall visibility. Additionally, internal LLVM options as well as lld and gold-plugin options are added which enable upgrading all public vcall visibility to linkage unit (hidden) visibility during LTO. This enables the more aggressive WPD to kick in based on LTO time knowledge of the visibility guarantees. Support was added to all flavors of LTO WPD (regular, hybrid and index-only), and to both the new and old LTO APIs. Unfortunately it was not simple to split the first and second parts of this part of the change (the unconditional emission of type tests and the upgrading of the vcall visiblity) as I needed a way to upgrade the public visibility on legacy WPD llvm assembly tests that don't include linkage unit vcall visibility specifiers, to avoid a lot of test churn. I also added a mechanism to LowerTypeTests that allows dropping type test assume sequences we now aggressively insert when we invoke distributed ThinLTO backends with null indexes, which is used in testing mode, and which doesn't invoke the normal ThinLTO backend pipeline. Depends on D71907 and D71911. Reviewers: pcc, evgeny777, steven_wu, espindola Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71913	2020-01-23 16:09:44 -08:00
Alina Sbirlea	9e66c4ec12	[Utils] Use WeakTrackingVH in vector used as scratch storage. The utility method RecursivelyDeleteTriviallyDeadInstructions receives as input a vector of Instructions, where all inputs are valid instructions. This same vector is used as a scratch storage (per the header comment) to recursively delete instructions. If an instruction is added as an operand of multiple other instructions, it may be added twice, then deleted once, then the second reference in the vector is invalid. Switch to using a Vector<WeakTrackingVH>. This change facilitates a clean-up in LoopStrengthReduction.	2020-01-23 16:04:57 -08:00
Florian Hahn	4ed7355e44	[IPSCCP] Use ParamState for arguments at call sites. We currently use integer ranges to merge concrete function arguments. We use the ParamState range for those, but we only look up concrete values in the regular state. For concrete function arguments that are themselves arguments of the containing function, we can use the param state directly and improve the precision in some cases. Besides improving the results in some cases, this is also a small step towards switching to ValueLatticeElement, by allowing D60582 to be a NFC. Reviewers: efriedma, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D71836	2020-01-23 13:55:42 -08:00
Teresa Johnson	458676db6e	[WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables Summary: First patch to support Safe Whole Program Devirtualization Enablement, see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html Always emit !vcall_visibility metadata under -fwhole-program-vtables, and not just for -fvirtual-function-elimination. The vcall visibility metadata will (in a subsequent patch) be used to communicate to WPD which vtables are safe to devirtualize, and we will optionally convert the metadata to hidden visibility at link time. Subsequent follow on patches will help enable this by adding vcall_visibility metadata to the ThinLTO summaries, and always emit type test intrinsics under -fwhole-program-vtables (and not just for vtables with hidden visibility). In order to do this safely with VFE, since for VFE all vtable loads must be type checked loads which will no longer be the case, this patch adds a new "Virtual Function Elim" module flag to communicate to GlobalDCE whether to perform VFE using the vcall_visibility metadata. One additional advantage of using the vcall_visibility metadata to drive more WPD at LTO link time is that we can use the same mechanism to enable more aggressive VFE at LTO link time as well. The link time option proposed in the RFC will convert vcall_visibility metadata to hidden (aka linkage unit visibility), which combined with -fvirtual-function-elimination will allow it to be done more aggressively at LTO link time under the same conditions. Reviewers: pcc, ostannard, evgeny777, steven_wu Subscribers: mehdi_amini, Prazek, hiraditya, dexonsmith, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71907	2020-01-23 11:36:01 -08:00
Alina Sbirlea	6770de9b8d	[LoopIdiomRecognize] Teach LoopIdiomRecognize to preserve MemorySSA.	2020-01-23 11:31:12 -08:00
Alina Sbirlea	a0f627d584	[IndVarSimplify] Fix for MemorySSA preserve.	2020-01-23 11:06:16 -08:00
Justin Bogner	b81a337be7	[LoopUnroll] Avoid UB when converting from WeakVH to `Value ` Calling `operator` on a WeakVH with a null value yields a null reference, which is UB. Avoid this by implicitly converting the WeakVH to a `Value *` rather than dereferencing and then taking the address for the type conversion. Differential Revision: https://reviews.llvm.org/D73280	2020-01-23 10:36:39 -08:00
Guillaume Chatelet	59f95222d4	[Alignment][NFC] Use Align with CreateAlignedStore Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, bollu Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73274	2020-01-23 17:34:32 +01:00
Kazu Hirata	41784bed01	Revert "Resubmit: [JumpThreading] Thread jumps through two basic blocks" This reverts commit `53b68e676f`. Our internal tests are showing breakage with this patch.	2020-01-23 06:34:03 -08:00
Fedor Sergeev	2f6987ba61	[LoopRotate] add ability to repeat loop rotation until non-deoptimizing exit is found In case of loops with multiple exit where all-but-one exit are deoptimizing it might happen that the first rotation will end up with latch having a deoptimizing exit. This makes the loop unsuitable for trip-count analysis (say, getLoopEstimatedTripCount) as well as for loop transformations that know how to handle multple deoptimizing exits. It pretty much means that canonical form in multple-deoptimizing-exits case should be with non-deoptimizing exit at latch. Teach loop-rotation to reach this canonical form by repeating rotation. -loop-rotate-multi option introduced to control this behavior, currently disabled by default. Reviewers: skatkov, asbirlea, reames, fhahn Reviewed By: skatkov Tags: #llvm Differential Revision: https://reviews.llvm.org/D73058	2020-01-23 15:56:24 +03:00
Guillaume Chatelet	279fa8e006	[Alignement][NFC] Deprecate untyped CreateAlignedLoad Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73260	2020-01-23 13:34:32 +01:00
Daniil Suchkov	4a8dbc617d	[SSAUpdater] Don't call ValueIsRAUWd upon single use replacement It is incorrect to call ValueHandleBase::ValueIsRAUWd when only one use is replaced since it simply violates semantics of the callback and leads to bugs like PR44320. Previously this call was used specifically to keep LICM's cache of AliasSetTrackers up to date across passes (as PR36801 showed, even for that purpose it didn't work properly), but since LICM doesn't have that cache anymore, we can safely remove this incorrect call with no repercussions. This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44320 Reviewers: asbirlea, fhahn, efriedma, reames Reviewed-By: asbirlea Differential Revision: https://reviews.llvm.org/D73089	2020-01-23 15:53:53 +07:00
Daniil Suchkov	6fc9e60149	NFC. Remove obsolete SimpleAnalysis infrastructure Apparently cache of AliasSetTrackers held by LICM was the only user of SimpleAnalysis infrastructure. Now, given that we no longer have that cache, this infrastructure is obsolete and, taking into account its nature, we don't want any new solutions to be based on it. Reviewers: asbirlea, fhahn, efriedma, reames Reviewed-By: asbirlea Differential Revision: https://reviews.llvm.org/D73085	2020-01-23 13:58:30 +07:00
Daniil Suchkov	53a28bd891	[LICM] NFC. Remove AST caching infrastructure Since LICM doesn't use AST caching any more (see D73081), this infrastructure is now obsolete and we can remove it. Reviewers: asbirlea, fhahn, efriedma, reames Reviewed-By: asbirlea Differential Revision: https://reviews.llvm.org/D73084	2020-01-23 12:33:50 +07:00
Florian Hahn	f14f2a8568	[LV] Fix predication for branches with matching true and false succs. Currently due to the edge caching, we create wrong predicates for branches with matching true and false successors. We will cache the condition for the edge from the true successor, and then lookup the same edge (src and dst are the same) for the edge to the false successor. If both successors match, the condition should always be true. At the moment, we cannot really create constant VPValues, but we can just create a true condition as X \| !X. Later passes will clean that up. Fixes PR44488. Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D73079	2020-01-22 18:34:11 -08:00
Jonas Devlieghere	cf2b498d28	[llvm/Transforms] Fix warning: private field 'MSSA' is not used	2020-01-22 18:07:53 -08:00
Alina Sbirlea	adc4faf532	[IndVarSimplify] Teach IndVarSimplify to preserve MemorySSA.	2020-01-22 16:33:17 -08:00
Alina Sbirlea	b5b6126d97	[IndVarSimplify] Cleanup spaces and reduce variable scope [NFCI] Minor clean-ups + clang-format.	2020-01-22 15:32:20 -08:00
Alina Sbirlea	6baf31b7c1	[LoopIdiomRecognize] Reduce variable scope. [NFCI]	2020-01-22 15:30:08 -08:00
Nikita Popov	0b83c5a78f	[InstCombine] Combine neg of shl of sub (PR44529) Fixes https://bugs.llvm.org/show_bug.cgi?id=44529. We already have a combine to sink a negation through a left-shift, but it currently only works if the shift operand is negatable without creating any instructions. This patch introduces freelyNegateValue() as a more powerful extension of dyn_castNegVal(), which allows negating a value as long as this doesn't end up increasing instruction count. Specifically, this patch adds support for negating A-B to B-A. This mechanism could in the future be extended to handle general negation chains that a) start at a proper 0-X negation and b) only require one operand to be freely negatable. This would end up as a weaker form of D68408 aimed at the most obviously profitable subset that eliminates a negation entirely. Differential Revision: https://reviews.llvm.org/D72978	2020-01-22 23:03:58 +01:00
Nikita Popov	efba7ed05e	[PatternMatch] Make m_c_ICmp swap the predicate (PR42801) This addresses https://bugs.llvm.org/show_bug.cgi?id=42801. The m_c_ICmp() matcher is changed to provide the swapped predicate if the operands are swapped. Existing uses of m_c_ICmp() fall in one of two categories: Working on equality predicates only, where swapping is irrelevant. Or performing a manual swap, in which case this patch removes it. The only exception is the foldICmpWithLowBitMaskedVal() fold, which does not swap the predicate, and instead reasons about whether a swap occurred or not for each predicate. Getting the swapped predicate allows us to merge the logic for pairs of predicates, instead of duplicating it. Differential Revision: https://reviews.llvm.org/D72976	2020-01-22 22:56:26 +01:00
Alina Sbirlea	efb130fc93	[LoopDeletion] Teach LoopDeletion to preserve MemorySSA if available. If MemorySSA analysis is analysis, LoopDeletion now preserves it.	2020-01-22 11:38:38 -08:00
Sanjay Patel	0ade2abdb0	[InstCombine] fneg(X + C) --> -C - X This is 1 of the potential folds uncovered by extending D72521. We don't seem to do this in the backend either (unless I'm not seeing some target-specific transform). icc and gcc (appears to be target-specific) do this transform. Differential Revision: https://reviews.llvm.org/D73057	2020-01-22 09:48:43 -05:00
Guillaume Chatelet	0957233320	[Alignment][NFC] Use Align with CreateMaskedStore Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73106	2020-01-22 11:04:39 +01:00
Daniil Suchkov	7bdc83f340	[LICM] Don't cache AliasSetTrackers when run under legacy PM Summary: This is the first step towards complete removal of AST caching from LICM. Attempts to keep LICM's AST cache up to date across passes can lead to miscompiles like this one: https://bugs.llvm.org/show_bug.cgi?id=44320. LICM has already switched to using MemorySSA to do sinking and hoisting and only builds an AliasSetTracker on demand for the promoteToScalars step, without caching it from one LICM instance to the next. Given this, we don't have compile-time reasons to keep AST caching any more. The only scenario where the caching would be used currently is when using the LegacyPassManager and setting -enable-mssa-loop-dependency=false. This switch should help us to surface any possible issues that may arise along this way, also it turns subsequent removal of AST caching into NFC. Reviewers: asbirlea, fhahn, efriedma, reames Reviewed By: asbirlea Subscribers: hiraditya, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73081	2020-01-22 13:16:45 +07:00
Andrei Elovikov	e1d6d36852	[SLP] Don't allow Div/Rem as alternate opcodes Summary: We don't have control/verify what will be the RHS of the division, so it might happen to be zero, causing UB. Reviewers: Vasilis, RKSimon, ABataev Reviewed By: ABataev Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie Tags: #llvm Differential Revision: https://reviews.llvm.org/D72740	2020-01-21 15:21:17 -08:00
Florian Hahn	f42994f228	[Matrix] Hide and describe matrix-propagate-shape option.	2020-01-21 14:28:47 -08:00
Guillaume Chatelet	bc8a1ab26f	[Alignment][NFC] Use Align with CreateMaskedLoad Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73087	2020-01-21 14:13:22 +01:00
Sanjay Patel	7bee94410c	[InstCombine] form copysign from select of FP constants (PR44153) This should be the last step needed to solve the problem in the description of PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 If we're casting an FP value to int, testing its signbit, and then choosing between a value and its negated value, that's a complicated way of saying "copysign": (bitcast X) < 0 ? -TC : TC --> copysign(TC, X) Differential Revision: https://reviews.llvm.org/D72643	2020-01-20 10:51:14 -05:00
Guillaume Chatelet	46b9563cf6	[Alignment][NFC] Use Align with CreateElementUnorderedAtomicMemCpy Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, nicolasvasilache Subscribers: hiraditya, jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73041	2020-01-20 15:39:45 +01:00
Evgeniy Brevnov	af7e158872	[LV] Vectorizer should adjust trip count in profile information Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly. Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov Reviewed By: Ayal, DaniilSuchkov Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67905	2020-01-20 18:36:28 +07:00
Evgeniy Brevnov	cfe97681cd	[NFC][LoopUtils] Minor change in comment according to review D71990.	2020-01-20 17:10:10 +07:00
Evgeniy Brevnov	10357e1c89	[LoopUtils] Better accuracy for getLoopEstimatedTripCount. Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2. Reviewers: Ayal, fhahn Reviewed By: Ayal Subscribers: mgorny, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71990	2020-01-20 16:58:07 +07:00
Sjoerd Meijer	93175a5caa	[IndVarSimplify][LoopUtils] rewriteLoopExitValues. NFCI This moves `rewriteLoopExitValues()` from IndVarSimplify to LoopUtils thus making it a generic loop utility function. This allows to rewrite loop exit values by just calling this function without running the whole IndVarSimplify pass. We use this in D72714 to rematerialise the iteration count in exit blocks, so that we can clean-up loop update expressions inside the hardware-loops later. Differential Revision: https://reviews.llvm.org/D72602	2020-01-20 09:05:00 +00:00
Matt Arsenault	a4451d88ee	Consolidate internal denormal flushing controls Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls. - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features - NVPTX uses the nvptx-f32ftz attribute - ARM directly uses the denormal-fp-math attribute - Other targets indirectly use denormal-fp-math in one DAGCombine - cl-denorms-are-zero has a corresponding denorms-are-zero attribute AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name). Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere. Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag. -cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fp-denormal-math-f32=ieee or preserve-sign rather than the old attributes. Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero. This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation. The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attribute are now emitted. A future change will switch this and remove the subtarget features.	2020-01-17 20:09:53 -05:00
Alina Sbirlea	9f6c6ee6b9	[MemDepAnalysis/VNCoercion] Move static method to its only use. [NFCI] Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize does not have or use any info specific to MemoryDependenceResults. Move it to its only user: VNCoercion.	2020-01-17 15:18:42 -08:00
Petr Hosek	d3db13af7e	[profile] Support counter relocation at runtime This is an alternative to the continous mode that was implemented in D68351. This mode relies on padding and the ability to mmap a file over the existing mapping which is generally only available on POSIX systems and isn't suitable for other platforms. This change instead introduces the ability to relocate counters at runtime using a level of indirection. On every counter access, we add a bias to the counter address. This bias is stored in a symbol that's provided by the profile runtime and is initially set to zero, meaning no relocation. The runtime can mmap the profile into memory at abitrary location, and set bias to the offset between the original and the new counter location, at which point every subsequent counter access will be to the new location, which allows updating profile directly akin to the continous mode. The advantage of this implementation is that doesn't require any special OS support. The disadvantage is the extra overhead due to additional instructions required for each counter access (overhead both in terms of binary size and performance) plus duplication of counters (i.e. one copy in the binary itself and another copy that's mmapped). Differential Revision: https://reviews.llvm.org/D69740	2020-01-17 15:02:23 -08:00
Peter Collingbourne	cd40bd0a32	hwasan: Move .note.hwasan.globals note to hwasan.module_ctor comdat. As of D70146 lld GCs comdats as a group and no longer considers notes in comdats to be GC roots, so we need to move the note to a comdat with a GC root section (.init_array) in order to prevent lld from discarding the note. Differential Revision: https://reviews.llvm.org/D72936	2020-01-17 13:40:52 -08:00
Drew Wock	0bcfafc5e7	[SeparateConstOffsetFromGEP] Fix: sext(a) + sext(b) -> sext(a + b) matches add and sub instructions with one another During the SeparateConstOffsetFromGEP pass, signed extensions are distributed to the values that feed into them and then later recombined. The recombination stage is somewhat problematic- it doesn't differ add and sub instructions from another when matching the sext(a) +/- sext(b) -> sext(a +/- b) pattern in some instances. An example- the IR contains: %unextendedA %unextendedB %subuAuB = unextendedA - unextendedB %extA = extend A %extB = extend B %addeAeB = extA + extB The problematic optimization will transform that into: %unextendedA %unextendedB %subuAuB = unextendedA - unextendedB %extA = extend A %extB = extend B %addeAeB = extend subuAuB ; Obviously not semantically equivalent to the IR input. This patch fixes that. Patch by Drew Wock <drew.wock@sas.com> Differential Revision: https://reviews.llvm.org/D65967	2020-01-17 12:22:52 -05:00
Nikita Popov	522c030aa9	[InstCombine] Fix worklist management in DSE (PR44552) Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make sure that the store is reprocessed, because performing DSE may expose more DSE opportunities. There is a slight caveat here though: We need to make sure that we add back the store the worklist first, because that means it will be processed after the operands of the removed store have been processed. This is a general bug in InstCombine worklist management that I hope to address at some point, but for now it means we need to do this manually rather than just returning the instruction as changed. Differential Revision: https://reviews.llvm.org/D72807	2020-01-17 18:10:56 +01:00
Nikita Popov	77befe54f7	[InstCombine] Fix worklist management in return combine There are two related bugs here: First, we don't add the operand we're replacing to the worklist, which means it may not get DCEd (see test change). Second, usually this would just get picked up in the next iteration, but we also do not report the instruction as changed. This means that we do not get that extra instcombine iteration, and more importantly, may break the pass pipeline, as the function is not marked as changed. Differential Revision: https://reviews.llvm.org/D72864	2020-01-17 17:59:23 +01:00
Nikita Popov	2ca092f320	[InstCombine] Support disabling expensive combines in opt Currently, there is no way to disable ExpensiveCombines when doing a standalone opt -instcombine run, as that's the default, and the opt option can currently only be used to force enable, not to force disable. The only way to disable expensive combines is via -O1 or -O2, but that of course also runs the rest of the kitchen sink... This patch allows using opt -instcombine -expensive-combines=0 to run InstCombine without ExpensiveCombines. Differential Revision: https://reviews.llvm.org/D72861	2020-01-17 17:56:20 +01:00
Matt Arsenault	3ef8cdf666	AMDGPU: Do permlane16 vdst_in discard optimization in InstCombine There's more potential value to discarding the source value earlier, since we always know the value of the fi/bc bits.	2020-01-16 17:27:53 -05:00
Kazu Hirata	53b68e676f	Resubmit: [JumpThreading] Thread jumps through two basic blocks This reverts commit `2d258ed931`. This revision fixes the Windows build and adds a testcase for it, namely thread-two-bbs3.ll. My original patch improperly copied EH pads on Windows. This patch disregards jump threading opportunities having to do with EH pads. [JumpThreading] Thread jumps through two basic blocks Summary: This patch teaches JumpThreading.cpp to thread through two basic blocks like: bb3: %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb4: %cmp = icmp eq i32* %var, null br i1 %cmp, label bb5, label bb6 by duplicating basic blocks like bb3 above. Once we duplicate bb3 as bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have: bb3: %var = phi i32* [ @a, %bb2 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb3.dup: %var = phi i32* [ null, %bb1 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb4: %cmp = icmp eq i32* %var, null br i1 %cmp, label bb5, label bb6 Then the existing code in JumpThreading.cpp can thread edge bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70247	2020-01-16 12:33:37 -08:00
Arkady Shlykov	c87982b467	Revert "[Loop Peeling] Add possibility to enable peeling on loop nests." This reverts commit `3f3017e` because there's a failure on peel-loop-nests.ll with LLVM_ENABLE_EXPENSIVE_CHECKS on. Differential Revision: https://reviews.llvm.org/D70304	2020-01-16 10:33:38 -08:00
Fedor Sergeev	3478551bf3	[GVN] introduce GVNOptions to control GVN pass behavior There are a few global (cl::opt) controls that enable optional behavior in GVN. Introduce GVNOptions that provide corresponding per-pass instance controls. That will allow to use GVN multiple times in pipeline each time with different settings. Reviewers: asbirlea, rnk, reames, skatkov, fhahn Reviewed By: fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D72732	2020-01-16 20:21:08 +03:00
Mircea Trofin	7acfda633f	[llvm] Make new pass manager's OptimizationLevel a class Summary: The old pass manager separated speed optimization and size optimization levels into two unsigned values. Coallescing both in an enum in the new pass manager may lead to unintentional casts and comparisons. In particular, taking a look at how the loop unroll passes were constructed previously, the Os/Oz are now (==new pass manager) treated just like O3, likely unintentionally. This change disallows raw comparisons between optimization levels, to avoid such unintended effects. As an effect, the O{s\|z} behavior changes for loop unrolling and loop unroll and jam, matching O2 rather than O3. The change also parameterizes the threshold values used for loop unrolling, primarily to aid testing. Reviewers: tejohnson, davidxl Reviewed By: tejohnson Subscribers: zzheng, ychen, mehdi_amini, hiraditya, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72547	2020-01-16 09:00:56 -08:00
Francesco Petrogalli	66c120f025	[VectorUtils] Rework the Vector Function Database (VFDatabase). Summary: This commits is a rework of the patch in https://reviews.llvm.org/D67572. The rework was requested to prevent out-of-tree performance regression when vectorizing out-of-tree IR intrinsics. The vectorization of such intrinsics is enquired via the static function `isTLIScalarize`. For detail see the discussion in https://reviews.llvm.org/D67572. Reviewers: uabelho, fhahn, sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72734	2020-01-16 15:08:26 +00:00
Simon Pilgrim	23a887b0dd	Fix unused variable warning. NFCI.	2020-01-16 13:02:40 +00:00
Florian Hahn	23c113802e	[LV] Allow assume calls in predicated blocks. The assume intrinsic is intentionally marked as may reading/writing memory, to avoid passes moving them around. When flattening the CFG for predicated blocks, we have to drop the assume calls, as they are control-flow dependent. There are some cases where we can do better (when control flow is preserved), but that is follow-up work. Fixes PR43620. Reviewers: hsaito, rengolin, dcaballe, Ayal Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D68814	2020-01-16 10:11:35 +00:00
Sameer Sahasrabuddhe	ed181efa17	[HIP][AMDGPU] expand printf when compiling HIP to AMDGPU Summary: This change implements the expansion in two parts: - Add a utility function emitAMDGPUPrintfCall() in LLVM. - Invoke the above function from Clang CodeGen, when processing a HIP program for the AMDGPU target. The printf expansion has undefined behaviour if the format string is not a compile-time constant. As a sufficient condition, the HIP ToolChain now emits -Werror=format-nonliteral. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D71365	2020-01-16 15:15:38 +05:30
Vedant Kumar	360abb7ee5	[CodeExtractor] Transfer debug info to extracted function After extracting, fix up debug info in both the old and new functions by 1) Pointing line locations and debug intrinsics to the new subprogram scope, and 2) Deleting intrinsics which point to values outside of the new function. Depends on https://reviews.llvm.org/D72795. Testing: check-llvm, check-clang, a build of LNT in the `-Os -g` config with "-mllvm -hot-cold-split=1" set, and end-to-end debugging of a toy program which undergoes splitting to verify that lldb can find variables, single step, etc. in extracted code. rdar://45507940 Differential Revision: https://reviews.llvm.org/D72801	2020-01-15 15:38:36 -08:00
Fedor Sergeev	8a4d12ae5b	[BasicBlock] add helper getPostdominatingDeoptimizeCall It appears to be rather useful when analyzing Loops with multiple deoptimizing exits, perhaps merged ones. For now it is used in LoopPredication, will be adding more uses in other loop passes. Reviewers: asbirlea, fhahn, skatkov, spatel, reames Reviewed By: reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D72754	2020-01-16 01:15:57 +03:00
Mircea Trofin	5466597fee	[NFC] Refactor InlineResult for readability Summary: InlineResult is used both in APIs assessing whether a call site is inlinable (e.g. llvm::isInlineViable) as well as in the function inlining utility (llvm::InlineFunction). It means slightly different things (can/should inlining happen, vs did it happen), and the implicit casting may introduce ambiguity (casting from 'false' in InlineFunction will default a message about hight costs, which is incorrect here). The change renames the type to a more generic name, and disables implicit constructors. Reviewers: eraman, davidxl Reviewed By: davidxl Subscribers: kerbowa, arsenm, jvesely, nhaehnle, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72744	2020-01-15 13:34:20 -08:00
Zhongduo Lin	34ba96a3d4	[NFC][IndVarSimplify] remove duplicate code in widenWithVariantLoadUseCodegen. Summary: Duplicate code in widenWithVariantLoadUseCodegen is removed and also use assert to check unknown extension type as it should be filtered out by the pre condition check before calling this function. Reviewers: az, sanjoy, sebpop, efriedma, javed.absar, sanjoy.google Reviewed By: efriedma Subscribers: hiraditya, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D72652	2020-01-15 16:27:58 -05:00
Vedant Kumar	a2cc80bc95	DebugInfo: Factor out logic to update locations in MD_loop metadata, NFC Factor out the logic needed to update debug locations contained within MD_loop metadata. This refactor is preparation for a future change that also needs to rewrite MD_loop metadata. rdar://45507940	2020-01-15 13:02:36 -08:00
Arkady Shlykov	3f3017e162	[Loop Peeling] Add possibility to enable peeling on loop nests. Summary: Current peeling implementation bails out in case of loop nests. The patch introduces a field in TargetTransformInfo structure that certain targets can use to relax the constraints if it's profitable (disabled by default). Also additional option is added to enable peeling manually for experimenting and testing purposes. Reviewers: fhahn, lebedev.ri, xbolva00 Reviewed By: xbolva00 Subscribers: xbolva00, hiraditya, zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D70304	2020-01-15 08:25:21 -08:00
Sanjay Patel	3180af4362	[InstCombine] reassociate fsub+fsub into fsub+fadd As discussed in the motivating PR44509: https://bugs.llvm.org/show_bug.cgi?id=44509 ...we can end up with worse code using fast-math than without. This is because the reassociate pass greedily transforms fsub into fneg/fadd and apparently (based on the regression tests seen here) expects instcombine to clean that up if it wasn't profitable. But we were missing this fold: (X - Y) - Z --> X - (Y + Z) There's another, more specific case that I think we should handle as shown in the "fake" fneg test (but missed with a real fneg), but that's another patch. That may be tricky to get right without conflicting with existing transforms for fneg. Differential Revision: https://reviews.llvm.org/D72521	2020-01-15 11:14:13 -05:00
Hideto Ueno	188f9a348d	[Attributor] AAValueConstantRange: Value range analysis using constant range Summary: This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point. One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument). The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty). Currently, AAValueConstantRange is created in `getAssumedConstant` method when `AAValueSimplify` returns `nullptr`(worst state). Supported - BinaryOperator(add, sub, ...) - CmpInst(icmp eq, ...) - !range metadata `AAValueConstantRange` is not intended to extend to polyhedral range value analysis. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: phosek, davezarzycki, baziotis, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71620	2020-01-15 16:34:23 +09:00
Nikita Popov	04e586151e	[InstCombine] Fix worklist management when removing guard intrinsic When multiple guard intrinsics are merged into one, currently the result of eraseInstFromFunction() is returned -- however, this should only be done if the current instruction is being removed. In this case we're removing a different instruction and should instead report that the current one has been modified by returning it. For this test case, this reduces the number of instcombine iterations from 5 to 2 (the minimum possible). Differential Revision: https://reviews.llvm.org/D72558	2020-01-14 21:47:48 +01:00
Nikita Popov	410331869d	[NewPM] Port MergeFunctions pass This ports the MergeFunctions pass to the NewPM. This was rather straightforward, as no analyses are used. Additionally MergeFunctions needs to be conditionally enabled in the PassBuilder, but I left that part out of this patch. Differential Revision: https://reviews.llvm.org/D72537	2020-01-14 20:55:41 +01:00
Nikita Popov	65c0805be5	[InstCombine] Fix infinite loop due to bitcast <-> phi transforms Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by explicitly performing the load combine as part of the bitcast of phi transform. Other attempts to force the load to be combined first were ultimately too unreliable. Differential Revision: https://reviews.llvm.org/D71164	2020-01-14 20:45:13 +01:00
Nikita Popov	b4dd928ffb	[InstCombine] Make combineLoadToNewType a method; NFC So it can be reused as part of other combines. In particular for D71164.	2020-01-14 20:40:03 +01:00
Nikita Popov	652cd7c100	[InstCombine] Fix user iterator invalidation in bitcast of phi transform This fixes the issue encountered in D71164. Instead of using a range-based for, manually iterate over the users and advance the iterator beforehand, so we do not skip any users due to iterator invalidation. Differential Revision: https://reviews.llvm.org/D72657	2020-01-14 20:38:10 +01:00
Teresa Johnson	2cefb93951	[ThinLTO/WPD] Remove an overly-aggressive assert Summary: An assert added to the index-based WPD was trying to verify that we only have multiple vtables for a given guid when they are all non-external linkage. This is too conservative because we may have multiple external vtable with the same guid when they are in comdat. Remove the assert, as we don't have comdat information in the index, the linker should issue an error in this case. See discussion on D71040 for more information. Reviewers: evgeny777, aganea Subscribers: mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72648	2020-01-14 10:57:14 -08:00
Juneyoung Lee	3e32b7e127	[InstCombine] Let combineLoadToNewType preserve ABI alignment of the load (PR44543) Summary: If aligment on `LoadInst` isn't specified, load is assumed to be ABI-aligned. And said aligment may be different for different types. So if we change load type, but don't pay extra attention to the aligment (i.e. keep it unspecified), we may either overpromise (if the default aligment of the new type is higher), or underpromise (if the default aligment of the new type is smaller). Thus, if no alignment is specified, we need to manually preserve the implied ABI alignment. This addresses https://bugs.llvm.org/show_bug.cgi?id=44543 by making combineLoadToNewType preserve ABI alignment of the load. Reviewers: spatel, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72710	2020-01-15 03:20:53 +09:00
Dmitri Gribenko	2948ec5ca9	Removed PointerUnion3 and PointerUnion4 aliases in favor of the variadic template	2020-01-14 18:56:29 +01:00
Florian Hahn	192cce10f6	Revert "Recommit "[GlobalOpt] Pass DTU to removeUnreachableBlocks instead of recomputing."" This reverts commit `a03d7b0f24`. As discussed in D68298, this causes a compile-time regression, in case the DTs requested are not used elsewhere in GlobalOpt. We should only get the DTs if they are available here, but this seems not possible with the legacy pass manager from a module pass.	2020-01-14 14:50:07 +00:00
Benjamin Kramer	df186507e1	Make helper functions static or move them into anonymous namespaces. NFC.	2020-01-14 14:06:37 +01:00
Hiroshi Yamauchi	7b9f8e17d1	[PGO][CHR] Guard against 0-to-0 branch weight and avoid division by zero crash. Summary: This fixes a crash in internal builds under SamplePGO. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72653	2020-01-13 14:38:58 -08:00
Teresa Johnson	31441a3e00	[ThinLTO/WPD] Fix index-based WPD for alias vtables Summary: A recent fix in D69452 fixed index based WPD in the presence of available_externally vtables. It added a cast of the vtable def summary to a GlobalVarSummary. However, in some cases one def may be an alias, in which case we need to get the base object before casting, otherwise we will crash. Reviewers: evgeny777, steven_wu, aganea Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71040	2020-01-13 13:38:26 -08:00
Simon Pilgrim	2740b2d5d5	Fix uninitialized value clang static analyzer warning. NFC.	2020-01-11 16:02:22 +00:00
Nuno Lopes	87407fc03c	DSE: fix bug where we would only check libcalls for name rather than whole decl	2020-01-11 11:57:29 +00:00
Nikita Popov	0e322c8a1f	[InstCombine] Preserve nuw on sub of geps (PR44419) Fix https://bugs.llvm.org/show_bug.cgi?id=44419 by preserving the nuw on sub of geps. We only do this if the offset has a multiplication as the final operation, as we can't be sure the operations is nuw in the other cases without more thorough analysis. Differential Revision: https://reviews.llvm.org/D72048	2020-01-11 11:01:12 +01:00
Andrew Paverd	bdd88b7ed3	Add support for __declspec(guard(nocf)) Summary: Avoid using the `nocf_check` attribute with Control Flow Guard. Instead, use a new `"guard_nocf"` function attribute to indicate that checks should not be added on indirect calls within that function. Add support for `__declspec(guard(nocf))` following the same syntax as MSVC. Reviewers: rnk, dmajor, pcc, hans, aaron.ballman Reviewed By: aaron.ballman Subscribers: aaron.ballman, tomrittervg, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72167	2020-01-10 16:04:12 +00:00
Simon Pilgrim	2e66405d8d	Don't use dyn_cast_or_null if we know the pointer is nonnull. Fix clang static analyzer null dereference warning by using dyn_cast instead.	2020-01-10 10:32:36 +00:00
Benjamin Kramer	498856fca5	[LV] Silence unused variable warning in Release builds. NFC.	2020-01-10 11:21:27 +01:00
Gil Rapaport	8647a72c4a	[LV] VPValues for memory operation pointers (NFCI) Memory instruction widening recipes use the pointer operand of their load/store ingredient for generating the needed GEPs, making it difficult to feed these recipes with pointers based on other ingredients or none at all. This patch modifies these recipes to use a VPValue for the pointer instead, in order to reduce ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. The recipes are constructed with VPValues bound to these ingredients, maintaining current behavior. Differential revision: https://reviews.llvm.org/D70865	2020-01-10 09:24:59 +02:00
Whitney Tsang	d27a15fed7	[NFCI][LoopUnrollAndJam] Changing LoopUnrollAndJamPass to a function pass. Summary: This patch changes LoopUnrollAndJamPass to a function pass, and keeps the loops traversal order same as defined in FunctionToLoopPassAdaptor LoopPassManager.h. The next patch will change the loop traversal to outer to inner order, so more loops can be transform. Discussion in llvm-dev mailing list: https://groups.google.com/forum/#!topic/llvm-dev/LF4rUjkVI2g Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto Reviewed By: dmgreen Subscribers: hiraditya, zzheng, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D72230	2020-01-09 16:18:36 +00:00
@raghesh (Raghesh Aloor)	6c04ef472a	[InstCombine] Z / (1.0 / Y) => (Y * Z) This is a special case of Z / (X / Y) => (Y * Z) / X, with X = 1.0. The m_OneUse check is avoided because even in the case of the multiple uses for 1.0/Y, the number of instructions remain the same and a division is replaced by a multiplication. Differential Revision: https://reviews.llvm.org/D72319	2020-01-09 10:52:39 -05:00
Florian Hahn	ccf24225e3	[Matrix] Update shape propagation to iterate until done. This patch updates the shape propagation to iterate until no new shape information is discovered. As initial seed for the forward propagation, we use the matrix intrinsic instructions. Both propagateShapeForward and propagateShapeBackward return new work lists, with the instructions to be used for the next iteration. When propagating forward, we record all instructions we added new shape information for. When propagating backward, we record all users of instructions we added new shape information for. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70901	2020-01-09 10:52:52 +00:00
Florian Hahn	7adf6644f5	[Matrix] Propagate and use shape information for loads. This patch extends to shape propagation to also include load instructions and implements shape aware lowering for vector loads. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70900	2020-01-09 10:21:20 +00:00
Evgeniy Brevnov	f0abe820ee	[LoopUtils][NFC] Minor refactoring in getLoopEstimatedTripCount.	2020-01-09 16:49:15 +07:00
Florian Hahn	459ad8e97e	[Matrix] Implement back-propagation of shape information. This patch extends the shape propagation for matrix operations to also propagate the shape of instructions to their operands. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70899	2020-01-09 09:48:07 +00:00
Sjoerd Meijer	8f1887456a	[LV] Still vectorise when tail-folding can't find a primary inducation variable This addresses a vectorisation regression for tail-folded loops that are counting down, e.g. loops as simple as this: void foo(char A, char B, char C, uint32_t N) { while (N > 0) { C++ = A++ + B++; N--; } } These are loops that can be vectorised, but when tail-folding is requested, it can't find a primary induction variable which we do need for predicating the loop. As a result, the loop isn't vectorised at all, which it is able to do when tail-folding is not attempted. So, this adds a check for the primary induction variable where we decide how to lower the scalar epilogue. I.e., when there isn't a primary induction variable, a scalar epilogue loop is allowed (i.e. don't request tail-folding) so that vectorisation could still be triggered. Having this check for the primary induction variable make sense anyway, and in addition, in a follow-up of this I will look into discovering earlier the primary induction variable for counting down loops, so that this can also be tail-folded. Differential revision: https://reviews.llvm.org/D72324	2020-01-09 09:14:00 +00:00
Johannes Doerfert	a4088c75cc	[Attributor][FIX] Carefully change invokes to calls (after manifest) Before we manually inserted unreachable early but that could lead to broken PHI nodes. Now we use the existing late modification functionality.	2020-01-08 19:32:38 -06:00
Johannes Doerfert	1e46eb74be	[Attributor][FIX] Avoid dangling value pointers during code modification When we replace instructions with unreachable we delete instructions. We now avoid dangling pointers to those deleted instructions in the `ToBeChangedToUnreachableInsts` set. Other modification collections might need to be updated in the future as well.	2020-01-08 19:32:37 -06:00
Kazu Hirata	2d258ed931	Revert "[JumpThreading] Thread jumps through two basic blocks" It looks like my patch breaks the sanitizer-windows build: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/56324 This reverts commit `ead815924e`.	2020-01-08 13:58:39 -08:00
Kazu Hirata	ead815924e	[JumpThreading] Thread jumps through two basic blocks Summary: This patch teaches JumpThreading.cpp to thread through two basic blocks like: bb3: %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb4: %cmp = icmp eq i32* %var, null br i1 %cmp, label bb5, label bb6 by duplicating basic blocks like bb3 above. Once we duplicate bb3 as bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have: bb3: %var = phi i32* [ @a, %bb2 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb3.dup: %var = phi i32* [ null, %bb1 ] %tobool = icmp eq i32 %cond, 0 br i1 %tobool, label %bb4, label ... bb4: %cmp = icmp eq i32* %var, null br i1 %cmp, label bb5, label bb6 Then the existing code in JumpThreading.cpp can thread edge bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70247	2020-01-08 06:57:36 -08:00
Kadir Cetinkaya	b212eb7159	Revert "[InstCombine] fold zext of masked bit set/clear" This reverts commit `a041c4ec6f`. This looks like a non-trivial change and there has been no code reviews (at least there were no phabricator revisions attached to the commit description). It is also causing a regression in one of our downstream integration tests, we haven't been able to come up with a minimal reproducer yet.	2020-01-08 11:21:21 +01:00
Philip Reames	312a532dc0	[GVN/FP] Considate logic for reasoning about equality vs equivalance for floats Factor out common logic into some reasonable commented helper functions. In the process, ensure that the in-block vs cross-block cases are handled the same. They previously weren't. Differential Revision: https://reviews.llvm.org/D67126	2020-01-07 16:05:04 -08:00
Sanjay Patel	f8962571f7	[InstCombine] try to pull 'not' of select into compare operands not (select ?, (cmp TPred, ?, ?), (cmp FPred, ?, ?) --> select ?, (cmp TPred', ?, ?), (cmp FPred', ?, ?) If both sides of the select are cmps, we can remove an instruction. The case where only side is a cmp is deferred to a possible follow-on patch. We have a more general 'isFreeToInvert' analysis, but I'm not seeing a way to use that more widely without inducing infinite looping (opposing transforms). Here, we flip the compare predicates directly, so we should not have any danger by creating extra intermediate 'not' ops. Alive proofs: https://rise4fun.com/Alive/jKa Name: both select values are compares - invert predicates %tcmp = icmp sle i32 %x, %y %fcmp = icmp ugt i32 %z, %w %sel = select i1 %cond, i1 %tcmp, i1 %fcmp %not = xor i1 %sel, true => %tcmp_not = icmp sgt i32 %x, %y %fcmp_not = icmp ule i32 %z, %w %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not Name: false val is compare - invert/not %fcmp = icmp ugt i32 %z, %w %sel = select i1 %cond, i1 %tcmp, i1 %fcmp %not = xor i1 %sel, true => %tcmp_not = xor i1 %tcmp, -1 %fcmp_not = icmp ule i32 %z, %w %not = select i1 %cond, i1 %tcmp_not, i1 %fcmp_not Differential Revision: https://reviews.llvm.org/D72007	2020-01-07 10:44:23 -05:00
Simon Pilgrim	bd1dc6a3eb	Fix "use of uninitialized variable" static analyzer warnings. NFCI.	2020-01-07 10:55:37 +00:00
Fangrui Song	6904cd9486	Add Triple::isX86() Reviewed By: craig.topper, skan Differential Revision: https://reviews.llvm.org/D72247	2020-01-06 15:51:02 -08:00
James Henderson	d68904f957	[NFC] Fix trivial typos in comments Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.	2020-01-06 10:50:26 +00:00
Brian Gesiak	83a9321f60	[Coroutines] Remove corresponding phi values when apply simplifyTerminatorLeadingToRet Summary: In addMustTailToCoroResumes, we set musttail on those resume instructions that are followed by a ret instruction. This is done by simplifyTerminatorLeadingToRet which replace a sequence of branches leading to a ret with a clone of the ret. However it forgets to remove corresponding PHI values that come from basic block of replaced branch, and may cause jumpthreading pass hangs (https://bugs.llvm.org/show_bug.cgi?id=43720) This patch fix this issue Test Plan: cppcoro library with O3+flto check-llvm Reviewers: modocache, GorNishanov, lewissbaker Reviewed By: modocache Subscribers: mehdi_amini, EricWF, hiraditya, dexonsmith, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71826 Patch by junparser (JunMa)!	2020-01-05 18:26:30 -05:00
Florian Hahn	b8a3c34eee	Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)." This reverts commit `51ef53f3bd`, as it breaks some bots.	2020-01-04 18:44:38 +00:00
Florian Hahn	51ef53f3bd	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537	2020-01-04 18:29:35 +00:00
Florian Hahn	99f74a64a2	[SCEV] Remove unused ScalarEvolutionExpander.h includes (NFC).	2020-01-04 18:29:35 +00:00
Roman Lebedev	6d05bc2e3a	[NFCI][InstCombine] Refactor 'sink negation into select if that folds one hand of select to 0' fold I would think it's better than having two practically identical folds next to eachother, but then generalization isn't all that pretty due to the fact that we need to produce different `sub` each time.. This change is no-functional-changes-intended refactoring.	2020-01-04 17:30:51 +03:00
Roman Lebedev	772ede3d5d	[InstCombine] Sink sub into hands of select if one hand becomes zero. Part 2 (PR44426) This decreases use count of %Op0, makes one hand of select to be 0, and possibly exposes further folding potential. Name: sub %Op0, (select %Cond, %Op0, %FalseVal) -> select %Cond, 0, (sub %Op0, %FalseVal) %Op0 = %TrueVal %o = select i1 %Cond, i8 %Op0, i8 %FalseVal %r = sub i8 %Op0, %o => %n = sub i8 %Op0, %FalseVal %r = select i1 %Cond, i8 0, i8 %n Name: sub %Op0, (select %Cond, %TrueVal, %Op0) -> select %Cond, (sub %Op0, %TrueVal), 0 %Op0 = %FalseVal %o = select i1 %Cond, i8 %TrueVal, i8 %Op0 %r = sub i8 %Op0, %o => %n = sub i8 %Op0, %TrueVal %r = select i1 %Cond, i8 %n, i8 0 https://rise4fun.com/Alive/aHRt https://bugs.llvm.org/show_bug.cgi?id=44426	2020-01-04 17:30:51 +03:00
Roman Lebedev	4d8e47ca18	[InstCombine] Sink sub into hands of select if one hand becomes zero (PR44426) This decreases use count of %Op1, makes one hand of select to be 0, and possibly exposes further folding potential. Name: sub (select %Cond, %Op1, %FalseVal), %Op1 -> select %Cond, 0, (sub %FalseVal, %Op1) %Op1 = %TrueVal %o = select i1 %Cond, i8 %Op1, i8 %FalseVal %r = sub i8 %o, %Op1 => %n = sub i8 %FalseVal, %Op1 %r = select i1 %Cond, i8 0, i8 %n Name: sub (select %Cond, %TrueVal, %Op1), %Op1 -> select %Cond, (sub %TrueVal, %Op1), 0 %Op1 = %FalseVal %o = select i1 %Cond, i8 %TrueVal, i8 %Op1 %r = sub i8 %o, %Op1 => %n = sub i8 %TrueVal, %Op1 %r = select i1 %Cond, i8 %n, i8 0 https://rise4fun.com/Alive/avL https://bugs.llvm.org/show_bug.cgi?id=44426	2020-01-04 17:30:51 +03:00
Alexey Lapshin	831bfcea47	[Transforms][GlobalSRA] huge array causes long compilation time and huge memory usage. Summary: For artificial cases (huge array, few usages), Global SRA optimization creates a lot of redundant data. It creates an instance of GlobalVariable for each array element. For huge array, that means huge compilation time and huge memory usage. Following example compiles for 10 minutes and requires 40GB of memory. namespace { char LargeBuffer[64 * 1024 * 1024]; } int main ( void ) { LargeBuffer[0] = 0; printf("\n "); return LargeBuffer[0] == 0; } The fix is to avoid Global SRA for large arrays. Reviewers: craig.topper, rnk, efriedma, fhahn Reviewed By: rnk Subscribers: xbolva00, lebedev.ri, lkail, merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71993	2020-01-04 16:42:38 +03:00
Roman Lebedev	7973aa05f6	[NFC][InstCombine] '(Op1 & С) - Op1' -> '-(Op1 & ~C)' fold (PR44427) This decreases use count of Op1, potentially allows us to further hoist said 'neg' later on, and results in marginally better X86 codegen. Name: (Op1 & С) - Op1 -> -(Op1 & ~C) %o = and i64 %Op1, C1 %r = sub i64 %o, %Op1 => %n = and i64 %Op1, ~C1 %r = sub i64 0, %n https://rise4fun.com/Alive/rwgA https://godbolt.org/z/R_RMfM https://bugs.llvm.org/show_bug.cgi?id=44427	2020-01-03 21:25:48 +03:00
Roman Lebedev	cc0216bedb	[NFC][InstCombine] '(X & (- Y)) - X' -> '- (X & (Y - 1))' fold (PR44448) Name: (X & (- Y)) - X -> - (X & (Y - 1)) (PR44448) %negy = sub i8 0, %y %unbiasedx = and i8 %negy, %x %r = sub i8 %unbiasedx, %x => %ymask = add i8 %y, -1 %xmasked = and i8 %ymask, %x %r = sub i8 0, %xmasked https://rise4fun.com/Alive/OIpla This decreases use count of %x, may allow us to later hoist said negation even further, and results in marginally nicer X86 codegen. See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 20:27:29 +03:00
Johannes Doerfert	d2d2fb19f7	[Attributor][FIX] Allow dead users of rewritten function If we replace a function with a new one because we rewrite the signature, dead users may still refer to the old version. With this patch we reuse the code that deals with dead functions, which the old versions are, to avoid problems.	2020-01-03 10:43:40 -06:00
Johannes Doerfert	6b9ee2d6cd	[Attributor][NFC] Unify the way we delete dead functions	2020-01-03 10:43:40 -06:00
Johannes Doerfert	c90681b681	[Attributor][FIX] Don't crash on ptr2int/int2ptr instructions An integer isn't allowed in getAlignmentForValue so we need to stop at a ptr2int instruction during exploration.	2020-01-03 10:43:40 -06:00
Johannes Doerfert	412a0101a9	[Attributor][FIX] Do not derive nonnull and dereferenceable w/o access An inbounds GEP results in poison if the value is not "inbounds", not in UB. We accidentally derived nonnull and dereferenceable from these inbounds GEPs even in the absence of accesses that would make the poison to UB.	2020-01-03 10:43:40 -06:00
Johannes Doerfert	a4b3588ba2	[Attributor][FIX] Return CHANGED once a pessimistic fixpoint is reached.	2020-01-03 10:43:40 -06:00
Ankit	369a919514	Fix for a dangling point bug in DeadStoreElimination pass The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction. While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration. In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container. Reviewers: fhahn, bcahoon, efriedma Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D65326	2020-01-03 14:28:44 +00:00
Sanjay Patel	1640582743	[InstCombine] replace undef elements in vector constant when doing icmp folds (PR44383) As shown in P44383: https://bugs.llvm.org/show_bug.cgi?id=44383 ...we can't safely propagate a vector constant through this icmp fold if that vector constant contains undefined elements. We know that each defined element of the constant is safe though, so find the first of those and replicate it into the formerly undef lanes. Differential Revision: https://reviews.llvm.org/D72101	2020-01-03 09:16:57 -05:00
Hideto Ueno	5fc02dc0a7	Revert "[Attributor] AAValueConstantRange: Value range analysis using constant range" This reverts commit `e996303431`.	2020-01-03 11:03:56 +09:00
Sanjay Patel	88fc5fdef6	[InstCombine] remove uses before deleting instructions (PR43723) This is a less ambitious alternative to previous attempts to fix this bug with: rG56b2aee1875a rGef02831f0a4e rG56b2aee1875a ...because those all failed bot testing with use-after-free or other problems. The original crashing/assert problem is still showing up on various fuzzers, so I've added a new minimal test based on another one of those failures. Instead of trying to manage and coordinate the logic in isAllocSiteRemovable() with the deletion loops, just loosen the existing code that handles casts and GEP by replacing with undef to allow other opcodes. That means that no instructions with uses should assert on deletion, and there are hopefully no non-obvious sanitizer bugs induced.	2020-01-02 09:47:36 -05:00
Brian Gesiak	9ce0ff2eef	[Coroutines] const-ify internal helpers (NFC) Several helpers internal to llvm/Transforms/Coroutines do not use 'const' for parameters that are not modified. Add const where possible.	2020-01-01 21:57:49 -05:00
Brian Gesiak	2fcf7691df	[Coroutines] Rename "legacy" passes (NFC) A series of patches beginning with https://reviews.llvm.org/D71898 propose to add an implementation of the coroutine passes to the new pass manager. As part of these changes, the coroutine passes that implement the legacy pass manager interface are renamed, to `<PassName>Legacy`. This mirrors similar changes that have been made to many other passes in LLVM as they've been transitioned to support both old and new pass managers. This commit splits out the renaming portion of that patch and commits it in advance as an NFC (no functional change intended) commit. It renames: * `CoroEarly` => `CoroEarlyLegacy` * `CoroSplit` => `CoroSplitLegacy` * `CoroElide` => `CoroElideLegacy` * `CoroCleanup` => `CoroCleanupLegacy`	2020-01-01 21:41:16 -05:00
Nikita Popov	8dd9a13619	[InstCombine] Preserve inbounds when merging with zero-index GEP (PR44423) This addresses https://bugs.llvm.org/show_bug.cgi?id=44423. If one of the GEPs is inbounds and the other is zero-index, we can also preserve inbounds. Differential Revision: https://reviews.llvm.org/D72060	2020-01-01 23:04:28 +01:00
Nikita Popov	6ba5f8c4ac	[InstCombine] Fix incorrect inbounds on GEP of GEP (PR44425) This fixes https://bugs.llvm.org/show_bug.cgi?id=44425. We need to drop inbounds if one of the GEPs is not inbounds. This was already done when creating a new GEP, but not when modifying in place. Differential Revision: https://reviews.llvm.org/D72059	2020-01-01 22:10:55 +01:00
Hideto Ueno	e996303431	[Attributor] AAValueConstantRange: Value range analysis using constant range This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point. One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument). The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty). Currently, AAValueConstantRange is created when AAValueSimplify cannot simplify the value. Supported - BinaryOperator(add, sub, ...) - CmpInst(icmp eq, ...) - !range metadata `AAValueConstantRange` is not intended to extend to polyhedral range value analysis. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D71620	2020-01-01 15:35:56 +09:00
Craig Topper	374e0299cf	[X86][InstCombine] Add constant folding and simplification support for pdep and pext The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result. There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted. Fixes PR44389 Differential Revision: https://reviews.llvm.org/D71952	2019-12-31 15:06:47 -08:00
Sanjay Patel	a041c4ec6f	[InstCombine] fold zext of masked bit set/clear This does not solve PR17101, but it is one of the underlying diffs noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8 We could ease the one-use checks for the 'clear' (no 'not' op) half of the transform, but I do not know if that asymmetry would make things better or worse. Proofs: https://rise4fun.com/Alive/uVB Name: masked bit set %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp ne i32 %and, 0 %r = zext i1 %cmp to i32 => %s = lshr i32 %x, %y %r = and i32 %s, 1 Name: masked bit clear %sh1 = shl i32 1, %y %and = and i32 %sh1, %x %cmp = icmp eq i32 %and, 0 %r = zext i1 %cmp to i32 => %xn = xor i32 %x, -1 %s = lshr i32 %xn, %y %r = and i32 %s, 1	2019-12-31 12:35:10 -05:00
Nikita Popov	7adb5c2aca	Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms" This reverts commit `27a0795943`. Seems to break test-suite.	2019-12-31 17:42:57 +01:00
Nikita Popov	27a0795943	[InstCombine] Fix infinite loop due to bitcast <-> phi transforms Fix for https://bugs.llvm.org/show_bug.cgi?id=44245. The optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting against each other, because optimizeBitCastFromPhi() assumes that bitcasts of loads will get folded. This doesn't happen here, because a dangling phi node prevents the one-use fold in https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628 from triggering. This patch fixes the issue by adding manually removing the old phis. Differential Revision: https://reviews.llvm.org/D71164	2019-12-31 16:17:14 +01:00
Connor Abbott	fb114694e9	[InstCombine] Don't rewrite phi-of-bitcast when the phi has other users Judging by the existing comments, this was the intention, but the transform never actually checked if the existing phi's would be removed. See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where this causes much worse code generation on AMDGPU. Differential Revision: https://reviews.llvm.org/D71209	2019-12-31 12:15:02 +01:00
Ilya Biryukov	4f82af81a0	[Attributor] Suppress unused warnings when assertions are disabled. NFC	2019-12-31 10:21:52 +01:00
Johannes Doerfert	751336340d	[Attributor] Function signature rewrite infrastructure As part of the Attributor manifest we want to change the signature of functions. This patch introduces a fairly generic interface to do so. As a first, very simple, use case, we remove unused arguments. A second use case, pointer privatization, will be committed with this patch as well. A lot of the code and ideas are taken from argument promotion and we run all argument promotion tests through this framework as well. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68765	2019-12-31 02:31:33 -06:00
Johannes Doerfert	dada8132af	[Attributor] Propagate known align from arguments to call sites arguments Since the information is known we can simply use it at the call site. This is especially useful for callbacks but also helps regular calls. The test changes are mechanical.	2019-12-31 01:33:22 -06:00
Johannes Doerfert	b1b441d22d	[Attributor] Use abstract call sites to determine associated arguments This is the second step after D67871 to make use of abstract call sites. In this patch the argument we associate with a abstract call site argument can be the one in the callback callee instead of the one in the callback broker. Caveat: We cannot allow no-alias arguments for problematic callbacks: As described in [1], adding no-alias (or restrict) to arguments could break synchronization as the synchronization effect, e.g., a barrier, does not "alias" with the pointer anymore. This disables no-alias annotation for potentially problematic arguments until we implement the fix described in [1]. Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D68008 [1] Compiler Optimizations for OpenMP, J. Doerfert and H. Finkel, International Workshop on OpenMP 2018, http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf	2019-12-31 01:33:22 -06:00
Johannes Doerfert	2888019871	[Attributor] Annotate the memory behavior of call site arguments Especially for callbacks, annotating the call site arguments is important. Doing so exposed a too strong dependence of AAMemoryBehavior on AANoCapture since we handle the case of potentially captured pointers explicitly. The changes to the tests are all mechanical.	2019-12-31 01:33:21 -06:00
Sanjay Patel	987eb8e26c	[InstCombine] propagate sign argument through nested copysigns This is another optimization suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153	2019-12-30 11:06:02 -05:00
Evgeniy Brevnov	948e745270	[LV][NFC] Keep dominator tree up to date during vectorization.	2019-12-30 18:38:41 +07:00
Evgeniy Brevnov	1b6286b945	[LV][NFC] Some refactoring and renaming to facilitate next change.	2019-12-30 18:38:41 +07:00
Hideto Ueno	34fe8d0451	[Attributor] Use `changeUseAfterManifest` in AAValueSimplify manifest Summary: This patch makes `AAValueSimplify` use `changeUsesAfterManifest` in `manifest`. This will invoke simple folding after the manifest. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71972	2019-12-30 17:08:48 +09:00
Hideto Ueno	ef4febd85b	[Attributor] AAUndefinedBehavior: Check for branches on undef value. A branch is considered UB if it depends on an undefined / uninitialized value. At this point this handles simple UB branches in the form: `br i1 undef, ...` We query `AAValueSimplify` to get a value for the branch condition, so the branch can be more complicated than just: `br i1 undef, ...`. Patch By: Stefanos Baziotis (@baziotis) Reviewers: jdoerfert, sstefan1, uenoku Reviewed By: uenoku Differential Revision: https://reviews.llvm.org/D71799	2019-12-29 17:43:00 +09:00
Gil Rapaport	d62bf16131	[LV] Use getMask() when printing recipe [NFCI] Use dedicated API for getting the mask instead of duplicating it. Differential Revision: https://reviews.llvm.org/D71964	2019-12-29 08:50:40 +02:00
Florian Hahn	dc2c9b0fcf	[Matrix] Propagate and use shape info for binary operators. This patch extends the current shape propagation and shape aware lowering to also support binary operators. Those operators are uniform with respect to their shape (shape of the input operands is the same as the shape of their result). Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70898	2019-12-27 15:50:47 +00:00
Fangrui Song	7a7334663c	Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821 Intrinsic has incorrect argument type! i32 (i32) @llvm.setjmp wipes tear	2019-12-27 00:00:14 -08:00
Hideto Ueno	cb5eb13eaf	[Attributor] Add helper to change an instruction to `unreachable` inst Summary: Calling `changeToUnreachable` in `manifest` from different places might cause really unpredictable problems. As other deleting functions are doing, we need to change these instructions after all `manifest`. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71910	2019-12-27 02:39:37 +09:00
Whitney Tsang	d1f41b2ca9	[NFC][LoopFusion] Fix printing of the guard branch. Reviewer: kbarton, jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D71878	2019-12-26 02:45:29 +00:00
Hideto Ueno	1d5d074aef	[Attributor] Reach optimistic fixpoint in AAValueSimplify when the value is constant or undef Summary: As discussed in D71799, we have found that it is more useful to reach an optimistic fixpoint in AAValueSimpify when the value is constant or undef. Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: baziotis, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71852	2019-12-25 14:18:34 +09:00
Johannes Doerfert	5732f56bbd	[Attributor] UB Attribute now handles all instructions that access memory through a pointer Summary: Follow-up on: https://reviews.llvm.org/D71435 We basically use `checkForAllInstructions` to loop through all the instructions in a function that access memory through a pointer: load, store, atomicrmw, atomiccmpxchg Note that we can now use the `getPointerOperand()` that gets us the pointer operand for an instruction that belongs to the aforementioned set. Question: This function returns `nullptr` if the instruction is `volatile`. Why? Guess: Because if it is volatile, we don't want to do any transformation to it. Another subtle point is that I had to add AtomicRMW, AtomicCmpXchg to `initializeInformationCache()`. Following `checkAllInstructions()` path, that seemed the most reasonable place to add it and correct the fact that these instructions were ignored (they were not in `OpcodeInstMap` etc.). Is that ok? Reviewers: jdoerfert, sstefan1 Reviewed By: jdoerfert, sstefan1 Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71787	2019-12-24 19:25:08 -06:00
Johannes Doerfert	58f324a468	[Attributor] Function level undefined behavior attribute _Eventually_, this attribute will be assigned to a function if it contains undefined behavior. As a first small step, I tried to make it loop through the load instructions in a function (eventually, the plan is to check if a load instructions causes undefined behavior, because e.g. dereferences a null pointer - Also eventually, this won't happen in initialize() but in updateImpl()). Patch By: Stefanos Baziotis (@baziotis) Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D71435	2019-12-24 19:23:08 -06:00
Florian Hahn	8d6f59b78a	[Matrix] Use fmuladd for matrix.multiply if allowed. If the matrix.multiply calls have the contract fast math flag, we can use fmuladd. This als adds a command line option to force fmuladd generation. We can retire this option once there is a clang-level option. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70951	2019-12-23 14:49:14 +01:00
Florian Hahn	109e4e3851	[Matrix] Add forward shape propagation and first shape aware lowerings. This patch adds infrastructure for forward shape propagation to LowerMatrixIntrinsics. It also updates the pass to make use of the shape information to break up larger vector operations and to eliminate unnecessary conversion operations between columnwise matrixes and flattened vectors: if shape information is available for an instruction, lower the operation to a set of instructions operating on columns. For example, a store of a matrix is broken down into separate stores for each column. For users that do not have shape information (e.g. because they do not yet support shape information aware lowering), we pack the result columns into a flat vector and update those users. It also adds shape aware lowering for the first non-intrinsic instruction: vector stores. Example: For %c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2) store <4 x double> %c, <4 x double>* %Ptr We generate the code below without shape propagation. Note %9 which combines the columns of the transposed matrix into a flat vector. %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1> %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3> %1 = extractelement <2 x double> %split, i64 0 %2 = insertelement <2 x double> undef, double %1, i64 0 %3 = extractelement <2 x double> %split1, i64 0 %4 = insertelement <2 x double> %2, double %3, i64 1 %5 = extractelement <2 x double> %split, i64 1 %6 = insertelement <2 x double> undef, double %5, i64 0 %7 = extractelement <2 x double> %split1, i64 1 %8 = insertelement <2 x double> %6, double %7, i64 1 %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3> store <4 x double> %9, <4 x double>* %Ptr With this patch, we propagate the 2x2 shape information from the transpose to the store and we generate the code below. Note that we store the columns directly and do not need an extra shuffle. %9 = bitcast <4 x double>* %Ptr to double* %10 = bitcast double* %9 to <2 x double>* store <2 x double> %4, <2 x double>* %10, align 8 %11 = getelementptr double, double* %9, i32 2 %12 = bitcast double* %11 to <2 x double>* store <2 x double> %8, <2 x double>* %12, align 8 Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70897	2019-12-23 13:51:56 +01:00
Dinar Temirbulatov	a755ccefe6	[SLP] Replace NeedToGather variable with enum.	2019-12-23 08:21:53 +01:00
Mark de Wever	098d3347e7	[Transforms] Fixes -Wrange-loop-analysis warnings This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71810	2019-12-22 19:20:17 +01:00
Sanjay Patel	9cdcd81d3f	[InstCombine] enhance fold for copysign with known sign arg This is another optimization suggested in PRPR44153: https://bugs.llvm.org/show_bug.cgi?id=44153	2019-12-22 10:07:01 -05:00
Sanjay Patel	79c7fa31f3	[InstCombine] check alloc size in bitcast of geps fold (PR44321) We missed a constraint in D44833 when folding a bitcast into a GEP with vector/array types. If the alloc sizes specified by the datalayout don't match, this could miscompile as shown in: https://bugs.llvm.org/show_bug.cgi?id=44321 Differential Revision: https://reviews.llvm.org/D71771	2019-12-21 10:31:21 -05:00
Sanjay Patel	19f9f374d9	[SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transforms As discussed in PR44330: https://bugs.llvm.org/show_bug.cgi?id=44330 ...the transform from pow(X, -0.5) libcall/intrinsic to reciprocal square root can result in small deviations from the expected result due to differences in the pow() implementation and/or the extra rounding step from the division. This patch proposes to allow that difference with either the 'approximate functions' or 'reassociate' FMF: http://llvm.org/docs/LangRef.html#fast-math-flags In practice, this likely means that the code is compiled with all of 'fast' (-ffast-math), but I have preserved the existing specializations for -0.0/-INF that enable generating safe code if those special values are allowed simultaneously with allowing approximation/reassociation. The question about whether a similar restriction is needed for the non-reciprocal case -- pow(X, 0.5) -- is deferred. That transform is allowed without FMF currently, and this patch does not change that behavior. Differential Revision: https://reviews.llvm.org/D71706	2019-12-21 10:00:53 -05:00
Jakub Kuderski	c431c407eb	[InstCombine] Improve infinite loop detection Summary: This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop. Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details. The two limits can be specified via command line options. Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71673	2019-12-20 16:15:04 -05:00
Ayal Zaks	e498be5738	[LV] Strip wrap flags from vectorized reductions A sequence of additions or multiplications that is known not to wrap, may wrap if it's order is changed (i.e., reassociated). Therefore when vectorizing integer sum or product reductions, their no-wrap flags need to be removed. Fixes PR43828 Patch by Denis Antrushin Differential Revision: https://reviews.llvm.org/D69563	2019-12-20 14:48:53 +02:00
Vedant Kumar	caaacb8399	HotColdSplitting: Do not outline within noreturn functions A function marked `noreturn` may contain unreachable terminators: these should not be considered cold, as the function may be a trampoline. rdar://58068594	2019-12-19 14:06:24 -08:00
Bjorn Pettersson	89e3bb4502	[ConstantHoisting] Ignore unreachable bb:s when collecting candidates Summary: Ignore looking at blocks that are unreachable from entry when collecting candidates for hosting. Normally the consthoist pass is executed in the llc pipeline, just after unreachableblockelim. So it is abnormal to have code that is unreachable from the entry block. But when running the pass as part of opt, for example as part of fuzzy testing, we might trigger various kinds of asserts when collecting candidates if we include unreachable blocks in that analysis. It seems like a waste of time to hoist constants in unreachble blocks, so the solution is to simply ignore such blocks when collecting the hoisting candidates. The two added test cases used to end up in two different asserts, and the intention with the checks is just to verify that we no longer fail. Fixes: PR43903 Reviewers: spatel Reviewed By: spatel Subscribers: hiraditya, uabelho, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71678	2019-12-19 15:07:55 +01:00
David Green	a59cc5e128	[InstCombine] Canonicalize select immediates In certain situations after inlining and simplification we end up with code that is _almost_ a min/max pattern, but contains constants that have been demand-bit optimised to the wrong values, ending up with code like: %1 = icmp slt i32 %shr, -128 %2 = select i1 %1, i32 128, i32 %shr %.inv = icmp sgt i32 %shr, 127 %spec.select.i = select i1 %.inv, i32 127, i32 %2 %conv7 = trunc i32 %spec.select.i to i8 This should be turned into a min/max pattern, but the -128 in the first select was instead transformed into 128, as only the bottom byte was ever demanded. To fix this, I've put in further canonicalisation for the immediates of selects, preferring to use the same value as the icmp if available. Differential Revision: https://reviews.llvm.org/D71516	2019-12-19 12:36:46 +00:00
Piotr Sobczak	40b5a0f7c8	Revert "[InstCombine][AMDGPU] Trim more components of *buffer_load" Revert D70315, as it breaks gfx8 for some reason. This reverts commit `65f94b3380`.	2019-12-18 22:04:44 +01:00
Kit Barton	3db1cf7a1e	[LoopFusion] Use the LoopInfo::isRotatedForm method (NFC). Loop fusion previously had a method to check whether a loop was in rotated form. This method has been moved into the LoopInfo class. This patch removes the old isRotated method from loop fusion, in favour of the new one in LoopInfo.	2019-12-18 15:04:25 -05:00
Jakub Kuderski	3d29c41ad5	[InstCombine] Insert instructions before adding them to worklist Summary: This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get `<badref>`s printed when logging added instructions. It also adds a check in `Worklist::Add` that ensures that all added instructions have parents. Simple test case that illustrates the difference when run with `--debug-only=instcombine`: ``` define i32 @test35(i32 %a, i32 %b) { %1 = or i32 %a, 1135 %2 = or i32 %1, %b ret i32 %2 } ``` Before this patch: ``` INSTCOMBINE ITERATION #1 on test35 IC: ADDING: 3 instrs to worklist IC: Visiting: %1 = or i32 %a, 1135 IC: Visiting: %2 = or i32 %1, %b IC: ADD: %2 = or i32 %a, %b IC: Old = %3 = or i32 %1, %b New = <badref> = or i32 %2, 1135 IC: ADD: <badref> = or i32 %2, 1135 ... ``` With this patch: ``` INSTCOMBINE ITERATION #1 on test35 IC: ADDING: 3 instrs to worklist IC: Visiting: %1 = or i32 %a, 1135 IC: Visiting: %2 = or i32 %1, %b IC: ADD: %2 = or i32 %a, %b IC: Old = %3 = or i32 %1, %b New = <badref> = or i32 %2, 1135 IC: ADD: %3 = or i32 %2, 1135 ... ``` Reviewers: fhahn, davide, spatel, foad, grosser, nikic Reviewed By: nikic Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71093	2019-12-18 14:55:41 -05:00
Jakub Kuderski	406b6019cd	[InstCombine] Allow to limit the max number of iterations Summary: This patch teaches InstCombine to accept a new parameter: maximum number of iterations over functions. InstCombine tries to simplify instructions by iterating over the whole function until the function stops changing. As a consequence, the last iteration before reaching a fixpoint visits all instructions in the worklist and never performs any rewrites. Bounding the number of iterations can have 2 benefits: * In case the users of the pass can make a good guess about the number of required iterations, we can save the time normally spent on the last iteration that doesn't change anything. * When the wants to use InstCombine as a cleanup pass, it may be enough to run just a few iterations and stop even before reaching a fixpoint. This can be also useful for implementing a lightweight pass pipeline (think `-O1`). This patch does not change the behavior of opt or Clang -- limiting the number of iterations is entirely opt-in. Reviewers: fhahn, davide, spatel, foad, nlopes, grosser, lebedev.ri, nikic, xbolva00 Reviewed By: spatel Subscribers: craig.topper, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71145	2019-12-18 13:48:54 -05:00
stozer	89d19d60ad	Reapply: [DebugInfo] Correctly handle salvaged casts and split fragments at ISel This reverts commit `1f3dd83cc1`, reapplying commit `bb1b0bc4e5`. The original commit failed on some builds seemingly due to the use of a bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.	2019-12-18 16:26:42 +00:00
Whitney Tsang	9883d7edc6	[LoopUtils] Updated deleteDeadLoop() to handle loop nest. Reviewer: kariddi, sanjoy, reames, Meinersbur, bmahjour, etiotto, kbarton Reviewed By: Meinersbur Subscribers: mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D70939	2019-12-18 15:59:45 +00:00
stozer	1f3dd83cc1	Revert "[DebugInfo] Correctly handle salvaged casts and split fragments at ISel" Reverted due to build failure on windows bots. This reverts commit `bb1b0bc4e5`.	2019-12-18 11:46:10 +00:00
stozer	bb1b0bc4e5	[DebugInfo] Correctly handle salvaged casts and split fragments at ISel Previously, LLVM had no functional way of performing casts inside of a DIExpression(), which made salvaging cast instructions other than Noop casts impossible. This patch enables the salvaging of casts by using the DW_OP_LLVM_convert operator for SExt and Trunc instructions. There is another issue which is exposed by this fix, in which fragment DIExpressions (which are preserved more readily by this patch) for values that must be split across registers in ISel trigger an assertion, as the 'split' fragments extend beyond the bounds of the fragment DIExpression causing an error. This patch also fixes this issue by checking the fragment status of DIExpressions which are to be split, and dropping fragments that are invalid.	2019-12-18 11:09:18 +00:00
Anna Welker	7cd1cfdd6b	[NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter] Add an extra parameter so alignment can be taken under consideration in gather/scatter legalization. Differential Revision: https://reviews.llvm.org/D71610	2019-12-18 09:14:39 +00:00
Whitney Tsang	36bdc3dc35	[LoopFusion] Move instructions from FC0.Latch to FC1.Latch. Summary:This PR move instructions from FC0.Latch bottom up to the beginning of FC1.Latch as long as they are proven safe. To illustrate why this is beneficial, let's consider the following example: Before Fusion: header1: br header2 header2: br header2, latch1 latch1: br header1, preheader3 preheader3: br header3 header3: br header4 header4: br header4, latch3 latch3: br header3, exit3 After Fusion (before this PR): header1: br header2 header2: br header2, latch1 latch1: br header3 header3: br header4 header4: br header4, latch3 latch3: br header1, exit3 Note that preheader3 is removed during fusion before this PR. Notice that we cannot fuse loop2 with loop4 as there exists block latch1 in between. This PR move instructions from latch1 to beginning of latch3, and remove block latch1. LoopFusion is now able to fuse loop nest recursively. After Fusion (after this PR): header1: br header2 header2: br header3 header3: br header4 header4: br header2, latch3 latch3: br header1, exit3 Reviewer: kbarton, jdoerfert, Meinersbur, dmgreen, fhahn, hfinkel, bmahjour, etiotto Reviewed By: kbarton, Meinersbur Subscribers: hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D71165	2019-12-17 22:10:23 +00:00
Stefan Stipanovic	fff8ec9813	[Attributor] H2S fix. Summary: Fixing issues that were noticed in D71521 Reviewers: jdoerfert, lebedev.ri, uenoku Subscribers: Differential Revision: https://reviews.llvm.org/D71564	2019-12-17 20:41:09 +01:00
Piotr Sobczak	65f94b3380	[InstCombine][AMDGPU] Trim more components of buffer_load Summary: Add trimming of unused components of s_buffer_load. Extend trimming of buffer_load to also include unused components at the beginning of vectors and update offset. Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70315	2019-12-17 17:50:07 +01:00
Guillaume Chatelet	531c1161b9	Resubmit "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove" Summary: This is a resubmit of D71473. This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients. Functions will be deprecated one by one and as in tree code is cleaned up. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: aaron.ballman, courbet Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71547	2019-12-17 10:07:46 +01:00
Whitney Tsang	ec4749e3b8	Revert "[LoopUtils] Updated deleteDeadLoop() to handle loop nest." This reverts commit `cd09fee3d6`. This reverts commit `c066ff11d8`.	2019-12-17 03:51:41 +00:00
Johannes Doerfert	0bc3336ac1	[Attributor][NFC] Clang format the Attributor The Attributor is always kept formatted so diffs are cleaner. Sometime we get out of sync for various reasons so we need to format the file once in a while.	2019-12-16 21:03:18 -06:00
Whitney Tsang	c066ff11d8	[LoopUtils] Updated deleteDeadLoop() to handle loop nest. Reviewer: kariddi, sanjoy, reames, Meinersbur, bmahjour, etiotto, kbarton Reviewed By: Meinersbur Subscribers: mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D70939	2019-12-17 01:06:14 +00:00
Kit Barton	ff07fc66d9	[LoopFusion] Restrict loop fusion to rotated loops. Summary: This patch restricts loop fusion to only consider rotated loops as valid candidates. This simplifies the analysis and transformation and aligns with other loop optimizations. Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel Reviewed By: Meinersbur Subscribers: ormris, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71025	2019-12-16 15:17:29 -05:00
Craig Topper	02f644c59a	[InstCombine] Teach removeBitcastsFromLoadStoreOnMinMax not to change the size of a store. We can change the type as long as we don't change the size. Fixes PR44306 Differential Revision: https://reviews.llvm.org/D71532	2019-12-16 12:12:54 -08:00
Guillaume Chatelet	4658da10e4	Revert "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove" This reverts commit `181ab91efc`.	2019-12-16 15:19:49 +01:00
Guillaume Chatelet	181ab91efc	[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove Summary: This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients. Functions will be deprecated one by one and as in tree code is cleaned up. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71473	2019-12-16 13:35:55 +01:00
Bjorn Pettersson	e5f07080b8	[BasicBlockUtils] Fix dbg.value elimination problem in MergeBlockIntoPredecessor Summary: In commit `d60f34c20a` (llvm-svn 317128, PR35113) MergeBlockIntoPredecessor was changed into discarding some dbg.value intrinsics referring to PHI values, post-splice due to loop rotation. That elimination of dbg.value intrinsics did not consider which dbg.value to keep depending on the context (e.g. if the variable is changing its value several times inside the basic block). In the past that hasn't been such a big problem since CodeGenPrepare::placeDbgValues has moved the dbg.value to be next to the PHI node anyway. But after commit `00e238896c` CodeGenPrepare isn't doing that any longer, so we need to be more careful when avoiding duplicate dbg.value intrinsics in MergeBlockIntoPredecessor. This patch replaces the code that tried to avoid duplicate dbg.values by using the RemoveRedundantDbgInstrs helper. Reviewers: aprantl, jmorse, vsk Reviewed By: aprantl, vsk Subscribers: jholewinski, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71480	2019-12-16 11:41:21 +01:00
Bjorn Pettersson	1c49553c19	[BasicBlockUtils] Add utility to remove redundant dbg.value instrs Summary: Add a RemoveRedundantDbgInstrs to BasicBlockUtils with the goal to remove redundant dbg intrinsics from a basic block. This can be useful after various transforms, as it might be simpler to do a filtering of dbg intrinsics after the transform than during the transform. One primary use case would be to replace a too aggressive removal done by MergeBlockIntoPredecessor, seen at loop rotate (not done in this patch). The elimination algorithm currently focuses on dbg.value intrinsics and is doing two iterations over the BB. First we iterate backward starting at the last instruction in the BB. Whenever a consecutive sequence of dbg.value instructions are found we keep the last dbg.value for each variable found (variable fragments are identified using the {DILocalVariable, FragmentInfo, inlinedAt} triple as given by the DebugVariable helper class). Next we iterate forward starting at the first instruction in the BB. Whenever we find a dbg.value describing a DebugVariable (identified by {DILocalVariable, inlinedAt}) we save the {DIValue, DIExpression} that describes that variables value. But if the variable already was mapped to the same {DIValue, DIExpression} pair we instead drop the second dbg.value. To ease the process of making lit tests for this utility a new pass is introduced called RedundantDbgInstElimination. It can be executed by opt using -redundant-dbg-inst-elim. Reviewers: aprantl, jmorse, vsk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71478	2019-12-16 11:41:21 +01:00
Johannes Doerfert	139c9ef45a	[Attributor] Annotate call sites of declarations with a callback Even if a declaration is called, if there is a callback we might need the information during CG-SCC traversal (D70767).	2019-12-13 23:51:59 -06:00
Johannes Doerfert	3d347e2835	[Attributor][NFC] Simplify debug printing for abstract attributes This also fixes a type in the debug printing of AANoAlias.	2019-12-13 23:51:59 -06:00
Johannes Doerfert	5d34602da4	[Attributor] Only replace instruction operands This was part of D70767. When we replace the value of (call/invoke) instructions we do not want to disturb the old call graph so we will only replace instruction uses until we get rid of the old PM. Accepted as part of D70767.	2019-12-13 22:16:38 -06:00
Francesco Petrogalli	19f73f0d1b	Revert "[VectorUtils] Introduce the Vector Function Database (VFDatabase)." This reverts commit `0be81968a2`. The VFDatabase needs some rework to be able to handle vectorization and subsequent scalarization of intrinsics in out-of-tree versions of the compiler. For more details, see the discussion in https://reviews.llvm.org/D67572.	2019-12-13 19:42:04 +00:00
Fangrui Song	193da743db	[profile] Fix a crash when -fprofile-remapping-file= triggers an error Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D71485	2019-12-13 11:38:20 -08:00
Hiroshi Yamauchi	ed50e6060b	[PGO][PGSO] Enable size optimizations in code gen / target passes for cold code. Summary: Split off of D67120. Reviewers: davidxl Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71288	2019-12-13 11:01:19 -08:00
Nicola Zaghen	97572775d2	Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. This fixes the buildbot failures. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-13 14:30:21 +00:00
Evgenii Stepanov	dabd2622a8	hwasan: add tag_offset DWARF attribute to optimized debug info Summary: Support alloca-referencing dbg.value in hwasan instrumentation. Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in loclist format. Reviewers: pcc Subscribers: srhines, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70753	2019-12-12 16:18:54 -08:00
Johannes Doerfert	6abd01e462	[Attributor][FIX] Do treat byval arguments special When we reason about the pointer argument that is byval we actually reason about a local copy of the value passed at the call site. This was not the case before and we wrongly introduced attributes based on the surrounding function. AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and AANoCaptureCallSiteArgument are made aware of byval now. The code to skip "subsuming positions" for reasoning follows a common pattern and we should refactor it. A TODO was added. Discovered by @efriedma as part of D69748.	2019-12-12 16:04:21 -06:00
Florian Hahn	526244b187	[Matrix] Add first set of matrix intrinsics and initial lowering pass. This is the first patch adding an initial set of matrix intrinsics and a corresponding lowering pass. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html The first patch introduces four new intrinsics (transpose, multiply, columnwise load and store) and a LowerMatrixIntrinsics pass, that lowers those intrinsics to vector operations. Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix embedded in a <16 x float> vector) and the intrinsics take the dimension information as parameters. Those parameters need to be ConstantInt. For the memory layout, we initially assume column-major, but in the RFC we also described how to extend the intrinsics to support row-major as well. For the initial lowering, we split the input of the intrinsics into a set of column vectors, transform those column vectors and concatenate the result columns to a flat result vector. This allows us to lower the intrinsics without any shape propagation, as mentioned in the RFC. In follow-up patches, we plan to submit the following improvements: * Shape propagation to eliminate the embedding/splitting for each intrinsic. * Fused & tiled lowering of multiply and other operations. * Optimization remarks highlighting matrix expressions and costs. * Generate loops for operations on large matrixes. * More general block processing for operation on large vectors, exploiting shape information. We would like to add dedicated transpose, columnwise load and store intrinsics, even though they are not strictly necessary. For example, we could instead emit a large shufflevector instruction instead of the transpose. But we expect that to (1) become unwieldy for larger matrixes (even for 16x16 matrixes, the resulting shufflevector masks would be huge), (2) risk instcombine making small changes, causing us to fail to detect the transpose, preventing better lowerings For the load/store, we are additionally planning on exploiting the intrinsics for better alias analysis. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70456	2019-12-12 15:42:18 +00:00
Hideto Ueno	4ecf25545c	[Attributor][NFC] Fix comments and unnecessary comma	2019-12-12 13:42:40 +00:00
Hideto Ueno	827bade262	[Attributor] [NFC] Use `checkForAllUses` helpr in `AAHeapToStackImpl::updateImpl` Summary: Remove `Worklist` iteration and make use `checkForAllUses`. There is no test chage. Reviewers: sstefan1, jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71352	2019-12-12 13:27:53 +00:00
Hideto Ueno	63599bd072	[Attributor][NFC] Refactoring `AANoFreeArgument::updateImpl` Summary: Refactoring `AANoFreeArgument::updateImpl`. There is no test change. Reviewers: sstefan1, jdoerfert Reviewed By: sstefan1 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71349	2019-12-12 13:27:53 +00:00
Nicola Zaghen	f798eb21ec	Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same." This reverts commit `5f6208778f`. This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.	2019-12-12 10:29:54 +00:00
Nicola Zaghen	5f6208778f	[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same. GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues with Instcombine's visitPtrToInt; but the unit tests was incorrect, so this remained undiscovered. Differential Revision: https://reviews.llvm.org/D68328 Patch by Joseph Faulls!	2019-12-12 10:07:01 +00:00
Wenlei He	d275a06487	[AutoFDO] Statistic for context sensitive profile guided inlining Summary: AutoFDO compilation has two places that do inlining - the sample profile loader that does inlining with context sensitive profile, and the regular inliner as CGSCC pass. Ideally we want most inlining to come from sample profile loader as that is driven by context sensitive profile and also retains context sensitivity after inlining. However the reality is most of the inlining actually happens during regular inliner. To track the number of inline instances from sample profile loader and help move more inlining to sample profile loader, I'm adding statistics and optimization remarks for sample profile loader's inlining. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70584	2019-12-11 21:37:21 -08:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Reid Kleckner	85ba5f637a	Rename TTI::getIntImmCost for instructions and intrinsics Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure. Split off from D71320 Reviewers: efriedma Differential Revision: https://reviews.llvm.org/D71381	2019-12-11 18:00:20 -08:00
Nikita Popov	8db5143b1a	[InstCombine] Optimize overflow check base on uadd.with.overflow result Fix for https://bugs.llvm.org/show_bug.cgi?id=40846. This adds a combine for cases where a (a + b) < a style overflow check is performed, but with a + b being the result of uadd.with.overflow, so the overflow result is also already available and we can just use it. Subsequently GVN/CSE will deduplicate the extracts. We can run into this situation if you have both a uadd.with.overflow and a manual add + overflow check in the same function (on the same operands), in which case GVN will rewrite the add to the with.overflow result and leave you with this pattern. The implementation is a bit ugly because I'm handling the various canonicalization edge cases. This does not yet handle the negated version of this pattern. Differential Revision: https://reviews.llvm.org/D58644	2019-12-11 20:52:04 +01:00
Nikita Popov	b361d3bbcd	[MergeFuncs] Remove incorrect attribute copying Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1. However, the attribute copying was done in the wrong place (in general call replacement, not thunk generation) and a proper fix was implemented in D12581. Previously this code was just unnecessary but harmless (because FunctionComparator ensured that the attributes of the two functions are exactly the same), but since byval was changed to accept a type this copying is actively wrong and may result in malformed IR. Differential Revision: https://reviews.llvm.org/D71173	2019-12-11 20:09:54 +01:00
Guillaume Chatelet	0a0d54b357	[Alignment][NFC] Introduce Align in IRBuilder Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71343	2019-12-11 14:41:23 +01:00
Guillaume Chatelet	3491109587	Rollback assumeAligned in MemorySanitizer Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing). Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71332	2019-12-11 14:25:21 +01:00
Guillaume Chatelet	8a7c52bc22	[Alignment][NFC] Introduce Align in SROA Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71277	2019-12-11 09:34:38 +01:00
Vlad Tsyrklevich	636c93ed11	Revert "Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty..." This reverts commit `f2ba93971c`, it was causing build timeouts on sanitizer-x86_64-linux-autoconf such as http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/44917	2019-12-10 16:03:17 -08:00
Francesco Petrogalli	0be81968a2	[VectorUtils] Introduce the Vector Function Database (VFDatabase). This patch introduced the VFDatabase, the framework proposed in http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [] In this patch the VFDatabase is used to bridge the TargetLibraryInfo (TLI) calls that were previously used to query for the availability of vector counterparts of scalar functions. The VFISAKind field `ISA` of VFShape have been moved into into VFInfo, under the assumption that different vector ISAs may provide the same vector signature. At the moment, the vectorizer accepts any of the available ISAs as long as the signature provided by the VFDatabase matches the one expected in the vectorization process. For example, when targeting AVX or AVX2, which both have 256-bit registers, the IR signature of the two vector functions associated to the two ISAs is the same. The `getVectorizedFunction` method at the moment returns the first available match. We will need to add more heuristics to the search system to decide which of the available version (TLI, AVX, AVX2, ...) the system should prefer, when multiple versions with the same VFShape are present. Some of the code in this patch is based on the work done by Sumedh Arani in https://reviews.llvm.org/D66025. [] Notice that in the proposal the VFDatabase was called SVFS. The name VFDatabase is more in line with LLVM recommendations for naming classes and variables. Differential Revision: https://reviews.llvm.org/D67572	2019-12-10 16:36:44 +00:00
Sanjay Patel	396d18aeb6	[InstCombine] replace shuffle's insertelement operand if inserted scalar is not demanded This pattern is noted as a regression from: D70246 ...where we removed an over-aggressive shuffle simplification. SimplifyDemandedVectorElts fails to catch this case when the insert has multiple uses, so I'm proposing to pattern match the minimal sequence directly. This fold does not conflict with any of our current shuffle undef/poison semantics. Differential Revision: https://reviews.llvm.org/D71220	2019-12-10 10:10:05 -05:00
Guillaume Chatelet	1b2842bf90	[Alignment][NFC] CreateMemSet use MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71213	2019-12-10 15:17:44 +01:00
stozer	f2ba93971c	Reapply: [DebugInfo] Recover debug intrinsics when killing duplicated/empty... basic blocks Originally applied in `72ce759928`. Fixed a build failure caused by incorrect use of cast instead of dyn_cast. This reverts commit `8b0780f795`.	2019-12-10 13:33:32 +00:00
Djordje Todorovic	9b9e995819	[DebugInfo][EarlyCSE] Use the salvageDebugInfoOrMarkUndef(); NFC Use the newest API. Differential Revision: https://reviews.llvm.org/D71061	2019-12-09 13:57:35 +01:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
Florian Hahn	c491949694	[LV] Pick correct BB as insert point when fixing PHI for FORs. Currently we fail to pick the right insertion point when PreviousLastPart of a first-order-recurrence is a PHI node not in the LoopVectorBody. This can happen when PreviousLastPart is produce in a predicated block. In that case, we should pick the insertion point in the BB the PHI is in. Fixes PR44020. Reviewers: hsaito, fhahn, Ayal, dorit Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D71071	2019-12-07 19:32:00 +00:00
Florian Hahn	c25de56905	[SimplifyCFG] Account for N being null. Fixes a crash, e.g. http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15119/	2019-12-07 17:23:42 +00:00
Rodrigo Caetano Rocha	d714aa0dfd	[SimplifyCFG] Handle AssumptionCache being null. AssumptionCache can be null in SimplifyCFGOptions. However, FoldCondBranchOnPHI() was not properly handling that when passing a null AssumptionCache to simplifyCFG. Patch by Rodrigo Caetano Rocha <rcor.cs@gmail.com> Reviewers: fhahn, lebedev.ri, spatel Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D69963	2019-12-07 16:54:49 +00:00
Florian Hahn	e60b36cf92	[VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC). The file is intended to gather various VPlan transformations, not only CFG related transforms. Actually, the only transformation there is not CFG related. Reviewers: Ayal, gilr, hsaito, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D70732	2019-12-07 08:56:35 +00:00
Teresa Johnson	c8e36862f5	[WPD] Remove unused parameter (NFC) Remove unused parameter.	2019-12-06 13:14:21 -08:00
Wenlei He	7b61ae68ec	[AutoFDO] Inline replay for cold/small callees from sample profile loader Summary: Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout. This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70750	2019-12-06 11:44:45 -08:00
Sanjay Patel	43e2a901e1	Revert "[InstCombine] reduce code duplication; NFC" This reverts commit `db57396584`. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:24:14 -05:00
Sanjay Patel	b6d6f5470f	Revert "[InstCombine] improve readability; NFC" This reverts commit `7250ef3613`. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:20:44 -05:00
Sanjay Patel	142a75a9b1	Revert "[InstCombine] reduce indentation; NFC" This reverts commit `8bf8ef7116`. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:19:02 -05:00
Sanjay Patel	8bf8ef7116	[InstCombine] reduce indentation; NFC	2019-12-06 13:26:45 -05:00
Sanjay Patel	7250ef3613	[InstCombine] improve readability; NFC CreateIntCast returns the input if its type matches, so need to duplicate that check.	2019-12-06 13:26:45 -05:00
Sanjay Patel	db57396584	[InstCombine] reduce code duplication; NFC	2019-12-06 13:26:45 -05:00
Sanjay Patel	6bb62a9d97	[InstCombine] improve readability; NFC	2019-12-06 13:26:44 -05:00
Gil Rapaport	39ccc099c9	[LV] Record GEP widening decisions in recipe (NFCI) InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations ability to modify def-use relations and still have ILV generate the vector code. This commit moves GEP operand queries controlling how GEPs are widened to a dedicated recipe and extracts GEP widening code to its own ILV method taking those recorded decisions as arguments. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Differential revision: https://reviews.llvm.org/D69067	2019-12-06 13:41:19 +02:00
Daniil Suchkov	c4d8c6319f	[LCSSA] Don't use VH callbacks to invalidate SCEV when creating LCSSA phis In general ValueHandleBase::ValueIsRAUWd shouldn't be called when not all uses of the value were actually replaced, though, currently formLCSSAForInstructions calls it when it inserts LCSSA-phis. Calls of ValueHandleBase::ValueIsRAUWd were added to LCSSA specifically to update/invalidate SCEV. In the best case these calls duplicate some of the work already done by SE->forgetValue, though in case when SCEV of the value is SCEVUnknown, SCEV replaces the underlying value of SCEVUnknown with the new value (i.e. acts like LCSSA-phi actually fully replaces the value it is created for), which leads to SCEV being corrupted because LCSSA-phi rarely dominates all uses of its inputs. Fixes bug https://bugs.llvm.org/show_bug.cgi?id=44058. Reviewers: fhahn, efriedma, reames, sanjoy.google Reviewed By: fhahn Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70593	2019-12-06 13:21:49 +07:00
Teresa Johnson	54a3c2a81e	[ThinLTO] Add option to disable readonly/writeonly attribute propagation Summary: Add an option to allow the attribute propagation on the index to be disabled, to allow a workaround for issues (such as that fixed by D70977). Also move the setting of the WithAttributePropagation flag on the index into propagateAttributes(), and remove some old stale code that predated this flag and cleared the maybe read/write only bits when we need to disable the propagation (previously only when importing disabled, now also when the new option disables it). Reviewers: evgeny777, steven_wu Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70984	2019-12-05 16:33:54 -08:00
Wenlei He	532196d811	[AutoFDO] Top-down Inlining for specialization with context-sensitive profile Summary: AutoFDO's sample profile loader processes function in arbitrary source code order, so if I change the order of two functions in source code, the inline decision can change. This also prevented the use of context-sensitive profile to do specialization while inlining. This commit enforces SCC top-down order for sample profile loader. With this change, we can now do specialization, as illustrated by the added test case: Say if we have A->B->C and D->B->C call path, we want to inline C into B when root inliner is B, but not when root inliner is A or D, this is not possible without enforcing top-down order. E.g. Once C is inlined into B, A and D can only choose to inline (B->C) as a whole or nothing, but what we want is only inline B into A and D, not its recursive callee C. If we process functions in top-down order, this is no longer a problem, which is what this commit is doing. This change is guarded with a new switch "-sample-profile-top-down-load" for tuning, and it depends on D70653. Eventually, top-down can be the default order for sample profile loader. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits, tejohnson Tags: #llvm Differential Revision: https://reviews.llvm.org/D70655	2019-12-05 16:07:01 -08:00
Wenlei He	e503fd85d3	[AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function Summary: When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit try to keep context-sensitive profile for such cases: - Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile. - Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay. A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70653	2019-12-05 15:57:55 -08:00
Florian Hahn	19071173fc	Revert "[DSE] Fix for a dangling point bug in DeadStoreElimination." The commit causes a failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20911 This reverts commit `1847fd9d85`.	2019-12-05 19:29:21 +00:00
Evgenii Stepanov	6f89cbc429	LowerDbgDeclare: look through bitcasts. Summary: Emit a value debug intrinsic (with OP_deref) when an alloca address is passed to a function call after going through a bitcast. This generates an FP or SP-relative location for the local variable in the following case: int x; use((void *)&x; Reviewers: aprantl, vsk, pcc Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70752	2019-12-05 11:19:07 -08:00
Bob Haarman	055779a9ac	Revert "[InstCombine] keep assumption before sinking calls" Summary: This reverts commit `c3b06d0c39`. Reason for revert: Caused miscompiles when inserting assume for undef. Also adds a test to prevent similar breakage in future. Fixes PR44154. Reviewers: rnk, jdoerfert, efriedma, xbolva00 Reviewed By: rnk Subscribers: thakis, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70933	2019-12-05 10:39:34 -08:00
Roman Lebedev	796fa662f1	[InstCombine] Invert `add A, sext(B) --> sub A, zext(B)` canonicalization (to `sub A, zext B -> add A, sext B`) Summary: D68408 proposes to greatly improve our negation sinking abilities. But in current canonicalization, we produce `sub A, zext(B)`, which we will consider non-canonical and try to sink that negation, undoing the existing canonicalization. So unless we explicitly stop producing previous canonicalization, we will have two conflicting folds, and will end up endlessly looping. This inverts canonicalization, and adds back the obvious fold that we'd miss: * `sub [nsw] Op0, sext/zext (bool Y) -> add [nsw] Op0, zext/sext (bool Y)` https://rise4fun.com/Alive/xx4 * `sext(bool) + C -> bool ? C - 1 : C` https://rise4fun.com/Alive/fBl It is obvious that `@ossfuzz_9880()` / `@lshr_out_of_range()`/`@ashr_out_of_range()` (oss-fuzz 4871) are no longer folded as much, though those aren't really worrying. Reviewers: spatel, efriedma, t.p.northover, hfinkel Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71064	2019-12-05 21:21:30 +03:00
Ankit	1847fd9d85	[DSE] Fix for a dangling point bug in DeadStoreElimination. The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction. While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration. In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container. Patch by Ankit <quic_aankit@quicinc.com> Reviewers: fhahn, bcahoon, efriedma Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D65326	2019-12-05 17:53:58 +00:00
Sanjay Patel	3c6b5d3674	[InstCombine] narrow select with FP casts Select doesn't change values, so truncate of extended operand cancels out.	2019-12-05 11:12:44 -05:00
Sanjay Patel	51e420c27e	[InstCombine] add FMF guard to builder in fptrunc transform; NFC This makes no difference currently because we don't apply FMF to FP casts, but that may change. This could also be a place to add a fold for select with fptrunc, so it will make that patch easier/smaller.	2019-12-05 10:55:07 -05:00
Roman Lebedev	09311459e3	[InstCombine] Extend `0 - (X sdiv C) -> (X sdiv -C)` fold to non-splat vectors Split off from https://reviews.llvm.org/D68408	2019-12-05 15:48:29 +03:00
Teresa Johnson	e420c0c78e	[ThinLTO] Fix importing of writeonly variables in distributed ThinLTO Summary: D69561/dde5893 enabled importing of readonly variables with references, however, it introduced a bug relating to importing/internalization of writeonly variables with references. A fix for this was added in D70006/7f92d66. But this didn't work in distributed ThinLTO mode. The reason is that the fix (importing the writeonly var with a zeroinitializer) was only applied when there were references on the writeonly var summary. In distributed ThinLTO mode, where we only have a small slice of the index, we will not have the references on the importing side if we are not importing those referenced values. Rather than changing this handshaking (which will require a lot of other changes, since that's how we know what to import in the distributed backend clang invocation), we can simply always give the writeonly variable a zero initializer. Reviewers: evgeny777, steven_wu Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70977	2019-12-04 14:59:27 -08:00
Tozer	8b0780f795	Revert "[DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks" This reverts commit `72ce759928`. Reverted due to build failure.	2019-12-04 18:47:08 +00:00
Vedant Kumar	f208b70fbc	Revert "[Coverage] Revise format to reduce binary size" This reverts commit `e18531595b`. On Windows, there is an error: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/54963/steps/stage%201%20check/logs/stdio error: C:\b\slave\sanitizer-windows\build\stage1\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-merging.cpp.tmp.v1.o: Failed to load coverage: Malformed coverage data	2019-12-04 10:35:14 -08:00
Vedant Kumar	e18531595b	[Coverage] Revise format to reduce binary size Revise the coverage mapping format to reduce binary size by: 1. Naming function records and marking them `linkonce_odr`, and 2. Compressing filenames. This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB) and speeds up end-to-end single-threaded report generation by 10%. For reference the compressed name data in llc is 81MB (__llvm_prf_names). Rationale for changes to the format: - With the current format, most coverage function records are discarded. E.g., more than 97% of the records in llc are duplicate placeholders for functions visible-but-not-used in TUs. Placeholders are used to show under-covered functions, but duplicate placeholders waste space. - We reached general consensus about giving (1) a try at the 2017 code coverage BoF [1]. The thinking was that using `linkonce_odr` to merge duplicates is simpler than alternatives like teaching build systems about a coverage-aware database/module/etc on the side. - Revising the format is expensive due to the backwards compatibility requirement, so we might as well compress filenames while we're at it. This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB). See CoverageMappingFormat.rst for the details on what exactly has changed. Fixes PR34533 [2], hopefully. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html [2] https://bugs.llvm.org/show_bug.cgi?id=34533 Differential Revision: https://reviews.llvm.org/D69471	2019-12-04 10:10:55 -08:00
Florian Hahn	e8a5c17211	[LoopInterchange] Improve inner exit loop safety checks. The PHI node checks for inner loop exits are too permissive currently. As indicated by an existing comment, we should only allow LCSSA PHI nodes that are part of reductions or are only used outside of the loop nest. We ensure this by checking the users of the LCSSA PHIs. Specifically, it is not safe to use an exiting value from the inner loop in the latch of the outer loop. It also moves the inner loop exit check before the outer loop exit check. Fixes PR43473. Reviewers: efriedma, mcrosier Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D68144	2019-12-04 17:46:01 +00:00
Francesco Petrogalli	a249551bb2	[llvm][Transform] Remove unused variable. [NFCI] The variable prevents compiling when using -Werror=unused-variable.	2019-12-04 17:40:30 +00:00
Hiroshi Yamauchi	62d429972e	[PGO][PGSO] Distinguish queries from unit tests and explicitly enable for the existing IR passes only. NFC. Summary: This is one more prep step necessary before the code gen pass instrumentation code could go in. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70988	2019-12-04 09:35:50 -08:00
stozer	72ce759928	[DebugInfo] Recover debug intrinsics when killing duplicated/empty basic blocks When basic blocks are killed, either due to being empty or to being an if.then or if.else block whose complement contains identical instructions, some of the debug intrinsics in that block are lost. This patch sinks those intrinsics into the single successor block, setting them Undef if necessary to prevent debug info from falling out-of-date. Differential Revision: https://reviews.llvm.org/D70318	2019-12-04 16:01:49 +00:00
Florian Hahn	4a9cde5a79	[SimpleLoopUnswitch] Invalidate the topmost loop with ExitBB as exiting. SCEV caches the exiting blocks when computing exit counts. In SimpleLoopUnswitch, we split the exit block of the loop to unswitch. Currently we only invalidate the loop containing that exit block, but if that block is the exiting block for a parent loop, we have stale cache entries. We have to invalidate the top-most loop that contains the exit block as exiting block. We might also be able to skip invalidating the loop containing the exit block, if the exit block is not an exiting block of that loop. There are also 2 more places in SimpleLoopUnswitch, that use a similar problematic approach to get the loop to invalidate. If the patch makes sense, I will also update those places to a similar approach (they deal with multiple exit blocks, so we cannot directly re-use getTopMostExitingLoop). Fixes PR43972. Reviewers: skatkov, reames, asbirlea, chandlerc Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D70786	2019-12-04 11:32:09 +00:00
Ehud Katz	2b6b8cb10c	[APFloat] Prevent construction of APFloat with Semantics and FP value Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)` may seem like they accept a FP (floating point) value, but the overload they reach is actually the `integerPart` one, not a `float` or `double` overload (which only exists when `fltSemantics` isn't passed). This may lead to possible loss of data, by the conversion from `float` or `double` to `integerPart`. To prevent future mistakes, a new constructor overload, which accepts any FP value and marked with `delete`, to prevent its usage. Fixes PR34095. Differential Revision: https://reviews.llvm.org/D70425	2019-12-04 12:02:04 +02:00
Craig Topper	5ebbabc1af	[InstCombine] Revert `aafde063aa` and `6749dc3446` related to bitcast handling of x86_mmx This reverts these two commits [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double. [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement We're seeing at least one internal test failure related to a bitcast that was previously before an inline assembly block containing emms being placed after it. This leads to the mmx state ending up not empty after the emms. IR has no way to make any specific guarantees about this. Reverting these patches to get back to previous behavior which at least worked for this test.	2019-12-03 14:02:22 -08:00
Ayal Zaks	6ed9cef25f	[LV] Scalar with predication must not be uniform Fix PR40816: avoid considering scalar-with-predication instructions as also uniform-after-vectorization. Instructions identified as "scalar with predication" will be "vectorized" using a replicating region. If such instructions are also optimized as "uniform after vectorization", namely when only the first of VF lanes is used, such a replicating region becomes erroneous - only the first instance of the region can and should be formed. Fix such cases by not considering such instructions as "uniform after vectorization". Differential Revision: https://reviews.llvm.org/D70298	2019-12-03 19:50:24 +02:00
Anton Afanasyev	a315519c17	[SLP] Enhance SLPVectorizer to vectorize different combinations of aggregates Summary: Make SLPVectorize to recognize homogeneous aggregates like `{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`, `[2 x {float, float}]` and so on. It's a follow-up of https://reviews.llvm.org/D70068. Merged `findBuildVector()` and `findBuildAggregate()` to one `findBuildAggregate()` function making it recursive to recognize multidimensional aggregates. Aggregates required to be homogeneous. Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70587	2019-12-03 19:29:27 +03:00
Florian Hahn	e9c68422de	[VPlan] Add dump function to VPlan class. This adds a dump() function to VPlan, which uses the existing operator<<. This method provides a convenient way to dump a VPlan while debugging, e.g. from lldb. Reviewers: hsaito, Ayal, gilr, rengolin Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D70920	2019-12-03 11:59:10 +00:00
Johannes Altmanninger	09667bc192	[asan] Remove debug locations from alloca prologue instrumentation Summary: This fixes https://llvm.org/PR26673 "Wrong debugging information with -fsanitize=address" where asan instrumentation causes the prologue end to be computed incorrectly: findPrologueEndLoc, looks for the first instruction with a debug location to determine the prologue end. Since the asan instrumentation instructions had debug locations, that prologue end was at some instruction, where the stack frame is still being set up. There seems to be no good reason for extra debug locations for the asan instrumentations that set up the frame; they don't have a natural source location. In the debugger they are simply located at the start of the function. For certain other instrumentations like -fsanitize-coverage=trace-pc-guard the same problem persists - that might be more work to fix, since it looks like they rely on locations of the tracee functions. This partly reverts `aaf4bb2394` "[asan] Set debug location in ASan function prologue" whose motivation was to give debug location info to the coverage callback. Its test only ensures that the call to @__sanitizer_cov_trace_pc_guard is given the correct source location; as the debug location is still set in ModuleSanitizerCoverage::InjectCoverageAtBlock, the test does not break. So -fsanitize-coverage is hopefully unaffected - I don't think it should rely on the debug locations of asan-generated allocas. Related revision: `3c6c14d14b` "ASAN: Provide reliable debug info for local variables at -O0." Below is how the X86 assembly version of the added test case changes. We get rid of some .loc lines and put prologue_end where the user code starts. ```diff --- 2.master.s 2019-12-02 12:32:38.982959053 +0100 +++ 2.patch.s 2019-12-02 12:32:41.106246674 +0100 @@ -45,8 +45,6 @@ .cfi_offset %rbx, -24 xorl %eax, %eax movl %eax, %ecx - .Ltmp2: - .loc 1 3 0 prologue_end # 2.c:3:0 cmpl $0, __asan_option_detect_stack_use_after_return movl %edi, 92(%rbx) # 4-byte Spill movq %rsi, 80(%rbx) # 8-byte Spill @@ -57,9 +55,7 @@ callq __asan_stack_malloc_0 movq %rax, 72(%rbx) # 8-byte Spill .LBB1_2: - .loc 1 0 0 is_stmt 0 # 2.c:0:0 movq 72(%rbx), %rax # 8-byte Reload - .loc 1 3 0 # 2.c:3:0 cmpq $0, %rax movq %rax, %rcx movq %rax, 64(%rbx) # 8-byte Spill @@ -72,9 +68,7 @@ movq %rax, %rsp movq %rax, 56(%rbx) # 8-byte Spill .LBB1_4: - .loc 1 0 0 # 2.c:0:0 movq 56(%rbx), %rax # 8-byte Reload - .loc 1 3 0 # 2.c:3:0 movq %rax, 120(%rbx) movq %rax, %rcx addq $32, %rcx @@ -99,7 +93,6 @@ movb %r8b, 31(%rbx) # 1-byte Spill je .LBB1_7 # %bb.5: - .loc 1 0 0 # 2.c:0:0 movq 40(%rbx), %rax # 8-byte Reload andq $7, %rax addq $3, %rax @@ -118,7 +111,8 @@ movl %ecx, (%rax) movq 80(%rbx), %rdx # 8-byte Reload movq %rdx, 128(%rbx) - .loc 1 4 3 is_stmt 1 # 2.c:4:3 +.Ltmp2: + .loc 1 4 3 prologue_end # 2.c:4:3 movq %rax, %rdi callq f movq 48(%rbx), %rax # 8-byte Reload ``` Reviewers: eugenis, aprantl Reviewed By: eugenis Subscribers: ormris, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70894	2019-12-03 11:24:17 +01:00
Bill Wendling	87f146767e	Place the "cold" code piece into the same section as the original function Summary: This cropped up in the Linux kernel where cold code was placed in an incompatible section. Reviewers: compnerd, vsk, tejohnson Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70925	2019-12-02 15:24:59 -08:00
Hiroshi Yamauchi	8cdfdfeee6	[PGO][PGSO] Add an optional query type parameter to shouldOptimizeForSize. Summary: In case of a need to distinguish different query sites for gradual commit or debugging of PGSO. NFC. Reviewers: davidxl Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70510	2019-12-02 13:54:13 -08:00
Florian Hahn	fe459ce65a	[VPlan] Move graph traits (NFC). By defining the graph traits right after the VPBlockBase definitions, we can make use of them earlier in the file. Reviewers: hsaito, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D70733	2019-12-02 18:23:11 +00:00
Sanjay Patel	af4e59949c	[InstCombine] fix undef propagation for vector urem transform (PR44186) As described here: https://bugs.llvm.org/show_bug.cgi?id=44186 The match() code safely allows undef values, but we can't safely propagate a vector constant that contains an undef to the new compare instruction.	2019-12-02 12:17:38 -05:00
Simon Tatham	01aefae4a1	[ARM,MVE] Add an InstCombine rule permitting VPNOT. Summary: If a user writing C code using the ACLE MVE intrinsics generates a predicate and then complements it, then the resulting IR will use the `pred_v2i` IR intrinsic to turn some `<n x i1>` vector into a 16-bit integer; complement that integer; and convert back. This will generate machine code that moves the predicate out of the `P0` register, complements it in an integer GPR, and moves it back in again. This InstCombine rule replaces `i2v(~v2i(x))` with a direct complement of the original predicate vector, which we can already instruction- select as the VPNOT instruction which complements P0 in place. Reviewers: ostannard, MarkMurrayARM, dmgreen Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70484	2019-12-02 16:20:30 +00:00
Roman Lebedev	0f22e783a0	[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100) rL341831 moved one-use check higher up, restricting a few folds that produced a single instruction from two instructions to the case where the inner instruction would go away. Original commit message: > InstCombine: move hasOneUse check to the top of foldICmpAddConstant > > There were two combines not covered by the check before now, > neither of which actually differed from normal in the benefit analysis. > > The most recent seems to be because it was just added at the top of the > function (naturally). The older is from way back in 2008 (r46687) > when we just didn't put those checks in so routinely, and has been > diligently maintained since. From the commit message alone, there doesn't seem to be a deeper motivation, deeper problem that was trying to solve, other than 'fixing the wrong one-use check'. As i have briefly discusses in IRC with Tim, the original motivation can no longer be recovered, too much time has passed. However i believe that the original fold was doing the right thing, we should be performing such a transformation even if the inner `add` will not go away - that will still unchain the comparison from `add`, it will no longer need to wait for `add` to compute. Doing so doesn't seem to break any particular idioms, as least as far as i can see. References https://bugs.llvm.org/show_bug.cgi?id=44100	2019-12-02 18:06:15 +03:00
Sanjay Patel	af0babc90a	[InstCombine] fold copysign with constant sign argument to (fneg+)fabs If the sign of the sign argument is known (this could be extended to use ValueTracking), then we can use fneg+fabs to clear/set the sign bit of the magnitude argument. http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic This transform is already done in DAGCombiner, but we can do it sooner in IR as suggested in PR44153: https://bugs.llvm.org/show_bug.cgi?id=44153 We have effectively no analysis for copysign in IR, so we are taking the unusual step of increasing the number of IR instructions for the negative constant case. Differential Revision: https://reviews.llvm.org/D70792	2019-12-02 09:23:12 -05:00
Bjorn Pettersson	a9d6b0e544	[InstCombine] Fix big-endian miscompile of (bitcast (zext/trunc (bitcast))) Summary: optimizeVectorResize is rewriting patterns like: %1 = bitcast vector %src to integer %2 = trunc/zext %1 %dst = bitcast %2 to vector Since bitcasting between integer an vector types gives different integer values depending on endianness, we need to take endianness into account. As it happens the old implementation only produced the correct result for little endian targets. Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178 Reviewers: spatel, lattner, lebedev.ri Reviewed By: spatel, lebedev.ri Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70844	2019-12-02 11:05:25 +01:00
David Green	59b56e5c57	[InstCombine] Expand usub_sat patterns to handle constants The constants come through as add %x, -C, not a sub as would be expected. They need some extra matchers to canonicalise them towards usub_sat. Differential Revision: https://reviews.llvm.org/D69514	2019-11-30 16:58:01 +00:00
David Green	3a1bef5616	[InstCombine] Adjust usub_sat fold one use checks This adjusts the one use checks in the the usub_sat fold code to not increase instruction count, but otherwise do the fold. Reviewed as a part of D69514.	2019-11-30 16:58:00 +00:00

... 8 9 10 11 12 ...

23852 Commits