[recommitting after the fix in r288307]
This requires some changes to the opt-diag API. Hal and I have
discussed this at the Dev Meeting and came up with a streaming delimiter
(setExtraArgs) to solve this.
Arguments after this delimiter are only included in the optimization
records and not in the remarks printed in the compiler output. (Note how
in the test the content of the YAML file changes but the remarks in the
compiler output don't.)
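A minimal sketch of the delimiter in use (the pass name, remark name, and
helper here are illustrative, not necessarily the committed GVN code):

```
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Everything streamed after ore::setExtraArgs() lands in the YAML
// optimization record but is omitted from the printed remark.
static void reportLoadElim(LoadInst *LI, Value *Repl,
                           OptimizationRemarkEmitter *ORE) {
  using namespace ore;
  ORE->emit([&]() {
    return OptimizationRemark("gvn", "LoadElim", LI)
           << "load of type " << NV("Type", LI->getType()) << " eliminated"
           << setExtraArgs() // delimiter: the rest is YAML-only
           << " in favor of " << NV("InfavorOfValue", Repl);
  });
}
```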
This implements the green GVN message with a bug fix at line
http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446
The fix is that now we properly include the constant value in the
message: "load of type i32 eliminated in favor of 7"
Differential Revision: https://reviews.llvm.org/D26489
llvm-svn: 288380
If LoopInfo is available during GVN, BasicAA will use it. However,
MergeBlockIntoPredecessor does not update LI as it merges blocks.
This previously didn't cause problems because LI was freed before
GVN/BasicAA. Now, with OptimizationRemarkEmitter, the lifetime of LI is
extended, so LI needs to be kept up to date during GVN.
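A sketch of the kind of update required (the helper name is hypothetical;
the actual r288307 change touches MergeBlockIntoPredecessor itself):

```
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;

// When BB is folded into its single predecessor, LoopInfo must forget BB,
// or later BasicAA queries made on behalf of GVN will see stale blocks.
void eraseMergedBlockFromLI(BasicBlock *BB, LoopInfo *LI) {
  if (LI)
    LI->removeBlock(BB); // keep LI consistent for its extended lifetime
}
```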
Differential Revision: https://reviews.llvm.org/D27288
llvm-svn: 288307
This implements PGO-driven loop peeling.
The basic idea is that when the average dynamic trip-count of a loop is known,
based on PGO, to be low, we can expect a performance win by peeling off the
first several iterations of that loop.
Unlike unrolling based on a known trip count, or a trip count multiple, this
doesn't save us the conditional check and branch on each iteration. However,
it does allow us to simplify the straight-line code we get (constant-folding,
etc.). This is important given that we know that we will usually only hit this
code, and not the actual loop.
This is currently disabled by default.
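A source-level sketch of the idea (the transform itself runs on IR):

```
// Before: PGO says n is almost always 1.
int sum(const int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// After peeling one iteration: the hot path is straight-line code that
// constant folding etc. can simplify; the loop only handles the rare rest.
int sum_peeled(const int *a, int n) {
  int s = 0;
  if (n > 0) {
    s += a[0];
    for (int i = 1; i < n; ++i)
      s += a[i];
  }
  return s;
}
```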
Differential Revision: https://reviews.llvm.org/D25963
llvm-svn: 288274
Michel Dänzer reported that r288051, "[StructurizeCFG] Use range-based
for loops", introduced a bug into rebuildSSA, wherein we were iterating
over an instruction's use list while modifying it, without taking care
to do this correctly.
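The usual shape of the fix for this class of bug (a sketch, not the actual
r288200 diff) is to snapshot the use list before rewriting it:

```
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Buggy pattern: for (Use &U : I->uses()) U.set(New); — setting a use
// unlinks it from I's use list and invalidates the iterator.
void replaceUsesSafely(Instruction *I, Value *New) {
  SmallVector<Use *, 16> Uses;
  for (Use &U : I->uses())
    Uses.push_back(&U); // collect first...
  for (Use *U : Uses)
    U->set(New);        // ...then mutate
}
```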
llvm-svn: 288200
Currently the SLP vectorizer tries to vectorize a binary operation and
gives up immediately after the first unsuccessful attempt. This patch
improves the situation by trying to vectorize all binary operations of
all child nodes in the binop tree.
Differential Revision: https://reviews.llvm.org/D25517
llvm-svn: 288115
Preserving lifetime markers isn't as important as allowing promotion,
so just drop the lifetime markers if necessary.
This also fixes an assertion failure where other parts of SROA assumed
that lifetime markers never block promotion.
Fixes https://llvm.org/bugs/show_bug.cgi?id=29139.
Differential Revision: https://reviews.llvm.org/D24854
llvm-svn: 288074
In r286814, the algorithm for calculating inline costs changed. This
caused more inlining to take place which is especially apparent
in optsize and minsize modes.
As the cost calculation removed a skewed behaviour (we were inconsistent
about the cost of calls), it isn't possible to update the thresholds to
get exactly the same behaviour as before. However, this threshold change
accounts for the very common case where an inline candidate has no
calls within it. In this case, r286814 would inline around 5-6 more (IR)
instructions.
The changes to -Oz have been heavily benchmarked. The "obvious" value
for the inline threshold at -Oz is zero, but due to inaccuracies in the
inline heuristics this can actually cause code size increases due to
not inlining key thunk functions (that then disappear). Experimentally,
5 was the sweet spot for code size over the test-suite.
For -Os, this change removes the outlier results flagged by Green Dragon
(http://104.154.54.203/db_default/v4/nts/13248).
Fixes D26848.
llvm-svn: 288024
Summary:
The iterative algorithm for Loop Unswitching may render some of the branches unreachable in the unswitched loops.
Given the exponential nature of the algorithm, this is quite an overhead.
This patch fixes this problem by selectively unswitching only those branches within a loop that are reachable from the loop header.
Reviewers: Michael Zolotukhin, Anna Thomas, Weiming Zhao.
Subscribers: llvm-commits.
Differential Revision: http://reviews.llvm.org/D26299
llvm-svn: 287925
Summary:
The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain.
In practice, the aliasing load can be treated as a memory barrier and all stores that precede it
are a valid vectorizable prefix.
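A sketch in C terms (the pass actually works on IR load/store chains):

```
// The two leading stores remain a vectorizable prefix; the aliasing load
// merely caps the chain rather than killing it entirely.
void chain(int *p) {
  p[0] = 0;     // \ valid prefix: may become one wide store
  p[1] = 1;     // /
  int x = p[0]; // aliasing load: acts as a memory barrier
  p[2] = x;     // stores past the barrier are not part of the prefix
}
```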
Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch.
Reviewers: jlebar, arsenm, tstellarAMD
Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits
Differential Revision: https://reviews.llvm.org/D27008
llvm-svn: 287781
Without this test, you can just remove the code fixing the
switch to the first constant in ResolvedUndefsIn and everything
passes. This test, instead, fails with an assertion if the code
is removed. Found while refactoring SCCP to integrate undef in
the solver.
llvm-svn: 287731
When we visit and/or, we try to derive a lattice value for the
instruction even if one of the operands is overdefined.
If the non-overdefined value is still 'unknown', just return and wait
for ResolvedUndefsIn to "plug in" the correct value. This simplifies
the logic a bit. While I'm here, add tests for missing cases.
llvm-svn: 287709
In PR27925:
https://llvm.org/bugs/show_bug.cgi?id=27925
...we proposed adding this fold to eliminate a bitcast. In D20774, there was
some concern about changing the type of a bitwise op as well as creating
bitcasts that might not be free for a target. However, if we're strictly
eliminating an instruction (by limiting this to one-use ops), then we should
be able to do this in InstCombine.
But we're cautiously restricting the transform for now to vector types to
avoid possible backend problems. A transform to make sure the logic op is
legal for the target should be added to reverse this transform and improve
codegen.
Differential Revision: https://reviews.llvm.org/D26641
llvm-svn: 287707
Summary:
Previously, CGP would unconditionally sink addrspacecast instructions,
even going so far as to sink them into a loop.
Now we check that the cast is "cheap", as defined by TLI.
We introduce a new "is-cheap" function to TLI rather than using
isNopAddrSpaceCast because some GPU platforms want the ability to ask
for non-nop casts to be sunk.
Reviewers: arsenm, tra
Subscribers: jholewinski, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26923
llvm-svn: 287591
Allow using an instruction other than a mul or phi as the base for
root-finding. For example, the included testcase includes a loop
which requires using a getelementptr as the base for root-finding.
Differential Revision: https://reviews.llvm.org/D26529
llvm-svn: 287588
This is a first step towards canonicalization and improved folding/codegen
for integer min/max as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html
Here, we're just matching the simplest min/max patterns and adjusting the
icmp predicate while swapping the select operands.
I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll
so it's easier to see how this might be extended (corresponds to the TODO
comment in the code). That's also why I'm using matchSelectPattern()
rather than a simpler check; once the backend is patched, we can just
remove some of the restrictions to allow the obfuscated min/max patterns
in the FIXME tests to be matched.
Differential Revision: https://reviews.llvm.org/D26525
llvm-svn: 287585
Summary:
D26704 fixed the non-determinism in codegen by sorting basic blocks before
iteration so as to have a defined iteration order. As a result we need to fix
the names (numbers) of the temporaries in the following unit tests:
test/Transforms/Util/MemorySSA/multi-edges.ll
test/Transforms/Util/MemorySSA/multiple-backedges-hal.ll
Reviewers: dberlin, david2050, mgrang
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26926
llvm-svn: 287575
This patch fixes the non-determinism caused by iterating SmallPtrSets,
which was uncovered by the experimental "reverse iteration order" patch:
https://reviews.llvm.org/D26718
The following unit tests failed because of the undefined order of iteration.
LLVM :: Transforms/Util/MemorySSA/cyclicphi.ll
LLVM :: Transforms/Util/MemorySSA/many-dom-backedge.ll
LLVM :: Transforms/Util/MemorySSA/many-doms.ll
LLVM :: Transforms/Util/MemorySSA/phi-translation.ll
Reviewers: dberlin, mgrang
Subscribers: dberlin, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D26704
llvm-svn: 287563
Summary: Merging an empty case block into the header block of a switch could cause
ISel to add COPY instructions in the header of the switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targeting.
Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl
Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D22696
llvm-svn: 287553
Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1.
This patch teaches LLVM not to derive the known bits of an addrspacecast instruction from those of its operand.
Differential Revision: https://reviews.llvm.org/D26803
llvm-svn: 287545
This is a prerequisite patch for D26556:
https://reviews.llvm.org/D26556
...because there was no direct coverage for these folds (which in some cases are adding instructions).
llvm-svn: 287400
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to the newly added backedge.
llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.
Differential Revision: https://reviews.llvm.org/D26495
llvm-svn: 287341
This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional intrinsics to the switches.
llvm-svn: 287316
Summary:
This extends FCOPYSIGN support to 512-bit vectors.
I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.
Reviewers: delena, zvi, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26791
llvm-svn: 287298
Summary:
For a flat loop, even if it is hot, it is not a good idea to unroll it at runtime, so we set a lower partial unroll threshold.
For a hot loop, we set a higher unroll threshold and allow expensive trip-count computation, to enable more aggressive unrolling.
Reviewers: davidxl, mzolotukhin
Subscribers: sanjoy, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26527
llvm-svn: 287186
This pass splits globals into elements using inrange annotations on
getelementptr indices.
Differential Revision: https://reviews.llvm.org/D22295
llvm-svn: 287178
Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto-upgrade them to an extractelement, a scalar operation, and then an insertelement. This matches the sequence used by clang's intrinsics file.
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
llvm-svn: 287083
Summary:
We don't do BypassSlowDivision when the denominator is a constant, but
we do when the numerator is a constant.
This patch makes two related changes to BypassSlowDivision when the
numerator is a constant:
* If the numerator is too large to fit into the bypass width, don't
bypass slow division, because we'll never run the smaller-width code
(see the sketch after this list).
* If we bypass slow division where the numerator is a constant, don't
OR together the numerator and denominator when determining whether
both operands fit within the bypass width. We need to check only the
denominator.
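A sketch of the first point in C terms (the 64/32-bit widths are
illustrative):

```
#include <cstdint>

// Bypassing a 64-bit division means testing at runtime whether both
// operands fit in 32 bits and, if so, doing a cheap 32-bit divide. A
// constant numerator that needs more than 32 bits can never pass that
// test, so emitting the bypass code would be pure overhead.
uint64_t div(uint64_t d) {
  return 0x123456789ULL / d; // numerator needs 33 bits: don't bypass
}
```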
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D26699
llvm-svn: 287062
In RateRegister of the existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outer loop, the formula will be marked as Loser
and dropped.
Suppose we have an IR in which %for.body is the outer loop and %for.body2 is the inner
loop. LSR only handles inner loops now, so only %for.body2 will be handled.
Using the logic above, a formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what, because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr-type reg related
to the outer loop. Only a formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept,
because the SCEVAddRecExpr related to the outer loop is folded into the initial value of the
SCEVAddRecExpr related to the current loop.
But in some cases we do need to share the basic induction variable
reg({0,+,1}<%for.body2>) among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop a formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
Per the existing comment, the check tries to avoid considering multiple-level loops at the
same time. However, the existing LSR only handles the innermost loop, so any SCEVAddRecExpr
with a loop other than the current loop is invariant there, is simple to handle, and its
formula doesn't have to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
llvm-svn: 286999
When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence
return non-null but different WideAddRec, if getWideRecurrence is called
before getExtendedOperandRecurrence, we won't bother to call
getExtendedOperandRecurrence again. But as we know, it is possible that after
SCEV folding we cannot prove the legality using the SCEVAddRecExpr returned
by getWideRecurrence. Meanwhile, if getExtendedOperandRecurrence returns a non-null
WideAddRec, we know for sure that it is legal to widen the current instruction.
So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which
will increase the chance of successful widening.
Differential Revision: https://reviews.llvm.org/D26059
llvm-svn: 286987
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).
The algorithm first calculates the usages within the loop. It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).
The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals. Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.
The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index. In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.
This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.
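A self-contained sketch of the corrected ordering (names and types are
illustrative, not the actual LoopVectorize code):

```
#include <algorithm>
#include <map>
#include <set>
#include <vector>

// endsAt[i] lists the values whose last use was recorded at index i;
// dead[i] marks instructions with no users inside the loop.
size_t maxOpenIntervals(size_t n,
                        std::map<size_t, std::vector<size_t>> &endsAt,
                        const std::vector<bool> &dead) {
  std::set<size_t> open;
  size_t maxUsage = 0;
  for (size_t i = 0; i < n; ++i) {
    for (size_t ended : endsAt[i])
      open.erase(ended); // always close intervals, even at a dead def
    if (dead[i])
      continue;          // the "is used" check now comes *after* erasing
    open.insert(i);
    maxUsage = std::max(maxUsage, open.size());
  }
  return maxUsage;
}
```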
Differential Revision: https://reviews.llvm.org/D26554
llvm-svn: 286969
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)
llvm-svn: 286832
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.
However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:
  int g() {
    h();
  }
  int f() {
    g();
  }
llvm-svn: 286814
This is PR28376.
Unfortunately given the current structure of optimization diagnostics we
lack the capability to tell whether the user has
passed -Rpass-analysis=loop-vectorize since this is local to the
front-end (BackendConsumer::OptimizationRemarkHandler).
So rather than printing this even if the user has already
passed -Rpass-analysis, this patch just punts and stops recommending
this option. I don't think that getting this right is worth the
complexity.
Differential Revision: https://reviews.llvm.org/D26563
llvm-svn: 286662
When a function pointer is replaced with a jumptable pointer, a special
case is needed to preserve the semantics of extern_weak functions.
Since a jumptable entry can not be extern_weak, we emulate that
behaviour by replacing all references to F (the extern_weak function)
with the following expression: F != nullptr ? JumpTablePtr : nullptr.
Extra special care is needed for global initializers, since most (or
probably all) backends can not lower an initializer that includes
this kind of constant expression. Initializers like that are replaced
with a global constructor (i.e. a runtime initializer).
llvm-svn: 286636
The current implementation is emitting a global constant that happens
to evaluate to the same bytes + relocation as a jump instruction on
X86. This does not work for PIE executables and shared libraries
though, because we end up with a wrong relocation type. And it has no
chance of working on ARM/AArch64, which use different relocation types
for jump instructions (e.g. R_ARM_JUMP24) that are never generated for
data.
This change replaces the constant with module-level inline assembly
followed by a hidden declaration of the jump table. Works fine for
ARM/AArch64, but has some drawbacks.
* Extra symbols are added to the static symbol table, which inflate
the size of the unstripped binary a little. Stripped binaries are not
affected. This happens because jump table declarations must be
external (because their body is in the inline asm).
* Original functions that were anonymous are now named
<original name>.cfi, and it affects symbolization sometimes. This is
necessary because the only user of these functions is the (inline
asm) jump table, so they had to be added to @llvm.used, which does
not allow unnamed functions.
llvm-svn: 286611
r283656 did this in the remark arguments; we also need to do it
in the main function attribute, as that is written to YAML as well.
llvm-svn: 286482
Note that the existing metadata checking was re-added by hand because the
script doesn't currently know how to generate checks for lines outside of
functions.
llvm-svn: 286460
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks,
since some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
llvm-svn: 286423
Summary:
This change tests the change in r286159.
The idea behind the change: make the dbg location different between the loop header and the preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location inside the loop header from !21 to !22 so that it reflects the correct location.
Reviewers: probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26428
llvm-svn: 286403
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Differential Revision: https://reviews.llvm.org/D26185
llvm-svn: 286347
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.
Reviewers: davidxl, hfinkel
Subscribers: mehdi_amini, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26155
llvm-svn: 286325
As the test change shows, we can increase the critical path by adding
a 'not' instruction, so make sure that we're actually removing an
instruction if we do this transform.
This transform could also cause us to miss folds of min/max pairs.
llvm-svn: 286315
For example, it invalidates the domtree, causing assertions
in later passes which need dominator info. Make it preserve
GlobalsAA, as suggested by Eli.
Differential Revision: https://reviews.llvm.org/D26381
llvm-svn: 286271
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Patch by James Molloy and Pablo Barrio.
Reviewers: rengolin, haicheng, sebpop
Subscribers: jojo, jmolloy, llvm-commits
Differential Revision: https://reviews.llvm.org/D26391
llvm-svn: 286236
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
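The underlying scalar identity being vectorized (a 32-bit sketch of the
"Hacker's Delight" expansion; each step maps onto vector shift/or/ctpop ops):

```
#include <cstdint>

// Smear the highest set bit downward; ~x then has ones exactly in the
// positions above the highest set bit, so popcount(~x) == ctlz(x).
// For x == 0 this correctly yields 32.
uint32_t ctlz32(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return __builtin_popcount(~x);
}
```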
Differential Revision: https://reviews.llvm.org/D25910
llvm-svn: 286233
With this we get a new field in the YAML record if the value being
streamed out has a debug location. For examples, please see the changes
to the tests.
This is then used in opt-viewer to display a link for the callee
function in the inlining remarks.
Differential Revision: https://reviews.llvm.org/D26366
llvm-svn: 286169
Summary:
In some specific scenarios with well understood operand bundle types
(like `"deopt"`) it may be possible to go ahead and convert recursion to
iteration, but TailRecursionElimination does not have that logic today,
so be conservative for now.
I need some input on whether `"funclet"` operand bundles should also
block tail recursion elimination. If not, I'll allow TRE across calls
with `"funclet"` operand bundles and add a test case.
Reviewers: rnk, majnemer, nlewycky, ahatanak
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26270
llvm-svn: 286147
Argument evaluation order is one of the edge cases where Clang differs
from GCC, yielding different IR depending on which compiler LLVM was
built with. Make the order deterministic and tune the test to actually
verify the order instead of trying to hide it.
llvm-svn: 286126
This was reverted at r285866 because there was a crash handling a scalar
select of vectors. I added a check for that pattern and a test case based
on the example provided in the post-commit thread for r285732.
llvm-svn: 286113
This reverts commit r285732.
This change introduced a new assertion failure in the following
testcase at -O2:
  typedef short __v8hi __attribute__((__vector_size__(16)));
  __v8hi foo(__v8hi &V1, __v8hi &V2, unsigned mask) {
    __v8hi Result = V1;
    if (mask & 0x80)
      Result[0] = V2[0];
    return Result;
  }
llvm-svn: 285866
Summary:
It was detected that the reassociate pass could enter an infinite
loop when analysing dead code. Simply skipping the analysis of basic
blocks that are dead avoids such problems (and as a side effect
we avoid spending time optimising dead code).
The solution is to use the same Reverse Post Order ordering of the
basic blocks when doing the optimisations as when building the
precalculated rank map. A nice side effect of this solution is
that we now know that we only try to do optimisations for blocks
with ranked instructions.
Fixes https://llvm.org/bugs/show_bug.cgi?id=30818
Reviewers: llvm-commits, davide, eli.friedman, mehdi_amini
Subscribers: dberlin
Differential Revision: https://reviews.llvm.org/D26154
llvm-svn: 285793
I think the former 'test50' had a typo making it functionally equivalent
to the former 'test49'; changed the predicate to provide more coverage.
llvm-svn: 285706
This patch introduces the combine:
(C1 shift (A add C2)) -> ((C1 shift C2) shift A)
iff A and C2 are both positive
If both A and C2 are known to be positive then we can safely split into 2 shifts, permitting the folding of the inner shift.
Fix for the SPEC benchmark case mentioned by @nadav on PR15141 (assuming we can prove that the inputs are positive).
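A concrete instance of the combine (a sketch; positivity is what makes the
reassociation safe):

```
#include <cstdint>

// (C1 shift (A add C2)) -> ((C1 shift C2) shift A) with C1 = 1, C2 = 2:
uint32_t before(uint32_t a) { return 1u << (a + 2); } // a known in [0, 29]
uint32_t after(uint32_t a) { return 4u << a; }        // inner shift folded

// If a could be negative (or a + 2 could wrap), the two forms would
// differ, hence the requirement that A and C2 be provably positive.
```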
Differential Revision: https://reviews.llvm.org/D26000
llvm-svn: 285696
This makes interleaved-access analysis less conservative about possible pointer-wrap-around concerns, in some cases.
Before this patch, collectConstStridedAccesses (part of interleaved-accesses
analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when
examining all candidate pointers. This is too conservative. Instead, this
patch makes collectConstStridedAccesses use an optimistic approach, calling
getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the
candidate interleave groups have been formed, revisits the pointer-wrapping
analysis but only where it matters: namely, in groups that have gaps, and where
the gaps are not at the very end of the group (in which case the loop is
peeled). This second time getPtrStride is called with [Assume=false,
ShouldCheckWrap=true], but this could be further improved to use Assume=true,
once we also add the logic to track that we are not going to exceed the SCEV
runtime-check threshold.
Differential Revision: https://reviews.llvm.org/D25276
llvm-svn: 285517
Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202.
There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding
smax pattern and commuted variants.
The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We
can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR.
Differential Revision: https://reviews.llvm.org/D26091
llvm-svn: 285499
Summary:
This "pass" eagerly creates div and rem instructions even when only one
is needed -- it relies on a later pass (machine DCE?) to clean them up.
This is problematic not just from a cleanliness perspective (this pass
is running during CodeGenPrepare, so should leave the IR in a better
state), but it also creates a problem for instruction selection. If we
always have a div+rem, isel will always select a divrem instruction (if
possible), even when a single div or rem would do.
Specifically, in NVPTX, we want to compute rem from the output of div,
if available. But if a div is not available, we want to leave the rem
alone. This transformation is overeager if div is always available.
Because this code runs as part of CodeGenPrepare, it's nontrivial to
write a test for this change. But this will effectively be tested by
a later patch which adds the aforementioned change to NVPTX isel.
Reviewers: tra
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26088
llvm-svn: 285460
Summary:
In BypassSlowDivision's short-dividend path, we would create e.g.
udiv exact i32 %a, %b
"exact" here means that we are asserting that %a is a multiple of %b.
But we have no reason to believe this must be true -- this is just a
bug, as far as I can tell.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D26097
llvm-svn: 285459
Fixes PR 30784. Discussed with Justin, who pointed out that
in the new PassManager infrastructure we can have more fine-grained
control on which analyses we want to preserve, but this is the
best we can do with the current infrastructure.
llvm-svn: 285380
Summary: LICM may hoist instructions to the preheader speculatively. Before code generation, we need to sink the hoisted instructions back into the loop if that's beneficial. This pass is the reverse of LICM: it looks at instructions in the preheader and sinks them into basic blocks inside the loop body if the block frequency is smaller than the preheader frequency.
Reviewers: hfinkel, davidxl, chandlerc
Subscribers: anna, modocache, mgorny, beanz, reames, dberlin, chandlerc, mcrosier, junbuml, sanjoy, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D22778
llvm-svn: 285308
After a successful horizontal reduction vectorization attempt for a PHI node,
the vectorizer tries to update the root binary op by combining the vectorized
tree and the ReductionPHI node. But during vectorization this ReductionPHI
can itself be vectorized and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node can then be used in the new binary operation, causing a "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.
Also the test is fixed to make it perform actual testing.
Differential Revision: https://reviews.llvm.org/D25671
llvm-svn: 285286
Summary:
Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`.
This is a follow-up to rL258422 and
https://github.com/rust-lang/rust/pull/30917, where LLVM failed to
optimize away the bounds checking in a binary search.
Patch by Arthur Silva!
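The folds in C terms (a sketch; in IR both comparisons simplify to true
because an unsigned shift or udiv can only shrink x):

```
#include <cstdint>

// x >=u (x >> y): right-shifting an unsigned value never increases it.
// (The mask keeps the C++ shift well defined; IR has its own rules.)
bool geShr(uint32_t x, uint32_t y) { return x >= (x >> (y & 31)); }

// x >=u (x udiv y): likewise for unsigned division; udiv by zero is UB
// in IR, so the fold may assume y != 0.
bool geUdiv(uint32_t x, uint32_t y) { return x >= x / y; }
```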
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25941
llvm-svn: 285228
Summary: This patch introduces updateDiscriminator to DILocation so that it can be directly called by AddDiscriminator. It also makes it easier to update the discriminator later.
Reviewers: dnovillo, dblaikie, aprantl, echristo
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25959
llvm-svn: 285207
Summary:
Select instruction annotation in IR PGO uses the edge count to infer the
branch count. It's currently placed in setInstrumentedCounts() where
not all the BB counts have been computed. This leads to wrong branch weights.
Move the annotation after all BB counts are populated.
Reviewers: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25961
llvm-svn: 285128
The original patch for the A->B->A BitCast optimization was reverted by r274094 because it may cause an infinite loop inside the compiler: https://llvm.org/bugs/show_bug.cgi?id=27996.
The problem is with the following code:
  xB = load (type B);
  xA = load (type A);
  +yA = (A)xB;        // B -> A
  +zAn = PHI[yA, xA]; // PHI
  +zBn = (B)zAn;      // A -> B
  store zAn;
  store zBn;
optimizeBitCastFromPhi generates
  +zBn = (B)zAn; // A -> B
and expects it to be combined with the following store instruction into another
  store zAn
Unfortunately, before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely.
optimizeBitCastFromPhi only generates BitCasts for load/store instructions; only a BitCast before a store can cause the re-execution of optimizeBitCastFromPhi, and a BitCast before a store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is: if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions.
Differential Revision: https://reviews.llvm.org/D23896
llvm-svn: 285116
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.
Differential Revision: https://reviews.llvm.org/D25632
llvm-svn: 285097
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
Thanks to Adrian Prantl for stewarding this patch!
llvm-svn: 285094
Now that MemorySSA keeps track of whether MemoryUses are optimized, use
getClobberingMemoryAccess() to check MemoryUse memory dependencies since
it should no longer be so expensive.
This is a follow-up change to https://reviews.llvm.org/D25881
llvm-svn: 285080
This fixes a bug in the handling of lexical scopes, when more than one
scope is defined on the same line or functions are inlined into call
sites that are on the same line as the function definition. This
situation can easily happen in macro expansions.
The problem is solved by introducing a SmallDenseMap<DIScope *,
DILexicalBlockFile *, 1> that keeps track of all the different lexical
scopes that share a line/file location.
Fixes PR30681.
llvm-svn: 284998
Summary:
When using MemorySSA, re-optimize MemoryPhis when removing a store since
this may create MemoryPhis with all identical arguments.
Also, when using MemorySSA to check if two MemoryUses are reading from
the same version of the heap, use the defining access instead of calling
getClobberingAccess, since the latter can currently result in many more
AA calls. Once the MemorySSA use optimization tracking changes are
done, we can remove this limitation, which should result in more loads
being CSE'd.
Reviewers: dberlin
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25881
llvm-svn: 284984
Summary:
These are good candidates for jump threading. This enables later opts
(such as InstCombine) to combine instructions from the selects with
instructions out of the selects. SimplifyCFG will fold the select
again if unfolding wasn't worth it.
Patch by James Molloy and Pablo Barrio.
Reviewers: reames, bkramer, mcrosier, gberry, haicheng, jmolloy, sebpop
Subscribers: jojo, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D25477
llvm-svn: 284971
Summary:
Utility pass to remove gc.relocates created by rewrite statepoints for GC.
With respect to safepoint verification, the IR generated would be incorrect, and cannot run
as such.
This would be a single transformation on the final optimized IR.
The benefit of the pass is easier analysis when the IR is 'polluted' by too
many gc.relocates.
Added tests.
Test run: all RS4GC tests with the -verify option, plus local downstream tests
on large IR files. This also works when the pointer being gc.relocated is
another gc.relocate.
Reviewers: sanjoy, reames
Subscribers: beanz, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D25096
llvm-svn: 284855
When we have a loop with a known upper bound on the number of iterations, and
furthermore know that the number of iterations will be either exactly that
upper bound or zero, then we can fully unroll up to that upper bound, keeping
only the first loop test to check for the zero-iteration case.
Most of the work here is in plumbing this 'max-or-zero' information from the
part of scalar evolution where it's detected through to loop unrolling. I've
also gone for the safe default of 'false' everywhere but howManyLessThans which
could probably be improved.
Differential Revision: https://reviews.llvm.org/D25682
llvm-svn: 284818
There's no agreement about this patch. I personally find the
PRE machinery of the current GVN hard enough to reason about
that I'm not sure I'll try to land this again; instead I'll work
on the rewrite.
llvm-svn: 284796
0 - X --> X, if X is 0 or the minimum signed value
0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW
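Why the folds hold (a sketch using i8 for brevity):

```
#include <cstdint>

// 0 - 0 == 0 and 0 - INT8_MIN wraps back to INT8_MIN in 8-bit arithmetic,
// so when X is known to be one of those two values, 0 - X == X.
static_assert(static_cast<uint8_t>(0u - 0x80u) == 0x80u,
              "negating the minimum signed value yields itself");
// With nsw, 0 - INT8_MIN would be poison, so the fold is free to pick 0.
```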
I noticed this pattern might be created in the backend after the change from D25485,
so we'll want to add a similar fold for the DAG.
The use of computeKnownBits in InstSimplify may be something to investigate if the
compile time of InstSimplify is noticeable. We could replace computeKnownBits with
specific pattern matchers or limit the recursion.
Differential Revision: https://reviews.llvm.org/D25785
llvm-svn: 284649
Some instructions from the original loop, when vectorized, can become trivially
dead. This happens because of the way we structure the new loop. For example,
we create new induction variables and induction variable "steps" in the new
loop. Thus, when we go to vectorize the original induction variable update, it
may no longer be needed due to the instructions we've already created. This
patch prevents us from creating these redundant instructions. This reduces code
size before simplification and allows greater flexibility in code generation
since we have fewer unnecessary instruction uses.
Differential Revision: https://reviews.llvm.org/D25631
llvm-svn: 284631
This change is motivated by the case where IndVarSimplify doesn't widen a comparison of an IV increment because it can't prove the IV increment to be non-negative. We end up with a redundant trunc of the widened increment, as in this example:
  for.body:
    %i = phi i32 [ %start, %for.body.lr.ph ], [ %i.inc, %for.inc ]
    %within_limits = icmp ult i32 %i, 64
    br i1 %within_limits, label %continue, label %for.end
  continue:
    %i.i64 = zext i32 %i to i64
    %arrayidx = getelementptr inbounds i32, i32* %base, i64 %i.i64
    %val = load i32, i32* %arrayidx, align 4
    br label %for.inc
  for.inc:
    %i.inc = add nsw nuw i32 %i, 1
    %cmp = icmp slt i32 %i.inc, %limit
    br i1 %cmp, label %for.body, label %for.end
There is a range check inside of the loop which guarantees the IV to be non-negative. NSW on the increment guarantees that the increment is also non-negative. Teach IndVarSimplify to use the range check to prove non-negativity of loop increments.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D25738
llvm-svn: 284629
Summary:
This pass shrink-wraps a condition to some library calls where the call
result is not used. For example:
  sqrt(val);
is transformed to
  if (val < 0)
    sqrt(val);
Even if the result of the library call is not being used, the compiler cannot
safely delete the call because the function can set errno on error
conditions.
Note that in many functions the error condition depends solely on the incoming
parameter. In this optimization, we generate the condition that leads to the
errno-setting case and use it to shrink-wrap the call. Since the chance of
hitting the error condition is low, the runtime call is effectively eliminated.
These partially dead calls are usually results of C++ abstraction penalty
exposed by inlining. This optimization hits 108 times in 19 C/C++ programs
in SPEC2006.
Reviewers: hfinkel, mehdi_amini, davidxl
Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz
Differential Revision: https://reviews.llvm.org/D24414
llvm-svn: 284542
Summary:
The original implementation is in r261607, which was reverted in r269726 to accommodate the ProfileSummaryInfo analysis pass. The new implementation:
1. add a new metadata for function section prefix
2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function
3. output the section prefix set by CGP
Reviewers: davidxl, eraman
Subscribers: vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D24989
llvm-svn: 284533
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
Differential Revision: https://reviews.llvm.org/D23808
llvm-svn: 284459
If -coverage is passed, but -g is not, clang populates the PassManager
pipeline with StripSymbols(debugOnly = true).
The StripSymbols pass therefore scans the list of named metadata,
drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD)
around. The verifier runs, and finds out that there's a CU not listed
in !llvm.dbg.cu (as it was previously dropped) -> crash.
So, when we strip debug info, check whether there's coverage data,
and strip it as well, to avoid leaving dangling metadata around.
Differential Revision: https://reviews.llvm.org/D25689
llvm-svn: 284418
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code is the same with or without debug info.
Reviewers: davidxl, mzolotukhin, jmolloy
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D25286
llvm-svn: 284415
Not all ConstantExprs can be represented by a global variable, for example most
pointer arithmetic other than addition of a constant, so we can't convert these
values from switch statements to lookup tables.
Differential Revision: https://reviews.llvm.org/D25550
llvm-svn: 284379
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
This adds an X86-specific pass. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.
llvm-svn: 284243
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.
Differential Revision: https://reviews.llvm.org/D25333
llvm-svn: 284123
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.
Differential Revision: https://reviews.llvm.org/D25542
llvm-svn: 284098
This CL didn't actually address the test case in PR30499, and clang
still crashes.
Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"
Reverts r283965 and r283967.
llvm-svn: 284093
Reapply r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness. I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.
It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.
Reviewers: jmolloy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25502
llvm-svn: 284028
Since this change is known to cause performance degradations in some cases, it's committed under a temporary flag which is turned off by default.
Patch by Li Huang
Differential Revision: https://reviews.llvm.org/D18777
llvm-svn: 284022
An arithmetic shift can be safely changed to a logical shift if the first
operand is known positive. This allows ComputeKnownBits (and similar analysis)
to determine the sign bit of the shifted value in some cases. In turn, this
allows InstCombine to canonicalize a signed comparison (a > 0) into an equality
check (a != 0).
PR30577
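The core fact the transform relies on (sketch):

```
#include <cstdint>

// For a value whose sign bit is known clear, arithmetic and logical right
// shifts agree, so an IR-level ashr can be rewritten as lshr; knowing the
// result's sign bit then lets a signed (a > 0) become the cheaper (a != 0).
static_assert((int32_t{44} >> 2) == static_cast<int32_t>(uint32_t{44} >> 2),
              "ashr == lshr when the shifted value is non-negative");
```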
Differential Revision: https://reviews.llvm.org/D25119
llvm-svn: 284013
As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead.
Differential Revision: https://reviews.llvm.org/D25466
llvm-svn: 284000
This is a refreshed version of a patch that was reverted: it fixes
the problems reported in both PR30216 and PR30499, and
contains all the test-cases from both bugs.
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Tested on x86_64-linux with check and a test-suite run.
Differential Revision: https://reviews.llvm.org/D25476
llvm-svn: 283965
When combining an integer load with !range metadata that does not include 0 into a pointer load, make sure to emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.
This fixes PR30597.
Patch by Ariel Ben-Yehuda!
Differential Revision: https://reviews.llvm.org/D25215
llvm-svn: 283836
Fixed copy+paste vector alignment to correct for per-element scalar loads.
Increased to 512-bit data sizes in preparation for AVX512 tests.
llvm-svn: 283748
Value names may be prefixed with a binary '\1' to indicate that the
backend should not modify the symbols due to any platform naming
convention.
This should not show up in the YAML opt record file because it breaks
the YAML parser.
llvm-svn: 283656
Summary:
If heap allocation of a coroutine is elided, we need to make sure that we will update an address stored in the coroutine frame from f.destroy to f.cleanup.
Before this change, CoroSplit synthesized these stores after coro.begin:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
```
In those cases where we did heap elision but were not able to devirtualize all indirect calls, the destroy call would attempt to "free" the coroutine frame stored on the stack. Oops.
Now we use a select to put the appropriate coroutine subfunction in the destroy slot, as below:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
%0 = select i1 %need.alloc, void (%f.Frame*)* @f.destroy, void (%f.Frame*)* @f.cleanup
store void (%f.Frame*)* %0, void (%f.Frame*)** %destroy.addr
```
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25377
llvm-svn: 283625
Summary: Add tests for cases where we have zero coverage in RS4GC.
Reviewers: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25341
llvm-svn: 283591
If we're going to canonicalize IR towards select of constants, try harder to create those.
Also, don't lose the metadata.
This is actually 4 related transforms in one patch:
// select X, (sext X), C --> select X, -1, C
// select X, (zext X), C --> select X, 1, C
// select X, C, (sext X) --> select X, C, 0
// select X, C, (zext X) --> select X, C, 0
Differential Revision: https://reviews.llvm.org/D25126
llvm-svn: 283575
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant. This is not true, because addresspacecast
instructions may cast the pointer to an address space with a different
pointer width.
This partial reverts r282612, which was a more conservative solution
to this problem.
Reviewers: reames, sanjoy, apilipenko
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24772
llvm-svn: 283557
unrolling.
The following code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
}
```
During optimization this loop is fully unrolled and the SLPVectorizer is
unable to vectorize it. The patch tries to fix this problem.
Differential Revision: https://reviews.llvm.org/D24796
llvm-svn: 283535
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
GetCaseResults assumed that a terminator with one successor was an
unconditional branch. This is not necessarily the case, it could be a
cleanupret.
Strengthen the check by querying whether or not the terminator is
exceptional.
llvm-svn: 283517
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.
llvm-svn: 283512
Add a weak alias to the renamed Comdat function in IR level instrumentation,
using its original name. This ensures the same behavior with and without IR
instrumentation, even for non-standard-conforming code.
Differential Revision: http://reviews.llvm.org/D25339
llvm-svn: 283490
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
llvm-svn: 283473
The purpose of the YAML diagnostic output file is to collect information on
optimizations performed, or not performed, for later processing by tools that
help users (and compiler developers) understand how code was optimized. As
such, the diagnostics that appear in the file should not be coupled to what a
user might want to see summarized for them as the compiler runs, and in fact,
because the user likely does not know what optimization diagnostics their tools
might want to use, the user cannot provide a useful filter regardless. As such,
we shouldn't filter the diagnostics going to the output file.
Differential Revision: https://reviews.llvm.org/D25224
llvm-svn: 283236
Splitting the edge is nontrivial because of the landing pad, and we would
currently assert trying to do it.
Differential Revision: https://reviews.llvm.org/D24680
llvm-svn: 283129
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433
There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?
Differential Revision: https://reviews.llvm.org/D25165
llvm-svn: 283119
Summary:
In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame.
```
%save = call token @llvm.coro.save(i8* null)
%Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0
%suspend = call i8 @llvm.coro.suspend(token %save, i1 false)
switch i8 %suspend, label %exit [
i8 0, label %await.ready
i8 1, label %exit
]
await.ready:
%val = load i32, i32* %Result.i19
```
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24418
llvm-svn: 282902
Summary:
Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /*ModuleLevelChanges=*/true, Returns); would duplicate all of the debug information including the DICompileUnit.
We now use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we create clones that will become f.resume, f.destroy and f.cleanup.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24417
llvm-svn: 282899
Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup.
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24412
llvm-svn: 282897
Summary: Debug info should *not* affect optimization decisions. This patch updates the loop unroller cost model so that it is not affected by debug info.
Reviewers: davidxl, mzolotukhin
Subscribers: haicheng, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D25098
llvm-svn: 282894
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
Summary:
We don't want to decay hot callsites to import chains of hot
callsites. The same mechanism is used in LIPO.
Reviewers: tejohnson, eraman, mehdi_amini
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24976
llvm-svn: 282833
Summary:
The heuristic is not tuned up yet, but even this small heuristic gives about
a +0.10% improvement on SPEC 2006.
Reviewers: tejohnson, mehdi_amini, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24940
llvm-svn: 282733
Summary:
The patch fixes a regression caused by two earlier patches, D18777 and D18867.
Reviewers: reames, sanjoy
Differential Revision: http://reviews.llvm.org/D24280
From: Li Huang
llvm-svn: 282650
Also, remove unnecessary function attributes, parameters, and comments.
It looks like at least some of these tests are not minimal though...
llvm-svn: 282620
Pointers in different addrspaces can have different sizes, so it's not valid to look through an addrspace cast when calculating the base and offset for a value.
This is similar to D13008.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D24729
llvm-svn: 282612
There is really no reason for these to be separate.
The vectorizer started the rather unfortunate tradition that the text of a
missed remark is pretty meaningless (i.e. "vectorization failed"); there,
you have to query the analysis remarks to get the full picture.
I think we should just explain the reason for missing the optimization
in the missed remark when possible. Analysis remarks should provide
information that the pass gathers regardless of whether the optimization
passes or not.
llvm-svn: 282542
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
             DEBUG_TYPE, "NotInlined", &I)
         << NV("Callee", Callee) << " will not be inlined into "
         << NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did verify experimentally that the class hierarchy
starting at DiagnosticInfoOptimizationBase can be mapped back from the YAML
generated here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of
DiagnosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
             DEBUG_TYPE, "NotInlined", &I)
         << NV("Callee", Callee) << " will not be inlined into "
         << NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did verify experimentally that the class hierarchy
starting at DiagnosticInfoOptimizationBase can be mapped back from the YAML
generated here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before, hotness is computed in the analysis pass instead of
DiagnosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282499
Summary:
We don't currently need this facility for CFI. Disabling individual hot methods proved
to be a better strategy in Chrome.
Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne.
Reviewers: pcc
Subscribers: kcc
Differential Revision: https://reviews.llvm.org/D24948
llvm-svn: 282461
Summary:
This patch improves the ThinLTO importer by importing functions up to 3x
larger when they are called from a hot block.
I compared performance with trunk on SPEC, and there was about a 2%
improvement on povray and 3.33% on milc. These results seem to be consistent
and match the results Teresa got with her simple heuristic. Some benchmarks
got slower, but I think they are just noisy (mcf, xalancbmk, omnetpp) - I'm
running the benchmarks again with more iterations to confirm. The geomean of
all benchmarks, including the noisy ones, was about +0.02%.
I see a much better improvement on the Google branch with Easwaran's patch
for PGO call-site inlining (the inliner actually inlines those big functions);
overall I see a +0.5% improvement, and I get +8.65% on povray.
So I guess we will see a much bigger change when Easwaran's patch lands
(it depends on the new pass manager), but it is still worth putting this in
trunk before it.
Implementation changes:
- Removed CallsiteCount.
- ProfileCount got replaced by Hotness.
- hot-import-multiplier is set to 3.0 for now; I didn't have time to tune it,
but I see that we get most of the interesting functions with 3, so there is
not much performance difference with higher values, and binary size doesn't
grow as much as with 10.0.
Reviewers: eraman, mehdi_amini, tejohnson
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24638
llvm-svn: 282437
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.
Differential Revision: https://reviews.llvm.org/D23889
llvm-svn: 282418
Summary:
If a coroutine has no suspend points, remove the heap allocation and turn it into a normal function.
Also, if we detect a pattern where the coroutine resumes or destroys itself prior to the coro.suspend call, turn the suspend point into a simple jump to the resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24408
llvm-svn: 282414
The index of the new insertelement instruction was evaluated incorrectly:
it was treated as the index of the inserted value instead of the index of
the position where the value should be inserted.
llvm-svn: 282401
This patch fixes PR30366.
Function foldUDivShl() worked under the assumption that one of its input
values was always an instance of llvm::Instruction. However, function
visitUDivOperand() (the only user of foldUDivShl) was clearly violating that
precondition; internally, visitUDivOperand() uses pattern matches to check
the operands of a udiv. Pattern matchers for binary operators know how to
handle both Instruction and ConstantExpr values.
This patch fixes the problem in foldUDivShl(). Now we use pattern matchers
instead of explicit casts to Instruction. The reduced test case from PR30366
has been added to test file InstCombine/udiv-simplify.ll.
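As a rough illustration (global name and exact constant invented; the reduced
test in the tree is authoritative), the divisor below is a ConstantExpr rather
than an Instruction - the kind of operand the old explicit casts choked on:
```
@b = external global [1 x i16]

define i32 @pr30366_like(i1 %a) {
  %z = zext i1 %a to i32
  ; the divisor is a ConstantExpr, not an llvm::Instruction
  %d = udiv i32 %z, zext (i16 ptrtoint ([1 x i16]* @b to i16) to i32)
  ret i32 %d
}
```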
Differential Revision: https://reviews.llvm.org/D24565
llvm-svn: 282398
If inserting more than one constant into a vector:
define <4 x float> @foo(<4 x float> %x) {
%ins1 = insertelement <4 x float> %x, float 1.0, i32 1
%ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2
ret <4 x float> %ins2
}
InstCombine could reduce that to a shufflevector:
define <4 x float> @goo(<4 x float> %x) {
%shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3>
ret <4 x float> %shuf
}
Also, InstCombine tries to convert a shuffle instruction to a single
insertelement if one of the vectors is a constant vector and only a single
element from this constant is used in the shuffle, i.e.:
shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
->
insertelement <4 x float> %v, float 1.0, i32 1
Differential Revision: https://reviews.llvm.org/D24182
llvm-svn: 282237
We already have the udiv variant of this transform, so I think this is ok for
InstCombine too even though there is an increase in IR instructions. As the
tests and TODO comments show, the transform can lead to follow-on combines.
This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672
Differential Revision: https://reviews.llvm.org/D24527
llvm-svn: 282209
This also reverts the dependent r282175, "GVN-hoist: do not dereference null pointers".
The reverted change is causing compiler crashes building Harfbuzz (PR30499).
llvm-svn: 282199
To hoist stores past loads, we used to search for potentially
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous defining
memory access, and from there following the def-use chains to all the
uses that occur on the hoisting path. The problem is that the def-def
link may point to a store that does not alias with the store to be
hoisted, so the loads that are walked may not alias with the store to
be hoisted either; as in the testcase of PR30216, the loads that do
alias with the store to be hoisted may never be visited.
The current patch visits all loads on the path from the store to be
hoisted to the hoisting position and uses alias analysis to ask
whether the store may alias each load. I was not able to use the
MemorySSA functionality to ask whether the load and store are
clobbered: I'm not sure which function to call, so I used a call to
AA->isNoAlias().
Hoisting a store past another store still works as before, using a
MemorySSA query: I added an extra test to pr30216.ll to make sure
store-past-store does not regress.
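A minimal sketch (function and names invented) of the hazard this checks for:
hoisting the store in %b up to %entry would move it above a load that may
alias it:
```
define void @f(i1 %c, i32* %p, i32* %q) {
entry:
  br i1 %c, label %a, label %b
a:
  store i32 1, i32* %p
  br label %join
b:
  %v = load i32, i32* %q   ; may alias %p
  store i32 1, i32* %p     ; hoisting this store past the load would be illegal
  br label %join
join:
  ret void
}
```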
Differential Revision: https://reviews.llvm.org/D24517
llvm-svn: 282168
Without this patch, GVN-hoist would think that a branch instruction is a
scalar instruction and would try to value-number it. The patch filters out
all such irrelevant instructions.
It is a bit frustrating that there is no easy way to discard all those very
infrequent instructions with a single check, the way isa<TerminatorInst>
stands for a large family of instructions. I'm thinking that checking for
those very infrequent other instructions would cost us more in compilation
time than just letting them get value-numbered, so a simpler check:
if (isa<TerminatorInst>(I))
  return false;
is better than listing all the other less frequent instructions.
Differential Revision: https://reviews.llvm.org/D23929
llvm-svn: 282160
Currently, we give up on loop interchange if we encounter a flow dependency
anywhere in the loop list. Worse yet, we don't even track output dependencies.
This patch updates the dependency matrix computation to track flow and output
dependencies in the same way we track anti dependencies.
This improves an internal workload by 2.2x.
Note the loop interchange pass is off by default and it can be enabled with
'-mllvm -enable-loopinterchange'
Differential Revision: https://reviews.llvm.org/D24564
llvm-svn: 282101
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.
Differential Revision: https://reviews.llvm.org/D24275
llvm-svn: 282087
Summary: Now that we have more precise debug info, we should change back to using the maximum to get the basic block weight.
Reviewers: dnovillo
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24788
llvm-svn: 282084
SROA doesn't preserve the llvm.mem.parallel_loop_access metadata when it
transforms loads/stores. This patch fixes a couple of occurrences of this
issue.
(Partially addresses PR28981).
Differential Revision: https://reviews.llvm.org/D23549
llvm-svn: 281960
Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and uses that hotness for all callsites in the block when making early inline decisions. It also fixes the test to add "-S" so that the "CHECK-NOT" is actually checking the content.
Reviewers: dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24734
llvm-svn: 281927
Summary: It does not make sense to set equal weights for all unknown branches, as we have static branch prediction available.
Reviewers: dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24732
llvm-svn: 281912
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles, which could be moved across basic block boundaries. This patch uses the call target count profile to annotate call instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24410
llvm-svn: 281911
When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare
entries are converted to @llvm.dbg.value entries at the place where the
store instructions existed. However no entry is created to describe
the resulting value of the phi node.
The effect of this is especially noticeable in for loops which have a
constant for the initial value; the loop control variable's location
would be described as the initial constant value in the loop body once
the -mem2reg optimization phase was run.
This change adds the creation of the @llvm.dbg.value entries to describe
variables whose location is the result of a phi node created in -mem2reg.
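A sketch of the intended result (metadata numbering elided; the dbg.value
signature shown is the pre-LLVM-6 form with the i64 offset operand):
```
for.body:
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
  ; new: describe the merged value produced by the phi, not just the
  ; values at the original store locations
  call void @llvm.dbg.value(metadata i32 %i.0, i64 0, metadata !12, metadata !13)
```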
Also when the phi node is finally lowered to a machine instruction it
is important that the lowered "load" instruction is placed before the
associated DEBUG_VALUE entry describing the value loaded.
Differential Revision: https://reviews.llvm.org/D23715
llvm-svn: 281895
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (a 50% or less chance) whether this results in broken codegen or not, which is why PR30373 is actually not the fault of the commit it was bisected down to.
Fixes PR30373.
llvm-svn: 281889
Summary: Previously we relied on instcombine to remove inlinable invoke instructions. This caused trouble because a few extra optimizations were scheduled early that could introduce too much CFG change (e.g. simplifycfg removing too much control flow). This patch handles invoke instructions in place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24409
llvm-svn: 281870
Summary:
This fixes an issue when files are compiled with -flto=thin
at default -O0. We need to rename anonymous globals before attempting
to write the module summary because all values need names for
the summary. This was happening at -O1 and above, but not before
the early exit when constructing the pipeline for -O0.
Also add an internal -prepare-for-thinlto option to enable this
to be tested via opt.
Fixes PR30419.
Reviewers: mehdi_amini
Subscribers: probinson, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D24701
llvm-svn: 281840
This is a fix for PR30318.
Clang may generate IR where an alloca is already live when entering a
BB with lifetime.start. In this case, conservatively extend the
alloca lifetime all the way back to the block entry.
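A minimal sketch of the situation (assuming this is the shape involved): the
alloca is already in use on one path into the block containing lifetime.start:
```
declare void @llvm.lifetime.start(i64, i8* nocapture)

define void @f(i1 %c) {
entry:
  %a = alloca i32
  %a.i8 = bitcast i32* %a to i8*
  br i1 %c, label %early.use, label %bb
early.use:
  store i32 0, i32* %a   ; %a is live before any lifetime.start is seen
  br label %bb
bb:
  ; reached with %a already live on one edge, so the lifetime is
  ; conservatively extended back to the entry
  call void @llvm.lifetime.start(i64 4, i8* %a.i8)
  ret void
}
```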
llvm-svn: 281784
computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine.
I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for
ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements.
This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942:
https://llvm.org/bugs/show_bug.cgi?id=24942
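As a hedged example of the splat case this enables (the particular fold is
chosen for illustration): known bits prove %and never exceeds 1, so the
compare folds even for vectors:
```
define <2 x i1> @known_bits_splat(<2 x i8> %x) {
  %and = and <2 x i8> %x, <i8 1, i8 1>
  ; computeKnownBits proves all bits but bit 0 of %and are zero,
  ; so "ugt 1" is always false
  %cmp = icmp ugt <2 x i8> %and, <i8 1, i8 1>
  ret <2 x i1> %cmp   ; can fold to <2 x i1> zeroinitializer
}
```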
Differential Revision: https://reviews.llvm.org/D24677
llvm-svn: 281777
A follow-up patch will rename this pass and the source file accordingly,
but I figured the non-NFC change will be easier to spot in isolation.
Differential Revision: https://reviews.llvm.org/D24641
llvm-svn: 281744
These 2 helper functions were already using APInt internally, so just
change the API and caller to allow folds for splats. The scalar
regression tests look quite thorough, so I just added a couple of
tests to prove that vectors are handled too.
These folds should be grouped with the other cmp+shift folds though.
That can be an NFC follow-up.
llvm-svn: 281663
GlobalOpt is already dead-code-eliminating global definitions. With
this change it also takes care of declarations.
Hopefully this should make it now a strict superset of GlobalDCE.
This is important for LTO/ThinLTO as we don't want the linker to see
"undefined reference" when it processes the input files: it could
prevent proper internalization (or even load an extra file from a
static archive, changing the behavior of the program!).
llvm-svn: 281653
The patch partially fixes PR10584. Correlated Value Propagation queries LVI
to check non-nullness of the pointer params at each callsite. If we know the
def of a param is an alloca instruction, we know it is non-null and can return
early from LVI. Similarly, CVP queries LVI to check whether the pointer for
each memory access is a constant. If the def of the pointer is an alloca
instruction, we know it is not a constant pointer. These shortcuts can reduce
the cost of CVP significantly.
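A small sketch of both shortcuts (callee name invented):
```
declare void @g(i32*)

define void @f() {
  %x = alloca i32
  ; the pointer is defined by an alloca: non-null is known without a
  ; full LVI query, so CVP can mark the argument nonnull cheaply
  call void @g(i32* %x)
  ; likewise, a pointer defined by an alloca can never be a constant
  store i32 0, i32* %x
  ret void
}
```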
Differential Revision: https://reviews.llvm.org/D18066
llvm-svn: 281586
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.
Differential Revision: https://reviews.llvm.org/D24511
llvm-svn: 281485
ARC contraction tries to replace uses of an argument passed to an
Objective-C library call with the call return value. For example, in the
following IR, it replaces uses of argument %9, and uses of the values
discovered traversing the chain upwards (%7 and %8), with the call return
%10, if they are dominated by the call to @objc_autoreleaseReturnValue.
This transformation enables code-gen to tail-call the call to
@objc_autoreleaseReturnValue, which is necessary to enable auto release
return value optimization.
%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = bitcast %0* %8 to i8*
%10 = tail call i8* @objc_autoreleaseReturnValue(i8* %9)
ret %0* %8
Since r276727, llvm started removing redundant bitcasts and as a result
started feeding the following IR to ARC contraction:
%7 = tail call i8* @objc_loadWeakRetained(i8** %6)
%8 = bitcast i8* %7 to %0*
%9 = tail call i8* @objc_autoreleaseReturnValue(i8* %7)
ret %0* %8
ARC contraction no longer does the optimization described above since it
only traverses the chain upwards and fails to recognize that the
function return can be replaced by the call return. This commit changes
ARC contraction to traverse the chain downwards too and replace uses of
bitcasts with the call return.
rdar://problem/28011339
Differential Revision: https://reviews.llvm.org/D24523
llvm-svn: 281419
The constant folder didn't know how to always fold bitcasts of constant integer
vectors. In particular, it was unable to handle the case where a constant vector
had some undef elements, and the resulting (i.e. bitcasted) vector type had more
elements than the original vector type.
Example:
%cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32>
On a little endian target, %cast could have been folded to:
<4 x i32><i32 undef, i32 undef, i32 2, i32 0>
This patch improves the folding logic by teaching how to correctly propagate
undef elements in the folded vector.
Differential Revision: https://reviews.llvm.org/D24301
llvm-svn: 281343
InstSimplify doesn't always know how to fold a bitcast of a constant vector.
In particular, the logic in InstSimplify doesn't know how to handle the case
where the constant vector in input contains some undef elements, and the
number of elements is smaller than the number of elements of the bitcast
vector type.
llvm-svn: 281332
Teach SimplifyLibcalls that it can treat functions annotated with
apcs, aapcs or aapcs_vfp like normal C functions if they only take
and return integer or pointer values, and the target is not iOS.
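A hedged sketch of the kind of call this enables folding (the constant and
calling convention are chosen for illustration):
```
@hello = private constant [6 x i8] c"hello\00"

declare arm_aapcscc i32 @strlen(i8*)

define arm_aapcscc i32 @f() {
  %p = getelementptr [6 x i8], [6 x i8]* @hello, i32 0, i32 0
  ; only integer/pointer types are involved, so the aapcs annotation
  ; no longer blocks the libcall simplification
  %len = call arm_aapcscc i32 @strlen(i8* %p)
  ret i32 %len   ; folds to ret i32 5
}
```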
Differential Revision: https://reviews.llvm.org/D24453
llvm-svn: 281322
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.
Fixes PR30362. A program for upgrading test cases is attached to that
bug.
Differential Revision: http://reviews.llvm.org/D20147
llvm-svn: 281284
Trying to infer the 'returned' attribute if an argument is already
'returned' can lead to verification failure: inference might determine
that a different argument is passed through, which would result in two
different arguments marked as 'returned'.
This fixes PR30350.
llvm-svn: 281221
This should *actually* fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas.
The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan).
llvm-svn: 281162
Exposed by PR30244: currently we will split a block if we think we can sink at least one instruction. However this isn't right - the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so - stores, for example.
So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction.
Should fix PR30244.
llvm-svn: 281160
Summary:
This will let e.g. the load/store vectorizer propagate this metadata
appropriately.
Reviewers: arsenm
Subscribers: tra, jholewinski, hfinkel, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23479
llvm-svn: 281153
This would create a bitcast use which fails the verifier: swifterror values may
only be used by loads, stores, and as function arguments.
rdar://28233244
llvm-svn: 281114
I was looking to fix a bug in getComplexity(), and these cases showed up as
obvious failures. I'm not sure how to find these in general though.
llvm-svn: 281055
Summary:
If one of the uses of the value is a single edge PHINode, handle it.
Original:
%val = something
<suspend>
%p = PHINode [%val]
After Spill + Part13:
%val = something
%slot = gep val.spill.slot
store %val, %slot
<suspend>
%p = load %slot
Plus tiny fixes/changes:
* use correct index for coro.free in CoroCleanup
* fix up the id parameter in coro.free to allow authoring coroutines in plain C with __builtins
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24242
llvm-svn: 281020
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience, as the user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequencies to source within the then/else branch.
Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo
Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24164
llvm-svn: 280995
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.
llvm-svn: 280992
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making its pointer operand non-uniform.
This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.
Differential Revision: https://reviews.llvm.org/D24271
llvm-svn: 280979
Summary:
When cloning blocks for prologue/epilogue we need to replicate the loop
structure from the original loop. It wasn't a problem for the innermost
loops, but it led to incorrect loop info when we unrolled a loop with
a child loop - in this case the created prologue loop had a child loop,
but the loop info didn't reflect that.
This fixes PR28888.
Reviewers: chandlerc, sanjoy, hfinkel
Subscribers: llvm-commits, silvas
Differential Revision: https://reviews.llvm.org/D24203
llvm-svn: 280901
We can't create metadata-valued PHIs; don't try to do so when sinking.
I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).
Previously, running the test case would crash with:
Invalid use of metadata!
%.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
LLVM ERROR: Broken function found, compilation aborted!
llvm-svn: 280866
This is a revert of r280676 which was a revert of r280637;
ie, this is r280637 again. It was speculatively reverted to
help debug buildbot failures.
llvm-svn: 280861
Summary:
LSV replaces multiple adjacent loads with one vectorized load and a
bunch of extractelement instructions. This patch makes the
extractelement instructions' names match those of the original loads,
for (hopefully) improved readability.
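A sketch of the effect (names invented):
```
; before vectorization
%x = load i32, i32* %p0
%y = load i32, i32* %p1
; after (sketch): the extracts carry the original load names
%0 = load <2 x i32>, <2 x i32>* %p.vec
%x1 = extractelement <2 x i32> %0, i32 0
%y1 = extractelement <2 x i32> %0, i32 1
```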
Reviewers: asbirlea, tstellarAMD
Subscribers: arsenm, mzolotukhin
Differential Revision: https://reviews.llvm.org/D23748
llvm-svn: 280818
This fixes a similar issue to the one already fixed by r280804
(reviewed in D24256). Revision 280804 fixed the problem with unsafe dyn_casts
in the extrq/extrqi combining logic. However, it turns out that even the
insertq/insertqi logic was affected by the same problem.
llvm-svn: 280807
This patch fixes an assertion failure caused by unsafe dynamic casts on the
constant operands of sse4a intrinsic calls to extrq/extrqi
The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently
checks if the input operands are constants. Internally, that logic relies on
dyn_casts of values returned by calls to method Constant::getAggregateElement.
However, method getAggregateElement may return nullptr if the constant element
cannot be retrieved. So, all the dyn_casts can potentially fail. This is what
happens, for example, if a constexpr value is passed as input to an
extrq/extrqi intrinsic call.
This patch fixes the problem by using a dyn_cast_or_null (instead of a simple
dyn_cast) on the result of each call to Constant::getAggregateElement.
Added reproducible test cases to x86-sse4a.ll.
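A hedged sketch of the kind of crashing input (global and constant invented):
the mask operand is a ConstantExpr, for which getAggregateElement returns
nullptr:
```
@g = external global i8

declare <2 x i64> @llvm.x86.sse4a.extrq(<2 x i64>, <16 x i8>)

define <2 x i64> @constexpr_mask(<2 x i64> %x) {
  ; the mask is a ConstantExpr vector: getAggregateElement cannot
  ; retrieve its elements, and the unguarded dyn_casts failed
  %r = call <2 x i64> @llvm.x86.sse4a.extrq(<2 x i64> %x, <16 x i8> bitcast (<2 x i64> <i64 ptrtoint (i8* @g to i64), i64 0> to <16 x i8>))
  ret <2 x i64> %r
}
```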
Differential Revision: https://reviews.llvm.org/D24256
llvm-svn: 280804
I should have realised this the first time around, but if we're avoiding sinking stores whose operands come from allocas so that they don't create selects, we also have to do the same for loads, because SROA will be just as defective looking at loads of selected addresses as it is for stores.
Fixes PR30188 (again).
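A minimal sketch of why: sinking the loads merges their address operands
through a select, which SROA/Mem2Reg cannot look through:
```
define i32 @f(i1 %c) {
entry:
  %a = alloca i32
  %b = alloca i32
  br i1 %c, label %t, label %e
t:
  %va = load i32, i32* %a
  br label %end
e:
  %vb = load i32, i32* %b
  br label %end
end:
  %v = phi i32 [ %va, %t ], [ %vb, %e ]
  ; sinking the loads would instead produce
  ;   %addr = select i1 %c, i32* %a, i32* %b
  ;   %v = load i32, i32* %addr
  ; and %a/%b would no longer be promotable
  ret i32 %v
}
```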
llvm-svn: 280792
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.
Fixes PR30292.
llvm-svn: 280790
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set). However, we could still have
regions of the CFG that do not have branch weights collected (e.g. a
cold region). In this case we'd use static estimates. Since static
estimates for branches are determined independently, they are
inconsistent. Updating them can "randomly" inflate block frequencies.
I've run into this in a completely cold loop of h264ref from
SPEC. -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).
The new testcase demonstrates the problem. We check array elements
against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting
check. The block names should be self-explanatory.
In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).
There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated. These are the resulting branch and block
frequencies for the loop body:
        check_1 (16)
    (8) /     |
    eq_1      | (8)
        \     |
        check_2 (16)
    (8) /     |
    eq_2      | (8)
        \     |
        check_3 (16)
    (1) /     |
(loop exit)   | (15)
              |
         (back edge)
First we thread eq_1 -> check_2 to check_3. Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2. Changed frequencies are highlighted with * *:
        check_1 (16)
    (8) /     |
    eq_1~     | (8)
   /          |
  /         check_2 (*8*)
 /      (8) / |
 \      eq_2  | (*0*)
  \         \ |
   `------- check_3 (16)
    (1) /     |
(loop exit)   | (15)
              |
         (back edge)
Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges. Frequencies are updated to remove the frequency of eq_1 and
eq_2 from check_3 and then from the false edge leaving check_3 (changed
frequencies are highlighted with * *):
        check_1 (16)
    (8) /     |
    eq_1~     | (8)
   /          |
  /         check_2 (*8*)
 /      (8) / |
 /----  eq_2~ | (*0*)
(back edge)   |
            check_3 (*0*)
  (*0*) /     |
(loop exit)   | (*0*)
              |
         (back edge)
As a result, the loop exit edge ends up with 0 frequency, which in turn makes
the loop header have maximum frequency.
There are a few potential problems here:
1. The profile data seems odd. There is a single profile sample of the
loop being entered. On the other hand, there are no weights inside the
loop.
2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.
3. We shouldn't create profile metadata that is calculated from static
estimation. I am not sure what the policy is, but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling. Estimated probabilities should only be reflected in BPI/BFI.
Any one of these would probably fix the immediate problem. I went for 3
because I think it's a good policy to have and added a FIXME about 2.
Differential Revision: https://reviews.llvm.org/D24118
llvm-svn: 280713
Summary:
Move early uses of spilled variables after CoroBegin.
For example, if a parameter had its address taken, we may end up with code
like:
define @f(i32 %n) {
%n.addr = alloca i32
store %n, %n.addr
...
call @coro.begin
This patch fixes the problem by moving uses of spilled variables after CoroBegin.
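A sketch of the fixed ordering (allocation call and intrinsic operands
abbreviated; details are assumptions based on the coroutine intrinsics of
this era):
```
define i8* @f(i32 %n) {
  %n.addr = alloca i32
  %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
  %size = call i32 @llvm.coro.size.i32()
  %mem = call i8* @malloc(i32 %size)
  %hdl = call i8* @llvm.coro.begin(token %id, i8* %mem)
  ; the early use of the spilled alloca is now moved below coro.begin,
  ; so the value lands in the coroutine frame
  store i32 %n, i32* %n.addr
  ...
```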
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24234
llvm-svn: 280678
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
%.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm "mov $0, #2", "=r"
This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.
Differential Revision: https://reviews.llvm.org/D24231
llvm-svn: 280660
Summary:
A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:
* it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic.
This patch adds final suspend handling logic to CoroEarly and CoroSplit passes.
Now, the final suspend point example from docs\Coroutines.rst compiles and produces the expected result (see test/Transform/Coroutines/ex5.ll).
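A sketch of a final suspend point (labels invented):
```
; the i1 true operand designates this suspend as final
%s = call i8 @llvm.coro.suspend(token none, i1 true)
switch i8 %s, label %suspend [
  i8 0, label %trap      ; resuming at a final suspend point is UB
  i8 1, label %cleanup
]
; elsewhere, coro.done can test whether the coroutine sits at it:
%done = call i1 @llvm.coro.done(i8* %hdl)
```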
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24068
llvm-svn: 280646
When InstCombine replaces a memcpy with loads+stores, it does not copy over
the llvm.mem.parallel_loop_access metadata from the memcpy instruction. This
patch fixes that.
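A sketch of the propagation (operands simplified; the memcpy signature shown
is the pre-LLVM-7 form with an explicit alignment parameter):
```
; before
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 4, i32 4, i1 false), !llvm.mem.parallel_loop_access !0
; after (sketch): the metadata now survives onto the load/store pair
%v = load i32, i32* %s.i32, align 4, !llvm.mem.parallel_loop_access !0
store i32 %v, i32* %d.i32, align 4, !llvm.mem.parallel_loop_access !0
```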
Differential Revision: https://reviews.llvm.org/D23499
llvm-svn: 280617
Summary:
The inliner may need to determine where a given funclet unwinds to,
and this determination may depend on other funclets throughout the
funclet tree. The code that performs this walk in getUnwindDestToken
memoizes results to avoid redundant computations. In the case that
a funclet's unwind destination is derived from its ancestor, there's
code to walk back down the tree from the ancestor updating the memo
map of its descendants to record the unwind destination. This change
fixes that code to account for the case that some descendant has a
different unwind destination, which can happen if that unwind dest
is a descendant of the EHPad being queried and thus didn't determine
its unwind destination.
Also update test inline-funclets.ll, which is supposed to cover such
scenarios, to include a case that fails an assertion without this fix
but passes with it.
Fixes PR29151.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24117
llvm-svn: 280610
For the store of a wide value merged from a pair of values, especially an
int-fp pair, sometimes it is more efficient to split it into separate narrow
stores, which can remove the bitwise instructions or sink them to colder
places.
For now the feature is only enabled on x86, and only the store of an int-fp
pair is split. The application scope may be extended in the future, given
supporting performance evidence.
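A sketch of the pattern, shown as IR although the split itself happens during
codegen (names invented):
```
; merged wide store: a float and an i32 packed into one i64
%f.bits = bitcast float %f to i32
%hi     = zext i32 %f.bits to i64
%hi.shl = shl i64 %hi, 32
%lo     = zext i32 %i to i64
%merged = or i64 %hi.shl, %lo
store i64 %merged, i64* %p
; after the split (conceptually): two narrow stores, no bit twiddling
;   store i32 %i   -> low 4 bytes of %p
;   store float %f -> high 4 bytes of %p
```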
Differential Revision: https://reviews.llvm.org/D22840
llvm-svn: 280505
The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards
shrinking that to a single shufflevector.
Note that the transform is intentionally limited to shuffles that are equivalent to vector
selects to avoid creating arbitrary shuffle masks that may not lower well.
This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126
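As a hedged sketch (whether this exact pattern is covered is an assumption;
the equivalence itself holds), two select-like shuffles over the same inputs
can collapse into one:
```
define <4 x i32> @blend_of_blend(<4 x i32> %a, <4 x i32> %b) {
  ; both masks pick each lane from one input or the other (vector selects)
  %s1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 2, i32 3>
  %s2 = shufflevector <4 x i32> %s1, <4 x i32> %b, <4 x i32> <i32 0, i32 1, i32 6, i32 3>
  ; equivalent to a single select-like shuffle of %a and %b
  ; with mask <0, 5, 6, 3>
  ret <4 x i32> %s2
}
```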
Differential Revision: https://reviews.llvm.org/D23886
llvm-svn: 280504
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.
I've added the updated member index computation to the existing reverse
interleaved group test.
llvm-svn: 280497
This patch fixes a crash caused by an incorrect folding of an ordered comparison
between a packed floating point vector and a splat vector of NaN.
An ordered comparison between a vector and a constant vector of NaN, should
always be folded into a constant vector where each element is i1 false.
Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar
'false'. Later on, this would cause an assertion failure, since the value type
of the folded value doesn't match the expected value type of the uses of the
original instruction: "Assertion failed: New->getType() == getType() &&
"replaceAllUses of value with new value of different type!".
This patch fixes the issue and adds a test case to the already existing test
InstSimplify/floating-point-compares.ll.
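A minimal example of the fixed fold:
```
define <2 x i1> @ord_with_nan_splat(<2 x double> %x) {
  ; an ordered comparison against NaN is false in every element
  %c = fcmp ord <2 x double> %x, <double 0x7FF8000000000000, double 0x7FF8000000000000>
  ret <2 x i1> %c   ; now folds to <2 x i1> zeroinitializer, not a scalar false
}
```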
Differential Revision: https://reviews.llvm.org/D24143
llvm-svn: 280488
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.
The real fix is in SROA, which I'll be looking into.
llvm-svn: 280470
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.
The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().
Differential Revision: https://reviews.llvm.org/D24031
llvm-svn: 280370
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now that it's been rewritten, the restriction can be lifted.
As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:
if (a)
x(1);
else if (b)
x(2);
This produces the following CFG:
       [if]
      /    \
[x(1)]      [if]
     |      |  \
     |      |   \
     |   [x(2)]  |
      \     |   /
       [ end ]
[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.
We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has side effects for the sake of argument; if something is safe to speculate we could indeed sink it nevertheless, but this cannot happen in the general case and would cause many extra selects).
We are now able to detect this case and split off the unconditional arcs to a common successor:
       [if]
      /    \
[x(1)]      [if]
     |      |  \
     |      |   \
     |   [x(2)]  |
      \    /     |
  [sink.split]   |
         \      /
         [ end ]
Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.
llvm-svn: 280364
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.
This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.
This had the problem that, in order to sink aggressively, instructions with non-identical but trivially-equivalent operands needed extra logic. For example:
%a = load i32* %b          %d = load i32* %b
%c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1
Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).
This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.
Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.
In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.
In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.
This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.
llvm-svn: 280351
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimization,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
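A hedged sketch of the hand-off (the attribute name is an assumption):
```
; the frontend records only the counter's name as an attribute...
define void @f() #0 {
entry:
  ret void
}
attributes #0 = { "counting-function"="mcount" }
; ...and the new machine-level pass materializes the call to mcount()
; after inlining has already happened, keeping the counts accurate
```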
Differential Revision: https://reviews.llvm.org/D22825
llvm-svn: 280347
-fprofile-dir=path allows the user to specify where .gcda files should be
emitted when the program is run. In particular, this is the first flag that
causes the .gcno and .o files to have different paths, LLVM is extended to
support this. -fprofile-dir= does not change the file name in the .gcno (and
thus where lcov looks for the source) but it does change the name in the .gcda
(and thus where the runtime library writes the .gcda file). It's different from
a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip
paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX.
To implement this we split -coverage-file into -coverage-data-file and
-coverage-notes-file to specify the two different names. The !llvm.gcov
metadata node grows from a 2-element form {string coverage-file, node dbg.cu}
to 3-elements, {string coverage-notes-file, string coverage-data-file, node
dbg.cu}. In the 3-element form, the file name is already "mangled" with
.gcno/.gcda suffixes, while the 2-element form left that to the middle end
pass.
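A sketch of the new 3-element form described above (paths invented):
```
!llvm.gcov = !{!0}
; { notes file, data file, compile unit }; the names already carry
; their .gcno/.gcda suffixes in this form
!0 = !{!"/build/foo.gcno", !"/profile/dir/foo.gcda", !1}
!1 = !DICompileUnit(...)
```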
llvm-svn: 280306
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.
This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.
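A sketch of a case the MemorySSA-backed check can handle (the plain walker
gives up at any intervening store):
```
define i32 @f(i32* noalias %p, i32* noalias %q) {
  %v1 = load i32, i32* %p
  store i32 0, i32* %q   ; does not clobber %p
  ; with MemorySSA, EarlyCSE can see both loads read the same memory
  ; state, so %v2 is CSE'd to %v1
  %v2 = load i32, i32* %p
  %r = add i32 %v1, %v2
  ret i32 %r
}
```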
Reviewers: dberlin, sanjoy, reames, majnemer
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19821
llvm-svn: 280279
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics, so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.
The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.
Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)
My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.
Differential Revision: https://reviews.llvm.org/D24000
llvm-svn: 280250
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.
llvm-svn: 280228
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.
The real fix is in SROA, which I'll be looking into.
llvm-svn: 280219
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now that it's been rewritten, the restriction can be lifted.
As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:
if (a)
x(1);
else if (b)
x(2);
This produces the following CFG:
       [if]
      /    \
[x(1)]      [if]
     |      |  \
     |      |   \
     |   [x(2)]  |
      \     |   /
       [ end ]
[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.
We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has side effects for the sake of argument; if something is safe to speculate we could indeed sink it nevertheless, but this cannot happen in the general case and would cause many extra selects).
We are now able to detect this case and split off the unconditional arcs to a common successor:
       [if]
      /    \
[x(1)]      [if]
     |      |  \
     |      |   \
     |   [x(2)]  |
      \    /     |
  [sink.split]   |
         \      /
         [ end ]
Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.
llvm-svn: 280217
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.
This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.
This had the problem that, in order to sink aggressively, instructions with non-identical but trivially-equivalent operands needed extra logic. For example:
%a = load i32* %b          %d = load i32* %b
%c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1
Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).
This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.
Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.
In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.
In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.
This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.
llvm-svn: 280216
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.
There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.
Should fix PR28964.
llvm-svn: 280215
Summary:
1) CoroEarly now lowers the llvm.coro.promise intrinsic, which allows obtaining
a coroutine promise pointer from a coroutine frame and vice versa.
2) CoroFrame now interprets the Promise argument of llvm.coro.begin to
place the CoroutinePromise alloca at a deterministic offset from the coroutine frame.
Now, the coroutine promise example from docs\Coroutines.rst compiles and produces the expected result (see test/Transform/Coroutines/ex4.ll).
Reviewers: majnemer
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D23993
llvm-svn: 280184
Summary:
LSV was using two vector sets (heads and tails) to track pairs of adjacent
positions to vectorize.
A recent optimization tries to obtain the longest chain to vectorize and
assumes the positions in heads (H) and tails (T) match, which is not the case
if there are multiple tails for the same head.
e.g.:
i1: store a[0]
i2: store a[1]
i3: store a[1]
Leads to:
H: i1
T: i2 i3
Instead of:
H: i1 i1
T: i2 i3
So the positions for instructions that follow i3 will have different indexes in H/T.
This patch resolves PR29148.
This issue also surfaced the fact that if the chain is too long, and TLI
returns a "not-fast" answer, the whole chain will be abandoned for
vectorization, even though a smaller one would be beneficial.
Added a testcase and FIXME for this.
Reviewers: tstellarAMD, arsenm, jlebar
Subscribers: mzolotukhin, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24057
llvm-svn: 280179