llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	77c3c5f9b8	[InstCombine] Add m_BitReverse pattern match helper. NFCI. llvm-svn: 306860	2017-06-30 18:58:29 +00:00
Anna Thomas	e5e5e59d8b	[RuntimeUnrolling] Add logic for loops with multiple exit blocks Summary: Runtime unrolling is done for loops with a single exit block and a single exiting block (and this exiting block should be the latch block). This patch adds logic to support unrolling in the presence of multiple exit blocks (which also means multiple exiting blocks). Currently this is under an off-by-default option and is supported when epilog code is generated. Support in presence of prolog code will be in a future patch (we just need to add more tests, and update comments). This patch is essentially an implementation patch. I have not added any heuristic (in terms of branches added or code size) to decide when this should be enabled. Reviewers: mkuper, sanjoy, reames, evstupac Reviewed by: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33001 llvm-svn: 306846	2017-06-30 17:57:07 +00:00
Nikolai Bozhenov	bde9b14c6f	Revert of r306525: "Canonicalize clamp of float types to minmax" llvm-svn: 306815	2017-06-30 10:39:09 +00:00
Ayal Zaks	8d26f0a602	[LV] Optimize for size when vectorizing loops with tiny trip count It may be detrimental to vectorize loops with very small trip count, as various costs of the vectorized loop body as well as enclosing overheads including runtime tests and scalar iterations may outweigh the gains of vectorizing. The current cost model measures the cost of the vectorized loop body only, expecting it will amortize other costs, and loops with known or expected very small trip counts are not vectorized at all. This patch allows loops with very small trip counts to be vectorized, but under OptForSize constraints, which ensure the cost of the loop body is dominant, having no runtime guards nor scalar iterations. Patch inspired by D32451. Differential Revision: https://reviews.llvm.org/D34373 llvm-svn: 306803	2017-06-30 08:02:35 +00:00
Craig Topper	880bf82685	[InstCombine] In foldXorToXor, move the commutable matcher from the LHS match to the RHS match. No meaningful change intended. There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique. This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice. llvm-svn: 306800	2017-06-30 07:37:41 +00:00
Chandler Carruth	3545a9e1f9	Remove the BBVectorize pass. It served us well, helped kick-start much of the vectorization efforts in LLVM, etc. Its time has come and past. Back in 2014: http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html Time to actually let go and move forward. =] I've updated the release notes both about the removal and the deprecation of the corresponding C API. llvm-svn: 306797	2017-06-30 07:09:08 +00:00
Daniel Jasper	3b704ceba1	Revert "r306541 - Add zero-length check to memcpy/memset load store loop expansion" Segfaults in non-optimized builds. I'll get a stack trace and a reproducer to Teresa. llvm-svn: 306793	2017-06-30 06:37:33 +00:00
Daniel Jasper	5ce1ce742e	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306792	2017-06-30 06:32:21 +00:00
Max Kazantsev	8d0322e612	[SCEV] Use depth limit instead of local cache for SExt and ZExt In rL300494 there was an attempt to deal with excessive compile time on invocations of getSign/ZeroExtExpr using local caching. This approach only helps if we request the same SCEV multiple times throughout recursion. But in the bug PR33431 we see a case where we request different values all the time, so caching does not help and the size of the cache grows enormously. In this patch we remove the local cache for this methods and add the recursion depth limit instead, as we do for arithmetics. This gives us a guarantee that the invocation sequence is limited and reasonably short. Differential Revision: https://reviews.llvm.org/D34273 llvm-svn: 306785	2017-06-30 05:04:09 +00:00
Eric Christopher	a95aac3751	Reduce indenting and clean up comparisons around sign bit. llvm-svn: 306781	2017-06-30 01:57:48 +00:00
Eric Christopher	710c1c8faa	Reduce the complexity of the signbit/branch test functions. llvm-svn: 306779	2017-06-30 01:35:31 +00:00
Dehao Chen	2f31d0d86e	Hook the sample PGO machinery in the new PM Summary: This patch hooks up SampleProfileLoaderPass with the new PM. Reviewers: chandlerc, davidxl, davide, tejohnson Reviewed By: chandlerc, tejohnson Subscribers: tejohnson, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D34720 llvm-svn: 306763	2017-06-29 23:33:05 +00:00
Dinar Temirbulatov	f05c73c132	[SLPVectorizer] Moving Entry->NeedToGather check out of inner loop, since it is invariant there. NFCI. llvm-svn: 306749	2017-06-29 21:56:33 +00:00
Sam Clegg	3d65030c45	Remove `inline` keyword from inline `classof` methods The style guide states that the explicit `inline` should not be used with inline methods. classof is very common inline method with a fair amount on inconsistency: $ git grep classof ./include \| grep inline \| wc -l 230 $ git grep classof ./include \| grep -v inline \| wc -l 257 I chose to target this method rather the larger change since this method is easily cargo-culted (I did it at least once). I considered doing the larger change and removing all occurrences but that would be a much larger change. Differential Revision: https://reviews.llvm.org/D33906 llvm-svn: 306731	2017-06-29 19:35:17 +00:00
Xin Tong	02008c30b5	Remove useless header. NFC llvm-svn: 306712	2017-06-29 17:48:12 +00:00
Leo Li	20fbad9307	[ConstantHoisting] Avoid hoisting constants in GEPs that index into a struct type. Summary: Indices for GEPs that index into a struct type should always be constants. This added more checks in `collectConstantCandidates:` which make sure constants for GEP pointer type are not hoisted. This fixed Bug https://bugs.llvm.org/show_bug.cgi?id=33538 Reviewers: ributzka, rnk Reviewed By: ributzka Subscribers: efriedma, llvm-commits, srhines, javed.absar, pirama Differential Revision: https://reviews.llvm.org/D34576 llvm-svn: 306704	2017-06-29 17:03:34 +00:00
Daniel Berlin	b7df17ec59	PredicateInfo: Use OrderedInstructions instead of our homemade version. llvm-svn: 306703	2017-06-29 17:01:14 +00:00
Daniel Berlin	b779db7ebc	NewGVN: Remove useless test in addPhiOfOps. llvm-svn: 306702	2017-06-29 17:01:10 +00:00
Daniel Berlin	7c757aee38	Remove unneeded else from OrderedInstructions::dominates. llvm-svn: 306701	2017-06-29 17:01:03 +00:00
Dinar Temirbulatov	7b96266a16	[SLPVectorizer] Introducing getTreeEntry() helper function [NFC] Differential Revision: https://reviews.llvm.org/D34756 llvm-svn: 306655	2017-06-29 08:46:18 +00:00
Craig Topper	798a19ab8e	[InstCombine] In visitXor, use m_Not on the instruction itself instead of looking for all ones in Op1. This is consistent with 3 other not checks before this one. NFCI llvm-svn: 306617	2017-06-29 00:07:08 +00:00
Keno Fischer	a236dae5d1	[InstCombine] Retain TBAA when narrowing memory accesses Summary: As discussed on the mailing list it is legal to propagate TBAA to loads/stores from/to smaller regions of a larger load tagged with TBAA. Do so for (load->extractvalue)=>(gep->load) and similar foldings. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D31954 llvm-svn: 306615	2017-06-28 23:36:40 +00:00
Ayal Zaks	d9bc43ef2a	[LV] Fix PR33613 - retain order of insertelement per part r306381 caused PR33613, by reversing the order in which insertelements were generated per unroll part. This patch fixes PR33613 by retraining this order, placing each set of insertelements per part immediately after the last scalar being packed for this part. Includes a test case derived from PR33613. Reference: https://bugs.llvm.org/show_bug.cgi?id=33613 Differential Revision: https://reviews.llvm.org/D34760 llvm-svn: 306575	2017-06-28 17:59:33 +00:00
Geoff Berry	b0573547f6	[LoopUnroll] Fix bug in computeUnrollCount causing it to not honor MaxCount Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D34532 llvm-svn: 306564	2017-06-28 17:01:15 +00:00
Sanjay Patel	4e96f19052	[InstCombine] use local variable to reduce code; NFCI llvm-svn: 306560	2017-06-28 16:39:06 +00:00
Geoff Berry	66d9bdbca8	[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI. Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554	2017-06-28 15:53:17 +00:00
Teresa Johnson	538b8d25f0	Add zero-length check to memcpy/memset load store loop expansion Summary: I was testing using this expansion logic in other cases besides NVPTX, and found some runtime failures due to the lack of a check for a zero length memcpy/memset before the loop. There is already such a check in the memmove expansion code though. Reviewers: hfinkel Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D34707 llvm-svn: 306541	2017-06-28 13:07:37 +00:00
Nikolai Bozhenov	b01e6b5a52	[InstCombine] Canonicalize clamp of float types to minmax in fast mode. Summary: This commit allows matchSelectPattern to recognize clamp of float arguments in the presence of FMF the same way as already done for integers. This case is a little different though. With integers, given the min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX "automatically". That is not the case for float, because for them only full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care about NaNs. On the other hand, some backends (e.g. X86) have only FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM nodes are illegal thus selection is not happening. So I decided to do such kind of transformation in IR (InstCombiner) instead of complicating the logic in the backend. Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper Reviewed By: efriedma Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D33186 llvm-svn: 306525	2017-06-28 09:26:20 +00:00
Max Kazantsev	6c466a376e	[IRCE][NFC] Better get SCEV for 1 in calculateSubRanges A slightly more efficient way to get constant, we avoid resolving in getSCEV and excessive invocations, and we don't create a ConstantInt if 'true' branch is taken. Differential Revision: https://reviews.llvm.org/D34672 llvm-svn: 306503	2017-06-28 04:57:45 +00:00
Kyle Butt	f73c8a06a9	Inlining: Don't re-map simplified cloned instructions. When simplifying an instruction that has been re-mapped, it should never simplify to an instruction in the original function. In the edge case where we are inlining a function into itself, the existing code led to incorrect behavior. Replace the incorrect code with an assert verifying that we never expect simplification to produce an instruction in the old function, unless the functions are the same. Differential Revision: https://reviews.llvm.org/D33850 llvm-svn: 306495	2017-06-28 01:41:25 +00:00
Peter Collingbourne	92648c25a4	Bitcode: Write the irsymtab to disk. Differential Revision: https://reviews.llvm.org/D33973 llvm-svn: 306487	2017-06-27 23:50:11 +00:00
Geoff Berry	2573a19fe6	[EarlyCSE][MemorySSA] Enable MemorySSA in function-simplification pass of EarlyCSE. llvm-svn: 306477	2017-06-27 22:25:02 +00:00
Dehao Chen	920d022519	re-commit r306336: Enable vectorizer-maximize-bandwidth by default. Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306473	2017-06-27 22:05:58 +00:00
Craig Topper	5fe0197622	[InstCombine] Propagate nsw flag when turning mul by pow2 into shift when the constant is a vector splat or the scalar bit width is larger than 64-bits The check to see if we can propagate the nsw flag used m_ConstantInt(uint64_t*&) which doesn't work with splat vectors and has a restriction that the bitwidth of the ConstantInt must be 64-bits are less. This patch changes it to use m_APInt to remove both these issues Differential Revision: https://reviews.llvm.org/D34699 llvm-svn: 306457	2017-06-27 19:57:53 +00:00
Serge Guelton	7bc405aa4c	[CodeExtractor] Prevent extraction of block involving blockaddress BlockAddress are only valid within their function context, which does not interact well with CodeExtractor. Detect this case and prevent it. Differential Revision: https://reviews.llvm.org/D33839 llvm-svn: 306448	2017-06-27 18:57:53 +00:00
Yaxun Liu	7c44f340de	[SROA] Fix APInt size when alloca address space is not 0 SROA assumes alloca address space is 0, which causes assertion. This patch fixes that. Differential Revision: https://reviews.llvm.org/D34104 llvm-svn: 306440	2017-06-27 18:26:06 +00:00
Sanjay Patel	7227276d41	[InstCombine] canonicalize icmp predicate feeding select This canonicalization was suggested in D33172 as a way to make InstCombine behavior more uniform. We have this transform for icmp+br, so unless there's some reason that icmp+select should be treated differently, we should do the same thing here. The benefit comes from increasing the chances of creating identical instructions. This is shown in the tests in logical-select.ll (PR32791). InstCombine doesn't fold those directly, but EarlyCSE can simplify the identical cmps, and then InstCombine can fold the selects together. The possible regression for the tests in select.ll raises questions about poison/undef: http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html ...but that transform is just as likely to be triggered by this canonicalization as it is to be missed, so we're just pointing out a commutation deficiency in the pattern matching: https://reviews.llvm.org/rL228409 Differential Revision: https://reviews.llvm.org/D34242 llvm-svn: 306435	2017-06-27 17:53:22 +00:00
Dehao Chen	66131665c4	Enable ICP for AutoFDO. Summary: AutoFDO should have ICP enabled. Reviewers: davidxl Reviewed By: davidxl Subscribers: sanjoy, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D34662 llvm-svn: 306429	2017-06-27 17:23:33 +00:00
Anna Thomas	dc935a6eb6	[LoopUnrollRuntime] Use SCEV exit count for calculating trip count. NFCI Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block (which is proven to be the only exiting block in the loop to be unrolled). llvm-svn: 306410	2017-06-27 14:14:35 +00:00
Ayal Zaks	fc1e210d44	Recommitting 306331. Undoing revert 306338 after fixed bug: add metadata to the load instead of the reverse shuffle added to it, retaining the original ValueMap implementation. llvm-svn: 306381	2017-06-27 08:41:19 +00:00
Chandler Carruth	3f81d8024c	[SROA] Fix PR32902 by more carefully propagating !nonnull metadata. This is based heavily on the work done ni D34285. I mostly wanted to do test cleanup for the author to save them some time, but I had a really hard time understanding why it was so hard to write better test cases for these issues. The problem is that because SROA does a second rewrite of the loads and because we don't propagate !nonnull for non-pointer loads, we first introduced invalid !nonnull metadata and then stripped it back off just in time to avoid most ways of this PR manifesting. Moving to the more careful utility only fixes this by changing the predicate to look at the new load's type rather than the target type. However, that does fix the bug, and the utility is much nicer including adding range metadata to model the nonnull property after a conversion to an integer. However, we have bigger problems because we don't actually propagate range metadata, and the utility to do this extracted from instcombine isn't really in good shape to do this currently. It only handles the case of copying range metadata from an integer load to a pointer load. It doesn't even handle the trivial cases of propagating from one integer load to another when they are the same width! This utility will need to be beefed up prior to using in this location to get the metadata to fully survive. And even then, we need to go and teach things to turn the range metadata into an assume the way we do with nonnull so that when we promote an integer we don't lose the information. All of this will require a new test case that looks kind-of like `preserve-nonnull.ll` does here but focuses on range metadata. It will also likely require more testing because it needs to correctly handle changes to the integer width, especially as SROA actively tries to change the integer width! Last but not least, I'm a little worried about hooking the range metadata up here because the instcombine logic for converting from a range metadata to a nonnull metadata node seems broken in the face of non-zero address spaces where null is not mapped to the integer `0`. So that probably needs to get fixed with test cases both in SROA and in instcombine to cover it. But this does extract the core PR fix from D34285 of preventing the !nonnull metadata from being propagated in a broken state just long enough to feed into promotion and crash value tracking. On D34285 there is some discussion of zero-extend handling because it isn't necessary. First, the new load size covers all of the non-undef (ie, possibly initialized) bits. This may even extend past the original alloca if loading those bits could produce valid data. The only way its valid for us to zero-extend an integer load in SROA is if the original code had a zero extend or those bits were undef. And we get to assume things like undef never satifies nonnull, so non undef bits can participate here. No need to special case the zero-extend handling, it just falls out correctly. The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to save a few rounds of trivial edits fixing style issues and test case formulation. Differental Revision: D34285 llvm-svn: 306379	2017-06-27 08:32:03 +00:00
Mikael Holmen	37b5120a9a	[Reassociate] Make sure EraseInst sets MadeChange Summary: EraseInst didn't report that it made IR changes through MadeChange. It is essential that changes to the IR are reported correctly, since for example ReassociatePass::run() will indicate that all analyses are preserved otherwise. And the CGPassManager determines if the CallGraph is up-to-date based on status from InstructionCombiningPass::runOnFunction(). Reviewers: craig.topper, rnk, davide Reviewed By: rnk, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34616 llvm-svn: 306368	2017-06-27 05:32:13 +00:00
Dehao Chen	8b7effb344	revert r306336 for breaking ppc test. llvm-svn: 306344	2017-06-26 23:05:35 +00:00
Ayal Zaks	3923c0c46b	reverting 306331. Causes TBAA metadata to be generates on reverse shuffles, investigating. llvm-svn: 306338	2017-06-26 22:26:54 +00:00
Dehao Chen	79655792cc	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306336	2017-06-26 21:41:09 +00:00
Ayal Zaks	e7e15d186b	[LV] Changing the interface of ValueMap, NFC. Instead of providing access to the internal MapStorage holding all Values associated with a given Key, used for setting or resetting them all together, ValueMap keeps its MapStorage internal; its new interface allows getting, setting or resetting a single Value, per part or per part-and-lane. Follows the discussion in https://reviews.llvm.org/D32871. Differential Revision: https://reviews.llvm.org/D34473 llvm-svn: 306331	2017-06-26 21:03:51 +00:00
Wei Mi	71f06420e4	[GVN] Recommit the patch "Add phi-translate support in scalarpre". The recommit fixes three bugs: The first one is to use CurrentBlock instead of PREInstr's Parent as param of performScalarPREInsertion because the Parent of a clone instruction may be uninitialized. The second one is stop PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst is defined inside of CurrentBlock. The same value defined inside of loop in last iteration can not be regarded as available. The third one is an out-of-bound array access in a flipped if guard. Right now scalarpre doesn't have phi-translate support, so it will miss some simple pre opportunities. Like the following testcase, current scalarpre cannot recognize the last "a * b" is fully redundent because a and b used by the last "a * b" expr are both defined by phis. long a[100], b[100], g1, g2, g3; __attribute__((pure)) long goo(); void foo(long a, long b, long c, long d) { g1 = a * b; if (__builtin_expect(g2 > 3, 0)) { a = c; b = d; g2 = a * b; } g3 = a * b; // fully redundant. } The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available. llvm-svn: 306313	2017-06-26 18:16:10 +00:00
Chandler Carruth	2abb65ae11	[InstCombine] Factor the logic for propagating !nonnull and !range metadata out of InstCombine and into helpers. NFC, this just exposes the logic used by InstCombine when propagating metadata from one load instruction to another. The plan is to use this in SROA to address PR32902. If anyone has better ideas about how to factor this or name variables, I'm all ears, but this seemed like a pretty good start and lets us make progress on the PR. This is based on a patch by Ariel Ben-Yehuda (D34285). llvm-svn: 306267	2017-06-26 03:31:31 +00:00
Chandler Carruth	4a000883c7	[LoopSimplify] Re-instate r306081 with a bug fix w.r.t. indirectbr. This was reverted in r306252, but I already had the bug fixed and was just trying to form a test case. The original commit factored the logic for forming dedicated exits inside of LoopSimplify into a helper that could be used elsewhere and with an approach that required fewer intermediate data structures. See that commit for full details including the change to the statistic, etc. The code looked fine to me and my reviewers, but in fact didn't handle indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty. If you have code that looks just right, you can end up leaking these predecessors into a subsequent rewrite, and crash deep down when trying to update PHI nodes for predecessors that don't exist. I've added an assert that makes the bug much more obvious, and then changed the code to reliably clear the vector so we don't get this bug again in some other form as the code changes. I've also added a test case that does manage to catch this while also giving some nice positive coverage in the face of indirectbr. The real code that found this came out of what I think is CPython's interpreter loop, but any code with really "creative" interpreter loops mixing indirectbr and other exit paths could manage to tickle the bug. I was hard to reduce the original test case because in addition to having a particular pattern of IR, the whole thing depends on the order of the predecessors which is in turn depends on use list order. The test case added here was designed so that in multiple different predecessor orderings it should always end up going down the same path and tripping the same bug. I hope. At least, it tripped it for me without manipulating the use list order which is better than anything bugpoint could do... llvm-svn: 306257	2017-06-25 22:45:31 +00:00
Anna Thomas	e7cb633d29	[LoopDeletion] NFC: Move phi node value setting into prepass Recommit NFC patch (rL306157) where I missed incrementing the basic block iterator, which caused loop deletion tests to hang due to infinite loop. Had reverted it in rL306162. rL306157 commit message: Currently, the implementation of delete dead loops has a special case when the loop being deleted is never executed. This special case (updating of exit block's incoming values for phis) can be run as a prepass for non-executable loops before performing the actual deletion. llvm-svn: 306254	2017-06-25 21:13:58 +00:00
Daniel Jasper	4c6cd4ccb7	Revert "[LoopSimplify] Factor the logic to form dedicated exits into a utility." This leads to a segfault. Chandler already has a test case and should be able to recommit with a fix soon. llvm-svn: 306252	2017-06-25 17:58:25 +00:00
Sanjay Patel	2f3ead7adc	[InstCombine] add (sext i1 X), 1 --> zext (not X) http://rise4fun.com/Alive/i8Q A narrow bitwise logic op is obviously better than math for value tracking, and zext is better than sext. Typically, the 'not' will be folded into an icmp predicate. The IR difference would even survive through codegen for x86, so we would see worse code: https://godbolt.org/g/C14HMF one_or_zero(int, int): # @one_or_zero(int, int) xorl %eax, %eax cmpl %esi, %edi setle %al retq one_or_zero_alt(int, int): # @one_or_zero_alt(int, int) xorl %ecx, %ecx cmpl %esi, %edi setg %cl movl $1, %eax subl %ecx, %eax retq llvm-svn: 306243	2017-06-25 14:15:28 +00:00
Xinliang David Li	b67530e9b9	[PGO] Implementate profile counter regiser promotion Differential Revision: http://reviews.llvm.org/D34085 llvm-svn: 306231	2017-06-25 00:26:43 +00:00
Hiroshi Inoue	b300824ee7	fix trivial typos in comment, NFC dereferencable -> dereferenceable llvm-svn: 306210	2017-06-24 15:43:33 +00:00
Craig Topper	7b66ffe875	[ValueTracking][InstCombine] Use m_Shr instead m_CombineOr(m_LShr, m_AShr). NFC llvm-svn: 306205	2017-06-24 06:24:04 +00:00
Craig Topper	72ee6945af	[Analysis][Transforms] Use commutable matchers instead of m_CombineOr in a few places. NFC llvm-svn: 306204	2017-06-24 06:24:01 +00:00
Vitaly Buka	df19ad456e	[InstCombine] Don't replace allocas with smaller globals Summary: InstCombine replaces large allocas with small globals consts causing buffer overflows on valid code, see PR33372. This fix permits this optimization only if the global is dereference for alloca size. Fixes PR33372 Reviewers: eugenis, majnemer, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34311 llvm-svn: 306194	2017-06-24 01:35:19 +00:00
Anna Thomas	77a2e6b198	Revert "[LoopDeletion] NFC: Move phi node value setting into prepass" This reverts commit r306157. It caused some timeouts in clang tests. Perhaps unreachable loops have far too many phi nodes. Reverting and investigating. llvm-svn: 306162	2017-06-23 21:30:48 +00:00
Anna Thomas	a43b387f27	[LoopDeletion] NFC: Move phi node value setting into prepass Currently, the implementation of delete dead loops has a special case when the loop being deleted is never executed. This special case (updating of exit block's incoming values for phis) can be run as a prepass for non-executable loops before performing the actual deletion. llvm-svn: 306157	2017-06-23 20:38:50 +00:00
Craig Topper	68ed55e06a	[CorrelatedValuePropagation] Fix typo in comment sense->since. NFC llvm-svn: 306152	2017-06-23 20:28:40 +00:00
Craig Topper	29cdfe2cd9	[CorrelatedValuePropagation] Remove comment about iterating switch cases in reverse order. This is no longer being done after r298791. NFC llvm-svn: 306151	2017-06-23 20:28:35 +00:00
Anna Thomas	91eed9ac1a	[RuntimeLoopUnrolling] Rename exit block and move assert earlier. NFC The single exit block allowed in runtime unrolling is guaranteed to be the Latch's successor, so rename it as LatchExitBlock. llvm-svn: 306105	2017-06-23 14:28:01 +00:00
Anna Thomas	d67165c93c	[InstCombine] Recognize and simplify three way comparison idioms Summary: Many languages have a three way comparison idiom where comparing two values produces not a boolean, but a tri-state value. Typical values (e.g. as used in the lcmp/fcmp bytecodes from Java) are -1 for less than, 0 for equality, and +1 for greater than. We actually do a great job already of converting three way comparisons into binary comparisons when the result produced has one a single use. Unfortunately, such values can have more than one use, and in that case, our existing optimizations break down. The patch adds a peephole which converts a three-way compare + test idiom into a binary comparison on the original inputs. It focused on replacing the test on the result of the three way compare and does nothing about removing the three way compare itself. That's left to other optimizations (which do actually kick in commonly.) We currently recognize one idiom on signed integer compare. In the future, we plan to recognize and simplify other comparison idioms on other signed/unsigned datatypes such as floats, vectors etc. This is a resurrection of Philip Reames' original patch: https://reviews.llvm.org/D19452 Reviewers: majnemer, apilipenko, reames, sanjoy, mkazantsev Reviewed by: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34278 llvm-svn: 306100	2017-06-23 13:41:45 +00:00
Craig Topper	2c20c42cb6	[JumpThreading] Teach jump threading how to analyze (and (cmp A, C1), (cmp A, C2)) after InstCombine has turned it into (cmp (add A, C3), C4) Currently JumpThreading can use LazyValueInfo to analyze an 'and' or 'or' of compare if the compare is fed by a livein of a basic block. This can be used to to prove the condition can't be met for some predecessor and the jump from that predecessor can be moved to the false path of the condition. But if the compare is something that InstCombine turns into an add and a single compare, it can't be analyzed because the livein is now an input to the add and not the compare. This patch adds a new method to LVI to get a ConstantRange on an edge. Then we teach jump threading to detect the add livein feeding a compare and to get the ConstantRange and propagate it. Differential Revision: https://reviews.llvm.org/D33262 llvm-svn: 306085	2017-06-23 05:41:35 +00:00
Craig Topper	7927996140	[JumpThreading] Use some temporary variables to reduce the number of times we call the same methods. NFC A future patch will add even more uses of these variables. llvm-svn: 306084	2017-06-23 05:41:32 +00:00
Chandler Carruth	4ab0f4910a	[LoopSimplify] Factor the logic to form dedicated exits into a utility. I want to use the same logic as LoopSimplify to form dedicated exits in another pass (SimpleLoopUnswitch) so I wanted to factor it out here. I also noticed that there is a pretty significantly more efficient way to implement this than the way the code in LoopSimplify worked. We don't need to actually retain the set of unique exit blocks, we can just rewrite them as we find them and use only a set to deduplicate. This did require changing one part of LoopSimplify to not re-use the unique set of exits, but it only used it to check that there was a single unique exit. That part of the code is about to walk the exiting blocks anyways, so it seemed better to rewrite it to use those exiting blocks to compute this property on-demand. I also had to ditch a statistic, but it doesn't seem terribly valuable. Differential Revision: https://reviews.llvm.org/D34049 llvm-svn: 306081	2017-06-23 04:03:04 +00:00
Eric Christopher	5a7c2f1700	Remove the LoadCombine pass. It was never enabled and is unsupported. Based on discussions with the author on mailing lists. llvm-svn: 306067	2017-06-22 22:58:12 +00:00
Anna Thomas	72c90c87f8	[LoopDeletion] Update exits correctly when multiple duplicate edges from an exiting block Summary: Currently, we incorrectly update exit blocks of loops when there are multiple edges from a single exiting block to the exit block. This can happen when we have switches as the terminator of the exiting blocks. The fix here is to correctly update the phi nodes in the exit block, and remove all incoming values except for one which is from the preheader. Note: Currently, this error can manifest only while deleting non-executed loops. However, it is possible to trigger this error in invariant loops, once we enhance the logic around the exit conditions for the loop check. Reviewers: chandlerc, dberlin, sanjoy, efriedma Reviewed by: efriedma Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D34516 llvm-svn: 306048	2017-06-22 20:20:56 +00:00
Craig Topper	dffbbcb3fd	[InstCombine] Teach foldSelectICmpAndOr to recognize (select (icmp slt (trunc (X)), 0), Y, (or Y, C2)) Summary: InstCombine likes to turn (icmp eq (and X, C1), 0) into (icmp slt (trunc (X)), 0) sometimes. This breaks foldSelectICmpAndOr's ability to recognize (select (icmp eq (and X, C1), 0), Y, (or Y, C2))->(or (shl (and X, C1), C3), y). This patch tries to recover this. I had to flip around some of the early out checks so that I could create a new And instruction during the compare processing without it possibly never getting used. Reviewers: spatel, majnemer, davide Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34184 llvm-svn: 306029	2017-06-22 16:23:30 +00:00
Craig Topper	0de5e6a729	[InstCombine] Add one use checks to or/and->xnor folding If the components of the and/or had multiple uses, this transform created an additional instruction. This patch makes sure we remove one of the components. Differential Revision: https://reviews.llvm.org/D34498 llvm-svn: 306027	2017-06-22 16:12:02 +00:00
Sanjay Patel	d1e811979c	[InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138) There are 2 parts to this patch made simultaneously to avoid a regression. We're reversing the canonicalization that moves bitwise vector ops before bitcasts. We're moving bitwise vector ops after bitcasts instead. That's the 1st and 3rd hunks of the patch. The motivation is that there's only one fold that currently depends on the existing canonicalization (see next), but there are many folds that would automatically benefit from the new canonicalization. PR33138 ( https://bugs.llvm.org/show_bug.cgi?id=33138 ) shows why/how we have these patterns in IR. There's an or(and,andn) pattern that requires an adjustment in order to continue matching to 'select' because the bitcast changes position. This match is unfortunately complicated because it requires 4 logic ops with optional bitcast and sext ops. Test diffs: 1. The bitcast.ll and bitcast-bigendian.ll changes show the most basic difference - bitcast comes before logic. 2. There are also tests with no diffs in bitcast.ll that verify that we're still doing folds that were enabled by the previous canonicalization. 3. icmp-xor-signbit.ll shows the payoff. We don't need to adjust existing icmp patterns to look through bitcasts. 4. logical-select.ll contains several tests for the or(and,andn) --> select fold to verify that we are still handling those cases. The lone diff shows the movement of the bitcast from the new canonicalization rule. Differential Revision: https://reviews.llvm.org/D33517 llvm-svn: 306011	2017-06-22 15:46:54 +00:00
Sanjay Patel	e800df8eac	[InstCombine] add peekThroughBitcast() helper; NFC This is an NFC portion of D33517. We have similar helpers in the backend. llvm-svn: 306008	2017-06-22 15:28:01 +00:00
Diana Picus	b512e91515	Revert "Enable vectorizer-maximize-bandwidth by default." This reverts commit r305960 because it broke self-hosting on AArch64. llvm-svn: 305990	2017-06-22 10:00:28 +00:00
Sam Clegg	705f798bff	Mark dump() methods as const. NFC Add const qualifier to any dump() method where adding one was trivial. Differential Revision: https://reviews.llvm.org/D34481 llvm-svn: 305963	2017-06-21 22:19:17 +00:00
Dehao Chen	014db29b89	Enable vectorizer-maximize-bandwidth by default. Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 305960	2017-06-21 22:01:32 +00:00
Craig Topper	34caf5396f	[Reassociate] Use early returns in a couple places to reduce indentation and improve readability. NFC llvm-svn: 305946	2017-06-21 19:39:35 +00:00
Craig Topper	99a2e89920	[Reassociate] Const correct a helper function. NFC llvm-svn: 305945	2017-06-21 19:39:33 +00:00
Craig Topper	a074c101e5	[InstCombine] Cleanup using commutable matchers. Make a couple helper methods standalone static functions. Put 'if' around variable declaration instead of after. NFC llvm-svn: 305941	2017-06-21 18:57:00 +00:00
Dehao Chen	50f2aa19e8	Do not inline recursive direct calls in sample loader pass. Summary: r305009 disables recursive inlining for indirect calls in sample loader pass. The same logic applies to direct recursive calls. Reviewers: iteratee, davidxl Reviewed By: iteratee Subscribers: sanjoy, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D34456 llvm-svn: 305934	2017-06-21 17:57:43 +00:00
Craig Topper	5b173f2bb3	[InstCombine] Add range metadata to cttz/ctlz/ctpop intrinsic calls based on known bits Summary: I noticed that passing known bits across these intrinsics isn't great at capturing the information we really know. Turning known bits of the input into known bits of a count output isn't able to convey a lot of what we really know. This patch adds range metadata to these intrinsics based on the known bits. Currently the patch punts if we already have range metadata present. Reviewers: spatel, RKSimon, davide, majnemer Reviewed By: RKSimon Subscribers: sanjoy, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D32582 llvm-svn: 305927	2017-06-21 16:32:35 +00:00
Craig Topper	ae86cc725d	[InstCombine] Don't let folding (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) create more instructions than it removes Summary: Previously this folding had no checks to see if it was going to result in less instructions. This was pointed out during the review of D34184 This patch adds code to count how many instructions its going to create vs how many its going to remove so we can make a proper decision. Reviewers: spatel, majnemer Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34437 llvm-svn: 305926	2017-06-21 16:07:13 +00:00
Craig Topper	cbac691c4b	[Reassociate] Support xor reassociating for splat vectors Summary: This patch adds support for xors of splat vectors. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34354 llvm-svn: 305925	2017-06-21 16:07:09 +00:00
Davide Italiano	0ec715be1f	[NewGVN] Fix a bug that made the store verifier less effective. We weren't actually checking for duplicated stores, as the condition was always actually false. This was found by Coverity, and I have no clue how to trigger this in real-world code (although I tried for a bit). llvm-svn: 305867	2017-06-20 22:57:40 +00:00
Sanjay Patel	4ccbd58d70	[InstCombine] fix code/test comments for r305792; NFC These diffs were in the last version of the patch in D33342, but I accidentally committed the previous rev. llvm-svn: 305793	2017-06-20 12:45:46 +00:00
Sanjay Patel	adca825dc1	[InstCombine] try to canonicalize xor-of-icmps to and-of-icmps We have a large portfolio of folds for and-of-icmps and or-of-icmps in InstSimplify and InstCombine, but hardly anything for xor-of-icmps. Rather than trying to rethink and translate all of those folds, we can use the truth table definition of xor: X ^ Y --> (X \| Y) & !(X & Y) ...to see if we can convert the xor to and/or and then use the existing folds. http://rise4fun.com/Alive/J9v Differential Revision: https://reviews.llvm.org/D33342 llvm-svn: 305792	2017-06-20 12:40:55 +00:00
Vedant Kumar	b5794ca90c	[ProfileData] PR33517: Check for failure of symtab creation With PR33517, it became apparent that symbol table creation can fail when presented with malformed inputs. This patch makes that sort of error detectable, so llvm-cov etc. can fail more gracefully. Specifically, we now check that function names within the symbol table aren't empty. Testing: check-{llvm,clang,profile}, some unit test updates. llvm-svn: 305765	2017-06-20 01:38:56 +00:00
Ana Pazos	f731bde064	[PATCH] [PGO] Fixed cast operation in emIntrinsicVisitor::instrumentOneMemIntrinsic. Reviewers: xur, efriedma, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34293 llvm-svn: 305737	2017-06-19 20:04:33 +00:00
Taewook Oh	9083547ae3	Improve profile-guided heuristics to use estimated trip count. Summary: Existing heuristic uses the ratio between the function entry frequency and the loop invocation frequency to find cold loops. However, even if the loop executes frequently, if it has a small trip count per each invocation, vectorization is not beneficial. On the other hand, even if the loop invocation frequency is much smaller than the function invocation frequency, if the trip count is high it is still beneficial to vectorize the loop. This patch uses estimated trip count computed from the profile metadata as a primary metric to determine coldness of the loop. If the estimated trip count cannot be computed, it falls back to the original heuristics. Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32451 llvm-svn: 305729	2017-06-19 18:48:58 +00:00
Bjorn Pettersson	475fcd9cd8	[InstCombine] Make sure AddReachableCodeToWorklist sets MadeIRChange Summary: Some optimizations in AddReachableCodeToWorklist did not update the MadeIRChange state. This could happen both when removing trivially dead instructions (DCE) and at constant folds. It is essential that changes to the IR is reported correctly, since for example InstCombinePass::run() will indicate that all analyses are preserved otherwise. And the CGPassManager determines if the CallGraph is up-to-date based on status from InstructionCombiningPass::runOnFunction(). The new test case early_dce_clobbers_callgraph.ll is a reproducer for some asserts that started to trigger after changes in the inliner in r305245. With this patch the test case passes again. Reviewers: sanjoy, craig.topper, dblaikie Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34346 llvm-svn: 305725	2017-06-19 18:00:27 +00:00
Hans Wennborg	ca69fc1cb7	Revert r304824 "Fix PR23384 (part 3 of 3)" This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514. > Summary: > The patch makes instruction count the highest priority for > LSR solution for X86 (previously registers had highest priority). > > Reviewers: qcolombet > > Differential Revision: http://reviews.llvm.org/D30562 > > From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 305720	2017-06-19 17:57:15 +00:00
Davide Italiano	daa9c0e403	[NewGVN] Simplify findConditionEquivalence(). NFCI. llvm-svn: 305707	2017-06-19 16:46:15 +00:00
Dinar Temirbulatov	e2c6991c07	Remove brackets, NFC. llvm-svn: 305706	2017-06-19 16:44:07 +00:00
Craig Topper	a7529b68cc	[InstCombine] Cleanup some duplicated one use checks Summary: These 4 patterns have the same one use check repeated twice for each. Once without a cast and one with. But the cast has no effect on what method is called. For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count. For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything. Reviewers: spatel, majnemer Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34308 llvm-svn: 305705	2017-06-19 16:23:49 +00:00
Craig Topper	ef85498e05	[Reassociate] Support some reassociation of vector xors Summary: Currently we don't try to do anything with vector xors. This patch adds support for removing duplicate pairs from a chain of vector xors as its pretty easy to support. We still dont' try to combine the xors with and/ors, but I might try that in a future patch. Reviewers: mcrosier, davide, resistor Reviewed By: mcrosier Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34338 llvm-svn: 305704	2017-06-19 16:23:46 +00:00
Craig Topper	4350734d36	[Reassociate] Make one of the helper methods static because it doesn't use any class variables. NFC llvm-svn: 305703	2017-06-19 16:23:43 +00:00
Anna Thomas	7949f4529a	[JumpThreading][LVI] Invalidate LVI information after blocks are merged Summary: After a single predecessor is merged into a basic block, we need to invalidate the LVI information for the new merged block, when LVI is not provably true for all of instructions in the new block. The test cases added show the correct LVI information using the LVI printer pass. Reviewers: reames, dberlin, davide, sanjoy Reviewed by: dberlin, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34108 llvm-svn: 305699	2017-06-19 15:23:33 +00:00
Xin Tong	b412831d11	[TRE] Improve code motion in TRE, use AA to tell whether a load can be moved before a call that writes to memory. Summary: use AA to tell whether a load can be moved before a call that writes to memory. Reviewers: dberlin, davide, sanjoy, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D34115 llvm-svn: 305698	2017-06-19 15:21:18 +00:00
Daniel Berlin	36b08b2088	NewGVN: Fix PR 33461, caused by slightly overzealous verification. llvm-svn: 305657	2017-06-19 00:24:00 +00:00
Craig Topper	d96177cf72	[Reassociate] Use APInt::isNullValue() instead of comparing with 0. NFC This should compile to slightly better code. llvm-svn: 305651	2017-06-18 18:15:38 +00:00
Xin Tong	9d2a5b1cf7	Add argmononly attribute to strlen and wcslen, i.e. they only read memory (string) passed to them. Summary: This allows strlen to be moved out of the loop in case its argument is not modified in the loop in LICM. Reviewers: hfinkel, davide, sanjoy, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34323 llvm-svn: 305641	2017-06-18 03:10:26 +00:00
Sanjoy Das	b70ddd8901	[SROA] Add support for non-integral pointers Summary: C.f. http://llvm.org/docs/LangRef.html#non-integral-pointer-type Reviewers: chandlerc, loladiro Reviewed By: loladiro Subscribers: reames, loladiro, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D32203 llvm-svn: 305639	2017-06-17 20:28:13 +00:00
Xin Tong	025780ba6e	[TRE] Add assertion for folding trivial return block llvm-svn: 305637	2017-06-17 16:55:12 +00:00
Xin Tong	d5b4d0b53a	[TRE] Update comments. NFC llvm-svn: 305636	2017-06-17 16:18:36 +00:00
Wei Mi	c7ba876323	Revert rL305578. There is still some buildbot failure to be fixed. llvm-svn: 305603	2017-06-16 23:14:35 +00:00
Anna Thomas	6bc14c65ad	[InstCombine] Set correct insertion point for selects generated while folding phis Summary: When we fold vector constants that are operands of phi's that feed into select, we need to set the correct insertion point for the new selects that get generated. The correct insertion point is the incoming block for the phi. Such cases can occur with patch r298845, which fixed folding of vector constants, but the new selects could be inserted incorrectly (as the added test case shows). Reviewers: majnemer, spatel, sanjoy Reviewed by: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34162 llvm-svn: 305591	2017-06-16 21:08:37 +00:00
Davide Italiano	ec5b0257bf	[SCCP] Simplify the code a bit. NFCI. llvm-svn: 305583	2017-06-16 20:50:31 +00:00
Davide Italiano	0b1190aa8d	[SCCP] Clarify a comment about unhandled instructions. llvm-svn: 305579	2017-06-16 20:27:17 +00:00
Wei Mi	a2493b6ad9	[GVN] Recommit the patch "Add phi-translate support in scalarpre". The recommit fixes two bugs: The first one is to use CurrentBlock instead of PREInstr's Parent as param of performScalarPREInsertion because the Parent of a clone instruction may be uninitialized. The second one is stop PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst is defined inside of CurrentBlock. The same value defined inside of loop in last iteration can not be regarded as available. Right now scalarpre doesn't have phi-translate support, so it will miss some simple pre opportunities. Like the following testcase, current scalarpre cannot recognize the last "a * b" is fully redundent because a and b used by the last "a * b" expr are both defined by phis. long a[100], b[100], g1, g2, g3; __attribute__((pure)) long goo(); void foo(long a, long b, long c, long d) { g1 = a * b; if (__builtin_expect(g2 > 3, 0)) { a = c; b = d; g2 = a * b; } g3 = a * b; // fully redundant. } The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available. Differential Revision: https://reviews.llvm.org/D32252 llvm-svn: 305578	2017-06-16 20:21:01 +00:00
Davide Italiano	d95d871af0	[SCCP] Remove redundant instruction visitors. Whenever we don't know what to do with an instruction, we send it to overdefined anyway. llvm-svn: 305575	2017-06-16 19:43:57 +00:00
Xinliang David Li	c3f8e83253	Fix function name /NFC llvm-svn: 305564	2017-06-16 16:54:13 +00:00
Daniel Neilson	3faabbbe85	[Atomics] Rename and change prototype for atomic memcpy intrinsic Summary: Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic. Reviewers: reames, sanjoy, efriedma Reviewed By: reames Subscribers: mzolotukhin, anna, llvm-commits, skatkov Differential Revision: https://reviews.llvm.org/D33240 llvm-svn: 305558	2017-06-16 14:43:59 +00:00
Craig Topper	da6ea0d3e8	[InstCombine] Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 \| K2)) == (K1 \| K2) if K1 and K2 are a 1-bit mask Summary: This is the demorganed version of the case we already handle for the OR of iszero. Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34244 llvm-svn: 305548	2017-06-16 05:10:37 +00:00
Craig Topper	4d76da4fc9	[CorrelatedValuePropagation] Remove superfluous semicolon. NFC llvm-svn: 305538	2017-06-16 01:53:20 +00:00
Evgeniy Stepanov	4d4ee93d25	[cfi] CFI-ICall for ThinLTO. Implement ControlFlowIntegrity for indirect function calls in ThinLTO. Design follows the RFC in llvm-dev, see https://groups.google.com/d/msg/llvm-dev/MgUlaphu4Qc/kywu0AqjAQAJ llvm-svn: 305533	2017-06-16 00:18:29 +00:00
Xinliang David Li	eea0ade2eb	[PartialInlining] Code Refactoring This is a NFC code refactoring and interface cleanup. This paves the way to enable outlining-only mode for the partial inliner. llvm-svn: 305530	2017-06-15 23:56:59 +00:00
Craig Topper	2ba991ff2c	[InstCombine] Add two FIXMEs for bad single use checks. NFC llvm-svn: 305510	2017-06-15 21:38:48 +00:00
Teresa Johnson	152277952e	Split PGO memory intrinsic optimization into its own source file Summary: Split the PGOMemOPSizeOpt pass out from IndirectCallPromotion.cpp into its own file. Reviewers: davidxl Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D34248 llvm-svn: 305501	2017-06-15 20:23:57 +00:00
Craig Topper	f2d3e6d3d5	[InstCombine] Make the context instruction parameter of foldOrOfICmps a reference to discourage passing nullptr and to remove the '&' from all of the call sites. NFC llvm-svn: 305493	2017-06-15 19:09:51 +00:00
Craig Topper	6eec9e21a5	[InstCombine] Handle (iszero(A & K1) \| iszero(A & K2)) -> (A & (K1 \| K2)) != (K1 \| K2) when the one of the Ands is commuted relative to the other Currently we expect A to be on the same side in both Ands but nothing guarantees that. While there also switch to using matchers for some of the code. Differential Revision: https://reviews.llvm.org/D34230 llvm-svn: 305487	2017-06-15 17:55:20 +00:00
Max Kazantsev	dc80366d52	[ScalarEvolution] Apply Depth limit to getMulExpr This is a fix for PR33292 that shows a case of extremely long compilation of a single .c file with clang, with most time spent within SCEV. We have a mechanism of limiting recursion depth for getAddExpr to avoid long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr and back that do not propagate the info about depth. As result of this, a chain getAddExpr -> ... .> getAddExpr -> getMulExpr -> getAddExpr -> ... -> getAddExpr can be extremely long, with every segment of getAddExpr's being up to max depth long. This leads either to long compilation or crash by stack overflow. We face this situation while analyzing big SCEVs in the test of PR33292. This patch applies the same limit on max expression depth for getAddExpr and getMulExpr. Differential Revision: https://reviews.llvm.org/D33984 llvm-svn: 305463	2017-06-15 11:48:21 +00:00
George Karpenkov	406c113103	Fixing section name for Darwin platforms for sanitizer coverage On Darwin, section names have a 16char length limit. llvm-svn: 305429	2017-06-14 23:40:25 +00:00
Daniel Berlin	6d2db9edb2	PredicateInfo: Don't insert conditional info when a conditional branch jumps to the same target regardless of condition llvm-svn: 305416	2017-06-14 21:19:52 +00:00
Daniel Berlin	51e878e01d	NewGVN: This is wrong by inspection, it will not cause an issue currently due to other limitations, i believe. This also means i can't make a test for it. llvm-svn: 305415	2017-06-14 21:19:28 +00:00
Davide Italiano	0dc4778067	[EarlyCSE] Make PhiToCheck in removeMSSA() a set. This way we end up not looking at PHI args already removed. MemSSA now goes through the updater so we can prune it to avoid having redundant MemoryPHI arguments, but that doesn't quite work for the general case. Discussed with Daniel Berlin, fixes PR33406. llvm-svn: 305409	2017-06-14 19:29:53 +00:00
Frederich Munch	dceb612eeb	Hide dbgs() stream for when built with -fmodules. Summary: Make DebugCounter::print and dump methods to be const correct. Reviewers: aprantl Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34214 llvm-svn: 305408	2017-06-14 19:16:22 +00:00
Vedant Kumar	9c056c9e1b	[InstrProf] Don't take the address of alwaysinline available_externally functions Doing so breaks compilation of the following C program (under -fprofile-instr-generate): __attribute__((always_inline)) inline int foo() { return 0; } int main() { return foo(); } At link time, we fail because taking the address of an available_externally function creates an undefined external reference, which the TU cannot provide. Emitting the function definition into the object file at all appears to be a violation of the langref: "Globals with 'available_externally' linkage are never emitted into the object file corresponding to the LLVM module." Differential Revision: https://reviews.llvm.org/D34134 llvm-svn: 305327	2017-06-13 22:12:35 +00:00
Teresa Johnson	8015f88525	[PGO] Update VP metadata after memory intrinsic optimization Summary: Leave an updated VP metadata on the fallback memcpy intrinsic after specialization. This can be used for later possible expansion based on the average of the remaining values. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34164 llvm-svn: 305321	2017-06-13 20:44:08 +00:00
Frederich Munch	6391c7e2a1	Revert r305313 & r305303, self-hosting build-bot isn’t liking it. llvm-svn: 305318	2017-06-13 19:05:24 +00:00
Frederich Munch	4c73b40dca	Force RegisterStandardPasses to construct std::function in the IPO library. Summary: Fixes an issue using RegisterStandardPasses from a statically linked object before PassManagerBuilder::addGlobalExtension is called from a dynamic library. Reviewers: efriedma, theraven Reviewed By: efriedma Subscribers: mehdi_amini, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D33515 llvm-svn: 305303	2017-06-13 16:48:41 +00:00
David Blaikie	6d0f39476a	Inliner: Avoid calling shouldInline until it's absolutely necessary This restores the order of evaluation (& conditionalized evaluation) of isTriviallyDeadInstruction, InlineHistoryIncludes, and shouldInline (with the addition of a shouldInline call after isTriviallyDeadInstruction) from before r305245. llvm-svn: 305267	2017-06-13 02:24:09 +00:00
George Burgess IV	f613749382	Fix signed/unsigned comparison warning; NFC llvm-svn: 305262	2017-06-13 01:28:49 +00:00
David Blaikie	ae8c4af4ac	Inliner: Don't remove calls to readnone+nounwind (but not always_inline) functions in the AlwaysInliner llvm-svn: 305245	2017-06-12 23:01:17 +00:00
Anna Thomas	4b027e8f89	[RS4GC] Drop invalid metadata after pointers are relocated Summary: After RS4GC, we should drop metadata that is no longer valid. These metadata is used by optimizations scheduled after RS4GC, and can cause a miscompile. One such metadata is invariant.load which is used by LICM sinking transform. After rewriting statepoints, the address of a load maybe relocated. With invariant.load metadata on a load instruction, LICM sinking assumes the loaded value (from a dererenceable address) to be invariant, and rematerializes the load operand and the load at the exit block. This transforms the IR to have an unrelocated use of the address after a statepoint, which is incorrect. Other metadata we conservatively remove are related to dereferenceability and noalias metadata. This patch drops such metadata on store and load instructions after rewriting statepoints. Reviewers: reames, sanjoy, apilipenko Reviewed by: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33756 llvm-svn: 305234	2017-06-12 21:26:53 +00:00
Sanjay Patel	2e33bbaff0	[InstCombine] lshr (sext iM X to iN), N-M --> zext (ashr X, min(N-M, M-1)) to iN This is a follow-up to https://reviews.llvm.org/D33879 / https://reviews.llvm.org/rL304939 , and was discussed in https://reviews.llvm.org/D33338. We prefer this form because a narrower shift may be cheaper, and we can more easily fold a zext than a sext. http://rise4fun.com/Alive/slVe Name: shz %s = sext i8 %x to i12 %r = lshr i12 %s, 4 => %a = ashr i8 %x, 4 %r = zext i8 %a to i12 llvm-svn: 305190	2017-06-12 14:23:43 +00:00
Xinliang David Li	7ed6cd32ea	[PartialInlining] Support shrinkwrap life_range markers Differential Revision: http://reviews.llvm.org/D33847 llvm-svn: 305170	2017-06-11 20:46:05 +00:00
Geoff Berry	3cca1da20c	[EarlyCSE] Add option to use MemorySSA for function simplification run of EarlyCSE (off by default). Summary: Use MemorySSA for memory dependency checking in the EarlyCSE pass at the start of the function simplification portion of the pipeline. We rely on the fact that GVNHoist runs just after this pass of EarlyCSE to amortize the MemorySSA construction cost since GVNHoist uses MemorySSA and EarlyCSE preserves it. This is turned off by default. A follow-up change will turn it on to allow for easier reversion in case it breaks something. llvm-svn: 305146	2017-06-10 15:20:03 +00:00
Andrew Kaylor	647025f9e1	[InstSimplify] Don't constant fold or DCE calls that are marked nobuiltin Differential Revision: https://reviews.llvm.org/D33737 llvm-svn: 305132	2017-06-09 23:18:11 +00:00
Yaxun Liu	6455b0dbf3	[SROA] Fix APInt size when load/store have different address space Currently there is a bug in SROA::presplitLoadsAndStores which causes assertion in GEPOperator::accumulateConstantOffset. Basically it does not consider the situation that the pointer operand of load or store may be in a non-zero address space and its size may be different from the size of a pointer in address space 0. This patch fixes assertion when compiling Blender Cycles kernels for amdgpu backend. Diffferential Revision: https://reviews.llvm.org/D33298 llvm-svn: 305107	2017-06-09 20:46:29 +00:00
Keno Fischer	5329174cb1	[Sink] Fix predicate in legality check Summary: isSafeToSpeculativelyExecute is the wrong predicate to use here. All that checks for is whether it is safe to hoist a value due to unaligned/un-dereferencable accesses. However, not only are we doing sinking rather than hoisting, our concern is that the location we're loading from may have been modified. Instead forbid sinking any load across a critical edge. Reviewers: majnemer Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D33179 llvm-svn: 305102	2017-06-09 19:31:10 +00:00
Sanjay Patel	70db424601	[SimplifyLibCalls] fix formatting; NFC llvm-svn: 305081	2017-06-09 14:22:03 +00:00
Serguei Katkov	38414b57f9	[IndVars] Add an option to be able to disable LFTR This change adds an option disable-lftr to be able to disable Linear Function Test Replace optimization. By default option is off so current behavior is not changed. Reviewers: reames, sanjoy, wmi, andreadb, apilipenko Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33979 llvm-svn: 305055	2017-06-09 06:11:59 +00:00
George Burgess IV	a20352e13e	[LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops. If we're shrinking a binary operation, it may be the case that the new operations wraps where the old didn't. If this happens, the behavior should be well-defined. So, we can't always carry wrapping flags with us when we shrink operations. If we do, we get incorrect optimizations in cases like: void foo(const unsigned char from, unsigned char to, int n) { for (int i = 0; i < n; i++) to[i] = from[i] - 128; } which gets optimized to: void foo(const unsigned char from, unsigned char to, int n) { for (int i = 0; i < n; i++) to[i] = from[i] \| 128; } Because: - InstCombine turned `sub i32 %from.i, 128` into `add nuw nsw i32 %from.i, 128`. - LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a vector full of `i8 128`s - InstCombine took advantage of the fact that the newly-shrunken add "couldn't wrap", and changed the `add` to an `or`. InstCombine seems happy to figure out whether we can add nuw/nsw on its own, so I just decided to drop the flags. There are already a number of places in LoopVectorize where we rely on InstCombine to clean up. llvm-svn: 305053	2017-06-09 03:56:15 +00:00
David Blaikie	cb9327b02d	Inliner: Don't touch indirect calls Other comments/implications are that this isn't intended behavior (nor perserved/reimplemented in the new inliner) & complicates fixing the 'inlining' of trivially dead calls without consulting the cost function first. llvm-svn: 305052	2017-06-09 03:29:20 +00:00
Craig Topper	a420562257	[InstCombine] Pass a proper context instruction to all of the calls into InstSimplify Summary: This matches the behavior we already had for compares and makes us consistent everywhere. Reviewers: dberlin, hfinkel, spatel Reviewed By: dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33604 llvm-svn: 305049	2017-06-09 03:21:29 +00:00
Evgeniy Stepanov	d02dbf6b1c	[CFI] Remove LinkerSubsectionsViaSymbols. Since D17854 LinkerSubsectionsViaSymbols is unnecessary. It is interfering with ThinLTO implementation of CFI-ICall, where the aliases used on the !LinkerSubsectionsViaSymbols branch are needed to export jump tables to ThinLTO backends. This is the second attempt to land this change after fixing PR33316. llvm-svn: 305031	2017-06-08 23:38:22 +00:00
Craig Topper	2aa4d39f5e	[ExtractGV] Fix the doxygen comment on the constructor and the class to refer to global values instead of functions. While there fix an 80 column violation. NFC llvm-svn: 305030	2017-06-08 23:38:19 +00:00
Peter Collingbourne	e357fbd243	Write summaries for merged modules when splitting modules for ThinLTO. This is to prepare to allow for dead stripping of globals in the merged modules. Differential Revision: https://reviews.llvm.org/D33921 llvm-svn: 305027	2017-06-08 23:01:49 +00:00
Kostya Serebryany	2c2fb8896b	[sanitizer-coverage] one more flavor of coverage: -fsanitize-coverage=inline-8bit-counters. Experimental so far, not documenting yet. Reapplying revisions 304630, 304631, 304632, 304673, see PR33308 llvm-svn: 305026	2017-06-08 22:58:19 +00:00
Dehao Chen	e2a428bad7	Do not early-inline recursive calls in sample profile loader. Summary: Early-inlining of recursive call makes the code size bloat exponentially. We should not disable it. Reviewers: davidxl, dnovillo, iteratee Reviewed By: iteratee Subscribers: iteratee, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D34017 llvm-svn: 305009	2017-06-08 20:11:57 +00:00
Galina Kistanova	e128958552	Changed a comparison operator for std::stable_sort to implement strict weak ordering. This is a temporarily fix which needs additional work, as it triggers a test3 failure. test3 is commented out till then. llvm-svn: 304993	2017-06-08 17:27:40 +00:00

1 2 3 4 5 ...

18348 Commits