llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	8174281b93	Revert r333226 "[ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive." This breaks some libFuzzer tests. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer/builds/15589/steps/check-fuzzer/logs/stdio Reverting to investigate llvm-svn: 333253	2018-05-25 04:01:56 +00:00
Vedant Kumar	4872535eb9	[Debugify] Set a DI version module flag for llc compatibility Setting the "Debug Info Version" module flag makes it possible to pipe synthetic debug info into llc, which is useful for testing backends. llvm-svn: 333237	2018-05-24 23:00:23 +00:00
Vedant Kumar	b70e35686b	[Debugify] Avoid printing unnecessary square braces, NFC llvm-svn: 333236	2018-05-24 23:00:22 +00:00
Craig Topper	49f23fe349	[ValueTracking] Teach computeKnownBits that the result of an absolute value pattern that uses nsw flag is always positive. If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive. This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases. Differential Revision: https://reviews.llvm.org/D47041 llvm-svn: 333226	2018-05-24 21:22:51 +00:00
Warren Ristow	d3efa9429f	[InstCombine] Enable more reassociations using FMF 'reassoc' + 'nsz' Reassociation of math ops in some contexts (especially vector contexts) has generally only been happening when the 'fast' FMF was set. This enables reassoication when only the finer grained controls 'reassoc' and 'nsz' are set. Differential Revision: https://reviews.llvm.org/D47335 llvm-svn: 333221	2018-05-24 20:16:43 +00:00
Jun Bum Lim	dfbe6fa832	[LICM] Preserve DT and LoopInfo specifically Summary: In LICM, CFG could be changed in splitPredecessorsOfLoopExit(), which update only DT and LoopInfo. Therefore, we should preserve only DT and LoopInfo specifically, instead of all analyses that depend on the CFG (setPreservesCFG()). This change should fix PR37323. Reviewers: uabelho, davide, dberlin, Ka-Ka Reviewed By: dberlin Subscribers: mzolotukhin, bjope, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D46775 llvm-svn: 333198	2018-05-24 15:58:34 +00:00
Chad Rosier	274d72faad	[InstCombine] Combine XOR and AES instructions on ARM/ARM64. The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in the instruction. Therefore, if the AES key is zero and the AES data was previously XORed, it can be combined into a single instruction. Differential Revision: https://reviews.llvm.org/D47239 Patch by Michael Brase! llvm-svn: 333193	2018-05-24 15:26:42 +00:00
Karl-Johan Karlsson	478232d52f	[NaryReassociate] Detect deleted instr with WeakVH Summary: If NaryReassociate succeed it will, when replacing the old instruction with the new instruction, also recursively delete trivially dead instructions from the old instruction. However, if the input to the NaryReassociate pass contain dead code it is not save to recursively delete trivially deadinstructions as it might lead to deleting the newly created instruction. This patch will fix the problem by using WeakVH to detect this rare case, when the newly created instruction is dead, and it will then restart the basic block iteration from the beginning. This fixes pr37539 Reviewers: tra, meheff, grosser, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47139 llvm-svn: 333155	2018-05-24 06:09:02 +00:00
Changpeng Fang	5f9154618e	StructurizeCFG: Adjust the loop depth for a subregion to order the nodes correctly Summary: StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order. The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops. To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited before moving on to outer loop. However, we found a problem for a SubRegion which is a loop itself: --> BB1 --> BB2 --> BB3 --> In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead BB2 to be placed in the wrong order. In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth to guard the sorting. Reviewers: arsenm, jlebar Differential Revision: https://reviews.llvm.org/D46912 llvm-svn: 333111	2018-05-23 18:34:48 +00:00
Roman Lebedev	6b6c553bb8	[InstCombine] Fold unfolded masked merge pattern with variable mask! Summary: Finally fixes [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]]. Now that the backend is all done, we can finally fold it! The canonical unfolded masked merge pattern is ```(x & m) \| (y & ~m)``` There is a second, equivalent variant: ```(x \| ~m) & (y \| m)``` Only one of them (the or-of-and's i think) is canonical. And if the mask is not a constant, we should fold it to: ```((x ^ y) & M) ^ y``` https://rise4fun.com/Alive/ndQw Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: nicholas, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D46814 llvm-svn: 333106	2018-05-23 17:47:52 +00:00
Craig Topper	3b768e8602	[InstCombine] Negate ABS/NABS patterns by swapping the select operands to remove the negation Differential Revision: https://reviews.llvm.org/D47236 llvm-svn: 333101	2018-05-23 17:29:03 +00:00
Max Kazantsev	d99f3bacb4	[LoopUnswitch] Fix SCEV invalidation in unswitching Loop unswitching makes substantial changes to a loop that can also affect cached SCEV info in its outer loops as well, but it only cares to invalidate SCEV cache for the innermost loop in case of full unswitching and does not invalidate anything at all in case of trivial unswitching. As result, we may end up with incorrect data in cache. Differential Revision: https://reviews.llvm.org/D46045 Reviewed By: mzolotukhin llvm-svn: 333072	2018-05-23 10:09:53 +00:00
Piotr Padlewski	d6f7346a4b	Fix aliasing of launder.invariant.group Summary: Patch for capture tracking broke bootstrap of clang with -fstict-vtable-pointers which resulted in debbugging nightmare. It was fixed https://reviews.llvm.org/D46900 but as it turned out, there were other parts like inliner (computing of noalias metadata) that I found after bootstraping with enabled assertions. Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D47088 llvm-svn: 333070	2018-05-23 09:16:44 +00:00
David Bolvansky	cd3eb99016	[InstCombine] [NFC] Added more tests for unlocked IO transformation Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47243 llvm-svn: 333057	2018-05-23 03:01:45 +00:00
Sanjay Patel	4b96935bd7	[InstCombine] use nsw negation for abs libcalls Also, produce the canonical IR abs (s<0) to be more efficient. This is the libcall equivalent of the clang builtin change from: rL333038 Pasting from that commit message: The stdlib functions are defined in section 7.20.6.1 of the C standard with: "If the result cannot be represented, the behavior is undefined." That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would be UB/poison. llvm-svn: 333042	2018-05-22 23:29:40 +00:00
Sanjay Patel	3ef8f858da	[InstCombine] move misplaced test file and regenerate checks; NFC llvm-svn: 333039	2018-05-22 23:15:56 +00:00
David Bolvansky	88e262bcdd	Delete empty test file Differential Revision: https://reviews.llvm.org/D47230 llvm-svn: 333031	2018-05-22 21:47:08 +00:00
David Bolvansky	1f343fa0e0	[InstCombine] Remove calloc transformations Summary: Previous patch does not care if a value is changed between calloc and strlen. This needs to be removed from InstCombine and maybe moved to DSE later after some rework. Reviewers: efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47218 llvm-svn: 333022	2018-05-22 20:27:36 +00:00
Sanjay Patel	9781679f0f	[InstCombine] move/add tests for sub with bool op; NFC llvm-svn: 333012	2018-05-22 18:50:06 +00:00
Florian Hahn	a6e63f176c	[NewGVN] Fix handling of assumes This patch fixes two bugs: * test1: Previously assume(a >= 5) concluded that a == 5. That's only valid for assume(a == 5)... * test2: If operands were swapped, additional users were added to the wrong cmp operand. This resulted in an "unsettled iteration" assertion failure. Patch by Nikita Popov Differential Revision: https://reviews.llvm.org/D46974 llvm-svn: 333007	2018-05-22 17:38:22 +00:00
Sanjay Patel	dd5fb8f03f	[InstCombine] fix broken test Looks like the last line got chopped off from rL332990. llvm-svn: 332992	2018-05-22 16:14:16 +00:00
David Bolvansky	41f4b64ee1	[InstCombine] Calloc-ed strings optimizations Summary: Example cases: strlen(calloc(...)) -> 0 Reviewers: efriedma, bkramer Reviewed By: bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47059 llvm-svn: 332990	2018-05-22 15:41:23 +00:00
Karl-Johan Karlsson	11d68a619e	[LowerSwitch] Fixed faulty PHI node update Summary: When lowerswitch merge several cases into a new default block it's not updating the PHI nodes accordingly. The code that update the PHI nodes for the default edge only update the first entry and do not remove the remaining ones, to make sure the number of entries match the number of predecessors. This is easily fixed by replacing the code that update the PHI node with the already existing utility function for updating PHI nodes. Reviewers: hans, reames, arsenm Reviewed By: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D47055 llvm-svn: 332960	2018-05-22 08:46:48 +00:00
Bjorn Pettersson	fecef6be9e	[LoopVersioning] Don't modify the list that we iterate over in addPHINodes Summary: In LoopVersioning::addPHINodes we need to iterate over all users for a value "Inst", and if the user is outside of the VersionedLoop we should replace the use of "Inst" by using the value "PN" instead. Replacing the use of "Inst" for a user of "Inst" also means that Inst->users() is modified. So it is not safe to do the replace while iterating over Inst->users() as we used to do. This patch splits the task into two steps. First we iterate over Inst->users() to find all users that should be updated. Those users are saved into a local data structure on the stack. And then, in the second step, we do the actual updates. This time iterating over the local data structure. Reviewers: mzolotukhin, anemet Reviewed By: mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47134 llvm-svn: 332958	2018-05-22 08:33:02 +00:00
Stanislav Mekhanoshin	0e132dca53	[AMDGPU] Optimze old value of v_mov_b32_dpp We can eliminate old value if bound_ctrl = 1 and row_mask = bank_mask = 0xf. This is alternative implementation working with the intrinsic in InstCombine. Original review for past-ISel optimization: D46570. Differential Revision: https://reviews.llvm.org/D46596 llvm-svn: 332956	2018-05-22 08:04:33 +00:00
Matt Arsenault	1349a04ef5	AMDGPU: Make v2i16/v2f16 legal on VI This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953	2018-05-22 06:32:10 +00:00
Sanjay Patel	ec50effbd6	[InstCombine] regenerate checks; NFC llvm-svn: 332894	2018-05-21 21:09:14 +00:00
Sanjay Patel	94b1f846b2	[InstCombine] add tests for cast-of-select; NFC In all cases, we're pulling the cast above the select. That's not a good canonicalization if we're creating a select that then mismatches the operand size of its condition. llvm-svn: 332883	2018-05-21 20:23:58 +00:00
Craig Topper	f14e62c9a5	[EarlyCSE] Improve EarlyCSE of some absolute value cases. Change matchSelectPattern to return X and -X for ABS/NABS in a well defined order. Adjust EarlyCSE to account for this. Ensure the SPF result is some kind of min/max and not abs/nabs in one place in InstCombine that made me nervous. Prevously we returned the two operands of the compare part of the abs pattern. The RHS is always going to be a 0i, 1 or -1 constant. This isn't a very meaningful thing to return for any one. There's also some freedom in the abs pattern as to what happens when the value is equal to 0. This freedom led to early cse failing to match when different constants were used in otherwise equivalent operations. By returning the input and its negation in a defined order we can ensure an exact match. This also makes sure both patterns use the exact same subtract instruction for the negation. I believe CSE should evebntually make this happen and properly merge the nsw/nuw flags. But I'm not familiar with CSE and what order it does things in so it seemed like it might be good to really enforce that they were the same. Differential Revision: https://reviews.llvm.org/D47037 llvm-svn: 332865	2018-05-21 18:42:42 +00:00
Diego Caballero	168d04d544	[VPlan] Reland r332654 and silence unused func warning r332654 was reverted due to an unused function warning in release build. This commit includes the same code with the warning silenced. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332860	2018-05-21 18:14:23 +00:00
Alexey Bataev	7c9ad0db3d	[InstCombine] Fix PR37526: MinMax patterns produce an infinite loop. Summary: This patch fixes PR37526 by simplifying the newly generated LoadInst instructions. If the pointer address is a bitcast from the pointer to the NewType, we can just remove this extra bitcast instead of creating the new one. This fixes the PR37526 + may speed up the whole compilation process. Reviewers: spatel, RKSimon, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47144 llvm-svn: 332855	2018-05-21 17:46:34 +00:00
Nico Weber	e4a12cfa2f	revert r332610, it breaks cfi, see D46326 llvm-svn: 332838	2018-05-21 11:44:39 +00:00
David Green	8ceab61c75	[CVP] Require DomTree for new Pass Manager We were previously using a DT in CVP through SimplifyQuery, but not requiring it in the new pass manager. Hence it would crash if DT was not already available. This now gets DT directly and plumbs it through to where it is used (instead of using it through SQ). llvm-svn: 332836	2018-05-21 11:06:28 +00:00
Craig Topper	e4c045b7df	[X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. Someday maybe we'll use selects for all intrinsics. llvm-svn: 332824	2018-05-20 23:34:04 +00:00
Sanjay Patel	a003c728a5	[InstCombine] choose 1 form of abs and nabs as canonical We already do this for min/max (see the blob above the diff), so we should do the same for abs/nabs. A sign-bit check (<s 0) is used as a predicate for other IR transforms and it's likely the best for codegen. This might solve the motivating cases for D47037 and D47041, but I think those patches still make sense. We can't guarantee this canonicalization if the icmp has more than one use. Differential Revision: https://reviews.llvm.org/D47076 llvm-svn: 332819	2018-05-20 14:23:23 +00:00
Max Kazantsev	c0b268f90c	[IRCE] Fix miscompile with range checks against negative values In the patch rL329547, we have lifted the over-restrictive limitation on collected range checks, allowing to work with range checks with the end of their range not being provably non-negative. However it appeared that the non-negativity of this value was assumed in the utility function `ClampedSubtract`. In particular, its reasoning is based on the fact that `0 <= SINT_MAX - X`, which is not true if `X` is negative. The function `ClampedSubtract` is only called twice, once with `X = 0` (which is OK) and the second time with `X = IRC.getEnd()`, where we may now see the problem if the end is actually a negative value. In this case, we may sometimes miscompile. This patch is the conservative fix of the miscompile problem. Rather than rejecting non-provably non-negative `getEnd()` values, we will check it for non-negativity in runtime. For this, we use function `smax(smin(X, 0), -1) + 1` that is equal to `1` if `X` is non-negative and is equal to 0 if `X` is negative. If we multiply `Begin, End` of safe iteration space by this function calculated for `X = IRC.getEnd()`, we will get the original `[Begin, End)` if `IRC.getEnd()` was non-negative (and, thus, `ClampedSubtract` worked correctly) and the empty range `[0, 0)` in case if ` IRC.getEnd()` was negative. So we in fact prohibit execution of the main loop if at least one of range checks was made against a negative value (and we figured it out in runtime). It is still better than what we have before (non-negativity had to be proved in compile time) and prevents us from miscompile, however it is sometiles too restrictive for unsigned range checks against a negative value (which in fact can be eliminated). Once we re-implement `ClampedSubtract` in a way that it handles negative `X` correctly, this limitation can be lifted, too. Differential Revision: https://reviews.llvm.org/D46860 Reviewed By: samparker llvm-svn: 332809	2018-05-19 13:06:37 +00:00
Benjamin Kramer	a76b64ff80	[MergeICmps] Don't crash when memcmp is not available Fixes clang crashing with -fno-builtin, PR37527. llvm-svn: 332808	2018-05-19 12:51:59 +00:00
Yaxun Liu	ea988f1fd9	Fix evaluator for non-zero alloca addr space The evaluator goes through BB and creates global vars as temporary values to evaluate results of LLVM instructions. It creates undef for alloca, however it assumes alloca in addr space 0. If the next instruction is addrspace cast to 0, then we get an invalid cast instruction. This patch let the temp global var have an address space matching alloca addr space, so that the valuation can be done. Differential Revision: https://reviews.llvm.org/D47081 llvm-svn: 332794	2018-05-19 02:58:16 +00:00
Piotr Padlewski	ce358262eb	Dissallow non-empty metadata for invariant.group Summary: This feature is not needed, but it might be usefull in the future to use metadata to mark what which function should support it (and strip it when not). Reviewers: rsmith, sanjoy, amharc, kuhar Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45419 llvm-svn: 332787	2018-05-18 23:53:46 +00:00
Piotr Padlewski	a26a08cb52	Constant fold launder of null and undef Summary: This might be useful because clang will add some barriers for pointer comparisons. Reviewers: majnemer, dberlin, hfinkel, nlewycky, davide, rsmith, amharc, kuhar Subscribers: davide, amharc, llvm-commits Differential Revision: https://reviews.llvm.org/D32423 llvm-svn: 332786	2018-05-18 23:52:57 +00:00
Amara Emerson	08099c7edd	Delete a test that was missed in the revert r332747. r332747 originally reverted r332654 which added this test. llvm-svn: 332755	2018-05-18 19:21:40 +00:00
Sanjay Patel	56e09c6928	[InstCombine] add tests for lack of abs/nabs canonicalization; NFC llvm-svn: 332726	2018-05-18 15:26:38 +00:00
Sanjay Patel	fa3e4601c6	[InstCombine] regenerate checks; NFC There were a combination of auto-generated styles in use here because the scripts have evolved. llvm-svn: 332725	2018-05-18 15:22:19 +00:00
Alexander Ivchenko	5c54742da4	[X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part) This patch aims to match the changes introduced in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The IBT feature definition is removed, with the IBT instructions being freely available on all X86 targets. The shadow stack instructions are also being made freely available, and the use of all these CET instructions is controlled by the module flags derived from the -fcf-protection clang option. The hasSHSTK option remains since clang uses it to determine availability of shadow stack instruction intrinsics, but it is no longer directly used. Comes with a clang patch (D46881). Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46882 llvm-svn: 332705	2018-05-18 11:58:25 +00:00
David Stenberg	0af67e5b65	[SimplifyCFG] Fix a debug invariant bug in FoldBranchToCommonDest() Summary: Fix a case where FoldBranchToCommonDest() would bail out from doing CSE when encountering a debug intrinsic. Handle that by skipping past the debug intrinsics. Also, as a minor refactoring, rename checkCSEInPredecessor() to tryCSEWithPredecessor() to make it a bit more clear that the function may remove instructions. Reviewers: fhahn, craig.topper, dblaikie, xbolva00 Reviewed By: fhahn, xbolva00 Subscribers: vsk, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D46635 llvm-svn: 332698	2018-05-18 08:52:15 +00:00
Serguei Katkov	5095883fe9	[LICM] Extend the MustExecute scope CanProveNotTakenFirstIteration utility does not handle the case when condition of the branch is a constant. Add its handling. Reviewers: reames, anna, mkazantsev Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46996 llvm-svn: 332695	2018-05-18 04:56:28 +00:00
Diego Caballero	f58ad3129c	[LV][VPlan] Build plain CFG with simple VPInstructions for outer loops. Patch #3 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). Expected to be NFC for the current inner loop vectorization path. It introduces the basic algorithm to build the VPlan plain CFG (single-level CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization path using VPInstructions. It includes: - VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now). - VPlanVerifier: Main class with utilities to check the consistency of a H-CFG. - VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan. Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332654	2018-05-17 19:24:47 +00:00
Anastasis Grammenos	d6c6678766	[Debugify] Print the output to stderr Currently debugify prints it's output to stdout, with this patch all the output generated goes to stderr. This change lets us use debugify without taking away the ability to pipe the output to other llvm tools. llvm-svn: 332642	2018-05-17 18:19:58 +00:00
Craig Topper	bd332588bd	[InstCombine] Propagate the nsw/nuw flags from the add in the 'shifty' abs pattern to the sub in the select version. According to alive this is valid. I'm hoping to use this to make an assumption that the sign bit is zero after this sequence. The only way it wouldn't be is if the input was INT__MIN, but by preserving the flags we can make doing this to INT_MIN UB. The nuw flags is weird because it creates such a contradiction that the original number would have to be positive meaning we could remove the select entirely, but we don't get that far. Differential Revision: https://reviews.llvm.org/D46988 llvm-svn: 332623	2018-05-17 16:29:52 +00:00
Dmitry Mikulin	3c6b4e35bd	In thin and full LTO + CFI, direct function calls may go through jump table entries to reach the target. Since these calls don't require type checks, we can short-circuit them to their real targets. Differential Revision: https://reviews.llvm.org/D46326 llvm-svn: 332610	2018-05-17 14:29:07 +00:00
Mikael Holmen	2ca16899ec	Require DominatorTree when requiring/preserving LoopInfo in the old pass manager Summary: Require DominatorTree when requiring/preserving LoopInfo in the old pass manager BreakCriticalEdges tries to keep LoopInfo and DominatorTree updated if they exist. However, since commit r321653 and r321805, to update LoopInfo we must have a DominatorTree, or we will hit an assert. To fix this we now make a couple of passes that only required/preserved LoopInfo also require DominatorTree. This solves PR37334. Reviewers: eli.friedman, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D46829 llvm-svn: 332583	2018-05-17 09:05:40 +00:00
Martin Storsjo	c10788728b	[Analysis] Only use _unlocked stdio functions on linux The existing comment said that the functions were available only on GNU/Linux (and on certain Android versions), but only checked T.isGNUEnvironment() which also is true on MinGW (for arch-windows-gnu triplets), which doesn't have such functions. Existing checks in the initialize function in TargetLibraryInfo.cpp also use only T.isOSLinux() to check for glibc features. This fixes use of stdio on MinGW. Differential Revision: https://reviews.llvm.org/D47002 llvm-svn: 332581	2018-05-17 08:16:08 +00:00
Bjorn Pettersson	81a76a388a	[SROA] Handle PHI with multiple duplicate predecessors Summary: The verifier accepts PHI nodes with multiple entries for the same basic block, as long as the value is the same. As seen in PR37203, SROA did not handle such PHI nodes properly when speculating loads over the PHI, since it inserted multiple loads in the predecessor block and changed the PHI into having multiple entries for the same basic block, but with different values. This patch teaches SROA to reuse the same speculated load for each PHI duplicate entry in such situations. Resolves: https://bugs.llvm.org/show_bug.cgi?id=37203 Reviewers: uabelho, chandlerc, hfinkel, bkramer, efriedma Reviewed By: efriedma Subscribers: dberlin, efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D46426 llvm-svn: 332577	2018-05-17 07:21:41 +00:00
Hiroshi Inoue	f5c0e6c285	[SROA] pr37267: fix assertion failure in integer widening The current integer widening does not support rewriting partial split slices in rewriteIntegerStore (and rewriteIntegerLoad). This patch adds explicit checks for this case in isIntegerWideningViableForSlice. Before r322533, splitting is allowed only for the whole-alloca slice and hence the above case is implicitly rejected by another check `if (DL.getTypeStoreSize(ValueTy) > Size)` because whole-alloca slice is larger than the partition. Differential Revision: https://reviews.llvm.org/D46750 llvm-svn: 332575	2018-05-17 06:32:17 +00:00
Stanislav Mekhanoshin	595fdcf43b	[AMDGPU] Move lsr test. NFC. llvm-svn: 332562	2018-05-17 01:30:51 +00:00
Benjamin Kramer	8ac15bf4dc	[InstCombine] Fix the signature of fgets_unlocked. It returns a pointer, not an int. This miscompiles all code that uses the return value of fgets. llvm-svn: 332531	2018-05-16 21:45:39 +00:00
Sanjay Patel	2eb3512090	[InstCombine] allow more binop (shuffle X), C transforms The canonicalization was restricted to shuffle masks with a 1-to-1 mapping to the constant vector, but that disqualifies the common splat pattern. This is part of solving PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 llvm-svn: 332479	2018-05-16 15:15:22 +00:00
David Bolvansky	ca22d427b9	[SimplifyLibcalls] Replace locked IO with unlocked IO Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed, Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja Reviewed By: rja Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D45736 llvm-svn: 332452	2018-05-16 11:39:52 +00:00
Shoaib Meenai	074728a2a9	[ObjCARC] Prevent code motion into a catchswitch A catchswitch must be the only non-phi instruction in its basic block; attempting to move a retain or release into a catchswitch basic block will result in invalid IR. Explicitly mark a CFG hazard in this case to prevent the code motion. Differential Revision: https://reviews.llvm.org/D46482 llvm-svn: 332430	2018-05-16 04:52:18 +00:00
Evgeny Stupachenko	bff9302c3d	Fix LSR compile time hang. Summary: Limit number of reassociations in GenerateReassociationsImpl. Reviewers: qcolombet, mkazantsev Differential Revision: https://reviews.llvm.org/D46039 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 332426	2018-05-16 02:48:50 +00:00
Anastasis Grammenos	66f13e0ba2	[Debugify] Fix test failing after r332416 I missed a test that needed an update. Failing bot: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/30071 llvm-svn: 332418	2018-05-16 00:11:52 +00:00
Sanjay Patel	919882638e	[InstCombine] fix binop (shuffle X), C --> shuffle (binop X, C') to check uses llvm-svn: 332407	2018-05-15 22:00:37 +00:00
Marek Olsak	3c5fd145c5	StructurizeCFG: fix inverting conditions Author: Samuel Pitoiset Without this patch, it appears to me that we are selecting the wrong operand when inverting conditions. In the attached test, it will select %tmp3 instead of %tmp4. To fix it, just use 'A' as everywhere. This fixes a regression introduced by "[PatternMatch] define m_Not using m_Xor and cst_pred_ty" https://reviews.llvm.org/D46351 llvm-svn: 332403	2018-05-15 21:41:55 +00:00
Sanjay Patel	cc0fbdb6e4	[InstCombine] add more tests for binop-shuffle; NFC The splat pattern is part of PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 llvm-svn: 332393	2018-05-15 20:34:09 +00:00
Sanjay Patel	3c35290c58	[InstCombine] fix binop-of-shuffles to check uses llvm-svn: 332375	2018-05-15 17:14:23 +00:00
Sanjay Patel	c5b4401cf7	[InstCombine] add multi-use shuffle tests and regenerate checks; NFC llvm-svn: 332373	2018-05-15 16:47:47 +00:00
whitequark	8f0ab258bd	[MergeFunctions] Fix merging of small weak functions When two interposable functions are merged, we cannot replace uses and have to emit calls to a common internal function. However, writeThunk() will not actually emit a thunk if the function is too small. This leaves us in a broken state where mergeTwoFunctions already rewired the functions, but writeThunk doesn't do anything. This patch changes the implementation so that: * writeThunk() does just that. * The direct replacement of calls is moved into mergeTwoFunctions() into the non-interposable case only. * isThunkProfitable() is extracted and will be called for the non-iterposable case always, and in the interposable case only if uses are still left after replacement. This issue has been introduced in https://reviews.llvm.org/D34806, where the code for checking thunk profitability has been moved. Differential Revision: https://reviews.llvm.org/D46804 Reviewed By: whitequark llvm-svn: 332342	2018-05-15 11:31:07 +00:00
Vedant Kumar	595ba1d548	[Debugify] Add -debugify-each for testing each pass in a pipeline This adds a -debugify-each mode to opt which, when enabled, wraps each {Module,Function}Pass in a pipeline with logic to add, check, and strip synthetic debug info for testing purposes. This mode can be used to test complex pipelines for debug info bugs, or to collect statistics about the number of debug values & locations lost throughout various stages of a pipeline. Patch by Son Tuan Vu! Differential Revision: https://reviews.llvm.org/D46525 llvm-svn: 332312	2018-05-15 00:29:27 +00:00
Keno Fischer	de577af8c0	[InstCombine] fix crash due to ignored addrspacecast Summary: Part of the InstCombine code for simplifying GEPs looks through addrspacecasts. However, this was done by updating a variable also used by the next transformation, for marking GEPs as inbounds. This led to replacing a GEP with a similar instruction in a different addrspace, which caused an assertion failure in RAUW. This caused julia issue https://github.com/JuliaLang/julia/issues/27055 Patch by Jeff Bezanson <jeff@juliacomputing.com> Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D46722 llvm-svn: 332302	2018-05-14 22:05:01 +00:00
Sanjay Patel	bf55e6dee1	[AggressiveInstCombine] avoid crashing on unsimplified code (PR37446) This bug: https://bugs.llvm.org/show_bug.cgi?id=37446 ...raises another question: why do we run aggressive-instcombine before regular instcombine? llvm-svn: 332243	2018-05-14 13:43:32 +00:00
Craig Topper	911025b1cd	[X86] Extend instcombine folds for pclmuldq intrinsics to the 256 and 512 bit version. llvm-svn: 332202	2018-05-13 21:56:32 +00:00
Craig Topper	f170b85d40	[X86] Add missing test for the InstCombines of pclmulqdq. Apparently this test was lost when r293151 was committed. It was present in the review, but not the commit. llvm-svn: 332199	2018-05-13 18:26:06 +00:00
Sergey Dmitriev	69c9cd277d	[CodeExtractor] Allow extracting blocks with exception handling This is a CodeExtractor improvement which adds support for extracting blocks which have exception handling constructs if that is legal to do. CodeExtractor performs validation checks to ensure that extraction is legal when it finds invoke instructions or EH pads (landingpad, catchswitch, or cleanuppad) in blocks to be extracted. I have also added an option to allow extraction of blocks with alloca instructions, but no validation is done for allocas. CodeExtractor caller has to validate it himself before allowing alloca instructions to be extracted. By default allocas are still not allowed in extraction blocks. Differential Revision: https://reviews.llvm.org/D45904 llvm-svn: 332151	2018-05-11 22:49:49 +00:00
Craig Topper	a17d627abb	[X86] Remove and autoupgrade a bunch of FMA instrinsics that are no longer used by clang. llvm-svn: 332146	2018-05-11 21:59:34 +00:00
Artem Belevich	c2cd5d5ce0	[Split GEP] handle trunc() in separate-const-offset-from-gep pass. Let separate-const-offset-from-gep pass handle trunc() when it calculates constant offset relative to base. The pass itself may insert trunc() instructions when it canonicalises array indices to pointer-size integers and needs to handle trunc() in order to evaluate the offset. Differential Revision: https://reviews.llvm.org/D46732 llvm-svn: 332142	2018-05-11 21:13:19 +00:00
Daniel Neilson	f6651d4d94	[InstCombine] Handle atomic memset in the same way as regular memset Summary: This change adds handling of the atomic memset intrinsic to the code path that simplifies the regular memset. In practice this means that we will now also expand a small constant-length atomic memset into a single unordered atomic store. Reviewers: apilipenko, skatkov, mkazantsev, anna, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D46660 llvm-svn: 332132	2018-05-11 20:04:50 +00:00
David Bolvansky	cd93c4ef1a	[InstCombine] snprintf optimizations Reviewers: spatel, efriedma, majnemer, rja, bkramer Reviewed By: rja, bkramer Subscribers: mstorsjo, rja, llvm-commits Differential Revision: https://reviews.llvm.org/D46285 llvm-svn: 332110	2018-05-11 17:50:49 +00:00
Martin Storsjo	0d7c37756b	[Analysis] Validate the return type of s(n)printf like libcalls If the sprintf function is static (as on mingw-w64, where many stdio functions are static inline wrappers), earlier optimization passes could optimize out the return value altogether, and make it void, which could break optimizations of this libcall that touch the return value. This fixes the issue discussed in PR37408 for the sprintf function. Differential Revision: https://reviews.llvm.org/D46752 llvm-svn: 332106	2018-05-11 16:53:56 +00:00
Davide Italiano	6e1f7bf316	[Reassociate] Prevent infinite loops when processing PHIs. Phi nodes can reside in live blocks but one of their incoming arguments can come from a dead block. Dead blocks and reassociate don't play nice together. In fact, reassociate performs an RPO as a first step to avoid processing dead blocks. The reason why Reassociate might not fixpoint when examining dead blocks is that the following: %xor0 = xor i16 %xor1, undef %xor1 = xor i16 %xor0, undef is perfectly valid LLVM IR (if it appears in a dead block), so the worklist algorithm keeps pushing the two instructions for reexamination. Note that this is not Reassociate fault, at least not entirely. It's llvm that has a weird definition of dominance. Fixes PR37390. llvm-svn: 332100	2018-05-11 15:45:36 +00:00
Daniel Neilson	8f30ec65b0	[InstCombine] Unify handling of atomic memtransfer with non-atomic memtransfer Summary: This change reworks the handling of atomic memcpy within the instcombine pass. Previously, a constant length atomic memcpy would be lowered into loads & stores as long as no more than 16 load/store pairs are created. This is quite different from the lowering done for a non-atomic memcpy; which only ever lowers into a single load/store pair of no more than 8 bytes. Larger constant-sized memcpy calls are expanded to load/stores in later passes, such as SelectionDAG lowering. In this change the behaviour for atomic memcpy is unified with non-atomic memcpy; atomic memcpy is now treated in the same was as non-atomic memcpy has always been. We leave it to later passes to lower longer-length atomic memcpy calls. Due to the structure of the pass's handling of memtransfer intrinsics, this change also gives us handling of atomic memmove that we did not previously have. Reviewers: apilipenko, skatkov, mkazantsev, anna, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D46658 llvm-svn: 332093	2018-05-11 14:30:02 +00:00
Brian Gesiak	c651113439	[Coroutines] PR34897: Fix incorrect elisions Summary: https://bugs.llvm.org/show_bug.cgi?id=34897 demonstrates an incorrect coroutine frame allocation elision in the coro-elide pass. The elision is performed on the basis that the SSA variables from all llvm.coro.begin are directly referenced in subsequent llvm.coro.destroy instructions. However, this ignores the fact that the function may exit through paths that do not run these destroy instructions. In the sample program from PR34897, for example, the llvm.coro.destroy instruction is only executed in exception handling code. When the coroutine function exits normally, llvm.coro.destroy is not called. Eliding the allocation in this case causes a subsequent reference to the coroutine handle from outside of the function to access freed memory. To fix the issue, when finding an llvm.coro.destroy for each llvm.coro.begin, only consider llvm.coro.destroy that are executed along non-exceptional paths. Test Plan: 1. Download the sample program from https://bugs.llvm.org/show_bug.cgi?id=34897, compile it with `clang++ -fcoroutines-ts -stdlib=libc++ -std=c++1z -O2`, and run it. It should print `"run1\ncheck1\nrun2\ncheck2"` and then exit successfully. 2. Compile https://godbolt.org/g/mCKfnr and confirm it is still optimized to a single instruction, 'return 1190'. 3. `check-llvm` Reviewers: rsmith, GorNishanov, eric_niebler Reviewed By: GorNishanov Subscribers: andrewrk, lewissbaker, EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D43242 llvm-svn: 332077	2018-05-11 03:12:28 +00:00
Craig Topper	4b026e5ebd	[InstCombine] Add tests for cases where we don't recognize type promoted rotate idioms. These rotates take the form (x << (n & mask)) \| (x >> (-n & mask)) where mask is bitwidth - 1. If x has been promoted to a wider type than its original bit width due to type promotion we fail to narrower it and therefore don't recognize it as a rotate. llvm-svn: 332068	2018-05-11 00:46:09 +00:00
Wei Mi	0c2f6be662	[SampleFDO] Don't treat warm callsite with inline instance in the profile as cold We found current sampleFDO had a performance issue when triaging a regression. For a callsite with inline instance in the profile, even if hot callsite inliner cannot inline it, it may still execute enough times and should not be treated as cold in regular inliner later. However, currently if such callsite is not inlined by hot callsite inliner, and the BB where the callsite locates doesn't get samples from other instructions inside of it, the callsite will have no profile metadata annotated. In regular inliner cost analysis, if the callsite has no profile annotated and its caller has profile information, it will be treated as cold. The fix changes the isCallsiteHot check and chooses to compare CallsiteTotalSamples with hot cutoff value computed by ProfileSummaryInfo. Differential Revision: https://reviews.llvm.org/D45377 llvm-svn: 332058	2018-05-10 23:02:27 +00:00
Martin Storsjo	86e6742c17	Revert "[InstCombine] snprintf optimizations" This reverts commit SVN r331889, which could trigger failed assertions for cases where the snprintf function is declared with a vaguely differing signature (e.g. being defined as static inline), see PR37408. llvm-svn: 332043	2018-05-10 21:23:36 +00:00
Sanjay Patel	c7bb14301a	[InstCombine] add folds for minnum(-a, -b) --> -maxnum(a, b) This is similar to what we do for integer min/max with 'not' ops (rL321882). This should fix: https://bugs.llvm.org/show_bug.cgi?id=37404 https://bugs.llvm.org/show_bug.cgi?id=37405 llvm-svn: 332031	2018-05-10 20:03:13 +00:00
Sanjay Patel	b5322e385e	[InstCombine] add minnum/maxnum tests (PR37404, PR37405); NFC llvm-svn: 332025	2018-05-10 19:21:08 +00:00
Haicheng Wu	0aae2bc260	[CGP] Split large data structres to sink more GEPs Accessing the members of a large data structures needs a lot of GEPs which usually have large offsets due to the size of the underlying data structure. If the offsets are too large to fit into the r+i addressing mode, these GEPs cannot be sunk to their users' blocks and many extra registers are needed then to carry the values of these GEPs. This patch tries to split a large data struct starting from %base like the following. Before: BB0: %base = BB1: %gep0 = gep %base, off0 %gep1 = gep %base, off1 %gep2 = gep %base, off2 BB2: %load1 = load %gep0 %load2 = load %gep1 %load3 = load %gep2 After: BB0: %base = %new_base = gep %base, off0 BB1: %new_gep0 = %new_base %new_gep1 = gep %new_base, off1 - off0 %new_gep2 = gep %new_base, off2 - off0 BB2: %load1 = load i32, i32* %new_gep0 %load2 = load i32, i32* %new_gep1 %load3 = load i32, i32* %new_gep2 In the above example, the struct is split into two parts. The first part still starts from %base and the second part starts from %new_base. After the splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to BB2 and folded into their users. The algorithm to split data structure is simple and very similar to the work of merging SExts. First, it collects GEPs that have large offsets when iterating the blocks. Second, it splits the underlying data structures and updates the collected GEPs to use smaller offsets. Differential Revision: https://reviews.llvm.org/D42759 llvm-svn: 332015	2018-05-10 18:27:36 +00:00
Sanjay Patel	dc4bb3fd36	[InstCombine] regenerate full checks; NFC llvm-svn: 331998	2018-05-10 17:05:38 +00:00
Daniel Neilson	71fa1b904a	[DSE] Teach the pass about partial overwrite of atomic memory intrinsics Summary: This change teaches DSE that the atomic memory intrinsics can be overwriten partially in the same way as the non-atomic forms. Specifically, that the atomic memcpy & memset can be shortened at the end and that the atomic memset can be shortened at the beginning, if they partially overwritten by later stores. Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith, spatel, filcab, sanjoy Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45584 llvm-svn: 331991	2018-05-10 15:12:49 +00:00
whitequark	68403564df	[PR37339] Fix assertion in FunctionComparator::cmpInlineAsm Fixes bug https://bugs.llvm.org/show_bug.cgi?id=37339. InlineAsm is only uniqued if the FunctionTypes are exactly the same, while cmpTypes() for example considers all pointer types in the default address space to be the same. For this reason the end of cmpInlineAsm() can be reached. This patch replaces the unreachable assertion with a check that the function types are not identical. Differential Revision: https://reviews.llvm.org/D46495 Reviewers: jfb llvm-svn: 331990	2018-05-10 15:05:47 +00:00
Sanjay Patel	6f2cf73b37	[PhaseOrdering] remove stale comments; NFC Forgot to update this with rL331937. llvm-svn: 331939	2018-05-09 23:10:46 +00:00
Sanjay Patel	ac3951a735	[AggressiveInstCombine] convert a chain of 'and-shift' bits into masked compare This is a follow-up to D45986. As suggested there, we should match the "all-bits-set" pattern in addition to "any-bits-set". This was a little more complicated than I thought it would be initially because the "and 1" instruction can be anywhere in the chain. Hopefully, the code comments make that logic understandable, but if you see a way to simplify or improve that, it's most appreciated. This transforms patterns that emerge from bitfield tests as seen in PR37098: https://bugs.llvm.org/show_bug.cgi?id=37098 I think it would also help reduce the large test from: D46336 D46595 but we need something to reassociate that case to the forms we're expecting here first. Differential Revision: https://reviews.llvm.org/D46649 llvm-svn: 331937	2018-05-09 23:08:15 +00:00
Philip Reames	79e917d117	[InstCombine] Widen guards with conditions between The previous handling for guard widening in InstCombine was extremely restrictive. In particular, it didn't handle the common case where we had two guards separated by a single icmp. Handle this by scanning through a small fixed window of instructions to find the next guard if needed. Differential Revision: https://reviews.llvm.org/D46203 llvm-svn: 331935	2018-05-09 22:56:32 +00:00
Benjamin Kramer	0d2fc1a501	[InstCombine] Teach SimplifyDemandedBits that udiv doesn't demand low dividend bits that are zero in the divisor This is safe as long as the udiv is not exact. The pattern is not common in C++ code, but comes up all the time in code generated by XLA's GPU backend. Differential Revision: https://reviews.llvm.org/D46647 llvm-svn: 331933	2018-05-09 22:27:34 +00:00
Farhana Aleen	e24f3ff8de	[AMDGPU] Support horizontal vectorization of min/max. Author: FarhanaAleen Reviewed By: rampitec Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D46604 llvm-svn: 331920	2018-05-09 21:18:34 +00:00
David Bolvansky	9b5e6e8288	[InstCombine] snprintf optimizations Reviewers: spatel, efriedma, majnemer, rja, bkramer Reviewed By: rja, bkramer Subscribers: rja, llvm-commits Differential Revision: https://reviews.llvm.org/D46285 llvm-svn: 331889	2018-05-09 16:09:31 +00:00
Karl-Johan Karlsson	b27dd33c58	[LV] Add lit testcase for bitcast problem. NFC llvm-svn: 331878	2018-05-09 13:34:57 +00:00
Benjamin Kramer	ccb0fbe9a0	Revert "[InstCombine] snprintf optimizations" This reverts commit r331849. It miscompiles snprintf(buf, sizeof(buf), "%s", "any constant string); into memcpy(buf, "%s", sizeof("any constant string")); llvm-svn: 331866	2018-05-09 11:38:57 +00:00
Bjorn Pettersson	9f953cdd7c	[MergedLoadStoreMotion] Fix a debug invariant bug in mergeStores Summary: MergedLoadStoreMotion::mergeStores is using some heuristics to limit the amount of stores that it tries to sink (see MagicCompileTimeControl in MergedLoadStoreMotion.cpp). The heuristic involves counting the number of instructions in one of the basic blocks that is part of the transformation. We now ignore dbg intrinsics when counting instruction for the MagicCompileTimeControl heuristic. This to make sure that the amount of stores that are sunk doesn't depend on the amount of debug information (if -g is used or not). Reviewers: Gerolf, davide, majnemer Reviewed By: davide Subscribers: dberlin, bjope, aprantl, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D46600 llvm-svn: 331852	2018-05-09 06:52:12 +00:00
David Bolvansky	44a37f04b2	[InstCombine] snprintf optimizations Reviewers: spatel, efriedma, majnemer, rja Reviewed By: rja Subscribers: rja, llvm-commits Differential Revision: https://reviews.llvm.org/D46285 llvm-svn: 331849	2018-05-09 06:34:20 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Heejin Ahn	bf7716952a	Support a funclet operand bundle in LowerInvoke Summary: The current LowerInvoke pass cannot handle invoke instructions with a funclet bundle operand. The order of operands for an invoke instruction is {call arguments, callee, funclet operand (if any), normal dest, unwind dest}. The current code assumes there is no funclet operand and incorrectly includes a funclet operand into call arguments. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46242 llvm-svn: 331832	2018-05-09 00:53:50 +00:00
Davide Italiano	48283ba3a1	[SimplifyCFG] Fix a crash when folding PHIs. We enter MergeBlockIntoPredecessor with a block looking like this: for.inc.us-lcssa: ; preds = %cond.end %k.1.lcssa.ph = phi i32 [ %conv15, %cond.end ] %t.3.lcssa.ph = phi i32 [ %k.1.lcssa.ph, %cond.end ] br label %for.inc, !dbg !66 [note the first arg of the PHI being a PHI]. FoldSingleEntryPHINodes gets rid of both PHIs (calling, eraseFromParent). But right before we call the function, we push into IncomingValues the only argument of the PHIs, and shortly after we try to iterate over something which has been invalidated before :( The fix its not trying to remove PHIs which have an incoming value coming from the same BB we're looking at. Fixes PR37300 and rdar://problem/39910460 Differential Revision: https://reviews.llvm.org/D46568 llvm-svn: 331824	2018-05-08 23:28:15 +00:00
Hideki Saito	d722d61402	[LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is single basic block Summary: Broadcast code generation emitted instructions in pre-header, while the instruction they are dependent on in the vector loop body. This resulted in an IL verification error ---- value used before defined. Reviewers: rengolin, fhahn, hfinkel Reviewed By: rengolin, fhahn Subscribers: dcaballe, Ka-Ka, llvm-commits Differential Revision: https://reviews.llvm.org/D46302 llvm-svn: 331799	2018-05-08 18:57:34 +00:00
Guozhi Wei	1aea95a9ea	[CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions. Differential Revision: https://reviews.llvm.org/D45537 llvm-svn: 331783	2018-05-08 17:58:32 +00:00
Bjorn Pettersson	51cebc98f3	[LCSSA] Do not remove used PHI nodes in formLCSSAForInstructions Summary: In formLCSSAForInstructions we speculatively add new PHI nodes, that sometimes ends up without having any uses. It has been discovered that sometimes an added PHI node can appear as being unused in one iteration of the Worklist, although it can end up being used by a PHI node added in a later iteration. We now check, a second time, that the PHI node still is unused before we remove it. This avoids an assert about "Trying to remove a phi with uses." for the added test case. Reviewers: davide, mzolotukhin, mattd, dberlin Reviewed By: mzolotukhin, dberlin Subscribers: dberlin, mzolotukhin, davide, bjope, uabelho, llvm-commits Differential Revision: https://reviews.llvm.org/D46422 llvm-svn: 331741	2018-05-08 06:59:47 +00:00
Teresa Johnson	59da890c96	[NewPM] Emit inliner NoDefinition missed optimization remark Summary: Makes this consistent with the old PM. Reviewers: eraman Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D46526 llvm-svn: 331709	2018-05-08 01:45:46 +00:00
Dmitry Mikulin	738bac77c1	Remove explicit setting of the CFI jumptable section name, it does not appear to be needed: jump table sections are created with .cfi.jumptable suffix. With this change each jump table is placed in a separate section, which allows the linker to re-order them. Differential Revision: https://reviews.llvm.org/D46537 llvm-svn: 331680	2018-05-07 21:30:15 +00:00
Roman Lebedev	ffec58bce6	[InstCombine][NFC] Add tests for one more masked merge pattern. This pattern came up in D46494. I'm pretty sure we want to canonicalize it from (x \| ~m) & (y & m) to (x & m) \| (y & ~m) https://rise4fun.com/Alive/TEM llvm-svn: 331625	2018-05-07 09:42:45 +00:00
Piotr Padlewski	e9832dfdf3	[CaptureTracking] Handle capturing of launder.invariant.group Summary: launder.invariant.group has the same rules of capturing as bitcast, gep, etc - the original value is not captured if the returned pointer is not captured. With this patch, we mark 40% more functions as noalias when compiling with -fstrict-vtable-pointers; 1078 vs 1778 (39.37%) Reviewers: sanjoy, davide, nlewycky, majnemer, mehdi_amini Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D32673 llvm-svn: 331587	2018-05-05 10:23:27 +00:00
Philip Reames	5b39acd111	[LICM] Compute a must execute property for the prefix of the header as we go Computing this property within the existing walk ensures that the cost is linear with the size of the block. If we did this from within isGuaranteedToExecute, it would be quadratic without some very fancy caching. This allows us to reliably catch a hoistable instruction within a header which may throw at some point after our hoistable instruction. It doesn't do anything for non-header cases, but given how common single block loops are, this seems very worthwhile. llvm-svn: 331557	2018-05-04 21:35:00 +00:00
Shoaib Meenai	57fadab1cb	[ObjCARC] Account for catchswitch in bitcast insertion A catchswitch is both a pad and a terminator, meaning it must be the only non-phi instruction in its basic block. When we're inserting a bitcast in the incoming basic block for a phi, if that incoming block is a catchswitch, we should go up the dominator tree to find a valid insertion point rather than attempting to insert before the catchswitch (which would result in invalid IR). Differential Revision: https://reviews.llvm.org/D46412 llvm-svn: 331548	2018-05-04 19:03:11 +00:00
Craig Topper	a3f39ee33d	[LoopIdiomRecognize] Add a test case to show incorrect transformation of an infinite loop with side effets into a countable loop using ctlz. We currently recognize this idiom where x is signed and thus the shift in an ashr. int cnt = 0; while (x) { x >>= 1; // arithmetic shift right ++cnt; } and turn it into (bitwidth - ctlz(x)). And if there is anything else in the loop we will create a new loop that runs that many times. If x is initially negative, the shift result will never be 0 and thus the loop is infinite. If you put something with side effects in the loop, that side effect will now only happen bitwidth times instead of an infinite number of times. So this transform is only safe for logical shift right (which we don't currently recognize) or if we can prove that x cannot be negative before the loop. llvm-svn: 331493	2018-05-03 23:50:29 +00:00
Sanjay Patel	e7b6654711	[InstCombine] refine select-of-constants to bitwise ops Add logic for the special case when a cmp+select can clearly be reduced to just a bitwise logic instruction, and remove an over-reaching chunk of general purpose bit magic. The primary goal is to remove cases where we are not improving the IR instruction count when doing these select transforms, and in all cases here that is true. In the motivating 3-way compare tests, there are further improvements because we can combine/propagate select values (not sure if that belongs in instcombine, but it's there for now). DAGCombiner has folds to turn some of these selects into bit magic, so there should be no difference in the end result in those cases. Not all constant combinations are handled there yet, however, so it is possible that some targets will see more cmov/csel codegen with this change in IR canonicalization. Ideally, we'll go further to not turn selects into multiple logic/math ops in instcombine, and we'll canonicalize to selects. But we should make sure that this step does not result in regressions first (and if it does, we should fix those in the backend). The general direction for this change was discussed here: http://lists.llvm.org/pipermail/llvm-dev/2016-September/105373.html http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html Alive proofs for the new bit magic: https://rise4fun.com/Alive/XG7 Differential Revision: https://reviews.llvm.org/D46086 llvm-svn: 331486	2018-05-03 21:58:44 +00:00
Piotr Padlewski	c77ab8ef2f	perform DSE through launder.invariant.group Summary: Alias Analysis knows that llvm.launder.invariant.group returns pointer that mustalias argument, but this information wasn't used, therefor we didn't DSE through launder.invariant.group Reviewers: chandlerc, dberlin, bogner, hfinkel, efriedma Reviewed By: dberlin Subscribers: amharc, llvm-commits, nlewycky, rsmith Differential Revision: https://reviews.llvm.org/D31581 llvm-svn: 331449	2018-05-03 11:03:53 +00:00
Piotr Padlewski	5dde809404	Rename invariant.group.barrier to launder.invariant.group Summary: This is one of the initial commit of "RFC: Devirtualization v2" proposal: https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing Reviewers: rsmith, amharc, kuhar, sanjoy Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45111 llvm-svn: 331448	2018-05-03 11:03:01 +00:00
Craig Topper	856fd68690	[LoopIdiomRecognize] When looking for 'x & (x -1)' for popcnt, make sure the left hand side of the 'and' matches the left hand side of the 'subtract' llvm-svn: 331437	2018-05-03 05:48:49 +00:00
Craig Topper	a0cba89f86	[LoopIdiomRecognize] Add a test case showing that we transform to ctpop without fully checking the 'x & (x-1)' part. The code fails to check that the same value is used twice. We only make sure the left hand side of the and is part of the loop recurrence. The 'x' in the subtract can be any value. llvm-svn: 331436	2018-05-03 05:48:48 +00:00
Max Kazantsev	58fce7e54b	Re-enable "[SCEV] Make computeExitLimit more simple and more powerful" This patch was temporarily reverted because it has exposed bug 37229 on PowerPC platform. The bug is unrelated to the patch and was just a general bug in the optimization done for PowerPC platform only. The bug was fixed by the patch rL331410. This patch returns the disabled commit since the bug was fixed. llvm-svn: 331427	2018-05-03 02:37:55 +00:00
Chandler Carruth	71c3a3fac5	[GCOV] Emit the writeout function as nested loops of global data. Summary: Prior to this change, LLVM would in some cases emit massive writeout functions with many 10s of 1000s of function calls in straight-line code. This is a very wasteful way to represent what are fundamentally loops and creates a number of scalability issues. Among other things, register allocating these calls is extremely expensive. While D46127 makes this less severe, we'll still run into scaling issues with this eventually. If not in the compile time, just from the code size. Now the pass builds up global data structures modeling the inputs to these functions, and simply loops over the data structures calling the relevant functions with those values. This ensures that the code size is a fixed and only data size grows with larger amounts of coverage data. A trivial change to IRBuilder is included to make it easier to build the constants that make up the global data. Reviewers: wmi, echristo Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D46357 llvm-svn: 331407	2018-05-02 22:24:39 +00:00
Daniel Sanders	8d0d1aa229	[reassociate] Fix excessive revisits when processing long chains of reassociatable instructions. Summary: Some of our internal testing detected a major compile time regression which I've tracked down to: r278938 - Revert "Reassociate: Reprocess RedoInsts after each inst". It appears that processing long chains of reassociatable instructions causes non-linear (potentially exponential) growth in the number of times an instruction is revisited. For example, the included test revisits instructions 220 times in a 20-instruction test. It appears that r278938 reversed the order instructions were visited and that this is preventing scheduled revisits from being cancelled as a result of visiting the instructions naturally during normal processing. However, simply reversing the order also harmed the generated code. Upon closer inspection, it was discovered that revisits occurred in the opposite order to the first pass (Thanks to escha for spotting that). This patch makes the revisit order consistent with the first pass which allows more revisits to be cancelled. This does appear to have a small impact on the generated code in few cases but it significantly reduces compile-time. After this patch, our internal test that was most affected by the regression dropped from ~2 million revisits to ~4k resulting in Reassociate having 0.46% of the runtime it had before (99.54% improvement). Here's the summaries reported by lnt for the LLVM test-suite with --benchmarking-only: \| metric \| geomean before patch \| geomean after patch \| delta \| \| ----- \| ----- \| ----- \| ----- \| \| compile time \| 0.1956 \| 0.1261 \| -35.54% \| \| execution time \| 0.3240 \| 0.3237 \| - \| \| code size \| 7365.4459 \| 7365.6079 \| - \| The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though: \| Performance Regressions - compile_time \| Δ \| Previous \| Current \| \| MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk \| 9.82% \| 2.0473 \| 2.2483 \| Reviewers: javed.absar, dberlin Reviewed By: dberlin Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45734 llvm-svn: 331381	2018-05-02 17:59:16 +00:00
Farhana Aleen	e2dfe8a853	[AMDGPU] Support horizontal vectorization. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D46213 llvm-svn: 331313	2018-05-01 21:41:12 +00:00
Sanjay Patel	d2025a2e31	[AggressiveInstCombine] convert a chain of 'or-shift' bits into masked compare and (or (lshr X, C), ...), 1 --> (X & C') != 0 I initially thought about implementing the minimal pattern in instcombine as mentioned here: https://bugs.llvm.org/show_bug.cgi?id=37098#c6 ...but we need to do better to catch the more general sequence from the motivating test (more than 2 bits in the compare). And a test-suite run with statistics showed that this pattern only happened 2 times currently. It would potentially happen more often if reassociation worked better (D45842), but it's probably still not too frequent? This is small enough that I didn't see a need to create a whole new class/file within AggressiveInstCombine. There are likely other relatively small matchers like what was discussed in D44266 that would slide under foldUnusualPatterns() (name suggestions welcome). We could potentially also consolidate matchers for ctpop, bswap, etc under here. Differential Revision: https://reviews.llvm.org/D45986 llvm-svn: 331311	2018-05-01 21:02:09 +00:00
Sanjay Patel	f70671582d	[AggressiveInstCombine] add more bitfield test patterns; NFC Add another baseline for D45986 and a pattern that won't be matched with that patch. llvm-svn: 331309	2018-05-01 20:55:03 +00:00
Sanjay Patel	2b36e95d45	[PhaseOrdering] add tests for bittest patterns from bitfields; NFC As mentioned in D45986, there's a potential ordering dependency between instcombine and aggressive-instcombine for detecting these, so I'm adding a few tests to confirm that the expected folds occur using -O3 (because aggressive-instcombine only runs at -O3 currently). llvm-svn: 331308	2018-05-01 20:53:44 +00:00
Daniel Neilson	1091ca4640	[LV] Move test/Transforms/LoopVectorize/pr23997.ll Summary: This fixes a build break with r331269. test/Transforms/LoopVectorize/pr23997.ll should be in: test/Transforms/LoopVectorize/X86/pr23997.ll llvm-svn: 331281	2018-05-01 16:40:45 +00:00
Matthew Simpson	661e6a02bd	[SLP] Add additional test for transposable binary operations with reuse llvm-svn: 331274	2018-05-01 15:59:26 +00:00
Daniel Neilson	9e4bbe801a	[LV] Preserve inbounds on created GEPs Summary: This is a fix for PR23997. The loop vectorizer is not preserving the inbounds property of GEPs that it creates. This is inhibiting some optimizations. This patch preserves the inbounds property in the case where a load/store is being fed by an inbounds GEP. Reviewers: mkuper, javed.absar, hsaito Reviewed By: hsaito Subscribers: dcaballe, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D46191 llvm-svn: 331269	2018-05-01 15:35:08 +00:00
Wei Mi	eec5ba9fae	Fix the issue that ComputeValueKnownInPredecessors only handles the case when phi is on lhs of a comparison op. For the following testcase, L1: %t0 = add i32 %m, 7 %t3 = icmp eq i32* %t2, null br i1 %t3, label %L3, label %L2 L2: %t4 = load i32, i32* %t2, align 4 br label %L3 L3: %t5 = phi i32 [ %t0, %L1 ], [ %t4, %L2 ] %t6 = icmp eq i32 %t0, %t5 br i1 %t6, label %L4, label %L5 We know if we go through the path L1 --> L3, %t6 should always be true. However currently, if the rhs of the eq comparison is phi, JumpThreading fails to evaluate %t6 to true. And we know that Instcombine cannot guarantee always canonicalizing phi to the left hand side of the comparison operation according to the operand priority comparison mechanism in instcombine. The patch handles the case when rhs of the comparison op is a phi. Differential Revision: https://reviews.llvm.org/D46275 llvm-svn: 331266	2018-05-01 14:47:24 +00:00
Omer Paparo Bivas	54596e1c1a	[InstCombine] new testcases for OverflowingBinaryOperators and PossiblyExactOperators transformations; NFC instcombine should transform the relevant cases if the OverflowingBinaryOperator/PossiblyExactOperator can be proven to be safe. Change-Id: I7aec62a31a894e465e00eb06aed80c3ea0c9dd45 llvm-svn: 331265	2018-05-01 14:27:10 +00:00
Omer Paparo Bivas	82ef8e19ef	[InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case Differential Revision: https://reviews.llvm.org/D45731 Change-Id: I85d4226504e954933c41598327c91b2d08192a9d llvm-svn: 331257	2018-05-01 12:25:46 +00:00
Sanjay Patel	1934cb1c49	[InstCombine] fix test to restore intent This test had values that differed in only in capitalization, and that causes problems for the auto-generating check line script. So I changed that in rL331226, but I accidentally forgot to change a subsequent use of a param. llvm-svn: 331228	2018-04-30 21:28:18 +00:00
Sanjay Patel	dab8515370	[InstCombine] add tests, update checks; NFC llvm-svn: 331226	2018-04-30 21:03:36 +00:00
Roman Lebedev	aa4faec114	[InstCombine] Unfold masked merge with constant mask Summary: As discussed in D45733, we want to do this in InstCombine. https://rise4fun.com/Alive/LGk Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: chandlerc, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D45867 llvm-svn: 331205	2018-04-30 17:59:33 +00:00
Roman Lebedev	b7ae4ef82b	[InstCombine][NFC] Add tests for unfolding masked merge with constant mask Summary: As discussed in D45733, we want to do this in InstCombine. Differential Revision: https://reviews.llvm.org/D45866 llvm-svn: 331204	2018-04-30 17:59:26 +00:00
Davide Italiano	bd3bf1660b	[SLPVectorizer] Debug info shouldn't impact spill cost computation. <rdar://problem/39794738> (Also, PR32761). Differential Revision: https://reviews.llvm.org/D46199 llvm-svn: 331199	2018-04-30 16:57:33 +00:00
Roman Lebedev	136867931a	[InstCombine] Canonicalize variable mask in masked merge Summary: Masked merge has a pattern of: `((x ^ y) & M) ^ y`. But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`, We should canonicalize the pattern to non-inverted mask. https://rise4fun.com/Alive/Yol Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45664 llvm-svn: 331112	2018-04-28 15:45:07 +00:00
Roman Lebedev	6b1e66b188	[InstCombine][NFC] Add tests for variable mask canonicalization in masked merge Summary: Masked merge has a pattern of: `((x ^ y) & M) ^ y`. But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`, We should canonicalize the pattern to non-inverted mask. Differential Revision: https://reviews.llvm.org/D45663 llvm-svn: 331111	2018-04-28 15:45:00 +00:00
Max Kazantsev	303572f3df	[NFC] Add some tests that demonstrate unrecognized three-way comparison patterns llvm-svn: 331100	2018-04-28 04:38:21 +00:00
Philip Reames	502d4481d4	[LoopGuardWidening] Make PostDomTree optional The effect of doing so is not disrupting the LoopPassManager when mixing this pass with other loop passes. This should help locality of access substaintially and avoids the cost of computing PostDom. The assumption here is that the full GuardWidening (which does use PostDom) is run as a canonicalization before loop opts and that this version is just catching cases exposed by other loop passes. (i.e. LoopPredication, IndVarSimplify, LoopUnswitch, etc..) llvm-svn: 331094	2018-04-27 23:15:56 +00:00
Adrian Prantl	210a29de7b	Fix a bug in GlobalOpt's handling of DIExpressions. This patch adds support for fragment expressions TryToShrinkGlobalToBoolean() which were previously just dropped. Thanks to Reid Kleckner for providing me a reproducer! llvm-svn: 331086	2018-04-27 21:41:36 +00:00
Roman Lebedev	6959b8e76f	[PatternMatch] Stabilize the matching order of commutative matchers Summary: Currently, we 1. match `LHS` matcher to the `first` operand of binary operator, 2. and then match `RHS` matcher to the `second` operand of binary operator. If that does not match, we swap the `LHS` and `RHS` matchers: 1. match `RHS` matcher to the `first` operand of binary operator, 2. and then match `LHS` matcher to the `second` operand of binary operator. This works ok. But it complicates writing of commutative matchers, where one would like to match (`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side. This is additionally complicated by the fact that `m_Specific()` stores the `Value `, not `Value `, so it won't work at all out of the box. The last problem is trivially solved by adding a new `m_c_Specific()` that stores the `Value `, not `Value `. I'm choosing to add a new matcher, not change the existing one because i guess all the current users are ok with existing behavior, and this additional pointer indirection may have performance drawbacks. Also, i'm storing pointer, not reference, because for some mysterious-to-me reason it did not work with the reference. The first one appears trivial, too. Currently, we 1. match `LHS` matcher to the `first` operand of binary operator, 2. and then match `RHS` matcher to the `second` operand of binary operator. If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ operands: 1. match ~~`RHS`~~ `LHS` matcher to the ~~`first`~~ `second` operand of binary operator, 2. and then match ~~`LHS`~~ `RHS` matcher to the ~~`second`~ `first` operand of binary operator. Surprisingly, `$ ninja check-llvm` still passes with this. But i expect the bots will disagree.. The motivational unittest is included. I'd like to use this in D45664. Reviewers: spatel, craig.topper, arsenm, RKSimon Reviewed By: craig.topper Subscribers: xbolva00, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D45828 llvm-svn: 331085	2018-04-27 21:23:20 +00:00
Sanjay Patel	2677038cc0	[Reassociate] add a test with debug info; NFC As suggested in D45842 (although still not sure if we're going to advance that), we must invalidate references to instructions that have been recycled (operands were changed, so result is different). llvm-svn: 331083	2018-04-27 21:14:15 +00:00
Philip Reames	e4ec473b3f	[MustExecute/LICM] Special case first instruction in throwing header We currently have a hard to solve analysis problem around the order of instructions within a potentially throwing block. We can't cheaply determine whether a given instruction is before the first potential throw in the block. While we're working on that in the background, special case the first instruction within the header. why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice. In a follow on patch, I tend to extend LICM with an alternate approach which works for any instruction in the header before the first throw, but this is the best I can come up with other users of the analysis (such as store promotion.) Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV. Using the full walk is not suitable for a general solution. llvm-svn: 331079	2018-04-27 20:44:01 +00:00
Philip Reames	9258e9d190	[LoopGuardWidening] Split out a loop pass version of GuardWidening The idea is to have a pass which performs the same transformation as GuardWidening, but can be run within a loop pass manager without disrupting the pass manager structure. As demonstrated by the test case, this doesn't quite get there because of issues with post dom, but it gives a good step in the right direction. the motivation is purely to reduce compile time since we can now preserve locality during the loop walk. This patch only includes a legacy pass. A follow up will add a new style pass as well. llvm-svn: 331060	2018-04-27 17:29:10 +00:00
Florian Hahn	f3fea0f11f	[LoopInterchange] Allow some loops with PHI nodes in the exit block. We currently support LCSSA PHI nodes in the outer loop exit, if their incoming values do not come from the outer loop latch or if the outer loop latch has a single predecessor. In that case, the outer loop latch will be executed only if the inner loop gets executed. If we have multiple predecessors for the outer loop latch, it may be executed even if the inner loop does not get executed. This is a first step to support the case described in https://bugs.llvm.org/show_bug.cgi?id=30472 Reviewers: efriedma, karthikthecool, mcrosier Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D43237 llvm-svn: 331037	2018-04-27 13:52:51 +00:00
Benjamin Kramer	733c7fc55d	[NVPTX] Turn on Loop/SLP vectorization Since PTX has grown a <2 x half> datatype vectorization has become more important. The late LoadStoreVectorizer intentionally only does loads and stores, but now arithmetic has to be vectorized for optimal throughput too. This is still very limited, SLP vectorization happily creates <2 x half> if it's a legal type but there's still a lot of register moving happening to get that fed into a vectorized store. Overall it's a small performance win by reducing the amount of arithmetic instructions. I haven't really checked what the loop vectorizer does to PTX code, the cost model there might need some more tweaks. I didn't see it causing harm though. Differential Revision: https://reviews.llvm.org/D46130 llvm-svn: 331035	2018-04-27 13:36:05 +00:00
Matt Morehouse	1ae1febfde	Revert "[SimplifyLibcalls] Replace locked IO with unlocked IO" This reverts r331002 due to sanitizer bot breakage. llvm-svn: 331011	2018-04-27 01:48:09 +00:00
Eli Friedman	e06539456c	[LowerTypeTests] Mark .cfi.jumptable nounwind. It doesn't unwind, and the wrong marking leads to the creation of an .eh_frame section when it isn't necessary. Differential Revision: https://reviews.llvm.org/D46082 llvm-svn: 331008	2018-04-27 00:32:24 +00:00
David Bolvansky	2c9cc9c731	[SimplifyLibcalls] Replace locked IO with unlocked IO Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed, Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D45736 llvm-svn: 331002	2018-04-26 22:31:43 +00:00
Roman Lebedev	33095e3610	[InstCombine][NFC] Regenerate checks in or-xor.ll llvm-svn: 330996	2018-04-26 21:41:56 +00:00
Roman Lebedev	cabaeac29c	[InstCombine][NFC] Regenerate checks in and-or-not.ll llvm-svn: 330994	2018-04-26 21:13:09 +00:00
Sanjoy Das	6f1937b10f	[InstCombine] Simplify Add with remainder expressions as operands. Summary: Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1). This is a common pattern seen in code generated by the XLA GPU backend. Add test cases for this new optimization. Patch by Bixia Zheng! Reviewers: sanjoy Reviewed By: sanjoy Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar Differential Revision: https://reviews.llvm.org/D45976 llvm-svn: 330992	2018-04-26 20:52:28 +00:00
Sanjoy Das	0e643db48f	Add test cases to prepare for the optimization that simplifies Add with remainder expressions as operands. Summary: Add test cases to prepare for the new optimization that Simplifies integer add expression X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1). Patch by Bixia Zheng! Reviewers: sanjoy Reviewed By: sanjoy Subscribers: jlebar, llvm-commits Differential Revision: https://reviews.llvm.org/D46017 llvm-svn: 330991	2018-04-26 20:52:27 +00:00
Roman Lebedev	7cc56f1599	[InstCombine][NFC] add2.ll: add a few commutative checks. Fixes some missing test coverage in InstCombineAddSub.cpp, visitAdd() llvm-svn: 330986	2018-04-26 20:07:17 +00:00
Roman Lebedev	1efe879641	[InstCombine][NFC] Autogenerate checks in add2.ll llvm-svn: 330985	2018-04-26 20:07:12 +00:00
Roman Lebedev	3d7b22621c	[NFC][InstCombine] rem.ll: add a few commutative tests. This closes a gap in missing test coverage in isKnownToBeAPowerOfTwo() from ValueTracking.cpp llvm-svn: 330975	2018-04-26 18:44:37 +00:00
Roman Lebedev	e117e1a440	[NFC][InstCombine] Regenerate rem.ll test llvm-svn: 330974	2018-04-26 18:44:32 +00:00
Matthew Simpson	cfdec0ff70	[SLP] Add tests for transposable binary operations These test cases are vectorizable, but we are currently unable to vectorize them effectively. llvm-svn: 330945	2018-04-26 14:50:04 +00:00
Alex Bradbury	130b8b3f2b	[RISCV] Implement isTruncateFree Adapted from ARM's implementation introduced in r313533 and r314280. llvm-svn: 330940	2018-04-26 13:37:00 +00:00
Florian Hahn	fd2bc11248	[LoopInterchange] Ignore debug intrinsics during legality checks. Reviewers: aprantl, mcrosier, karthikthecool Reviewed By: aprantl Subscribers: mattd, vsk, #debug-info, llvm-commits Differential Revision: https://reviews.llvm.org/D45379 llvm-svn: 330931	2018-04-26 10:26:17 +00:00
Max Kazantsev	2c287ec9c5	Revert "[SCEV] Make computeExitLimit more simple and more powerful" This reverts commit 023c8be90980e0180766196cba86f81608b35d38. This patch triggers miscompile of zlib on PowerPC platform. Most likely it is caused by some pre-backend PPC-specific pass, but we don't clearly know the reason yet. So we temporally revert this patch with intention to return it once the problem is resolved. See bug 37229 for details. llvm-svn: 330893	2018-04-26 02:07:40 +00:00
David Bolvansky	cb8ca5f37c	[SimplifyLibcalls] Atoi, strtol replacements Reviewers: spatel, lebedev.ri, xbolva00, efriedma Reviewed By: xbolva00, efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D45418 llvm-svn: 330860	2018-04-25 18:58:53 +00:00
Taewook Oh	923c216da5	[ICP] Do not attempt type matching for variable length arguments. Summary: When performing indirect call promotion, current implementation inspects "all" parameters of the callsite and attemps to match with the formal argument type of the callee function. However, it is not possible to find the type for variable length arguments, and the compiler crashes when it attemps to match the type for variable lenght argument. It seems that the bug is introduced with D40658. Prior to that, the type matching is performed only for the parameters whose ID is less than callee->getFunctionNumParams(). The attached test case will crash without the patch. Reviewers: mssimpso, davidxl, davide Reviewed By: mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46026 llvm-svn: 330844	2018-04-25 17:19:21 +00:00
Sanjay Patel	0387ceb67a	[InstCombine] add tests for select to logic folds; NFC As discussed in D45862, we want these folds sometimes because they're good improvements. But as we can see here, the current logic doesn't check uses and doesn't produce optimal code in all cases. llvm-svn: 330837	2018-04-25 15:59:23 +00:00
Florian Hahn	1da30c659d	[LoopInterchange] Use getExitBlock()/getExitingBlock instead of manual impl. This also means we have to check if the latch is the exiting block now, as `transform` expects the latches to be the exiting blocks too. https://bugs.llvm.org/show_bug.cgi?id=36586 Reviewers: efriedma, davide, karthikthecool Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D45279 llvm-svn: 330806	2018-04-25 09:35:54 +00:00
Bjorn Pettersson	bec2a7c4eb	[DebugInfo] Invalidate debug info in ReassociatePass::RewriteExprTree Summary: When Reassociate is rewriting an expression tree it may reuse old binary expression nodes, for new expressions. Whenever an expression node is reused, but with a non-trivial change in the result, we need to invalidate any debug info that is associated with the node. If for example rewriting x = mul a, b y = mul c, x into x = mul c, b y = mul a, x we still get the same result for 'y', but 'x' is a new expression. All debug info referring to 'x' must be invalidated (marked as optimized out) since we no longer calculate the expected value. As a side-effect this patch avoid (at least some) problems where reassociate could end up creating IR with debug-use before def. Earlier the dbg.value nodes where left untouched in the IR, while the reused binary nodes where sinked to just before the root node of the rewritten expression tree. See PR27273 for more info about such problems. Reviewers: dblaikie, aprantl, dexonsmith Reviewed By: aprantl Subscribers: JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D45975 llvm-svn: 330804	2018-04-25 09:23:56 +00:00
Sanjay Patel	54795bb16b	[InstCombine] move tests for select with bit-test of condition; NFC These are all but 1 of the select-of-constant tests that appear to be transformed within foldSelectICmpAnd() and the block above it predicated by decomposeBitTestICmp(). As discussed in D45862 (and can be seen in several tests here), we probably want to stop doing those transforms because they can increase the instruction count without benefitting other passes or codegen. The 1 test not included here is a urem test where the bit hackery allows us to remove a urem. To preserve killing that urem, we should do some stronger known-bits analysis or pattern matching of 'urem x, (select-of-pow2-constants)'. llvm-svn: 330768	2018-04-24 21:06:06 +00:00
Florian Hahn	97ae30b8d6	[LoopInterchange] Add REQUIRES: asserts to test. llvm-svn: 330748	2018-04-24 18:10:52 +00:00
Diego Caballero	60f2776b2f	[LV][VPlan] Detect outer loops for explicit vectorization. Patch #2 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces the basic infrastructure to detect, legality check and process outer loops annotated with hints for explicit vectorization. All these changes are protected under the feature flag -enable-vplan-native-path. This should make this patch NFC for the existing inner loop vectorizer. Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso. Differential Revision: https://reviews.llvm.org/D42447 llvm-svn: 330739	2018-04-24 17:04:17 +00:00
Florian Hahn	ceee788947	[LoopInterchange] Make isProfitableForVectorization slightly more conservative. After D43236, we started interchanging loops with empty dependence matrices. In isProfitableForVectorization, we try to determine if interchanging makes the loop dependences more friendly to the vectorizer. If there are no dependences, we should not interchange, based on that heuristic. Reviewers: efriedma, mcrosier, karthikthecool, blitz.opensource Reviewed By: mcrosier Differential Revision: https://reviews.llvm.org/D45208 llvm-svn: 330738	2018-04-24 16:55:32 +00:00
Sanjay Patel	510af48e5d	[InstCombine] regenerate checks; NFC The first step in fixing problems raised in D45862 is to make the problems visible. Now we can more easily see/update cases where selects have been turned into multiple instructions with no apparent improvement in analysis or benefits for other passes (vectorization). llvm-svn: 330731	2018-04-24 16:08:03 +00:00
Sanjay Patel	f03ec65517	[InstCombine] regenerate checks; NFC The current version of the script uses regex for params. This could mask a bug (param values got wrongly swapped), but it seems unlikely in practice, so let's just update the whole file to reduce diffs when there is a meaningful change here. llvm-svn: 330729	2018-04-24 15:42:30 +00:00
Benjamin Kramer	f85f5da3b2	[LoadStoreVectorize] Ignore interleaved invariant loads. The memory location an invariant load is using can never be clobbered by any store, so it's safe to move the load ahead of the store. Differential Revision: https://reviews.llvm.org/D46011 llvm-svn: 330725	2018-04-24 15:28:47 +00:00
Chandler Carruth	43acdb35bc	[PM/LoopUnswitch] Fix a bug in the loop block set formation of the new loop unswitch. This code incorrectly added the header to the loop block set early. As a consequence we would incorrectly conclude that a nested loop body had already been visited when the header of the outer loop was the preheader of the nested loop. In retrospect, adding the header eagerly doesn't really make sense. It seems nicer to let the cycle be formed naturally. This will catch crazy bugs in the CFG reconstruction where we can't correctly form the cycle earlier rather than later, and makes the rest of the logic just fall out. I've also added various asserts that make these issues much easier to debug. llvm-svn: 330707	2018-04-24 10:33:08 +00:00
Max Kazantsev	7790d6cbff	[NFC] Use FileCheck in test llvm-svn: 330684	2018-04-24 04:42:37 +00:00
Chandler Carruth	0ace148ca6	[PM/LoopUnswitch] Remove another over-aggressive assert. This code path can very clearly be called in a context where we have baselined all the cloned blocks to a particular loop and are trying to handle nested subloops. There is no harm in this, so just relax the assert. I've added a test case that will make sure we actually exercise this code path. llvm-svn: 330680	2018-04-24 03:27:00 +00:00
George Burgess IV	8e807bf3fa	Reland r301880(!): "[InstSimplify] Handle selects of GEPs with 0 offset" I was reminded today that this patch got reverted in r301885. I can no longer reproduce the failure that caused the revert locally (...almost one year later), and the patch applied pretty cleanly, so I guess we'll see if the bots still get angry about it. The original breakage was InstSimplify complaining (in "assertion failed" form) about getting passed some crazy IR when running `ninja check-sanitizer`. I'm unable to find traces of what, exactly, said crazy IR was. I suppose we'll find out pretty soon if that's still the case. :) Original commit: Author: gbiv Date: Mon May 1 18:12:08 2017 New Revision: 301880 URL: http://llvm.org/viewvc/llvm-project?rev=301880&view=rev Log: [InstSimplify] Handle selects of GEPs with 0 offset In particular (since it wouldn't fit nicely in the summary): (select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V) Differential Revision: https://reviews.llvm.org/D31435 llvm-svn: 330667	2018-04-24 00:25:01 +00:00
Sanjay Patel	fa8f5ad9f3	[AggressiveInstCombine] add tests for PR37098; NFC I'm not sure if this is where we should try to fold these patterns inspired by: https://bugs.llvm.org/show_bug.cgi?id=37098 ...if this isn't the right place, we can move the tests. llvm-svn: 330642	2018-04-23 20:20:32 +00:00
Xin Tong	8edff27923	[CallSiteSplit] Make sure we remove nonnull if the parameter turns out to be a constant. Summary: We do not need nonull attribute if we know an argument is going to be constant. Reviewers: junbuml, davide, fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45608 llvm-svn: 330641	2018-04-23 20:09:08 +00:00
Bjorn Pettersson	8e484dc531	[MemCpyOpt] Skip optimizing basic blocks not reachable from entry Summary: Skip basic blocks not reachable from the entry node in MemCpyOptPass::iterateOnFunction. Code that is unreachable may have properties that do not exist for reachable code (an instruction in a basic block can for example be dominated by a later instruction in the same basic block, for example if there is a single block loop). MemCpyOptPass::processStore is only safe to use for reachable basic blocks, since it may iterate past the basic block beginning when used for unreachable blocks. By simply skipping to optimize unreachable basic blocks we can avoid asserts such as "Assertion `!NodePtr->isKnownSentinel()' failed." in MemCpyOptPass::processStore. The problem was detected by fuzz tests. Reviewers: eli.friedman, dneilson, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D45889 llvm-svn: 330635	2018-04-23 19:55:04 +00:00
Daniel Neilson	cc45e923c5	[DSE] Teach the pass that atomic memory intrinsics are stores. Summary: This change teaches DSE that the atomic memory intrinsics are stores that can be eliminated, and can allow other stores to be eliminated. This change specifically does not teach DSE that these intrinsics can be partially eliminated (i.e. length reduced, and dest/src changed); that will be handled in another change. Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith Reviewed By: efriedma Subscribers: dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D45535 llvm-svn: 330629	2018-04-23 19:06:49 +00:00
Max Kazantsev	91f481665e	[LoopRotate] Fix incorrect SCEV invalidation in loop rotation LoopRotate only invalidates innermost loops while the changes that it makes may also affert any of this parents. With patch rL329047, SCEV becomes much smarter about calculation of exit counts for outer loops, so we cannot assume that they are not affected. Differential Revision: https://reviews.llvm.org/D45945 llvm-svn: 330582	2018-04-23 12:33:31 +00:00
Max Kazantsev	b1137c42fa	[LoopSimplify] Fix incorrect SCEV invalidation In the function `simplifyOneLoop` we optimistically assume that changes in the inner loop only affect this very loop and have no impact on its parents. In fact, after rL329047 has been merged, we can now calculate exit counts for outer loops which may depend on inner loops. Thus, we need to invalidate all parents when we do something to a loop. There is an evidence of incorrect behavior of `simplifyOneLoop`: when we insert `SE->verify()` check in the end of this funciton, it fails on a bunch of existing test, in particular: LLVM :: Transforms/LoopUnroll/peel-loop-not-forced.ll LLVM :: Transforms/LoopUnroll/peel-loop-pgo.ll LLVM :: Transforms/LoopUnroll/peel-loop.ll LLVM :: Transforms/LoopUnroll/peel-loop2.ll Note that previously we have fixed issues of this variety, see rL328483. This patch makes this function invalidate the outermost loop properly. Differential Revision: https://reviews.llvm.org/D45937 Reviewed By: chandlerc llvm-svn: 330576	2018-04-23 10:32:37 +00:00
Chandler Carruth	bf7190a154	[PM/LoopUnswitch] Remove a buggy assert in the new loop unswitch. The condition this was asserting doesn't actually hold. I've added comments to explain why, removed the assert, and added a fun test case reduced out of 403.gcc. llvm-svn: 330564	2018-04-23 06:58:36 +00:00
Sanjay Patel	30be665e82	[PatternMatch] allow undef elements when matching a vector zero This is the last step in getting constant pattern matchers to allow undef elements in constant vectors. I'm adding a dedicated m_ZeroInt() function and building m_Zero() from that. In most cases, calling code can be updated to use m_ZeroInt() directly when there's no need to match pointers, but I'm leaving that efficiency optimization as a follow-up step because it's not always clear when that's ok. There are just enough icmp folds in InstSimplify that can be used for integer or pointer types, that we probably still want a generic m_Zero() for those cases. Otherwise, we could eliminate it (and possibly add a m_NullPtr() as an alias for isa<ConstantPointerNull>()). We're conservatively returning a full zero vector (zeroinitializer) in InstSimplify/InstCombine on some of these folds (see diffs in InstSimplify), but I'm not sure if that's actually necessary in all cases. We may be able to propagate an undef lane instead. One test where this happens is marked with 'TODO'. llvm-svn: 330550	2018-04-22 17:07:44 +00:00
Sanjay Patel	c1265ab99e	[InstCombine] add vector test with undef elts; NFC llvm-svn: 330547	2018-04-22 15:59:14 +00:00
Sanjay Patel	e187cd3273	[InstSimplify, InstCombine] add vector tests with undef elts; NFC llvm-svn: 330543	2018-04-22 14:19:37 +00:00
Sanjay Patel	5f845732ed	[InstSimplify] move tests for shifts; NFC llvm-svn: 330516	2018-04-21 16:58:00 +00:00
Sanjay Patel	d0b27a1156	[InstSimplify] move/add/regenerate checks for tests; NFC llvm-svn: 330515	2018-04-21 16:23:47 +00:00
Shoaib Meenai	d64b83266b	[ObjCARC] Account for funclet token in storeStrong transform When creating a call to storeStrong in ObjCARCContract, ensure the call gets the correct funclet token, otherwise WinEHPrepare will turn the call (and all subsequent instructions) into unreachable. We already have logic to do this for the ARC autorelease elision marker; factor that out into a common function that's used for both. These are the only two places in this transform that create call instructions. Differential Revision: https://reviews.llvm.org/D45857 llvm-svn: 330487	2018-04-20 22:11:03 +00:00
Sean Fertile	18f17333dd	[PartialInlining] Fix Crash from holding a reference to a destructed ORE. The callback used to create an ORE for the legacy PI pass caches the allocated object in a unique_ptr in the runOnModule function, and returns a reference to that object. Under certian circumstances we can end up holding onto that reference after the OREs destruction. Rather then allowing the new and legacy passes to create ORE object in diffrent ways, create the ORE at the point of use. Differential Revision: https://reviews.llvm.org/D43219 llvm-svn: 330473	2018-04-20 19:56:26 +00:00
Florian Hahn	773872fd67	[NewGVN] Split OpPHI detection and creation. It also adds a check making sure PHIs for operands are all in the same block. Patch by Daniel Berlin <dberlin@dberlin.org> Reviewers: dberlin, davide Differential Revision: https://reviews.llvm.org/D43865 llvm-svn: 330444	2018-04-20 16:37:13 +00:00
Michael Zolotukhin	f79d15e432	Fix typo in a test. llvm-svn: 330434	2018-04-20 13:51:36 +00:00
Michael Zolotukhin	a2c9af0209	Revert "Revert r330403 and r330413." Reapply the patches with a fix. Thanks Ilya and Hans for the reproducer! This reverts commit r330416. The issue was that removing predecessors invalidated uses that we stored for rewrite. The fix is to finish manipulating with CFG before we select uses for rewrite. llvm-svn: 330431	2018-04-20 13:34:32 +00:00
Ilya Biryukov	afe822bd6d	Revert r330403 and r330413. Revert r330413: "[SSAUpdaterBulk] Use SmallVector instead of DenseMap for storing rewrites." Revert r330403 "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time." r330403 commit seems to crash clang during our integrate while doing PGO build with the following stacktrace: #2 llvm::SSAUpdaterBulk::RewriteAllUses(llvm::DominatorTree, llvm::SmallVectorImpl<llvm::PHINode>) #3 llvm::JumpThreadingPass::ThreadEdge(llvm::BasicBlock, llvm::SmallVectorImpl<llvm::BasicBlock> const&, llvm::BasicBlock) #4 llvm::JumpThreadingPass::ProcessThreadableEdges(llvm::Value, llvm::BasicBlock, llvm::jumpthreading::ConstantPreference, llvm::Instruction) #5 llvm::JumpThreadingPass::ProcessBlock(llvm::BasicBlock) The crash happens while compiling 'lib/Analysis/CallGraph.cpp'. r3340413 is reverted due to conflicting changes. llvm-svn: 330416	2018-04-20 10:52:54 +00:00
Roman Lebedev	f6934d725b	[NFC][InstCombine] Regenerate two tests that are affected by folding masked merge llvm-svn: 330415	2018-04-20 10:49:19 +00:00
Michael Zolotukhin	79e4f7fadb	Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time. Hopefully, changing set to vector removes nondeterminism detected by some bots, or the new assert will catch something. This reverts commit r330180. llvm-svn: 330403	2018-04-20 08:01:08 +00:00
Vlad Tsyrklevich	5d15230c37	Fix build failures for r330387 on buildbots that don't build the X86 target llvm-svn: 330388	2018-04-20 02:26:12 +00:00
Vlad Tsyrklevich	230b256783	LowerTypeTests: Propagate symver directives Summary: This change fixes https://crbug.com/834474, a build failure caused by LowerTypeTests not preserving .symver symbol versioning directives for exported functions. Emit symver information to ThinLTO summary data and then propagate symver directives for exported functions to the merged module. Emitting symver information to the summaries increases the size of intermediate build artifacts for a Chromium build by less than 0.2%. Reviewers: pcc Reviewed By: pcc Subscribers: tejohnson, mehdi_amini, eraman, llvm-commits, eugenis, kcc Differential Revision: https://reviews.llvm.org/D45798 llvm-svn: 330387	2018-04-20 01:36:48 +00:00
Sanjay Patel	ad8976db16	[Reassociate] add baseline tests for binop swapping; NFC Similar to rL330086, I don't know if we want to do these transforms here, but we might as well have the tests here either way to show that this pass is missing potential functionality (intentionally or not). llvm-svn: 330368	2018-04-19 21:56:17 +00:00
Chandler Carruth	32e62f9c5b	[PM/LoopUnswitch] Detect irreducible control flow within loops and skip unswitching non-trivial edges. Summary: This fixes the bug pointed out in review with non-trivial unswitching. This also provides a basis that should make it pretty easy to finish fleshing out a routine to scan an entire function body for irreducible control flow, but this patch remains minimal for disabling loop unswitch. Reviewers: sanjoy, fedor.sergeev Subscribers: mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45754 llvm-svn: 330357	2018-04-19 18:44:25 +00:00
Florian Hahn	b789165e6b	[NewGVN] Add ops as dependency if we cannot find a leader for ValueOp. If those operands change, we might find a leader for ValueOp, which could enable new phi-of-op creation. This fixes a case where we missed creating a phi-of-ops node. With D43865 and this patch, bootstrapping clang/llvm works with -enable-newgvn, whereas without it, the "value changed after iteration" assertion is triggered. Reviewers: dberlin, davide Reviewed By: dberlin Differential Revision: https://reviews.llvm.org/D42180 llvm-svn: 330334	2018-04-19 15:05:47 +00:00
Roman Lebedev	d536de1e7b	[NFC][InstCombine] A few more tests for masked merge add/xor -> or with constant mask llvm-svn: 330325	2018-04-19 13:02:17 +00:00
Florian Hahn	9a175bc1bc	Remove file accidentally added in r330320. llvm-svn: 330321	2018-04-19 12:09:05 +00:00
Florian Hahn	2342533e1a	[IR/BasicBlockTest] Fix asan failure introduced in rL330316. The argument has to be deleted after the module containing the function gets deleted. llvm-svn: 330320	2018-04-19 12:06:26 +00:00
Sanjay Patel	b2ab3f28d5	[SimplifyLibcalls] Realloc(null, N) -> Malloc(N) Patch by Dávid Bolvanský! Differential Revision: https://reviews.llvm.org/D45413 llvm-svn: 330259	2018-04-18 14:21:31 +00:00
Sam Parker	3c19051bf0	[IRCE] Only check for NSW on equality predicates After investigation discussed in D45439, it would seem that the nsw flag restriction is unnecessary in most cases. So the IsInductionVar lambda has been removed, the functionality extracted, and now only require nsw when using eq/ne predicates. Differential Revision: https://reviews.llvm.org/D45617 llvm-svn: 330256	2018-04-18 13:50:28 +00:00
Florian Hahn	ac27758895	[LoopUnroll] Only peel if a predicate becomes known in the loop body. If a predicate does not become known after peeling, peeling is unlikely to be beneficial. Reviewers: mcrosier, efriedma, mkazantsev, junbuml Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D44983 llvm-svn: 330250	2018-04-18 12:29:24 +00:00
Bjorn Pettersson	bc4f19b6bd	[DebugInfo] Sink related dbg users when sinking in InstCombine Summary: When sinking an instruction in InstCombine we now also sink the DbgInfoIntrinsics that are using the sunken value. Example) When sinking the load in this input bb.X: %0 = load i64, i64* %start, align 4, !dbg !31 tail call void @llvm.dbg.value(metadata i64 %0, ...) br i1 %cond, label %for.end, label %for.body.lr.ph for.body.lr.ph: br label %for.body we now also move the dbg.value, like this bb.X: br i1 %cond, label %for.end, label %for.body.lr.ph for.body.lr.ph: %0 = load i64, i64* %start, align 4, !dbg !31 tail call void @llvm.dbg.value(metadata i64 %0, ...) br label %for.body In the past we haven't moved the dbg.value so we got bb.X: tail call void @llvm.dbg.value(metadata i64 %0, ...) br i1 %cond, label %for.end, label %for.body.lr.ph for.body.lr.ph: %0 = load i64, i64* %start, align 4, !dbg !31 br label %for.body So in the past we got a debug-use before the def of %0. And that dbg.value was also on the path jumping to %for.end, for which %0 never was defined. CodeGenPrepare normally comes to rescue later (when not moving the dbg.value), since it moves dbg.value instrinsics quite brutally, without really analysing if it is correct to move the intrinsic (see PR31878). So at the moment this patch isn't expected to have much impact, besides that it is moving the dbg.value already in opt, making the IR look more sane directly. This can be seen as a preparation to (hopefully) make it possible to turn off CodeGenPrepare::placeDbgValues later as a solution to PR31878. I also adjusted test/DebugInfo/X86/sdagsplit-1.ll to make the IR in the test case up-to-date with this behavior in InstCombine. Reviewers: rnk, vsk, aprantl Reviewed By: vsk, aprantl Subscribers: mattd, JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D45425 llvm-svn: 330243	2018-04-18 08:08:04 +00:00
Sanjay Patel	aea15131db	[InstCombine] peek through bitcasted vector/array pointer GEP operand The bitcast may be interfering with other combines or vectorization as shown in PR16739: https://bugs.llvm.org/show_bug.cgi?id=16739 Most pointer-related optimizations are probably able to look through this bitcast, but removing the bitcast shrinks the IR, so it's at least a size savings. Differential Revision: https://reviews.llvm.org/D44833 llvm-svn: 330237	2018-04-18 00:36:40 +00:00
Vedant Kumar	b0585893cc	[Mem2Reg] Create merged debug locations for inserted phis Track the debug locations of the incoming values to newly-created phis, and apply merged debug locations to the phis. A merged location will be on line 0, but will have the correct scope set. This improves crash reporting when an inlined instruction with a merged location triggers a machine exception. A debugger will be able to narrow down the crash to the correct inlined scope, instead of simply pointing to the outer scope of the caller. Taken together with a change allows generating merged line-0 locations for instructions which aren't calls, this results in a 0.5% increase in the uncompressed size of the .debug_line section of a stage2+Release build of clang (-O3 -g). rdar://33858697 Differential Revision: https://reviews.llvm.org/D45397 llvm-svn: 330227	2018-04-17 22:03:08 +00:00
Michael Zolotukhin	21458fdc55	Revert "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." again." This reverts r330175. There are still stage3/stage4 miscompares. llvm-svn: 330180	2018-04-17 07:31:27 +00:00
Michael Zolotukhin	3f5fd1b129	Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." again. One more, hopefully the last, bug is fixed: when forming UsesToRewrite we should ignore phi operands coming from edges that we want to delete. This reverts r329910. llvm-svn: 330175	2018-04-17 04:45:22 +00:00
Craig Topper	60c7e0d587	[X86] Remove unnecessary -mattr to enable avx512bw when the -mcpu already enabled it. NFC This makes the test similar to the arith-sub.ll and arith-mul.ll tests. llvm-svn: 330144	2018-04-16 18:14:19 +00:00
Haicheng Wu	f7466f3164	[SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair We use getExtractWithExtendCost to calculate the cost of extractelement and s\|zext together when computing the extract cost after vectorization, but we calculate the cost of extractelement and s\|zext separately when computing the scalar cost which is larger than it should be. Differential Revision: https://reviews.llvm.org/D45469 llvm-svn: 330143	2018-04-16 18:09:49 +00:00
Sanjay Patel	1170daa277	[InstCombine] simplify fneg+fadd folds; NFC Two cleanups: 1. As noted in D45453, we had tests that don't need FMF that were misplaced in the 'fast-math.ll' test file. 2. This removes the final uses of dyn_castFNegVal, so that can be deleted. We use 'match' now. llvm-svn: 330126	2018-04-16 14:13:57 +00:00
Roman Lebedev	f84bfb2147	[InstCombine] Simplify 'xor' to 'or' if no common bits are set. Summary: In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]], let's first handle the simple straight-forward things. Let's start with the `and` -> `or` simplification. The one obvious thing missing here: the constant mask is not handled. I have an idea how to handle it, but it will require some thinking, and is not strictly required here, so i've left that for later. https://rise4fun.com/Alive/Pkmg Reviewers: spatel, craig.topper, eli.friedman, jingyue Reviewed By: spatel Subscribers: llvm-commits Was reviewed as part of https://reviews.llvm.org/D45631 llvm-svn: 330103	2018-04-15 18:59:44 +00:00
Roman Lebedev	620b3da38f	[InstCombine] Simplify 'add' to 'or' if no common bits are set. Summary: In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]], let's first handle the simple straight-forward things. Let's start with the `and` -> `or` simplification. The one obvious thing missing here: the constant mask is not handled. I have an idea how to handle it, but it will require some thinking, and is not strictly required here, so i've left that for later. https://rise4fun.com/Alive/Pkmg Reviewers: spatel, craig.topper, eli.friedman, jingyue Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45631 llvm-svn: 330101	2018-04-15 18:59:33 +00:00
Warren Ristow	8b2f27ce3a	[InstCombine] Enable Add/Sub simplifications with only 'reassoc' FMF These simplifications were previously enabled only with isFast(), but that is more restrictive than required. Since r317488, FMF has 'reassoc' to control these cases at a finer level. llvm-svn: 330089	2018-04-14 19:18:28 +00:00
Sanjay Patel	713f09d014	[InstCombine] add shift+logic tests (PR37098); NFC It debateable whether instcombine should be in the business of reassociation, but it is currently. These tests and PR37098 demonstrate a missing ability to do a simple reassociation that allows eliminating shifts. If we decide that functionality belongs somewhere else, then we should still have some tests to show that we've intentionally limited instcombine to not include this ability. llvm-svn: 330086	2018-04-14 13:39:02 +00:00
Roman Lebedev	8db3e115e7	[InstCombine][NFC] masked-merge: add 'and' tests, too. (and plain 'or', for completeness sake.) After submitting D45631, i have realized that it will already affect 'and' pattern, and it was obvious that there were no good test patterns to show that. Since the masked-merge.ll is getting kinda big, unify naming schemes a bit, and split into 'xor'/'and'/'or' testfiles, with the only difference being the last operation. llvm-svn: 330072	2018-04-13 21:57:01 +00:00
Krzysztof Parzyszek	dfed941eec	[LV] Introduce TTI::getMinimumVF The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062	2018-04-13 20:16:32 +00:00
Roman Lebedev	fe6a0b9a65	[InstCombine][NFC] masked-merge: commutativity tests: ensure the ordering. This was intended since initially, but i did not really think about it, and did not know how to force that. Now that the xor->or fold is working (patch upcoming), this came up to improve the test coverage. A followup for rL330003, rL330007 https://bugs.llvm.org/show_bug.cgi?id=6773 llvm-svn: 330039	2018-04-13 17:15:55 +00:00
Roman Lebedev	4899a9cc89	[InstCombine][NFC] Regenerate logical-select.ll test llvm-svn: 330017	2018-04-13 14:07:29 +00:00
Roman Lebedev	53e423ed1e	[InstCombine][NFC] Add last few tests with constant mask for masked merge folding. A followup for rL330003 https://bugs.llvm.org/show_bug.cgi?id=6773 llvm-svn: 330007	2018-04-13 12:00:00 +00:00
Roman Lebedev	038d996c80	[InstCombine][NFC] Add tests for masked merge folding. https://bugs.llvm.org/show_bug.cgi?id=6773 As discussed there, some backends may want to undo this fold (x86+bmi for scalars, x86+sse for vectors, ...) https://bugs.llvm.org/show_bug.cgi?id=37104 https://rise4fun.com/Alive/JXt llvm-svn: 330003	2018-04-13 10:56:35 +00:00
Roman Lebedev	c00659328a	[InstCombine]: foldSelectICmpAndAnd(): and is commutative Summary: The fold added in D45108 did not account for the fact that the and instruction is commutative, and if the mask is a variable, the mask variable and the fold variable may be swapped. I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] This extends/generalizes that fold, so it is handled too. Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45539 llvm-svn: 330001	2018-04-13 09:57:57 +00:00
Craig Topper	254ed028a4	[X86] Remove the pmuldq/pmuldq intrinsics and replace with native IR. This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics. We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well. llvm-svn: 329990	2018-04-13 06:07:18 +00:00
Vedant Kumar	65b0d4df20	[DebugInfo] Create merged locations for instructions other than calls This lifts a restriction on DILocation::getMergedLocation(), allowing it to create merged locations for instructions other than calls. Instruction::applyMergedLocation() now defaults to creating merged locations for all instructions. The default behavior of getMergedLocation() is unchanged: callers which invoke it directly are unaffected. This change will enable a follow-up Mem2Reg fix which improves crash reporting. Differential Revision: https://reviews.llvm.org/D45396 llvm-svn: 329955	2018-04-12 20:58:24 +00:00
Sam Parker	9737535943	[IRCE] isKnownNonNegative helper function Created a helper function to query for non negative SCEVs. Uses the SGE predicate to catch constants that could be interpreted as negative. Differential Revision: https://reviews.llvm.org/D45481 llvm-svn: 329907	2018-04-12 12:49:40 +00:00
Roman Lebedev	53271ba1d2	[InstCombine][NFC]: Add tests: foldSelectICmpAndAnd(): and is commutative Summary: The fold added in D45108 did not account for the fact that the and instruction is commutative, and if the mask is a variable, the mask variable and the fold variable may be swapped. I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 \| PR6773 ]] Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45538 llvm-svn: 329901	2018-04-12 12:04:57 +00:00
Hiroshi Inoue	bcadfee2ad	[NFC] fix trivial typos in documents and comments "is is" -> "is", "if if" -> "if", "or or" -> "or" llvm-svn: 329878	2018-04-12 05:53:20 +00:00
George Burgess IV	48ee59b6f0	[DeadArgElim] Remove allocsize attributes on callsites We're already removing allocsize attributes from Functions that we remove args from, since removing arguments from a function may make the allocsize attribute incorrect. It appears we forgot to also remove them from callsites. Without this, I get verifier errors on `@Test2`. It probably wouldn't be too hard to make DAE properly update allocsize attributes instead of dropping them, but I can't think of a scenario where that'd be useful in practice. llvm-svn: 329868	2018-04-12 02:06:01 +00:00
Daniel Neilson	381cdf3e07	[DSE] Add tests for atomic memory intrinsics (NFC) Summary: These tests show that DSE currently does nothing with the atomic memory intrinsics. Future work will teach DSE how to simplify these. llvm-svn: 329845	2018-04-11 19:46:02 +00:00
Daniel Neilson	9cfa786faa	[DSE] Regenerate tests with update_test_checks.py (NFC) Summary: In preparation for a future commit, this regenerates the test checks for test/Transforms/DeadStoreElimination/OverwriteStoreBegin.ll test/Transforms/DeadStoreElimination/OverwriteStoreEnd.ll llvm-svn: 329839	2018-04-11 18:43:10 +00:00
Daniel Neilson	7e2e5c3c58	[DSE] Regenerate tests with update_test_checks.py (NFC) Summary: In preparation for a future commit, this regenerates the test checks for test/Transforms/DeadStoreElimination/simple.ll test/Transforms/DeadStoreElimination/memintrinsics.ll llvm-svn: 329824	2018-04-11 16:50:04 +00:00
Sanjay Patel	ff98682c9c	[InstCombine] limit X - (cast(-Y) --> X + cast(Y) with hasOneUse() llvm-svn: 329821	2018-04-11 15:57:18 +00:00
Haicheng Wu	5ba379557d	[SLP] update a test case. NFC. llvm-svn: 329818	2018-04-11 15:09:49 +00:00
Artur Gainullin	d928201ac5	Eliminate a bitwise 'not' op of 'not' min/max by inverting the min/max. Bitwise 'not' of the min/max could be eliminated in the pattern: %notx = xor i32 %x, -1 %cmp1 = icmp sgt[slt/ugt/ult] i32 %notx, %y %smax = select i1 %cmp1, i32 %notx, i32 %y %res = xor i32 %smax, -1 https://rise4fun.com/Alive/lCN Reviewers: spatel Reviewed by: spatel Subscribers: a.elovikov, llvm-commits Differential Revision: https://reviews.llvm.org/D45317 llvm-svn: 329791	2018-04-11 10:29:37 +00:00
Marek Olsak	a9a58fa236	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764	2018-04-10 22:48:23 +00:00
Sanjay Patel	3b6d46761f	[CVP] simplify phi with constant incoming values that match common variable edge values This is based on an example that was recently posted on llvm-dev: void propagate_null(void b, int* g) { if (!b) { return 0; } (*g)++; return b; } https://godbolt.org/g/xYk3qG The original code or constant propagation in other passes has obscured the fact that the phi can be removed completely. Differential Revision: https://reviews.llvm.org/D45448 llvm-svn: 329755	2018-04-10 20:42:39 +00:00
Michael Zolotukhin	aa7868594e	[SSAUpdaterBulk] Handle CFG with unreachable from entry blocks. llvm-svn: 329660	2018-04-10 02:16:29 +00:00
Zhaoshi Zheng	43af17be41	[MemorySSAUpdater] Mark Phi users of a node being moved as non-optimize Fix PR36484, as suggested: <quote> during moves, mark the direct users of the erased things that were phis as "not to be optimized" <quote> llvm-svn: 329621	2018-04-09 20:55:37 +00:00
Alexey Bataev	2f67dbb73e	[SLP] Additional tests for reorder reuse vectorization, NFC. llvm-svn: 329603	2018-04-09 19:02:34 +00:00
Xin Tong	0efadbbcde	[MergeICmp] Split blocks that do other work. Summary: We do not try to move the instructions and split the block till we know the blocks can be split, i.e. BCE-cmp-insts can be separated from non-BCE-cmp-insts. Reviewers: davide, courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44443 llvm-svn: 329564	2018-04-09 13:14:06 +00:00
Max Kazantsev	8624a4786a	[IRCE] Relax restriction on collected range checks In IRCE, we have a very old legacy check that works when we collect comparisons that we treat as range checks. It ensures that the value against which the indvar is compared is loop invariant and is also positive. This latter condition remained there since the times when IRCE was only able to handle signed latch comparison. As the optimization evolved, it now learned how to intersect signed or unsigned ranges, and this logic has no reliance on the fact that the right border of each range should be positive. The old implementation of this non-negativity check was also naive enough and just looked into ranges (while most of other IRCE logic tries to use power of SCEV implications), so this check did not allow to deal with the most simple case that looks like follows: int size; // not known non-negative int length; //known non-negative; i = 0; if (size != 0) { do { range_check(i < size); range_check(i < length); ++i; } while (i < size) } In this case, even if from some dominating conditions IRCE could parse loop structure, it could only remove the range check against `length` and simply ignored the check against `size`. In this patch we remove this obsolete check. It will allow IRCE to pick comparison against `size` as a potential range check and then let Range Intersection logic decide whether it is OK to eliminate it or not. Differential Revision: https://reviews.llvm.org/D45362 Reviewed By: samparker llvm-svn: 329547	2018-04-09 06:01:22 +00:00
Piotr Padlewski	ac5abbfdd9	NFC: Update NewGVN invariant.group test llvm-svn: 329533	2018-04-08 16:04:09 +00:00
Sanjay Patel	de9f7458a4	[InstCombine] add/move tests for fsub folds; NFC There are a pair of folds that try to merge fneg into fsub with an intervening cast, but as shown in the FIXME tests, they can create extra instructions. llvm-svn: 329501	2018-04-07 14:07:58 +00:00
Roman Lebedev	41922f1a6d	[InstCombine] Get rid of select of bittest (PR36950 / PR17564) Summary: See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 \| PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 \| PR17564 ]], D45065, D45107 https://godbolt.org/g/iAYRup Alive proof: https://rise4fun.com/Alive/uiH Testing: `ninja check-llvm` Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45108 llvm-svn: 329492	2018-04-07 10:37:24 +00:00
Vitaly Buka	66f53d71f7	Runtime flag to control branch funnel threshold Reviewers: pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45193 llvm-svn: 329459	2018-04-06 21:32:36 +00:00
Sanjay Patel	a9ca709011	[InstCombine] limit nsz: -(X - Y) --> Y - X to hasOneUse() As noted in the post-commit discussion for r329350, we shouldn't generally assume that fsub is the same cost as fneg. llvm-svn: 329429	2018-04-06 17:24:08 +00:00
Sanjay Patel	a6823f0e67	[InstCombine] add test for fsub+fneg with extra use; NFC llvm-svn: 329418	2018-04-06 16:30:52 +00:00
Sanjay Patel	bafdf97632	[InstCombine] add potential calloc tests and regenerate checks; NFC D45344 is proposing to remove the use restriction that made the calloc transform safe, but it doesn't currently address the problematic example given inD16337. Add a test to make sure that doesn't break. llvm-svn: 329412	2018-04-06 16:06:08 +00:00
Mircea Trofin	aa3fea6cb0	[GlobalOpt] Fix support for casts in ctors. Summary: Fixing an issue where initializations of globals where constructors use casts were silently translated to 0-initialization. Reviewers: davidxl, evgeny777 Reviewed By: evgeny777 Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45198 llvm-svn: 329409	2018-04-06 15:54:47 +00:00
Chad Rosier	45735b8e40	[LoopUnroll] Make LoopPeeling respect the AllowPeeling preference. The SimpleLoopUnrollPass isn't suppose to perform loop peeling. Differential Revision: https://reviews.llvm.org/D45334 llvm-svn: 329395	2018-04-06 13:57:21 +00:00
Hans Wennborg	b230c763a4	EntryExitInstrumenter: Handle musttail calls Inserting instrumentation between a musttail call and ret instruction would create invalid IR. Instead, treat musttail calls as function exits. llvm-svn: 329385	2018-04-06 10:14:09 +00:00
Sanjay Patel	04683de82f	[InstCombine] FP: Z - (X - Y) --> Z + (Y - X) This restores what was lost with rL73243 but without re-introducing the bug that was present in the old code. Note that we already have these transforms if the ops are marked 'fast' (and I assume that's happening somewhere in the code added with rL170471), but we clearly don't need all of 'fast' for these transforms. llvm-svn: 329362	2018-04-05 23:21:15 +00:00
Sanjay Patel	715ba65317	[InstCombine] add FP tests for Z - (X - Y); NFC A fold for this pattern was removed at rL73243 to fix PR4374: https://bugs.llvm.org/show_bug.cgi?id=4374 ...and apparently there were no tests that went with that fold. llvm-svn: 329360	2018-04-05 22:56:54 +00:00
Sanjay Patel	03e2526728	[InstCombine] nsz: -(X - Y) --> Y - X This restores part of the fold that was removed with rL73243 (PR4374). llvm-svn: 329350	2018-04-05 21:37:17 +00:00
Roman Lebedev	daa8da1ff4	[InstCombine][NFC] Regenerate select-of-bittest.ll with instnamer pass As requested by spatel in https://reviews.llvm.org/D45329 llvm-svn: 329349	2018-04-05 21:34:59 +00:00
Roman Lebedev	be9a226e21	[InstCombine] [NFC] Add more tests for getting rid of select of bittest (D45108, PR36950 / PR17564) Summary: More tests for D45108: * One use tests * allow shift to be a variable, too Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45329 llvm-svn: 329348	2018-04-05 21:34:53 +00:00
Daniel Neilson	367c2aea4e	[InstCombine] Properly change GEP type when reassociating loop invariant GEP chains Summary: This is a fix to PR37005. Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug whereby it will convert: %src = getelementptr inbounds i8, i8* %base, <2 x i64> %val %res = getelementptr inbounds i8, <2 x i8> %src, i64 %val2 into: %src = getelementptr inbounds i8, i8 %base, i64 %val2 %res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val By swapping the index operands if the GEPs are in a loop, and %val is loop variant while %val2 is loop invariant. This fix recreates new GEP instructions if the index operand swap would result in the type of %src changing from vector to scalar, or vice versa. Reviewers: sebpop, spatel Reviewed By: sebpop Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45287 llvm-svn: 329331	2018-04-05 18:51:45 +00:00
Sanjay Patel	37248d35c3	[InstCombine] add test for fneg+fsub with nsz; NFC There used to be a fold that would handle this case more generally, but it was removed at rL73243 to fix PR4374: https://bugs.llvm.org/show_bug.cgi?id=4374 llvm-svn: 329322	2018-04-05 17:40:51 +00:00
Sanjay Patel	deaf4f354e	[InstCombine] use pattern matchers for fsub --> fadd folds This allows folding for vectors with undef elements. llvm-svn: 329316	2018-04-05 17:06:45 +00:00
Sanjay Patel	7becb3ae4b	[InstCombine] add tests for fsub --> fadd; NFC llvm-svn: 329313	2018-04-05 16:51:09 +00:00
Sanjay Patel	2204520e49	[PatternMatch] define m_FNeg using m_FSub Using cstfp_pred_ty in the definition allows us to match vectors with undef elements. This replicates the change for m_Not from D44076 / rL326823 and continues towards making all pattern matchers allow undef elements in vectors. llvm-svn: 329303	2018-04-05 15:36:55 +00:00
Sanjay Patel	2eaa2a43f8	[InstCombine] add vector and vector undef tests for FP folds; NFC llvm-svn: 329294	2018-04-05 15:07:35 +00:00
Florian Hahn	8eda4b0dc0	[LoopInterchange] Require asserts for test using -stats (NFC) This fixes a buildbot failure. llvm-svn: 329279	2018-04-05 13:07:39 +00:00
Florian Hahn	6e0043365b	[LoopInterchange] Add stats counter for number of interchanged loops. Reviewers: samparker, karthikthecool, blitz.opensource Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D45209 llvm-svn: 329269	2018-04-05 10:39:23 +00:00
Florian Hahn	831a757728	[LoopInterchange] Preserve LoopInfo after interchanging. LoopInterchange relies on LoopInfo being up-to-date, so we should preserve it after interchanging. This patch updates restructureLoops to move the BBs of the interchanged loops to the right place. Reviewers: davide, efriedma, karthikthecool, mcrosier Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D45278 llvm-svn: 329264	2018-04-05 09:48:45 +00:00
Taewook Oh	e0db533feb	[CallSiteSplitting] Do not perform callsite splitting inside landing pad Summary: If the callsite is inside landing pad, do not perform callsite splitting. Callsite splitting uses utility function llvm::DuplicateInstructionsInSplitBetween, which eventually calls llvm::SplitEdge. llvm::SplitEdge calls llvm::SplitCriticalEdge with an assumption that the function returns nullptr only when the target edge is not a critical edge (and further assumes that if the return value was not nullptr, the predecessor of the original target edge always has a single successor because critical edge splitting was successful). However, this assumtion is not true because SplitCriticalEdge returns nullptr if the destination block is a landing pad. This invalid assumption results assertion failure. Fundamental solution might be fixing llvm::SplitEdge to not to rely on the invalid assumption. However, it'll involve a lot of work because current API assumes that llvm::SplitEdge never fails. Instead, this patch makes callsite splitting to not to attempt splitting if the callsite is in a landing pad. Attached test case will crash with assertion failure without the fix. Reviewers: fhahn, junbuml, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45130 llvm-svn: 329250	2018-04-05 04:16:23 +00:00
Vitaly Buka	4296ea72ff	Don't inline @llvm.icall.branch.funnel Summary: @llvm.icall.branch.funnel is musttail with variable number of arguments. After inlining current backend can't separate call targets from call arguments. Reviewers: pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45116 llvm-svn: 329235	2018-04-04 21:46:27 +00:00
Eric Fiselier	96bbec79b4	[Analysis] Support aligned new/delete functions. Summary: Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well. This allows the compiler to perform certain optimizations including eliding new/delete calls. Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer Reviewed By: bkramer Subscribers: ckennelly, llvm-commits Differential Revision: https://reviews.llvm.org/D44769 llvm-svn: 329218	2018-04-04 19:01:51 +00:00
Eric Fiselier	e03d45fa8e	Revert "[Analysis] Support aligned new/delete functions." This reverts commit bee3bbd9bdd3ab3364b8fb0cdb6326bc1ae740e0. llvm-svn: 329217	2018-04-04 18:23:00 +00:00
Eric Fiselier	0d5f3b0281	[Analysis] Support aligned new/delete functions. Summary: Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well. This allows the compiler to perform certain optimizations including eliding new/delete calls. Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer Reviewed By: bkramer Subscribers: ckennelly, llvm-commits Differential Revision: https://reviews.llvm.org/D44769 llvm-svn: 329215	2018-04-04 18:12:01 +00:00
Roman Lebedev	c0c9ba7ee0	[InstCombine] [NFC] Add tests for getting rid of select of bittest (PR36950 / PR17564) Summary: See [[ https://bugs.llvm.org/show_bug.cgi?id=36950 \| PR36950 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=17564 \| PR17564 ]], D45065, D45108 Reviewers: spatel, craig.topper Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45107 llvm-svn: 329198	2018-04-04 14:10:13 +00:00
Simon Pilgrim	f1e668830f	[SLPVectorizer][X86] Regenerate some tests. NFCI llvm-svn: 329196	2018-04-04 13:53:51 +00:00
Nicolai Haehnle	eb7311ffb1	StructurizeCFG: Test for branch divergence correctly Fixes cases like the new test @nonuniform. In that test, %cc itself is a uniform value; however, when reading it after the end of the loop in basic block %if, its value is effectively non-uniform, so the branch is non-uniform. This problem was encountered in https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change in itself is not sufficient to fix that bug, as there is another issue in the AMDGPU backend. As discovered after committing an earlier version of this change, this exposes a subtle interaction between this pass and DivergenceAnalysis: since we remove and re-create branch instructions, we can no longer rely on DivergenceAnalysis for branches in subregions that were already processed by the pass. Explicitly remove branch instructions from DivergenceAnalysis to avoid dangling pointers as a matter of defensive programming, and change how we detect non-uniform subregions. Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4 Differential Revision: https://reviews.llvm.org/D43743 llvm-svn: 329165	2018-04-04 10:58:15 +00:00
Max Kazantsev	613af1f7ca	[SCEV] Prove implications for SCEVUnknown Phis This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis. If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values `L1, L2, ..., LN`, then if we prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)` then we can also prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient to prove the predicate for values that come from the same predecessor block. The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0` given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot. Differential Revision: https://reviews.llvm.org/D44001 Reviewed By: anna llvm-svn: 329150	2018-04-04 05:46:47 +00:00
Craig Topper	7d3aba6687	[SimplifyCFG] Teach merge conditional stores to handle cases where the PostBB has more than 2 predecessors by inserting a new block for the store. Summary: Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors. This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store. Reviewers: efriedma, jmolloy, ABataev Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39760 llvm-svn: 329142	2018-04-04 03:47:17 +00:00
Sanjay Patel	81b3b10a95	[InstCombine] allow more fmul folds with 'reassoc' The tests marked with 'FIXME' require loosening the check in SimplifyAssociativeOrCommutative() to optimize completely; that's still checking isFast() in Instruction::isAssociative(). llvm-svn: 329121	2018-04-03 22:19:19 +00:00
Gor Nishanov	d4712715dd	[coroutines] Respect alloca alignment requirements when building coroutine frame Summary: If an alloca need to be stored in the coroutine frame and it has an alignment specified and the alignment does not match the natural alignment of the alloca type. Insert appropriate padding into the coroutine frame to make sure that it gets requested alignment. For example for a packet type (which natural alignment is 1), but alloca alignment is 8, we may need to insert a padding field with required number of bytes to make sure it is properly aligned. ``` %PackedStruct = type <{ i64 }> ... %data = alloca %PackedStruct, align 8 ``` If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame: ``` %f.Frame = type { ..., i16, [6 x i8], %PackedStruct } ``` Reviewers: rnk, lewissbaker, modocache Reviewed By: modocache Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D45221 llvm-svn: 329112	2018-04-03 20:54:20 +00:00
Florian Hahn	9467ccf447	[LoopInterchange] Add remark for calls preventing interchanging. It also updates test/Transforms/LoopInterchange/call-instructions.ll to use accesses where we can prove dependence after D35430. Reviewers: sebpop, karthikthecool, blitz.opensource Reviewed By: sebpop Differential Revision: https://reviews.llvm.org/D45206 llvm-svn: 329111	2018-04-03 20:54:04 +00:00
Daniel Neilson	901acfab0c	[InstCombine] Fold compare of int constant against a splatted vector of ints Summary: Folding patterns like: %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer %cast = bitcast <4 x i8> %vec to i32 %cond = icmp eq i32 %cast, 0 into: %ext = extractelement <4 x i8> %insvec, i32 0 %cond = icmp eq i32 %ext, 0 Combined with existing rules, this allows us to fold patterns like: %insvec = insertelement <4 x i8> undef, i8 %val, i32 0 %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer %cast = bitcast <4 x i8> %vec to i32 %cond = icmp eq i32 %cast, 0 into: %cond = icmp eq i8 %val, 0 When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant. Reviewers: spatel, anna, mkazantsev Reviewed By: spatel Subscribers: efriedma, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D44997 llvm-svn: 329087	2018-04-03 17:26:20 +00:00
Alexey Bataev	428e9d9d87	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Patch does not support reordering of the repeated instruction, this must be handled in the separate patch. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 329085	2018-04-03 17:14:47 +00:00
Florian Hahn	b79217077d	[LoopInterchange] Update tests so DA can handle access after D35430. I have taken the opportunity to simplify some tests slightly and move parts around. It also brings back a few IR checks for interchangable loops. Reviewers: karthikthecool, sebpop, grosser Reviewed By: sebpop Differential Revision: https://reviews.llvm.org/D45207 llvm-svn: 329081	2018-04-03 16:37:58 +00:00
Alexey Bataev	976aff148a	[SLP] Added tests for checks of reordering of the repeated instructions, NFC. llvm-svn: 329080	2018-04-03 16:31:26 +00:00
Benjamin Kramer	2fc3b18922	Revert "[SLP] Fix PR36481: vectorize reassociated instructions." This reverts commit r328980 and r329046. Makes the vectorizer crash. llvm-svn: 329071	2018-04-03 14:40:33 +00:00
Ikhlas Ajbar	b7322e8ac7	peel loops with runtime small trip counts For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 329042	2018-04-03 03:39:43 +00:00
Haicheng Wu	7f0daaeb86	[SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035	2018-04-03 00:05:10 +00:00
Brian Gesiak	64521bed0d	[Coroutines] Avoid assert splitting hidden coros Summary: When attempting to split a coroutine with 'hidden' visibility (for example, a C++ coroutine that is inlined when compiled with the option '-fvisibility-inlines-hidden'), LLVM would hit an assertion in include/llvm/IR/GlobalValue.h:240: "local linkage requires default visibility". The issue is that the visibility is copied from the source of the function split in the `CloneFunctionInto` function, but the linkage is not. To fix, create the new function first with external linkage, then copy the linkage from the original function after `CloneFunctionInto` is called. Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the explicit call to `setDSOLocal` can be removed in CoroSplit.cpp. Test Plan: check-llvm Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk Reviewed By: rnk Subscribers: llvm-commits, eric_niebler Differential Revision: https://reviews.llvm.org/D44185 llvm-svn: 329033	2018-04-02 23:39:40 +00:00
Reid Kleckner	298ffc609b	[InstCombine] Don't strip function type casts from musttail calls Summary: The cast simplifications that instcombine does here do not make any attempt to obey the verifier rules for musttail calls. Therefore we have to disable them. Reviewers: efriedma, majnemer, pcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45186 llvm-svn: 329027	2018-04-02 22:49:44 +00:00
Reid Kleckner	a9e9918ee4	Treat inlining a notail call as a regular, non-tail call Otherwise, we end up inlining a musttail call into a non-tail position, which breaks verifier invariants. Fixes PR31014 llvm-svn: 329015	2018-04-02 21:23:16 +00:00
Sanjay Patel	cbb0450540	[InstCombine] add folds for icmp + sub (PR36969) (A - B) >u A --> A <u B C <u (C - D) --> C <u D https://rise4fun.com/Alive/e7j Name: ugt %sub = sub i8 %x, %y %cmp = icmp ugt i8 %sub, %x => %cmp = icmp ult i8 %x, %y Name: ult %sub = sub i8 %x, %y %cmp = icmp ult i8 %x, %sub => %cmp = icmp ult i8 %x, %y This should fix: https://bugs.llvm.org/show_bug.cgi?id=36969 llvm-svn: 329011	2018-04-02 20:37:40 +00:00
Sanjay Patel	be0442eeaa	[InstCombine] add tests for icmp (sub x, y), x (PR36969); NFC llvm-svn: 329010	2018-04-02 20:23:54 +00:00
Rong Xu	5a8d4c3357	[DeadArgumentElim] Clone function level metadatas Some Function level metadatas, such as function entry count, are not cloned in DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim after internalization. This patch clones the metadatas in the original function to the new function. Differential Revision: https://reviews.llvm.org/D44127 llvm-svn: 328991	2018-04-02 17:27:38 +00:00
Gor Nishanov	b0316d96ae	[coroutines] Add support for llvm.coro.noop intrinsics Summary: A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing. To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed. Reviewers: EricWF, modocache, rnk, lewissbaker Reviewed By: modocache Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45114 llvm-svn: 328986	2018-04-02 16:55:12 +00:00
Alexey Bataev	3decaf4275	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 328980	2018-04-02 14:51:37 +00:00
Teresa Johnson	974706ebf7	[ThinLTO] Add an import cutoff for debugging/triaging Summary: Adds -import-cutoff=N which will stop importing during the thin link after N imports. Default is -1 (no limit). Reviewers: wmi Subscribers: inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D45127 llvm-svn: 328934	2018-04-01 15:54:40 +00:00
David Green	f80ebc8d21	[LoopRotate] Rotate loops with loop exiting latches If a loop has a loop exiting latch, it can be profitable to rotate the loop if it leads to the simplification of a phi node. Perform rotation in these cases even if loop rotate itself didnt simplify the loop to get there. Differential Revision: https://reviews.llvm.org/D44199 llvm-svn: 328933	2018-04-01 12:48:24 +00:00
Teresa Johnson	db83aceb06	[ThinLTO] Add an option to force summary call edges cold for debugging Summary: Useful to selectively disable importing into specific modules for debugging/triaging/workarounds. Reviewers: eraman Subscribers: inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D45062 llvm-svn: 328909	2018-03-31 00:18:08 +00:00
Krzysztof Parzyszek	fce30c2ba3	Revert "peel loops with runtime small trip counts" This reverts commit r328854, it breaks some Hexagon tests. llvm-svn: 328875	2018-03-30 16:55:44 +00:00
Ikhlas Ajbar	a7d614c3e5	[Hexagon] add missing lit config file llvm-svn: 328855	2018-03-30 03:32:24 +00:00
Ikhlas Ajbar	66c8ba5a50	peel loops with runtime small trip counts For Hexagon, peeling loops with small runtime trip count is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set by the target for computing the desired peel count. Differential Revision: https://reviews.llvm.org/D44880 llvm-svn: 328854	2018-03-30 03:05:34 +00:00
Dinar Temirbulatov	c326c1c582	[SLPVectorizer] Add tests related to PR30787, NFCI. llvm-svn: 328813	2018-03-29 18:57:03 +00:00
Haicheng Wu	c7cc87922e	[JumpThreading] Don't select an edge that we know we can't thread In r312664 (D36404), JumpThreading stopped threading edges into loop headers. Unfortunately, I observed a significant performance regression as a result of this change. Upon further investigation, the problematic pattern looked something like this (after many high level optimizations): while (true) { bool cond = ...; if (!cond) { <body> } if (cond) break; } Now, naturally we want jump threading to essentially eliminate the second if check and hook up the edges appropriately. However, the above mentioned change, prevented it from doing this because it would have to thread an edge into the loop header. Upon further investigation, what is happening is that since both branches are threadable, JumpThreading picks one of them at arbitrarily. In my case, because of the way that the IR ended up, it tended to pick the one to the loop header, bailing out immediately after. However, if it had picked the one to the exit block, everything would have worked out fine (because the only remaining branch would then be folded, not thraded which is acceptable). Thus, to fix this problem, we can simply eliminate loop headers from consideration as possible threading targets earlier, to make sure that if there are multiple eligible branches, we can still thread one of the ones that don't target a loop header. Patch by Keno Fischer! Differential Revision: https://reviews.llvm.org/D42260 llvm-svn: 328798	2018-03-29 16:01:26 +00:00
Rong Xu	662f38b16f	[PGO] Fix branch probability remarks assert Fixed counter/weight overflow that leads to an assertion. Also fixed the help string for pgo-emit-branch-prob option. Differential Revision: https://reviews.llvm.org/D44809 llvm-svn: 328653	2018-03-27 18:55:56 +00:00
Sam Parker	90b7f4f72c	[IRCE] Enable decreasing loops of non-const bound As a follow-up to r328480, this updates the logic for the decreasing safety checks in a similar manner: - CanBeMax is replaced by CannotBeMaxInLoop which queries isLoopEntryGuardedByCond on the maximum value. - SumCanReachMin is replaced by isSafeDecreasingBound which includes some logic from parseLoopStructure and, again, has been updated to use isLoopEntryGuardedByCond on the given bounds. Differential Revision: https://reviews.llvm.org/D44776 llvm-svn: 328613	2018-03-27 08:24:53 +00:00
Max Kazantsev	7094c8deb2	[SCEV] Make exact taken count calculation more optimistic Currently, `getExact` fails if it sees two exit counts in different blocks. There is no solid reason to do so, given that we only calculate exact non-taken count for exiting blocks that dominate latch. Using this fact, we can simply take min out of all exits of all blocks to get the exact taken count. This patch makes the calculation more optimistic with enforcing our assumption with asserts. It allows us to calculate exact backedge taken count in trivial loops like for (int i = 0; i < 100; i++) { if (i > 50) break; . . . } Differential Revision: https://reviews.llvm.org/D44676 Reviewed By: fhahn llvm-svn: 328611	2018-03-27 07:30:38 +00:00
Max Kazantsev	a63d333881	[SCEV] Add one more case in computeConstantDifference This patch teaches `computeConstantDifference` handle calculation of constant difference between `(X + C1)` and `(X + C2)` which is `(C2 - C1)`. Differential Revision: https://reviews.llvm.org/D43759 Reviewed By: anna llvm-svn: 328609	2018-03-27 04:54:00 +00:00
Eli Friedman	88e2bac94d	[MemorySSA] Fix exponential compile-time updating MemorySSA. MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm, for each block, it computes the previous definition for each predecessor, then takes those definitions and combines them. But currently it doesn't remember results which it already computed; this means it can visit the same block multiple times, which adds up to exponential time overall. To fix this, this patch adds a cache. If we computed the result for a block already, we don't need to visit it again because we'll come up with the same result. Well, unless we RAUW a MemoryPHI; in that case, the TrackingVH will be updated automatically. This matches the original source paper for this algorithm. The testcase isn't really a test for the bug, but it adds coverage for the case where tryRemoveTrivialPhi erases an existing PHI node. (It's hard to write a good regression test for a performance issue.) Differential Revision: https://reviews.llvm.org/D44715 llvm-svn: 328577	2018-03-26 19:52:54 +00:00
Haicheng Wu	b45f921678	[SLP] Add more checks to a test case. NFC. llvm-svn: 328572	2018-03-26 18:59:28 +00:00
Haicheng Wu	0ec1dbe417	[SLP] Add a test case. NFC. llvm-svn: 328546	2018-03-26 16:47:37 +00:00
Sebastian Pop	d870aea03e	[InstCombine] reassociate loop invariant GEP chains to enable LICM This change brings performance of zlib up by 10%. The example below is from a hot loop in longest_match() from zlib. do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1 In this example %idx.ext1 is a loop invariant. It will be moved above the use of loop induction variable %idx.ext such that it can be hoisted out of the loop by LICM. The operands that have dependences carried by the loop will be sinked down in the GEP chain. This patch will produce the following output: do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1 %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext llvm-svn: 328539	2018-03-26 16:19:31 +00:00
Sanjay Patel	4fd4fd610c	[InstCombine] distribute fmul over fadd/fsub This replaces a large chunk of code that was looking for compound patterns that include these sub-patterns. Existing tests ensure that all of the previous examples are still folded as expected. We still need to loosen the FMF check. llvm-svn: 328502	2018-03-26 15:03:57 +00:00
Sanjay Patel	2455fef497	[InstCombine] check uses before creating instructions for fmul distribution As the tests show, we could create extra instructions without any obvious benefit. llvm-svn: 328498	2018-03-26 14:25:43 +00:00
Max Kazantsev	a55749312b	[LoopUnroll] Fix dangling pointers in SCEV Current logic of loop SCEV invalidation in Loop Unroller implicitly relies on fact that exit count of outer loops cannot rely on exiting blocks of inner loops, which is true in current implementation of backedge taken count calculation but is wrong in general. As result, when we only forget the loop that we have just unrolled, we may still have cached data for its outer loops (in particular, exit counts) which keeps references on blocks of inner loop that could have been changed or even deleted. The attached test demonstrates a situaton when after unrolling of innermost loop the outermost loop contains a dangling pointer on non-existant block. The problem shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV smarter about exit count calculation. I am not sure if the bug exists without this patch, it appears that now it is accidentally correct just because in practice exact backedge taken count for outer loops with complex control flow inside is never calculated. But when SCEV learns to do so, this problem shows up. This patch replaces existing logic of SCEV loop invalidation with a correct one, which happens to be invalidation of outermost loop (which also leads to invalidation of all loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers on removed blocks, or just outdated information that has changed after unrolling. Differential Revision: https://reviews.llvm.org/D44818 Reviewed By: samparker llvm-svn: 328483	2018-03-26 11:31:46 +00:00
Benjamin Kramer	8840f644b4	[DeadArgElim] Strip allocsize attributes when deleting an argument. Since allocsize refers to the argument number it gets invalidated when an argument is removed and the numbers shift. llvm-svn: 328481	2018-03-26 09:44:24 +00:00
Sam Parker	53a423a417	[IRCE] Enable increasing loops of variable bounds CanBeMin is currently used which will report true for any unknown values, but often a check is performed outside the loop which covers this situation: for (int i = 0; i < N; ++i) ... if (N > 0) for (int i = 0; i < N; ++i) ... So I've add 'LoopGuardedAgainstMin' which reports whether N is greater than the minimum value which then allows loop with a variable loop count to be optimised. I've also moved the increasing bound checking into its own function and replaced SumCanReachMax is another isLoopEntryGuardedByCond function. llvm-svn: 328480	2018-03-26 09:29:42 +00:00
Sanjay Patel	93e64dd9a1	[PatternMatch] allow undef elements when matching vector FP +0.0 This continues the FP constant pattern matching improvements from: https://reviews.llvm.org/rL327627 https://reviews.llvm.org/rL327339 https://reviews.llvm.org/rL327307 Several integer constant matchers also have this ability. I'm separating matching of integer/pointer null from FP positive zero and renaming/commenting to make the functionality clearer. llvm-svn: 328461	2018-03-25 21:16:33 +00:00
Sanjay Patel	c84b48ec29	[InstSimplify, InstCombine] add/update tests with FP +0.0 vector with undef; NFC llvm-svn: 328455	2018-03-25 17:48:20 +00:00
Sanjay Patel	e94a85533f	[InstCombine] adjust test comments; NFC llvm-svn: 328450	2018-03-25 14:24:32 +00:00
Sanjay Patel	60d0e119c7	[InstCombine] consolidate casted icmp vector tests We have thorough coverage of predicates and scalar types, so we just need a sampling of vector tests to show that things are working or not with vectors types. llvm-svn: 328449	2018-03-25 14:19:25 +00:00
Sanjay Patel	841aac04d4	[InstCombine] peek through more icmp of FP cast + bitcast This is an extension of rL328426 as noted in D44367. llvm-svn: 328448	2018-03-25 14:01:42 +00:00
Sanjay Patel	745a9c62c2	[InstCombine] peek through FP casts for sign-bit compares (PR36682) This pattern came up in PR36682: https://bugs.llvm.org/show_bug.cgi?id=36682 https://godbolt.org/g/LhuD9A Equality checks are planned as a follow-up enhancement. Differential Revision: https://reviews.llvm.org/D44367 llvm-svn: 328426	2018-03-24 15:45:02 +00:00
Sanjay Patel	6a176d527c	[InstCombine] add multi-use/vector tests for intrinsic shrinking; NFC llvm-svn: 328422	2018-03-24 14:45:41 +00:00
Fedor Sergeev	6660fd0f95	[PM][FunctionAttrs] add NoUnwind attribute inference to PostOrderFunctionAttrs pass Summary: This was motivated by absence of PrunEH functionality in new PM. It was decided that a proper way to do PruneEH is to add NoUnwind inference into PostOrderFunctionAttrs and then perform normal SimplifyCFG on top. This change generalizes attribute handling implemented for (a removal of) Convergent attribute, by introducing a generic builder-like class AttributeInferer It registers all the attribute inference requests, storing per-attribute predicates into a vector, and then goes through an SCC Node, scanning all the instructions for not breaking attribute assumptions. The main idea is that as soon all the instructions from all the functions of SCC Node conform to attribute assumptions then we are free to infer the attribute as set for all the functions of SCC Node. It handles two distinct cases of attributes: - those that might break due to derefinement of the function code for these attributes we are allowed to apply inference only if all the functions are "exact definitions". Example - NoUnwind. - those that do not care about derefinement for these attributes we are allowed to apply inference as soon as we see any function definition. Example - removal of Convergent attribute. Also in this commit: * Converted all the FunctionAttrs tests to use FileCheck and added new-PM invocations to them * FunctionAttrs/convergent.ll test demonstrates a difference in behavior between new and old PM implementations. Marked with FIXME. * PruneEH tests were converted to new-PM as well, using function-attrs+simplify-cfg combo as intended * some of "other" tests were updated since function-attrs now infers 'nounwind' even for old PM pipeline * -disable-nounwind-inference hidden option added as a possible workaround for a supposedly rare case when nounwind being inferred by default presents a problem Reviewers: chandlerc, jlebar Reviewed By: jlebar Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D44415 llvm-svn: 328377	2018-03-23 21:46:16 +00:00
Sanjay Patel	9b09fe9f66	[InstCombine] increase test coverage for intrinsic shrinking; NFC There were no tests with vector types before this. llvm-svn: 328371	2018-03-23 21:13:53 +00:00
Sanjay Patel	cd1f3e7a78	[InstCombine] auto-generate checks; NFC llvm-svn: 328329	2018-03-23 15:39:03 +00:00
Sanjay Patel	3547dcb3ae	[InstSimplify] regenerate checks, move tests; NFC llvm-svn: 328327	2018-03-23 15:31:31 +00:00
Sanjay Patel	d189b596ae	[InstCombine] regenerate test checks; NFC llvm-svn: 328325	2018-03-23 15:19:35 +00:00
Matthew Simpson	6c289a1c74	[SLP] Stop counting cost of gather sequences with multiple uses When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316	2018-03-23 14:18:27 +00:00
Florian Hahn	f73c3ece7f	Revert r328307: [IPSCCP] Use constant range information for comparisons of parameters. Reverted for now, due to it causing verifier failures. llvm-svn: 328312	2018-03-23 12:49:39 +00:00
Florian Hahn	b1feec087e	[IPSCCP] Use constant range information for comparisons of parameters. For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to use ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 328307	2018-03-23 11:56:00 +00:00
Florian Hahn	52436a587e	[LoopUnroll] Simplify induction variables after peeling too. Loop peeling also has an impact on the induction variables, so we should benefit from induction variable simplification after peeling too. Reviewers: sanjoy, bogner, mzolotukhin, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D43878 llvm-svn: 328301	2018-03-23 10:38:12 +00:00
Evgeny Stupachenko	579507a53a	Revert r325687 (workaround for PR36032). Summary: Revert r325687 workaround for PR36032 since a fix was committed in r326154. Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D44768 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 328257	2018-03-22 22:04:39 +00:00
Matt Morehouse	236cdaf84c	[SimplifyCFG] Create attribute for fuzzing-specific optimizations. Summary: When building with libFuzzer, converting control flow to selects or obscuring the original operands of CMPs reduces the effectiveness of libFuzzer's heuristics. This patch provides an attribute to disable or modify certain optimizations for optimal fuzzing signal. Provides a less aggressive alternative to https://reviews.llvm.org/D44057. Reviewers: vitalybuka, davide, arsenm, hfinkel Reviewed By: vitalybuka Subscribers: junbuml, mehdi_amini, wdng, javed.absar, hiraditya, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D44232 llvm-svn: 328214	2018-03-22 17:07:51 +00:00
Anna Thomas	9b1176b0ef	[LoopPredication] Add profitability check based on BPI Summary: LoopPredication is not profitable when the loop is known to always exit through some block other than the latch block. A coarse grained latch check can cause loop predication to predicate the loop, and unconditionally deoptimize. However, without predicating the loop, the guard may never fail within the loop during the dynamic execution because the non-latch loop termination condition exits the loop before the latch condition causes the loop to exit. We teach LP about this using BranchProfileInfo pass. Reviewers: apilipenko, skatkov, mkazantsev, reames Reviewed by: skatkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44667 llvm-svn: 328210	2018-03-22 16:03:59 +00:00
Sanjay Patel	94c91b78e7	[InstCombine] add folds for xor-of-icmp signbit tests (PR36682) This is a retry of r328119 which was reverted at r328145 because it could crash by trying to combine icmps with different operand types. This version has a check for that and additional tests. Original commit message: This is part of solving: https://bugs.llvm.org/show_bug.cgi?id=36682 There's also a leftover improvement from the long-ago-closed: https://bugs.llvm.org/show_bug.cgi?id=5438 https://rise4fun.com/Alive/dC1 llvm-svn: 328197	2018-03-22 14:08:16 +00:00
Sanjay Patel	1d0832d9a3	[InstCombine] move/add tests for fmul distribution; NFC There are at least 3 problems: 1. We're distributing across large patterns, but fail to do that for the minimal patterns. 2. We're not checking uses, so we may create more instructions than we eliminate. 3. We should be able to do these transforms with less than full 'fast' fmuls. llvm-svn: 328152	2018-03-21 21:28:19 +00:00
Reid Kleckner	762331be07	Revert r328119 "[InstCombine] add folds for xor-of-icmp signbit tests (PR36682)" This asserts when compiling safe_numerics_unittest.cpp in Chromium with MSan. llvm-svn: 328145	2018-03-21 20:35:36 +00:00
Sanjay Patel	e235942a1e	[InstSimplify] fp_binop X, NaN --> NaN We propagate the existing NaN value when possible. Differential Revision: https://reviews.llvm.org/D44521 llvm-svn: 328140	2018-03-21 19:31:53 +00:00
Matthew Simpson	b17fff79f0	[SLP] Add test case for a gather sequence with multiple uses llvm-svn: 328133	2018-03-21 19:13:14 +00:00
Sanjay Patel	778032f39d	[InstCombine] add folds for xor-of-icmp signbit tests (PR36682) This is part of solving: https://bugs.llvm.org/show_bug.cgi?id=36682 There's also a leftover improvement from the long-ago-closed: https://bugs.llvm.org/show_bug.cgi?id=5438 https://rise4fun.com/Alive/dC1 llvm-svn: 328119	2018-03-21 17:17:13 +00:00
Sanjay Patel	3da85ae5a5	[InstCombine] move/add tests for xor-of-icmps (PR36682); NFC llvm-svn: 328109	2018-03-21 15:54:48 +00:00
Daniel Neilson	6f1eb58e92	[MemCpyOpt] Update to new API for memory intrinsic alignment Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the MemCpyOpt pass to cease using: 1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. 2) The old IRBuilder CreateMemCpy/CreateMemMove single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. We also add a few tests to fill gaps in the testing of this pass. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955, rL324960, rL325816, rL327398, rL327421 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 328097	2018-03-21 14:14:55 +00:00
Justin Lebar	038cbc5c13	Re-re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions. Summary: If the operands of a udiv/urem can be proved to fit within a smaller power-of-two-sized type, reduce the width of the udiv/urem. Backed out for causing performance regressions. Re-landing because we've determined that these regressions were noise. Original Differential Revision: https://reviews.llvm.org/D44102 llvm-svn: 328096	2018-03-21 14:08:21 +00:00
Shoaib Meenai	3f689c8632	[ObjCARC] Add funclet token to ARC marker The inline assembly generated for the ARC autorelease elision marker must have a funclet token if it's emitted inside a funclet, otherwise the inline assembly (and all subsequent code in the funclet) will be marked unreachable by WinEHPrepare. Note that this only applies for the non-O0 case, since at O0, clang emits the autorelease elision marker itself rather than deferring to the backend. The fix for clang is handled in a separate change. Differential Revision: https://reviews.llvm.org/D44641 llvm-svn: 328042	2018-03-20 20:45:41 +00:00
Xin Tong	bdbd97ed9a	[MergeICmp] Fix a bug in entry block shuffled to middle of the chain Summary: Fix a bug in entry block shuffled to middle of the chain. Reviewers: davide, courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44642 llvm-svn: 327971	2018-03-20 11:57:54 +00:00
Andrei Elovikov	8b8253fdc7	[LV] Let recordVectorLoopValueForInductionCast to check if IV was created from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 llvm-svn: 327960	2018-03-20 09:04:39 +00:00
Sanjay Patel	0ce3086777	[InstCombine] canonicalize fcmp+select to fabs This is complicated by -0.0 and nan. This is based on the DAG patterns as shown in D44091. I'm hoping that we can just remove those DAG folds and always rely on IR canonicalization to handle the matching to fabs. We would still need to delete the broken code from DAGCombiner to fix PR36600: https://bugs.llvm.org/show_bug.cgi?id=36600 Differential Revision: https://reviews.llvm.org/D44550 llvm-svn: 327858	2018-03-19 15:14:30 +00:00
Serguei Katkov	529f42331e	[SCEV] Re-land: Fix isKnownPredicate This is re-land of https://reviews.llvm.org/rL327362 with a fix and regression test. The crash was due to it is possible that for found MDL loop, LHS or RHS may contain an invariant unknown SCEV which does not dominate the MDL. Please see regression test for an example. Reviewers: sanjoy, mkazantsev, reames Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44553 llvm-svn: 327822	2018-03-19 06:35:30 +00:00
Anastasis Grammenos	3a589103a4	[LICM] Salvage DI from dying Instructions LICM deletes trivially dead instructions which it won't attempt to sink. Attempt to salvage debug values which reference these instructions. llvm-svn: 327800	2018-03-18 15:59:19 +00:00
Roman Lebedev	e6da3063a5	[InstCombine] peek through unsigned FP casts for zero-equality compares (PR36682) Summary: This pattern came up in PR36682 / D44390 https://bugs.llvm.org/show_bug.cgi?id=36682 https://reviews.llvm.org/D44390 https://godbolt.org/g/oKvT5H See also D44416 Reviewers: spatel, majnemer, efriedma, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D44424 llvm-svn: 327799	2018-03-18 15:53:02 +00:00
Sanjay Patel	63b1028953	[InstCombine] add nnan requirement for sqrt(x) * sqrt(y) -> sqrt(x*y) This is similar to D43765. llvm-svn: 327797	2018-03-18 14:32:54 +00:00
Sanjay Patel	95ec4a4dfe	[InstSimplify] loosen FMF for sqrt(X) * sqrt(X) --> X As shown in the code comment, we don't need all of 'fast', but we do need reassoc + nsz + nnan. Differential Revision: https://reviews.llvm.org/D43765 llvm-svn: 327796	2018-03-18 14:12:25 +00:00
Sanjay Patel	5a5c33d8b5	[InstSimplify] add NaN constant diversity; NFC llvm-svn: 327743	2018-03-16 20:55:55 +00:00
Philip Reames	8a106272e8	[LICM/mustexec] Extend first iteration must execute logic to fcmps This builds on the work from https://reviews.llvm.org/D44287. It turned out supporting fcmp was much easier than I realized, so let's do that now. As an aside, our -O3 handling of a floating point IVs leaves a lot to be desired. We do convert the float IV to an integer IV, but do so late enough that many other optimizations are missed (e.g. we don't vectorize). Differential Revision: https://reviews.llvm.org/D44542 llvm-svn: 327722	2018-03-16 16:33:49 +00:00
Sanjay Patel	2b94927f0d	[InstCombine] add nnan requirement to potential fabs folds tests; NFC As noted in D44550, we can't guarantee preserving the sign-bit of NaN if we convert these to fabs(). llvm-svn: 327718	2018-03-16 15:27:39 +00:00
Brian M. Rzycki	f65ddc5fa2	[JumpThreading] Track unreachable BBs to avoid processing JumpThreading iterates over F until the IR quiesces. Transforming unreachable BBs increases compile time and it is also possible to never stabilize causing JumpThreading to hang. An older attempt at fixing this problem was D3991 where removeUnreachableBlocks(F) was called before JumpThreading began. This has a few drawbacks: * expensive - the routine attempts to fix up the IR to identify additional BBs that can be removed along with unreachable BBs. * aggressive - does not identify and preserve the shape of the IR. At a minimum it does not preserve loop hierarchies. * invasive - altering reachable blocks it may disrupt IR shapes that could have otherwise been JumpThreaded. This patch avoids removeUnreachableBlocks(F) and instead tracks unreachable BBs in a SmallPtrSet using DominatorTree to validate the initial state of all BBs. We then rely on subsequent passes to identify and remove these unreachable blocks from F. Reviewers: dberlin, sebpop, kuhar, dinesh.d Reviewed by: sebpop, kuhar Subscribers: hiraditya, uabelho, llvm-commits Differential Revision: https://reviews.llvm.org/D44177 llvm-svn: 327713	2018-03-16 15:13:47 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
Sanjay Patel	728269c10d	[InstCombine] add more tests for fcmp+select -> fabs; NFC This should correspond to the patterns in D44091 and might make handling these in the DAG unnecessary. llvm-svn: 327689	2018-03-16 01:06:33 +00:00
Sanjay Patel	2d568ec0e4	[InstCombine] add tests for fcmp+select -> fabs; NFC llvm-svn: 327680	2018-03-15 22:48:23 +00:00
Florian Hahn	fc97b6173f	[LoopUnroll] Peel off iterations if it makes conditions true/false. If the loop body contains conditions of the form IndVar < #constant, we can remove the checks by peeling off #constant iterations. This improves codegen for PR34364. Reviewers: mkuper, mkazantsev, efriedma Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D43876 llvm-svn: 327671	2018-03-15 21:34:43 +00:00
Philip Reames	a21d5f1e18	[LICM] Ignore exits provably not taken on first iteration when computing must execute It is common to have conditional exits within a loop which are known not to be taken on some iterations, but not necessarily all. This patches extends our reasoning around guaranteed to execute (used when establishing whether it's safe to dereference a location from the preheader) to handle the case where an exit is known not to be taken on the first iteration and the instruction of interest is known to be taken on the first iteration. This case comes up in two major ways: * If we have a range check which we've been unable to eliminate, we frequently know that it doesn't fail on the first iteration. * Pass ordering. We may have a check which will be eliminated through some sequence of other passes, but depending on the exact pass sequence we might never actually do so or we might miss other optimizations from passes run before the check is finally eliminated. The initial version (here) is implemented via InstSimplify. At the moment, it catches a few cases, but misses a lot too. I added test cases for missing cases in InstSimplify which I'll follow up on separately. Longer term, we should probably wire SCEV through to here to get much smarter loop aware simplification of the first iteration predicate. Differential Revision: https://reviews.llvm.org/D44287 llvm-svn: 327664	2018-03-15 21:04:28 +00:00
Philip Reames	422024a1b7	[EarlyCSE] Don't hide earler invariant.scopes If we've already established an invariant scope with an earlier generation, we don't want to hide it in the scoped hash table with one with a later generation. I noticed this when working on the invariant-load handling, but it also applies to the invariant.start case as well. Without this change, my previous patch for invariant-load regresses some cases, so I'm pushing this without waiting for review. This is why you don't make last minute tweaks to patches to catch "obvious cases" after it's already been reviewed. Bad Philip! llvm-svn: 327655	2018-03-15 18:12:27 +00:00
Philip Reames	ca587fe0b4	[EarlyCSE] Reuse invariant scopes for invariant load This is a follow up to https://reviews.llvm.org/D43716 which rewrites the invariant load handling using the new infrastructure. It's slightly more powerful, but only in somewhat minor ways for the moment. It's not clear that DSE of stores to invariant locations is actually interesting since why would your IR have such a construct to start with? Note: The submitted version is slightly different than the reviewed one. I realized the scope could start for an invariant load which was proven redundant and removed. Added a test case to illustrate that as well. Differential Revision: https://reviews.llvm.org/D44497 llvm-svn: 327646	2018-03-15 17:29:32 +00:00
Roman Lebedev	6aca33534b	[InstSimplify] peek through unsigned FP casts for sign-bit compares (PR36682) This pattern came up in PR36682 / D44390 https://bugs.llvm.org/show_bug.cgi?id=36682 https://reviews.llvm.org/D44390 https://godbolt.org/g/oKvT5H See also D44421, D44424 Reviewers: spatel, majnemer, efriedma, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D44425 llvm-svn: 327642	2018-03-15 16:17:46 +00:00
Matthew Simpson	c1c4ad6e64	[ConstantFolding, InstSimplify] Handle more vector GEPs This patch addresses some additional cases where the compiler crashes upon encountering vector GEPs. This should fix PR36116. Differential Revision: https://reviews.llvm.org/D44219 Reference: https://bugs.llvm.org/show_bug.cgi?id=36116 llvm-svn: 327638	2018-03-15 16:00:29 +00:00
Sanjay Patel	43f71eade0	[InstSimplify] add tests with NaN operand for fp binops; NFC llvm-svn: 327631	2018-03-15 14:48:39 +00:00
Sanjay Patel	a4f42f2cfd	[PatternMatch, InstSimplify] allow undef elements when matching any vector FP zero This matcher implementation appears to be slightly more efficient than the generic constant check that it is replacing because every use was for matching FP patterns, but the previous code would check int and pointer type nulls too. llvm-svn: 327627	2018-03-15 14:29:27 +00:00
Sanjay Patel	8f063d0c70	[InstSimplify] remove 'nsz' requirement for frem 0, X From the LangRef definition for frem: "The value produced is the floating-point remainder of the two operands. This is the same output as a libm ‘fmod‘ function, but without any possibility of setting errno. The remainder has the same sign as the dividend. This instruction is assumed to execute in the default floating-point environment." llvm-svn: 327626	2018-03-15 14:04:31 +00:00
Ulrich Weigand	f4ceef8d3f	[Debug] Retain both copies of debug intrinsics in HoistThenElseCodeToIf When hoisting common code from the "then" and "else" branches of a condition to before the "if", the HoistThenElseCodeToIf routine will attempt to merge the debug location associated with the two original copies of the hoisted instruction. This is a problem in the special case where the hoisted instruction is a debug info intrinsic, since for those the debug location is considered part of the intrinsic and attempting to modify it may resut in invalid IR. This is the underlying cause of PR36410. This patch fixes the problem by handling debug info intrinsics specially: instead of hoisting one copy and merging the two locations, the code now simply hoists both copies, each with its original location intact. Note that this is still only done in the case where both original copies are otherwise (i.e. apart from location metadata) identical. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D44312 llvm-svn: 327622	2018-03-15 12:28:48 +00:00
Fedor Sergeev	194a407bda	[New PM][IRCE] port of Inductive Range Check Elimination pass to the new pass manager There are two nontrivial details here: * Loop structure update interface is quite different with new pass manager, so the code to add new loops was factored out * BranchProbabilityInfo is not a loop analysis, so it can not be just getResult'ed from within the loop pass. It cant even be queried through getCachedResult as LoopCanonicalization sequence (e.g. LoopSimplify) might invalidate BPI results. Complete solution for BPI will likely take some time to discuss and figure out, so for now this was partially solved by making BPI optional in IRCE (skipping a couple of profitability checks if it is absent). Most of the IRCE tests got their corresponding new-pass-manager variant enabled. Only two of them depend on BPI, both marked with TODO, to be turned on when BPI starts being available for loop passes. Reviewers: chandlerc, mkazantsev, sanjoy, asbirlea Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D43795 llvm-svn: 327619	2018-03-15 11:01:19 +00:00
Andrei Elovikov	f9b8035f3c	[LoopUnroll] Ignore ephemeral values when checking full unroll profitability. Summary: Before this patch call graph is like this in the LoopUnrollPass: tryToUnrollLoop ApproximateLoopSize collectEphemeralValues /* Use collected ephemeral values / computeUnrollCount analyzeLoopUnrollCost / Bail out from the analysis if loop contains CallInst / This patch moves collection of the ephemeral values to the tryToUnrollLoop function and passes the collected values into both ApproximateLoopsize (as before) and additionally starts using them in analyzeLoopUnrollCost: tryToUnrollLoop collectEphemeralValues ApproximateLoopSize(EphValues) / Use EphValues / computeUnrollCount(EphValues) analyzeLoopUnrollCost(EphValues) / Ignore ephemeral values - they don't contribute to the final cost / / Bail out from the analysis if loop contains CallInst */ Reviewers: mzolotukhin, evstupac, sanjoy Reviewed By: evstupac Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43931 llvm-svn: 327617	2018-03-15 09:59:15 +00:00
Sanjay Patel	e4e3f79b83	[InstSimplify] add tests for frem and vectors with undef; NFC These should all be folded. The vector tests need to have m_AnyZero updated to ignore undef elements, but we need to be careful not to return the existing value in that case and unintentionally propagate undef. llvm-svn: 327585	2018-03-14 22:45:58 +00:00
Philip Reames	0adbb19409	[EarlyCSE] Exploit open ended invariant.start scopes If we have an invariant.start with no corresponding invariant.end, then the memory location becomes invariant indefinitely after the invariant.start. As a result, anything dominated by the start is guaranteed to see the value the memory location had when the invariant.start executed. This patch adds an AvailableInvariants table which tracks the generation a particular memory location became invariant and then uses that information to allow value forwarding that would otherwise be disallowed by potentially aliasing stores. (Reminder: In EarlyCSE everything clobbers everything by default.) This should be compatible with the MemorySSA variant, but design is generational. We can and should add first class support for invariant.start within MemorySSA at a later time. I took a quick look at doing so, but probably need some input from a MemorySSA expert. Differential Revision: https://reviews.llvm.org/D43716 llvm-svn: 327577	2018-03-14 21:35:06 +00:00
Sanjay Patel	11f7f9908b	[InstSimplify] fix folds for (0.0 - X) + X --> 0 (PR27151) As shown in: https://bugs.llvm.org/show_bug.cgi?id=27151 ...the existing fold could miscompile when X is NaN. The fold was also dependent on 'ninf' but that's not necessary. From IEEE-754 (with default rounding which we can assume for these opcodes): "When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0...However, x + x = x − (−x) retains the same sign as x even when x is zero." llvm-svn: 327575	2018-03-14 21:23:27 +00:00
Sanjay Patel	fd82fd000c	[InstSimplify] add tests to show missing/broken fadd folds (PR27151, PR26958); NFC llvm-svn: 327554	2018-03-14 18:52:40 +00:00
Sanjay Patel	e011d7964d	[InstSimplify] regenerate checks; NFC llvm-svn: 327553	2018-03-14 18:49:57 +00:00
Roman Lebedev	60d24445dd	[InstSimplify] [NFC] cast-unsigned-icmp-cmp-0.ll - don't run instcombine As disscussed in post-commit review of D44421, there is simply no reason to run instcombine on this testcase. llvm-svn: 327541	2018-03-14 17:59:12 +00:00
Roman Lebedev	978aae7614	[InstSimplify] [NFC] Add tests for peeking through unsigned FP casts for sign compares (PR36682) Summary: This pattern came up in PR36682 / D44390 https://bugs.llvm.org/show_bug.cgi?id=36682 https://reviews.llvm.org/D44390 https://godbolt.org/g/oKvT5H Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj \| alive-nj ]], for all the type combinations i checked (input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`) for the following `icmp` comparisons the `uitofp`+`bitcast`+`icmp` can be evaluated to a boolean: * `slt 0` * `sgt -1` I did not check vectors, but i'm guessing it's the same there. {F5889242} Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle). There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized. Reviewers: spatel, majnemer, efriedma, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D44421 llvm-svn: 327535	2018-03-14 17:31:08 +00:00
Roman Lebedev	6ab60358ca	[InstCombine] [NFC] Add tests for peeking through unsigned FP casts for zero-equality compares (PR36682) Summary: This pattern came up in PR36682 / D44390 https://bugs.llvm.org/show_bug.cgi?id=36682 https://reviews.llvm.org/D44390 https://godbolt.org/g/oKvT5H Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj \| alive-nj ]], for all the type combinations i checked (input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`) for the following `icmp` comparisons the `uitofp`+`bitcast` can be dropped: * `eq 0` * `ne 0` I did not check vectors, but i'm guessing it's the same there. {F5889189} Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle). There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized. Generated with {F5889196} Reviewers: spatel, majnemer, efriedma, arsenm Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D44416 llvm-svn: 327534	2018-03-14 17:31:03 +00:00
Hiroshi Yamauchi	e6a3dc7699	Simplify more cases of logical ops of masked icmps. Summary: For example, ((X & 255) != 0) && ((X & 15) == 8) -> ((X & 15) == 8). ((X & 7) != 0) && ((X & 15) == 8) -> false. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43835 llvm-svn: 327450	2018-03-13 21:13:18 +00:00
Brian M. Rzycki	252165b27a	[LazyValueInfo] PR33357 prevent infinite recursion on BinaryOperator Summary: It is possible for LVI to encounter instructions that are not in valid SSA form and reference themselves. One example is the following: %tmp4 = and i1 %tmp4, undef Before this patch LVI would recurse until running out of stack memory and crashed. This patch marks these self-referential instructions as Overdefined and aborts analysis on the instruction. Fixes https://bugs.llvm.org/show_bug.cgi?id=33357 Reviewers: craig.topper, anna, efriedma, dberlin, sebpop, kuhar Reviewed by: dberlin Subscribers: uabelho, spatel, a.elovikov, fhahn, eli.friedman, mzolotukhin, spop, evandro, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D34135 llvm-svn: 327432	2018-03-13 18:14:10 +00:00
Sanjay Patel	204edeca56	[InstCombine] fix fmul reassociation to avoid creating an extra fdiv This was supposed to be an NFC refactoring that will eventually allow eliminating the isFast() predicate, but there's a rare possibility that we would pessimize the code as shown in the test case because we failed to check 'hasOneUse()' properly. This version also removes an inefficiency of the old code; we would look for: (X * C) * C1 --> X * (C * C1) ...but that pattern is always handled by SimplifyAssociativeOrCommutative(). llvm-svn: 327404	2018-03-13 14:46:32 +00:00
Daniel Neilson	41e781d5f1	[SROA] Take advantage of separate alignments for memcpy source and destination Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the SROA pass to cease using the old getAlignment() & setAlignment() APIs of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. This allows us to enhance visitMemTransferInst to be more aggressive setting the alignment in memcpy calls that it creates, as well as to only change the alignment of a memcpy/memmove argument that it replaces. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955, rL324960, rL325816 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html Reviewers: chandlerc, bollu, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D42974 llvm-svn: 327398	2018-03-13 14:25:33 +00:00
Eugene Leviant	6f42a2cd91	[Evaluator] Evaluate load/store with bitcast Differential revision: https://reviews.llvm.org/D43457 llvm-svn: 327381	2018-03-13 10:19:50 +00:00
Clement Courbet	9f0b3170bc	[MergeICmps] Make sure that the comparison only has one use. Summary: Fixes PR36557. Reviewers: trentxintong, spatel Subscribers: mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44083 llvm-svn: 327372	2018-03-13 07:05:55 +00:00
Serguei Katkov	bbfbf21ddc	Revert [SCEV] Fix isKnownPredicate It is a revert of rL327362 which causes build bot failures with assert like Assertion `isAvailableAtLoopEntry(RHS, L) && "RHS is not available at Loop Entry"' failed. llvm-svn: 327363	2018-03-13 06:36:00 +00:00
Serguei Katkov	b05574c0d3	[SCEV] Fix isKnownPredicate IsKnownPredicate is updated to implement the following algorithm proposed by @sanjoy and @mkazantsev : isKnownPredicate(Pred, LHS, RHS) { Collect set S all loops on which either LHS or RHS depend. If S is non-empty a. Let PD be the element of S which is dominated by all other elements of S b. Let E(LHS) be value of LHS on entry of PD. To get E(LHS), we should just take LHS and replace all AddRecs that are attached to PD on with their entry values. Define E(RHS) in the same way. c. Let B(LHS) be value of L on backedge of PD. To get B(LHS), we should just take LHS and replace all AddRecs that are attached to PD on with their backedge values. Define B(RHS) in the same way. d. Note that E(LHS) and E(RHS) are automatically available on entry of PD, so we can assert on that. e. Return true if isLoopEntryGuardedByCond(Pred, E(LHS), E(RHS)) && isLoopBackedgeGuardedByCond(Pred, B(LHS), B(RHS)) Return true if Pred, L, R is known from ranges, splitting etc. } This is follow-up for https://reviews.llvm.org/D42417. Reviewers: sanjoy, mkazantsev, reames Reviewed By: sanjoy, mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43507 llvm-svn: 327362	2018-03-13 06:10:27 +00:00
Vlad Tsyrklevich	aab6000684	Reland r327041: [ThinLTO] Keep available_externally symbols live Summary: This change fixes PR36483. The bug was originally introduced by a change that marked non-prevailing symbols dead. This broke LowerTypeTests handling of available_externally functions, which are non-prevailing. LowerTypeTests uses liveness information to avoid emitting thunks for unused functions. Marking available_externally functions dead is incorrect, the functions are used though the function definitions are not. This change keeps them live, and lets the EliminateAvailableExternally/GlobalDCE passes remove them later instead. (Reland with a suspected fix for a unit test failure I haven't been able to reproduce locally) Reviewers: pcc, tejohnson Reviewed By: tejohnson Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D43690 llvm-svn: 327360	2018-03-13 05:08:48 +00:00
Sanjay Patel	a71f95404f	[InstCombine] add test to show fmul transform creates extra fdiv; NFC Also, move fmul reassociation tests to the same file as other fmul transforms. llvm-svn: 327342	2018-03-12 23:10:08 +00:00
Volkan Keles	4ecdb44a64	BlockExtractor: Don’t delete functions directly Blocks may have function calls, so don’t erase functions directly to avoid erasing a function that has a user. llvm-svn: 327340	2018-03-12 22:28:18 +00:00
Sanjay Patel	1dbb6b7282	[PatternMatch] enhance m_NaN() to ignore undef elements in vectors llvm-svn: 327339	2018-03-12 22:18:47 +00:00
Saleem Abdulrasool	8b342680bf	ObjCARC: teach the cloner about funclets In the case that the CallInst that is being moved has an associated operand bundle which is a funclet, the move will construct an invalid instruction. The new site will have a different token and needs to be reassociated with the new instruction. Unfortunately, there is no way to alter the bundle after the construction of the instruction. Replace the call instruction cloning with a custom helper to clone the instruction and reassociate the funclet token. llvm-svn: 327336	2018-03-12 21:46:09 +00:00
Sanjay Patel	5b034c83d6	[InstSimplify] add fcmp tests for constant NaN vector with undef elt; NFC llvm-svn: 327335	2018-03-12 21:44:17 +00:00
Sanjay Patel	a0d8d127c6	[PatternMatch, InstSimplify] allow undef elements when matching vector -0.0 This is the FP equivalent of D42818. Use it for the few cases in InstSimplify with -0.0 folds (that's the only current use of m_NegZero()). Differential Revision: https://reviews.llvm.org/D43792 llvm-svn: 327307	2018-03-12 18:17:01 +00:00
Roman Lebedev	92e359ed67	[InstCombine] [NFC] Add tests for peeking through FP casts for sign-bit compares (PR36682) Summary: This pattern came up in PR36682: https://bugs.llvm.org/show_bug.cgi?id=36682 https://godbolt.org/g/LhuD9A Tests for proposed fix in D44367. Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj \| alive-nj ]], for all the type combinations i checked (input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`) for the following `icmp` comparisons the `sitofp`+`bitcast` can be dropped: * `eq 0` * `ne 0` * `slt 0` * `sle 0` * `sge 0` * `sgt 0` * `slt 1` * `sge 1` * `sle -1` * `sgt -1` I did not check vectors, but i'm guessing it's the same there. {F5887419} Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle). There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized. Generated with {F5887551} Reviewers: spatel, majnemer, efriedma, arsenm Reviewed By: spatel Subscribers: nlopes, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D44390 llvm-svn: 327301	2018-03-12 17:43:02 +00:00
Sanjay Patel	49b7dc2968	[InstSimplify] add test for m_NegZero with undef elt; NFC llvm-svn: 327287	2018-03-12 15:47:32 +00:00
Eugene Leviant	19e238746b	[ThinLTO] Recommit of import global variables This wasreverted in r326638 due to link problems and fixed afterwards llvm-svn: 327254	2018-03-12 10:30:50 +00:00
Justin Lebar	24b6640b1b	Back out "Re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions." This reverts r326908, originally landed as D44102. Reverted for causing performance regressions on x86. (These regressions are not yet understood.) llvm-svn: 327252	2018-03-12 09:26:09 +00:00
Serguei Katkov	a20e05bb94	[CGP] Fix the remove of matched phis in complex addressing mode When we replace the Phi we created with matched ones it is possible that there are two identical phi nodes in IR. And matcher is smart enough to find that new created phi matches both of them. So we try to replace our phi node with matched ones twice and what is bad we delete our phi node twice causing a crash. As soon as we found that we have two identical Phi nodes it makes sense to do a clean-up and replace one phi node by other one. The patch implements it. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43758 llvm-svn: 327250	2018-03-12 03:50:07 +00:00
Sanjay Patel	6ee46702d5	[InstCombine] add tests for casted sign-bit cmp (PR36682); NFC llvm-svn: 327243	2018-03-11 16:45:31 +00:00
Sanjay Patel	4222716822	[InstSimplify] fp_binop X, undef --> NaN The variable operand could be NaN, so it's always safe to propagate NaN. llvm-svn: 327212	2018-03-10 16:51:28 +00:00
Sanjay Patel	e5606b4fa5	[ConstantFold] fp_binop AnyConstant, undef --> NaN With the updated LangRef ( D44216 / rL327138 ) in place, we can proceed with more constant folding. I'm intentionally taking the conservative path here: no matter what the constant or the FMF, we can always fold to NaN. This is because the undef operand can be chosen as NaN, and in our simplified default FP env, nothing else happens - NaN just propagates to the result. If we find some way/need to propagate undef instead, that can be added subsequently. The tests show that we always choose the same quiet NaN constant (0x7FF8000000000000 in IR text). There were suggestions to improve that with a 'NaN' string token or not always print a 64-bit hex value, but those are independent changes. We might also consider setting/propagating the payload of NaN constants as an enhancement. Differential Revision: https://reviews.llvm.org/D44308 llvm-svn: 327208	2018-03-10 15:56:25 +00:00
Florian Hahn	a7dcfa746e	[PartialInlining] Use isInlineViable to detect constructs preventing inlining. Use isInlineViable to prevent inlining of functions with non-inlinable constructs, in case cost analysis is skipped. Reviewers: efriedma, sfertile, davide, davidxl Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D42846 llvm-svn: 327207	2018-03-10 14:53:44 +00:00
Ulrich Weigand	019dd2316d	Revert "[Debug] Retain both sets of debug intrinsics in HoistThenElseCodeToIf" This reverts commit r327175 as problems in debug info generation were shown. llvm-svn: 327176	2018-03-09 22:00:10 +00:00
Ulrich Weigand	fa4e63c0d6	[Debug] Retain both sets of debug intrinsics in HoistThenElseCodeToIf When hoisting common code from the "then" and "else" branches of a condition to before the "if", there is no need to require that debug intrinsics match before moving them (and merging them). Instead, we can simply always keep all debug intrinsics from both sides of the "if". This fixes PR36410, which describes a problem where as a result of the attempt to merge debug locations for two debug intrinsics we end up with an invalid intrinsic, where the scope indicated in the !dbg location no longer matches the scope of the variable tracked by the intrinsic. In addition, this has the benefit that we no longer throw away information that is actually still valid, helping to generate better debug data. Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D44312 llvm-svn: 327175	2018-03-09 21:37:07 +00:00
Peter Collingbourne	2974856ad4	Use branch funnels for virtual calls when retpoline mitigation is enabled. The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the branch predictor, and as a result it can lead to a measurable loss of performance. We can reduce the performance impact of retpolined virtual calls by replacing them with a special construct known as a branch funnel, which is an instruction sequence that implements virtual calls to a set of known targets using a binary tree of direct branches. This allows the processor to speculately execute valid implementations of the virtual function without allowing for speculative execution of of calls to arbitrary addresses. This patch extends the whole-program devirtualization pass to replace certain virtual calls with calls to branch funnels, which are represented using a new llvm.icall.jumptable intrinsic. It also extends the LowerTypeTests pass to recognize the new intrinsic, generate code for the branch funnels (x86_64 only for now) and lay out virtual tables as required for each branch funnel. The implementation supports full LTO as well as ThinLTO, and extends the ThinLTO summary format used for whole-program devirtualization to support branch funnels. For more details see RFC: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html Differential Revision: https://reviews.llvm.org/D42453 llvm-svn: 327163	2018-03-09 19:11:44 +00:00
Renato Golin	bc94b98c44	[LV] Adding test for r327109 llvm-svn: 327155	2018-03-09 18:02:36 +00:00
Farhana Aleen	a7cb31123c	[AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space. Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64. This patch supports ds_read_b128 instruction pattern and generation of this instruction. In the vectorizer, this patch also widen the vector length so that vectorizer generates 128 bit loads for local address-space which gets translated to ds_read_b128. Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128. Author: FarhanaAleen Reviewed By: rampitec, arsenm Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44210 llvm-svn: 327153	2018-03-09 17:41:39 +00:00
Chad Rosier	95d9ccb2a0	[JumpThreading] Don't restrict cast-traversal to i1 In r263618, JumpThreading learned to look trough simple cast instructions, but only if the source of those cast instructions was a phi/cmp i1 (in an effort to limit compile time effects). I think this condition is too restrictive. For switches with limited value range, InstCombine will readily introduce an extra trunc instruction to a smaller integer type (e.g. from i8 to i2), leaving us in the somewhat perverse situation that jump-threading would work before running instcombine, but not after. Since instcombine produces this pattern, I think we need to consider it canonical and support it in JumpThreading. In general, for limiting recursion, I think the existing restriction to phi and cmp nodes should be sufficient to avoid looking through unprofitable chains of instructions. Patch by Keno Fischer! Differential Revision: https://reviews.llvm.org/D42262 llvm-svn: 327150	2018-03-09 16:43:46 +00:00
Sanjay Patel	3675b8cece	[InstSimplify] fix FP infinite hex constant values in tests; NFC Really should improve this... llvm-svn: 327144	2018-03-09 16:14:02 +00:00
Stefan Pintilie	ef7c4976bb	Revert "[PowerPC] LSR tunings for PowerPC" Revert the rest of the LST tune commit. It seems that the LSR tune commit breaks internal tests. Reverting the commit. llvm-svn: 327143	2018-03-09 16:08:55 +00:00
Stefan Pintilie	7f879a8467	Revert "[PowerPC] Move test to correct location." Revert part of the LSR tune commit. llvm-svn: 327142	2018-03-09 16:08:48 +00:00
Eric Christopher	3caa0fd050	Revert "[ThinLTO] Keep available_externally symbols live" This reverts commit r327041 and the followup attempts at fixing the testcase as they're still failing. llvm-svn: 327094	2018-03-09 01:25:18 +00:00
Adrian Prantl	5b477be72a	LowerDbgDeclare: ignore dbg.declares for allocas with volatile access There is no point in lowering a dbg.declare describing an alloca that has volatile loads or stores as users, since the alloca cannot be elided. Lowering the dbg.declare will result in larger debug info that may also have worse coverage than just describing the alloca. rdar://problem/34496278 llvm-svn: 327092	2018-03-09 00:45:04 +00:00
Sanjay Patel	ee770e9c4e	[Reassociate] fix test to be independent of FP undef llvm-svn: 327071	2018-03-08 22:05:27 +00:00
Sanjay Patel	2ee7b9349d	[ConstantFold] fp_binop undef, undef --> undef These are uncontroversial and independent of a proposed LangRef edits (D44216). I tried to fix tests that would fold away: rL327004 rL327028 rL327030 rL327034 I'm not sure if the Reassociate tests are meaningless yet, but they probably will be as we add more folds, so if anyone has suggestions or wants to fix those, please do. Differential Revision: https://reviews.llvm.org/D44258 llvm-svn: 327058	2018-03-08 20:42:49 +00:00
Vlad Tsyrklevich	c9a1a6e964	Specify that test from r327041 requires asserts llvm-svn: 327051	2018-03-08 19:46:19 +00:00
Vlad Tsyrklevich	f6337d3a73	Fix test failure introduced in r327041 The "Assertion: `...' failed" error message format is not identical across platforms. llvm-svn: 327047	2018-03-08 19:20:08 +00:00
Vlad Tsyrklevich	7b66ef1036	[ThinLTO] Keep available_externally symbols live Summary: This change fixes PR36483. The bug was originally introduced by a change that marked non-prevailing symbols dead. This broke LowerTypeTests handling of available_externally functions, which are non-prevailing. LowerTypeTests uses liveness information to avoid emitting thunks for unused functions. Marking available_externally functions dead is incorrect, the functions are used though the function definitions are not. This change keeps them live, and lets the EliminateAvailableExternally/GlobalDCE passes remove them later instead. I've also enabled EliminateAvailableExternally for all optimization levels, I believe it being disabled for O1 was an oversight. Reviewers: pcc, tejohnson Reviewed By: tejohnson Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D43690 llvm-svn: 327041	2018-03-08 18:48:03 +00:00
Sanjay Patel	31051f8314	[InstCombine] add min/max tests with not ops; NFC These are based on: https://bugs.llvm.org/show_bug.cgi?id=35875 It's not clear if/how instcombine can reduce these, but we should have the tests here either way to document current behavior. llvm-svn: 327039	2018-03-08 18:34:23 +00:00
Sanjay Patel	788a4336cd	[StructurizeCFG] fix test to be independent of FP undef llvm-svn: 327028	2018-03-08 17:13:57 +00:00
Sanjay Patel	e755bf87a5	[StructurizeCFG] auto-generate full checks; NFC Not sure what the intent of this test is, but this will change when we fix FP undef constant folding. llvm-svn: 327022	2018-03-08 16:25:37 +00:00
Sanjay Patel	faf9b0f322	[InstCombine] regenerate checks; NFC We may not need any of these tests after rL327012, but leaving them here for now until that's confirmed. llvm-svn: 327014	2018-03-08 15:46:38 +00:00
Sanjay Patel	a87b74f72b	[InstSimplify] add more tests for FP undef; NFC llvm-svn: 327012	2018-03-08 15:39:39 +00:00
Sanjay Patel	4ad3c32144	[InstCombine, NewGVN] remove FP undef from tests I'm trying to preserve the intent of these tests by using non-undef operands; if we fix FP undef folding these tests will not pass. llvm-svn: 327004	2018-03-08 14:57:08 +00:00
Stefan Pintilie	f8c2dce236	[PowerPC] Move test to correct location. Test was added in r326906 to an incorrect location. Moving the test to PPC CodeGen directory as the test is PPC specific. llvm-svn: 326923	2018-03-07 18:27:10 +00:00
Justin Lebar	eccfbf1bcd	Re-land: Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions. Summary: If the operands of a udiv/urem can be proved to fit within a smaller power-of-two-sized type, reduce the width of the udiv/urem. Backed out for failing an assert in clang bootstrap builds. Re-landing with a fix for handling non-power-of-two inputs (e.g. udiv i24). Original Differential Revision: https://reviews.llvm.org/D44102 llvm-svn: 326908	2018-03-07 16:56:49 +00:00
Stefan Pintilie	f8438e8e59	[PowerPC] LSR tunings for PowerPC The purpose of this patch is to have LSR generate better code on Power. This is done by overriding isLSRCostLess. Differential Revision: https://reviews.llvm.org/D40855 llvm-svn: 326906	2018-03-07 16:53:09 +00:00
Justin Lebar	eeeb0eb049	Revert rL326898: "Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions." Breaks bootstrap builds: clang built with this patch asserts while building MCDwarf.cpp: Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. llvm-svn: 326900	2018-03-07 16:05:43 +00:00
Justin Lebar	cb9e89c39b	Teach CorrelatedValuePropagation to reduce the width of udiv/urem instructions. Summary: If the operands of a udiv/urem can be proved to fit within a smaller power-of-two-sized type, reduce the width of the udiv/urem. Reviewers: spatel, sanjoy Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D44102 llvm-svn: 326898	2018-03-07 15:11:13 +00:00
Sven van Haastregt	19f531d31e	[LoadStoreVectorizer] Differentiate between <1 x T> and T The LoadStoreVectorizer thought that <1 x T> and T were the same types when merging stores, leading to a crash later. Patch by Erik Hogeman. Differential Revision: https://reviews.llvm.org/D44014 llvm-svn: 326884	2018-03-07 10:29:28 +00:00
Evgeny Stupachenko	204ade4102	Add early exit on reassociation of 0 expression. Summary: Before the patch a try to reassociate ((v * 16) * 0) * 1 fall into infinite loop Reviewers: pankajchawla Differential Revision: http://reviews.llvm.org/D41467 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326861	2018-03-07 02:17:08 +00:00
Sebastian Pop	bf6e1c26cf	DA: remove uses of GEP, only ask SCEV It's been quite some time the Dependence Analysis (DA) is broken, as it uses the GEP representation to "identify" multi-dimensional arrays. It even wrongly detects multi-dimensional arrays in single nested loops: from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6 ;; for (long int i = 0; i < 50; i++) { ;; A[i][3i - 6] = i; ;; B++ = A[i][i]; DA used to detect two subscripts, which makes no sense in the LLVM IR or in C/C++ semantics, as there are no guarantees as in Fortran of subscripts not overlapping into a next array dimension: maximum nesting levels = 1 SrcPtrSCEV = %A DstPtrSCEV = %A using GEPs subscript 0 src = {0,+,1}<nuw><nsw><%for.body> dst = {0,+,1}<nuw><nsw><%for.body> class = 1 loops = {1} subscript 1 src = {-6,+,3}<nsw><%for.body> dst = {0,+,1}<nuw><nsw><%for.body> class = 1 loops = {1} Separable = {} Coupled = {1} With the current patch, DA will correctly work on only one dimension: maximum nesting levels = 1 SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body> DstSCEV = {%A,+,404}<%for.body> subscript 0 src = {(-2424 + %A)<nsw>,+,1212}<%for.body> dst = {%A,+,404}<%for.body> class = 1 loops = {1} Separable = {0} Coupled = {} This change removes all uses of GEP from DA, and we now only rely on the SCEV representation. The patch does not turn on -da-delinearize by default, and so the DA analysis will be more conservative in the case of multi-dimensional memory accesses in nested loops. I disabled some interchange tests, as the DA is not able to disambiguate the dependence anymore. To make DA stronger, we may need to compute a bound on the number of iterations based on the access functions and array dimensions. The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to avoid checking for snippets of LLVM IR: this form of checking is very hard to maintain. Instead, we now check for output of the pass that are more meaningful than dozens of lines of LLVM IR. Some tests now require -debug messages and thus only enabled with asserts. Patch written by Sebastian Pop and Aditya Kumar. Differential Revision: https://reviews.llvm.org/D35430 llvm-svn: 326837	2018-03-06 21:55:59 +00:00
Sanjay Patel	ed2211d50f	[PatternMatch] define m_Not using m_Xor and cst_pred_ty Using cst_pred_ty in the definition allows us to match vectors with undef elements. This is a continuation of an effort to make all pattern matchers allow undef elements in vectors: rL325437 rL325466 D43792 Differential Revision: https://reviews.llvm.org/D44076 llvm-svn: 326823	2018-03-06 18:19:42 +00:00
Florian Hahn	517dc51c48	[CallSiteSplitting] Do not crash when BB's terminator changes. Change doCallSiteSplitting to iterate until we reach the terminator instruction. tryToSplitCallSite can replace BB's terminator in case BB is a successor of itself. Then IE will be invalidated and we also have to check the current terminator. Reviewers: junbuml, davidxl, davide, fhahn Reviewed By: fhahn, junbuml Differential Revision: https://reviews.llvm.org/D43824 llvm-svn: 326793	2018-03-06 14:00:58 +00:00
Daniel Neilson	82daad31fe	[RewriteStatepoints] Fix stale parse points Summary: RewriteStatepointsForGC collects parse points for further processing. During the collection if a callsite is found in an unreachable block (DominatorTree::isReachableFromEntry()) then all unreachable blocks are removed by removeUnreachableBlocks(). Some of the removed blocks could have been reachable according to DominatorTree::isReachableFromEntry(). In this case the collected parse points became stale and resulted in a crash when accessed. The fix is to unconditionally canonicalize the IR to removeUnreachableBlocks and then collect the parse points. The added test crashes with the old version and passes with this patch. Patch by Yevgeny Rouban! Reviewed by: Anna Differential Revision: https://reviews.llvm.org/D43929 llvm-svn: 326748	2018-03-05 22:27:30 +00:00
Alexey Bataev	625ce229b1	[SLP] Additional tests for stores vectorization, NFC. llvm-svn: 326740	2018-03-05 20:20:12 +00:00
Daniel Neilson	bdda115e19	[InstCombine] Don't blow up in foldICmpWithCastAndCast on vector icmp instructions. Summary: Presently, InstCombiner::foldICmpWithCastAndCast() implicitly assumes that it is only invoked with icmp instructions of integer type. If that assumption is broken, and it is called with an icmp of vector type, then it fails (asserts/crashes). This patch addresses the deficiency. It allows it to simplify icmp (ptrtoint x), (ptrtoint/c) of vector type into a compare of the inputs, much as is done when the type is integer. Reviewers: apilipenko, fedor.sergeev, mkazantsev, anna Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44063 llvm-svn: 326730	2018-03-05 18:05:51 +00:00
Craig Topper	8452faceae	[InstCombine] Add constant vector support to getMinimumFPType for visitFPTrunc. This patch teaches getMinimumFPType to support shrinking a vector of ConstantFPs. This should improve our ability to combine vector fptrunc with fp binops. Differential Revision: https://reviews.llvm.org/D43774 llvm-svn: 326729	2018-03-05 18:04:12 +00:00
Florian Hahn	0b7c6422fb	[IPSCCP] Add getCompare which returns either true, false, undef or null. getCompare returns true, false or undef constants if the comparison can be evaluated, or nullptr if it cannot. This is in line with what ConstantExpr::getCompare returns. It also allows us to use ConstantExpr::getCompare for comparing constants. Reviewers: davide, mssimpso, dberlin, anna Reviewed By: davide Differential Revision: https://reviews.llvm.org/D43761 llvm-svn: 326720	2018-03-05 17:33:50 +00:00
Xin Tong	8345c0e3a5	[MergeICmp] We can discard initial blocks that do other work Summary: We can discard initial blocks that do other work We do not need to limit ourselves to just the first block in the chain. Reviewers: courbet, davide Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44029 llvm-svn: 326698	2018-03-05 13:54:47 +00:00
Fedor Indutny	f9e09c1dd0	[CallSiteSplitting] properly split musttail calls Summary: `musttail` calls can't be naively splitted. The split blocks must include not only the call instruction itself, but also (optional) `bitcast` and `return` instructions that follow it. Clone `bitcast` and `ret`, place them into the split blocks, and remove the tail block when done. Reviewers: junbuml, mcrosier, davidxl, davide, fhahn Reviewed By: fhahn Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D43729 llvm-svn: 326666	2018-03-03 21:40:14 +00:00
Sanjay Patel	9119b844a3	[InstCombine] add test for vectors with undef elts; NFC llvm-svn: 326661	2018-03-03 18:00:15 +00:00
Sanjay Patel	1a8d5c3d1f	[InstCombine] (~X) - (~Y) --> Y - X llvm-svn: 326660	2018-03-03 17:53:25 +00:00
Sanjay Patel	73eb2d2555	[InstCombine] add tests for notnotsub; NFC As shown in D44043, we may need this fold in the backend, but it's also missing in the IR optimizer. llvm-svn: 326659	2018-03-03 17:20:37 +00:00
Chandler Carruth	a4619d9944	[ThinLTO] Revert r325320: Import global variables This caused some links to fail with ThinLTO due to missing symbols as well as causing some binaries to have failures at runtime. We're working with the author to get a test case, but want to get the tree green again. Further, it appears to introduce a data race. While the test usage of threads was disabled in r325361 & r325362, that isn't an acceptable fix. I've reverted both of these as well. This code needs to be thread safe. Test cases for this are already on the original commit thread. llvm-svn: 326638	2018-03-02 23:40:08 +00:00
Sanjay Patel	2fd0acf05a	[InstCombine] partly fix FMF for fmul+log2 fold The code was checking that all of the instructions in the sequence are 'fast', but that's not necessary. The final multiply is all that we need to check (tests adjusted). The fmul doesn't need to be fully 'fast' either, but that can be another patch. llvm-svn: 326608	2018-03-02 20:32:46 +00:00
Sanjay Patel	bb7228703a	[InstCombine] add tests for rL169025; NFC This narrow fold was added with no motivation or test cases a bit over 5 years ago. Removing a constant operand is a good canonicalization? We should handle Y*2.0 too then? llvm-svn: 326606	2018-03-02 19:26:13 +00:00
Craig Topper	18799f4c07	[InstCombine] Allow fptrunc (fpext X)) to be reduced to a single fpext/ftrunc If we are only truncating bits from the extend we should be able to just use a smaller extend. If we are truncating more than the extend we should be able to just use a fptrunc since the presense of the fpextend shouldn't affect rounding. Differential Revision: https://reviews.llvm.org/D43970 llvm-svn: 326595	2018-03-02 18:16:51 +00:00
Yaxun Liu	3c42f1c3c9	LoopUnroll: respect pragma unroll when AllowRemainder is disabled Currently when AllowRemainder is disabled, pragma unroll count is not respected even though there is no remainder. This bug causes a loop fully unrolled in many cases even though the user specifies a unroll count. Especially it affects OpenCL/CUDA since in many cases a loop contains convergent instructions and currently AllowRemainder is disabled for such loops. Differential Revision: https://reviews.llvm.org/D43826 llvm-svn: 326585	2018-03-02 16:22:32 +00:00
Clement Courbet	c9119b3b6a	[MergeICmps] Revert accidentally submitted failing test case. Reverts r326574. llvm-svn: 326582	2018-03-02 14:53:33 +00:00
Clement Courbet	1de4a183d4	[MergeIcmps] Add the test case from PR36557. Summary: See PR36557. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44009 llvm-svn: 326574	2018-03-02 14:34:39 +00:00
Craig Topper	8cb5fc15ee	[InstCombine] Add more test case to fpextend.ll. This includes the test cases from D43970 and additional tests for combining (fptrunc (binop (fpext), (fpext))) where the pre-extended types don't match the trunc and therefore can't be completely removed. llvm-svn: 326528	2018-03-02 01:34:42 +00:00
Fedor Indutny	1571b1271e	[ArgumentPromotion] don't break musttail invariant PR36543 Summary: Do not break musttail invariant by promoting arguments of musttail callee or caller. Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv, fhahn, rnk Reviewed By: rnk Subscribers: rnk, llvm-commits Differential Revision: https://reviews.llvm.org/D43926 llvm-svn: 326521	2018-03-02 00:59:27 +00:00
Craig Topper	113446ca37	[InstCombine] Simplify test cases by removing loads/stores that aren't required for what is being tested. The loads and stores were getting the data and storing the results. There's no reason we can't just use function arguments and return. llvm-svn: 326515	2018-03-02 00:27:44 +00:00
Sanjay Patel	d0cdb2f861	[InstCombine] allow fmul fold with less than 'fast' This is a retry of r326502 with updates to the reassociate test file that I missed the first time. @test15_reassoc in the supposed -reassociate test file (except that it tests 2 other passes too...) shows that there's no clear responsiblity for reassociation transforms. Instcombine now gets that case, but only because the constant values are identical. Otherwise, it would still miss that pattern. Reassociate doesn't get that case because it hasn't been updated to use less than 'fast' FMF. llvm-svn: 326513	2018-03-02 00:14:51 +00:00
Sanjay Patel	f2664d0663	[Reassociate] regenerate checks; NFC llvm-svn: 326511	2018-03-01 23:41:03 +00:00
Sanjay Patel	eb5d046890	revert r326502: [InstCombine] allow fmul fold with less than 'fast' I forgot that I added tests for 'reassoc' to -reassociate, but suprisingly that file calls -instcombine too, so it is affected. I'll update that file and try again. llvm-svn: 326510	2018-03-01 23:39:24 +00:00
Sanjay Patel	7373ae5c9a	[InstCombine] allow fmul fold with less than 'fast' llvm-svn: 326502	2018-03-01 22:53:47 +00:00
Craig Topper	f1a7c6755d	[InstCombine] Auto-generate complete checks. NFC llvm-svn: 326474	2018-03-01 20:05:07 +00:00
Sanjay Patel	d696e93cb6	[InstCombine] remove stale comments for tests; NFC llvm-svn: 326448	2018-03-01 16:28:32 +00:00
Sanjay Patel	3fd43a843b	[InstCombine] move/add tests for fmul reassociation; NFC This transform may be out-of-scope for instcombine, but this is only documenting the current behavior. llvm-svn: 326442	2018-03-01 15:30:44 +00:00
Sanjay Patel	237e52674f	[InstCombine] auto-generate full checks; NFC llvm-svn: 326440	2018-03-01 15:13:42 +00:00
Max Kazantsev	f8d2969abb	[SCEV] Smart range calculation for SCEVUnknown Phis The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 llvm-svn: 326418	2018-03-01 06:56:48 +00:00
Reid Kleckner	3762a089d7	[IPSCCP] do not break musttail invariant (PR36485) Do not replace results of `musttail` calls with a constant if the call itself can't be removed. Do not zap returns of `musttail` callees, if the call site can't be removed and replaced with a constant. Do not zap returns of `musttail`-calling blocks, this breaks invariant too. Patch by Fedor Indutny Differential Revision: https://reviews.llvm.org/D43695 llvm-svn: 326404	2018-03-01 01:19:18 +00:00
Reid Kleckner	cb9611ca67	[DAE] don't remove args of musttail target/caller `musttail` requires identical signatures of caller and callee. Removing arguments breaks `musttail` semantics. PR36441 Patch by Fedor Indutny Differential Revision: https://reviews.llvm.org/D43708 llvm-svn: 326394	2018-03-01 00:09:35 +00:00
Sanjay Patel	eaf5a120ed	[InstCombine] simplify code for X * -1.0 --> -X; NFC I've added random FMF to one of the tests to show those are propagated. llvm-svn: 326377	2018-02-28 22:30:04 +00:00
Jonas Devlieghere	9ca064552a	[GlobalOpt] don't change CC of musttail calle(e\|r) When the function has musttail call - its cc is fixed to be equal to the cc of the musttail callee. In such case (and in the case of the musttail callee), GlobalOpt should not change the cc to fastcc as it will break the invariant. This fixes PR36546 Patch by: Fedor Indutny (indutny) Differential revision: https://reviews.llvm.org/D43859 llvm-svn: 326376	2018-02-28 22:28:44 +00:00
Sanjay Patel	356e77f550	[InstCombine] auto-generate complete checks; NFC llvm-svn: 326331	2018-02-28 16:53:45 +00:00
Xin Tong	8ba674e43b	[MergeICmp] Fix a bug in MergeICmp that can lead to a block being processed more than once. Summary: Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually lead to a broken LLVM module. The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when the its parent block is processed and second time when the last block is processed. We end up having 2 same BCECmpBlock in the merge queue. And eventually lead to a broken LLVM module. Reviewers: courbet, davide Reviewed By: courbet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43825 llvm-svn: 326318	2018-02-28 12:08:00 +00:00
Mohammad Shahid	ddeee12f59	[SLP] Added new tests and updated existing for jumbled load, NFC. llvm-svn: 326303	2018-02-28 04:19:34 +00:00
Sanjay Patel	bf28a8fc01	[InstSimplify] add tests for FP with undef operand; NFC Are any of these correct? llvm-svn: 326241	2018-02-27 20:17:18 +00:00
Craig Topper	301991080e	[ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to look through ExtractElement. This is similar to what's done in computeKnownBits and computeSignBits. Don't do anything fancy just collect information valid for any element. Differential Revision: https://reviews.llvm.org/D43789 llvm-svn: 326237	2018-02-27 19:53:45 +00:00
Sanjay Patel	8529dd5ee1	[ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326221	2018-02-27 18:33:24 +00:00
Sanjay Patel	04d1d79ee5	[AArch64] add SLP test based on TSVC; NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326217	2018-02-27 18:06:15 +00:00
Florian Hahn	1807c516c7	[NewGVN] Update phi-of-ops def block when updating existing ValuePHI. In case we update a ValuePHI node created earlier, we could update it based on a different OpPHI which could be in a different block. We need to update the TempToBlock mapping reflecting the new block, otherwise we would end up placing the new phi node in a wrong block. This problem is exposed by the test case in https://bugs.llvm.org/show_bug.cgi?id=36504. This patch fixes a slightly simpler problem than in the bug report. In the bug's re-producer, the additional problem is that we are re-using a ValuePHI node with to few incoming values for the new OpPHI. If this patch makes sense, I will follow it up with a patch that creates a new PHI node if the existing PHI node has a different number of incoming values. Reviewers: davide, dberlin Reviewed By: dberlin Differential Revision: https://reviews.llvm.org/D43770 llvm-svn: 326181	2018-02-27 09:34:51 +00:00
Adam Nemet	b424cd5d61	Make test agnostic to cost model This was causing bot failures on greendragon llvm-svn: 326169	2018-02-27 05:41:16 +00:00
Evgeny Stupachenko	f1c058d99b	Fix r326154 buildbots test fail Summary: Add specific mtriples to tests added in r326154. From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326158	2018-02-27 01:33:11 +00:00
Evgeny Stupachenko	a732611ac8	Fix PR36032, PR35432 Summary: The change fix an assert fail at ScalarEvolutionExpander.cpp: assert(ExitCount != SE.getCouldNotCompute() && "Invalid loop count"); Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D42604 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326154	2018-02-27 00:17:31 +00:00
Sanjay Patel	66911b16e6	[InstCombine, InstSimplify] add tests with undef elements in constant FP vectors; NFC llvm-svn: 326148	2018-02-26 23:23:02 +00:00
Craig Topper	69c8972fd1	[ValueTracking] Teach cannotBeOrderedLessThanZeroImpl to handle vector constants. Summary: This allows vector fabs to be removed in more cases. Reviewers: spatel, arsenm, RKSimon Reviewed By: spatel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D43739 llvm-svn: 326138	2018-02-26 22:33:17 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
Alexey Bataev	b44e2b75e8	[SLP] Added new test + fixed some checks, NFC. llvm-svn: 326117	2018-02-26 20:01:24 +00:00
Craig Topper	43fb1cdef7	[InstCombine] Add test cases with vector constants to fpextend.ll llvm-svn: 326115	2018-02-26 19:36:37 +00:00
Craig Topper	b284e8b9b4	[InstCombine] Switch to using FileCheck instead of grep. Auto-generate checks. NFC llvm-svn: 326114	2018-02-26 19:36:36 +00:00
Sanjay Patel	31a90468e1	[InstCombine] allow fdiv folds with less than fully 'fast' ops Note: gcc appears to allow this fold with -freciprocal-math alone, but clang/llvm require more than that with this patch. The wording in the definitions seems fuzzy enough that it could go either way, but we'll err on the conservative side of FMF interpretation. This patch also changes the newly created fmul to have FMF propagated by the last fdiv rather than intersecting the FMF of the fdivs. This matches the behavior of other folds near here. The new fmul is only used to produce an intermediate op for the final fdiv result, so it shouldn't be any stricter than that result. The previous behavior could result in dropping FMF via other folds in instcombine or CSE. Differential Revision: https://reviews.llvm.org/D43398 llvm-svn: 326098	2018-02-26 16:02:45 +00:00
Renato Golin	9d1b2acaaa	[LV] Move isLegalMasked* functions from Legality to CostModel All SIMD architectures can emulate masked load/store/gather/scatter through element-wise condition check, scalar load/store, and insert/extract. Therefore, bailing out of vectorization as legality failure, when they return false, is incorrect. We should proceed to cost model and determine profitability. This patch is to address the vectorizer's architectural limitation described above. As such, I tried to keep the cost model and vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning should be done separately. Please see http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for RFC and the discussions. Closes D43208. Patch by: Hideki Saito <hideki.saito@intel.com> llvm-svn: 326079	2018-02-26 11:06:36 +00:00
Florian Hahn	ed45836253	[LoopInterchange] Add test case for D43236. llvm-svn: 326078	2018-02-26 10:46:25 +00:00
Craig Topper	aee341ef28	[InstSimplify] Add test cases for removal of vector fabs on known positive. llvm-svn: 326050	2018-02-25 06:51:52 +00:00
Craig Topper	2b8f051aaa	[InstSimplify] Remove unused parameter from test cases. llvm-svn: 326049	2018-02-25 06:51:51 +00:00
Adam Nemet	e4e1de60aa	Revert "StructurizeCFG: Test for branch divergence correctly" This reverts commit r325881. Breaks many bots llvm-svn: 326037	2018-02-24 17:29:09 +00:00
Sanjay Patel	db53d1847b	[InstSimplify] sqrt(X) * sqrt(X) --> X This was misplaced in InstCombine. We can loosen the FMF as a follow-up step. llvm-svn: 325965	2018-02-23 22:20:13 +00:00
Sanjay Patel	d32104e1b2	[InstCombine] allow fmul-sqrt folds with less than full -ffast-math Also, add a Builder method for intrinsics to reduce code duplication for clients. llvm-svn: 325960	2018-02-23 21:16:12 +00:00
Matt Davis	708271849a	[Test] Fix the test to output to /dev/null instead of redirecting. The redirection was confusing the windows build machine. llvm-svn: 325937	2018-02-23 19:03:04 +00:00
Matt Davis	523c656e25	[Debug] Add dbg.value intrinsics for PHIs created during LCSSA. Summary: This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA. I noticed a case where a value carried across a loop was reported as <optimized out>. Specifically this case: ``` int bar(int x, int y) { return x + y; } int foo(int size) { int val = 0; for (int i = 0; i < size; ++i) { val = bar(val, i); // Both val and i are correct } return val; // <optimized out> } ``` In the above case, after all of the interesting computation completes our value is reported as "optimized out." This change will add a dbg.value to correct this. This patch also moves the dbg.value insertion routine from LoopRotation.cpp into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA). Reviewers: mzolotukhin, aprantl, vsk, davide Reviewed By: aprantl, vsk Subscribers: dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 325926	2018-02-23 17:38:27 +00:00
Nicolai Haehnle	43c1115cd4	StructurizeCFG: Test for branch divergence correctly Summary: This fixes cases like the new test @nonuniform. In that test, %cc itself is a uniform value; however, when reading it after the end of the loop in basic block %if, its value is effectively non-uniform. This problem was encountered in https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change in itself is not sufficient to fix that bug, as there is another issue in the AMDGPU backend. Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4 Reviewers: arsenm, rampitec, jlebar Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D40546 llvm-svn: 325881	2018-02-23 10:45:46 +00:00
Bjorn Steinbrink	983d6c3f18	Mark MergedLoadStoreMotion as not preserving MemDep results Summary: MemDep caches results that signify that a dependence is non-local, and there is currently no way to invalidate such cache entries. Unfortunately, when MLSM sinks a store that can result in a non-local dependence becoming a local one, and then MemDep gives wrong answers. The easiest way out here is to just say that MLSM does indeed not preserve MemDep results. Reviewers: davide, Gerolf Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43177 llvm-svn: 325880	2018-02-23 10:41:57 +00:00
Daniel Neilson	20c9207be3	[AlignmentFromAssumptions] Set source and dest alignments of memory intrinsiscs separately Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of MemoryIntrinsic in favour of getting/setting source & dest specific alignments through the new API. This allows us to simplify some of the code in this pass and also be more aggressive about setting the source and destination alignments separately. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955, rL324960 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html Reviewers: hfinkel, bollu, reames Reviewed By: reames Subscribers: reames, llvm-commits Differential Revision: https://reviews.llvm.org/D43081 llvm-svn: 325816	2018-02-22 18:55:59 +00:00
Luke Cheeseman	6c1e6bbe0c	[FunctionAttrs][ArgumentPromotion][GlobalOpt] Disable some optimisations passes for naked functions - Fix for bug 36078. - Prevent the functionattrs, function-attrs, globalopt and argpromotion passes from changing naked functions. - These passes can perform some alterations to the functions that should not be applied. An example is removing parameters that are seemingly not used because they are only referenced in the inline assembly. Another example is marking the function as fastcc. llvm-svn: 325788	2018-02-22 14:42:08 +00:00
Sanjay Patel	92b7371113	[InstCombine] add fmul multi-use test; NFC Also, rename tests to make their intent clearer. llvm-svn: 325785	2018-02-22 14:27:16 +00:00
Simon Pilgrim	864949d5e9	[SLPVectorizer][X86] Add load extend tests (PR36091) llvm-svn: 325772	2018-02-22 12:19:34 +00:00
Sanjay Patel	9befaeb582	[InstCombine] add some random FMF to tests so we know it's not dropped; NFC llvm-svn: 325734	2018-02-21 22:48:28 +00:00
Sanjay Patel	d53da082a0	[AArch64] fix IR names to not be 'tmp' because that gives the CHECK script problems llvm-svn: 325718	2018-02-21 20:48:14 +00:00
Sanjay Patel	ffe51e450f	[AArch64] add SLP test for matmul (PR36280); NFC This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 325717	2018-02-21 20:34:16 +00:00
Alexey Bataev	650f639d33	[LV] Fix test checks, NFC llvm-svn: 325699	2018-02-21 16:48:23 +00:00
Alexey Bataev	cdd0675ddc	[SLP] Fix test checks, NFC. llvm-svn: 325689	2018-02-21 15:32:58 +00:00
Silviu Baranga	10ad93c6bf	[SCEV] Temporarily disable loop versioning for the purpose of turning SCEVUnknowns of PHIs into AddRecExprs. This feature is now hidden behind the -scev-version-unknown flag. Fixes PR36032 and PR35432. llvm-svn: 325687	2018-02-21 15:20:32 +00:00
Vedant Kumar	56492f9177	[BDCE] Salvage debug info from dying insts This results in 15 additional unique source variables in a stage2 build of FileCheck (at '-Os -g'), with a negligible increase in the size of the .debug_loc section. llvm-svn: 325660	2018-02-21 01:55:33 +00:00
Sanjay Patel	e6143904b9	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658	2018-02-21 01:42:52 +00:00
Sanjay Patel	6f716a7c5e	[InstCombine] C / -X --> -C / X We already do this in DAGCombiner, but it should also be good to eliminate the fsub use in IR. This is similar to rL325648. llvm-svn: 325649	2018-02-21 00:01:45 +00:00
Sanjay Patel	d8dd0151fc	[InstCombine] -X / C --> X / -C for FP We already do this in DAGCombiner, but it should also be good to eliminate the fsub use in IR. llvm-svn: 325648	2018-02-20 23:51:16 +00:00
Sanjay Patel	8357371861	[InstCombine] add tests for fdiv with negated op and constant op; NFC llvm-svn: 325644	2018-02-20 23:34:43 +00:00
Sanjay Patel	3e569ac0cc	[PatternMatch] allow vector matches with m_FNeg llvm-svn: 325642	2018-02-20 23:29:05 +00:00
Sanjoy Das	737fa40ffa	[DSE] Don't DSE stores that subsequent memmove calls read from Summary: We used to remove the first memmove in cases like this: memmove(p, p+2, 8); memmove(p, p+2, 8); which is incorrect. Fix this by changing isPossibleSelfRead to what was most likely the intended behavior. Historical note: the buggy code was added in https://reviews.llvm.org/rL120974 to address PR8728. Reviewers: rsmith Subscribers: mcrosier, llvm-commits, jlebar Differential Revision: https://reviews.llvm.org/D43425 llvm-svn: 325641	2018-02-20 23:19:34 +00:00
Sanjay Patel	4f65e0d008	[InstCombine] auto-generate full checks; NFC llvm-svn: 325639	2018-02-20 23:08:47 +00:00
Sanjay Patel	088f4690f5	[InstCombine] add test for vector -X/-Y; NFC m_FNeg doesn't match vector types. llvm-svn: 325637	2018-02-20 22:46:38 +00:00
Benjamin Kramer	1516dd70bb	Fix broken test from r325630. llvm-svn: 325634	2018-02-20 22:30:16 +00:00
Benjamin Kramer	fd0630665b	[MemoryBuiltins] Check nobuiltin status when identifying calls to free. This is usually not a problem because this code's main purpose is eliminating unused new/delete pairs. We got deletes of nullptr or nobuiltin deletes of builtin new wrong though. llvm-svn: 325630	2018-02-20 22:00:33 +00:00
Sanjay Patel	e29caaa9c5	[PatternMatch] enhance m_SignMask() to ignore undef elements in vectors llvm-svn: 325623	2018-02-20 21:02:40 +00:00
Sanjay Patel	ff7b777bbe	[InstSimplify] add tests for m_SignMask with undef vector elements; NFC llvm-svn: 325622	2018-02-20 20:53:35 +00:00
Alexey Bataev	42bcec7d38	[LV] Fix test checks, NFC. llvm-svn: 325617	2018-02-20 19:49:25 +00:00
Alexey Bataev	47dfd249f0	[SLP] Fix tests checks, NFC. llvm-svn: 325605	2018-02-20 18:11:50 +00:00
Sanjay Patel	90f4c8ec29	[InstCombine] fold fdiv with non-splat divisor to fmul: X/C --> X * (1/C) llvm-svn: 325590	2018-02-20 16:08:15 +00:00
Sanjay Patel	1d14779aed	[InstCombine] allow fdiv with constant dividend folds with less than full -ffast-math It's possible that we could allow this either 'arcp' or 'reassoc' alone, but this should be conservatively better than what we have right now. GCC allows this with only -freciprocal-math. The last test is changed to show a case that is expected to fold, but we need D43398. llvm-svn: 325533	2018-02-19 21:46:52 +00:00
Sanjay Patel	e82cc6fcc5	[InstCombine] move fdiv tests; NFC Also, use vector constants just to prove that already works. llvm-svn: 325530	2018-02-19 21:13:39 +00:00
Sanjay Patel	3e8a76abfd	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515	2018-02-19 16:11:44 +00:00
Ivan A. Kosarev	f03f579d1d	[Transforms] Propagate new-format TBAA tags on simplification of memory-transfer intrinsics With this patch in place, when a new-format TBAA tag is available for a memory-transfer intrinsic call, we prefer propagating that new-format tag. Otherwise, we fallback to the old approach where we try to construct a proper TBAA access tag from 'tbaa.struct' metadata. Differential Revision: https://reviews.llvm.org/D41543 llvm-svn: 325488	2018-02-19 12:10:20 +00:00
Sanjay Patel	adf6e88c74	[PatternMatch, InstSimplify] enhance m_AllOnes() to ignore undef elements in vectors Loosening the matcher definition reveals a subtle bug in InstSimplify (we should not assume that because an operand constant matches that it's safe to return it as a result). So I'm making that change here too (that diff could be independent, but I'm not sure how to reveal it before the matcher change). This also seems like a good reason to not include matchers that capture the value. We don't want to encourage the potential misstep of propagating undef values when it's not allowed/intended. I didn't include the capture variant option here or in the related rL325437 (m_One), but it already exists for other constant matchers. llvm-svn: 325466	2018-02-18 18:05:08 +00:00
Sanjay Patel	7faceaed31	[InstSimplify] add tests with vector undef elts; NFC llvm-svn: 325465	2018-02-18 17:39:09 +00:00
Sanjay Patel	f569578373	[PatternMatch] enhance m_One() to ignore undef elements in vectors llvm-svn: 325437	2018-02-17 16:00:42 +00:00
Sanjay Patel	a6a1426cf1	[InstSimplify, InstCombine] add tests with vector undef elts; NFC These would fold if the m_One pattern matcher accounted for undef elts. llvm-svn: 325436	2018-02-17 15:55:40 +00:00
Sanjay Patel	841ca95219	[InstSimplify] add vector select tests with undef elts in condition; NFC llvm-svn: 325419	2018-02-17 01:18:53 +00:00
Sanjay Patel	870fbda805	[InstCombine] add FMF to better show current fdiv fold behavior; NFC llvm-svn: 325365	2018-02-16 17:46:50 +00:00
Eugene Leviant	8c83b9b8c5	[ThinLTO] Fix data race in test #2 Switched to the right option (-thinlto-threads) llvm-svn: 325362	2018-02-16 17:25:03 +00:00
Eugene Leviant	c9724d9149	[ThinLTO] Fix data race in test llvm-svn: 325361	2018-02-16 16:56:33 +00:00
Brian M. Rzycki	f1a7df5ef2	[JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis Summary: The LazyValueInfo pass caches a copy of the DominatorTree when available. Whenever there are pending DominatorTree updates within JumpThreading's DeferredDominance object we cannot use the cached DT for LVI analysis. This commit adds the new methods enableDT() and disableDT() to LVI. JumpThreading also sets the appropriate usage model before calling LVI analysis methods. Fixes https://bugs.llvm.org/show_bug.cgi?id=36133 Reviewers: sebpop, dberlin, kuhar Reviewed by: sebpop, kuhar Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov Differential Revision: https://reviews.llvm.org/D42717 llvm-svn: 325356	2018-02-16 16:35:17 +00:00
Ivan A. Kosarev	53270d0fa6	[Transforms] Propagate TBAA info in SROA Now that we have the new TBAA metadata format that is capable of representing accesses to aggregates, we can propagate TBAA access tags from memory setting and transferring intrinsics to load and store instructions and vice versa. Since SROA produces lots of new loads and stores on optimized builds, this change significantly decreases the share of undecorated memory accesses on such builds. Differential Revision: https://reviews.llvm.org/D41563 llvm-svn: 325329	2018-02-16 10:10:29 +00:00
Eugene Leviant	7331a0bf1c	[ThinLTO] Import global variables Differential revision: https://reviews.llvm.org/D43077 llvm-svn: 325320	2018-02-16 08:11:04 +00:00
Vedant Kumar	3dc6de619a	Remove brittle check lines from a test, NFC llvm-svn: 325310	2018-02-16 01:21:01 +00:00
Vedant Kumar	616fdb00df	[GVN] Partially revert debug info salvage change (r325063) In r325063, we salvaged debug values from dying instructions in GVN::processBlock() and GVN::performScalarPRE(). The change in performScalarPRE(), while correct, is unhelpful. It introduced a call to salvageDebugInfo() which was immediately followed by a RAUW, meaning it prevented the RAUW from efficiently updating dbg.value intrinsics. This commit reverts the mistake and tightens up the affected test case. llvm-svn: 325308	2018-02-16 01:15:20 +00:00
Vedant Kumar	1df820ecd7	[DCE] Salvage debug info from dead insts This results in small increases in the size of the .debug_loc section and the number of unique source variables in a stage2 build of opt. llvm-svn: 325301	2018-02-15 22:26:18 +00:00
Brian Gesiak	a5e3675bd3	[Coroutines] Don't move stores for allocator args Summary: The behavior described in Coroutines TS `[dcl.fct.def.coroutine]/7` allows coroutine parameters to be passed into allocator functions. The instructions to store values into the alloca'd parameters must not be moved past the frame allocation, otherwise uninitialized values are passed to the allocator. Test Plan: `check-llvm` Reviewers: rsmith, GorNishanov, eric_niebler Reviewed By: GorNishanov Subscribers: compnerd, EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D43000 llvm-svn: 325285	2018-02-15 19:31:45 +00:00
Vedant Kumar	24953dc876	[SCCP] Test that constant propagation updates debug info, NFC This extends an existing test to check that SCCP updates the operands of relevant dbg.value instructions as it does its work. llvm-svn: 325281	2018-02-15 19:13:04 +00:00
Alexey Bataev	862c476fc2	[SLP] Fix the test for the reversed stores, NFC. llvm-svn: 325268	2018-02-15 17:11:50 +00:00
Alexey Bataev	ac619599d8	[SLP] Added test for reversed stores, NFC. llvm-svn: 325265	2018-02-15 16:56:49 +00:00
Sanjay Patel	9174416e89	[InstCombine] test fdiv folds better; NFC We had redundant tests, but no tests for extra uses or vectors. 'fast' is an overly conservative requirement for these folds. llvm-svn: 325262	2018-02-15 16:28:15 +00:00
Sanjay Patel	339b4d338d	[InstCombine] allow sin/cos transforms with 'reassoc' The variable name 'AllowReassociate' is a lie at this point because it's set to 'isFast()' which is more than the 'reassoc' FMF after rL317488. In D41286, we showed that this transform may be valid even with strict math by brute force checking every 32-bit float result. There's a potential problem here because we're replacing with a tan() libcall rather than a hypothetical LLVM tan intrinsic. So we might set errno when we should be guaranteed not to do that. But that's independent of this change. llvm-svn: 325247	2018-02-15 15:07:12 +00:00
Sanjay Patel	6a0f667077	[InstCombine] allow X / C -> X * (1.0/C) for vector splat FP constants llvm-svn: 325237	2018-02-15 13:55:52 +00:00
Max Kazantsev	6e4ce23add	[NFC] Fix metadata placement in test llvm-svn: 325215	2018-02-15 07:13:18 +00:00
Max Kazantsev	c5941d12f4	[SCEV] Favor isKnownViaSimpleReasoning over constant ranges check There is a more powerful but still simple function `isKnownViaSimpleReasoning ` that does constant range check and few more additional checks. We use it some places (e.g. when proving implications) and in some other places we only check constant ranges. Currently, indvar simplifier fails to remove the check in following loop: int inc = ...; for (int i = inc, j = inc - 1; i < 200; ++i, ++j) if (i > j) { ... } This patch replaces all usages of `isKnownPredicateViaConstantRanges` with `isKnownViaSimpleReasoning` to have smarter proofs. In particular, it fixes the case above. Reviewed-By: sanjoy Differential Revision: https://reviews.llvm.org/D43175 llvm-svn: 325214	2018-02-15 07:09:00 +00:00
Sanjay Patel	af5d499cb9	[InstCombine] add tests and comments for fdiv X, C; NFC llvm-svn: 325161	2018-02-14 19:54:51 +00:00
Craig Topper	1c19cc1745	[InstCombine] Don't fold select(C, Z, binop(select(C, X, Y), W)) -> select(C, Z, binop(Y, W)) if the binop is rem or div. The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe. Fixes PR36362. Differential Revision: https://reviews.llvm.org/D43276 llvm-svn: 325148	2018-02-14 18:08:33 +00:00
Sanjay Patel	48f671bb27	[InstCombine] regenerate checks; NFC llvm-svn: 325144	2018-02-14 17:37:32 +00:00
Alexey Bataev	7f246e003a	[SLP] Allow vectorization of reversed loads. Summary: Reversed loads are handled as gathering. But we can just reshuffle these values. Patch adds support for vectorization of reversed loads. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43022 llvm-svn: 325134	2018-02-14 15:29:15 +00:00
Florian Hahn	b4e3bad89b	Recommit r325001: [CallSiteSplitting] Support splitting of blocks with instrs before call. For basic blocks with instructions between the beginning of the block and a call we have to duplicate the instructions before the call in all split blocks and add PHI nodes for uses of the duplicated instructions after the call. Currently, the threshold for the number of instructions before a call is quite low, to keep the impact on binary size low. Reviewers: junbuml, mcrosier, davidxl, davide Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D41860 llvm-svn: 325126	2018-02-14 13:59:12 +00:00
Florian Hahn	c6296fea3f	[LoopInterchange] Incrementally update the dominator tree. We can use incremental dominator tree updates to avoid re-calculating the dominator tree after interchanging 2 loops. Reviewers: dmgreen, kuhar Reviewed By: kuhar Differential Revision: https://reviews.llvm.org/D43176 llvm-svn: 325122	2018-02-14 13:13:15 +00:00
Petar Jovanovic	1768957c82	[Utils] Salvage the debug info of DCE'ed 'and' instructions Preserve debug info from a dead 'and' instruction with a constant. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D43163 llvm-svn: 325119	2018-02-14 13:10:35 +00:00
Elena Demikhovsky	945b7e5aa6	Adding a width of the GEP index to the Data Layout. Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout. p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits. The index size parameter is optional, if not specified, it is equal to the pointer size. Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width. It works fine if you can convert pointer to integer for address calculation and all registered targets do this. But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout. http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account. This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size. Differential Revision: https://reviews.llvm.org/D42123 llvm-svn: 325102	2018-02-14 06:58:08 +00:00
Sanjay Patel	3ce76ad26f	[InstCombine] put tests of mul with neg operand(s) together; NFC llvm-svn: 325066	2018-02-13 23:02:12 +00:00
Vedant Kumar	1d5d31b706	[GVN] Salvage debug info from dead insts This preserves an additional 581 unique source variables in a stage2 build of clang (according to `llvm-dwarfdump --statistics`). It increases the size of the .debug_loc section by 0.1% (or 87139 bytes). Differential Revision: https://reviews.llvm.org/D43255 llvm-svn: 325063	2018-02-13 22:27:17 +00:00
Sanjay Patel	7558d860af	[InstCombine] (lshr X, 31) * Y --> (ashr X, 31) & Y This replaces the bit-tracking based fold that did the same thing, but it only worked for scalars and not directly. There is no evidence in existing regression tests that the greater power of bit-tracking was needed here, but we should be aware of this potential loss of optimization. llvm-svn: 325062	2018-02-13 22:24:37 +00:00
Sanjay Patel	fdb3b036cc	[InstCombine] add vector tests, fix comments; NFC The scalar folds are done indirectly and use potentially expensive value tracking calls. That can be improved along with the enhancement to support vector types. llvm-svn: 325051	2018-02-13 21:19:42 +00:00
Sanjay Patel	cb8ac00f73	[InstCombine] (bool X) * Y --> X ? Y : 0 This is both a functional improvement for vectors and an efficiency improvement for scalars. The existing code below the new folds does the same thing for scalars, but in an indirect and expensive way. llvm-svn: 325048	2018-02-13 20:41:22 +00:00
Sanjay Patel	2e2958497f	[InstCombine] fix test comment and add vector test; NFC llvm-svn: 325039	2018-02-13 18:48:27 +00:00
Sanjay Patel	b13fcd52ed	[InstCombine, InstSimplify] (re)move tests, regenerate checks; NFC The InstCombine integer mul test file had tests that belong in InstSimplify (including fmul tests). Move things to where they belong and auto-generate complete checks for everything. llvm-svn: 325037	2018-02-13 18:22:53 +00:00
Vedant Kumar	35fc103e1e	[DeadStoreElimination] Salvage debug info from dead insts According to `llvm-dwarfdump --statistics` this salvages 43 additional unique source variables in a stage2 build of clang. It increases the size of the .debug_loc section by 0.002% (or 2864 bytes). Differential Revision: https://reviews.llvm.org/D43220 llvm-svn: 325035	2018-02-13 18:15:26 +00:00
Yaxun Liu	0124b5484c	[AMDGPU] Change constant addr space to 4 Differential Revision: https://reviews.llvm.org/D43170 llvm-svn: 325030	2018-02-13 18:00:25 +00:00
Florian Hahn	35d744d388	Revert r325001: [CallSiteSplitting] Support splitting of blocks with instrs before call. Due to memsan not being happy with the array of ValueToValue maps. llvm-svn: 325009	2018-02-13 14:48:39 +00:00
Ivan A. Kosarev	4a381b444e	[IR] Fix creating mutable versions of TBAA access tags Due to a typo in D41565, mutable TBAA tags created with createMutableTBAAAccessTag() lose their base types. This patch fixes that typo and updates tests respectively. Differential Revision: https://reviews.llvm.org/D42364 llvm-svn: 325008	2018-02-13 14:44:25 +00:00
Florian Hahn	b0884b6443	[CallSiteSplitting] Support splitting of blocks with instrs before call. For basic blocks with instructions between the beginning of the block and a call we have to duplicate the instructions before the call in all split blocks and add PHI nodes for uses of the duplicated instructions after the call. Currently, the threshold for the number of instructions before a call is quite low, to keep the impact on binary size low. Reviewers: junbuml, mcrosier, davidxl, davide Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D41860 llvm-svn: 325001	2018-02-13 12:00:48 +00:00
Florian Hahn	1f95ef1815	[LoopInterchange] Check number of latch successors before accessing them. In cases where the OuterMostLoopLatchBI only has a single successor, accessing the second successor will fail. This fixes a failure when building the test-suite with loop-interchange enabled. Reviewers: mcrosier, karthikthecool, davide Reviewed by: karthikthecool Differential Revision: https://reviews.llvm.org/D42906 llvm-svn: 324994	2018-02-13 10:02:52 +00:00
Vedant Kumar	388fac5de6	[Utils] Salvage debug info from all no-op casts We already try to salvage debug values from no-op bitcasts and inttoptr instructions: we should handle ptrtoint instructions as well. This saves an additional 24,444 debug values in a stage2 build of clang, and (according to llvm-dwarfdump --statistics) provides an additional 289 unique source variables. llvm-svn: 324982	2018-02-13 03:34:23 +00:00
Vedant Kumar	4011c26cc7	[Utils] Salvage debug info of DCE'ed mul/sdiv/srem instructions Here are the number of additional debug values salvaged in a stage2 build of clang: 63 SALVAGE: MUL 1250 SALVAGE: SDIV (No values were salvaged from `srem` instructions in this experiment, but it's a simple case to handle so we might as well.) llvm-svn: 324976	2018-02-13 01:09:52 +00:00
Vedant Kumar	31ec356a48	[Utils] Salvage debug info of DCE'ed shl/lhsr/ashr instructions Here are the number of additional debug values salvaged in a stage2 build of clang: 1912 SALVAGE: ASHR 405 SALVAGE: LSHR 249 SALVAGE: SHL llvm-svn: 324975	2018-02-13 01:09:49 +00:00
Vedant Kumar	47b16c45d7	[Utils] Salvage the debug info of DCE'ed 'sub' instructions This salvages 14 debug values in a stage2 build of clang. llvm-svn: 324974	2018-02-13 01:09:47 +00:00
Vedant Kumar	96b7dc041b	[Utils] Salvage the debug info of DCE'ed 'xor' instructions This salvages 259 debug values in a stage2 build of clang. Differential Revision: https://reviews.llvm.org/D43207 llvm-svn: 324973	2018-02-13 01:09:46 +00:00
Sanjay Patel	246d769232	[InstSimplify] allow exp/log simplifications with only 'reassoc' FMF These intrinsic folds were added with D41381, but only allowed with isFast(). That's more than necessary because FMF has 'reassoc' to apply to these kinds of folds after D39304, and that's all we need in these cases. Differential Revision: https://reviews.llvm.org/D43160 llvm-svn: 324967	2018-02-12 23:51:23 +00:00
Sanjay Patel	c5d5933bd5	[InstSimplify] change tests to 'fast' to reflect current folds The diff to use 'reassoc' is part of D43160; it should not have been made with rL324961. Reverting that part here, so we'll see the intended diff with the code change. llvm-svn: 324963	2018-02-12 23:39:10 +00:00
Sanjay Patel	de3e889a88	[InstSimplify] consolidate tests for log-exp inverse folds Some tests didn't add much value because we already show stronger constraints for the folds in other tests, so the weaker versions were deleted. Moved the remaining tests into 1 file because the folds are very similar and handled from 1 place in the code. llvm-svn: 324961	2018-02-12 23:18:11 +00:00
Daniel Neilson	2363da9236	[InstCombine] Simplify MemTransferInst's source and dest alignments separately Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the InstCombine pass to cease using the deprecated MemoryIntrinsic::getAlignment() method, and instead we use the separate getSourceAlignment and getDestAlignment APIs to simplify the source and destination alignment attributes separately. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html Reviewers: majnemer, bollu, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D42871 llvm-svn: 324960	2018-02-12 23:06:55 +00:00
Daniel Neilson	095d72989d	[SafeStack] Use updated CreateMemCpy API to set more accurate source and destination alignments. Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the creation of memcpys in the SafeStack pass to set the alignment of the destination object to its stack alignment while separately setting the source byval arguments alignment to its alignment. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. (rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html Reviewers: eugenis, bollu Reviewed By: eugenis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42710 llvm-svn: 324955	2018-02-12 22:39:47 +00:00
Vedant Kumar	a8b8d3225b	Move the debuginfo-dce-or test into debuginfo-variables.ll, NFC llvm-svn: 324933	2018-02-12 21:02:45 +00:00
Sanjay Patel	4a4f35f324	[InstCombine] X / (X * Y) --> 1.0 / Y This is similar to the instsimplify fold added with D42385 ( rL323716 ) ...but this can't be in instsimplify because we're creating/morphing a different instruction. llvm-svn: 324927	2018-02-12 19:39:21 +00:00
Sanjay Patel	ee4257f676	[InstCombine] add tests for missing fdiv fold; NFC llvm-svn: 324926	2018-02-12 19:23:39 +00:00
Sanjay Patel	e4fcb00290	[InstCombine] regenerate checks; NFC llvm-svn: 324924	2018-02-12 19:14:01 +00:00
Jun Bum Lim	144eb593dd	[LICM] update BlockColors after splitting predecessors Update BlockColors after splitting predecessors. Do not allow splitting EHPad for sinking when the BlockColors is not empty, so we can simply assign predecessor's color to the new block. Fixes PR36184 llvm-svn: 324916	2018-02-12 17:56:55 +00:00
Alexey Bataev	ca2396e673	[SLP] Take user instructions cost into consideration in insertelement vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement ` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893	2018-02-12 14:54:48 +00:00
Sanjay Patel	510d647a4d	[InstCombine] X / (X * Y) -> 1 / Y if the multiplication does not overflow The related cases for (X * Y) / X were handled in rL124487. https://rise4fun.com/Alive/6k9 The division in these tests is subsequently eliminated by existing instcombines for 1/X. llvm-svn: 324843	2018-02-11 17:20:32 +00:00
Sanjay Patel	aee107f30d	[InstCombine] add tests for div-mul folds; NFC The related cases for (X * Y) / X were handled in rL124487. llvm-svn: 324840	2018-02-11 16:52:44 +00:00
Craig Topper	4dccffc84a	[X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp. Summary: This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR. This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares. Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and a emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it. This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once. Reviewers: spatel, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43137 llvm-svn: 324827	2018-02-10 23:33:55 +00:00
Simon Pilgrim	19495198af	[InstCombine] Add constant vector support for ~(C >> Y) --> ~C >> Y Includes adding m_NonNegative constant pattern matcher llvm-svn: 324825	2018-02-10 21:46:09 +00:00
Mircea Trofin	73b96d6dcf	[LV] Fix analyzeInterleaving when -pass-remarks enabled Summary: If -pass-remarks=loop-vectorize, atomic ops will be seen by analyzeInterleaving(), even though canVectorizeMemory() == false. This is because we are requesting extra analysis instead of bailing out. In such a case, we end up with a Group in both Load- and StoreGroups, and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups. The fix is to include mayWriteToMemory() when validating that two instructions are the same kind of memory operation. Reviewers: mssimpso, davidxl Reviewed By: davidxl Subscribers: hsaito, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D43064 llvm-svn: 324786	2018-02-10 00:07:45 +00:00
Vedant Kumar	04386d8e3d	[Utils] Salvage debug info from dead 'or' instructions Extend salvageDebugInfo to preserve the debug info from a dead 'or' with a constant. Patch by Ismail Badawi! Differential Revision: https://reviews.llvm.org/D43129 llvm-svn: 324764	2018-02-09 19:19:55 +00:00
Simon Pilgrim	0919a8c130	[InstCombine] Add vector xor tests This doesn't cover everything in InstCombiner.visitXor yet, but increases coverage for a lot of tests llvm-svn: 324753	2018-02-09 17:45:45 +00:00
Simon Pilgrim	9620f4b746	[InstCombine] Add constant vector support for X udiv C, where C >= signbit llvm-svn: 324728	2018-02-09 10:43:59 +00:00
Dmitry Mikulin	87e1c4c8de	Minor tweak to test case. llvm-svn: 324670	2018-02-08 23:10:07 +00:00
Dmitry Mikulin	5cf73cea9c	[ThinLTO] Skip BlockAddresses while replacing uses in function import. Differential Revision: https://reviews.llvm.org/D43027 llvm-svn: 324658	2018-02-08 22:14:56 +00:00
Simon Pilgrim	3f462e90cc	Regenerate test llvm-svn: 324639	2018-02-08 19:28:05 +00:00
Simon Pilgrim	f30656add3	[InstCombine] Add vector udiv tests Tests for X udiv C, where C >= signbit llvm-svn: 324635	2018-02-08 18:58:00 +00:00
Simon Pilgrim	1889f26b94	[InstCombine] Add m_Negative pattern matching Allows us to add non-uniform constant vector support for "X urem C -> X < C ? X : X - C, where C >= signbit." llvm-svn: 324631	2018-02-08 18:36:01 +00:00
Simon Pilgrim	11a02589c1	[InstCombine] Add vector urem tests. Improve coverage of InstCombiner::visitURem for vector types llvm-svn: 324629	2018-02-08 18:10:08 +00:00
Simon Pilgrim	ab689cb638	[InstCombine] Regenerate vector mul tests. llvm-svn: 324627	2018-02-08 17:54:24 +00:00
Daniel Neilson	fb99a493be	[LoopIdiom] Be more aggressive when setting alignment in memcpy Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the LoopIdiom pass to cease using the old IRBuilder CreateMemCpy single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. This allows us to be slightly more aggressive in setting the alignment of memcpy calls that loop idiom creates. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 324626	2018-02-08 17:33:08 +00:00
Sanjay Patel	574fb73c89	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324616	2018-02-08 15:32:28 +00:00
Sanjay Patel	124392f038	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324615	2018-02-08 15:30:39 +00:00
Sanjay Patel	e2c5e9a970	[SLPVectorizer] move RUN line to top-of-file; NFC I was confused what we were checking because the RUN line was in the middle of the file. llvm-svn: 324614	2018-02-08 15:28:49 +00:00
Simon Pilgrim	2a90acd17a	[InstCombine] Fix issue with X udiv (POW2_C1 << N) for non-splat constant vectors foldUDivShl was assuming that the input was a scalar or a splat constant llvm-svn: 324613	2018-02-08 15:19:38 +00:00
Sanjay Patel	cfa5c03039	[SLPVectorizer] auto-generate complete checks; NFC llvm-svn: 324612	2018-02-08 15:16:26 +00:00
Sanjay Patel	42b8c23cc6	[LoopVectorize] auto-generate complete checks; NFC llvm-svn: 324611	2018-02-08 15:13:47 +00:00
Sanjay Patel	a60aec1ab7	[ValueTracking] don't crash when assumptions conflict (PR36270) The last assume in the test says that %B12 is 0. The first assume says that %and1 is less than %B12. Therefore, %and1 is unsigned less than 0...does not compute. That means this line: Known.Zero.setHighBits(RHSKnown.countMinLeadingZeros() + 1); ...tries to set more bits than exist. Differential Revision: https://reviews.llvm.org/D43052 llvm-svn: 324610	2018-02-08 14:52:40 +00:00
Simon Pilgrim	94cc89d5f2	[InstCombine] Fix issue with X udiv 2^C -> X >> C for non-splat constant vectors foldUDivPow2Cst was assuming that the input was a scalar or a splat constant llvm-svn: 324608	2018-02-08 14:46:10 +00:00
Simon Pilgrim	0b9f3912ce	[InstCombine] Improve mul(x, pow2) -> shl combine for vector constants Refactor getLogBase2Vector into getLogBase2 to accept all scalars/vectors. Generalize from ConstantDataVector to support all constant vectors. llvm-svn: 324603	2018-02-08 14:10:01 +00:00
Serguei Katkov	c8016e7a65	[Loop Predication] Teach LP about reverse loops with uge and sge latch conditions Add support of uge and sge latch condition to Loop Prediction for reverse loops. Reviewers: apilipenko, mkazantsev, sanjoy, anna Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42837 llvm-svn: 324589	2018-02-08 10:34:08 +00:00
Serguei Katkov	66182d6c38	[SimplifyCFG] Re-apply Relax restriction for folding unconditional branches The commit rL308422 introduces a restriction for folding unconditional branches. Specifically if empty block with unconditional branch leads to header of the loop then elimination of this basic block is prohibited. However it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges then we can eliminate this block. The patch implements this relax of restriction. The test profile/Linux/counter_promo_nest.c in compiler-rt project is updated to meet this change. Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: pacxx Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324572	2018-02-08 07:16:29 +00:00
Mircea Trofin	06ac8cfbd1	Verify profile data confirms large loop trip counts. Summary: Loops with inequality comparers, such as: // unsigned bound for (unsigned i = 1; i < bound; ++i) {...} have getSmallConstantMaxTripCount report a large maximum static trip count - in this case, 0xffff fffe. However, profiling info may show that the trip count is much smaller, and thus counter-recommend vectorization. This change: - flips loop-vectorize-with-block-frequency on by default. - validates profiled loop frequency data supports vectorization, when static info appears to not counter-recommend it. Absence of profile data means we rely on static data, just as we've done so far. Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal Reviewed By: davidxl Subscribers: bkramer, llvm-commits Differential Revision: https://reviews.llvm.org/D42946 llvm-svn: 324543	2018-02-07 23:29:52 +00:00
Alexey Bataev	cd8d6de381	[SLP] Add a tests for PR36280, NFC. llvm-svn: 324510	2018-02-07 20:11:37 +00:00
Max Kazantsev	b299ade2c5	Re-enable "[SCEV] Make isLoopEntryGuardedByCond a bit smarter" The failures happened because of assert which was overconfident about SCEV's proving capabilities and is generally not valid. Differential Revision: https://reviews.llvm.org/D42835 llvm-svn: 324473	2018-02-07 11:16:29 +00:00
Serguei Katkov	69246ca787	Revert [SCEV] Make isLoopEntryGuardedByCond a bit smarter Revert rL324453 commit which causes buildbot failures. Differential Revision: https://reviews.llvm.org/D42835 llvm-svn: 324462	2018-02-07 09:10:08 +00:00
Max Kazantsev	dd5ee6f5d9	[SCEV] Make isLoopEntryGuardedByCond a bit smarter Sometimes `isLoopEntryGuardedByCond` cannot prove predicate `a > b` directly. But it is a common situation when `a >= b` is known from ranges and `a != b` is known from a dominating condition. Thia patch teaches SCEV to sum these facts together and prove strict comparison via non-strict one. Differential Revision: https://reviews.llvm.org/D42835 llvm-svn: 324453	2018-02-07 07:56:26 +00:00
Michael Zolotukhin	cae66ba5f8	The xfailed test from r324448 passed on one of the bots: remove it entirely for now. llvm-svn: 324451	2018-02-07 06:54:11 +00:00
Michael Zolotukhin	1713dd5b8d	Xfail the test added in r324445 until the underlying issue in LoopSink is fixed. llvm-svn: 324448	2018-02-07 06:11:50 +00:00
Michael Zolotukhin	e82e83fcce	Follow-up for r324429: "[LCSSAVerification] Run verification only when asserts are enabled." Before r324429 we essentially didn't have a verification of LCSSA, so no wonder that it has been broken: currently loop-sink breaks it (the attached test illustrates the failure). It was detected during a stage2 RA build, so to unbreak it I'm disabling the check for now. llvm-svn: 324445	2018-02-07 04:24:44 +00:00
Alexey Bataev	1e593fe73e	[SLP] Update test checks, NFC. llvm-svn: 324387	2018-02-06 20:00:05 +00:00
Simon Pilgrim	9f2ae7e2d1	[InstCombine][ValueTracking] Match non-uniform constant power-of-two vectors Generalize existing constant matching to work with non-uniform constant vectors as well. Differential Revision: https://reviews.llvm.org/D42818 llvm-svn: 324369	2018-02-06 18:39:23 +00:00
Simon Pilgrim	e11c64162c	Regenerate vector-urem test. NFCI. llvm-svn: 324357	2018-02-06 16:10:12 +00:00
Clement Courbet	a7a1746865	[MergeICmps] Handle chains with several complex BCE basic blocks. - Fix condition for detecting that a complex basic block was the first in the chain. - Add tests. This was caught by buildbots when submitting rL324319. llvm-svn: 324341	2018-02-06 12:25:33 +00:00
Hiroshi Inoue	ba3585eaf2	[ThinLTO] fix test failure without x86 backend This patch moves ThinLTOBitcodeWriter/module-asm.ll test case into x86 directory to avoid a test failure when x86 backend is not enabled. llvm-svn: 324316	2018-02-06 07:03:09 +00:00
Peter Collingbourne	29c6f4833c	ThinLTOBitcodeWriter: Do not include module-level inline asm in the merged module. If the inline asm provides the definition of a symbol, this can result in duplicate symbol errors. Differential Revision: https://reviews.llvm.org/D42944 llvm-svn: 324313	2018-02-06 03:29:18 +00:00
Teresa Johnson	791c98e4c8	[ThinLTO] Remove dead and dropped symbol declarations when possible Summary: Removing the dropped symbols will prevent indirect call promotion in the ThinLTO Backend from adding a new reference to a symbol, which can result in linker unsats. This can happen when we compile with a sample profile collected from one binary by used for another, which may have profiled targets that aren't used in the new binary. Note that until dropDeadSymbols handles variables and aliases (in progress), we may not be able to remove the declaration and can still have an issue. Reviewers: grimar, davidxl Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D42816 llvm-svn: 324299	2018-02-06 00:43:39 +00:00
Sanjay Patel	d7c702b451	[LoopStrengthReduce, x86] don't add cost for a cmp that will be macro-fused (PR35681) In the motivating case from PR35681 and represented by the macro-fuse-cmp test: https://bugs.llvm.org/show_bug.cgi?id=35681 ...there's a 37 -> 31 byte size win for the loop because we eliminate the big base address offsets. SPEC2017 on Ryzen shows no significant perf difference. Differential Revision: https://reviews.llvm.org/D42607 llvm-svn: 324289	2018-02-05 23:43:05 +00:00
Sanjay Patel	49aafec2e6	[InstCombine] don't try to evaluate instructions with >1 use (revert r324014) This example causes a compile-time explosion: define i16 @foo(i16 %in) { %x = zext i16 %in to i32 %a1 = mul i32 %x, %x %a2 = mul i32 %a1, %a1 %a3 = mul i32 %a2, %a2 %a4 = mul i32 %a3, %a3 %a5 = mul i32 %a4, %a4 %a6 = mul i32 %a5, %a5 %a7 = mul i32 %a6, %a6 %a8 = mul i32 %a7, %a7 %a9 = mul i32 %a8, %a8 %a10 = mul i32 %a9, %a9 %a11 = mul i32 %a10, %a10 %a12 = mul i32 %a11, %a11 %a13 = mul i32 %a12, %a12 %a14 = mul i32 %a13, %a13 %a15 = mul i32 %a14, %a14 %a16 = mul i32 %a15, %a15 %a17 = mul i32 %a16, %a16 %a18 = mul i32 %a17, %a17 %a19 = mul i32 %a18, %a18 %a20 = mul i32 %a19, %a19 %a21 = mul i32 %a20, %a20 %a22 = mul i32 %a21, %a21 %a23 = mul i32 %a22, %a22 %a24 = mul i32 %a23, %a23 %T = trunc i32 %a24 to i16 ret i16 %T } llvm-svn: 324276	2018-02-05 21:50:32 +00:00
Sanjay Patel	1c84dd9a8f	[InstCombine] add test corresponding to r324252 (PR36225); NFC As PR36225 shows, we definitely don't want to enable the canEvaluate* logic with phis. There's still a question of whether we should just revert r324014 completely because it exposes a compile-time sinkhole (although that problem might exist independently). llvm-svn: 324266	2018-02-05 19:59:52 +00:00
Sanjay Patel	e9a153f414	[InstCombine] add unsigned saturation subtraction canonicalizations This is the instcombine part of unsigned saturation canonicalization. Backend patches already commited: https://reviews.llvm.org/D37510 https://reviews.llvm.org/D37534 It converts unsigned saturated subtraction patterns to forms recognized by the backend: (a > b) ? a - b : 0 -> ((a > b) ? a : b) - b) (b < a) ? a - b : 0 -> ((a > b) ? a : b) - b) (b > a) ? 0 : a - b -> ((a > b) ? a : b) - b) (a < b) ? 0 : a - b -> ((a > b) ? a : b) - b) ((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b) ((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b) ((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b) ((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b) Patch by Yulia Koval! Differential Revision: https://reviews.llvm.org/D41480 llvm-svn: 324255	2018-02-05 17:53:29 +00:00
Hans Wennborg	22db17cf43	Revert r323472 "[Debug] Add dbg.value intrinsics for PHIs created during LCSSA." This broke the Chromium build; see PR36238. > This patch is an enhancement to propagate dbg.value information when > Phis are created on behalf of LCSSA. I noticed a case where a value > carried across a loop was reported as <optimized out>. > > Specifically this case: > > int bar(int x, int y) { > return x + y; > } > > int foo(int size) { > int val = 0; > for (int i = 0; i < size; ++i) { > val = bar(val, i); // Both val and i are correct > } > return val; // <optimized out> > } > > In the above case, after all of the interesting computation completes > our value is reported as "optimized out." This change will add a > dbg.value to correct this. > > This patch also moves the dbg.value insertion routine from > LoopRotation.cpp into Local.cpp, so that we can share it in both places > (LoopRotation and LCSSA). > > Patch by Matt Davis! > > Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 324247	2018-02-05 16:10:42 +00:00
Serguei Katkov	276b32bb14	Revert [SimplifyCFG] Relax restriction for folding unconditional branches The patch causes the failure of the test compiler-rt/test/profile/Linux/counter_promo_nest.c To unblock buildbot, revert the patch while investigation is in progress. Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324214	2018-02-05 09:05:43 +00:00
Max Kazantsev	f7667483c1	[NFC] Add tests for PR35743 llvm-svn: 324209	2018-02-05 08:09:49 +00:00
Serguei Katkov	6e93980e82	[SimplifyCFG] Relax restriction for folding unconditional branches The commit rL308422 introduces a restriction for folding unconditional branches. Specifically if empty block with unconditional branch leads to header of the loop then elimination of this basic block is prohibited. However it seems this condition is redundantly strict. If elimination of this basic block does not introduce more back edges then we can eliminate this block. The patch implements this relax of restriction. Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: pacxx Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42691 llvm-svn: 324208	2018-02-05 07:56:43 +00:00
Serguei Katkov	ec7029c286	Re-apply [SCEV] Fix isLoopEntryGuardedByCond usage ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check that SCEV is available at entry point of the loop. It is incorrect and fixed by patch. To bugs additionally fixed: assert is moved after the check whether loop is not a nullptr. Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow is guarded by isAvailableAtLoopEntry. Reviewers: sanjoy, mkazantsev, anna, dorit, reames Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42417 llvm-svn: 324204	2018-02-05 05:49:47 +00:00
Florian Hahn	642637aab4	[PartialInliner] Update test (NFC). llvm-svn: 324199	2018-02-04 18:40:24 +00:00
Florian Hahn	8f804fc07d	[InlineFunction] Set arg attrs even if there only are VarArg attrs. When using the partial inliner, we might have attributes for forwarded varargs, but the CodeExtractor does not create an empty argument attribute set for regular arguments in that case, because it does not know of the additional arguments. So in case we have attributes for VarArgs, we also have to make sure we create (empty) attributes for all regular arguments. This fixes PR36210. llvm-svn: 324197	2018-02-04 18:27:47 +00:00
Chad Rosier	a097bc69df	[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 llvm-svn: 324195	2018-02-04 15:42:24 +00:00
David Green	9688ed61fe	Remove unneeded -debug argument from new test llvm-svn: 324176	2018-02-03 17:33:50 +00:00
David Green	7174023f57	[InstCombine] Allow common type conversions to i8/i16/i32 This, in instcombine, allows conversions to i8/i16/i32 (very common cases) even if the resulting type is not legal according to the data layout. This can often open up extra combine opportunities. Differential Revision: https://reviews.llvm.org/D42424 llvm-svn: 324174	2018-02-03 16:51:03 +00:00
Sanjay Patel	a767ee5af0	[InstCombine] make sure tests are providing coverage for the stated pattern; NFC Without extra instructions and uses, swapMayExposeCSEOpportunities() would change the icmp (as seen in the check lines), so we were not actually testing patterns that should be handled by D41480. llvm-svn: 324143	2018-02-02 21:40:54 +00:00
Sanjay Patel	5b8cb26bcc	[InstCombine] add baseline tests for unsigned saturated sub (D41480); NFC llvm-svn: 324109	2018-02-02 17:43:16 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Sanjay Patel	3343fcef86	[InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 inst This is the enhancement suggested in D42536 to fix a shortcoming in regular InstCombine's canEvaluate* functionality. When we have multiple uses of a value, but they're all in one instruction, we can allow that expression to be narrowed or widened for the same cost as a single-use value. AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified away if the operands are the same value; add becomes shl; shifts with a variable shift amount aren't handled. Differential Revision: https://reviews.llvm.org/D42739 llvm-svn: 324014	2018-02-01 21:55:53 +00:00
David Green	184df0c35d	Revert commit rL323951 Looks like it's causing timeouts out on at least ppc64le buildbots. llvm-svn: 323959	2018-02-01 13:05:25 +00:00
David Green	e11f0545db	[InstCombine] Allow common type conversions to i8/i16/i32 This, in instcombine, allows conversions to i8/i16/i32 (very common cases) even if the resulting type is not legal according to the data layout. This can often open up extra combine opportunities. Differential Revision: https://reviews.llvm.org/D42424 llvm-svn: 323951	2018-02-01 11:06:18 +00:00
Mikael Holmen	6d06976e74	[LSR] Don't force bases of foldable formulae to the final type. Summary: Before emitting code for scaled registers, we prevent SCEVExpander from hoisting any scaled addressing mode by emitting all the bases first. However, these bases are being forced to the final type, resulting in some odd code. For example, if the type of the base is an integer and the final type is a pointer, we will emit an inttoptr for the base, a ptrtoint for the scale, and then a 'reverse' GEP where the GEP pointer is actually the base integer and the index is the pointer. It's more intuitive to use the pointer as a pointer and the integer as index. Patch by: Bevin Hansson Reviewers: atrick, qcolombet, sanjoy Reviewed By: qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42103 llvm-svn: 323946	2018-02-01 06:38:34 +00:00
Amjad Aboud	b86b771c02	[AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf node correctly. This covers the case where TruncInst leaf node is a constant expression. See PR36121 for more details. Differential Revision: https://reviews.llvm.org/D42622 llvm-svn: 323926	2018-01-31 22:39:05 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Marek Olsak	8f2df9d26c	[SeparateConstOffsetFromGEP] Fix up addrspace in the AMDGPU test llvm-svn: 323913	2018-01-31 20:49:19 +00:00
Marek Olsak	13e4741275	AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16} Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908	2018-01-31 20:18:04 +00:00
Marek Olsak	8e7d149a31	[SeparateConstOffsetFromGEP] Preserve metadata when splitting GEPs Summary: !amdgpu.uniform needs to be preserved for AMDGPU, otherwise bad things happen. Reviewers: arsenm, nhaehnle, jingyue, broune, majnemer, bjarke.roune, dblaikie Subscribers: wdng, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D42744 llvm-svn: 323907	2018-01-31 20:17:52 +00:00
Daniel Neilson	be58a220e9	[CodeGenPrepare] Improve source and dest alignments of memory intrinsics independently Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the CodeGenPrepare pass to be more aggressive in improving the source and destination alignments of memcpy/memmove/memset by exploiting our new ability to record independent alignments for each argument. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 323891	2018-01-31 17:24:53 +00:00
Sanjay Patel	fd58ade81c	[InstCombine] move related tests into the same file; NFC llvm-svn: 323882	2018-01-31 15:47:59 +00:00
Sanjay Patel	8c74a9a155	[InstCombine] add tests to show limit of canEvaluate* ; NFC llvm-svn: 323881	2018-01-31 15:28:39 +00:00
Amjad Aboud	d895bff5f2	[AggressiveInstCombine] Make TruncCombine class ignore unreachable basic blocks. Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis. See PR36134 for more details. Differential Revision: https://reviews.llvm.org/D42683 llvm-svn: 323862	2018-01-31 10:41:31 +00:00
Alexey Bataev	1c8f53f47d	[SLP] Add extra test for extractelement shuffle, NFC. llvm-svn: 323815	2018-01-30 21:06:06 +00:00
Sanjay Patel	ffb37a29d1	[LoopStrengthReduce] add test to show potential macro-fusion-based diff (PR35681); NFC This is the baseline output for the test proposed with D42607. llvm-svn: 323806	2018-01-30 19:17:38 +00:00
Simon Pilgrim	073f089c6e	[X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types. Differential Revision: https://reviews.llvm.org/D42526 llvm-svn: 323797	2018-01-30 18:10:21 +00:00
Petar Jovanovic	9208e8fbf6	[DeadArgumentElimination] Preserve llvm.dbg.values's first argument When removing return value Dead Argument Elimination pass clobbers first llvm.dbg.value’s argument for live arguments of that function by replacing it with nullptr. In the next pass it will be deleted, so debug location about those arguments are lost. This change fixes it. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D42541 llvm-svn: 323784	2018-01-30 16:42:04 +00:00
Zaara Syeda	1f59ae311b	Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute. This recommits r322721 reverted due to sanitizer memory leak build bot failures. Original commit message: This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 323778	2018-01-30 16:17:22 +00:00
Daniel Neilson	594f443b06	[RS4GC] Handle call/invoke instructions as base defining values of vectors Summary: There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and findBaseDefiningValue() of RS4GC. The later handles call and invoke instructions, and the former does not. This appears to be simple oversight. This patch remedies the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector(). Reviewers: DaniilSuchkov, anna Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42653 llvm-svn: 323764	2018-01-30 14:43:41 +00:00
Sanjay Patel	1aef27f5cd	[DSE] make sure memory is not modified before partial store merging (PR36129) We missed a critical check in D30703. We must make sure that no intermediate store is sitting between the stores that we want to merge. This should fix: https://bugs.llvm.org/show_bug.cgi?id=36129 Differential Revision: https://reviews.llvm.org/D42663 llvm-svn: 323759	2018-01-30 13:53:59 +00:00
Sanjay Patel	83f056604c	[InstSimplify] (X * Y) / Y --> X for relaxed floating-point ops This is the FP counterpart that was mentioned in PR35709: https://bugs.llvm.org/show_bug.cgi?id=35709 Differential Revision: https://reviews.llvm.org/D42385 llvm-svn: 323716	2018-01-30 00:18:37 +00:00
Sanjay Patel	d023a9b777	[DSE] add test for PR36129; NFC We can miscompile because we're not checking is the memory might me modified between the seemingly redundant store ops. llvm-svn: 323704	2018-01-29 22:50:08 +00:00
Alexey Bataev	9c5c103283	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662	2018-01-29 16:08:52 +00:00
Alexey Bataev	10f5c9e765	[SLP] Add a test with extract for PR32086, NFC. llvm-svn: 323661	2018-01-29 15:56:52 +00:00
Davide Italiano	8b797a0fd2	[CVP] Don't Replace incoming values from unreachable blocks with undef. This pretty much reverts r322006, except that we keep the test, because we work around the issue exposed in a different way (a recursion limit in value tracking). There's still probably some sequence that exposes this problem, and the proper way to fix that for somebody who has time is outlined in the code review. llvm-svn: 323630	2018-01-29 05:59:55 +00:00
Hiroshi Inoue	c8e9245816	[NFC] fix trivial typos in comments and documents "to to" -> "to" llvm-svn: 323628	2018-01-29 05:17:03 +00:00
Florian Hahn	1636651e35	[InlineCost] Mark functions accessing varargs as not viable. This prevents functions accessing varargs from being inlined if they have the alwaysinline attribute. Reviewers: efriedma, rnk, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D42556 llvm-svn: 323619	2018-01-28 19:11:49 +00:00
Alexey Bataev	f86be12182	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581	2018-01-27 02:42:21 +00:00
Sanjay Patel	5bce08ddff	[x86] auto-generate complete checks; NFC llvm-svn: 323571	2018-01-26 22:06:07 +00:00
Vedant Kumar	e48597a50e	[InstCombine] Preserve debug values for eliminable casts A cast from A to B is eliminable if its result is casted to C, and if the pair of casts could just be expressed as a single cast. E.g here, %c1 is eliminable: %c1 = zext i16 %A to i32 %c2 = sext i32 %c1 to i64 InstCombine optimizes away eliminable casts. This patch teaches it to insert a dbg.value intrinsic pointing to the final result, so that local variables pointing to the eliminable result are preserved. Differential Revision: https://reviews.llvm.org/D42566 llvm-svn: 323570	2018-01-26 22:02:52 +00:00
Alexey Bataev	7ad4e31c3b	[SLP] Test for trunc vectorization, NFC. llvm-svn: 323556	2018-01-26 20:07:55 +00:00
Alexey Bataev	167003df28	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323530	2018-01-26 14:31:09 +00:00
Florian Hahn	212afb9fd9	[CallSiteSplitting] Fix infinite loop when recording conditions. Fix infinite loop when recording conditions by correctly marking basic blocks as visited. Fixes https://bugs.llvm.org/show_bug.cgi?id=36105 llvm-svn: 323515	2018-01-26 10:36:50 +00:00
Vedant Kumar	6394df9fc4	[Debug] LCSSA: Insert dbg.value at the first available insertion point Inserting a dbg.value instruction at the start of a basic block with a landingpad instruction triggers a verifier failure. We should be OK if we insert the instruction a bit later. Speculative fix for the bot failure described here: https://reviews.llvm.org/D42551 llvm-svn: 323482	2018-01-25 23:48:29 +00:00
Vedant Kumar	60f54084bf	[Debug] Add dbg.value intrinsics for PHIs created during LCSSA. This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA. I noticed a case where a value carried across a loop was reported as <optimized out>. Specifically this case: int bar(int x, int y) { return x + y; } int foo(int size) { int val = 0; for (int i = 0; i < size; ++i) { val = bar(val, i); // Both val and i are correct } return val; // <optimized out> } In the above case, after all of the interesting computation completes our value is reported as "optimized out." This change will add a dbg.value to correct this. This patch also moves the dbg.value insertion routine from LoopRotation.cpp into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA). Patch by Matt Davis! Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 323472	2018-01-25 21:37:07 +00:00
Alexey Bataev	102d4b59f9	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323441 to fix buildbots. llvm-svn: 323447	2018-01-25 17:28:12 +00:00
Alexey Bataev	c8cfa14b6d	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323441	2018-01-25 16:45:18 +00:00
Sanjay Patel	1d68112c4b	[InstCombine] narrow masked zexted binops (PR35792) This is guarded by shouldChangeType(), so the tests show that we don't do the fold if the narrower type is not legal. Note that there is a proposal (D42424) that would change the results for the specific cases shown in these tests. That difference is also discussed in PR35792: https://bugs.llvm.org/show_bug.cgi?id=35792 Alive proofs for the cases handled here as well as the bitwise logic binops that we should already do better on: https://rise4fun.com/Alive/c97 https://rise4fun.com/Alive/Lc5E https://rise4fun.com/Alive/kdf llvm-svn: 323437	2018-01-25 16:34:36 +00:00
Sanjay Patel	0f95dd234d	[InstCombine] add tests for PR35792; NFC llvm-svn: 323436	2018-01-25 16:03:44 +00:00
Alexey Bataev	a0b2c78efc	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323430 to fix buildbots. llvm-svn: 323432	2018-01-25 15:20:29 +00:00
Alexey Bataev	ad51fe3644	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323430	2018-01-25 15:01:36 +00:00
Amjad Aboud	f1f57a3137	Another try to commit 323321 (aggressive instruction combine). llvm-svn: 323416	2018-01-25 12:06:32 +00:00
Sanjay Patel	60c13c7712	[InstCombine] fix datalayout in test file The only part of the datalayout that should matter for these tests is the part that specifies the legal int widths ('n*'). But there was a bug - that part of the string was not correctly separated with the expected '-' character, so we were testing as if there were no legal int widths at all. Removed the leading cruft so we have some legal ints to test with. I noticed this while testing a potential change to the way we transform shifts and sexts in D42424. llvm-svn: 323377	2018-01-24 21:36:45 +00:00
Alexey Bataev	0affccc8d7	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323348 because of the broken buildbots. llvm-svn: 323359	2018-01-24 18:36:51 +00:00
Nicolai Haehnle	4afb64e4c6	Revert r321751, "StructurizeCFG: Fix broken backedge detection" It causes regressions in various OpenGL test suites. Keep the test cases introduced by r321751 as XFAIL, and add a test case for the regression. Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514 Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015 llvm-svn: 323355	2018-01-24 18:02:05 +00:00
Alexey Bataev	4bd8e5332f	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323348	2018-01-24 17:50:53 +00:00
Zvi Rackover	51f0d64b9c	InstSimplify: If divisor element is undef simplify to undef Summary: If any vector divisor element is undef, we can arbitrarily choose it be zero which would make the div/rem an undef value by definition. Reviewers: spatel, reames Reviewed By: spatel Subscribers: magabari, llvm-commits Differential Revision: https://reviews.llvm.org/D42485 llvm-svn: 323343	2018-01-24 17:22:00 +00:00
Simon Pilgrim	f15886eb30	Regenerate shuffle sink test llvm-svn: 323328	2018-01-24 14:59:02 +00:00
Amjad Aboud	d53504e379	Reverted 323321. llvm-svn: 323326	2018-01-24 14:48:49 +00:00
Amjad Aboud	e4453233d7	[InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine). Combine expression patterns to form expressions with fewer, simple instructions. This pass does not modify the CFG. For example, this pass reduce width of expressions post-dominated by TruncInst into smaller width when applicable. It differs from instcombine pass in that it contains pattern optimization that requires higher complexity than the O(1), thus, it should run fewer times than instcombine pass. Differential Revision: https://reviews.llvm.org/D38313 llvm-svn: 323321	2018-01-24 12:42:42 +00:00
Max Kazantsev	0f720e1296	[NFC] Remove overconfident assert from IRCE This patch removes assert that SCEV is able to prove that a value is non-negative. In fact, SCEV can sometimes be unable to do this because its cache does not update properly. This assert will be returned once this problem is resolved. llvm-svn: 323309	2018-01-24 07:51:41 +00:00
Hiroshi Inoue	501931b117	[NFC] fix trivial typos in comments "the the" -> "the" llvm-svn: 323302	2018-01-24 05:04:35 +00:00
Zvi Rackover	b5447b1e7c	X86: Update isVectorShiftByScalarCheap with cases covered by AVX512BW Summary: AVX512BW adds support for variable shift amount for 16-bit element vectors. Reviewers: craig.topper, RKSimon, spatel Reviewed By: RKSimon Subscribers: rengolin, tschuett, llvm-commits Differential Revision: https://reviews.llvm.org/D42437 llvm-svn: 323292	2018-01-24 01:36:40 +00:00
Sanjay Patel	c4ed9ed276	[SLPVectorizer] add test for PR13837; NFC This was probably fixed long ago, but I don't see a test that lines up with the example and target in the bug report: https://bugs.llvm.org/show_bug.cgi?id=13837 ...so adding it here. llvm-svn: 323269	2018-01-23 22:04:17 +00:00
Simon Pilgrim	67b21313ae	Add bdver shuffle sink tests. llvm-svn: 323268	2018-01-23 22:03:57 +00:00
Volkan Keles	dc40be75f8	[llvm-extract] Support extracting basic blocks Summary: Currently, there is no way to extract a basic block from a function easily. This patch extends llvm-extract to extract the specified basic block(s). Reviewers: loladiro, rafael, bogner Reviewed By: bogner Subscribers: hintonda, mgorny, qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D41638 llvm-svn: 323266	2018-01-23 21:51:34 +00:00
Simon Pilgrim	bf3c39877e	Regenerate select test. NFCI. llvm-svn: 323265	2018-01-23 21:50:46 +00:00
Simon Pilgrim	a075ce04d8	Regenerate shuffle sink test. NFCI. llvm-svn: 323264	2018-01-23 21:50:11 +00:00
Alexey Bataev	4f74a31c0e	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323246 because of the broken buildbots. llvm-svn: 323252	2018-01-23 20:11:27 +00:00
Alexey Bataev	6719e2418c	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323246	2018-01-23 19:30:26 +00:00
Zvi Rackover	87937e8ed2	X86 Tests: Add AVX512BW config to CodeGenPrepare test. NFC Case points out that we don't consider shifts supported by AVX512BW in isVectorShiftByScalarCheap() llvm-svn: 323242	2018-01-23 19:20:39 +00:00
Serguei Katkov	17e5794f11	[CGP] Fix the GV handling in complex addressing mode If in complex addressing mode the difference is in GV then base reg should not be installed because we plan to use base reg as a merge point of different GVs. This is a fix for PR35980. Reviewers: reames, john.brawn, santosh Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42230 llvm-svn: 323192	2018-01-23 12:07:49 +00:00
Anton Bikineev	82f61151b3	[InstSimplify] (X << Y) % X -> 0 llvm-svn: 323182	2018-01-23 09:27:47 +00:00
Chandler Carruth	c58f2166ab	Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took in the speculative execution, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` in addition to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile all libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we strongly recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155	2018-01-22 22:05:25 +00:00
Serguei Katkov	f38041dc3e	Revert [SCEV] Fix isLoopEntryGuardedByCond usage It causes buildbot failures. New added assert is fired. It seems not all usages of isLoopEntryGuardedByCond are fixed. llvm-svn: 323079	2018-01-22 07:47:02 +00:00
Serguei Katkov	50714a1cbc	[SCEV] Fix isLoopEntryGuardedByCond usage ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check that SCEV is available at entry point of the loop. It is incorrect and fixed by patch. Reviewers: sanjoy, mkazantsev, anna, dorit Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42165 llvm-svn: 323077	2018-01-22 07:31:41 +00:00
Sanjay Patel	9530f18864	[InstCombine] (X << Y) / X -> 1 << Y ...when the shift is known to not overflow with the matching signed-ness of the division. This closes an optimization gap caused by canonicalizing mul by power-of-2 to shl as shown in PR35709: https://bugs.llvm.org/show_bug.cgi?id=35709 Patch by Anton Bikineev! Differential Revision: https://reviews.llvm.org/D42032 llvm-svn: 323068	2018-01-21 16:14:51 +00:00
Sanjay Patel	7a44e4d594	[InstSimplify] add baseline tests for (X << Y) % X -> 0; NFC This is the 'rem' counterpart to D42032 and would be folded by D42341. Patch by Anton Bikineev. Differential Revision: https://reviews.llvm.org/D42342 llvm-svn: 323067	2018-01-21 15:36:15 +00:00
Sanjay Patel	439132185d	[InstCombine] add baseline tests for (X << Y) / X -> 1 << Y; NFC This fold is proposed in D42032. llvm-svn: 323043	2018-01-20 16:13:40 +00:00
Craig Topper	0d797a34d8	[X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. llvm-svn: 323015	2018-01-20 00:26:08 +00:00
Akira Hatanaka	73ceb50d85	[ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call to @objc_autorelease if its operand is a PHI and the PHI has an equivalent value that is used by a return instruction. For example, ARC optimizer shouldn't replace the call in the following example, as doing so breaks the AutoreleaseRV/RetainRV optimization: %v1 = bitcast i32* %v0 to i8* br label %bb3 bb2: %v3 = bitcast i32* %v2 to i8* br label %bb3 bb3: %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ] %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p) ret i32* %retval Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's operand uses with its value so that the call gets tail-called. rdar://problem/15894705 llvm-svn: 323009	2018-01-19 23:51:13 +00:00
Jakub Kuderski	d2e371f046	[Dominators] Visit affected node candidates found at different root levels Summary: This patch attempts to fix the DomTree incremental insertion bug found here [[ https://bugs.llvm.org/show_bug.cgi?id=35969 \| PR35969 ]] . When performing an insertion into a piece of unreachable CFG, we may find the same not at different levels. When this happens, the node can turn out to be affected when we find it starting from a node with a lower level in the tree. The level at which we start visitation affects if we consider a node affected or not. This patch tracks the lowest level at which each node was visited during insertion and allows it to be visited multiple times, if it can cause it to be considered affected. Reviewers: brzycki, davide, dberlin, grosser Reviewed By: brzycki Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42231 llvm-svn: 322993	2018-01-19 21:27:24 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Alexey Bataev	fa80c47c6a	[SLP] Fix vectorization for tree with trunc to minimum required bit width. Summary: If the vectorized tree has truncate to minimum required bit width and the vector type of the cast operation after the truncation is the same as the vector type of the cast operands, count cost of the vector cast operation as 0, because this cast will be later removed. Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner. Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41948 llvm-svn: 322946	2018-01-19 14:40:13 +00:00
John Brawn	2867bd72c0	[InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr Three (or more) operand getelementptrs could plausibly also be handled, but handling only two-operand fits in easily with the existing BinaryOperator handling. Differential Revision: https://reviews.llvm.org/D39958 llvm-svn: 322930	2018-01-19 10:05:15 +00:00
Sanjay Patel	a19b748f6d	[InstSimplify] regenerate checks and add tests for commutes; NFC llvm-svn: 322907	2018-01-18 23:11:24 +00:00
Alexey Bataev	a2d6fe4ab4	[SLP] Fix test checks, NFC. llvm-svn: 322865	2018-01-18 17:34:27 +00:00
Craig Topper	83b0a98902	[X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads. llvm-svn: 322820	2018-01-18 07:44:09 +00:00
Rafael Espindola	9fbc040599	Make GlobalValues with non-default visibilility dso_local. This is similar to r322317, but for visibility. It is not as neat because we have to special case extern_weak. The idea is the same as the previous change, make the transition to explicit dso_local easier for the frontends. With this they only have to add dso_local to symbols where we need some external information to decide if it is dso_local (like it being part of an ELF executable). llvm-svn: 322806	2018-01-18 02:08:23 +00:00
Zaara Syeda	c9dc7b451b	Revert [PowerPC] This reverts commit rL322721 Failing build bots. Revert the commit now. llvm-svn: 322748	2018-01-17 20:00:15 +00:00
Sanjay Patel	218a0b51dd	[InstCombine] add baseline tests for D39958; NFC llvm-svn: 322733	2018-01-17 19:04:18 +00:00
Zaara Syeda	8e951fd2f6	[PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute. This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 322721	2018-01-17 18:22:55 +00:00
Sanjay Patel	aa766efd09	[InstCombine] fix demanded-bits propagation for zext/trunc I was comparing the demanded-bits implementations between InstCombine and TargetLowering as part of investigating questions in D42088 and noticed that this was wrong in IR. We were losing all of the prior known bits when we got back to the 'zext'. llvm-svn: 322662	2018-01-17 14:39:28 +00:00
Sanjay Patel	178deccb63	[InstCombine] add test to show hole in demanded bits; NFC llvm-svn: 322660	2018-01-17 14:27:35 +00:00
Ivan A. Kosarev	4d0ff0c74d	[Transforms] Support making mutable versions of new-format TBAA access tags Differential Revision: https://reviews.llvm.org/D41565 llvm-svn: 322650	2018-01-17 13:29:54 +00:00
Florian Hahn	c6c89bffdc	[CallSiteSplitting] Pass list of (BB, Conditions) pairs to splitCallSite. This removes some duplication from splitCallSite and makes it easier to add additional code dealing with each predecessor. It also allows us to split for more than 2 predecessors, although that is not enabled for now. Reviewers: junbuml, mcrosier, davidxl, davide Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D41858 llvm-svn: 322599	2018-01-16 22:13:15 +00:00
Alexey Bataev	6977dbcc7b	[SLP] Fix for PR32164: Improve vectorization of reverse order of extract operations. Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D33954 llvm-svn: 322579	2018-01-16 18:17:01 +00:00
Hiroshi Inoue	99a8faa615	[SROA] fix assetion failure This patch fixes the assertion failure in SROA reported in PR35657. PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522. The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset with the existing one. But the code checks only type of the alloca and then check the offset using an assert. In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets. This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices. Differential Revision: https://reviews.llvm.org/D41981 llvm-svn: 322533	2018-01-16 06:23:05 +00:00
Andrei Elovikov	7457aa0bce	[LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc. Summary: This method is supposed to be called for IVs that have casts in their use-def chains that are completely ignored after vectorization under PSE. However, for truncates of such IVs the same InductionDescriptor is used during creation/widening of both original IV based on PHINode and new IV based on TruncInst. This leads to unintended second call to recordVectorLoopValueForInductionCast with a VectorLoopVal set to the newly created IV for a trunc and causes an assert due to attempt to store new information for already existing entry in the map. This is wrong and should not be done. Fixes PR35773. Reviewers: dorit, Ayal, mssimpso Reviewed By: dorit Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41913 llvm-svn: 322473	2018-01-15 10:56:07 +00:00
Sanjay Patel	4158eff0f8	[InstSimplify] fold implied null ptr check (PR35790) This extends rL322327 to handle the pointer cast and should solve: https://bugs.llvm.org/show_bug.cgi?id=35790 Name: or_eq_zero %isnull = icmp eq i64* %p, null %x = ptrtoint i64* %p to i64 %somebits = and i64 %x, %y %somebits_are_zero = icmp eq i64 %somebits, 0 %or = or i1 %somebits_are_zero, %isnull => %or = %somebits_are_zero Name: and_ne_zero %isnotnull = icmp ne i64* %p, null %x = ptrtoint i64* %p to i64 %somebits = and i64 %x, %y %somebits_are_not_zero = icmp ne i64 %somebits, 0 %and = and i1 %somebits_are_not_zero, %isnotnull => %and = %somebits_are_not_zero https://rise4fun.com/Alive/CQ3 llvm-svn: 322439	2018-01-13 15:44:44 +00:00
Sanjay Patel	6691e40980	[InstSimplify] add tests for implied ptr cmp with null (PR35790); NFC llvm-svn: 322411	2018-01-12 22:16:26 +00:00
Brian M. Rzycki	9b7ae23256	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perform the preversation was minimally altered and simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements such as threading across loop headers. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 322401	2018-01-12 21:06:48 +00:00
Max Kazantsev	ef0576000c	[IRCE][NFC] Make range check's End a non-null SCEV Currently, IRC contains `Begin` and `Step` as SCEVs and `End` as value. Aside from that, `End` can also be `nullptr` which can be later conditionally converted into a non-null SCEV. To make this logic more transparent, this patch makes `End` a SCEV and calculates it early, so that it is never a null. Differential Revision: https://reviews.llvm.org/D39590 llvm-svn: 322364	2018-01-12 10:00:26 +00:00
Serguei Katkov	a757d65cec	[LoopDeletion] Handle users in unreachable block This is a fix for PR35884. When we want to delete dead loop we must clean uses in unreachable blocks otherwise we'll get an assert during deletion of instructions from the loop. Reviewers: anna, davide Reviewed By: anna Subscribers: llvm-commits, lebedev.ri Differential Revision: https://reviews.llvm.org/D41943 llvm-svn: 322357	2018-01-12 07:24:43 +00:00
Sanjay Patel	6ef6aa987c	[InstSimplify] fold implied cmp with zero (PR35790) This doesn't handle the more complicated case in the bug report yet: https://bugs.llvm.org/show_bug.cgi?id=35790 For that, we have to match / look through a cast. llvm-svn: 322327	2018-01-11 23:27:37 +00:00
Sanjay Patel	ac0edcb3f3	[InstSimplify] add tests for implied cmp with zero (PR35790); NFC llvm-svn: 322323	2018-01-11 22:48:07 +00:00
Rafael Espindola	e4b0231c63	Make internal/private GVs implicitly dso_local. While updating clang tests for having clang set dso_local I noticed that: - There are a lot of tests to update. - Many of the updates are redundant. They are redundant because a GV is "obviously dso_local". This patch starts formalizing that a bit by requiring that internal and private GVs be dso_local too. Since they all are, we don't have to print dso_local to the textual representation, making it a bit more compact and easier to read. llvm-svn: 322317	2018-01-11 22:15:05 +00:00
Fiona Glaser	efe6a84e5b	[Sink] Really really fix predicate in legality check LoadInst isn't enough; we need to include intrinsics that perform loads too. All side-effecting intrinsics and such are already covered by the isSafe check, so we just need to care about things that read from memory. D41960, originally from D33179. llvm-svn: 322311	2018-01-11 21:28:57 +00:00
Benjamin Kramer	738e6e7cb0	[InstCombine] Apply the fix from r322284 for sin / cos -> tan too llvm-svn: 322285	2018-01-11 15:33:21 +00:00
Benjamin Kramer	44993ede60	[InstCombine] For cos/sin -> tan copy attributes from cos instead of the parent function Ideally we should merge the attributes from the functions somehow, but this is obviously an improvement over taking random attributes from the caller which will trip up the verifier if they're nonsensical for an unary intrinsic call. llvm-svn: 322284	2018-01-11 15:19:02 +00:00
Sanjay Patel	e63d8dda5a	[ValueTracking] recognize min/max-of-min/max with notted ops (PR35875) This was originally planned as the fix for: https://bugs.llvm.org/show_bug.cgi?id=35834 ...but simpler transforms handled that case, so I implemented a lesser solution. It turns out we need to handle the case with 'not' ops too because the real code example that we are trying to solve: https://bugs.llvm.org/show_bug.cgi?id=35875 ...has extra uses of the intermediate values, so we can't rely on smaller canonicalizations to get us to the goal. As with rL321672, I've tried to show every possibility in the codegen tests because that's the simplest way to prove we're doing the right thing in the wide variety of permutations of this pattern. We can also show an InstCombine win because we added a fold for this case in: rL321998 / D41603 An Alive proof for one variant of the pattern to show that the InstCombine and codegen results are correct: https://rise4fun.com/Alive/vd1 Name: min3_nots %nx = xor i8 %x, -1 %ny = xor i8 %y, -1 %nz = xor i8 %z, -1 %cmpxz = icmp slt i8 %nx, %nz %minxz = select i1 %cmpxz, i8 %nx, i8 %nz %cmpyz = icmp slt i8 %ny, %nz %minyz = select i1 %cmpyz, i8 %ny, i8 %nz %cmpyx = icmp slt i8 %y, %x %r = select i1 %cmpyx, i8 %minxz, i8 %minyz => %cmpxyz = icmp slt i8 %minxz, %ny %r = select i1 %cmpxyz, i8 %minxz, i8 %ny Name: min3_nots_alt %nx = xor i8 %x, -1 %ny = xor i8 %y, -1 %nz = xor i8 %z, -1 %cmpxz = icmp slt i8 %nx, %nz %minxz = select i1 %cmpxz, i8 %nx, i8 %nz %cmpyz = icmp slt i8 %ny, %nz %minyz = select i1 %cmpyz, i8 %ny, i8 %nz %cmpyx = icmp slt i8 %y, %x %r = select i1 %cmpyx, i8 %minxz, i8 %minyz => %xz = icmp sgt i8 %x, %z %maxxz = select i1 %xz, i8 %x, i8 %z %xyz = icmp sgt i8 %maxxz, %y %maxxyz = select i1 %xyz, i8 %maxxz, i8 %y %r = xor i8 %maxxyz, -1 llvm-svn: 322283	2018-01-11 15:13:47 +00:00
Sanjay Patel	e0df4650f8	[InstCombine] add min3-with-nots test (PR35875); NFC llvm-svn: 322281	2018-01-11 14:53:45 +00:00
Dmitry Venikov	e5fbf591a7	[InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x) Summary: This patch enables folding sin(x) / cos(x) -> tan(x), cos(x) / sin(x) -> 1 / tan(x) under -ffast-math flag Reviewers: hfinkel, spatel Reviewed By: spatel Subscribers: andrew.w.kaylor, efriedma, scanon, llvm-commits Differential Revision: https://reviews.llvm.org/D41286 llvm-svn: 322255	2018-01-11 06:33:00 +00:00
Alexey Bataev	90e29b81d6	[SLP] Add/update tests for SLP vectorizer, NFC. llvm-svn: 322225	2018-01-10 21:29:18 +00:00
Sanjay Patel	d04026ea43	[InstCombine] add test to show missed bswap; NFC D41353 / D41233 are proposing to alter the shl/and canonicalization, but I think that would just move an existing pattern-matching hole to a different place. llvm-svn: 322206	2018-01-10 18:47:21 +00:00
Bjorn Pettersson	3851496e6e	Avoid inlining if there is byval arguments with non-alloca address space Summary: After teaching InlineCost more about address spaces () another fault was detected in the inliner. If an argument has the byval attribute the parameter might be copied to an alloca. That part seems to work fine even if the argument has a different address space than the alloca address space. However, if the address spaces differ, then the inlined function still might refer to the parameter using the original address space (the inliner does not handle that situation very well). This patch avoids the problem by simply disallowing inlining when there are byval arguments with address space that differs from the alloca address space. I'm not really sure how to transform the code if we want to get inlining for this situation. I assume that it never has been working, and that the fixes in r321809 just exposed an old problem. Fault found by skatkov (Serguei Katkov). It is mentioned in follow up comments to https://reviews.llvm.org/D40455. Reviewers: skatkov Reviewed By: skatkov Subscribers: uabelho, eraman, llvm-commits, haicheng Differential Revision: https://reviews.llvm.org/D41898 llvm-svn: 322181	2018-01-10 13:01:18 +00:00
Vlad Tsyrklevich	cdec22ef9a	LowerTypeTests: Add limited support for aliases Summary: LowerTypeTests moves some function definitions from individual object files to the merged module, leaving a stub to be called in the merged module's jump table. If an alias was pointing to such a function definition LowerTypeTests would fail because the alias would be left without a definition to point to. This change 1) emits information about aliases to the ThinLTO summary, 2) replaces aliases pointing to function definitions that are moved to the merged module with function declarations, and 3) re-emits those aliases in the merged module pointing to the correct function definitions. The patch does not correctly fix all possible mis-uses of aliases in LowerTypeTests. For example, it does not handle aliases with a different type from the pointed to function. The addition of alias data increases the size of Chrome build artifacts by less than 1%. Reviewers: pcc Reviewed By: pcc Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc Differential Revision: https://reviews.llvm.org/D41741 llvm-svn: 322139	2018-01-10 00:00:51 +00:00
Michael Zolotukhin	1f562176e9	[LoopRotate] Detect loops with indirect branches better (we're giving up on them). llvm-svn: 322137	2018-01-09 23:54:35 +00:00
Chris Bieneman	abdea268c1	[IPSCCP] Remove calls without side effects Summary: When performing constant propagation for call instructions we have historically replaced all uses of the return from a call, but not removed the call itself. This is required for correctness if the calls have side effects, however the compiler should be able to safely remove calls that don't have side effects. This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time. Reviewers: davide, sanjoy, bruno, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38856 llvm-svn: 322125	2018-01-09 21:58:46 +00:00
Daniel Berlin	56cca7437c	NewGVN: Fix PR/33367, which was causing us to delete non-copy intrinsics accidentally in some rare cases llvm-svn: 322115	2018-01-09 20:12:42 +00:00
Hubert Tong	55662a8e9f	Profiling tests: Endianess XFAIL for powerpc- (32-bit) Add powerpc- (32-bit) as XFAIL for tests that are documented either in- line or via commit messages as expected to fail on big-endian systems. Tests not documented in-line are documented in commit messages as follows: r211172 - test/tools/llvm-cov/llvm-cov.test r247920 - test/Transforms/SampleProfile/gcc-simple.ll llvm-svn: 322114	2018-01-09 20:09:23 +00:00
Easwaran Raman	bdf20261d8	Add a pass to generate synthetic function entry counts. Summary: This pass synthesizes function entry counts by traversing the callgraph and using the relative block frequencies of the callsites. The intended use of these counts is in inlining to determine hot/cold callsites in the absence of profile information. The pass is split into two files with the code that propagates the counts in a callgraph in a Utils file. I plan to add support for propagation in the thinlto link phase and the propagation code will be shared and hence this split. I did not add support to the old PM since hot callsite determination in inlining is not possible in old PM (although we could use hot callee heuristic with synthetic counts in the old PM it is not worth the effort tuning it) Reviewers: davidxl, silvas Subscribers: mgorny, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D41604 llvm-svn: 322110	2018-01-09 19:39:35 +00:00
Alexey Bataev	771ec9f399	[COST]Fix PR35865: Fix cost model evaluation for shuffle on X86. Summary: If the vector type is transformed to non-vector single type, the compile may crash trying to get vector information about non-vector type. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41862 llvm-svn: 322106	2018-01-09 19:08:22 +00:00
Sanjay Patel	6fb1357c35	[InstCombine] weaken assertions for icmp folds (PR35846) Because of potential UB (known bits conflicts with an llvm.assume), we have to check rather than assert here because InstSimplify doesn't kill the compare: https://bugs.llvm.org/show_bug.cgi?id=35846 llvm-svn: 322104	2018-01-09 18:56:03 +00:00
Petar Jovanovic	1d26c7e4ff	[EarlyCSE] Salvage debug info during DCE EarlyCSE did not try to salvage debug info during erasing of instructions. This change fixes it. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D41496 llvm-svn: 322083	2018-01-09 15:08:37 +00:00
Simon Pilgrim	5d909be91b	[InstCombine] Check for out of range ashr values using APInt before calling getZExtValue Reduced from oss-fuzz #5032 test case llvm-svn: 322078	2018-01-09 14:23:46 +00:00
Simon Pilgrim	94357afd26	[InstCombine] Add pow2 mul -> shl tests for vectors with uniform/non-uniform constants llvm-svn: 322072	2018-01-09 11:55:27 +00:00
Serguei Katkov	6a7a4c6a55	[SCEV] Do not cache S -> V if S is not equivalent of V SCEV tracks the correspondence of created SCEV to original instruction. However during creation of SCEV it is possible that nuw/nsw/exact flags are lost. As a result during expansion of the SCEV the instruction with nuw/nsw/exact will be used where it was expected and we produce poison incorreclty. Reviewers: sanjoy, mkazantsev, sebpop, jbhateja Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41578 llvm-svn: 322058	2018-01-09 06:47:14 +00:00
Serguei Katkov	4d1dd6b53a	[CGP] Fix Complex addressing mode for offset If the offset is differ in two addressing mode we can continue only if ScaleReg is not set due to we will use it as merge of different offsets. It should fix PR35799 and PR35805. Reviewers: john.brawn, reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41227 llvm-svn: 322056	2018-01-09 04:37:06 +00:00
Sanjay Patel	7dfe96ad16	[ValueTracking] remove overzealous assert The test is derived from a failing fuzz test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5008 Credit to @rksimon for pointing out the problem. llvm-svn: 322016	2018-01-08 18:31:13 +00:00
Sanjay Patel	52149f0305	[TargetLibraryInfo] fix finite mathlib function availability This patch was part of: https://reviews.llvm.org/D41338 ...but we can expose the bug in IR via constant propagation as shown in the test. Unless the triple includes 'linux', we should not fold these because the functions don't exist on other platforms (yet?). llvm-svn: 322010	2018-01-08 17:38:09 +00:00
Davide Italiano	9a60d2c157	[CVP] Replace incoming values from unreachable blocks with undef. This is an attempt of fixing PR35807. Due to the non-standard definition of dominance in LLVM, where uses in unreachable blocks are dominated by anything, you can have, in an unreachable block: %patatino = OP1 %patatino, CONSTANT When `SimplifyInstruction` receives a PHI where an incoming value is of the aforementioned form, in some cases, loops indefinitely. What I propose here instead is keeping track of the incoming values from unreachable blocks, and replacing them with undef. It fixes this case, and it seems to be good regardless (even if we can't prove that the value is constant, as it's coming from an unreachable block, we can ignore it). Differential Revision: https://reviews.llvm.org/D41812 llvm-svn: 322006	2018-01-08 16:34:06 +00:00
Sanjay Patel	31b4b76f99	[InstCombine] fold min/max tree with common operand (PR35717) There is precedence for factorization transforms in instcombine for FP ops with fast-math. We also have similar logic in foldSPFofSPF(). It would take more work to add this to reassociate because that's specialized for binops, and min/max are not binops (or even single instructions). Also, I don't have evidence that larger min/max trees than this exist in real code, but if we find that's true, we might want to reorganize where/how we do this optimization. In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have: int test(int xc, int xm, int xy) { int xk; if (xc < xm) xk = xc < xy ? xc : xy; else xk = xm < xy ? xm : xy; return xk; } This patch solves that problem because we recognize more min/max patterns after rL321672 https://rise4fun.com/Alive/Qjne https://rise4fun.com/Alive/3yg Differential Revision: https://reviews.llvm.org/D41603 llvm-svn: 321998	2018-01-08 15:05:34 +00:00
Alexey Bataev	5b9a77d4ea	[SLP] Fix PR35777: Incorrect handling of aggregate values. Summary: Fixes the bug with incorrect handling of InsertValue\|InsertElement instrucions in SLP vectorizer. Currently, we may use incorrect ExtractElement instructions as the operands of the original InsertValue\|InsertElement instructions. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41767 llvm-svn: 321994	2018-01-08 14:43:06 +00:00
Alexey Bataev	118a0a2c38	[SLP] Fix PR35628: Count external uses on extra reduction arguments. Summary: If the vectorized value is marked as extra reduction argument, its users are not considered as external users. Patch fixes this. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41786 llvm-svn: 321993	2018-01-08 14:33:11 +00:00
Davide Italiano	4c39758a38	[SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()). The approach was never discussed, I wasn't able to reproduce this non-determinism, and the original author went AWOL. After a discussion on the ML, Philip suggested to revert this. llvm-svn: 321974	2018-01-07 22:06:24 +00:00
Florian Hahn	55be37e7d4	[CodeExtractor] Use subset of function attributes for extracted function. In addition to target-dependent attributes, we can also preserve a white-listed subset of target independent function attributes. The white-list excludes problematic attributes, most prominently: * attributes related to memory accesses, as alloca instructions could be moved in/out of the extracted block * control-flow dependent attributes, like no_return or thunk, as the relerelevant instructions might or might not get extracted. Thanks @efriedma and @aemerson for providing a set of attributes that cannot be propagated. Reviewers: efriedma, davidxl, davide, silvas Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41334 llvm-svn: 321961	2018-01-07 11:22:25 +00:00
Florian Hahn	a82eef2363	[InlineFunction] Preserve calling convention when forwarding VarArgs. Reviewers: efriedma, rnk, davide Reviewed By: rnk, davide Differential Revision: https://reviews.llvm.org/D41556 llvm-svn: 321943	2018-01-06 20:56:27 +00:00
Florian Hahn	de10e6e064	[InlineFunction] Preserve attributes when forwarding VarArgs. Reviewers: rnk, efriedma Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D41555 llvm-svn: 321942	2018-01-06 20:46:00 +00:00
Florian Hahn	80788d8088	[InlineFunction] Inline vararg functions that do not access varargs. If the varargs are not accessed by a function, we can inline the function. Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41335 llvm-svn: 321940	2018-01-06 19:45:40 +00:00
Sanjay Patel	26a6fcde83	[InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b) In the minimal case, this won't remove instructions, but it still improves uses of existing values. In the motivating example from PR35834, it does remove instructions, and sets that case up to be optimized by something like D41603: https://reviews.llvm.org/D41603 llvm-svn: 321936	2018-01-06 17:34:22 +00:00
Sanjay Patel	f7e775291e	[InstCombine] add more tests for max(~a, ~b) and PR35834; NFC llvm-svn: 321935	2018-01-06 17:14:46 +00:00
Sanjay Patel	5a48aef3f0	[x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325) This is the last step needed to fix PR33325: https://bugs.llvm.org/show_bug.cgi?id=33325 We're trading branch and compares for loads and logic ops. This makes the code smaller and hopefully faster in most cases. The 24-byte test shows an interesting construct: we load the trailing scalar elements into vector registers and generate the same pcmpeq+movmsk code that we expected for a pair of full vector elements (see the 32- and 64-byte tests). Differential Revision: https://reviews.llvm.org/D41714 llvm-svn: 321934	2018-01-06 16:16:04 +00:00
Vedant Kumar	1f6f5f1df9	[Debugify] Handled unsized types llvm-svn: 321918	2018-01-06 00:37:01 +00:00
Sanjay Patel	5b6aacf2c1	[InstCombine] add folds for min(~a, b) --> ~max(a, b) Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b), the use checking and operand creation were off. We were potentially creating repeated identical instructions of existing values. This led to infinite looping after I added the extra folds. By using the simpler m_Not matcher and not creating new 'not' ops for a and b, we avoid that problem. It's possible that not using IsFreeToInvert() here is more limiting than the simpler matcher, but there are no tests for anything more exotic. It's also possible that we should relax the use checking further to handle a case like PR35834: https://bugs.llvm.org/show_bug.cgi?id=35834 ...but we can make that a follow-up if it is needed. llvm-svn: 321882	2018-01-05 19:01:17 +00:00
Alexey Bataev	fa13848da8	[SLP] Update more test checks, NFC. llvm-svn: 321872	2018-01-05 16:15:17 +00:00
Alexey Bataev	e565ebcdad	[SLP] Update test checks, NFC. llvm-svn: 321870	2018-01-05 15:20:40 +00:00
Alexey Bataev	988db0bd50	[SLP] Update tests checks, NFC. llvm-svn: 321869	2018-01-05 14:40:04 +00:00
Reid Kleckner	cd78ddc119	Revert "[JumpThreading] Preservation of DT and LVI across the pass" This reverts r321825, it causes crashes in Chromium. Reproducer forthcoming. llvm-svn: 321832	2018-01-04 23:23:46 +00:00
Brian M. Rzycki	cdad6c0b60	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perfom the preversation was minimally altered and was simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 321825	2018-01-04 21:57:32 +00:00
Bjorn Pettersson	77f3299415	Teach InlineCost about address spaces Summary: I basically copied this patch from here: https://reviews.llvm.org/D1251 But I skipped some of the refactoring to make the patch more clean. The new outer3/inner3 test case in ptr-diff.ll triggers the following assert without this patch: lib/IR/Constants.cpp:1834: static llvm::Constant llvm::ConstantExpr::getCompare(unsigned short, llvm::Constant , llvm::Constant *, bool): Assertion `C1->getType() == C2->getType() && "Op types should be identical!"' failed. The other new test cases makes sure that there is code coverage for all modifications in InlineCost.cpp (getting different values due to not fetching sizes for address space zero). I only guarantee code coverage for those tests. The tests are not written in a way that they would break if not having the corrections in InlineCost.cpp. I found it quite hard to fine tune the tests into getting different results based on the pointer sizes (except for the test case where we hit an assert if not teaching InlineCost about address spaces). Reviewers: chandlerc, arsenm, haicheng Reviewed By: arsenm Subscribers: wdng, eraman, llvm-commits, haicheng Differential Revision: https://reviews.llvm.org/D40455 llvm-svn: 321809	2018-01-04 18:23:40 +00:00
Matt Arsenault	35c4244aea	StructurizeCFG: xfail one of the testcases from r321751 It fails with -verify-region-info. This seems to be a issue with RegionInfo itself which existed before. llvm-svn: 321806	2018-01-04 17:23:24 +00:00
Sanjay Patel	c63f9014d6	[InstCombine] safely create a constant of the right type (PR35794) llvm-svn: 321801	2018-01-04 14:31:56 +00:00
Aditya Kumar	1f90cae80f	[GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load in case of a loop Reviewers: dberlin sebpop eli.friedman Differential Revision: https://reviews.llvm.org/D41453 llvm-svn: 321789	2018-01-04 07:47:24 +00:00
Philip Reames	cf524a408a	[PRE] Add a bunch of test cases for LICM-like PRE patterns These were inspired by a very old review I'm about to abandon (https://reviews.llvm.org/D7061). Several of the test cases from that worked without modification and expanding test coverage of such cases is always worthwhile. llvm-svn: 321764	2018-01-03 22:28:26 +00:00
Matt Arsenault	8070882b4e	StructurizeCFG: Fix broken backedge detection The work order was changed in r228186 from SCC order to RPO with an arbitrary sorting function. The sorting function attempted to move inner loop nodes earlier. This was was apparently relying on an assumption that every block in a given loop / the same loop depth would be seen before visiting another loop. In the broken testcase, a block outside of the loop was encountered before moving onto another block in the same loop. The testcase would then structurize such that one blocks unconditional successor could never be reached. Revert to plain RPO for the analysis phase. This fixes detecting edges as backedges that aren't really. The processing phase does use another visited set, and I'm unclear on whether the order there is as important. An arbitrary order doesn't work, and triggers some infinite loops. The reversed RPO list seems to work and is closer to the order that was used before, minus the arbitary custom sorting. A few of the changed tests now produce smaller code, and a few are slightly worse looking. llvm-svn: 321751	2018-01-03 18:45:37 +00:00
Simon Pilgrim	3bf2d64589	[InstCombine] Check for out of range shift values using APInt before calling getZExtValue Reduced from oss-fuzz #4871 test case llvm-svn: 321748	2018-01-03 18:28:20 +00:00
Dmitry Venikov	3d8cd34a5d	[InstSimplify] Missed optimization in math expression: squashing exp(log), log(exp) Summary: This patch enables folding following expressions under -ffast-math flag: exp(log(x)) -> x, exp2(log2(x)) -> x, log(exp(x)) -> x, log2(exp2(x)) -> x Reviewers: spatel, hfinkel, davide Reviewed By: spatel, hfinkel, davide Subscribers: scanon, llvm-commits Differential Revision: https://reviews.llvm.org/D41381 llvm-svn: 321710	2018-01-03 14:37:42 +00:00
Florian Hahn	dcc0ba9bbb	[InstCombine] Add test to remove VarArg casts (NFC) llvm-svn: 321706	2018-01-03 13:35:43 +00:00

... 14 15 16 17 18 ...

11553 Commits