llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	2eaacda53e	[SLP] Add more tests for SLP Vectorizer. llvm-svn: 287801	2016-11-23 20:10:32 +00:00
Alina Sbirlea	a3d2f703a5	[LoadStoreVectorizer] Enable vectorization of stores in the presence of an aliasing load Summary: The "getVectorizablePrefix" method would give up if it found an aliasing load for a store chain. In practice, the aliasing load can be treated as a memory barrier and all stores that precede it are a valid vectorizable prefix. Issue found by volkan in D26962. Testcase is a pruned version of the one in the original patch. Reviewers: jlebar, arsenm, tstellarAMD Subscribers: mzolotukhin, wdng, nhaehnle, anna, volkan, llvm-commits Differential Revision: https://reviews.llvm.org/D27008 llvm-svn: 287781	2016-11-23 17:43:15 +00:00
Simon Pilgrim	4e9b9cbee9	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762	2016-11-23 14:01:18 +00:00
Simon Pilgrim	03cd8f887c	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs llvm-svn: 287760	2016-11-23 13:42:09 +00:00
Davide Italiano	f6fbe21bef	[SCCP] Add a test for switches on undef. Without this test, you can just remove the code fixing the switch to the first constant in ResolvedUndefs in and everything pass. This test, instead, fails with an assertion if the code is removed. Found while refactoring SCCP to integrate undef in the solver. llvm-svn: 287731	2016-11-23 01:42:39 +00:00
Dehao Chen	554f500ae2	Before sample pgo annotation, do not inline a function that has no debug info. (NFC) If there is no debug info in the callee, inlining it will not help annotator. This avoids infinite loop as reported in PR/31119. llvm-svn: 287710	2016-11-22 22:50:01 +00:00
Davide Italiano	e7ffae9dea	[SCCP] Remove code in visitBinaryOperator (and add tests). We visit and/or, we try to derive a lattice value for the instruction even if one of the operands is overdefined. If the non-overdefined value is still 'unknown' just return and wait for ResolvedUndefsIn to "plug in" the correct value. This simplifies the logic a bit. While I'm here add tests for missing cases. llvm-svn: 287709	2016-11-22 22:11:25 +00:00
Sanjay Patel	e359eaaf70	[InstCombine] change bitwise logic type to eliminate bitcasts In PR27925: https://llvm.org/bugs/show_bug.cgi?id=27925 ...we proposed adding this fold to eliminate a bitcast. In D20774, there was some concern about changing the type of a bitwise op as well as creating bitcasts that might not be free for a target. However, if we're strictly eliminating an instruction (by limiting this to one-use ops), then we should be able to do this in InstCombine. But we're cautiously restricting the transform for now to vector types to avoid possible backend problems. A transform to make sure the logic op is legal for the target should be added to reverse this transform and improve codegen. Differential Revision: https://reviews.llvm.org/D26641 llvm-svn: 287707	2016-11-22 22:05:48 +00:00
Vyacheslav Klochkov	9a630dfb57	Fixed the lost FastMathFlags in GVN(Global Value Numbering). Reviewer: Hal Finkel. Differential Revision: https://reviews.llvm.org/D26952 llvm-svn: 287700	2016-11-22 20:52:53 +00:00
Vyacheslav Klochkov	68a677ae5b	Fixed the lost FastMathFlags in Reassociate optimization. Reviewer: Hal Finkel. Differential Revision: https://reviews.llvm.org/D26957 llvm-svn: 287695	2016-11-22 20:23:04 +00:00
Justin Lebar	3e50a5be8f	[CodeGenPrepare] Don't sink non-cheap addrspacecasts. Summary: Previously, CGP would unconditionally sink addrspacecast instructions, even going so far as to sink them into a loop. Now we check that the cast is "cheap", as defined by TLI. We introduce a new "is-cheap" function to TLI rather than using isNopAddrSpaceCast because some GPU platforms want the ability to ask for non-nop casts to be sunk. Reviewers: arsenm, tra Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26923 llvm-svn: 287591	2016-11-21 22:49:15 +00:00
Eli Friedman	c0bba1a96d	[LoopReroll] Make root-finding more aggressive. Allow using an instruction other than a mul or phi as the base for root-finding. For example, the included testcase includes a loop which requires using a getelementptr as the base for root-finding. Differential Revision: https://reviews.llvm.org/D26529 llvm-svn: 287588	2016-11-21 22:35:34 +00:00
Sanjay Patel	3b0bafee63	[InstCombine] canonicalize min/max constant to select's false value This is a first step towards canonicalization and improved folding/codegen for integer min/max as discussed here: http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html Here, we're just matching the simplest min/max patterns and adjusting the icmp predicate while swapping the select operands. I've included FIXME tests in test/Transforms/InstCombine/select_meta.ll so it's easier to see how this might be extended (corresponds to the TODO comment in the code). That's also why I'm using matchSelectPattern() rather than a simpler check; once the backend is patched, we can just remove some of the restrictions to allow the obfuscated min/max patterns in the FIXME tests to be matched. Differential Revision: https://reviews.llvm.org/D26525 llvm-svn: 287585	2016-11-21 22:04:14 +00:00
Hubert Tong	1e5677649c	reassociate-deadinst.ll: avoid accidental match on path Pipe from stdin to avoid accidentally matching on the path. llvm-svn: 287583	2016-11-21 21:53:01 +00:00
Mandeep Singh Grang	17e3f9b79d	[MemorySSA] Fix unit tests broken by D26704 Summary: D26704 fixed the non-determinism in codegen by sorting basic blocks before iteration so as to have a defined iteration order. As a result we need to fix the names (numbers) of the temporaries in the following unit tests: test/Transforms/Util/MemorySSA/multi-edges.ll test/Transforms/Util/MemorySSA/multiple-backedges-hal.ll Reviewers: dberlin, david2050, mgrang Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26926 llvm-svn: 287575	2016-11-21 20:39:08 +00:00
Mandeep Singh Grang	73f0095d71	[MemorySSA] Fix for non-determinism in codegen This patch fixes the non-determinism caused due to iterating SmallPtrSet's which was uncovered due to the experimental "reverse iteration order " patch: https://reviews.llvm.org/D26718 The following unit tests failed because of the undefined order of iteration. LLVM :: Transforms/Util/MemorySSA/cyclicphi.ll LLVM :: Transforms/Util/MemorySSA/many-dom-backedge.ll LLVM :: Transforms/Util/MemorySSA/many-doms.ll LLVM :: Transforms/Util/MemorySSA/phi-translation.ll Reviewers: dberlin, mgrang Subscribers: dberlin, llvm-commits, david2050 Differential Revision: https://reviews.llvm.org/D26704 llvm-svn: 287563	2016-11-21 19:33:02 +00:00
Jun Bum Lim	82f55c5446	[CodeGenPrep] Skip merging empty case blocks Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 287553	2016-11-21 16:47:28 +00:00
Yaxun Liu	02f75f31e0	Fix known zero bits for addrspacecast. Currently LLVM assumes that a pointer addrspacecasted to a different addr space is equivalent to trunc or zext bitwise, which is not true. For example, in amdgcn target, when a null pointer is addrspacecasted from addr space 4 to 0, its value is changed from i64 0 to i32 -1. This patch teaches LLVM not to assume known bits of addrspacecast instruction to its operand. Differential Revision: https://reviews.llvm.org/D26803 llvm-svn: 287545	2016-11-21 15:42:31 +00:00
Davide Italiano	2ae76dd239	[GlobalSplit] Port to the new pass manager. llvm-svn: 287511	2016-11-21 00:28:23 +00:00
Sanjay Patel	47e577eb92	[InstCombine] add tests to show likely unwanted select widening; NFC This is a prerequisite patch for D26556: https://reviews.llvm.org/D26556 ...because there was no direct coverage for these folds (which in some cases are adding instructions). llvm-svn: 287400	2016-11-18 23:22:00 +00:00
Michael Zolotukhin	5020c9971b	[LoopSimplify] Preserve LCSSA when removing edges from unreachable blocks. This fixes PR30454. llvm-svn: 287379	2016-11-18 21:01:12 +00:00
Florian Hahn	77382be56b	[simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations. insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now propagates existing llvm.loop metadata to newly the added backedge. llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp now propagates existing llvm.loop metadata to the branch instructions in the predecessor blocks of the empty block that is removed. Differential Revision: https://reviews.llvm.org/D26495 llvm-svn: 287341	2016-11-18 13:12:07 +00:00
Craig Topper	1de753f7f5	[InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements. This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches. llvm-svn: 287316	2016-11-18 06:04:33 +00:00
Craig Topper	07f1c15995	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64 Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298	2016-11-18 02:25:34 +00:00
Dehao Chen	41d72a8632	Use profile info to adjust loop unroll threshold. Summary: For flat loop, even if it is hot, it is not a good idea to unroll in runtime, thus we set a lower partial unroll threshold. For hot loop, we set a higher unroll threshold and allows expensive tripcount computation to allow more aggressive unrolling. Reviewers: davidxl, mzolotukhin Subscribers: sanjoy, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D26527 llvm-svn: 287186	2016-11-17 01:17:02 +00:00
Peter Collingbourne	f72a8d4e08	Introduce GlobalSplit pass. This pass splits globals into elements using inrange annotations on getelementptr indices. Differential Revision: https://reviews.llvm.org/D22295 llvm-svn: 287178	2016-11-16 23:40:26 +00:00
Craig Topper	6910fa0ef4	[X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file. Reviewers: zvi, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083	2016-11-16 05:24:10 +00:00
Vyacheslav Klochkov	b3dc774a99	Fixed the lost FastMathFlags for CALL operations in SLPVectorizer. Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26575 llvm-svn: 287064	2016-11-16 00:55:50 +00:00
Justin Lebar	2860573529	[BypassSlowDivision] Handle division by constant numerators better. Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant: * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code). * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26699 llvm-svn: 287062	2016-11-16 00:44:47 +00:00
Wei Mi	37c4aaaf52	Revert r286999 which caused buildbot test failures. Some testcases need to be made target specific. llvm-svn: 287014	2016-11-15 19:42:05 +00:00
Wei Mi	7ccf7651c0	[LSR] Allow formula containing Reg for SCEVAddRecExpr related with outerloop. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 286999	2016-11-15 18:35:53 +00:00
Wei Mi	d2948cef70	[IndVars] Change the order to compute WidenAddRec in widenIVUse. When both WidenIV::getWideRecurrence and WidenIV::getExtendedOperandRecurrence return non-null but different WideAddRec, if getWideRecurrence is called before getExtendedOperandRecurrence, we won't bother to call getExtendedOperandRecurrence again. But As we know it is possible that after SCEV folding, we cannot prove the legality using the SCEVAddRecExpr returned by getWideRecurrence. Meanwhile if getExtendedOperandRecurrence returns non-null WideAddRec, we know for sure that it is legal to do widening for current instruction. So it is better to put getExtendedOperandRecurrence before getWideRecurrence, which will increase the chance of successful widening. Differential Revision: https://reviews.llvm.org/D26059 llvm-svn: 286987	2016-11-15 17:34:52 +00:00
Sanjay Patel	bb238bb4e5	[InstCombine] add tests for bitcasted selects; NFC llvm-svn: 286978	2016-11-15 16:01:16 +00:00
Pablo Barrio	4f80c93a2e	Revert "[JumpThreading] Unfold selects that depend on the same condition" This reverts commit ac54d0066c478a09c7cd28d15d0f9ff8af984afc. llvm-svn: 286976	2016-11-15 15:42:23 +00:00
Robert Lougher	b0905209dd	[LoopVectorizer] When estimating reg usage, unused insts may "end" another use The register usage algorithm incorrectly treats instructions whose value is not used within the loop (e.g. those that do not produce a value). The algorithm first calculates the usages within the loop. It iterates over the instructions in order, and records at which instruction index each use ends (in fact, they're actually recorded against the next index, as this is when we want to delete them from the open intervals). The algorithm then iterates over the instructions again, adding each instruction in turn to a list of open intervals. Instructions are then removed from the list of open intervals when they occur in the list of uses ended at the current index. The problem is, instructions which are not used in the loop are skipped. However, although they aren't used, the last use of a value may have been recorded against that instruction index. In this case, the use is not deleted from the open intervals, which may then bump up the estimated register usage. This patch fixes the issue by simply moving the "is used" check after the loop which erases the uses at the current index. Differential Revision: https://reviews.llvm.org/D26554 llvm-svn: 286969	2016-11-15 14:27:33 +00:00
Sanjay Patel	aaa06fa486	[InstCombine] add tests to show missing bitcast folds llvm-svn: 286900	2016-11-14 22:44:06 +00:00
Simon Pilgrim	27fed8e5d6	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832	2016-11-14 14:45:16 +00:00
James Molloy	6df8f27c95	[InlineCost] Remove skew when calculating call costs When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself. However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case: int g() { h(); } int f() { g(); } llvm-svn: 286814	2016-11-14 11:14:41 +00:00
Craig Topper	b4173a5a70	[InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked AVX-512 variable shift intrinsics. llvm-svn: 286755	2016-11-13 07:26:19 +00:00
Craig Topper	8b831cbb2a	[InstCombine][AVX-512] Expand vector shift handling to work on the AVX-512 shift by immediate and shift by single value. This does not include support for the AVX-512 variable shifts. That will be coming in a future patch. llvm-svn: 286739	2016-11-13 01:51:55 +00:00
Sanjay Patel	8e68394376	[InstCombine] update test to use FileCheck; NFC llvm-svn: 286668	2016-11-11 23:12:46 +00:00
Adam Nemet	9bfbf8bbdf	[LV] Stop saying "use -Rpass-analysis=loop-vectorize" This is PR28376. Unfortunately given the current structure of optimization diagnostics we lack the capability to tell whether the user has passed -Rpass-analysis=loop-vectorize since this is local to the front-end (BackendConsumer::OptimizationRemarkHandler). So rather than printing this even if the user has already passed -Rpass-analysis, this patch just punts and stops recommending this option. I don't think that getting this right is worth the complexity. Differential Revision: https://reviews.llvm.org/D26563 llvm-svn: 286662	2016-11-11 22:51:46 +00:00
Evgeniy Stepanov	1fe189d795	[cfi] Fix weak functions handling. When a function pointer is replaced with a jumptable pointer, special case is needed to preserve the semantics of extern_weak functions. Since a jumptable entry can not be extern_weak, we emulate that behaviour by replacing all references to F (the extern_weak function) with the following expression: F != nullptr ? JumpTablePtr : nullptr. Extra special care is needed for global initializers, since most (or probably all) backends can not lower an initializer that includes this kind of constant expression. Initializers like that are replaced with a global constructor (i.e. a runtime initializer). llvm-svn: 286636	2016-11-11 21:39:26 +00:00
Vyacheslav Klochkov	f1a12fe0f5	Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer. Reviewer: Michael Zolotukhin. Differential Revision: https://reviews.llvm.org/D26543 llvm-svn: 286626	2016-11-11 19:55:29 +00:00
Sanjay Patel	da0149dd74	[InstCombine] add tests to show size-increasing select transforms llvm-svn: 286619	2016-11-11 19:37:54 +00:00
Evgeniy Stepanov	f48ffab554	[cfi] Implement cfi-icall using inline assembly. The current implementation is emitting a global constant that happens to evaluate to the same bytes + relocation as a jump instruction on X86. This does not work for PIE executables and shared libraries though, because we end up with a wrong relocation type. And it has no chance of working on ARM/AArch64 which use different relocation types for jump instructions (R_ARM_JUMP24) that is never generated for data. This change replaces the constant with module-level inline assembly followed by a hidden declaration of the jump table. Works fine for ARM/AArch64, but has some drawbacks. * Extra symbols are added to the static symbol table, which inflate the size of the unstripped binary a little. Stripped binaries are not affected. This happens because jump table declarations must be external (because their body is in the inline asm). * Original functions that were anonymous are now named <original name>.cfi, and it affects symbolization sometimes. This is necessary because the only user of these functions is the (inline asm) jump table, so they had to be added to @llvm.used, which does not allow unnamed functions. llvm-svn: 286611	2016-11-11 18:49:09 +00:00
Adam Nemet	7da20c39ee	[OptDiag] Remove non-printable chars from function name The r283656 did this in the remark arguments. We also need to do this in the main function attribute as that is written to YAML as well. llvm-svn: 286482	2016-11-10 17:47:03 +00:00
Sanjay Patel	40d33e7554	[InstCombine] auto-generate better checks; NFC Note that the existing metadata checking was re-added by hand because the script doesn't currently know how to generate checks for lines outside of functions. llvm-svn: 286460	2016-11-10 14:58:17 +00:00
Sanjay Patel	4e1b5a53c7	[InstCombine] avoid infinite loop from shuffle-extract-insert sequence (PR30923) Removing the limitation in visitInsertElementInst() causes several regressions because we're not prepared to fold sequences of shuffles or inserts and extracts separated by shuffles. Fixing that appears to be a difficult mission because we are purposely trying to avoid creating shuffles with arbitrary shuffle masks because some targets may choke on those. https://llvm.org/bugs/show_bug.cgi?id=30923 llvm-svn: 286423	2016-11-10 00:15:14 +00:00
Dehao Chen	06e079a530	Update vectorization debug info unittest. Summary: The change will test the change in r286159. The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location of inside the loop header from !21 to !22 so that it will reflect the correct location. Reviewers: probinson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26428 llvm-svn: 286403	2016-11-09 22:25:19 +00:00
Sanjay Patel	600631daf3	[InstCombine] regenerate checks; NFC llvm-svn: 286402	2016-11-09 22:21:58 +00:00
Sanjay Patel	16da6c466f	[InstCombine] regenerate checks; NFC llvm-svn: 286399	2016-11-09 21:41:34 +00:00
Alexandros Lamprineas	0ee3ec2fe4	[ARM] Loop Strength Reduction crashes when targeting ARM or Thumb. Scalar Evolution asserts when not all the operands of an Add Recurrence Expression are loop invariants. Loop Strength Reduction should only create affine Add Recurrences, so that both the start and the step of the expression are loop invariants. Differential Revision: https://reviews.llvm.org/D26185 llvm-svn: 286347	2016-11-09 08:53:07 +00:00
Dehao Chen	947dbe1254	Enable Loop Sink pass for functions that has profile. Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code. Reviewers: davidxl, hfinkel Subscribers: mehdi_amini, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26155 llvm-svn: 286325	2016-11-09 00:58:19 +00:00
Sanjay Patel	4e9d6cd354	[InstCombine] fix profitability equation for max-of-nots transform As the test change shows, we can increase the critical path by adding a 'not' instruction, so make sure that we're actually removing an instruction if we do this transform. This transform could also cause us to miss folds of min/max pairs. llvm-svn: 286315	2016-11-09 00:13:11 +00:00
Davide Italiano	1e77aaca8a	[LibcallsShrinkWrap] This pass doesn't preserve the CFG. For example, it invalidates the domtree, causing assertions in later passes which need dominator infos. Make it preserve GlobalsAA, as suggested by Eli. Differential Revision: https://reviews.llvm.org/D26381 llvm-svn: 286271	2016-11-08 19:18:20 +00:00
Sanjay Patel	8625c43662	[InstCombine] move min/max tests to min/max test file; NFC llvm-svn: 286256	2016-11-08 18:12:19 +00:00
Sanjay Patel	686cf49f7a	[InstCombine] update checks; NFC llvm-svn: 286255	2016-11-08 18:06:14 +00:00
Pablo Barrio	9f45254138	[JumpThreading] Unfold selects that depend on the same condition Summary: These are good candidates for jump threading. This enables later opts (such as InstCombine) to combine instructions from the selects with instructions out of the selects. SimplifyCFG will fold the select again if unfolding wasn't worth it. Patch by James Molloy and Pablo Barrio. Reviewers: rengolin, haicheng, sebpop Subscribers: jojo, jmolloy, llvm-commits Differential Revision: https://reviews.llvm.org/D26391 llvm-svn: 286236	2016-11-08 14:53:30 +00:00
Simon Pilgrim	d02c55204b	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233	2016-11-08 14:10:28 +00:00
Adam Nemet	b103fc52d3	[OptDiag, opt-viewer] Save callee's location and display as link With this we get a new field in the YAML record if the value being streamed out has a debug location. For examples, please see the changes to the tests. This is then used in opt-viewer to display a link for the callee function in the inlining remarks. Differential Revision: https://reviews.llvm.org/D26366 llvm-svn: 286169	2016-11-07 22:41:13 +00:00
Sanjoy Das	e06ef141fc	Avoid tail recursion elimination across calls with operand bundles Summary: In some specific scenarios with well understood operand bundle types (like `"deopt"`) it may be possible to go ahead and convert recursion to iteration, but TailRecursionElimination does not have that logic today so avoid doing the right thing for now. I need some input on whether `"funclet"` operand bundles should also block tail recursion elimination. If not, I'll allow TRE across calls with `"funclet"` operand bundles and add a test case. Reviewers: rnk, majnemer, nlewycky, ahatanak Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26270 llvm-svn: 286147	2016-11-07 21:01:49 +00:00
Benjamin Kramer	1697d39eef	[MemCpyOpt] Don't emit IR in an unspecified order Argument evaluation order is one of the edge cases where Clang differs from GCC, yielding different IR depending on which compiler LLVM was built with. Make the order deterministic and tune the test to actually verify the order instead of trying to hide it. llvm-svn: 286126	2016-11-07 17:47:28 +00:00
Sanjay Patel	86408a8048	[InstCombine] allow splat vector folds in adjustMinMax() (retry r285732) This was reverted at r285866 because there was a crash handling a scalar select of vectors. I added a check for that pattern and a test case based on the example provided in the post-commit thread for r285732. llvm-svn: 286113	2016-11-07 15:52:45 +00:00
NAKAMURA Takumi	b4eef1fa4a	llvm/test/Transforms/DCE/calls-errno.ll: Suppress checking @pow(+0,-1). It depends on host's pow(3), and mingw's pow doesn't raise any errors, just returns +INF. llvm-svn: 286005	2016-11-04 18:50:45 +00:00
Greg Bedwell	5fc6f94591	Revert "[InstCombine] allow splat vector folds in adjustMinMax()" This reverts commit r285732. This change introduced a new assertion failure in the following testcase at -O2: typedef short __v8hi __attribute__((__vector_size__(16))); __v8hi foo(__v8hi &V1, __v8hi &V2, unsigned mask) { __v8hi Result = V1; if (mask & 0x80) Result[0] = V2[0]; return Result; } llvm-svn: 285866	2016-11-02 23:17:05 +00:00
Eli Friedman	b6befc3bc4	DCE math library calls with a constant operand. On platforms which use -fmath-errno, math libcalls without any uses require some extra checks to figure out if they are actually dead. Fixes https://llvm.org/bugs/show_bug.cgi?id=30464 . Differential Revision: https://reviews.llvm.org/D25970 llvm-svn: 285857	2016-11-02 20:48:11 +00:00
Bjorn Pettersson	7424c8ccd1	[Reassociate] Skip analysis of dead code to avoid infinite loop. Summary: It was detected that the reassociate pass could enter an inifite loop when analysing dead code. Simply skipping to analyse basic blocks that are dead avoids such problems (and as a side effect we avoid spending time on optimising dead code). The solution is using the same Reverse Post Order ordering of the basic blocks when doing the optimisations, as when building the precalculated rank map. A nice side-effect of this solution is that we now know that we only try to do optimisations for blocks with ranked instructions. Fixes https://llvm.org/bugs/show_bug.cgi?id=30818 Reviewers: llvm-commits, davide, eli.friedman, mehdi_amini Subscribers: dberlin Differential Revision: https://reviews.llvm.org/D26154 llvm-svn: 285793	2016-11-02 08:55:19 +00:00
Sanjay Patel	c3d89842ad	[InstCombine] allow splat vector folds in adjustMinMax() llvm-svn: 285732	2016-11-01 20:08:02 +00:00
Sanjay Patel	c0339c77ef	[InstCombine] Fold nuw left-shifts in `ugt`/`ule` comparisons. This transforms %a = shl nuw %x, c1 %b = icmp {ugt\|ule} %a, c0 into %b = icmp {ugt\|ule} %x, (c0 >> c1) z3: (declare-const x (_ BitVec 64)) (declare-const c0 (_ BitVec 64)) (declare-const c1 (_ BitVec 64)) (push) (assert (= x (bvlshr (bvshl x c1) c1))) ; nuw (assert (not (= (bvugt (bvshl x c1) c0) (bvugt x (bvlshr c0 c1))))) (check-sat) (get-model) (pop) (push) (assert (= x (bvlshr (bvshl x c1) c1))) ; nuw (assert (not (= (bvule (bvshl x c1) c0) (bvule x (bvlshr c0 c1))))) (check-sat) (get-model) (pop) Patch by bryant! Differential Revision: https://reviews.llvm.org/D25913 llvm-svn: 285729	2016-11-01 19:19:29 +00:00
Sanjay Patel	700193ae05	[InstCombine] add vector tests for ext+adjust min/max llvm-svn: 285713	2016-11-01 17:34:29 +00:00
Sanjay Patel	586aeee341	[InstCombine] move/fix tests for adjusted min/max I think the former 'test50' had a typo making it functionally equivalent to the former 'test49'; changed the predicate to provide more coverage. llvm-svn: 285706	2016-11-01 16:39:30 +00:00
Sanjay Patel	04ef115ff7	[InstCombine] fix tests for adjusted min/max 1. Delete identical tests 2. Rename tests to reflect actual functionality 3. Add comments 4. Add unsigned variants 5. Add vector variants with FIXME comments 6. Rename test file llvm-svn: 285699	2016-11-01 15:48:30 +00:00
Simon Pilgrim	6dd8fab443	[InstCombine] Folding of shifts by the sum of positive values This patch introduces the combine: (C1 shift (A add C2)) -> ((C1 shift C2) shift A) iff A and C2 are both positive If both A and C2 are know to be positive then we can safely split into 2 shifts, permitting the folding of the Inner shift. Fix for the spec benchmark case mentioned by @nadav on PR15141 (assuming we can prove that the inputs as positive). Differential Revision: https://reviews.llvm.org/D26000 llvm-svn: 285696	2016-11-01 15:40:30 +00:00
Sanjay Patel	69587324e8	[InstCombine] auto-generate better checks llvm-svn: 285693	2016-11-01 14:38:30 +00:00
Dorit Nuzman	bf2c15b5dc	Second attempt at r285517. llvm-svn: 285568	2016-10-31 13:17:31 +00:00
Dorit Nuzman	06903d16af	Revert r285517 due to build failures. llvm-svn: 285518	2016-10-30 14:34:57 +00:00
Dorit Nuzman	3c1c658f24	[LoopVectorize] Make interleaved-accesses analysis less conservative about possible pointer-wrap-around concerns, in some cases. Before this patch, collectConstStridedAccesses (part of interleaved-accesses analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when examining all candidate pointers. This is too conservative. Instead, this patch makes collectConstStridedAccesses use an optimistic approach, calling getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the candidate interleave groups have been formed, revisits the pointer-wrapping analysis but only where it matters: namely, in groups that have gaps, and where the gaps are not at the very end of the group (in which case the loop is peeled). This second time getPtrStride is called with [Assume=false, ShouldCheckWrap=true], but this could further be improved to using Assume=true, once we also add the logic to track that we are not going to meet the scev runtime checks threshold. Differential Revision: https://reviews.llvm.org/D25276 llvm-svn: 285517	2016-10-30 12:23:26 +00:00
Sanjay Patel	36eeb6d6f6	[ValueTracking] recognize more variants of smin/smax Try harder to detect obfuscated min/max patterns: the initial pattern was added with D9352 / rL236202. There was a bug fix for PR27137 at rL264996, but I think we can do better by folding the corresponding smax pattern and commuted variants. The codegen tests demonstrate the effect of ValueTracking on the backend via SelectionDAGBuilder. We can't expose these differences minimally in IR because we don't have smin/smax intrinsics for IR. Differential Revision: https://reviews.llvm.org/D26091 llvm-svn: 285499	2016-10-29 16:21:19 +00:00
Sanjay Patel	978f827d12	[InstCombine] re-use bitcasted compare operands in selects (PR28001) These mixed bitcast patterns show up with SSE/AVX intrinsics because we bitcast function parameters to <2 x i64>. The bitcasts obfuscate the expected min/max forms as shown in PR28001: https://llvm.org/bugs/show_bug.cgi?id=28001#c6 Differential Revision: https://reviews.llvm.org/D25943 llvm-svn: 285495	2016-10-29 15:22:04 +00:00
Justin Lebar	1535a5e9df	Add missing lit.local.cfg to llvm/test/Transforms/CodeGenPrepare/NVPTX. llvm-svn: 285464	2016-10-28 21:56:07 +00:00
Justin Lebar	0ede5fb1bb	Don't leave unused divs/rems sitting around in BypassSlowDivision. Summary: This "pass" eagerly creates div and rem instructions even when only one is needed -- it relies on a later pass (machine DCE?) to clean them up. This is problematic not just from a cleanliness perspective (this pass is running during CodeGenPrepare, so should leave the IR in a better state), but it also creates a problem for instruction selection. If we always have a div+rem, isel will always select a divrem instruction (if possible), even when a single div or rem would do. Specifically, in NVPTX, we want to compute rem from the output of div, if available. But if a div is not available, we want to leave the rem alone. This transformation is overeager if div is always available. Because this code runs as part of CodeGenPrepare, it's nontrivial to write a test for this change. But this will effectively be tested by a later patch which adds the aforementioned change to NVPTX isel. Reviewers: tra Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26088 llvm-svn: 285460	2016-10-28 21:43:54 +00:00
Justin Lebar	468bf73209	Don't claim the udiv created in BypassSlowDivision is exact. Summary: In BypassSlowDivision's short-dividend path, we would create e.g. udiv exact i32 %a, %b "exact" here means that we are asserting that %a is a multiple of %b. But we have no reason to believe this must be true -- this is just a bug, as far as I can tell. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D26097 llvm-svn: 285459	2016-10-28 21:43:51 +00:00
Matt Arsenault	ef00283425	SpeculativeExecution: Allow speculating more inst types Partial step towards removing the whitelist and only using TTI's cost. llvm-svn: 285438	2016-10-28 20:00:33 +00:00
Sanjay Patel	19ace1d548	[InstCombine] move/add tests for smin/smax folds llvm-svn: 285414	2016-10-28 16:54:03 +00:00
Matthew Simpson	9b6755362b	[LV] Correct misleading comments in test (NFC) llvm-svn: 285402	2016-10-28 14:27:45 +00:00
Davide Italiano	631cd27f29	[Reassociate] Removing instructions mutates the IR. Fixes PR 30784. Discussed with Justin, who pointed out that in the new PassManager infrastructure we can have more fine-grained control on which analyses we want to preserve, but this is the best we can do with the current infrastructure. llvm-svn: 285380	2016-10-28 02:47:09 +00:00
Davide Italiano	30665147f9	[ConstantFold] Get the correct vector type when folding a getelementptr. Differential Revision: https://reviews.llvm.org/D26014 llvm-svn: 285371	2016-10-28 00:53:16 +00:00
Davide Italiano	e4146714ca	Remove accidentally commited test. llvm-svn: 285366	2016-10-27 23:40:19 +00:00
Davide Italiano	e865a525df	[IR] Reintroduce getGEPReturnType(), it will be used in a later patch. llvm-svn: 285365	2016-10-27 23:38:51 +00:00
Sanjay Patel	c0de9c9e40	[InstCombine] fix foldSPFofSPF() to handle vector splats llvm-svn: 285345	2016-10-27 21:19:40 +00:00
Sanjay Patel	923f74b27c	[InstCombine] add vector tests for foldSPFofSPF to show missing folds llvm-svn: 285340	2016-10-27 20:51:03 +00:00
Sanjay Patel	cc8e0af3e5	[InstCombine] auto-generate checks for min/max tests llvm-svn: 285336	2016-10-27 19:54:15 +00:00
Sanjay Patel	611f9f92fc	[InstCombine] handle simple vector integer constants in IsFreeToInvert llvm-svn: 285318	2016-10-27 17:30:50 +00:00
Dehao Chen	b94c09baa0	Add Loop Sink pass to reverse the LICM based of basic block frequency. Summary: LICM may hoist instructions to preheader speculatively. Before code generation, we need to sink down the hoisted instructions inside to loop if it's beneficial. This pass is a reverse of LICM: looking at instructions in preheader and sinks the instruction to basic blocks inside the loop body if basic block frequency is smaller than the preheader frequency. Reviewers: hfinkel, davidxl, chandlerc Subscribers: anna, modocache, mgorny, beanz, reames, dberlin, chandlerc, mcrosier, junbuml, sanjoy, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22778 llvm-svn: 285308	2016-10-27 16:30:08 +00:00
Sanjay Patel	e372aecb8a	[ValueTracking] fix matchSelectPattern to allow vector splat folds of min/max/abs/nabs llvm-svn: 285303	2016-10-27 15:26:10 +00:00
Sanjay Patel	d5b8d64d4b	[InstCombine] add tests for missing folds of vector abs/nabs/min/max llvm-svn: 285299	2016-10-27 15:02:45 +00:00
Sanjay Patel	f21dd2648f	[InstCombine] auto-generate better checks; NFC llvm-svn: 285293	2016-10-27 13:55:37 +00:00
Alexey Bataev	46c0278e7d	[SLP] Fix for PR30626: Compiler crash inside SLP Vectorizer. After successfull horizontal reduction vectorization attempt for PHI node vectorizer tries to update root binary op by combining vectorized tree and the ReductionPHI node. But during vectorization this ReductionPHI can be vectorized itself and replaced by the `undef` value, while the instruction itself is marked for deletion. This 'marked for deletion' PHI node then can be used in new binary operation, causing "Use still stuck around after Def is destroyed" crash upon PHI node deletion. Also the test is fixed to make it perform actual testing. Differential Revision: https://reviews.llvm.org/D25671 llvm-svn: 285286	2016-10-27 12:02:28 +00:00
Sanjoy Das	01969218a4	Simplify `x >=u x >> y` and `x >=u x udiv y` Summary: Extends InstSimplify to handle both `x >=u x >> y` and `x >=u x udiv y`. This is a folloup of rL258422 and https://github.com/rust-lang/rust/pull/30917 where llvm failed to optimize away the bounds checking in a binary search. Patch by Arthur Silva! Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25941 llvm-svn: 285228	2016-10-26 19:18:43 +00:00
Dehao Chen	e713000eb6	Introduce updateDiscriminator interface to DILocation to make it cleaner assigning discriminators. Summary: This patch introduces updateDiscriminator to DILocation so that it can be directly called by AddDiscriminator. It also makes it easier to update the discriminator later. Reviewers: dnovillo, dblaikie, aprantl, echristo Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D25959 llvm-svn: 285207	2016-10-26 15:48:45 +00:00
Sanjay Patel	0bacecfb32	[InstCombine] consolidate zext tests and auto-generate checks; NFC llvm-svn: 285195	2016-10-26 14:08:49 +00:00
Sanjay Patel	0b756ff1d1	[InstCombine] auto-generate better checks; NFC llvm-svn: 285194	2016-10-26 13:58:22 +00:00
Rong Xu	33308f92eb	[PGO] Fix select instruction annotation Summary: Select instruction annotation in IR PGO uses the edge count to infer the branch count. It's currently placed in setInstrumentedCounts() where no all the BB counts have been computed. This leads to wrong branch weights. Move the annotation after all BB counts are populated. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25961 llvm-svn: 285128	2016-10-25 21:47:24 +00:00
Guozhi Wei	ae541f6a71	[InstCombine] Resubmit the combine of A->B->A BitCast and fix for pr27996 The original patch of the A->B->A BitCast optimization was reverted by r274094 because it may cause infinite loop inside compiler https://llvm.org/bugs/show_bug.cgi?id=27996. The problem is with following code xB = load (type B); xA = load (type A); +yA = (A)xB; B -> A +zAn = PHI[yA, xA]; PHI +zBn = (B)zAn; // A -> B store zAn; store zBn; optimizeBitCastFromPhi generates +zBn = (B)zAn; // A -> B and expects it will be combined with the following store instruction to another store zAn Unfortunately before combineStoreToValueType is called on the store instruction, optimizeBitCastFromPhi is called on the new BitCast again, and this pattern repeats indefinitely. optimizeBitCastFromPhi only generates BitCast for load/store instructions, only the BitCast before store can cause the reexecution of optimizeBitCastFromPhi, and BitCast before store can easily be handled by InstCombineLoadStoreAlloca.cpp. So the solution to the problem is if all users of a CI are store instructions, we should not do optimizeBitCastFromPhi on it. Then optimizeBitCastFromPhi will not be called on the new BitCast instructions. Differential Revision: https://reviews.llvm.org/D23896 llvm-svn: 285116	2016-10-25 20:43:42 +00:00
Sanjay Patel	f3dda13bd2	[InstCombine] Ensure that truncated int types are legal. Fixes the FIXMEs in D25952 and rL285075. Patch by bryant! Differential Revision: https://reviews.llvm.org/D25955 llvm-svn: 285108	2016-10-25 20:11:47 +00:00
Matthew Simpson	c62266d680	[LV] Sink scalar operands of predicated instructions When we predicate an instruction (div, rem, store) we place the instruction in its own basic block within the vectorized loop. If a predicated instruction has scalar operands, it's possible to recursively sink these scalar expressions into the predicated block so that they might avoid execution. This patch sinks as much scalar computation as possible into predicated blocks. We previously were able to sink such operands only if they were extractelement instructions. Differential Revision: https://reviews.llvm.org/D25632 llvm-svn: 285097	2016-10-25 18:59:45 +00:00
Sanjay Patel	8f9235dd7f	[InstCombine] add tests for missing icmp + shl nuw fold Patch by bryant! Differential Revision: https://reviews.llvm.org/D25952 llvm-svn: 285095	2016-10-25 18:47:56 +00:00
Michael Ilseman	e542804343	Add -strip-nonlinetable-debuginfo capability This adds a new function to DebugInfo.cpp that takes an llvm::Module as input and removes all debug info metadata that is not directly needed for line tables, thus effectively stripping all type and variable information from the module. The primary motivation for this feature was the bitcode work flow (cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html for more background). This is not wired up yet, but will be in subsequent patches. For testing, the new functionality is exposed to opt with a -strip-nonlinetable-debuginfo option. The secondary use-case (and one that works right now!) is as a reduction pass in bugpoint. I added two new bugpoint options (-disable-strip-debuginfo and -disable-strip-debug-types) to control the new features. By default it will first attempt to remove all debug information, then only the type info, and then proceed to hack at any remaining MDNodes. Thanks to Adrian Prantl for stewarding this patch! llvm-svn: 285094	2016-10-25 18:44:13 +00:00
Geoff Berry	91e9a5cc23	[EarlyCSE] Make MemorySSA memory dependency check more aggressive. Now that MemorySSA keeps track of whether MemoryUses are optimized, use getClobberingMemoryAccess() to check MemoryUse memory dependencies since it should no longer be so expensive. This is a follow-up change to https://reviews.llvm.org/D25881 llvm-svn: 285080	2016-10-25 16:18:47 +00:00
Sanjay Patel	d59f7f9047	[InstCombine] add test and code comment to show potentially misguided icmp trunc transform llvm-svn: 285075	2016-10-25 15:16:39 +00:00
Sanjay Patel	62fbfe4e21	[InstCombine] fix checks for previous commit (r285069) Accidentally put in the hoped-for checks ahead of the transform! llvm-svn: 285070	2016-10-25 13:30:19 +00:00
Sanjay Patel	97beffe037	[InstCombine] add tests for bitcast interference with min/max (PR28001) llvm-svn: 285069	2016-10-25 13:27:56 +00:00
Sanjay Patel	02be063351	[InstCombine] auto-generate checks llvm-svn: 285046	2016-10-25 00:44:02 +00:00
Sanjay Patel	5cff621dce	[InstCombine] auto-generate checks llvm-svn: 285045	2016-10-25 00:41:00 +00:00
Sanjay Patel	60f80d7a8b	[InstCombine] regenerate some checks llvm-svn: 285036	2016-10-24 22:50:26 +00:00
Mandeep Singh Grang	da99e33ae3	[llvm] Remove redundant --check-prefix=CHECK from tests Reviewers: MatzeB, mcrosier, rengolin Differential Revision: https://reviews.llvm.org/D25894 llvm-svn: 285003	2016-10-24 18:57:55 +00:00
Adrian Prantl	28d2d281e7	add-discriminators: Fix handling of lexical scopes. This fixes a bug in the handling of lexical scopes, when more than one scope is defined on the same line or functions are inlined into call sites that are on the same line as the function definition. This situation can easily happen in macro expansions. The problem is solved by introducing a SmallDenseMap<DIScope , DILexicalBlockFile , 1> that keeps track of all the different lexical scopes that share a line/file location. Fixes PR30681. llvm-svn: 284998	2016-10-24 18:23:51 +00:00
Geoff Berry	6815468768	[EarlyCSE] Optimize MemoryPhis and reduce memory clobber queries w/ MemorySSA Summary: When using MemorySSA, re-optimize MemoryPhis when removing a store since this may create MemoryPhis with all identical arguments. Also, when using MemorySSA to check if two MemoryUses are reading from the same version of the heap, use the defining access instead of calling getClobberingAccess, since the latter can currently result in many more AA calls. Once the MemorySSA use optimization tracking changes are done, we can remove this limitation, which should result in more loads being CSE'd. Reviewers: dberlin Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D25881 llvm-svn: 284984	2016-10-24 15:54:00 +00:00
Nico Weber	b38d341106	Revert 284971. It seems to break selfhost on some bots, see e.g. http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/21 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/20 http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/22 llvm-svn: 284979	2016-10-24 14:52:04 +00:00
Pablo Barrio	f9e0d0b7d0	[JumpThreading] Unfold selects that depend on the same condition Summary: These are good candidates for jump threading. This enables later opts (such as InstCombine) to combine instructions from the selects with instructions out of the selects. SimplifyCFG will fold the select again if unfolding wasn't worth it. Patch by James Molloy and Pablo Barrio. Reviewers: reames, bkramer, mcrosier, gberry, haicheng, jmolloy, sebpop Subscribers: jojo, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D25477 llvm-svn: 284971	2016-10-24 13:04:45 +00:00
Anna Thomas	0860259434	[StripGCRelocates] New pass to remove gc.relocates added by RS4GC Summary: Utility pass to remove gc.relocates created by rewrite statepoints for GC. With respect to safepoint verification, the IR generated would be incorrect, and cannot run as such. This would be a single transformation on the final optimized IR. The benefit of the pass is for easy analysis when the IRs are 'polluted' by too many gc.relocates. Added tests. test run: All RS4GC tests with -verify option. Local downstream tests on large IR files. This also works when the pointer being gc.relocated is another gc.relocate. Reviewers: sanjoy, reames Subscribers: beanz, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D25096 llvm-svn: 284855	2016-10-21 18:43:16 +00:00
Artur Pilipenko	47dc098c06	[LVI] Fix a bug with a guard being the very first instruction in a BB not taken into account While looking for guards use reverse iterator and scan up to rend() not to begin() llvm-svn: 284827	2016-10-21 15:02:21 +00:00
John Brawn	84b21835f1	[LoopUnroll] Keep the loop test only on the first iteration of max-or-zero loops When we have a loop with a known upper bound on the number of iterations, and furthermore know that either the number of iterations will be either exactly that upper bound or zero, then we can fully unroll up to that upper bound keeping only the first loop test to check for the zero iteration case. Most of the work here is in plumbing this 'max-or-zero' information from the part of scalar evolution where it's detected through to loop unrolling. I've also gone for the safe default of 'false' everywhere but howManyLessThans which could probably be improved. Differential Revision: https://reviews.llvm.org/D25682 llvm-svn: 284818	2016-10-21 11:08:48 +00:00
Davide Italiano	d15477b09d	Revert "[GVN/PRE] Hoist global values outside of loops." There's no agreement about this patch. I personally find the PRE machinery of the current GVN hard enough to reason about that I'm not sure I'll try to land this again, instead of working on the rewrite). llvm-svn: 284796	2016-10-21 01:37:02 +00:00
Michael Kuperstein	b2443ed62b	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779	2016-10-20 21:04:31 +00:00
Sanjay Patel	efd8885772	[InstSimplify] fold negation of sign-bit 0 - X --> X, if X is 0 or the minimum signed value 0 - X --> 0, if X is 0 or the minimum signed value and the sub is NSW I noticed this pattern might be created in the backend after the change from D25485, so we'll want to add a similar fold for the DAG. The use of computeKnownBits in InstSimplify may be something to investigate if the compile time of InstSimplify is noticeable. We could replace computeKnownBits with specific pattern matchers or limit the recursion. Differential Revision: https://reviews.llvm.org/D25785 llvm-svn: 284649	2016-10-19 21:23:45 +00:00
Artur Pilipenko	5c6ef75485	[IndVarSimplify] Teach calculatePostIncRange to take guards into account Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D25739 llvm-svn: 284632	2016-10-19 19:43:54 +00:00
Matthew Simpson	41fa838f07	[LV] Avoid emitting trivially dead instructions Some instructions from the original loop, when vectorized, can become trivially dead. This happens because of the way we structure the new loop. For example, we create new induction variables and induction variable "steps" in the new loop. Thus, when we go to vectorize the original induction variable update, it may no longer be needed due to the instructions we've already created. This patch prevents us from creating these redundant instructions. This reduces code size before simplification and allows greater flexibility in code generation since we have fewer unnecessary instruction uses. Differential Revision: https://reviews.llvm.org/D25631 llvm-svn: 284631	2016-10-19 19:22:02 +00:00
Artur Pilipenko	f2d5dc5dc6	[IndVarSimplify] Use control-dependent range information to prove non-negativity This change is motivated by the case when IndVarSimplify doesn't widen a comparison of IV increment because it can't prove IV increment being non-negative. We end up with a redundant trunc of the widened increment on this example. for.body: %i = phi i32 [ %start, %for.body.lr.ph ], [ %i.inc, %for.inc ] %within_limits = icmp ult i32 %i, 64 br i1 %within_limits, label %continue, label %for.end continue: %i.i64 = zext i32 %i to i64 %arrayidx = getelementptr inbounds i32, i32* %base, i64 %i.i64 %val = load i32, i32* %arrayidx, align 4 br label %for.inc for.inc: %i.inc = add nsw nuw i32 %i, 1 %cmp = icmp slt i32 %i.inc, %limit br i1 %cmp, label %for.body, label %for.end There is a range check inside of the loop which guarantees the IV to be non-negative. NSW on the increment guarantees that the increment is also non-negative. Teach IndVarSimplify to use the range check to prove non-negativity of loop increments. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D25738 llvm-svn: 284629	2016-10-19 18:59:03 +00:00
Sanjay Patel	cf26c27478	[InstSimplify] move one and add more tests for potential negation folds llvm-svn: 284627	2016-10-19 18:42:12 +00:00
Dehao Chen	4b41571d24	Update the section.ll to fix non-x86 failure. llvm-svn: 284566	2016-10-19 03:53:41 +00:00
Dehao Chen	95fc43143d	Revert r284545 again as the regression in ppc still exists. There is bug in MBPI exposed by th patch. Also update the section.ll to fix non-x86 failure. llvm-svn: 284563	2016-10-19 01:18:25 +00:00
Rong Xu	1c0e9b97d2	Conditionally eliminate library calls where the result value is not used Summary: This pass shrink-wraps a condition to some library calls where the call result is not used. For example: sqrt(val); is transformed to if (val < 0) sqrt(val); Even if the result of library call is not being used, the compiler cannot safely delete the call because the function can set errno on error conditions. Note in many functions, the error condition solely depends on the incoming parameter. In this optimization, we can generate the condition can lead to the errno to shrink-wrap the call. Since the chances of hitting the error condition is low, the runtime call is effectively eliminated. These partially dead calls are usually results of C++ abstraction penalty exposed by inlining. This optimization hits 108 times in 19 C/C++ programs in SPEC2006. Reviewers: hfinkel, mehdi_amini, davidxl Subscribers: modocache, mgorny, mehdi_amini, xur, llvm-commits, beanz Differential Revision: https://reviews.llvm.org/D24414 llvm-svn: 284542	2016-10-18 21:36:27 +00:00
Dehao Chen	83033e0b9a	Add target for test to fix regression introduced by r284533. llvm-svn: 284538	2016-10-18 21:13:31 +00:00
Dehao Chen	302b69c940	Use profile info to set function section prefix to group hot/cold functions. Summary: The original implementation is in r261607, which was reverted in r269726 to accomendate the ProfileSummaryInfo analysis pass. The new implementation: 1. add a new metadata for function section prefix 2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function 3. output the section prefix set by CGP Reviewers: davidxl, eraman Subscribers: vsk, llvm-commits Differential Revision: https://reviews.llvm.org/D24989 llvm-svn: 284533	2016-10-18 20:42:47 +00:00
Simon Pilgrim	4ddc92b6cd	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459	2016-10-18 07:42:15 +00:00
Davide Italiano	84bd58e915	[opt] Strip coverage if debug info is not present. If -coverage is passed, but -g is not, clang populates the PassManager pipeline with StripSymbols(debugOnly = true). The stripSymbol pass therefore scans the list of named metadata, drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD) around. The verifier runs, and finds out that there's a CU not listed in !llvm.dbg.cu (as it was previously dropped) -> crash. When we strip debug info, so, check if there's coverage data, and strip it as well, in order to avoid pending metadata left around. Differential Revision: https://reviews.llvm.org/D25689 llvm-svn: 284418	2016-10-17 20:05:35 +00:00
Dehao Chen	018a3afa99	Ignore debug info when making optimization decisions in SimplifyCFG. Summary: Debug info should not affect code generation. This patch properly handles debug info to make sure the generated code are the same with or without debug info. Reviewers: davidxl, mzolotukhin, jmolloy Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D25286 llvm-svn: 284415	2016-10-17 19:28:44 +00:00
Oliver Stannard	fe4432b105	[SimplifyCFG] Don't lower complex ConstantExprs to lookup tables Not all ConstantExprs can be represented by a global variable, for example most pointer arithmetic other than addition of a constant, so we can't convert these values from switch statements to lookup tables. Differential Revision: https://reviews.llvm.org/D25550 llvm-svn: 284379	2016-10-17 12:00:24 +00:00
Davide Italiano	590ad7037e	[GVN/PRE] Hoist global values outside of loops. In theory this could be generalized to move anything where we prove the operands are available, but that would require rewriting PRE. As NewGVN will hopefully come soon, and we're trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably not worth spending too much time on it. Fix provided by Daniel Berlin! llvm-svn: 284311	2016-10-15 21:35:23 +00:00
David L Kreitzer	01a057a0c4	Add a pass to optimize patterns of vectorized interleaved memory accesses for X86. The pass optimizes as a unit the entire wide load + shuffles pattern produced by interleaved vectorization. This initial patch optimizes one pattern (64-bit elements interleaved by a factor of 4). Future patches will generalize to additional patterns. Patch by Farhana Aleen Differential revision: http://reviews.llvm.org/D24681 llvm-svn: 284260	2016-10-14 18:20:41 +00:00
Tom Stellard	64a9d0876c	AMDGPU/SI: Don't allow unaligned scratch access Summary: The hardware doesn't support this. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25523 llvm-svn: 284257	2016-10-14 18:10:39 +00:00
David L Kreitzer	d5c6755d83	[safestack] Use non-thread-local unsafe stack pointer for Contiki OS Patch by Michael LeMay Differential revision: http://reviews.llvm.org/D19852 llvm-svn: 284254	2016-10-14 17:56:00 +00:00
Sanjay Patel	6d6eca5cdc	[InstCombine] use m_APInt to allow sub with constant folds for splat vectors llvm-svn: 284247	2016-10-14 16:31:54 +00:00
Sanjay Patel	ecd0da2619	[InstCombine] add tests for missing vector folds llvm-svn: 284245	2016-10-14 15:55:34 +00:00
Sanjay Patel	a3bc38b36d	[InstCombine] auto-generate checks llvm-svn: 284244	2016-10-14 15:41:25 +00:00
Sanjay Patel	0b611dcabf	[InstCombine] remove redundant test This test was apparently checking for 2 independent folds, but we have plenty of tests for those individual folds already. We are lacking vector tests, however, because we don't have the shift folds for vectors. llvm-svn: 284243	2016-10-14 15:36:28 +00:00
Sanjay Patel	ad0757febb	[InstCombine] update test to use FileCheck and auto-generate checks llvm-svn: 284242	2016-10-14 15:30:31 +00:00
Sanjay Patel	c6c5965a42	[InstCombine] sub X, sext(bool Y) -> add X, zext(bool Y) Prefer add/zext because they are better supported in terms of value-tracking. Note that the backend should be prepared for this IR canonicalization (including vector types) after: https://reviews.llvm.org/rL284015 Differential Revision: https://reviews.llvm.org/D25135 llvm-svn: 284241	2016-10-14 15:24:31 +00:00
David L Kreitzer	d9ca3589de	[safestack] Move X86-targeted tests into the X86 subdirectory. Patch by Michael LeMay Differential revision: http://reviews.llvm.org/D25340 llvm-svn: 284139	2016-10-13 17:51:59 +00:00
Matthew Simpson	1d4b163fc0	[LV] Account for predicated stores in instruction costs This patch ensures that we scale the estimated cost of predicated stores by block probability. This is a follow-on patch for r284123. llvm-svn: 284126	2016-10-13 14:54:31 +00:00
Matthew Simpson	6cdb5a6f96	[LV] Avoid rounding errors for predicated instruction costs This patch modifies the cost calculation of predicated instructions (div and rem) to avoid the accumulation of rounding errors due to multiple truncating integer divisions. The calculation for predicated stores will be addressed in a follow-on patch since we currently don't scale the cost of predicated stores by block probability. Differential Revision: https://reviews.llvm.org/D25333 llvm-svn: 284123	2016-10-13 14:19:48 +00:00
Sebastian Pop	5ba9f24ed7	commit back "GVN-hoist: fix store past load dependence analysis (PR30216, PR30499)" This is with an extra change to avoid calling MemoryLocation::get() on a call instruction. Differential Revision: https://reviews.llvm.org/D25542 llvm-svn: 284098	2016-10-13 01:39:10 +00:00
Reid Kleckner	8958f6a529	Revert "GVN-hoist: fix store past load dependence analysis (PR30216, PR30499)" This CL didn't actually address the test case in PR30499, and clang still crashes. Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC" Reverts r283965 and r283967. llvm-svn: 284093	2016-10-13 00:18:26 +00:00
Haicheng Wu	1ef17e90b2	Reapply "[LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop" Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049. The original summary: This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. llvm-svn: 284053	2016-10-12 21:29:38 +00:00
Haicheng Wu	45e4ef737d	Revert "[LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop" This reverts commit r284044. llvm-svn: 284051	2016-10-12 21:02:22 +00:00
Krzysztof Parzyszek	3cb5ffeb35	Fix testcases failing after r284036 The codegen has changed slightly between my tests and the commit. llvm-svn: 284049	2016-10-12 20:39:33 +00:00
Haicheng Wu	6cac34fd41	[LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop This patch tries to fully unroll loops having break statement like this for (int i = 0; i < 8; i++) { if (a[i] == value) { found = true; break; } } GCC can fully unroll such loops, but currently LLVM cannot because LLVM only supports loops having exact constant trip counts. The upper bound of the trip count can be obtained from calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the refactoring work in SCEV to prevent duplicating code. The feature of using the upper bound is enabled under the same circumstance when runtime unrolling is enabled since both are used to unroll loops without knowing the exact constant trip count. Differential Revision: https://reviews.llvm.org/D24790 llvm-svn: 284044	2016-10-12 20:24:32 +00:00
Krzysztof Parzyszek	8271be9a1d	Do not remove implicit defs in BranchFolder Branch folder removes implicit defs if they are the only non-branching instructions in a block, and the branches do not use the defined registers. The problem is that in some cases these implicit defs are required for the liveness information to be correct. Differential Revision: https://reviews.llvm.org/D25478 llvm-svn: 284036	2016-10-12 19:50:57 +00:00
Sanjoy Das	bc357e8fa3	[SimplifyCFG] Don't create PHI nodes for constant bundle operands Summary: Constant bundle operands may need to retain their constant-ness for correctness. I'll admit that this is slightly odd, but it looks like SimplifyCFG already does this for things like @llvm.frameaddress and @llvm.stackmap, so I suppose adding one more case is not a big deal. It is possible to add a mechanism to denote bundle operands that need to remain constants, but that's probably too complicated for the time being. Reviewers: jmolloy Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D25502 llvm-svn: 284028	2016-10-12 18:15:33 +00:00
Artur Pilipenko	c6eb6bd9cb	[ValueTracking] An improvement to IR ValueTracking on Non-negative Integers Since this change is known to cause performance degradations in some cases it's commited under a temporary flag which is turned off by default. Patch by Li Huang Differential Revision: https://reviews.llvm.org/D18777 llvm-svn: 284022	2016-10-12 16:18:43 +00:00
Chad Rosier	c215c3fd14	[CVP] Convert an AShr to a LShr if 1st operand is known to be nonnegative. An arithmetic shift can be safely changed to a logical shift if the first operand is known positive. This allows ComputeKnownBits (and similar analysis) to determine the sign bit of the shifted value in some cases. In turn, this allows InstCombine to canonicalize a signed comparison (a > 0) into an equality check (a != 0). PR30577 Differential Revision: https://reviews.llvm.org/D25119 llvm-svn: 284013	2016-10-12 13:41:38 +00:00
Simon Pilgrim	fd0d7b21e0	[InstCombine] Fix constexpr issue in select combining As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead. Differential Revision: https://reviews.llvm.org/D25466 llvm-svn: 284000	2016-10-12 10:20:15 +00:00
Sebastian Pop	ab12fb62ee	GVN-hoist: fix store past load dependence analysis (PR30216, PR30499) This is a refreshed version of a patch that was reverted: it fixes the problems reported in both PR30216 and PR30499, and contains all the test-cases from both bugs. To hoist stores past loads, we used to search for potential conflicting loads on the hoisting path by following a MemorySSA def-def link from the store to be hoisted to the previous defining memory access, and from there we followed the def-use chains to all the uses that occur on the hoisting path. The problem is that the def-def link may point to a store that does not alias with the store to be hoisted, and so the loads that are walked may not alias with the store to be hoisted, and even as in the testcase of PR30216, the loads that may alias with the store to be hoisted are not visited. The current patch visits all loads on the path from the store to be hoisted to the hoisting position and uses the alias analysis to ask whether the store may alias the load. I was not able to use the MemorySSA functionality to ask for whether load and store are clobbered: I'm not sure which function to call, so I used a call to AA->isNoAlias(). Store past store is still working as before using a MemorySSA query: I added an extra test to pr30216.ll to make sure store past store does not regress. Tested on x86_64-linux with check and a test-suite run. Differential Revision: https://reviews.llvm.org/D25476 llvm-svn: 283965	2016-10-12 02:23:39 +00:00
David Majnemer	80dca0c78f	[InstCombine] Transform !range metadata to !nonnull when combining loads When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair. This fixes PR30597. Patch by Ariel Ben-Yehuda! Differential Revision: https://reviews.llvm.org/D25215 llvm-svn: 283836	2016-10-11 01:00:45 +00:00
Adrian Prantl	3bfe1093df	Teach llvm::StripDebugInfo() about global variable !dbg attachments. This is a regression introduced by the global variable ownership reversal performed in r281284. rdar://problem/28448075 llvm-svn: 283784	2016-10-10 17:53:33 +00:00
Simon Pilgrim	cfef627b1f	[SLPVectorizer][X86] Add 512-bit sitofp/uitofp tests llvm-svn: 283756	2016-10-10 14:28:06 +00:00
Simon Pilgrim	2c0733c678	[SLPVectorizer][X86] Add avx512 sitofp/uitofp tests llvm-svn: 283751	2016-10-10 14:14:31 +00:00
Simon Pilgrim	6cadb5610e	[SLPVectorizer][X86] Fixed alignments of scalar loads in sitofp/uitofp tests Fixed copy+paste vector alignment to correct for per-element scalar loads Increased to 512-bit data sizes in preparation of avx512 tests llvm-svn: 283748	2016-10-10 14:10:41 +00:00
Adam Nemet	ee5cf031ce	[OptRemarks] Remove non-printable chars from function name Value names may be prefixed with a binary '1' to indicate that the backend should not modify the symbols due to any platform naming convention. This should not show up in the YAML opt record file because it breaks the YAML parser. llvm-svn: 283656	2016-10-08 04:47:20 +00:00
Gor Nishanov	1b6aec8e25	[coroutines] Store an address of destroy OR cleanup part in the coroutine frame. Summary: If heap allocation of a coroutine is elided, we need to make sure that we will update an address stored in the coroutine frame from f.destroy to f.cleanup. Before this change, CoroSplit synthesized these stores after coro.begin: ``` store void (%f.Frame) @f.resume, void (%f.Frame)* %resume.addr store void (%f.Frame) @f.destroy, void (%f.Frame)* %destroy.addr ``` In those cases where we did heap elision, but were not able to devirtualize all indirect calls, destroy call will attempt to "free" the coroutine frame stored on the stack. Oops. Now we use select to put an appropriate coroutine subfunction in the destroy slot. As bellow: ``` store void (%f.Frame) @f.resume, void (%f.Frame)* %resume.addr %0 = select i1 %need.alloc, void (%f.Frame) @f.destroy, void (%f.Frame) @f.cleanup store void (%f.Frame) %0, void (%f.Frame)* %destroy.addr ``` Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D25377 llvm-svn: 283625	2016-10-08 00:22:50 +00:00
Davide Italiano	f6988d2980	[InstCombine] Don't unpack arrays that are too large (part 2). This is similar to r283599, but for store instructions. Thanks to David for pointing out! llvm-svn: 283612	2016-10-07 21:53:09 +00:00
Davide Italiano	da11412243	[InstCombine] Don't unpack arrays that are too large Differential Revision: https://reviews.llvm.org/D25376 llvm-svn: 283599	2016-10-07 20:57:42 +00:00
Anna Thomas	e76d77ace5	[RS4GC] Strengthen coverage: add more tests Summary: Add tests for cases where we have zero coverage in RS4GC. Reviewers: sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25341 llvm-svn: 283591	2016-10-07 20:34:00 +00:00
Sanjay Patel	4326c4ac8f	[InstCombine] fold select X, (ext X), C If we're going to canonicalize IR towards select of constants, try harder to create those. Also, don't lose the metadata. This is actually 4 related transforms in one patch: // select X, (sext X), C --> select X, -1, C // select X, (zext X), C --> select X, 1, C // select X, C, (sext X) --> select X, C, 0 // select X, C, (zext X) --> select X, C, 0 Differential Revision: https://reviews.llvm.org/D25126 llvm-svn: 283575	2016-10-07 17:53:07 +00:00
Matthew Simpson	a371c14ffe	[LV] Don't mark multi-use branch conditions uniform Previously, we marked the branch conditions of latch blocks uniform after vectorization if they were instructions contained in the loop. However, if a condition instruction has users other than the branch, it may not remain uniform. This patch ensures the conditions we mark uniform are only used by the branch. This should fix PR30627. Reference: https://llvm.org/bugs/show_bug.cgi?id=30627 llvm-svn: 283563	2016-10-07 15:20:13 +00:00
Tom Stellard	17eb3413cd	[ValueTracking] Fix crash in GetPointerBaseWithConstantOffset() Summary: While walking defs of pointer operands we were assuming that the pointer size would remain constant. This is not true, because addresspacecast instructions may cast the pointer to an address space with a different pointer width. This partial reverts r282612, which was a more conservative solution to this problem. Reviewers: reames, sanjoy, apilipenko Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24772 llvm-svn: 283557	2016-10-07 14:23:29 +00:00
Alexey Bataev	6ad5da7c81	[SLPVectorizer] Fix for PR25748: reduction vectorization after loop unrolling. The next code is not vectorized by the SLPVectorizer: ``` int test(unsigned int *p) { int sum = 0; for (int i = 0; i < 8; i++) sum += p[i]; return sum; } ``` During optimization this loop is fully unrolled and SLPVectorizer is unable to vectorize it. Patch tries to fix this problem. Differential Revision: https://reviews.llvm.org/D24796 llvm-svn: 283535	2016-10-07 09:39:22 +00:00
Oliver Stannard	4df1cc0b00	[ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI With the ROPI and RWPI relocation models we can't always have pointers to global data or functions in constant data, so don't try to convert switches into lookup tables if any value in the lookup table would require a relocation. We can still safely emit lookup tables of other values, such as simple constants. Differential Revision: https://reviews.llvm.org/D24462 llvm-svn: 283530	2016-10-07 08:48:24 +00:00
David Majnemer	8c03c1bade	[SimplifyCFG] Correctly test for unconditional branches in GetCaseResults GetCaseResults assumed that a terminator with one successor was an unconditional branch. This is not necessarily the case, it could be a cleanupret. Strengthen the check by querying whether or not the terminator is exceptional. llvm-svn: 283517	2016-10-07 01:38:35 +00:00
Michael Kuperstein	5185b7dde3	[LV] Remove triples from target-independent vectorizer tests. NFC. Vectorizer tests in the target-independent directory should not have a target triple. If a test really needs to query a specific backend, it belongs in the right target subdirectory (which "REQUIRES" the right backend). Otherwise, it should not specify a triple. llvm-svn: 283512	2016-10-06 23:57:25 +00:00
Rong Xu	0e79f7d11d	[PGO] Create weak alias for the renamed Comdat function Add a weak alias to the renamed Comdat function in IR level instrumentation, using it's original name. This ensures the same behavior w/ and w/o IR instrumentation, even for non standard conforming code. Differential Revision: http://reviews.llvm.org/D25339 llvm-svn: 283490	2016-10-06 20:38:13 +00:00
Michael Ilseman	6d6b4d87a3	Revert "Add -strip-nonlinetable-debuginfo capability" This reverts commit r283473. Reverted until review is completed. llvm-svn: 283478	2016-10-06 18:30:26 +00:00
Michael Ilseman	d0a4db7632	Add -strip-nonlinetable-debuginfo capability This adds a new function to DebugInfo.cpp that takes an llvm::Module as input and removes all debug info metadata that is not directly needed for line tables, thus effectively stripping all type and variable information from the module. The primary motivation for this feature was the bitcode work flow (cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html for more background). This is not wired up yet, but will be in subsequent patches. For testing, the new functionality is exposed to opt with a -strip-nonlinetable-debuginfo option. The secondary use-case (and one that works right now!) is as a reduction pass in bugpoint. I added two new bugpoint options (-disable-strip-debuginfo and -disable-strip-debug-types) to control the new features. By default it will first attempt to remove all debug information, then only the type info, and then proceed to hack at any remaining MDNodes. llvm-svn: 283473	2016-10-06 17:58:38 +00:00
Hal Finkel	bdd6735a9e	Don't filter diagnostics written as YAML to the output file The purpose of the YAML diagnostic output file is to collect information on optimizations performed, or not performed, for later processing by tools that help users (and compiler developers) understand how code was optimized. As such, the diagnostics that appear in the file should not be coupled to what a user might want to see summarized for them as the compiler runs, and in fact, because the user likely does not know what optimization diagnostics their tools might want to use, the user cannot provide a useful filter regardless. As such, we shouldn't filter the diagnostics going to the output file. Differential Revision: https://reviews.llvm.org/D25224 llvm-svn: 283236	2016-10-04 18:13:45 +00:00
Adam Nemet	0428e93217	Serialize remark argument as a mapping to get proper quotation for the value. llvm-svn: 283231	2016-10-04 17:05:04 +00:00
Alexey Bataev	7e217c2402	[SLPVectorizer] Add a test with non-vectorizable IR. llvm-svn: 283225	2016-10-04 15:07:23 +00:00
Anna Thomas	479cbb9405	[RS4GC] Handle ShuffleVector instruction in findBasePointer Summary: This patch modifies the findBasePointer to handle the shufflevector instruction. Tests run: RS4GC tests, local downstream tests. Reviewers: reames, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25197 llvm-svn: 283219	2016-10-04 13:48:37 +00:00
Sanjoy Das	0359a193a7	[PruneEH] Be correct in the face IPO This fixes one spot I had missed in r265762. Credit goes to Philip Reames for spotting this one! llvm-svn: 283137	2016-10-03 19:35:30 +00:00
Hans Wennborg	b4d2678c6f	Jump threading: avoid trying to split edge into landingpad block (PR27840) Splitting the edge is nontrivial because of the landing pad, and we would currently assert trying to do it. Differential Revision: https://reviews.llvm.org/D24680 llvm-svn: 283129	2016-10-03 18:18:04 +00:00
Sanjay Patel	d27a21874b	[x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433) This should fix: https://llvm.org/bugs/show_bug.cgi?id=30433 There are a couple of open questions about the codegen: 1. Should we let scalar ops be scalars and avoid vector constant loads/splats? 2. Should we have a pass to combine constants such as the inverted pair that we have here? Differential Revision: https://reviews.llvm.org/D25165 llvm-svn: 283119	2016-10-03 16:38:27 +00:00
Simon Pilgrim	04e249b128	[SLPVectorizer][X86] Added fptosi/fptoui tests llvm-svn: 283048	2016-10-01 19:35:59 +00:00
Simon Pilgrim	567c4fbdae	[SLPVectorizer][X86] Added fcopysign tests llvm-svn: 283046	2016-10-01 17:00:26 +00:00
Simon Pilgrim	cceeb2a4fa	[SLPVectorizer][X86] Added fabs tests llvm-svn: 283045	2016-10-01 16:54:01 +00:00
Sanjay Patel	f7b851fe84	[InstCombine] allow non-splat folds of select cond (ext X), C llvm-svn: 282906	2016-09-30 19:49:22 +00:00
Dehao Chen	a067b0926f	Revert test change in r282894 as it's broken in some platforms. llvm-svn: 282903	2016-09-30 19:25:23 +00:00
Gor Nishanov	a263a60ad5	[Coroutines] Part15c: Fix coro-split to correctly handle definitions between coro.save and coro.suspend Summary: In the case below, %Result.i19 is defined between coro.save and coro.suspend and used after coro.suspend. We need to correctly place such a value into the coroutine frame. ``` %save = call token @llvm.coro.save(i8* null) %Result.i19 = getelementptr inbounds %"struct.lean_future<int>::Awaiter", %"struct.lean_future<int>::Awaiter"* %ref.tmp7, i64 0, i32 0 %suspend = call i8 @llvm.coro.suspend(token %save, i1 false) switch i8 %suspend, label %exit [ i8 0, label %await.ready i8 1, label %exit ] await.ready: %val = load i32, i32* %Result.i19 ``` Reviewers: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24418 llvm-svn: 282902	2016-09-30 19:24:19 +00:00
Sanjay Patel	75b2518762	[InstCombine] add tests for non-splat select(ext) llvm-svn: 282901	2016-09-30 19:15:41 +00:00
Gor Nishanov	c16219486a	[Coroutines] Part15b: Fix dbg information handling in coro-split. Summary: Without the fix, if there was a function inlined into the coroutine with debug information, CloneFunctionInto(NewF, &F, VMap, /ModuleLevelChanges=/true, Returns); would duplicate all of the debug information including the DICompileUnit. We know use VMap to indicate that debug metadata for a File, Unit and FunctionType should not be duplicated when we creating clones that will become f.resume, f.destroy and f.cleanup. Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24417 llvm-svn: 282899	2016-09-30 19:05:06 +00:00
Gor Nishanov	768de2c604	[Coroutines] Part 15a: Lower coro.subfn.addr in CoroCleanup Summary: Not all coro.subfn.addr intrinsics can be eliminated in CoroElide through devirtualization. Those that remain need to be lowered in CoroCleanup. Reviewers: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24412 llvm-svn: 282897	2016-09-30 18:41:35 +00:00
Sanjay Patel	b43712a513	clean up tests and auto-generate checks llvm-svn: 282896	2016-09-30 18:37:34 +00:00
Dehao Chen	977853b7c5	Update loop unroller cost model to make sure debug info does not affect optimization decisions. Summary: Debug info should not affect optimization decisions. This patch updates loop unroller cost model to make it not affected by debug info. Reviewers: davidxl, mzolotukhin Subscribers: haicheng, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D25098 llvm-svn: 282894	2016-09-30 18:30:04 +00:00
Sanjay Patel	9685b3eb57	[InstCombine] add tests for select X, (ext X), C llvm-svn: 282891	2016-09-30 18:10:14 +00:00
Matthew Simpson	7808833e28	[LV] Build all scalar steps for non-uniform induction variables When building the steps for scalar induction variables, we previously attempted to determine if all the scalar users of the induction variable were uniform. If they were, we would only emit the step corresponding to vector lane zero. This optimization was too aggressive. We generally don't know the entire set of induction variable users that will be scalar. We have isScalarAfterVectorization, but this is only a conservative estimate of the instructions that will be scalarized. Thus, an induction variable may have scalar users that aren't already known to be scalar. To avoid emitting unused steps, we can only check that the induction variable is uniform. This should fix PR30542. Reference: https://llvm.org/bugs/show_bug.cgi?id=30542 llvm-svn: 282863	2016-09-30 15:13:52 +00:00
Piotr Padlewski	d28694739c	[thinlto] Don't decay threshold for hot callsites Summary: We don't want to decay hot callsites to import chains of hot callsites. The same mechanism is used in LIPO. Reviewers: tejohnson, eraman, mehdi_amini Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24976 llvm-svn: 282833	2016-09-30 03:01:17 +00:00
Piotr Padlewski	ba72b95f7b	[thinlto] Add cold-callsite import heuristic Summary: Not tunned up heuristic, but with this small heuristic there is about +0.10% improvement on SPEC 2006 Reviewers: tejohnson, mehdi_amini, eraman Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24940 llvm-svn: 282733	2016-09-29 17:32:07 +00:00
Evgeny Stupachenko	dc8a254663	Wisely choose sext or zext when widening IV. Summary: The patch fixes regression caused by two earlier patches D18777 and D18867. Reviewers: reames, sanjoy Differential Revision: http://reviews.llvm.org/D24280 From: Li Huang llvm-svn: 282650	2016-09-28 23:39:39 +00:00
Sanjay Patel	3e9e5ccf7c	[InstCombine] update to use FileCheck Also, remove unnecessary function attributes, parameters, and comments. It looks like at least some of these tests are not minimal though... llvm-svn: 282620	2016-09-28 19:10:16 +00:00
Artur Pilipenko	b6ce6e5dac	Don't look through addrspacecast in GetPointerBaseWithConstantOffset Pointers in different addrspaces can have different sizes, so it's not valid to look through addrspace cast calculating base and offset for a value. This is similar to D13008. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D24729 llvm-svn: 282612	2016-09-28 17:57:16 +00:00
Sanjay Patel	220a8730fb	[InstSimplify] allow or-of-icmps folds with vector splat constants llvm-svn: 282592	2016-09-28 14:27:21 +00:00
Sanjay Patel	a8f9e57c74	[InstSimplify] add vector splat tests for or-of-icmps llvm-svn: 282591	2016-09-28 14:17:35 +00:00
Sanjay Patel	1b312ad42d	[InstSimplify] allow and-of-icmps folds with vector splat constants llvm-svn: 282590	2016-09-28 13:53:13 +00:00
Adam Nemet	c507ac96f5	[Inliner] Port all opt remarks to new streaming API llvm-svn: 282559	2016-09-27 23:47:03 +00:00
Adam Nemet	0427909434	Pass -S to opt in this test to avoid printing binary on mismatch The purpose of the test is to verify diagnostics. llvm-svn: 282558	2016-09-27 23:46:59 +00:00
Adam Nemet	1142147e41	[Inliner] Fold the analysis remark into the missed remark There is really no reason for these to be separate. The vectorizer started this pretty bad tradition that the text of the missed remarks is pretty meaningless, i.e. vectorization failed. There, you have to query analysis to get the full picture. I think we should just explain the reason for missing the optimization in the missed remark when possible. Analysis remarks should provide information that the pass gathers regardless whether the optimization is passing or not. llvm-svn: 282542	2016-09-27 21:58:17 +00:00
Michael Zolotukhin	1a554be3b6	[LoopSimplify] When simplifying phis in loop-simplify, do it only if it preserves LCSSA form. llvm-svn: 282541	2016-09-27 21:03:45 +00:00
Adam Nemet	a62b7e1a28	Output optimization remarks in YAML (Re-committed after moving the template specialization under the yaml namespace. GCC was complaining about this.) This allows various presentation of this data using an external tool. This was first recommended here[1]. As an example, consider this module: 1 int foo(); 2 int bar(); 3 4 int baz() { 5 return foo() + bar(); 6 } The inliner generates these missed-optimization remarks today (the hotness information is pulled from PGO): remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30) remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30) Now with -pass-remarks-output=<yaml-file>, we generate this YAML file: --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 } Function: baz Hotness: 30 Args: - Callee: foo - String: will not be inlined into - Caller: baz ... --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 } Function: baz Hotness: 30 Args: - Callee: bar - String: will not be inlined into - Caller: baz ... This is a summary of the high-level decisions: * There is a new streaming interface to emit optimization remarks. E.g. for the inliner remark above: ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) << NV("Callee", Callee) << " will not be inlined into " << NV("Caller", CS.getCaller()) << setIsVerbose()); NV stands for named value and allows the YAML client to process a remark using its name (NotInlined) and the named arguments (Callee and Caller) without parsing the text of the message. Subsequent patches will update ORE users to use the new streaming API. * I am using YAML I/O for writing the YAML file. YAML I/O requires you to specify reading and writing at once but reading is highly non-trivial for some of the more complex LLVM types. Since it's not clear that we (ever) want to use LLVM to parse this YAML file, the code supports and asserts that we're writing only. On the other hand, I did experiment that the class hierarchy starting at DiagnosticInfoOptimizationBase can be mapped back from YAML generated here (see D24479). * The YAML stream is stored in the LLVM context. * In the example, we can probably further specify the IR value used, i.e. print "Function" rather than "Value". * As before hotness is computed in the analysis pass instead of DiganosticInfo. This avoids the layering problem since BFI is in Analysis while DiagnosticInfo is in IR. [1] https://reviews.llvm.org/D19678#419445 Differential Revision: https://reviews.llvm.org/D24587 llvm-svn: 282539	2016-09-27 20:55:07 +00:00
Adam Nemet	cc2a3fa8e8	Revert "Output optimization remarks in YAML" This reverts commit r282499. The GCC bots are failing llvm-svn: 282503	2016-09-27 16:39:24 +00:00
Adam Nemet	92e928c10a	Output optimization remarks in YAML This allows various presentation of this data using an external tool. This was first recommended here[1]. As an example, consider this module: 1 int foo(); 2 int bar(); 3 4 int baz() { 5 return foo() + bar(); 6 } The inliner generates these missed-optimization remarks today (the hotness information is pulled from PGO): remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30) remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30) Now with -pass-remarks-output=<yaml-file>, we generate this YAML file: --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 } Function: baz Hotness: 30 Args: - Callee: foo - String: will not be inlined into - Caller: baz ... --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 } Function: baz Hotness: 30 Args: - Callee: bar - String: will not be inlined into - Caller: baz ... This is a summary of the high-level decisions: * There is a new streaming interface to emit optimization remarks. E.g. for the inliner remark above: ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) << NV("Callee", Callee) << " will not be inlined into " << NV("Caller", CS.getCaller()) << setIsVerbose()); NV stands for named value and allows the YAML client to process a remark using its name (NotInlined) and the named arguments (Callee and Caller) without parsing the text of the message. Subsequent patches will update ORE users to use the new streaming API. * I am using YAML I/O for writing the YAML file. YAML I/O requires you to specify reading and writing at once but reading is highly non-trivial for some of the more complex LLVM types. Since it's not clear that we (ever) want to use LLVM to parse this YAML file, the code supports and asserts that we're writing only. On the other hand, I did experiment that the class hierarchy starting at DiagnosticInfoOptimizationBase can be mapped back from YAML generated here (see D24479). * The YAML stream is stored in the LLVM context. * In the example, we can probably further specify the IR value used, i.e. print "Function" rather than "Value". * As before hotness is computed in the analysis pass instead of DiganosticInfo. This avoids the layering problem since BFI is in Analysis while DiagnosticInfo is in IR. [1] https://reviews.llvm.org/D19678#419445 Differential Revision: https://reviews.llvm.org/D24587 llvm-svn: 282499	2016-09-27 16:15:16 +00:00
Ivan Krasin	4ff4f21e15	Revert r277556. Add -lowertypetests-bitsets-level to control bitsets generation Summary: We don't currently need this facility for CFI. Disabling individual hot methods proved to be a better strategy in Chrome. Also, the design of the feature is suboptimal, as pointed out by Peter Collingbourne. Reviewers: pcc Subscribers: kcc Differential Revision: https://reviews.llvm.org/D24948 llvm-svn: 282461	2016-09-27 00:29:53 +00:00
Piotr Padlewski	d9830eb79f	[thinlto] Basic thinlto fdo heuristic Summary: This patch improves thinlto importer by importing 3x larger functions that are called from hot block. I compared performance with the trunk on spec, and there were about 2% on povray and 3.33% on milc. These results seems to be consistant and match the results Teresa got with her simple heuristic. Some benchmarks got slower but I think they are just noisy (mcf, xalancbmki, omnetpp)- running the benchmarks again with more iterations to confirm. Geomean of all benchmarks including the noisy ones were about +0.02%. I see much better improvement on google branch with Easwaran patch for pgo callsite inlining (the inliner actually inline those big functions) Over all I see +0.5% improvement, and I get +8.65% on povray. So I guess we will see much bigger change when Easwaran patch will land (it depends on new pass manager), but it is still worth putting this to trunk before it. Implementation details changes: - Removed CallsiteCount. - ProfileCount got replaced by Hotness - hot-import-multiplier is set to 3.0 for now, didn't have time to tune it up, but I see that we get most of the interesting functions with 3, so there is no much performance difference with higher, and binary size doesn't grow as much as with 10.0. Reviewers: eraman, mehdi_amini, tejohnson Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24638 llvm-svn: 282437	2016-09-26 20:37:32 +00:00
Daniel Berlin	1e98c04226	Remove pruning of phi nodes in MemorySSA - it makes updating harder Reviewers: george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24923 llvm-svn: 282419	2016-09-26 17:22:54 +00:00
Matthew Simpson	b764aba2ab	[LV] Scalarize instructions marked scalar after vectorization This patch ensures that we actually scalarize instructions marked scalar after vectorization. Previously, such instructions may have been vectorized instead. Differential Revision: https://reviews.llvm.org/D23889 llvm-svn: 282418	2016-09-26 17:08:37 +00:00
Gor Nishanov	bc0ebb383c	[Coroutines] Part14: Handle coroutines with no suspend points. Summary: If coroutine has no suspend points, remove heap allocation and turn a coroutine into a normal function. Also, if a pattern is detected that coroutine resumes or destroys itself prior to coro.suspend call, turn the suspend point into a simple jump to resume or cleanup label. This pattern occurs when coroutines are used to propagate errors in functions that return expected<T>. Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24408 llvm-svn: 282414	2016-09-26 15:49:28 +00:00
Alexey Bataev	793c946ecb	[InstCombine] Fixed bug introduced in r282237 The index of the new insertelement instruction was evaluated in the wrong way, it was considered as the index of the inserted value instead of index of the position, where the value should be inserted. llvm-svn: 282401	2016-09-26 13:18:59 +00:00
Andrea Di Biagio	a82d52d11d	[InstCombine] Teach the udiv folding logic how to handle constant expressions. This patch fixes PR30366. Function foldUDivShl() worked under the assumption that one of the values in input to the function was always an instance of llvm::Instruction. However, function visitUDivOperand() (the only user of foldUDivShl) was clearly violating that precondition; internally, visitUDivOperand() uses pattern matches to check the operands of a udiv. Pattern matchers for binary operators know how to handle both Instruction and ConstantExpr values. This patch fixes the problem in foldUDivShl(). Now we use pattern matchers instead of explicit casts to Instruction. The reduced test case from PR30366 has been added to test file InstCombine/udiv-simplify.ll. Differential Revision: https://reviews.llvm.org/D24565 llvm-svn: 282398	2016-09-26 12:07:23 +00:00
Sanjay Patel	04949faf69	[TLI] isdigit / isascii / toascii param type should match return type (PR30484) We crash in LibCallSimplifier if we don't check the validity of the function signature properly. llvm-svn: 282278	2016-09-23 18:44:09 +00:00
Alexey Bataev	fee9078dcd	[InstCombine] Fix for PR29124: reduce insertelements to shufflevector If inserting more than one constant into a vector: define <4 x float> @foo(<4 x float> %x) { %ins1 = insertelement <4 x float> %x, float 1.0, i32 1 %ins2 = insertelement <4 x float> %ins1, float 2.0, i32 2 ret <4 x float> %ins2 } InstCombine could reduce that to a shufflevector: define <4 x float> @goo(<4 x float> %x) { %shuf = shufflevector <4 x float> %x, <4 x float> <float undef, float 1.0, float 2.0, float undef>, <4 x i32><i32 0, i32 5, i32 6, i32 3> ret <4 x float> %shuf } Also, InstCombine tries to convert shuffle instruction to single insertelement, if one of the vectors is a constant vector and only a single element from this constant should be used in shuffle, i.e. shufflevector <4 x float> %v, <4 x float> <float undef, float 1.0, float undef, float undef>, <4 x i32> <i32 0, i32 5, i32 undef, i32 undef> -> insertelement <4 x float> %v, float 1.0, 1 Differential Revision: https://reviews.llvm.org/D24182 llvm-svn: 282237	2016-09-23 09:14:08 +00:00
Sanjay Patel	30ef70b090	[InstCombine] fold X urem C -> X < C ? X : X - C when C is big (PR28672) We already have the udiv variant of this transform, so I think this is ok for InstCombine too even though there is an increase in IR instructions. As the tests and TODO comments show, the transform can lead to follow-on combines. This should fix: https://llvm.org/bugs/show_bug.cgi?id=28672 Differential Revision: https://reviews.llvm.org/D24527 llvm-svn: 282209	2016-09-22 22:36:26 +00:00
Hans Wennborg	c7957ef86c	Revert r282168 "GVN-hoist: fix store past load dependence analysis (PR30216)" and also the dependent r282175 "GVN-hoist: do not dereference null pointers" It's causing compiler crashes building Harfbuzz (PR30499). llvm-svn: 282199	2016-09-22 21:20:53 +00:00
Sebastian Pop	8e6e3318c2	GVN-hoist: fix store past load dependence analysis (PR30216) To hoist stores past loads, we used to search for potential conflicting loads on the hoisting path by following a MemorySSA def-def link from the store to be hoisted to the previous defining memory access, and from there we followed the def-use chains to all the uses that occur on the hoisting path. The problem is that the def-def link may point to a store that does not alias with the store to be hoisted, and so the loads that are walked may not alias with the store to be hoisted, and even as in the testcase of PR30216, the loads that may alias with the store to be hoisted are not visited. The current patch visits all loads on the path from the store to be hoisted to the hoisting position and uses the alias analysis to ask whether the store may alias the load. I was not able to use the MemorySSA functionality to ask for whether load and store are clobbered: I'm not sure which function to call, so I used a call to AA->isNoAlias(). Store past store is still working as before using a MemorySSA query: I added an extra test to pr30216.ll to make sure store past store does not regress. Differential Revision: https://reviews.llvm.org/D24517 llvm-svn: 282168	2016-09-22 15:33:51 +00:00
Sebastian Pop	f6bfc1ad8b	GVN-hoist: move hoist testcase to GVNHoist dir llvm-svn: 282161	2016-09-22 14:45:46 +00:00
Sebastian Pop	440f15b7fc	GVN-hoist: only hoist relevant scalar instructions Without this patch, GVN-hoist would think that a branch instruction is a scalar instruction and would try to value number it. The patch filters out all such kind of irrelevant instructions. A bit frustrating is that there is no easy way to discard all those very infrequent instructions, a bit like isa<TerminatorInst> that stands for a large family of instructions. I'm thinking that checking for those very infrequent other instructions would cost us more in compilation time than just letting those instructions getting numbered, so I'm still thinking that a simpler check: if (isa<TerminatorInst>(I)) return false; is better than listing all the other less frequent instructions. Differential Revision: https://reviews.llvm.org/D23929 llvm-svn: 282160	2016-09-22 14:45:40 +00:00
Anna Thomas	82c3717f54	[RS4GC] Remat in presence of phi and use live value Summary: Reviewers: Subscribers: llvm-svn: 282150	2016-09-22 13:13:06 +00:00
Dorit Nuzman	d1247a684e	Fix revision 281960 llvm-svn: 282139	2016-09-22 07:56:23 +00:00
Chad Rosier	00eb8db3a1	[LoopInterchange] Track all dependencies, not just anti dependencies. Currently, we give up on loop interchange if we encounter a flow dependency anywhere in the loop list. Worse yet, we don't even track output dependencies. This patch updates the dependency matrix computation to track flow and output dependencies in the same way we track anti dependencies. This improves an internal workload by 2.2x. Note the loop interchange pass is off by default and it can be enabled with '-mllvm -enable-loopinterchange' Differential Revision: https://reviews.llvm.org/D24564 llvm-svn: 282101	2016-09-21 19:16:47 +00:00
Matthew Simpson	15869f86d8	[LV] Don't emit unused scalars for uniform instructions If we identify an instruction as uniform after vectorization, we know that we should only use the value corresponding to the first vector lane of each unroll iteration. However, when scalarizing such instructions, we still produce values for the other vector lanes. This patch prevents us from generating the unused scalars. Differential Revision: https://reviews.llvm.org/D24275 llvm-svn: 282087	2016-09-21 16:50:24 +00:00
Dehao Chen	160fbc3f95	Change the basic block weight calculation algorithm to use max instead of voting. Summary: Now that we have more precise debug info, we should change back to use maximum to get basic block weight. Reviewers: dnovillo Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D24788 llvm-svn: 282084	2016-09-21 16:26:51 +00:00
Hans Wennborg	1049085c78	Revert r281895 "Add @llvm.dbg.value entries for the phi node created by -mem2reg" (And follow-up r281964.) It caused PR30468. llvm-svn: 282077	2016-09-21 15:55:53 +00:00
Arnold Schwaighofer	f62ba1031f	DeadArgElim: Don't mark swifterror arguments as unused Replacing swifterror arguments with undef creates invalid IR. rdar://28300490 llvm-svn: 282075	2016-09-21 15:29:08 +00:00
Adam Nemet	e3cef93727	[LV] When reporting about a specific instruction without debug location use loop's This can occur for example if some optimization drops the debug location. llvm-svn: 282048	2016-09-21 03:14:20 +00:00
Michael Kuperstein	79dcc274b4	[InferAttributes] Don't access parameters that don't exist. Check for the correct number of parameters before querying their type. This fixes PR30455. llvm-svn: 282038	2016-09-20 23:10:31 +00:00
Dorit Nuzman	02efef0525	Reverting revision 281960 due to test failures. llvm-svn: 281961	2016-09-20 08:27:48 +00:00
Dorit Nuzman	d3686e5269	[SROA] Preserve llvm.mem.parallel_loop_access metadata. SROA doesn't preserve the llvm.mem.parallel_loop_access metadata when it transforms loads/stores. This patch fixes a couple occurences of this issue. (Partially addresses PR28981). Differential Revision: https://reviews.llvm.org/D23549 llvm-svn: 281960	2016-09-20 07:50:49 +00:00
Dehao Chen	20866ed57e	Handle early inline for hot callsites that reside in the same basic block. Summary: Callsites in the same basic block should share the same hotness. This patch checks for the hottest callsite in the same basic block, and use the hotness for all callsites in that basic block for early inline decisions. It also fixes the test to add "-S" so theat the "CHECK-NOT" is actually checking the content. Reviewers: dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24734 llvm-svn: 281927	2016-09-19 18:38:14 +00:00
Dehao Chen	82667d04c5	Only set branch weight during sample pgo annotation when max_weight of the branch is non-zero. Otherwise use default static profile to set branch probability. Summary: It does not make sense to set equal weights for all unkown branches as we have static branch prediction available. Reviewers: dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24732 llvm-svn: 281912	2016-09-19 16:33:41 +00:00
Dehao Chen	38e3731c47	Use call target count to derive the call instruction weight Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24410 llvm-svn: 281911	2016-09-19 16:06:37 +00:00
Keith Walker	c941252374	Add @llvm.dbg.value entries for the phi node created by -mem2reg When phi nodes are created in the -mem2reg phase, the @llvm.dbg.declare entries are converted to @llvm.dbg.value entries at the place where the store instructions existed. However no entry is created to describe the resulting value of the phi node. The effect of this is especially noticeable in for loops which have a constant for the intial value; the loop control variable's location would be described as the intial constant value in the loop body once the -mem2reg optimization phase was run. This change adds the creation of the @llvm.dbg.value entries to describe variables whose location is the result of a phi node created in -mem2reg. Also when the phi node is finally lowered to a machine instruction it is important that the lowered "load" instruction is placed before the associated DEBUG_VALUE entry describing the value loaded. Differential Revision: https://reviews.llvm.org/D23715 llvm-svn: 281895	2016-09-19 09:49:30 +00:00
James Molloy	0efb96a8ee	[SimplifyCFG] Update (AND) IR flags when CSE'ing instructions We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (50% or less chance) if this results in broken codegen or not, which is why PR30373 which is actually not the fault of the commit it was bisected down to. Fixes PR30373. llvm-svn: 281889	2016-09-19 08:23:08 +00:00
Dehao Chen	41cde0b986	Handle Invoke during sample profiler annotation: make it inlinable. Summary: Previously we reline on inst-combine to remove inlinable invoke instructions. This causes trouble because a few extra optimizations are schedule early that could introduce too much CFG change (e.g. simplifycfg removes too much control flow). This patch handles invoke instruction in-place during sample profile annotation, so that we do not rely on instcombine to remove those invoke instructions. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24409 llvm-svn: 281870	2016-09-18 23:11:37 +00:00
Simon Pilgrim	34032cba14	Rename tests llvm-svn: 281863	2016-09-18 20:25:41 +00:00
Xinliang David Li	4ca1733a06	[Profile] Implement select instruction instrumentation in IR PGO Differential Revision: http://reviews.llvm.org/D23727 llvm-svn: 281858	2016-09-18 18:34:07 +00:00
Elena Demikhovsky	5f8cc0c346	[Loop Vectorizer] Consecutive memory access - fixed and simplified Amended consecutive memory access detection in Loop Vectorizer. Load/Store were not handled properly without preceding GEP instruction. Differential Revision: https://reviews.llvm.org/D20789 llvm-svn: 281853	2016-09-18 13:56:08 +00:00
Teresa Johnson	fbb431b292	[ThinLTO] Ensure anonymous globals renamed even at -O0 Summary: This fixes an issue when files are compiled with -flto=thin at default -O0. We need to rename anonymous globals before attempting to write the module summary because all values need names for the summary. This was happening at -O1 and above, but not before the early exit when constructing the pipeline for -O0. Also add an internal -prepare-for-thinlto option to enable this to be tested via opt. Fixes PR30419. Reviewers: mehdi_amini Subscribers: probinson, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24701 llvm-svn: 281840	2016-09-17 20:40:16 +00:00
Sanjay Patel	f26710d97d	[InstCombine] canonicalize vector select with constant vector condition to shuffle As discussed on llvm-dev ( http://lists.llvm.org/pipermail/llvm-dev/2016-August/104210.html ): turn a vector select with constant condition operand into a shuffle as a canonicalization step. Shuffles may be easier to reason about in conjunction with other shuffles and insert/extract. Possible known (minor?) regressions from this change are filed as: https://llvm.org/bugs/show_bug.cgi?id=28530 https://llvm.org/bugs/show_bug.cgi?id=28531 https://llvm.org/bugs/show_bug.cgi?id=30371 If something terrible happens to perf after this commit, feel free to revert until a backend fix is in place. Differential Revision: https://reviews.llvm.org/D24279 llvm-svn: 281787	2016-09-16 22:16:18 +00:00
Evgeniy Stepanov	aa84f050fc	[safestack] Fix assertion failure in stack coloring. This is a fix for PR30318. Clang may generate IR where an alloca is already live when entering a BB with lifetime.start. In this case, conservatively extend the alloca lifetime all the way back to the block entry. llvm-svn: 281784	2016-09-16 22:04:10 +00:00
Sanjay Patel	c96f6db246	[InstCombine] allow vector types for constant folding / computeKnownBits (PR24942) computeKnownBits() already works for integer vectors, so allow vector types when calling that from InstCombine. I don't think the change to use m_APInt in computeKnownBits is strictly necessary because we do check for ConstantVector later, but it's more efficient to handle the splat case without needing to loop on vector elements. This should work with InstSimplify, but doesn't yet, so I made that a FIXME comment on the test for PR24942: https://llvm.org/bugs/show_bug.cgi?id=24942 Differential Revision: https://reviews.llvm.org/D24677 llvm-svn: 281777	2016-09-16 21:20:36 +00:00
Sanjay Patel	0417c7c3ef	auto-generate checks llvm-svn: 281755	2016-09-16 17:48:16 +00:00
Mehdi Amini	55b06538b5	Fix test after renaming -name-anon-functions pass to -name-anon-globals llvm-svn: 281752	2016-09-16 17:18:16 +00:00
Mehdi Amini	2cac787919	Fix NameAnonFunctions pass: for ThinLTO we need to rename global variables as well A follow-up patch will rename this pass and the source file accordingly, but I figured the non-NFC change will be easier to spot in isolation. Differential Revision: https://reviews.llvm.org/D24641 llvm-svn: 281744	2016-09-16 16:56:25 +00:00
David Majnemer	08566532ae	Add a test for r280191 llvm-svn: 281694	2016-09-16 02:43:36 +00:00
Sanjay Patel	af91d1f81e	[InstCombine] allow icmp (shr/shl) folds for vectors These 2 helper functions were already using APInt internally, so just change the API and caller to allow folds for splats. The scalar regression tests look quite thorough, so I just added a couple of tests to prove that vectors are handled too. These folds should be grouped with the other cmp+shift folds though. That can be an NFC follow-up. llvm-svn: 281663	2016-09-15 21:35:30 +00:00
Sanjay Patel	d590db6eac	regenerate checks llvm-svn: 281655	2016-09-15 20:39:01 +00:00
Mehdi Amini	d880309835	[GlobalOpt] Dead Eliminate declarations GlobalOpt is already dead-code-eliminating global definitions. With this change it also takes care of declarations. Hopefully this should make it now a strict superset of GlobalDCE. This is important for LTO/ThinLTO as we don't want the linker to see "undefined reference" when it processes the input files: it could prevent proper internalization (or even load an extra file from a static archive, changing the behavior of the program!). llvm-svn: 281653	2016-09-15 20:26:27 +00:00
David Majnemer	8b16da8744	[InstCombine] Do not RAUW a constant GEP canRewriteGEPAsOffset expects to process instructions, not constants. This fixes PR30342. llvm-svn: 281650	2016-09-15 20:10:09 +00:00
Sanjay Patel	886a542e23	[InstCombine] allow icmp (sub nsw) folds for vectors Also, clean up the code and comments for the existing folds in foldICmpSubConstant(). llvm-svn: 281631	2016-09-15 18:05:17 +00:00
Sanjay Patel	514068397e	[InstCombine] add vector tests for icmp (sub nsw) llvm-svn: 281630	2016-09-15 17:54:47 +00:00
Sanjay Patel	40c53ea933	[InstCombine] allow (icmp sgt smin(PosA, B), 0) fold for vectors llvm-svn: 281624	2016-09-15 16:23:20 +00:00
Sanjay Patel	78a7c7d26a	[InstCombine] add vector tests for icmp sgt smin llvm-svn: 281623	2016-09-15 16:13:41 +00:00
Sanjay Patel	e957f1fe2c	[InstCombine] auto-generate checks llvm-svn: 281621	2016-09-15 15:48:53 +00:00
Sanjay Patel	7577a3d799	[InstCombine] use m_APInt to allow icmp folds using known bits for splat constant vectors llvm-svn: 281613	2016-09-15 14:15:47 +00:00
NAKAMURA Takumi	6a5bac48cf	llvm/test/Transforms/CorrelatedValuePropagation/alloca.ll REQUIRES +Asserts. llvm-svn: 281598	2016-09-15 09:45:31 +00:00
Wei Mi	f160e345be	Add some shortcuts in LazyValueInfo to reduce compile time of Correlated Value Propagation. The patch is to partially fix PR10584. Correlated Value Propagation queries LVI to check non-null for pointer params of each callsite. If we know the def of param is an alloca instruction, we know it is non-null and can return early from LVI. Similarly, CVP queries LVI to check whether pointer for each mem access is constant. If the def of the pointer is an alloca instruction, we know it is not a constant pointer. These shortcuts can reduce the cost of CVP significantly. Differential Revision: https://reviews.llvm.org/D18066 llvm-svn: 281586	2016-09-15 06:28:34 +00:00
Sanjay Patel	9f036b5a97	[InstCombine] add vector tests for foldICmpUsingKnownBits() llvm-svn: 281559	2016-09-14 23:15:11 +00:00
Matthew Simpson	b25e87fca5	[LV] Process pointer IVs with PHINodes in collectLoopUniforms This patch moves the processing of pointer induction variables in collectLoopUniforms from the consecutive pointer phase of the analysis to the phi node phase. Previously, if a pointer induction variable was used by both a scalarized non-memory instruction as well as a vectorized memory instruction, we would incorrectly identify the pointer as uniform. Pointer induction variables should be treated the same as other phi nodes. That is, they are uniform if all users of the induction variable and induction variable update are uniform. Differential Revision: https://reviews.llvm.org/D24511 llvm-svn: 281485	2016-09-14 14:47:40 +00:00
Andrea Di Biagio	e8e1af3649	[InstCombine] Merged two test files and regenerated checks using update_test_checks.py. NFC. llvm-svn: 281478	2016-09-14 14:18:21 +00:00
Akira Hatanaka	dea090e6b2	[ObjCARC] Traverse chain downwards to replace uses of argument passed to ObjC library call with call return. ARC contraction tries to replace uses of an argument passed to an objective-c library call with the call return value. For example, in the following IR, it replaces uses of argument %9 and uses of the values discovered traversing the chain upwards (%7 and %8) with the call return %10, if they are dominated by the call to @objc_autoreleaseReturnValue. This transformation enables code-gen to tail-call the call to @objc_autoreleaseReturnValue, which is necessary to enable auto release return value optimization. %7 = tail call i8* @objc_loadWeakRetained(i8** %6) %8 = bitcast i8* %7 to %0* %9 = bitcast %0* %8 to i8* %10 = tail call i8* @objc_autoreleaseReturnValue(i8* %9) ret %0* %8 Since r276727, llvm started removing redundant bitcasts and as a result started feeding the following IR to ARC contraction: %7 = tail call i8* @objc_loadWeakRetained(i8** %6) %8 = bitcast i8* %7 to %0* %9 = tail call i8* @objc_autoreleaseReturnValue(i8* %7) ret %0* %8 ARC contraction no longer does the optimization described above since it only traverses the chain upwards and fails to recognize that the function return can be replaced by the call return. This commit changes ARC contraction to traverse the chain downwards too and replace uses of bitcasts with the call return. rdar://problem/28011339 Differential Revision: https://reviews.llvm.org/D24523 llvm-svn: 281419	2016-09-13 23:43:11 +00:00
Sanjay Patel	ab40f9d0ef	add tests for PR28672 I'm not sure if we actually want to transform all of these in InstCombine yet, so I'm not labeling these with FIXME. llvm-svn: 281386	2016-09-13 20:36:13 +00:00
Matt Arsenault	e2e6cfee61	Reapply "InstCombine: Reduce trunc (shl x, K) width." This reapplies r272987 with a fix for infinitely looping when the truncated value is another shift of a constant. llvm-svn: 281379	2016-09-13 19:43:57 +00:00
Andrea Di Biagio	7277afeec1	[ConstantFold] Improve the bitcast folding logic for constant vectors. The constant folder didn't know how to always fold bitcasts of constant integer vectors. In particular, it was unable to handle the case where a constant vector had some undef elements, and the resulting (i.e. bitcasted) vector type had more elements than the original vector type. Example: %cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32> On a little endian target, %cast could have been folded to: <4 x i32><i32 undef, i32 undef, i32 2, i32 0> This patch improves the folding logic by teaching how to correctly propagate undef elements in the folded vector. Differential Revision: https://reviews.llvm.org/D24301 llvm-svn: 281343	2016-09-13 14:50:47 +00:00
Andrea Di Biagio	3647a96a44	[InstSimplify] Add tests to show missed bitcast folding opportunities. InstSimplify doesn't always know how to fold a bitcast of a constant vector. In particular, the logic in InstSimplify doesn't know how to handle the case where the constant vector in input contains some undef elements, and the number of elements is smaller than the number of elements of the bitcast vector type. llvm-svn: 281332	2016-09-13 13:17:42 +00:00
Sam Parker	64781ed4bb	Remove InstCombine test file My previous commit should of removed a test file but I missed it. llvm-svn: 281326	2016-09-13 12:33:06 +00:00
Sam Parker	214f7bf5cc	Enable simplify libcalls for ARM PCS Teach SimplifyLibcalls that in can treat functions annotated with apcs, aapcs or aapcs_vfp like normal C functions if they only take and return integer or pointer values, and the target is not iOS. Differential Revision: https://reviews.llvm.org/D24453 llvm-svn: 281322	2016-09-13 12:10:14 +00:00
Peter Collingbourne	d4135bbc30	DebugInfo: New metadata representation for global variables. This patch reverses the edge from DIGlobalVariable to GlobalVariable. This will allow us to more easily preserve debug info metadata when manipulating global variables. Fixes PR30362. A program for upgrading test cases is attached to that bug. Differential Revision: http://reviews.llvm.org/D20147 llvm-svn: 281284	2016-09-13 01:12:59 +00:00
Sanjay Patel	ff00fae8e6	add more tests for PR30273 llvm-svn: 281270	2016-09-12 22:28:29 +00:00
Sanjay Patel	dea26950a0	[InstCombine] add test for PR30327 llvm-svn: 281248	2016-09-12 19:50:08 +00:00
Sanjay Patel	2eea9a1d58	[InstCombine] regenerate checks llvm-svn: 281247	2016-09-12 19:29:26 +00:00
Sanjay Patel	f5887f1fbd	[InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors isSignBitCheck could be changed to take a pointer param to avoid the 'UnusedBit' ugliness. llvm-svn: 281231	2016-09-12 16:25:41 +00:00
David Majnemer	c83044d9bb	[FunctionAttrs] Don't try to infer returned if it is already on an argument Trying to infer the 'returned' attribute if an argument is already 'returned' can lead to verification failure: inference might determine that a different argument is passed through which would result in two different arguments marked as 'returned'. This fixes PR30350. llvm-svn: 281221	2016-09-12 16:04:59 +00:00
Sanjay Patel	db400baa80	[InstCombine] add tests to show missing vector folds llvm-svn: 281219	2016-09-12 15:51:42 +00:00
Sanjay Patel	f9ca770225	[InstCombine] regenerate checks llvm-svn: 281186	2016-09-12 00:12:56 +00:00
Sanjay Patel	a2aabfcc17	[InstCombine] regenerate checks llvm-svn: 281185	2016-09-12 00:08:33 +00:00
James Molloy	104370ab37	[SimplifyCFG] Be even more conservative in SinkThenElseCodeToEnd This should actually fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas. The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan). llvm-svn: 281162	2016-09-11 09:00:03 +00:00
James Molloy	18d96e8fa5	[SimplifyCFG] Harden up the profitability heuristic for block splitting during sinking Exposed by PR30244, we will split a block currently if we think we can sink at least one instruction. However this isn't right - the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so - stores, for example. So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction. Should fix PR30244. llvm-svn: 281160	2016-09-11 08:07:30 +00:00
Justin Lebar	11a3204355	Add handling of !invariant.load to PropagateMetadata. Summary: This will let e.g. the load/store vectorizer propagate this metadata appropriately. Reviewers: arsenm Subscribers: tra, jholewinski, hfinkel, mzolotukhin Differential Revision: https://reviews.llvm.org/D23479 llvm-svn: 281153	2016-09-11 01:39:08 +00:00
Arnold Schwaighofer	5d335559b9	InstCombine: Don't combine loads/stores from swifterror to a new type This generates invalid IR: the only users of swifterror can be call arguments, loads, and stores. rdar://28242257 llvm-svn: 281144	2016-09-10 18:14:57 +00:00
Arnold Schwaighofer	c9277f40fd	Inliner: Don't mark swifterror allocas with lifetime markers This would create a bitcast use which fails the verifier: swifterror values may only be used by loads, stores, and as function arguments. rdar://28233244 llvm-svn: 281114	2016-09-09 22:40:27 +00:00
Matt Arsenault	950a82047b	LSV: Fix incorrectly increasing alignment If the unaligned access has a dynamic offset, it may be odd which would make the adjusted alignment incorrect to use. llvm-svn: 281110	2016-09-09 22:20:14 +00:00
Sanjay Patel	58109abe91	[InstCombine] use m_APInt to allow icmp ult X, C folds for splat constant vectors llvm-svn: 281107	2016-09-09 21:59:37 +00:00
Dehao Chen	22ce5eb051	Do not widen load for different variable in GVN. Summary: Widening load in GVN is too early because it will block other optimizations like PRE, LICM. https://llvm.org/bugs/show_bug.cgi?id=29110 The SPECCPU2006 benchmark impact of this patch: Reference: o2_nopatch (1): o2_patched Benchmark Base:Reference (1) ------------------------------------------------------- spec/2006/fp/C++/444.namd 25.2 -0.08% spec/2006/fp/C++/447.dealII 45.92 +1.05% spec/2006/fp/C++/450.soplex 41.7 -0.26% spec/2006/fp/C++/453.povray 35.65 +1.68% spec/2006/fp/C/433.milc 23.79 +0.42% spec/2006/fp/C/470.lbm 41.88 -1.12% spec/2006/fp/C/482.sphinx3 47.94 +1.67% spec/2006/int/C++/471.omnetpp 22.46 -0.36% spec/2006/int/C++/473.astar 21.19 +0.24% spec/2006/int/C++/483.xalancbmk 36.09 -0.11% spec/2006/int/C/400.perlbench 33.28 +1.35% spec/2006/int/C/401.bzip2 22.76 -0.04% spec/2006/int/C/403.gcc 32.36 +0.12% spec/2006/int/C/429.mcf 41.04 -0.41% spec/2006/int/C/445.gobmk 26.94 +0.04% spec/2006/int/C/456.hmmer 24.5 -0.20% spec/2006/int/C/458.sjeng 28 -0.46% spec/2006/int/C/462.libquantum 55.25 +0.27% spec/2006/int/C/464.h264ref 45.87 +0.72% geometric mean +0.23% For most benchmarks, it's a wash, but we do see stable improvements on some benchmarks, e.g. 447,453,482,400. Reviewers: davidxl, hfinkel, dberlin, sanjoy, reames Subscribers: gberry, junbuml Differential Revision: https://reviews.llvm.org/D24096 llvm-svn: 281074	2016-09-09 18:42:35 +00:00
Sanjay Patel	6da0fb8c74	[InstCombine] add tests to show pattern matching failures due to commutation I was looking to fix a bug in getComplexity(), and these cases showed up as obvious failures. I'm not sure how to find these in general though. llvm-svn: 281055	2016-09-09 16:35:20 +00:00
Gor Nishanov	faf36c2e0b	[Coroutines] Part13: Handle single edge PHINodes across suspends Summary: If one of the uses of the value is a single edge PHINode, handle it. Original: %val = something <suspend> %p = PHINode [%val] After Spill + Part13: %val = something %slot = gep val.spill.slot store %val, %slot <suspend> %p = load %slot Plus tiny fixes/changes: * use correct index for coro.free in CoroCleanup * fixup id parameter in coro.free to allow authoring coroutine in plain C with __builtins Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24242 llvm-svn: 281020	2016-09-09 05:39:00 +00:00
Dehao Chen	87823f8e4d	Remove debug info when hoisting instruction from then/else branch. Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch. Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D24164 llvm-svn: 280995	2016-09-08 21:53:33 +00:00
Sanjay Patel	a4c6223319	[InstCombine] regenerate checks llvm-svn: 280993	2016-09-08 21:40:21 +00:00
Matthew Simpson	bfe5e1817b	[LV] Ensure proper handling of multi-use case when collecting uniforms The test case included in r280979 wasn't checking what it was supposed to be checking for the predicated store case. Fixing the test revealed that the multi-use case (when a pointer is used by both vectorized and scalarized memory accesses) wasn't being handled properly. We can't skip over non-consecutive-like pointers since they may have looked consecutive-like with a different memory access. llvm-svn: 280992	2016-09-08 21:38:26 +00:00
Sanjay Patel	ed9fda01a3	[InstCombine] regenerate checks llvm-svn: 280991	2016-09-08 21:32:21 +00:00
Matthew Simpson	408a3abcfe	[LV] Don't mark pointers used by scalarized memory accesses uniform Previously, all consecutive pointers were marked uniform after vectorization. However, if a consecutive pointer is used by a memory access that is eventually scalarized, the pointer won't remain uniform after all. An example is predicated stores. Even though a predicated store may be consecutive, it will still be scalarized, making it's pointer operand non-uniform. This patch updates the logic in collectLoopUniforms to consider the cases where a memory access may be scalarized. If a memory access may be scalarized, its pointer operand is not marked uniform. The determination of whether a given memory instruction will be scalarized or not has been moved into a common function that is used by the vectorizer, cost model, and legality analysis. Differential Revision: https://reviews.llvm.org/D24271 llvm-svn: 280979	2016-09-08 19:11:07 +00:00
Dehao Chen	ebb715b119	Add unittest for r280760 llvm-svn: 280963	2016-09-08 16:53:40 +00:00
Simon Pilgrim	073472a2bf	[InstCombine][X86] Regenerate masked memory op combine tests llvm-svn: 280960	2016-09-08 16:32:37 +00:00
Simon Pilgrim	cd7b2830b9	[InstCombine][X86] Regenerate vperm2f128/vperm2i128 combine tests llvm-svn: 280959	2016-09-08 16:30:46 +00:00
Simon Pilgrim	3b1ecbe66c	[InstCombine][X86] Regenerate insertps combine tests llvm-svn: 280957	2016-09-08 16:15:21 +00:00
Michael Zolotukhin	e72997a524	Revert "[LoopUnroll] Properly update loop-info when cloning prologues and epilogues." This reverts commit r280901. This caused a bunch of failures, reverting it until I investigate them. llvm-svn: 280905	2016-09-08 03:51:30 +00:00
Michael Zolotukhin	5e0a20697e	[LoopUnroll] Properly update loop-info when cloning prologues and epilogues. Summary: When cloning blocks for prologue/epilogue we need to replicate the loop structure from the original loop. It wasn't a problem for the innermost loops, but it led to an incorrect loop info when we unrolled a loop with a child loop - in this case created prologue-loop had a child loop, but loop info didn't reflect that. This fixes PR28888. Reviewers: chandlerc, sanjoy, hfinkel Subscribers: llvm-commits, silvas Differential Revision: https://reviews.llvm.org/D24203 llvm-svn: 280901	2016-09-08 01:52:26 +00:00
Sanjay Patel	9b40f98357	[InstCombine] use m_APInt to allow icmp (and (sh X, Y), C2), C1 folds for splat constant vectors llvm-svn: 280873	2016-09-07 22:33:03 +00:00
Hal Finkel	ac5803ba91	[SimplifyCFG] Don't try to create metadata-valued PHIs We can't create metadata-valued PHIs; don't try to do so when sinking. I created a test case for this using the @llvm.type.test intrinsic, because it takes a metadata parameter and does not have severe side effects (thus SimplifyCFG is willing to otherwise sink it). Previously, running the test case would crash with: Invalid use of metadata! %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0> LLVM ERROR: Broken function found, compilation aborted! llvm-svn: 280866	2016-09-07 21:38:22 +00:00
Sanjay Patel	def931e76a	[InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors This is a revert of r280676 which was a revert of r280637; ie, this is r280637 again. It was speculatively reverted to help debug buildbot failures. llvm-svn: 280861	2016-09-07 20:50:44 +00:00
Justin Lebar	3a5f40c191	[LSV] Use the original loads' names for the extractelement instructions. Summary: LSV replaces multiple adjacent loads with one vectorized load and a bunch of extractelement instructions. This patch makes the extractelement instructions' names match those of the original loads, for (hopefully) improved readability. Reviewers: asbirlea, tstellarAMD Subscribers: arsenm, mzolotukhin Differential Revision: https://reviews.llvm.org/D23748 llvm-svn: 280818	2016-09-07 15:49:48 +00:00
Andrea Di Biagio	bdd576dbb0	Regenerate vector bitcast folding tests using update_test_checks.py. Two tests have been merged together, regenerated and then moved to a more appropriate directory. No functional change. llvm-svn: 280814	2016-09-07 14:50:07 +00:00
Andrea Di Biagio	f3fd316223	[InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic. This fixes a similar issue to the one already fixed by r280804 (revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts in the extrq/extrqi combining logic. However, it turns out that even the insertq/insertqi logic was affected by the same problem. llvm-svn: 280807	2016-09-07 12:47:53 +00:00
Andrea Di Biagio	8df5b9cf48	[InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls. This patch fixes an assertion failure caused by unsafe dynamic casts on the constant operands of sse4a intrinsic calls to extrq/extrqi The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently checks if the input operands are constants. Internally, that logic relies on dyn_casts of values returned by calls to method Constant::getAggregateElement. However, method getAggregateElemet may return nullptr if the constant element cannot be retrieved. So, all the dyn_casts can potentially fail. This is what happens for example if a constexpr value is passed in input to an extrq/extrqi intrinsic call. This patch fixes the problem by using a dyn_cast_or_null (instead of a simple dyn_cast) on the result of each call to Constant::getAggregateElement. Added reproducible test cases to x86-sse4a.ll. Differential Revision: https://reviews.llvm.org/D24256 llvm-svn: 280804	2016-09-07 12:03:03 +00:00
James Molloy	ec905a62ae	[SimplifyCFG] Update workaround for PR30188 to also include loads I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores. Fixes PR30188 (again). llvm-svn: 280792	2016-09-07 08:40:20 +00:00
James Molloy	bf1837d9c9	[SimplifyCFG] Check PHI uses more accurately PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG. Fixes PR30292. llvm-svn: 280790	2016-09-07 08:15:54 +00:00
Adam Nemet	c520822dbf	[JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info Currently the pass updates branch weights in the IR if the function has any PGO info (entry frequency is set). However we could still have regions of the CFG that does not have branch weights collected (e.g. a cold region). In this case we'd use static estimates. Since static estimates for branches are determined independently, they are inconsistent. Updating them can "randomly" inflate block frequencies. I've run into this in a completely cold loop of h264ref from SPEC. -Rpass-with-hotness showed the loop to be completely cold during inlining (before JT) but completely hot during vectorization (after JT). The new testcase demonstrate the problem. We check array elements against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting check. The block names should be self-explanatory. In this example, jump threading incorrectly updates the weight of the loop-exiting branch to 0, drastically inflating the frequency of the loop (in the range of billions). There is no run-time profile info for edges inside the loop, so branch probabilities are estimated. These are the resulting branch and block frequencies for the loop body: check_1 (16) (8) / \| eq_1 \| (8) \ \| check_2 (16) (8) / \| eq_2 \| (8) \ \| check_3 (16) (1) / \| (loop exit) \| (15) \| (back edge) First we thread eq_1 -> check_2 to check_3. Frequencies are updated to remove the frequency of eq_1 from check_2 and then from the false edge leaving check_2. Changed frequencies are highlighted with * : check_1 (16) (8) / \| eq_1~ \| (8) / \| / check_2 (8) / (8) / \| \ eq_2 \| (0) \ \ \| ` --- check_3 (16) (1) / \| (loop exit) \| (15) \| (back edge) Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new back edges. Frequencies are updated to remove the frequency of eq_1 and eq_3 from check_3 and then the false edge leaving check_3 (changed frequencies are highlighted with ): check_1 (16) (8) / \| eq_1~ \| (8) / \| / check_2 (8) / (8) / \| /-- eq_2~ \| (0) (back edge) \| check_3 (0) (0) / \| (loop exit) \| (0*) \| (back edge) As a result, the loop exit edge ends up with 0 frequency which in turn makes the loop header to have maximum frequency. There are a few potential problems here: 1. The profile data seems odd. There is a single profile sample of the loop being entered. On the other hand, there are no weights inside the loop. 2. Based on static estimation we shouldn't set edges to "extreme" values, i.e. extremely likely or unlikely. 3. We shouldn't create profile metadata that is calculated from static estimation. I am not sure what policy is but it seems to make sense to treat profile metadata as something that is known to originate from profiling. Estimated probabilities should only be reflected in BPI/BFI. Any one of these would probably fix the immediate problem. I went for 3 because I think it's a good policy to have and added a FIXME about 2. Differential Revision: https://reviews.llvm.org/D24118 llvm-svn: 280713	2016-09-06 16:08:33 +00:00
Sanjay Patel	e341c919b0	fix FileCheck variables for test added with r280677 The script (utils/update_test_checks.py) seems to have problems with variable names that start with the same string. llvm-svn: 280679	2016-09-05 23:49:32 +00:00
Gor Nishanov	ccabaca273	[Coroutines] Part12: Handle alloca address-taken Summary: Move early uses of spilled variables after CoroBegin. For example, if a parameter had address taken, we may end up with the code like: define @f(i32 %n) { %n.addr = alloca i32 store %n, %n.addr ... call @coro.begin This patch fixes the problem by moving uses of spilled variables after CoroBegin. Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24234 llvm-svn: 280678	2016-09-05 23:45:45 +00:00
Sanjay Patel	eea2ef7862	[InstCombine] don't assert that division-by-constant has been folded (PR30281) This is effectively a revert of: https://reviews.llvm.org/rL280115 And this should fix https://llvm.org/bugs/show_bug.cgi?id=30281: llvm-svn: 280677	2016-09-05 23:38:22 +00:00
Sanjay Patel	46f9df5b71	[InstCombine] revert r280637 because it causes test failures on an ARM bot http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll llvm-svn: 280676	2016-09-05 22:36:32 +00:00
Oliver Stannard	ef38d53a7e	[SimplifyCFG] Add test for sinking inline asm in if/else This test code previously caused a failure in the module verifier, because SimplifyCFG created this invalid instruction, which tries to take the address of inline asm: %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r" This has been fixed recently, presumably by James Molloy's patches that re-wrote and changed parts of SimplifyCFG, so this patch just adds a regression test for it. Differential Revision: https://reviews.llvm.org/D24231 llvm-svn: 280660	2016-09-05 13:49:26 +00:00
Gor Nishanov	0e18f75a92	[Coroutines] Part11: Add final suspend handling. Summary: A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties: * it is possible to check whether a suspended coroutine is at the final suspend point via coro.done intrinsic; * a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via coro.destroy intrinsic. This patch adds final suspend handling logic to CoroEarly and CoroSplit passes. Now, the final suspend point example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex5.ll). Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24068 llvm-svn: 280646	2016-09-05 04:44:30 +00:00
Sanjay Patel	c641e9d6ff	[InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors The code to calculate 'UsesRemoved' could be simplified. As-is, that code is a victim of PR30273: https://llvm.org/bugs/show_bug.cgi?id=30273 llvm-svn: 280637	2016-09-04 20:58:27 +00:00
Dorit Nuzman	abd15f69b2	[InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing memcpy with ld/st. When InstCombine replaces a memcpy with loads+stores it does not copy over the llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes that. Differential Revision: https://reviews.llvm.org/D23499 llvm-svn: 280617	2016-09-04 07:49:39 +00:00
Joseph Tremoulet	e92e0a9042	Fix inliner funclet unwind memoization Summary: The inliner may need to determine where a given funclet unwinds to, and this determination may depend on other funclets throughout the funclet tree. The code that performs this walk in getUnwindDestToken memoizes results to avoid redundant computations. In the case that a funclet's unwind destination is derived from its ancestor, there's code to walk back down the tree from the ancestor updating the memo map of its descendants to record the unwind destination. This change fixes that code to account for the case that some descendant has a different unwind destination, which can happen if that unwind dest is a descendant of the EHPad being queried and thus didn't determine its unwind destination. Also update test inline-funclets.ll, which is supposed to cover such scenarios, to include a case that fails an assertion without this fix but passes with it. Fixes PR29151. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24117 llvm-svn: 280610	2016-09-04 01:23:20 +00:00
Matt Arsenault	46a0382ab2	AMDGPU: Do basic folding of class intrinsic This allows more of the OCML builtin library to be constant folded. llvm-svn: 280586	2016-09-03 07:06:58 +00:00
Wei Mi	c37307a5f4	Fix buildbot error. Add -mtriple=x86_64-unknown-linux-gnu for the test and move it to CodeGen/X86. llvm-svn: 280568	2016-09-03 01:43:28 +00:00
Xinliang David Li	40afd5c9e4	[Profile] handle select instruction in 'expect' lowering Builtin expect lowering currently ignores select. This patch fixes the issue Differential Revision: http://reviews.llvm.org/D24166 llvm-svn: 280547	2016-09-02 22:03:40 +00:00
Sanjay Patel	70277411d3	[InstCombine] auto-generate assertions for tighter checking llvm-svn: 280531	2016-09-02 19:38:37 +00:00
Wei Mi	c54d1298f5	Split the store of a wide value merged from an int-fp pair into multiple stores. For the store of a wide value merged from a pair of values, especially int-fp pair, sometimes it is more efficent to split it into separate narrow stores, which can remove the bitwise instructions or sink them to colder places. Now the feature is only enabled on x86 target, and only store of int-fp pair is splitted. It is possible that the application scope gets extended with perf evidence support in the future. Differential Revision: https://reviews.llvm.org/D22840 llvm-svn: 280505	2016-09-02 17:17:04 +00:00
Sanjay Patel	521f19f249	[InsttCombine] fold insertelement of constant into shuffle with constant operand (PR29126) The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector. Note that the transform is intentionally limited to shuffles that are equivalent to vector selects to avoid creating arbitrary shuffle masks that may not lower well. This should solve PR29126: https://llvm.org/bugs/show_bug.cgi?id=29126 Differential Revision: https://reviews.llvm.org/D23886 llvm-svn: 280504	2016-09-02 17:05:43 +00:00
Matthew Simpson	b65c230eab	[LV] Ensure reverse interleaved group GEPs remain uniform For uniform instructions, we're only required to generate a scalar value for the first vector lane of each unroll iteration. Thus, if we have a reverse interleaved group, computing the member index off the scalar GEP corresponding to the last vector lane of its pointer operand technically makes the GEP non-uniform. We should compute the member index off the first scalar GEP instead. I've added the updated member index computation to the existing reverse interleaved group test. llvm-svn: 280497	2016-09-02 16:19:22 +00:00
Andrea Di Biagio	805815f407	[instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN. This patch fixes a crash caused by an incorrect folding of an ordered comparison between a packed floating point vector and a splat vector of NaN. An ordered comparison between a vector and a constant vector of NaN, should always be folded into a constant vector where each element is i1 false. Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a scalar 'false'. Later on, this would cause an assertion failure, since the value type of the folded value doesn't match the expected value type of the uses of the original instruction: "Assertion failed: New->getType() == getType() && "replaceAllUses of value with new value of different type!". This patch fixes the issue and adds a test case to the already existing test InstSimplify/floating-point-compares.ll. Differential Revision: https://reviews.llvm.org/D24143 llvm-svn: 280488	2016-09-02 14:47:43 +00:00
Alexey Bataev	ed27d69037	[InstCombine] Add test for insertelementinsts with constants. Added a tests that shows that several insertelementinsts with constant indexes/data are not folded into a single shuffleinst. llvm-svn: 280474	2016-09-02 09:00:53 +00:00
James Molloy	f3cf2a494b	[SimplifyCFG] Add a workaround to fix PR30188 We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions. The real fix is in SROA, which I'll be looking into. llvm-svn: 280470	2016-09-02 07:29:00 +00:00
NAKAMURA Takumi	9aec5c3797	llvm/test/Transforms/GCOVProfiling/three-element-mdnode.ll: Use %/T instead of %T, not to emit backslashes. llvm-svn: 280451	2016-09-02 01:33:00 +00:00
Sanjay Patel	199a8e6a92	[InstCombine] add tests to show potential shuffle+insert folds llvm-svn: 280403	2016-09-01 19:14:19 +00:00
Sanjay Patel	dd861964d1	[InstCombine] remove fold of an icmp pattern that should never happen While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger this code path. The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic() or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast(). Differential Revision: https://reviews.llvm.org/D24031 llvm-svn: 280370	2016-09-01 14:20:43 +00:00
James Molloy	88cad7e5cf	[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted. As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider: if (a) x(1); else if (b) x(2); This produces the following CFG: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ \| / [ end ] [end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch. We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects). We are now able to detect this case and split off the unconditional arcs to a common successor: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ / \| [sink.split] \| \ / [ end ] Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases. llvm-svn: 280364	2016-09-01 12:58:13 +00:00
James Molloy	eec6df3193	[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything. This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink. This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example: %a = load i32* %b %d = load i32* %b %c = gep i32* %a, i32 0 %e = gep i32* %d, i32 1 Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor). This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough. Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm. In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging. In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive. This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans. llvm-svn: 280351	2016-09-01 10:44:35 +00:00
Hal Finkel	40d7f5c277	Add a counter-function insertion pass As discussed in https://reviews.llvm.org/D22666, our current mechanism to support -pg profiling, where we insert calls to mcount(), or some similar function, is fundamentally broken. We insert these calls in the frontend, which means they get duplicated when inlining, and so the accumulated execution counts for the inlined-into functions are wrong. Because we don't want the presence of these functions to affect optimizaton, they should be inserted in the backend. Here's a pass which would do just that. The knowledge of the name of the counting function lives in the frontend, so we're passing it here as a function attribute. Clang will be updated to use this mechanism. Differential Revision: https://reviews.llvm.org/D22825 llvm-svn: 280347	2016-09-01 09:42:39 +00:00
Nick Lewycky	97e49ac59e	Add -fprofile-dir= to clang. -fprofile-dir=path allows the user to specify where .gcda files should be emitted when the program is run. In particular, this is the first flag that causes the .gcno and .o files to have different paths, LLVM is extended to support this. -fprofile-dir= does not change the file name in the .gcno (and thus where lcov looks for the source) but it does change the name in the .gcda (and thus where the runtime library writes the .gcda file). It's different from a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX. To implement this we split -coverage-file into -coverage-data-file and -coverage-notes-file to specify the two different names. The !llvm.gcov metadata node grows from a 2-element form {string coverage-file, node dbg.cu} to 3-elements, {string coverage-notes-file, string coverage-data-file, node dbg.cu}. In the 3-element form, the file name is already "mangled" with .gcno/.gcda suffixes, while the 2-element form left that to the middle end pass. llvm-svn: 280306	2016-08-31 23:04:32 +00:00
Sanjay Patel	0d70831d73	[InstCombine] allow icmp (shr exact X, C2), C fold for splat constant vectors The enhancement to foldICmpDivConstant ( http://llvm.org/viewvc/llvm-project?view=revision&revision=280299 ) allows us to remove the ConstantInt check; no other changes needed. llvm-svn: 280300	2016-08-31 22:18:43 +00:00
Sanjay Patel	541aef4661	[InstCombine] allow icmp (div X, Y), C folds for splat constant vectors Converting all of the overflow ops to APInt looked risky, so I've left that as a TODO. llvm-svn: 280299	2016-08-31 21:57:21 +00:00
Geoff Berry	8d84605f25	[EarlyCSE] Optionally use MemorySSA. NFC. Summary: Use MemorySSA, if requested, to do less conservative memory dependency checking. This change doesn't enable the MemorySSA enhanced EarlyCSE in the default pipelines, so should be NFC. Reviewers: dberlin, sanjoy, reames, majnemer Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19821 llvm-svn: 280279	2016-08-31 19:24:10 +00:00
Geoff Berry	64f5ed172a	[EarlyCSE] Allow forwarding a non-invariant load into an invariant load. Reviewers: sanjoy Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23935 llvm-svn: 280265	2016-08-31 17:45:31 +00:00
Philip Reames	2b1084ac93	[statepoints][experimental] Add support for live-in semantics of values in deopt bundles This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics so I've chosen to start there. For those curious, my use cases is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes. The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate. Today, we only fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.) My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any known miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default. Differential Revision: https://reviews.llvm.org/D24000 llvm-svn: 280250	2016-08-31 15:12:17 +00:00
James Molloy	cacfc16109	Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd" This reverts commit r280216 - it caused buildbot failures. llvm-svn: 280234	2016-08-31 13:16:52 +00:00
James Molloy	76c9d423a7	Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches" This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain. llvm-svn: 280233	2016-08-31 13:16:45 +00:00
James Molloy	06a45483a1	Revert "[SimplifyCFG] Add a workaround to fix PR30188" This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain. llvm-svn: 280232	2016-08-31 13:16:36 +00:00
James Molloy	8a66a39cbf	Revert "[SimplifyCFG] Fix bootstrap failure after r280220" This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence. llvm-svn: 280231	2016-08-31 13:16:30 +00:00
James Molloy	b7efa6c227	[SimplifyCFG] Fix bootstrap failure after r280220 We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash. llvm-svn: 280228	2016-08-31 12:33:48 +00:00
James Molloy	171fdac7ce	[SimplifyCFG] Add a workaround to fix PR30188 We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions. The real fix is in SROA, which I'll be looking into. llvm-svn: 280219	2016-08-31 10:46:45 +00:00
James Molloy	c53b40b509	[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted. As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider: if (a) x(1); else if (b) x(2); This produces the following CFG: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ \| / [ end ] [end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch. We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects). We are now able to detect this case and split off the unconditional arcs to a common successor: [if] / \ [x(1)] [if] \| \| \ \| \| \ \| [x(2)] \| \ / \| [sink.split] \| \ / [ end ] Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases. llvm-svn: 280217	2016-08-31 10:46:33 +00:00
James Molloy	55bd04cd20	[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything. This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink. This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example: %a = load i32* %b %d = load i32* %b %c = gep i32* %a, i32 0 %e = gep i32* %d, i32 1 Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor). This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough. Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm. In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging. In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive. This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans. llvm-svn: 280216	2016-08-31 10:46:23 +00:00
James Molloy	923e98c232	[SimplifyCFG] Tail-merge calls with sideeffects This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour at least vaguely consistent with the previous version and keep it as close to NFC as I could. There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup along the way to ensure we don't create indirect calls. Should fix PR28964. llvm-svn: 280215	2016-08-31 10:46:16 +00:00
Gor Nishanov	50d7fb974f	[Coroutines] Part 10: Add coroutine promise support. Summary: 1) CoroEarly now lowers llvm.coro.promise intrinsic that allows to obtain a coroutine promise pointer from a coroutine frame and vice versa. 2) CoroFrame now interprets Promise argument of llvm.coro.begin to place CoroutinPromise alloca at a deterministic offset from the coroutine frame. Now, the coroutine promise example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex4.ll). Reviewers: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23993 llvm-svn: 280184	2016-08-31 00:35:41 +00:00
Alina Sbirlea	3f8f7840bf	[LoadStoreVectorizer] Change VectorSet to Vector to match head and tail positions. Resolves PR29148. Summary: LSV was using two vector sets (heads and tails) to track pairs of adjiacent position to vectorize. A recent optimization is trying to obtain the longest chain to vectorize and assumes the positions in heads(H) and tails(T) match, which is not the case is there are multiple tails for the same head. e.g.: i1: store a[0] i2: store a[1] i3: store a[1] Leads to: H: i1 T: i2 i3 Instead of: H: i1 i1 T: i2 i3 So the positions for instructions that follow i3 will have different indexes in H/T. This patch resolves PR29148. This issue also surfaced the fact that if the chain is too long, and TLI returns a "not-fast" answer, the whole chain will be abandoned for vectorization, even though a smaller one would be beneficial. Added a testcase and FIXME for this. Reviewers: tstellarAMD, arsenm, jlebar Subscribers: mzolotukhin, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24057 llvm-svn: 280179	2016-08-30 23:53:59 +00:00
Sanjay Patel	ddb53dd080	[InstCombine] add tests to show type limitations of InsertRangeTest and callers llvm-svn: 280175	2016-08-30 23:16:59 +00:00
Michael Kuperstein	2954d1db77	[LoopVectorizer] Predicate instructions in blocks with several incoming edges We don't need to limit predication to blocks that have a single incoming edge, we just need to use the right mask. This fixes PR30172. Differential Revision: https://reviews.llvm.org/D24009 llvm-svn: 280148	2016-08-30 20:22:21 +00:00
Daniel Berlin	c943d72d94	IntrArgMemOnly is only defined (and current AA machinery only sanely supports) pointer arguments, and these intrinsics have vector of pointer arguments. Remove ArgMemOnly until we either have the machinery, define a new attribute, or something similar llvm-svn: 280143	2016-08-30 19:58:48 +00:00
Sanjay Patel	b37145712e	[InstCombine] replace divide-by-constant checks with asserts; NFC These folds already have tests for scalar and vector types, except for the vector div-by-0 case, so I'm adding tests for that. llvm-svn: 280115	2016-08-30 17:31:34 +00:00
James Molloy	d13b1239e4	[SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles. llvm-svn: 280072	2016-08-30 10:56:08 +00:00
Teresa Johnson	8c1bc986b5	[ThinLTO] Indirect call promotion fixes for promoted local functions Summary: Fix a couple issues limiting the application of indirect call promotion in ThinLTO mode: - Invoke indirect call promotion before globalopt, since it may eliminate imported functions which appear unreferenced. - Invoke indirect call promotion with InLTO=true so that the PGOFuncName metadata is used to get the name for locals which would have been renamed during promotion. Reviewers: davidxl, mehdi_amini Subscribers: Prazek, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D24004 llvm-svn: 280024	2016-08-29 22:46:56 +00:00
Matthew Simpson	df19502b16	[LV] Move insertelement sequence after scalar definitions After r279649 when getting a vector value from VectorLoopValueMap, we create an insertelement sequence on-demand if the value has been scalarized instead of vectorized. We previously inserted this insertelement sequence before the value's first vector user. However, this insert location is problematic if that user is the phi node of a first-order recurrence. With this patch, we move the insertelement sequence after the last scalar instruction we created when scalarizing the value. Thus, the value's vector definition in the new loop will immediately follow its scalar definitions. This should fix PR30183. Reference: https://llvm.org/bugs/show_bug.cgi?id=30183 llvm-svn: 280001	2016-08-29 20:14:04 +00:00
David Majnemer	e8fd5f9ffd	[SimplifyCFG] Hoisting invalidates metadata We forgot to remove optimization metadata when performing hosting during FoldTwoEntryPHINode. This fixes PR29163. llvm-svn: 279980	2016-08-29 17:14:08 +00:00
Anna Thomas	2bc129c5fd	[StatepointsForGC] Rematerialize in the presence of PHIs Summary: While walking the use chain for identifying rematerializable values in RS4GC, add the case where the current value and base value are the same PHI nodes. This will aid rematerialization of geps and casts instead of relocating. Reviewers: sanjoy, reames, igor Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23920 llvm-svn: 279975	2016-08-29 15:41:59 +00:00
Sanjay Patel	25475bcc0c	[Constant] remove fdiv and frem from canTrap() Assuming the default FP env, we should not treat fdiv and frem any differently in terms of trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env. This matches how we treat the fdiv/frem in IR with isSafeToSpeculativelyExecute() and in the backend after: https://reviews.llvm.org/rL279970 llvm-svn: 279973	2016-08-29 15:27:17 +00:00
Sanjay Patel	20f02b271b	[SimplifyCFG] rename test file, regenerate checks, and add test The fdiv test shows a problem similar to: https://reviews.llvm.org/rL279970 llvm-svn: 279972	2016-08-29 14:57:53 +00:00
Gor Nishanov	dce9b02677	[Coroutines] Part 9: Add cleanup subfunction. Summary: [Coroutines] Part 9: Add cleanup subfunction. This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll) Intrinsic Changes: * coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine. * coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined. CoroSplit now creates three subfunctions: # f$resume - resume logic # f$destroy - cleanup logic, followed by a deallocation code # f$cleanup - just the cleanup code CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not. Other fixes, improvements: * Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point. * Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>) Reviewers: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23844 llvm-svn: 279971	2016-08-29 14:34:12 +00:00
Sanjay Patel	5c5311f4e5	[InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors llvm-svn: 279937	2016-08-28 18:18:00 +00:00
Elena Demikhovsky	3622fbfc68	[Loop Vectorizer] Fixed memory confilict checks. Fixed a bug in run-time checks for possible memory conflicts inside loop. The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size". Differential revision: https://reviews.llvm.org/D23176 llvm-svn: 279930	2016-08-28 08:53:53 +00:00
Adam Nemet	cef3314156	[Inliner] Report when inlining fails because callee's def is unavailable Summary: This is obviously an interesting case because it may motivate code restructuring or LTO. Reporting this requires instantiation of ORE in the loop where the call sites are first gathered. I've checked compile-time overhead with -Rpass-with-hotness and the worst slow-down was 6% in mcf and quickly tailing off. As before without -Rpass-with-hotness there is no overhead. Because this could be a pretty noisy diagnostics, it is currently qualified as 'verbose'. As of this patch, 'verbose' diagnostics are only emitted with -Rpass-with-hotness, i.e. when the output is expected to be filtered. Reviewers: eraman, chandlerc, davidxl, hfinkel Subscribers: tejohnson, Prazek, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D23415 llvm-svn: 279860	2016-08-26 20:21:05 +00:00
Tim Shen	3ad8b43cc2	[MemCpy] Add comments for r279769 Differential Revision: https://reviews.llvm.org/D23846 llvm-svn: 279778	2016-08-25 21:03:46 +00:00
Tim Shen	a3dbead2d6	[MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands Summary: This fixes pr29105. The reason is that lifetime marks creates new aliasing pointers the original ones, but before this patch aliases were not checked in performMemCpyToMemSetOptzn. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23846 llvm-svn: 279769	2016-08-25 19:27:26 +00:00
Sebastian Pop	5f0d0e60d1	GVN-hoist: fix hoistingFromAllPaths for loops (PR29034) It is invalid to hoist stores or loads if they are not executed on all paths from the hoisting point to the exit of the function. In the testcase, there are paths in the loop that do not execute the stores or the loads, and so hoisting them within the loop is unsafe. The problem is that the current implementation of hoistingFromAllPaths is incomplete: it walks all blocks dominated by the hoisting point, and does not return false when the loop contains a path on which the hoisted ld/st is not executed. Differential Revision: https://reviews.llvm.org/D23843 llvm-svn: 279732	2016-08-25 11:55:47 +00:00
Xinliang David Li	cad3a995a4	[Profile] Propagate branch metadata properly in instcombine Differential Revision: http://reviews.llvm.org/D23590 llvm-svn: 279693	2016-08-25 00:26:32 +00:00
Sanjay Patel	d398d4a39e	[InstCombine] use m_APInt to allow icmp eq/ne (shr X, C2), C folds for splat constant vectors llvm-svn: 279677	2016-08-24 22:22:06 +00:00
Matthew Simpson	abd2be1e2e	[LV] Unify vector and scalar maps This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop are maintained in VectorLoopValueMap. The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions. If a scalarized value is used by a scalar instruction, the scalar value is used directly. However, if the scalarized value is needed by a vector instruction, we generate the needed insertelement instructions on-demand. A common idiom in several locations in the code (including the scalarization code), is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose along side getVectorValue as an interface into VectorLoopValueMap. These functions work together to return the requested values if they're available or to produce them if they're not. The mapping has also be made less permissive. Entries can be added to VectorLoopValue map with the new initVector and initScalar functions. getVectorValue has been modified to return a constant reference to the mapped entries. There's no real functional change with this patch; however, in some cases we will generate slightly different code. For example, instead of an insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes. Differential Revision: https://reviews.llvm.org/D23169 llvm-svn: 279649	2016-08-24 18:23:17 +00:00
Sanjoy Das	ff855b6020	[SCCP] Don't delete side-effecting instructions I'm not sure if the `!isa<CallInst>(Inst) && !isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the case we know is broken. llvm-svn: 279647	2016-08-24 18:10:21 +00:00
Gil Rapaport	550148b2f6	[Loop Vectorizer] Support predication of div/rem div/rem instructions in basic blocks that require predication currently prevent vectorization. This patch extends the existing mechanism for predicating stores to handle other instructions and leverages it to predicate divs and rems. Differential Revision: https://reviews.llvm.org/D22918 llvm-svn: 279620	2016-08-24 11:37:57 +00:00
Gor Nishanov	241b041fba	[Coroutines] Part 8: Coroutine Frame Building algorithm Summary: This patch adds coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (first examples from docs\Coroutines.rst can be compiled). Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) ... 7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461) 8. Coroutine Frame Building algorithm <= we are here 9. Add f.cleanup subfunction. 10+. The rest of the logic Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23586 llvm-svn: 279609	2016-08-24 04:44:35 +00:00
Matthew Simpson	df2ab917ad	[SLP] Avoid signed integer overflow The test case included with r279125 exposed an existing signed integer overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together with other costs, such as getReductionCost. This patch removes the possibility of assigning a cost of INT_MAX. Since we were previously using INT_MAX as an indicator for "should not vectorize", we now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable" before computing a cost. This patch adds a run-line to the test case used for r279125 that ensures we don't vectorize. Previously, this line would vectorize the test case by chance due to undefined behavior in the cost calculation. Differential Revision: https://reviews.llvm.org/D23723 llvm-svn: 279562	2016-08-23 20:48:50 +00:00
Sanjay Patel	6946e2ade3	[InstSimplify] allow icmp with constant folds for splat vectors, part 2 Completes the m_APInt changes for simplifyICmpWithConstant(). Other commits in this series: https://reviews.llvm.org/rL279492 https://reviews.llvm.org/rL279530 https://reviews.llvm.org/rL279534 https://reviews.llvm.org/rL279538 llvm-svn: 279543	2016-08-23 18:00:51 +00:00
Sanjay Patel	200e3cbfb0	[InstSimplify] allow icmp with constant folds for splat vectors, part 1 llvm-svn: 279538	2016-08-23 17:30:56 +00:00
Sanjay Patel	ada2bb3d5d	[InstSimplify] add tests to show missing vector icmp folds llvm-svn: 279534	2016-08-23 17:13:38 +00:00
Sanjay Patel	5c269d0b7a	[InstSimplify] move icmp with constant tests to another file; NFC ...because like the corresponding code, this is just too big to keep adding to. And the next step is to add a vector version of each of these tests to show missed folds. Also, auto-generate CHECK lines and add comments for the tests that correspond to the source code. llvm-svn: 279530	2016-08-23 16:46:53 +00:00
Sanjay Patel	a392049419	[InstCombine] use m_APInt to allow icmp (shr exact X, Y), 0 folds for splat constant vectors llvm-svn: 279472	2016-08-22 20:45:06 +00:00
James Molloy	5bf2114265	[SimplifyCFG] Rewrite SinkThenElseCodeToEnd [Recommitting now an unrelated assertion in SROA is sorted out] The new version has several advantages: 1) IMSHO it's more readable and neater 2) It handles loads and stores properly 3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch. With this change we can now finally sink load-modify-store idioms such as: if (a) return b += 3; else return b += 4; => %z = load i32, i32* %y %.sink = select i1 %a, i32 5, i32 7 %b = add i32 %z, %.sink store i32 %b, i32* %y ret i32 %b When this works for switches it'll be even more powerful. Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables. This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup. llvm-svn: 279460	2016-08-22 19:07:15 +00:00
Jun Bum Lim	ec8b8cc595	[InstCombine] Allow sinking from unique predecessor with multiple edges Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination. Reviewers: mcrosier, majnemer, spatel, reames Subscribers: reames, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23722 llvm-svn: 279448	2016-08-22 18:21:56 +00:00
James Molloy	475f4a763f	Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd" This reverts commit r279443. It caused buildbot failures. llvm-svn: 279447	2016-08-22 18:13:12 +00:00
James Molloy	353052698a	[SimplifyCFG] Rewrite SinkThenElseCodeToEnd The new version has several advantages: 1) IMSHO it's more readable and neater 2) It handles loads and stores properly 3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch. With this change we can now finally sink load-modify-store idioms such as: if (a) return b += 3; else return b += 4; => %z = load i32, i32* %y %.sink = select i1 %a, i32 5, i32 7 %b = add i32 %z, %.sink store i32 %b, i32* %y ret i32 %b When this works for switches it'll be even more powerful. Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables. This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup. llvm-svn: 279443	2016-08-22 17:40:23 +00:00
Artur Pilipenko	bc76ecada0	Revert -r278267 [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers This change cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks. See https://reviews.llvm.org/D18777 for details. llvm-svn: 279433	2016-08-22 13:14:07 +00:00
Artur Pilipenko	b78ad9d41f	Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative This change needs to be reverted in order to revert -r278267 which cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks. See comments on https://reviews.llvm.org/D18777 for details. llvm-svn: 279432	2016-08-22 13:12:07 +00:00
Balaram Makam	a927aa4ad0	[PM] Port LoopDataPrefetch AArch64 tests to new pass manager Reviewers: mcrosier, tejohnson Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23724 llvm-svn: 279431	2016-08-22 12:59:58 +00:00
Sanjay Patel	643d21a62c	[InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 4 This concludes the fixes for icmp+shl in this series: https://reviews.llvm.org/rL279339 https://reviews.llvm.org/rL279398 https://reviews.llvm.org/rL279399 llvm-svn: 279401	2016-08-21 17:10:07 +00:00
Sanjay Patel	163a5ab799	remove FIXME comment; fixed by previous commit llvm-svn: 279400	2016-08-21 16:40:42 +00:00
Sanjay Patel	7ffcde7422	[InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 3 This is a partial enablement (move the ConstantInt guard down). llvm-svn: 279399	2016-08-21 16:35:34 +00:00
Sanjay Patel	7e09f13fed	[InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 2 This is a partial enablement (move the ConstantInt guard down). llvm-svn: 279398	2016-08-21 16:28:22 +00:00
Matthew Simpson	235e479984	Reapply "[SLP] Initialize VectorizedValue when gathering" The test case included in r279125 exposed existing undefined behavior in the SLP vectorizer that it did not introduce. This patch reapplies the original patch, but modifies the test case to avoid hitting the undefined behavior. This allows us to close PR28330 while keeping the UBSan bot happy. The undefined behavior the original test uncovered will be addressed in a follow-on patch. Reference: https://llvm.org/bugs/show_bug.cgi?id=28330 llvm-svn: 279370	2016-08-20 14:49:02 +00:00
Vitaly Buka	cc7db13bf0	Revert "[SLP] Initialize VectorizedValue when gathering" to fix ubsan bot. This reverts commit r279125. https://reviews.llvm.org/D23410 llvm-svn: 279363	2016-08-20 07:09:39 +00:00
Xinliang David Li	3bf2d58a21	[Profile] add test with large counts llvm-svn: 279361	2016-08-20 05:28:42 +00:00
Sanjay Patel	fa7de606c4	[InstCombine] use m_APInt to allow icmp (shl X, Y), C folds for splat constant vectors, part 1 This is a partial enablement (move the ConstantInt guard down) because there are many different folds here and one of the later ones will require reworking 'isSignBitCheck'. llvm-svn: 279339	2016-08-19 22:33:26 +00:00
Reid Kleckner	98a48afa5d	Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd" This reverts commit r279229. It breaks intrinsic function calls in diamonds. llvm-svn: 279313	2016-08-19 20:22:39 +00:00
Reid Kleckner	a871d3872a	Fix regression in InstCombine introduced by r278944 The intended transform is: // Simplify icmp eq (or (ptrtoint P), (ptrtoint Q)), 0 // -> and (icmp eq P, null), (icmp eq Q, null). P and Q are both pointer types, but may have different types. We need two calls to getNullValue() to make the icmps. llvm-svn: 279271	2016-08-19 16:53:18 +00:00
David Majnemer	5554edabef	[CloneFunction] Don't remove unrelated nodes from the CGSSC CGSCC use a WeakVH to track call sites. RAUW a call within a function can result in that WeakVH getting confused about whether or not the call site is still around. llvm-svn: 279268	2016-08-19 16:37:40 +00:00
Sanjay Patel	a867afe094	[InstCombine] use m_APInt to allow icmp (shl 1, Y), C folds for splat constant vectors llvm-svn: 279266	2016-08-19 16:12:16 +00:00
Sanjay Patel	57b12d3876	[InstCombine] use m_APInt to allow icmp X, C folds for splat constant vectors Of course, we really need to refactor and fix all of the cmp predicates, but this one is interesting because without it, we later perform an information-losing transform of icmp (shl 1, Y), C, and we can't recover the better fold. llvm-svn: 279263	2016-08-19 15:40:44 +00:00
Sanjay Patel	78111a7617	[InstCombine] add tests for missing vector icmp folds llvm-svn: 279259	2016-08-19 15:27:28 +00:00
Sanjay Patel	14cdf1968f	[InstCombine] add missing tests for basic icmp folds These are implicitly included as part of larger test cases, but they don't exist stand-alone (and don't happen for vectors...). llvm-svn: 279257	2016-08-19 15:21:45 +00:00
James Molloy	11a1936b70	[SimplifyCFG] Rewrite SinkThenElseCodeToEnd The new version has several advantages: 1) IMSHO it's more readable and neater 2) It handles loads and stores properly 3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch. With this change we can now finally sink load-modify-store idioms such as: if (a) return b += 3; else return b += 4; => %z = load i32, i32* %y %.sink = select i1 %a, i32 5, i32 7 %b = add i32 %z, %.sink store i32 %b, i32* %y ret i32 %b When this works for switches it'll be even more powerful. llvm-svn: 279229	2016-08-19 10:10:27 +00:00
Amaury Sechet	763c59dc9a	Make cltz and cttz zero undef when the operand cannot be zero in InstCombine Summary: Also add popcount(n) == bitsize(n) -> n == -1 transformation. Reviewers: majnemer, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23134 llvm-svn: 279141	2016-08-18 20:43:50 +00:00
Sanjay Patel	40e8ca46ad	[InstCombine] use m_APInt to allow icmp (trunc X, Y), C folds for splat constant vectors This is a sibling of: https://reviews.llvm.org/rL278859 https://reviews.llvm.org/rL278935 https://reviews.llvm.org/rL278945 https://reviews.llvm.org/rL279066 https://reviews.llvm.org/rL279077 https://reviews.llvm.org/rL279101 llvm-svn: 279133	2016-08-18 20:28:54 +00:00
Matthew Simpson	11db6b6b8c	[SLP] Initialize VectorizedValue when gathering We abort building vectorizable trees in some cases (e.g., if the maximum recursion depth is reached, if the region size is too large, etc.). If this happens for a reduction, we can be left with a root entry that needs to be gathered. For these cases, we need make sure we actually set VectorizedValue to the resulting vector. This patch ensures we properly set VectorizedValue, and it also ensures the insertelement sequence generated for the gathers is inserted at the correct location. Reference: https://llvm.org/bugs/show_bug.cgi?id=28330 Differential Revison: https://reviews.llvm.org/D23410 llvm-svn: 279125	2016-08-18 19:50:32 +00:00
Sanjay Patel	fa5ca2bf46	[InstCombine] use m_APInt to allow icmp (udiv X, Y), C folds for splat constant vectors This is a sibling of: https://reviews.llvm.org/rL278859 https://reviews.llvm.org/rL278935 https://reviews.llvm.org/rL278945 https://reviews.llvm.org/rL279066 https://reviews.llvm.org/rL279077 llvm-svn: 279101	2016-08-18 17:55:59 +00:00
Artur Pilipenko	615b820af6	CVP. Turn marking adds as no wrap (introduced by r278107) off by default It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression. llvm-svn: 279082	2016-08-18 16:08:35 +00:00
Sanjay Patel	6347807f87	[InstCombine] use m_APInt to allow icmp (mul X, Y), C folds for splat constant vectors This is a sibling of: https://reviews.llvm.org/rL278859 https://reviews.llvm.org/rL278935 https://reviews.llvm.org/rL278945 https://reviews.llvm.org/rL279066 llvm-svn: 279077	2016-08-18 15:44:44 +00:00
Sanjay Patel	4c5e60d95c	[InstCombine] use m_APInt to allow icmp (xor X, Y), C folds for splat constant vectors This is a sibling of: https://reviews.llvm.org/rL278859 https://reviews.llvm.org/rL278935 https://reviews.llvm.org/rL278945 llvm-svn: 279066	2016-08-18 14:10:48 +00:00
Chad Rosier	83f6bbc154	[Reassociate] Add test for PR28367. llvm-svn: 279063	2016-08-18 13:22:37 +00:00
Sanjay Patel	3c92db7560	[InstCombine] add test for missing vector icmp fold Also, add a scalar test to demonstrate one of the intermediate folds that is necessary to accomplish the existing, multi-step test. And simplify the vector tests to only check the final piece of that multi-step transform. llvm-svn: 278995	2016-08-17 22:18:57 +00:00
Sanjay Patel	84ff18ba92	[InstCombine] minimize tests and autogenerate checks llvm-svn: 278960	2016-08-17 19:56:10 +00:00
Sanjay Patel	63e14a07e8	[InstCombine] use m_APInt to allow icmp (or X, Y), C folds for splat constant vectors This is a sibling of: https://reviews.llvm.org/rL278859 https://reviews.llvm.org/rL278935 llvm-svn: 278945	2016-08-17 16:38:57 +00:00
Sanjay Patel	f636d762ed	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278943	2016-08-17 16:23:15 +00:00
Chad Rosier	ea7e4647db	Revert "Reassociate: Reprocess RedoInsts after each inst". This reverts commit r258830, which introduced a bug described in PR28367. PR28367 llvm-svn: 278938	2016-08-17 15:54:39 +00:00
Sanjay Patel	4f7eb2aa95	[InstCombine] use m_APInt to allow icmp (add X, Y), C folds for splat constant vectors This is a sibling of: https://reviews.llvm.org/rL278859 llvm-svn: 278935	2016-08-17 15:24:30 +00:00
Chad Rosier	a6822f64f3	Revert "[Reassociate] Avoid iterator invalidation when negating value." This reverts commit r278928 due to lit test failures. llvm-svn: 278929	2016-08-17 14:31:34 +00:00
Chad Rosier	cf3e8121a6	[Reassociate] Avoid iterator invalidation when negating value. Differential Revision: https://reviews.llvm.org/D23464 PR28367 llvm-svn: 278928	2016-08-17 14:16:45 +00:00
Chandler Carruth	67fc52f067	[PM] Port the always inliner to the new pass manager in a much more minimal and boring form than the old pass manager's version. This pass does the very minimal amount of work necessary to inline functions declared as always-inline. It doesn't support a wide array of things that the legacy pass manager did support, but is alse ... about 20 lines of code. So it has that going for it. Notably things this doesn't support: - Array alloca merging - To support the above, bottom-up inlining with careful history tracking and call graph updates - DCE of the functions that become dead after this inlining. - Inlining through call instructions with the always_inline attribute. Instead, it focuses on inlining functions with that attribute. The first I've omitted because I'm hoping to just turn it off for the primary pass manager. If that doesn't pan out, I can add it here but it will be reasonably expensive to do so. The second should really be handled by running global-dce after the inliner. I don't want to re-implement the non-trivial logic necessary to do comdat-correct DCE of functions. This means the -O0 pipeline will have to be at least 'always-inline,global-dce', but that seems reasonable to me. If others are seriously worried about this I'd like to hear about it and understand why. Again, this is all solveable by factoring that logic into a utility and calling it here, but I'd like to wait to do that until there is a clear reason why the existing pass-based factoring won't work. The final point is a serious one. I can fairly easily add support for this, but it seems both costly and a confusing construct for the use case of the always inliner running at -O0. This attribute can of course still impact the normal inliner easily (although I find that a questionable re-use of the same attribute). I've started a discussion to sort out what semantics we want here and based on that can figure out if it makes sense ta have this complexity at O0 or not. One other advantage of this design is that it should be quite a bit faster due to checking for whether the function is a viable candidate for inlining exactly once per function instead of doing it for each call site. Anyways, hopefully a reasonable starting point for this pass. Differential Revision: https://reviews.llvm.org/D23299 llvm-svn: 278896	2016-08-17 02:56:20 +00:00
Sanjay Patel	7ad324b396	[InstCombine] add tests for fold with no coverage and missing vector fold llvm-svn: 278867	2016-08-16 23:18:42 +00:00
Sanjay Patel	e47df1ac62	[InstCombine] use m_APInt to allow icmp (sub X, Y), C folds for splat constant vectors llvm-svn: 278859	2016-08-16 21:53:19 +00:00
David Majnemer	00940fb854	Make MDNode::intersect faster than O(n * m) It is pretty easy to get it down to O(nlogn + mlogm). This implementation has the added benefit of automatically deduplicating entries between the two sets. llvm-svn: 278837	2016-08-16 18:48:37 +00:00
David Majnemer	fa0f1e660b	Don't passively concatenate MDNodes I have audited all the callers of concatenate and none require duplicate entries to service concatenation. These duplicates serve no purpose but to needlessly embiggen the IR. N.B. Layering getMostGenericAliasScope on top of concatenate makes it O(nlogn + mlogm) instead of O(n*m). llvm-svn: 278836	2016-08-16 18:48:34 +00:00
Gor Nishanov	74309fa014	[Coroutines] Part 7: Split coroutine into subfunctions Summary: This patch adds simple coroutine splitting logic to CoroSplit pass. Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) ... 7. Split coroutine into subfunctions <= we are here 8. Coroutine Frame Building algorithm 9. Handle coroutine with unwinds 10+. The rest of the logic Reviewers: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23461 llvm-svn: 278830	2016-08-16 18:04:14 +00:00
David Majnemer	5c5df6283a	[InstSimplify] Fold gep (gep V, C), (xor V, -1) to C-1 llvm-svn: 278779	2016-08-16 06:13:46 +00:00
Sanjay Patel	46a68ba618	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278768	2016-08-16 00:48:38 +00:00
Sanjay Patel	f1bf21c56b	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278765	2016-08-16 00:27:12 +00:00
Sanjay Patel	df77a4dbb0	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278757	2016-08-15 22:43:52 +00:00
Sanjay Patel	41520e1712	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278751	2016-08-15 21:47:50 +00:00
Sanjay Patel	638b613101	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278747	2016-08-15 21:37:24 +00:00
Sanjay Patel	55d87a88cc	update tests to use FileCheck and exact checking llvm-svn: 278741	2016-08-15 21:02:25 +00:00
Sanjay Patel	3e9acec2fa	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278737	2016-08-15 20:56:11 +00:00
Sanjay Patel	b37bd6d7b7	[InstCombine] add test for missing vector icmp fold llvm-svn: 278727	2016-08-15 20:02:40 +00:00
Sanjay Patel	7d98be81cc	[InstCombine] add tests for vector icmp folds llvm-svn: 278726	2016-08-15 19:58:21 +00:00
Sanjay Patel	b860859611	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278717	2016-08-15 19:16:33 +00:00
Sanjay Patel	6866b82a05	update test to use FileCheck and autogenerated checks llvm-svn: 278714	2016-08-15 18:56:10 +00:00
Sanjay Patel	2044a8eba9	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278709	2016-08-15 18:45:10 +00:00
Sanjay Patel	aaf34d1bfc	[InstCombine] add test for missing vector icmp fold llvm-svn: 278708	2016-08-15 18:39:54 +00:00
Sanjay Patel	195eb9340a	minimize test llvm-svn: 278707	2016-08-15 18:35:44 +00:00
Sanjay Patel	3f506daf8c	remove unnecessary IR comments about uses llvm-svn: 278705	2016-08-15 18:32:50 +00:00
Sanjay Patel	d391b0d69e	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278704	2016-08-15 18:26:56 +00:00
Sanjay Patel	cbd62a082c	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278689	2016-08-15 17:55:39 +00:00
Sanjay Patel	566b348987	[InstCombine] auto-generate exact checks Note that several of these tests belong in InstSimplify rather than InstCombine because they return existing operands or constants. llvm-svn: 278684	2016-08-15 17:19:07 +00:00
Sanjay Patel	a7b9bb3785	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278683	2016-08-15 17:10:35 +00:00
Reid Kleckner	70a600b8bb	Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd" This reverts commit r278660. It causes downstream assertion failure in InstCombine on shuffle instructions. Comes up in __mm_swizzle_epi32. llvm-svn: 278672	2016-08-15 15:42:31 +00:00
James Molloy	9a3c82f5cf	[SimplifyCFG] Rewrite SinkThenElseCodeToEnd The new version has several advantages: 1) IMSHO it's more readable and neater 2) It handles loads and stores properly 3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch. With this change we can now finally sink load-modify-store idioms such as: if (a) return b += 3; else return b += 4; => %z = load i32, i32* %y %.sink = select i1 %a, i32 5, i32 7 %b = add i32 %z, %.sink store i32 %b, i32* %y ret i32 %b When this works for switches it'll be even more powerful. llvm-svn: 278660	2016-08-15 08:04:56 +00:00
James Molloy	196ad0823e	[LSR] Don't try and create post-inc expressions on non-rotated loops If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs! Motivating testcase: void f(float a, float b, float c, int n) { while (n-- > 0) c++ = a++ + b++; } It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage. llvm-svn: 278658	2016-08-15 07:53:03 +00:00
Sanjay Patel	52fe9ae990	[InstCombine] add test for missing vector icmp fold llvm-svn: 278639	2016-08-14 22:56:46 +00:00
Sanjay Patel	7e57b00274	[InstCombine] add tests for vector icmp folds llvm-svn: 278637	2016-08-14 22:44:10 +00:00
Sanjay Patel	8554f70c07	[InstCombine] add test for potentially missing vector icmp fold llvm-svn: 278636	2016-08-14 22:30:07 +00:00
Sanjay Patel	beebe05af1	[InstCombine] add test for missing vector icmp fold llvm-svn: 278635	2016-08-14 22:29:27 +00:00
Sanjay Patel	ba1f9fbddc	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278634	2016-08-14 22:28:50 +00:00
Sanjay Patel	f6559404d5	[InstCombine] remove unnecessary function attributes from tests llvm-svn: 278633	2016-08-14 21:48:21 +00:00
Sanjay Patel	b44ca3bfa9	[InstCombine] add tests for missing vector icmp folds llvm-svn: 278632	2016-08-14 21:36:22 +00:00
Sanjay Patel	bbb3dffd0a	[InstCombine] add test for missing vector icmp fold llvm-svn: 278631	2016-08-14 21:05:08 +00:00
Sanjay Patel	66a3457a4c	[InstCombine] add test for missing vector icmp fold llvm-svn: 278630	2016-08-14 20:39:42 +00:00
Sanjoy Das	2143447c73	[IRCE] Create llvm::Loop instances for cloned out loops llvm-svn: 278618	2016-08-14 01:04:46 +00:00
Sanjoy Das	7a18a238c6	[IRCE] Don't iterate on loops that were cloned out IRCE has the ability to further version pre-loops and post-loops that it created, but this isn't useful at all. This change teaches IRCE to leave behind some metadata in the loops it creates (by cloning the main loop) so that these new loops are not re-processed by IRCE. Today this bug is hidden by another bug -- IRCE does not update LoopInfo properly so the loop pass manager does not re-invoke IRCE on the loops it split out. However, once the latter is fixed the bug addressed in this change causes IRCE to infinite-loop in some cases (e.g. it splits out a pre-loop, a pre-pre-loop from that, a pre-pre-pre-loop from that and so on). llvm-svn: 278617	2016-08-14 01:04:36 +00:00
Sanjoy Das	1b1272f515	[IRCE] Fix test case; NFC The (negative) test case is supposed to check that IRCE does not muck with range checks it cannot handle, not that it does the right thing in the absence of profiling information. llvm-svn: 278612	2016-08-13 23:36:40 +00:00
Sanjoy Das	2a2f14d7ab	[IRCE] Be resilient in the face of non-simplified loops Loops containing `indirectbr` may not be in simplified form, even after running LoopSimplify. Reject then gracefully, instead of tripping an assert. llvm-svn: 278611	2016-08-13 23:36:35 +00:00
Mehdi Amini	8c629ecf3a	Revert "Revert "Invariant start/end intrinsics overloaded for address space"" This reverts commit 32fc6488e48eafc0ca1bac1bd9cbf0008224d530. llvm-svn: 278609	2016-08-13 23:31:24 +00:00
Mehdi Amini	164ac651da	Revert "Invariant start/end intrinsics overloaded for address space" This reverts commit r276447. llvm-svn: 278608	2016-08-13 23:27:32 +00:00
Teresa Johnson	1eca6bc6a7	[PM] Port LoopDataPrefetch to new pass manager Summary: Refactor the existing support into a LoopDataPrefetch implementation class and a LoopDataPrefetchLegacyPass class that invokes it. Add a new LoopDataPrefetchPass for the new pass manager that utilizes the LoopDataPrefetch implementation class. Reviewers: mehdi_amini Subscribers: sanjoy, mzolotukhin, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23483 llvm-svn: 278591	2016-08-13 04:11:27 +00:00
Sanjoy Das	3502511548	[IndVars] Ignore (s\|z)exts that don't extend the induction variable `IVVisitor::visitCast` used to have the invariant that if the instruction it was passed was a sext or zext instruction, the result of the instruction would be wider than the induction variable. This is no longer true after rL275037, so this change teaches `IndVarSimplify` s implementation of `IVVisitor::visitCast` to work with the relaxed invariant. A corresponding change to SimplifyIndVar to preserve the said invariant after rL275037 would also work, but given how `IVVisitor::visitCast` is spelled (no indication of said invariant), I figured the current fix is cleaner. Fixes PR28935. llvm-svn: 278584	2016-08-13 00:58:31 +00:00
Tim Shen	c9c0d2dcb5	[LoopVectorize] Detect loops in the innermost loop before creating InnerLoopVectorizer InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop body, even if that cycle isn't a natural loop. Fixes PR28541. Differential Revision: https://reviews.llvm.org/D22952 llvm-svn: 278573	2016-08-12 22:47:13 +00:00
Reid Kleckner	6ee00a2602	[Inliner] Don't treat inalloca allocas as static They aren't static, and moving them to the entry block across something else will only result in tears. Root cause of http://crbug.com/636558. llvm-svn: 278571	2016-08-12 22:23:04 +00:00
Michael Kuperstein	31b8399beb	[PM] Port LowerInvoke to the new pass manager llvm-svn: 278531	2016-08-12 17:28:27 +00:00
Dehao Chen	c0a1e432c7	Fine tuning of sample profile propagation algorithm. Summary: The refined propagation algorithm is more accurate and robust. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23224 llvm-svn: 278522	2016-08-12 16:22:12 +00:00
Artur Pilipenko	2e8f82d962	[LVI] Take guards into account Teach LVI to gather control dependant constraints from guards. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D23358 llvm-svn: 278518	2016-08-12 15:52:23 +00:00
Artur Pilipenko	b623088abe	[LVI] Fix potential memory corruption in getValueFromCondition Rewrite Visited[Cond] = getValueFromConditionImpl(..., Visited) statement which can lead to a memory corruption since getValueFromConditionImpl changes Visited map and invalidates the iterators. llvm-svn: 278514	2016-08-12 15:08:15 +00:00
Artur Pilipenko	6669f253d5	[LVI] Take range metadata into account while calculating icmp condition constraints Take range metadata into account for conditions like this: %length = load i32, i32* %length_ptr, !range !{i32 0, i32 2147483647} %cmp = icmp ult i32 %a, %length This is a common pattern for range checks where the length of the array is dynamically loaded. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D23267 llvm-svn: 278496	2016-08-12 10:14:11 +00:00
Artur Pilipenko	635625855f	[LVI] Handle any predicate in comparisons like icmp <pred> (add Val, Offset), ... Currently LVI can only gather value constraints from comparisons like: * icmp <pred> Val, ... * icmp ult (add Val, Offset), ... In fact we can handle any predicate in latter comparisons. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D23357 llvm-svn: 278493	2016-08-12 10:05:11 +00:00
Gor Nishanov	0f303accde	[Coroutines]: Part6b: Add coro.id intrinsic. Summary: 1. Make coroutine representation more robust against optimization that may duplicate instruction by introducing coro.id intrinsics that returns a token that will get fed into coro.alloc and coro.begin. Due to coro.id returning a token, it won't get duplicated and can be used as reliable indicator of coroutine identify when a particular coroutine call gets inlined. 2. Move last three arguments of coro.begin into coro.id as they will be shared if coro.begin will get duplicated. 3. doc + test + code updated to support the new intrinsic. Reviewers: mehdi_amini, majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23412 llvm-svn: 278481	2016-08-12 05:45:49 +00:00
Eli Friedman	a6707f56b5	[DSE] Don't remove stores made live by a call which unwinds. Issue exposed by noalias or more aggressive alias analysis. Fixes http://llvm.org/PR25422. Differential revision: https://reviews.llvm.org/D21007 llvm-svn: 278451	2016-08-12 01:09:53 +00:00
Piotr Padlewski	332b3b2210	Don't import variadic functions Summary: This patch adds IsVariadicFunction bit to summary in order to not import variadic functions. Inliner doesn't inline variadic functions because it is hard to reason about it. This one small fix improves Importer by about 16% (going from 86% to 100% of imported functions that are inlined anywhere) on some spec benchmarks like 'int' and others. Reviewers: eraman, mehdi_amini, tejohnson Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23339 llvm-svn: 278432	2016-08-11 22:13:57 +00:00
Ehsan Amiri	dbcfea9811	Extend trip count instead of truncating IV in LFTR, when legal When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because (1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant). (2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change). I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere. To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence. This commit contains changes in a newly added testcase which was not included in the previous commit (which was reverted later on). https://reviews.llvm.org/D23075 llvm-svn: 278421	2016-08-11 21:31:40 +00:00
Geoff Berry	d01828096f	[SCEV] Update interface to handle SCEVExpander insert point motion. Summary: This is an extension of the fix in r271424. That fix dealt with builder insert points being moved by SCEV expansion, but only for the lifetime of the expand call. This change modifies the interface so that LSR can safely call expand multiple times at the same insert point and do the right thing if one of the expansions decides to move the original insert point. This is a fix for PR28719. Reviewers: sanjoy Subscribers: llvm-commits, mcrosier, mzolotukhin Differential Revision: https://reviews.llvm.org/D23342 llvm-svn: 278413	2016-08-11 21:05:17 +00:00
Daniel Berlin	2698cbb4f1	Move GVNHoist tests into their own directory since it is a separate pass llvm-svn: 278404	2016-08-11 20:35:07 +00:00
Daniel Berlin	f75fd1b58b	Fix PR 28933 Summary: This fixes PR 28933 by making sure GVNHoist does not try to recreate memory accesses when it has not actually moved them. Reviewers: sebpop Subscribers: llvm-commits, george.burgess.iv Differential Revision: https://reviews.llvm.org/D23411 llvm-svn: 278401	2016-08-11 20:32:43 +00:00
Ivan Krasin	f3403fd2c8	WholeProgramDevirt: generate more detailed and accurate remarks. Summary: Keep track of all methods for which we have devirtualized at least one call and then print them sorted alphabetically. That allows to avoid duplicates and also makes the order deterministic. Add optimization names into the remarks, so that it's easier to understand how has each method been devirtualized. Fix a bug when wrong methods could have been reported for tryVirtualConstProp. Reviewers: kcc, mehdi_amini Differential Revision: https://reviews.llvm.org/D23297 llvm-svn: 278389	2016-08-11 19:09:02 +00:00
Andrew Kaylor	7cdf01ef58	Target independent codesize heuristics for Loop Idiom Recognition Patch by Sunita Marathe Differential Revision: https://reviews.llvm.org/D21449 llvm-svn: 278378	2016-08-11 18:28:33 +00:00
Ehsan Amiri	3818f1b38a	revert 278334 llvm-svn: 278337	2016-08-11 14:51:14 +00:00
Ehsan Amiri	b9fcc2b171	Extend trip count instead of truncating IV in LFTR, when legal When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because (1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant). (2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change). I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere. To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence. https://reviews.llvm.org/D23075 llvm-svn: 278334	2016-08-11 13:51:20 +00:00
Xinliang David Li	76a0108be4	[Profile] improve warning control option Change --no-pgo-warn-missing to -pgo-warn-missing-function and negate the default. /NFC Add more test to make sure the warning is off by default llvm-svn: 278314	2016-08-11 05:09:30 +00:00
Andrew Kaylor	498d3113c3	[IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative Patch by Li Huang Differential Revision: https://reviews.llvm.org/D18867 llvm-svn: 278269	2016-08-10 18:56:35 +00:00
Andrew Kaylor	b10f6876cd	[ValueTracking] An improvement to IR ValueTracking on Non-negative Integers Patch by Li Huang Differential Revision: https://reviews.llvm.org/D18777 llvm-svn: 278267	2016-08-10 18:47:19 +00:00
Gor Nishanov	b2a9c02521	[Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possible Summary: A particular coroutine usage pattern, where a coroutine is created, manipulated and destroyed by the same calling function, is common for coroutines implementing RAII idiom and is suitable for allocation elision optimization which avoid dynamic allocation by storing the coroutine frame as a static `alloca` in its caller. coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed when dynamic allocation elision happens: ``` entry: %elide = call i8* @llvm.coro.alloc() %need.dyn.alloc = icmp ne i8* %elide, null br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc dyn.alloc: %alloc = call i8* @CustomAlloc(i32 4) br label %coro.begin coro.begin: %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ] %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null, i8* bitcast ([2 x void (%f.frame)]* @f.resumers to i8)) ``` and ``` %mem = call i8 @llvm.coro.free(i8* %hdl) %need.dyn.free = icmp ne i8* %mem, null br i1 %need.dyn.free, label %dyn.free, label %if.end dyn.free: call void @CustomFree(i8* %mem) br label %if.end if.end: ... ``` If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with null constant. Also, we need to make sure that if there are any tail calls referencing the coroutine frame, we need to remote tail call attribute, since now coroutine frame lives on the stack. Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) 3.Add empty coroutine passes. (https://reviews.llvm.org/D22847) 4.Add coroutine devirtualization + tests. ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998) c) Do devirtualization (https://reviews.llvm.org/D23229) 5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234) 6.Add coroutine heap elision + tests. <= we are here 7.Add the rest of the logic (split into more patches) Reviewers: mehdi_amini, majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23245 llvm-svn: 278242	2016-08-10 16:40:39 +00:00
Artur Pilipenko	fd223d5d25	[LVI] Handle conditions in the form of (cond1 && cond2) Teach LVI how to gather information from conditions in the form of (cond1 && cond2). Our out-of-tree front-end emits range checks in this form. Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D23200 llvm-svn: 278231	2016-08-10 15:13:15 +00:00
Artur Pilipenko	e896325ca3	Add a test case for r278217 "[LVI] Relax the assertion about LVILatticeVal type in getConstantRange" llvm-svn: 278226	2016-08-10 13:51:01 +00:00
Artur Pilipenko	e171ea8a33	Teach CorrelatedValuePropagation to mark adds as no wrap This is a resubmission of previously reverted r277592. It was hitting overly strong assertion in getConstantRange which was relaxed in r278217. Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem. Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D23059 llvm-svn: 278220	2016-08-10 13:08:34 +00:00
Davide Italiano	873219c406	[SimplifyLibCalls] Restore the old behaviour, emit a libcall. Hal pointed out that the semantic of our intrinsic and the libc call are slightly different. Add a comment while I'm here to explain why we can't emit an intrinsic. Thanks Hal! llvm-svn: 278200	2016-08-10 06:33:32 +00:00
Adam Nemet	896c09bd10	[Inliner,OptDiag] Add hotness attribute to opt diagnostics Summary: The inliner not being a function pass requires the work-around of generating the OptimizationRemarkEmitter and in turn BFI on demand. This will go away after the new PM is ready. BFI is only computed inside ORE if the user has requested hotness information for optimization diagnostitics (-pass-remark-with-hotness at the 'opt' level). Thus there is no additional overhead without the flag. Reviewers: hfinkel, davidxl, eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22694 llvm-svn: 278185	2016-08-10 00:44:44 +00:00
Michael Zolotukhin	aae168f993	[LoopSimplify] Rebuild LCSSA for the inner loop after separating nested loops. Summary: This hopefully fixes PR28825. The problem now was that a value from the original loop was used in a subloop, which became a sibling after separation. While a subloop doesn't need an lcssa phi node, a sibling does, and that's where we broke LCSSA. The most natural way to fix this now is to simply call formLCSSA on the original loop: it'll do what we've been doing before plus it'll cover situations described above. I think we don't need to run formLCSSARecursively here, and we have an assert to verify this (I've tried testing it on LLVM testsuite + SPECs). I'd be happy to be corrected here though. I also changed a run line in the test from '-lcssa -loop-unroll' to '-lcssa -loop-simplify -indvars', because it exercises LCSSA preservation to the same extent, but also makes less unrelated transformation on the CFG, which makes it easier to verify. Reviewers: chandlerc, sanjoy, silvas Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23288 llvm-svn: 278173	2016-08-09 22:44:56 +00:00
Anna Thomas	b2d12b81c3	[EarlyCSE] Teach about CSE'ing over invariant.start intrinsics Summary: Teach EarlyCSE about invariant.start intrinsic. Specifically, we can perform store-load, load-load forwarding over this call. Reviewers: majnemer, reames, dberlin, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23268 llvm-svn: 278153	2016-08-09 20:00:47 +00:00
Sanjay Patel	a814d89b61	update to use FileCheck and auto-generate checks llvm-svn: 278150	2016-08-09 19:42:52 +00:00
Anna Thomas	037e540f08	[AliasAnalysis] Treat invariant.start as read-memory Summary: We teach alias analysis that invariant.start is readonly. This helps with GVN and memcopy optimizations that currently treat. invariant.start as a clobber. We need to treat this as readonly, so that DSE does not incorrectly remove stores prior to the invariant.start Reviewers: sanjoy, reames, majnemer, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23214 llvm-svn: 278138	2016-08-09 17:18:05 +00:00
Sanjay Patel	4ee824a88d	auto-generate checks llvm-svn: 278137	2016-08-09 17:03:51 +00:00
Sanjay Patel	25af7444df	auto-generate checks llvm-svn: 278136	2016-08-09 17:02:17 +00:00
Sanjay Patel	52958dc111	auto-generate checks llvm-svn: 278135	2016-08-09 16:59:54 +00:00
Sanjay Patel	9f36a2d54b	add tests for missing vector icmp folds llvm-svn: 278132	2016-08-09 16:39:05 +00:00
Sanjay Patel	f36a29199f	update to use FileCheck and auto-generate checks llvm-svn: 278131	2016-08-09 16:19:57 +00:00
Sanjay Patel	a6090256d5	regenerate checks llvm-svn: 278130	2016-08-09 16:17:46 +00:00
Sanjay Patel	e466865f67	add tests for missing vector icmp folds llvm-svn: 278129	2016-08-09 16:05:57 +00:00
Xinliang David Li	9035cfceef	[Profile] turn off verbose warnings by default no prof data for func warning is turned off by default due to its high verbosity and minimal usefulness. Differential Revision: http://reviews.llvm.org/D23295 llvm-svn: 278127	2016-08-09 15:35:28 +00:00
Artur Pilipenko	c710a461b5	[LVI] Make LVI smarter about comparisons with non-constants Make LVI smarter about comparisons with a non-constant. For example, a s< b constraints a to be in [INT_MIN, INT_MAX) range. This is a part of https://llvm.org/bugs/show_bug.cgi?id=28620 fix. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D23205 llvm-svn: 278122	2016-08-09 14:50:08 +00:00
Artur Pilipenko	d97eedff40	Revert 278107 which causes buildbot failures and in addition has wrong commit message llvm-svn: 278109	2016-08-09 10:00:22 +00:00
Artur Pilipenko	a410d81f64	Teach CorrelatedValuePropagation to mark adds as no wrap Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem. Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D23059 llvm-svn: 278107	2016-08-09 09:41:34 +00:00
Derek Schuff	b7d6d9e3cd	[WebAssembly] Fix CFI index to account for padding nullptr function The WebAssembly linker now creates a dummy function at index 0 to prevent miscomparisons with the NULL pointer, see https://github.com/WebAssembly/binaryen/pull/658. Thanks to pcc for pointing out this problem! Patch by Dominic Chen Differential Revision: https://reviews.llvm.org/D23137 llvm-svn: 278073	2016-08-08 23:56:01 +00:00
Daniel Berlin	4b4c722e79	[MSSA] Fix PR28880 by fixing use optimizer's lower bound tracking behavior. Summary: In the use optimizer, we need to keep of whether the lower bound still dominates us or else we may decide a lower bound is still valid when it is not due to intervening pushes/pops. Fixes PR28880 (and probably a bunch of other things). Reviewers: george.burgess.iv Subscribers: MatzeB, llvm-commits, sebpop Differential Revision: https://reviews.llvm.org/D23237 llvm-svn: 277978	2016-08-08 04:44:53 +00:00
Eli Friedman	02419a9849	[JumpThreading] Fix handling of aliasing metadata. Summary: The correctness fix here is that when we CSE a load with another load, we need to combine the metadata on the two loads. This matches the behavior of other passes, like instcombine and GVN. There's also a minor optimization improvement here: for load PRE, the aliasing metadata on the inserted load should be the same as the metadata on the original load. Not sure why the old code was throwing it away. Issue found by inspection. Differential Revision: http://reviews.llvm.org/D21460 llvm-svn: 277977	2016-08-08 04:10:22 +00:00
Davide Italiano	e3b916d164	[SimplifyLibCalls] Emit sqrt intrinsic instead of a libcall. llvm-svn: 277972	2016-08-08 03:23:01 +00:00
Eli Friedman	2a65dd1ba6	[SROA] Fix crash with lifetime intrinsic partially covering alloca. Summary: PromoteMemToReg looks specifically for the pattern bitcast+lifetime.start (or a bitcast-equivalent GEP); any offset will lead to an assertion failure. Fixes https://llvm.org/bugs/show_bug.cgi?id=27999 . Differential Revision: https://reviews.llvm.org/D22737 llvm-svn: 277969	2016-08-08 01:30:53 +00:00
Davide Italiano	27da131f32	[SLC] Emit an intrinsic instead of a libcall for pow. Differential Revision: https://reviews.llvm.org/D22104 llvm-svn: 277963	2016-08-07 20:27:03 +00:00
David Majnemer	d150137f64	[InstSimplify] Fold gep (gep V, C), (sub 0, V) to C llvm-svn: 277952	2016-08-07 07:58:12 +00:00
David Majnemer	dc8767a49a	[InstSimplify] Try hard to simplify pointer comparisons Simplify ptrtoint comparisons involving operands with different source types. llvm-svn: 277951	2016-08-07 07:58:10 +00:00
David Majnemer	4e4f4437c2	[InstCombine] Infer inbounds on geps of allocas llvm-svn: 277950	2016-08-07 07:58:00 +00:00
Michael Zolotukhin	442b82f0eb	Revert "Revert "[LoopSimplify] Fix updating LCSSA after separating nested loops."" This reverts commit r277901. Reaaply the commit as it looks like it has nothing to do with the bots failures. llvm-svn: 277946	2016-08-07 01:56:54 +00:00
Gor Nishanov	874651129e	[Coroutines] Passify the build bots. Remove restart-trigger.ll test for now llvm-svn: 277937	2016-08-06 21:01:22 +00:00
Gor Nishanov	2ed6e788a8	[Coroutines] Part 5: Add CGSCC restart trigger Summary: CoroSplit pass processes the coroutine twice. First, it lets it go through complete IPO optimization pipeline as a single function. It forces restart of the pipeline by inserting an indirect call to an empty function "coro.devirt.trigger" which is devirtualized by CoroElide pass that triggers a restart of the pipeline by CGPassManager. (In later patches, when CoroSplit pass sees the same coroutine the second time, it splits it up, adds coroutine subfunctions to the SCC to be processed by IPO pipeline.) Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) 3.Add empty coroutine passes. (https://reviews.llvm.org/D22847) 4.Add coroutine devirtualization + tests. ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998) c) Do devirtualization (https://reviews.llvm.org/D23229) 5.Add CGSCC restart trigger + tests. <= we are here 6.Add coroutine heap elision + tests. 7.Add the rest of the logic (split into more patches) Reviewers: mehdi_amini, majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D23234 llvm-svn: 277936	2016-08-06 20:44:39 +00:00
David Majnemer	a19d0f2f3e	[ValueTracking] Teach computeKnownBits about [su]min/max Reasoning about a select in terms of a min or max allows us to derive a tigher bound on the result. llvm-svn: 277914	2016-08-06 08:16:00 +00:00
Sanjoy Das	ba04d3a620	[InstCombine] Don't coerce non-integral pointers to integers Reviewers: majnemer Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23231 llvm-svn: 277910	2016-08-06 02:58:48 +00:00
Gor Nishanov	31d8c9af89	Part 4c: Coroutine Devirtualization: Devirtualize coro.resume and coro.destroy. Summary: This is the 4c patch of the coroutine series. CoroElide pass now checks if PostSplit coro.begin is referenced by coro.subfn.addr intrinsics. If so replace coro.subfn.addrs with an appropriate coroutine subfunction associated with that coro.begin. Documentation and overview is here: http://llvm.org/docs/Coroutines.html. Upstreaming sequence (rough plan) 1.Add documentation. (https://reviews.llvm.org/D22603) 2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659) 3.Add empty coroutine passes. (https://reviews.llvm.org/D22847) 4.Add coroutine devirtualization + tests. ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998) c) Do devirtualization <= we are here 5.Add CGSCC restart trigger + tests. 6.Add coroutine heap elision + tests. 7.Add the rest of the logic (split into more patches) Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D23229 llvm-svn: 277908	2016-08-06 02:16:35 +00:00
Michael Zolotukhin	09cf304ebc	Revert "[LoopSimplify] Fix updating LCSSA after separating nested loops." This reverts commit r277877. Try to appease clang-x64-ninja-win7 buildbot. llvm-svn: 277901	2016-08-06 01:48:51 +00:00
Sanjoy Das	cf181867a6	[IRCE] Preserve loop-simplify form Fixes PR28764. Right now there is no way to test this, but (as mentioned on the PR) with Michael Zolotukhin's yet to be checked in LoopSimplify verfier, 8 of the llvm-lit tests for IRCE crash. llvm-svn: 277891	2016-08-06 00:01:56 +00:00
Michael Zolotukhin	4c65c3596a	[LoopSimplify] Fix updating LCSSA after separating nested loops. This fixes PR28825. The problem was that we only checked if a value from a created inner loop is used in the outer loop, and fixed LCSSA for them. But we missed to fixup LCSSA for values used in exits of the outer loop. llvm-svn: 277877	2016-08-05 21:52:58 +00:00
Dehao Chen	de39cb9384	Replace hot-callsite based heuristic to use its own threshold parameter instead of share inline-hint parameter Summary: Hot callsites should have higher threshold than inline hints. This patch uses separate threshold parameter for hot callsites. Reviewers: davidxl, eraman Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D22368 llvm-svn: 277860	2016-08-05 20:28:41 +00:00
Ivan Krasin	b05e06e4fd	WholeProgramDevirt: print remarks with devirtualized method names. Summary: Chrome on Linux uses WholeProgramDevirt for speed ups, and it's important to detect regressions on both sides: the toolchain, if fewer methods get devirtualized after an update, and Chrome, if an innocently looking change caused many hot methods become virtual again. The need to track devirtualized methods is not Chrome-specific, but it's probably the only user of the pass at this time. Reviewers: kcc Differential Revision: https://reviews.llvm.org/D23219 llvm-svn: 277856	2016-08-05 19:45:16 +00:00
Sanjoy Das	6fa08aafcc	[ConstantFolding] Don't create illegal (non-integral) inttoptrs Reviewers: majnemer, arsenm Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D23182 llvm-svn: 277854	2016-08-05 19:23:29 +00:00
Dehao Chen	17c6afc35b	Do not assign new discriminator for all intrinsics. Summary: We do not care about intrinsic calls when assigning discriminators. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23212 llvm-svn: 277843	2016-08-05 17:56:49 +00:00
Gor Nishanov	f3bb361750	opt: Adding -O0 to opt tool Summary: Having -O0 in opt allows testing that -O0 optimization pipeline is built correctly. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23208 llvm-svn: 277829	2016-08-05 16:27:33 +00:00
Benjamin Kramer	000a87d1b0	Actually, r277337 was fine. Just kill the DAGs that made the test allow nondeterminism. llvm-svn: 277821	2016-08-05 14:58:34 +00:00
Benjamin Kramer	aa160c22f7	[SimplifyCFG] Make range reduction code deterministic. This generated IR based on the order of evaluation, which is different between GCC and Clang. With that in mind you get bootstrap miscompares if you compare a Clang built with GCC-built Clang vs. Clang built with Clang-built Clang. Diagnosing that made my head hurt. This also reverts commit r277337, which "fixed" the test case. llvm-svn: 277820	2016-08-05 14:55:02 +00:00
Sanjay Patel	5a9b9f98c0	reduce tests; auto-generate checks llvm-svn: 277819	2016-08-05 14:50:11 +00:00
Nicolai Haehnle	870bf1788c	[InstCombine] try to fold (select C, (sext A), B) into logical ops Summary: Turn (select C, (sext A), B) into (sext (select C, A, B')) when A is i1 and B is a compatible constant, also for zext instead of sext. This will then be further folded into logical operations. The transformation would be valid for non-i1 types as well, but other parts of InstCombine prefer to have sext from non-i1 as an operand of select. Motivated by the shader compiler frontend in Mesa for AMDGPU, which emits i32 for boolean operations. With this change, the boolean logic is fully recovered. Reviewers: majnemer, spatel, tstellarAMD Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22747 llvm-svn: 277801	2016-08-05 08:22:29 +00:00
Sebastian Pop	429740a6c2	GVN-hoist: fix early exit logic The patch splits a complex && if condition into easier to read and understand logic. That wrong early exit condition was letting some instructions with not all operands available pass through when HoistingGeps was true. Differential Revision: https://reviews.llvm.org/D23174 llvm-svn: 277785	2016-08-04 23:49:05 +00:00
Michael Kuperstein	3ceac2bbd5	[LV, X86] Be more optimistic about vectorizing shifts. Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to to take this into account by making shifts by a uniform cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782	2016-08-04 22:48:03 +00:00
Sanjay Patel	3bade138b5	[InstCombine] use m_APInt to allow icmp eq (mul X, C1), C2 folds for splat constant vectors This concludes the splat vector enhancements for foldICmpEqualityWithConstant(). Other commits in this series: https://reviews.llvm.org/rL277762 https://reviews.llvm.org/rL277752 https://reviews.llvm.org/rL277738 https://reviews.llvm.org/rL277731 https://reviews.llvm.org/rL277659 https://reviews.llvm.org/rL277638 https://reviews.llvm.org/rL277629 llvm-svn: 277779	2016-08-04 22:19:27 +00:00
David Majnemer	b48ed0f721	[CloneFunction] Add a testcase for r277691/r277693 PR28848 had a very nice reduction of the underlying cause of the bug. Our ValueMap had, in an entry for an Instruction, a ConstantInt. This is not at all unexpected but should be handled properly. llvm-svn: 277773	2016-08-04 21:28:59 +00:00
Matt Arsenault	6ad97732aa	GVNHoist: Don't hoist convergent calls llvm-svn: 277767	2016-08-04 20:52:57 +00:00
David Majnemer	f93082e71a	[coroutines] Part 4[ab]: Coroutine Devirtualization: Lower coro.resume and coro.destroy. This is the forth patch in the coroutine series. CoroEaly pass now lowers coro.resume and coro.destroy intrinsics by replacing them with an indirect call to an address returned by coro.subfn.addr intrinsic. This is done so that CGPassManager recognizes devirtualization when CoroElide replaces a call to coro.subfn.addr with an appropriate function address. Patch by Gor Nishanov! Differential Revision: https://reviews.llvm.org/D22998 llvm-svn: 277765	2016-08-04 20:30:07 +00:00
Sanjay Patel	d938e88e89	[InstCombine] use m_APInt to allow icmp eq (and X, C1), C2 folds for splat constant vectors llvm-svn: 277762	2016-08-04 20:05:02 +00:00
Sanjay Patel	b3de75d3a0	[InstCombine] use m_APInt to allow icmp eq (or X, C1), C2 folds for splat constant vectors llvm-svn: 277752	2016-08-04 19:12:12 +00:00
Sanjay Patel	80f2eec4b2	remove FIXME comments (fixed with r277738) llvm-svn: 277744	2016-08-04 18:14:02 +00:00
Sanjay Patel	bcaf6f39dd	[InstCombine] use m_APInt to allow icmp eq (op X, Y), C folds for splat constant vectors I'm removing a misplaced pair of more specific folds from InstCombine in this patch as well, so we know where those folds are happening in InstSimplify. llvm-svn: 277738	2016-08-04 17:48:04 +00:00
Sanjay Patel	bf82f44e7b	add tests for missing vector folds llvm-svn: 277736	2016-08-04 16:48:30 +00:00
Alina Sbirlea	6f937b1144	LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments. Summary: TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses. A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted. Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures: - x86 failing tests either did not have the right target, or the right alignment. - NVPTX failing tests did not have the right alignment. - AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks for a maximum size allowed and relies on the next condition checking for %4 for correctness. This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list). Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure), so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context. Reviewers: arsenm, jlebar, tstellarAMD Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23068 llvm-svn: 277735	2016-08-04 16:38:44 +00:00
Sanjay Patel	9d591d15ec	[InstCombine] use m_APInt to allow icmp eq (sub C1, X), C2 folds for splat constant vectors llvm-svn: 277731	2016-08-04 15:19:25 +00:00
Simon Pilgrim	5d5ca9c0cb	[X86][SSE] Add initial costs for vector CTTZ/CTLZ llvm-svn: 277716	2016-08-04 10:51:41 +00:00
Amaury Sechet	bf3adfdbfb	Fix intrinsics.ll test llvm-svn: 277695	2016-08-04 05:35:25 +00:00
Amaury Sechet	6bea674c43	Add popcount(n) == bitsize(n) -> n == -1 transformation. Summary: As per title. Reviewers: majnemer, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23139 llvm-svn: 277694	2016-08-04 05:27:20 +00:00
David Majnemer	909793fa63	Reinstate "[CloneFunction] Don't remove side effecting calls" This reinstates r277611 + r277614 and reverts r277642. A cast_or_null should have been a dyn_cast_or_null. llvm-svn: 277691	2016-08-04 04:24:02 +00:00
Sanjay Patel	00a324e893	[InstCombine] use m_APInt to allow icmp eq (add X, C1), C2 folds for splat constant vectors llvm-svn: 277659	2016-08-03 22:08:44 +00:00
Sebastian Pop	5d3822fc12	GVN-hoist: compute MSSA once per function (PR28670) With this patch we compute the MemorySSA once and update it in the code generator. Differential Revision: https://reviews.llvm.org/D22966 llvm-svn: 277649	2016-08-03 20:54:33 +00:00
Sanjoy Das	ac5bf59b6e	[IndVars] Un-grepify test; NFC Some of these tests need to be cleaned up further to make it obvious what they're testing, but as a first step remove all instances of "grep". llvm-svn: 277648	2016-08-03 20:53:23 +00:00
Reid Kleckner	a6be60871f	Revert "[CloneFunction] Don't remove side effecting calls" This reverts commit r277611 and the followup r277614. Bootstrap builds and chromium builds are crashing during inlining after this change. llvm-svn: 277642	2016-08-03 20:01:01 +00:00
George Burgess IV	024f3d2683	[MSSA] Add special handling for invariant/constant loads. This is a follow-up to r277637. It teaches MemorySSA that invariant loads (and loads of provably constant memory) are always liveOnEntry. llvm-svn: 277640	2016-08-03 19:57:02 +00:00
Sanjay Patel	2e9675ff52	[InstCombine] use m_APInt to allow icmp eq (srem X, C1), C2 folds for splat constant vectors llvm-svn: 277638	2016-08-03 19:48:40 +00:00
George Burgess IV	82e355ce48	[MSSA] Add logic for special handling of atomics/volatiles. This patch makes MemorySSA recognize atomic/volatile loads, and makes MSSA treat said loads specially. This allows us to be a bit more aggressive in some cases. Administrative note: Revision was LGTM'ed by reames in person. Additionally, this doesn't include the `invariant.load` recognition in the differential revision, because I feel it's better to commit that separately. Will commit soon. Differential Revision: https://reviews.llvm.org/D16875 llvm-svn: 277637	2016-08-03 19:39:54 +00:00
Tobias Grosser	8757e387dd	[InstCombine] Refactor optimization of zext(or(icmp, icmp)) to enable more aggressive cast-folding Summary: InstCombine unfolds expressions of the form `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` such that in a later iteration of InstCombine the exposed `zext(icmp)` instructions can be optimized. We now combine this unfolding and the subsequent `zext(icmp)` optimization to be performed together. Since the unfolding doesn't happen separately anymore, we also again enable the folding of `logic(cast(icmp), cast(icmp))` expressions to `cast(logic(icmp, icmp))` which had been disabled due to its interference with the unfolding transformation. Tested via `make check` and `lnt`. Background ========== For a better understanding on how it came to this change we subsequently summarize its history. In commit r275989 we've already tried to enable the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` which had to be reverted in r276106 because it could lead to an endless loop in InstCombine (also see http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html). The root of this problem is that in `visitZExt()` in InstCombineCasts.cpp there also exists a reverse of the above folding transformation, that unfolds `zext(or(icmp, icmp))` to `or(zext(icmp), zext(icmp))` in order to expose `zext(icmp)` operations which would then possibly be eliminated by subsequent iterations of InstCombine. However, before these `zext(icmp)` would be eliminated the folding from r275989 could kick in and cause InstCombine to endlessly switch back and forth between the folding and the unfolding transformation. This is the reason why we now combine the `zext`-unfolding and the elimination of the exposed `zext(icmp)` to happen at one go because this enables us to still allow the cast-folding in `logic(cast(icmp), cast(icmp))` without entering an endless loop again. Details on the submitted changes ================================ - In `visitZExt()` we combine the unfolding and optimization of `zext` instructions. - In `transformZExtICmp()` we have to use `Builder->CreateIntCast()` instead of `CastInst::CreateIntegerCast()` to make sure that the new `CastInst` is inserted in a `BasicBlock`. The new calls to `transformZExtICmp()` that we introduce in `visitZExt()` would otherwise cause according assertions to be triggered (in our case this happend, for example, with lnt for the MultiSource/Applications/sqlite3 and SingleSource/Regression/C++/EH/recursive-throw tests). The subsequent usage of `replaceInstUsesWith()` is necessary to ensure that the new `CastInst` replaces the `ZExtInst` accordingly. - In InstCombineAndOrXor.cpp we again allow the folding of casts on `icmp` instructions. - The instruction order in the optimized IR for the zext-or-icmp.ll test case is different with the introduced changes. - The test cases in zext.ll have been adopted from the reverted commits r275989 and r276105. Reviewers: grosser, majnemer, spatel Subscribers: eli.friedman, majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D22864 Contributed-by: Matthias Reisinger <d412vv1n@gmail.com> llvm-svn: 277635	2016-08-03 19:30:35 +00:00
Nicolai Haehnle	fcac6f8376	[InstCombine] Cleanup select-bitext.ll tests Follow-up to r277596. llvm-svn: 277633	2016-08-03 19:10:13 +00:00
Sanjay Patel	43aeb001c9	[InstCombine] use m_APInt to allow icmp (binop X, Y), C folds with constant splat vectors This removes the restriction for the icmp constant, but as noted by the FIXME comments, we still need to change individual checks for binop operand constants. llvm-svn: 277629	2016-08-03 18:59:03 +00:00
Duncan P. N. Exon Smith	9cbc69d1fe	IR: Drop uniquing when an MDNode Value operand is deleted This is a fix for PR28697. An MDNode can indirectly refer to a GlobalValue, through a ConstantAsMetadata. When the GlobalValue is deleted, the MDNode operand is reset to `nullptr`. If the node is uniqued, this can lead to a hard-to-detect cache invalidation in a Metadata map that's shared across an LLVMContext. Consider: 1. A map from Metadata* to `T` called RemappedMDs. 2. A node that references a global variable, `!{i1* @GV}`. 3. Insert `!{i1* @GV} -> SomeT` in the map. 4. Delete `@GV`, leaving behind `!{null} -> SomeT`. Looking up the generic and uninteresting `!{null}` gives you `SomeT`, which is likely related to `@GV`. Worse, `SomeT`'s lifetime may be tied to the deleted `@GV`. This occurs in practice in the shared ValueMap used since r266579 in the IRMover. Other code that handles more than one Module (with different lifetimes) in the same LLVMContext could hit it too. The fix here is a partial revert of r225223: in the rare case that an MDNode operand is a ConstantAsMetadata (i.e., wrapping a node from the Value hierarchy), drop uniquing if it gets replaced with `nullptr`. This changes step #4 above to leave behind `distinct !{null} -> SomeT`, which can't be confused with the generic `!{null}`. In theory, this can cause some churn in the LLVMContext's MDNode uniquing map when Values are being deleted. However: - The number of GlobalValues referenced from uniqued MDNodes is expected to be quite small. E.g., the debug info metadata schema only references GlobalValues from distinct nodes. - Other Constants have the lifetime of the LLVMContext, whose teardown is careful to drop references before deleting the constants. As a result, I don't expect a compile time regression from this change. llvm-svn: 277625	2016-08-03 18:19:43 +00:00
David Majnemer	fad0490869	[CloneFunction] Don't remove side effecting calls We were able to figure out that the result of a call is some constant. While propagating that fact, we added the constant to the value map. This is problematic because it results in us losing the call site when processing the value map. This fixes PR28802. llvm-svn: 277611	2016-08-03 17:12:47 +00:00
Renato Golin	f583097434	Revert "Teach CorrelatedValuePropagation to mark adds as no wrap" This reverts commit r277592, trying to fix the AArch64 42VMA buildbot. llvm-svn: 277607	2016-08-03 16:20:48 +00:00
Sanjay Patel	91bab5364e	add a vector variant of each test llvm-svn: 277598	2016-08-03 14:25:55 +00:00
Nicolai Haehnle	c1f1ad998a	[InstCombine] Add select-bitext.ll tests As requested for D22747. llvm-svn: 277596	2016-08-03 13:37:56 +00:00
Artur Pilipenko	68cb947cc9	Teach CorrelatedValuePropagation to mark adds as no wrap Use LVI to prove that adds do not wrap. The change is motivated by https://llvm.org/bugs/show_bug.cgi?id=28620 bug and it's the first step to fix that problem. Reviewed By: sanjoy Differential Revision: http://reviews.llvm.org/D23059 llvm-svn: 277592	2016-08-03 13:11:39 +00:00
Chandler Carruth	6cb2ab2c60	[PM] Significantly refactor the pass pipeline parsing to be easier to reason about and less error prone. The core idea is to fully parse the text without trying to identify passes or structure. This is done with a single state machine. There were various bugs in the logic around this previously that were repeated and scattered across the code. Having a single routine makes it much easier to fix and get correct. For example, this routine doesn't suffer from PR28577. Then the actual pass construction is handled using much easier to read code and simple loops, with particular pass manager construction sunk to live with other pass construction. This is especially nice as the pass managers are in fact passes. Finally, the "implicit" pass manager synthesis is done much more simply by forming "pre-parsed" structures rather than having to duplicate tons of logic. One of the bugs fixed by this was evident in the tests where we accepted a pipeline that wasn't really well formed. Another bug is PR28577 for which I have added a test case. The code is less efficient than the previous code but I'm really hoping that's not a priority. ;] Thanks to Sean for the review! Differential Revision: https://reviews.llvm.org/D22724 llvm-svn: 277561	2016-08-03 03:21:41 +00:00
Ivan Krasin	3aade11252	Add -lowertypetests-bitsets-level to control bitsets generation. Summary: Sometimes, bitsets could get really large (>300k entries) and we might want to drop a check, as it would have a too much cost. Adding a flag to control how much penalty are we willing to pay for bitsets. Reviewers: kcc Differential Revision: https://reviews.llvm.org/D23088 llvm-svn: 277556	2016-08-03 00:59:38 +00:00
Sanjay Patel	287b81d27b	add vector test for icmp+sub llvm-svn: 277555	2016-08-03 00:36:54 +00:00
Daniel Berlin	df10119e4e	Support for lifetime begin/end markers in the MemorySSA use optimizer Summary: Depends on D23072 Reviewers: george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23076 llvm-svn: 277553	2016-08-03 00:01:46 +00:00
Evgeniy Stepanov	d99f80b48e	[safestack] Layout large allocas first to reduce fragmentation. llvm-svn: 277544	2016-08-02 23:21:30 +00:00
Michael Zolotukhin	ca0d48b742	[LoopUnroll] Fix a PowerPC test broken by r277524. llvm-svn: 277527	2016-08-02 21:43:25 +00:00
Michael Zolotukhin	b2738e41bf	[LoopUnroll] Switch the default value of -unroll-runtime-epilog back to its original value. As agreed in post-commit review of r265388, I'm switching the flag to its original value until the 90% runtime performance regression on SingleSource/Benchmarks/Stanford/Bubblesort is addressed. llvm-svn: 277524	2016-08-02 21:24:14 +00:00
Wei Mi	dc7001afb2	[LoopVectorize] Change comment for isOutOfScope in collectLoopUniforms, NFC Update comment for isOutOfScope and add a testcase for uniform value being used out of scope. Differential Revision: https://reviews.llvm.org/D23073 llvm-svn: 277515	2016-08-02 20:27:49 +00:00
Sanjoy Das	f45e03e201	[IRCE] Preserve DomTree and LCSSA This changes IRCE to "preserve" LCSSA and DomTree by recomputing them. It still does not preserve LoopSimplify. llvm-svn: 277505	2016-08-02 19:31:54 +00:00
Michael Zolotukhin	d9b6ad3c01	[LoopUnroll] Ensure we create prolog loops in simplified form. llvm-svn: 277502	2016-08-02 19:19:31 +00:00
Matthew Simpson	18d8898317	[LV] Generate both scalar and vector integer induction variables This patch enables the vectorizer to generate both scalar and vector versions of an integer induction variable for a given loop. Previously, we only generated a scalar induction variable if we knew all its users were going to be scalar. Otherwise, we generated a vector induction variable. In the case of a loop with both scalar and vector users of the induction variable, we would generate the vector induction variable and extract scalar values from it for the scalar users. With this patch, we now generate both versions of the induction variable when there are both scalar and vector users and select which version to use based on whether the user is scalar or vector. Differential Revision: https://reviews.llvm.org/D22869 llvm-svn: 277474	2016-08-02 15:25:16 +00:00
Matthew Simpson	58f562887b	[LV] Untangle the concepts of uniform and scalar This patch refactors the logic in collectLoopUniforms and collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It adds isScalarAfterVectorization along side isUniformAfterVectorization to distinguish the two. Known scalar values include those that are uniform, getelementptr instructions that won't be vectorized, and induction variables and induction variable update instructions whose users are all known to be scalar. This patch includes the following functional changes: - In collectLoopUniforms, we mark uniform the pointer operands of interleaved accesses. Although non-consecutive, these pointers are treated like consecutive pointers during vectorization. - In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it isScalarAfterVectorization rather than isUniformAfterVectorization. This differs from the previous functionaly in that we now add getelementptr instructions that will not be vectorized into VecValuesToIgnore. This patch also removes the ValuesNotWidened set used for induction variable scalarization since, after the above changes, it is now equivalent to isScalarAfterVectorization. Differential Revision: https://reviews.llvm.org/D22867 llvm-svn: 277460	2016-08-02 14:29:41 +00:00
Igor Breger	f44b79d08e	[AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately dataWidth check. Differential Revision: http://reviews.llvm.org/D23055 llvm-svn: 277435	2016-08-02 09:15:28 +00:00
Sean Silva	f801575fd0	CodeExtractor : Add ability to preserve profile data. Added ability to estimate the entry count of the extracted function and the branch probabilities of the exit branches. Patch by River Riddle! Differential Revision: https://reviews.llvm.org/D22744 llvm-svn: 277411	2016-08-02 02:15:45 +00:00
Derek Schuff	c64d7655b2	[WebAssembly] Support CFI for WebAssembly target Summary: This patch implements CFI for WebAssembly. It modifies the LowerTypeTest pass to pre-assign table indexes to functions that are called indirectly, and lowers type checks to test against the appropriate table indexes. It also modifies the WebAssembly backend to support a special ".indidx" assembly directive that propagates the table index assignments out to the linker. Patch by Dominic Chen Differential Revision: https://reviews.llvm.org/D21768 llvm-svn: 277398	2016-08-01 22:25:02 +00:00
Michael Kuperstein	c40618610f	[PM] Port SpeculativeExecution to the new PM Differential Revision: https://reviews.llvm.org/D23033 llvm-svn: 277393	2016-08-01 21:48:33 +00:00
David Majnemer	ba6665d88a	[Verifier] Resume instructions can only be in functions w/ a personality This fixes PR28799. llvm-svn: 277360	2016-08-01 18:06:34 +00:00
Simon Pilgrim	7fd4ad6849	Fixed test check ordering issue on windows buildbots llvm-svn: 277337	2016-08-01 10:40:15 +00:00
James Molloy	bade86cedc	[SimplifyCFG] Fix nasty RAUW bug from r277325 Using RAUW was wrong here; if we have a switch transform such as: 18 -> 6 then 6 -> 0 If we use RAUW, while performing the second transform the transformed 6 from the first will be also replaced, so we end up with: 18 -> 0 6 -> 0 Found by clang stage2 bootstrap; testcase added. llvm-svn: 277332	2016-08-01 09:34:48 +00:00
Craig Topper	d2b2d745ff	[AVX-512] Fix a test missed in r277327. llvm-svn: 277330	2016-08-01 08:15:30 +00:00
James Molloy	91821bd0b4	[SimplifyCFG] Try and pacify buildbots after r277325 It looks like the two independent parts of the rotate operation (a lshr and shl) are being reordered on some bots. Add CHECK-DAGs to account for this. llvm-svn: 277329	2016-08-01 08:09:55 +00:00
James Molloy	b2e436de42	[SimplifyCFG] Range reduce switches If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example: switch (i) { case 5: ... case 9: ... case 13: ... case 17: ... } can become: if ( (i - 5) % 4 ) goto default; switch ((i - 5) / 4) { case 0: ... case 1: ... case 2: ... case 3: ... } or even better: switch ( ROTR(i - 5, 2) { case 0: ... case 1: ... case 2: ... case 3: ... } The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables. llvm-svn: 277325	2016-08-01 07:45:11 +00:00
Sean Silva	423c7149dc	Revert r277313 and r277314. They seem to trigger an LSan failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/15140/steps/check-llvm%20asan/logs/stdio Revert "Add the tests for r277313" This reverts commit r277314. Revert "CodeExtractor : Add ability to preserve profile data." This reverts commit r277313. llvm-svn: 277317	2016-08-01 04:16:09 +00:00
Sean Silva	e5a5c966cd	Move this test to x86-specific directory. No bots have yelled yet, but this test references an x86 intrinsic. Also, it invokes llc on x86 IR. Fixup to r277315. llvm-svn: 277316	2016-08-01 03:22:05 +00:00
Sean Silva	a0a802abe3	Fix - CodeExtractor : Inherit Target Dependent Attributes from the parent function. When extracting a set of blocks make sure to inherit all of the target dependent attributes to make sure that the function will be valid for lowering. One example is the "target-features" attribute for x86, if the extracted region has functionality that relies on a specific feature it will fail to be lowered. This also allows for extracted functions to be valid for inlining, at least back into the parent function, as the target attributes are tested when inlining for compatibility. Patch by River Riddle! Differential Revision: https://reviews.llvm.org/D22713 llvm-svn: 277315	2016-08-01 03:15:32 +00:00
Sean Silva	72be9a6937	Add the tests for r277313 Forgot to `git add` them. llvm-svn: 277314	2016-08-01 03:04:34 +00:00
Simon Pilgrim	9e201eac32	[SLPVectorizer][X86] Added vXi8/vXi16 sitofp/uitofp tests Dropped useless 2i32-2f32 test llvm-svn: 277281	2016-07-30 21:01:34 +00:00
Simon Pilgrim	f5134a2867	[SLPVectorizer][X86] Added SITOFP/UITOFP vectorization tests llvm-svn: 277275	2016-07-30 18:43:30 +00:00
Adam Nemet	12937c361f	[LoopUnroll] Include hotness of region in opt remark LoopUnroll is a loop pass, so the analysis of OptimizationRemarkEmitter is added to the common function analysis passes that loop passes depend on. The BFI and indirectly BPI used in this pass is computed lazily so no overhead should be observed unless -pass-remarks-with-hotness is used. This is how the patch affects the O3 pipeline: Dominator Tree Construction Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Rotate Loops Loop Invariant Code Motion Unswitch loops Simplify the CFG Dominator Tree Construction Basic Alias Analysis (stateless AA impl) Function Alias Analysis Results Combine redundant instructions Natural Loop Information Canonicalize natural loops Loop-Closed SSA Form Pass Scalar Evolution Analysis + Lazy Branch Probability Analysis + Lazy Block Frequency Analysis + Optimization Remark Emitter Loop Pass Manager Induction Variable Simplification Recognize loop idioms Delete dead loops Unroll loops ... llvm-svn: 277203	2016-07-29 19:29:47 +00:00
David Majnemer	718da3d1f6	[ConstantFolding] Handle bitcasts of undef fp vector elements We used the wrong type for constructing a zero vector element which led to type mismatches. This fixes PR28771. llvm-svn: 277197	2016-07-29 18:48:27 +00:00
Andrew Kaylor	b99d1cc7ed	Recommitting r275284: add support to inline __builtin_mempcpy Patch by Sunita Marathe Third try, now following fixes to MSan to handle mempcy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!) llvm-svn: 277189	2016-07-29 18:23:18 +00:00
Matt Masten	a6669a1e05	Initial support for vectorization using svml (short vector math library). Differential Revision: https://reviews.llvm.org/D19544 llvm-svn: 277166	2016-07-29 16:42:44 +00:00
David Majnemer	130b9f99d6	[EarlyCSE] Correctly handle simplified, but live, instructions Some instructions may have their uses replaced with a symbolic constant. However, the instruction may still have side effects which percludes it from being removed from the function. EarlyCSE treated such an instruction as if it were removed, resulting in PR28763. llvm-svn: 277114	2016-07-29 05:39:21 +00:00
David Majnemer	e4218cf11e	[ConstantFolding] Fold bitcasts of vectors w/ undef elements An undef vector element can be treated as if it had any value. Folding such a vector element to 0 in a bitcast can open up further folding opportunities. llvm-svn: 277104	2016-07-29 04:06:09 +00:00
David Majnemer	57b94c8d6a	[ConstantFolding] Use ConstantExpr::getWithOperands ConstantExpr::getWithOperands does much of the hard work that ConstantFoldInstOperandsImpl tries to do but more completely. This lets us fold ExtractValue/InsertValue expressions. llvm-svn: 277100	2016-07-29 03:27:31 +00:00
David Majnemer	d536f2328e	[ConstnatFolding] Teach the folder how to fold ConstantVector A ConstantVector can have ConstantExpr operands and vice versa. However, the folder had no ability to fold ConstantVectors which, in some cases, was an optimization barrier. Instead, rephrase the folder in terms of Constants instead of ConstantExprs and teach callers how to deal with failure. llvm-svn: 277099	2016-07-29 03:27:26 +00:00
Piotr Padlewski	84abc74f2c	Added ThinLTO inlining statistics Summary: copypasta doc of ImportedFunctionsInliningStatistics class \brief Calculate and dump ThinLTO specific inliner stats. The main statistics are: (1) Number of inlined imported functions, (2) Number of imported functions inlined into importing module (indirect), (3) Number of non imported functions inlined into importing module (indirect). The difference between first and the second is that first stat counts all performed inlines on imported functions, but the second one only the functions that have been eventually inlined to a function in the importing module (by a chain of inlines). Because llvm uses bottom-up inliner, it is possible to e.g. import function `A`, `B` and then inline `B` to `A`, and after this `A` might be too big to be inlined into some other function that calls it. It calculates this statistic by building graph, where the nodes are functions, and edges are performed inlines and then by marking the edges starting from not imported function. If `Verbose` is set to true, then it also dumps statistics per each inlined function, sorted by the greatest inlines count like - number of performed inlines - number of performed inlines to importing module Reviewers: eraman, tejohnson, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D22491 llvm-svn: 277089	2016-07-29 00:27:16 +00:00
Adam Nemet	aa3506c5f0	[BPI] Add new LazyBPI analysis Summary: The motivation is the same as in D22141: In order to add the hotness attribute to optimization remarks we need BFI to be available in all passes that emit optimization remarks. BFI depends on BPI so unless we make this lazy as well we would still compute BPI unconditionally. The solution is to use the new LazyBPI pass in LazyBFI and only compute BPI when computation of BFI is requested by the client. I extended the laziness test using a LoopDistribute test to also cover BPI. Reviewers: hfinkel, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22835 llvm-svn: 277083	2016-07-28 23:31:12 +00:00
Vitaly Buka	0ab23cf1c8	Do not remove empty lifetime.start/lifetime.end ranges Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277072	2016-07-28 22:59:03 +00:00
Vitaly Buka	2fae6a7702	Should be committed as one CL. This reverts commits r277068 r277067 r277066. llvm-svn: 277071	2016-07-28 22:59:01 +00:00
Vitaly Buka	f0500b6ae5	Do not remove empty lifetime.start/lifetime.end ranges Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277068	2016-07-28 22:50:48 +00:00
Vitaly Buka	3645793872	maned llvm-svn: 277067	2016-07-28 22:50:45 +00:00
Michael Kuperstein	e45d4d9b35	[PM] Port LowerGuardIntrinsic to the new PM. llvm-svn: 277057	2016-07-28 22:08:41 +00:00
David Majnemer	3d32b7ed0d	[coroutines] Part 3 of N: Adding Boilerplate for Coroutine Passes This adds boilerplate code for all coroutine passes, the passes are no-ops for now. Also, a small test has been added to verify that passes execute in the expected order or not at all if coroutine support is disabled. Patch by Gor Nishanov! Differential Revision: https://reviews.llvm.org/D22847 llvm-svn: 277033	2016-07-28 21:04:31 +00:00
David Majnemer	19d024b2fd	[ConstantFolding] Don't bail on folding if ConstantFoldConstantExpression fails When folding an expression, we run ConstantFoldConstantExpression on each operand of that expression. However, ConstantFoldConstantExpression can fail and retur nullptr. Previously, we would bail on further refining the expression. Instead, use the original operand and see if we can refine a later operand. llvm-svn: 276959	2016-07-28 06:39:48 +00:00
David Majnemer	0be7155350	[InstCombine] Handle failures from ConstantFoldConstantExpression ConstantFoldConstantExpression returns null when folding fails. This fixes PR28745. llvm-svn: 276952	2016-07-28 02:29:06 +00:00
Wei Mi	315bb33f27	Fix the assertion error in collectLoopUniforms caused by empty Worklist before expanding. Contributed-by: David Callahan Differential Revision: https://reviews.llvm.org/D22886 llvm-svn: 276943	2016-07-27 23:53:58 +00:00
Justin Lebar	23a9686011	[LSV] Don't assume that bitcast ops are Instructions. Summary: When we ask the builder to create a bitcast on a constant, we get back a constant, not an instruction. Reviewers: asbirlea Subscribers: jholewinski, mzolotukhin, llvm-commits, arsenm Differential Revision: https://reviews.llvm.org/D22878 llvm-svn: 276922	2016-07-27 21:45:48 +00:00
Sebastian Pop	55c3007b88	GVN-hoist: improve code generation for recursive GEPs When loading or storing in a field of a struct like "a.b.c", GVN is able to detect the equivalent expressions, and GVN-hoist would fail in the code generation. This is because the GEPs are not hoisted as scalar operations to avoid moving the GEPs too far from their ld/st instruction when the ld/st is not movable. So we end up having to generate code for the GEP of a ld/st when we move the ld/st. In the case of a GEP referring to another GEP as in "a.b.c" we need to code generate all the GEPs necessary to make all the operands available at the new location for the ld/st. With this patch we recursively walk through the GEP operands checking whether all operands are available, and in the case of a GEP operand, it recursively makes all its operands available. Code generation happens from the inner GEPs out until reaching the GEP that appears as an operand of the ld/st. Differential Revision: https://reviews.llvm.org/D22599 llvm-svn: 276841	2016-07-27 05:48:12 +00:00
David Majnemer	bc36b15253	[ConstantFolding] Correctly handle failures in ConstantFoldConstantExpressionImpl Failures in ConstantFoldConstantExpressionImpl were ignored causing crashes down the line. This fixes PR28725. llvm-svn: 276827	2016-07-27 02:39:16 +00:00
Andrew Kaylor	f990fa5f7b	Reverting r276771 due to MSan failures. llvm-svn: 276824	2016-07-27 01:19:24 +00:00
David Majnemer	6774d612d4	[InstSimplify] Cast folding can be made more generic Use isEliminableCastPair to determine if a pair of casts are foldable. llvm-svn: 276777	2016-07-26 17:58:05 +00:00
Andrew Kaylor	3104a6bad0	Re-committing r275284: add support to inline __builtin_mempcpy Patch by Sunita Marathe Differential Revision: http://reviews.llvm.org/D21920 llvm-svn: 276771	2016-07-26 17:23:13 +00:00
David Majnemer	a90a621d1e	Reapply: [InstSimplify] Add support for bitcasts" This reverts commit r276700 and reapplies r276698. The relevant clang tests have been updated. llvm-svn: 276727	2016-07-26 05:52:29 +00:00
Sebastian Pop	91d4a30159	GVN-hoist: use a DFS numbering of instructions (PR28670) Instead of DFS numbering basic blocks we now DFS number instructions that avoids the costly operation of which instruction comes first in a basic block. Patch mostly written by Daniel Berlin. Differential Revision: https://reviews.llvm.org/D22777 llvm-svn: 276714	2016-07-26 00:15:10 +00:00
Evgeniy Stepanov	906f6fb565	[safestack] Fix stack guard live range. Stack guard slot is live throughout the function. llvm-svn: 276712	2016-07-26 00:05:14 +00:00
David Majnemer	6e06b577cc	Revert "[InstSimplify] Add support for bitcasts" This reverts commit r276698. Clang has tests which rely on the optimizer :( llvm-svn: 276700	2016-07-25 22:24:59 +00:00
David Majnemer	62611fd3f7	[InstSimplify] Add support for bitcasts BitCasts of BitCasts can be folded away as can BitCasts which don't change the type of the operand. llvm-svn: 276698	2016-07-25 22:04:58 +00:00
Matt Arsenault	7cddfed7e8	Scalarizer: Support scalarizing intrinsics llvm-svn: 276681	2016-07-25 20:02:54 +00:00
Evgeniy Stepanov	8d78bd5041	Fix invalid iterator use in safestack coloring. llvm-svn: 276676	2016-07-25 19:25:40 +00:00
Rong Xu	705f7775bb	[PGO] Fix profile mismatch in COMDAT function with pre-inliner Pre-instrumentation inline (pre-inliner) greatly improves the IR instrumentation code performance, among other benefits. One issue of the pre-inliner is it can introduce CFG-mismatch for COMDAT functions. This is due to the fact that the same COMDAT function may have different early inline decisions across different modules -- that means different copies of COMDAT functions will have different CFG checksum. In this patch, we propose a partially renaming the COMDAT group and its member function/variable so we have different profile counter for each version. We will post-fix the COMDAT function and the group name with its FunctionHash. Differential Revision: http://reviews.llvm.org/D22600 llvm-svn: 276673	2016-07-25 18:45:37 +00:00
Sean Silva	fe5abd5e0c	Fix : Partial Inliner requires AssumptionCacheTracker The public InlineFunction utility assumes that the passed in InlineFunctionInfo has a valid AssumptionCacheTracker. Patch by River Riddle! Differential Revision: https://reviews.llvm.org/D22706 llvm-svn: 276609	2016-07-25 05:00:00 +00:00
David Majnemer	68623a0e9f	[GVNHoist] Merge metadata on hoisted instructions less conservatively We can combine metadata from multiple instructions intelligently for certain metadata nodes. llvm-svn: 276602	2016-07-25 02:21:25 +00:00
David Majnemer	4728569d0a	[GVNHoist] Properly merge alignments when hoisting If we two loads of two different alignments, we must use the minimum of the two alignments when hoisting. Same deal for stores. For allocas, use the maximum of the two allocas. llvm-svn: 276601	2016-07-25 02:21:23 +00:00
Elena Demikhovsky	376a18bd92	[Loop Vectorizer] Handling loops FP induction variables. Allowed loop vectorization with secondary FP IVs. Like this: float *A; float x = init; for (int i=0; i < N; ++i) { A[i] = x; x -= fp_inc; } The auto-vectorization is possible when the induction binary operator is "fast" or the function has "unsafe" attribute. Differential Revision: https://reviews.llvm.org/D21330 llvm-svn: 276554	2016-07-24 07:24:54 +00:00
Sanjay Patel	1271bf9178	[InstCombine] allow icmp (bit-manipulation-intrinsic(), C) folds for vectors llvm-svn: 276523	2016-07-23 13:06:49 +00:00
Xinliang David Li	9239245401	[Profile] Use explicit flag to enable IR PGO Patch by Jake VanAdrighem Differential Revision: http://reviews.llvm.org/D22607 llvm-svn: 276516	2016-07-23 04:28:52 +00:00
Sanjay Patel	8d8594acb9	auto-generate checks llvm-svn: 276501	2016-07-23 00:09:54 +00:00
Adam Nemet	9e6e63fba2	[LoopDataPrefetch] Include hotness of region in opt remark llvm-svn: 276488	2016-07-22 22:53:17 +00:00
Sanjay Patel	e063ddb347	add tests for icmp vector folds llvm-svn: 276482	2016-07-22 22:19:52 +00:00
Michael Kuperstein	38e7298093	[SLPVectorizer] Vectorize reverse-order loads in horizontal reductions When vectorizing a tree rooted at a store bundle, we currently try to sort the stores before building the tree, so that the stores can be vectorized. For other trees, the order of the root bundle - which determines the order of all other bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads happens to appear in the wrong order, we will not vectorize it. This is partially mitigated when the root is a binary operator, by trying to build a "reversed" tree when that's considered profitable. This patch extends the workaround we have for binops to trees rooted in a horizontal reduction. This fixes PR28474. Differential Revision: https://reviews.llvm.org/D22554 llvm-svn: 276477	2016-07-22 21:28:48 +00:00
Sanjay Patel	cbc4377af1	add tests for icmp vector folds llvm-svn: 276476	2016-07-22 21:28:20 +00:00
Sanjay Patel	97e61dcc2d	add tests for icmp vector folds llvm-svn: 276475	2016-07-22 21:13:08 +00:00
Sanjay Patel	b73d7aed71	add tests for icmp vector folds llvm-svn: 276472	2016-07-22 21:02:33 +00:00
Sanjay Patel	859278005d	update to use FileCheck and auto-generate checks llvm-svn: 276466	2016-07-22 20:39:07 +00:00
Sanjay Patel	296a776a5b	add tests for icmp vector folds llvm-svn: 276464	2016-07-22 20:11:08 +00:00
Jun Bum Lim	6a7dc5c430	Recommit - [DSE]Enhance shorthening MemIntrinsic based on OverlapIntervals Recommiting r275571 after fixing crash reported in PR28270. Now we erase elements of IOL in deleteDeadInstruction(). Original Summary: This change use the overlap interval map built from partial overwrite tracking to perform shortening MemIntrinsics. Add test cases which was missing opportunities before. llvm-svn: 276452	2016-07-22 18:27:24 +00:00
Sanjay Patel	beaea95a0d	add tests for vector bit manipulation intrinsics llvm-svn: 276451	2016-07-22 18:22:25 +00:00
Anna Thomas	0be4a0e6a4	Invariant start/end intrinsics overloaded for address space Summary: The llvm.invariant.start and llvm.invariant.end intrinsics currently support specifying invariant memory objects only in the default address space. With this change, these intrinsics are overloaded for any adddress space for memory objects and we can use these llvm invariant intrinsics in non-default address spaces. Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr) This overloaded intrinsic is needed for representing final or invariant memory in managed languages. Reviewers: apilipenko, reames Subscribers: llvm-commits llvm-svn: 276447	2016-07-22 17:49:40 +00:00
David Majnemer	522a91181a	Don't remove side effecting instructions due to ConstantFoldInstruction Just because we can constant fold the result of an instruction does not imply that we can delete the instruction. It may have side effects. This fixes PR28655. llvm-svn: 276389	2016-07-22 04:54:44 +00:00
Sanjoy Das	aae623f4c2	[IRCE] Don't misuse CHECK-LABEL; NFC llvm-svn: 276373	2016-07-22 00:41:02 +00:00
Sanjoy Das	bb969791b4	[IRCE] Add an option to skip profitability checks If `-irce-skip-profitability-checks` is passed in, IRCE will kick in in all cases where it is legal for it to kick in. This flag is intended to help diagnose and analyse performance issues. llvm-svn: 276372	2016-07-22 00:40:56 +00:00
Sebastian Pop	31fd506623	GVH-hoist: only clone GEPs (PR28606) Do not clone stored values unless they are GEPs that are special cased to avoid hoisting them without hoisting their associated ld/st. Differential revision: https://reviews.llvm.org/D22652 llvm-svn: 276358	2016-07-21 23:22:10 +00:00
Wei Mi	1cf58f8996	[PM] Port NaryReassociate to the new PM Differential Revision: https://reviews.llvm.org/D22648 llvm-svn: 276349	2016-07-21 22:28:52 +00:00
Sanjay Patel	e9fc79bb13	[InstSimplify] don't crash handling a pointer or aggregate type llvm-svn: 276345	2016-07-21 21:56:00 +00:00
Sanjay Patel	a3bfb4e313	[InstSimplify] recognize trunc + icmp sgt/slt variants of select simplifications (PR28466) rL245171 exposed a hole in InstSimplify that manifested in a strange way in PR28466: https://llvm.org/bugs/show_bug.cgi?id=28466 It's possible to use trunc + icmp sgt/slt in place of an and + icmp eq/ne, so we need to recognize that pattern to eliminate selects that are choosing between some value and some bitmasked version of that value. Note that there is significant room for improvement (refactoring) and enhancement (more patterns, possibly in InstCombine rather than here). Differential Revision: https://reviews.llvm.org/D22537 llvm-svn: 276341	2016-07-21 21:26:45 +00:00
Adam Nemet	84a6425d61	[OptDiag,LDist] Convert remaining opt remarks to use the new API llvm-svn: 276340	2016-07-21 21:21:34 +00:00
Matthew Simpson	102729cf1b	[LV] Move vector int induction update to end of latch This patch moves the update instruction for vectorized integer induction phi nodes to the end of the latch block. This ensures consistent placement of all induction updates across all the kinds of int inductions we create (scalar, splat vector, or vector phi). Differential Revision: https://reviews.llvm.org/D22416 llvm-svn: 276339	2016-07-21 21:20:15 +00:00
Sanjay Patel	9eec550a2b	add vector tests and a simpler version of the negative tests llvm-svn: 276328	2016-07-21 20:11:08 +00:00
Anna Thomas	c858faa244	Revert "Invariant start/end intrinsics overloaded for address space" This reverts commit r276316. llvm-svn: 276320	2016-07-21 19:06:28 +00:00
Anna Thomas	29b24dfe44	Invariant start/end intrinsics overloaded for address space Summary: The llvm.invariant.start and llvm.invariant.end intrinsics currently support specifying invariant memory objects only in the default address space. With this change, these intrinsics are overloaded for any adddress space for memory objects and we can use these llvm invariant intrinsics in non-default address spaces. Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr) This overloaded intrinsic is needed for representing final or invariant memory in managed languages. Reviewers: tstellarAMD, reames, apilipenko Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22519 llvm-svn: 276316	2016-07-21 18:41:44 +00:00
David Majnemer	825e4ab9e3	[GVNHoist] Preserve optimization hints which agree If we have optimization hints with agree with each other along different paths, preserve them. llvm-svn: 276248	2016-07-21 07:16:26 +00:00
David Majnemer	4808f26422	[GVNHoist] Don't wrongly preserve TBAA We hoisted loads/stores without taking into account which can cause miscompiles. llvm-svn: 276240	2016-07-21 05:59:53 +00:00
Adam Nemet	7cfd5971ab	[OptDiag,LV] Add hotness attribute to applied-optimization remarks Test coverage is provided by modifying the function in the FP-math testcase that we are allowed to vectorize. llvm-svn: 276223	2016-07-21 01:07:13 +00:00
Sanjay Patel	0753c06d9c	[InstCombine] LogicOpc (zext X), C --> zext (LogicOpc X, C) (PR28476) The benefits of this change include: 1. Remove DeMorgan-matching code that was added specifically to work-around the missing transform in http://reviews.llvm.org/rL248634. 2. Makes the DeMorgan transform work for vectors too. 3. Fix PR28476: https://llvm.org/bugs/show_bug.cgi?id=28476 Extending this transform to other casts and other associative operators may be useful too. See https://reviews.llvm.org/D22421 for a prerequisite for doing that though. Differential Revision: https://reviews.llvm.org/D22271 llvm-svn: 276221	2016-07-21 00:24:18 +00:00
Adam Nemet	0e0e2d5d26	[OptDiag,LV] Add hotness attribute to the derived analysis remarks This includes FPCompute and Aliasing. Testcase is based on no_fpmath.ll. llvm-svn: 276211	2016-07-20 23:50:32 +00:00
Sanjay Patel	5f3c70307d	[InstSimplify][InstCombine] don't crash when folding vector selects of icmp Differential Revision: https://reviews.llvm.org/D22602 llvm-svn: 276209	2016-07-20 23:40:01 +00:00
Justin Lebar	cd564c6b46	[NVPTX] Enable the load-store vectorizer on nvptx. Reviewers: tra Subscribers: jholewinski, arsenm, asbirlea Differential Revision: https://reviews.llvm.org/D22592 llvm-svn: 276196	2016-07-20 22:11:36 +00:00
Adam Nemet	5b3a5cf6b0	[OptDiag,LV] Add hotness attribute to analysis remarks The earlier change added hotness attribute to missed-optimization remarks. This follows up with the analysis remarks (the ones explaining the reason for the missed optimization). llvm-svn: 276192	2016-07-20 21:44:26 +00:00
David Majnemer	bd21012c6c	[GVNHoist] Don't hoist PHI nodes We hoisted PHIs without respecting their special insertion point in the block, leading to verfier errors. This fixes PR28626. llvm-svn: 276181	2016-07-20 21:05:01 +00:00
Davide Italiano	15ff2d6d0c	[SCCP] Zap multiple return values. We can replace the return values with undef if we replaced all the call uses with a constant/undef. Differential Revision: https://reviews.llvm.org/D22336 llvm-svn: 276174	2016-07-20 20:17:13 +00:00
Justin Lebar	a272c12b73	[LSV] Don't move stores across may-load instrs, and loosen restrictions on moving loads. Summary: Previously we wouldn't move loads/stores across instructions that had side-effects, where that was defined as may-write or may-throw. But this is not sufficiently restrictive: Stores can't safely be moved across instructions that may load. This patch also adds a DEBUG check that all instructions in our chain are either loads or stores. Reviewers: asbirlea Subscribers: llvm-commits, jholewinski, arsenm, mzolotukhin Differential Revision: https://reviews.llvm.org/D22547 llvm-svn: 276171	2016-07-20 20:07:37 +00:00
Justin Lebar	62b03e344e	[LSV] Vectorize up to side-effecting instructions. Summary: Previously if we had a chain that contained a side-effecting instruction, we wouldn't vectorize it at all. Now we'll vectorize everything that comes before the side-effecting instruction. Reviewers: asbirlea Subscribers: arsenm, jholewinski, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22536 llvm-svn: 276170	2016-07-20 20:07:34 +00:00
Sanjay Patel	c0812702f8	minimize tests and auto-generate checks llvm-svn: 276147	2016-07-20 17:58:20 +00:00
Benjamin Kramer	b4d64cf27d	Revert "[InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))" Makes InstCombine infloop when compiling v8. This reverts commit r275989 and r276105. llvm-svn: 276106	2016-07-20 11:40:16 +00:00
Tobias Grosser	8c6201b49f	[InstCombine] Provide more test cases for cast-folding [NFC] Summary: In r275989 we enabled the folding of `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))`. Here we add more test cases to assure this folding works for all logical operations `and`/`or`/`xor`. Reviewers: grosser Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D22561 Contributed-by: Matthias Reisinger llvm-svn: 276105	2016-07-20 11:24:27 +00:00
Simon Pilgrim	1b4f511aaa	[X86][SSE] Add cost model values for CTPOP of vectors This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104	2016-07-20 10:41:28 +00:00
David Majnemer	a75736087d	Forgot to add a test for r276008. llvm-svn: 276082	2016-07-20 04:13:05 +00:00
Adam Nemet	67c8929a2c	[LV] Add hotness attribute to missed-optimization remarks The new OptimizationRemarkEmitter analysis pass is hooked up to both new and old PM passes. llvm-svn: 276080	2016-07-20 04:03:43 +00:00
Michael Zolotukhin	6bc56d552a	Revert "Revert r275883 and r275891. They seem to cause PR28608." This reverts commit r276064, and thus reapplies r275891 and r275883 with a fix for PR28608. llvm-svn: 276077	2016-07-20 01:55:27 +00:00
Justin Lebar	6114b37838	[LSV] Don't assume that loads/stores appear in address order in the BB. Summary: getVectorizablePrefix previously didn't work properly in the face of aliasing loads/stores. It unwittingly assumed that the loads/stores appeared in the BB in address order. If they didn't, it would do the wrong thing. Reviewers: asbirlea, tstellarAMD Subscribers: arsenm, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22535 llvm-svn: 276072	2016-07-20 00:55:12 +00:00
Sean Silva	554efb28d2	Revert r275883 and r275891. They seem to cause PR28608. Revert "[LoopSimplify] Update LCSSA after separating nested loops." This reverts commit r275891. Revert "[LCSSA] Post-process PHI-nodes created by SSAUpdate when constructing LCSSA form." This reverts commit r275883. llvm-svn: 276064	2016-07-19 23:54:29 +00:00
Sean Silva	e3c18a5ae8	[PM] Port LoopUnroll. We just set PreserveLCSSA to always true since we don't have an analogous method `mustPreserveAnalysisID(LCSSA)`. Also port LoopInfo verifier pass to test LoopUnrollPass. llvm-svn: 276063	2016-07-19 23:54:23 +00:00
Justin Lebar	8778c62629	[LSV] Insert stores at the right point. Summary: Previously, the insertion point for stores was the last instruction in Chain before calling getVectorizablePrefixEndIdx. Thus if getVectorizablePrefixEndIdx didn't return Chain.size(), we still would insert at the last instruction in Chain. This patch changes our internal API a bit in an attempt to make it less prone to this sort of error. As a result, we end up recalculating the Chain's boundary instructions, but I think worrying about the speed hit of this is a premature optimization right now. Reviewers: asbirlea, tstellarAMD Subscribers: mzolotukhin, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D22534 llvm-svn: 276056	2016-07-19 23:19:20 +00:00
Justin Lebar	d9446d3770	[LSV] Add detail to correct-order.ll test. Summary: This helps keep us honest -- there were a number of ways we could screw up and still have passed this test. Reviewers: asbirlea Subscribers: llvm-commits, arsenm Differential Revision: https://reviews.llvm.org/D22531 llvm-svn: 276053	2016-07-19 23:18:59 +00:00
Sanjay Patel	d4ea94eb94	regenerate checks llvm-svn: 276042	2016-07-19 22:32:15 +00:00
Sanjay Patel	2d477e59e8	[InstCombine] fold add(zext(xor X, C), C) --> sext X when C is INT_MIN in the source type The pattern may look more obviously like a sext if written as: define i32 @g(i16 %x) { %zext = zext i16 %x to i32 %xor = xor i32 %zext, 32768 %add = add i32 %xor, -32768 ret i32 %add } We already have that fold in visitAdd(). Differential Revision: https://reviews.llvm.org/D22477 llvm-svn: 276035	2016-07-19 22:09:34 +00:00
Sanjay Patel	47c04f9543	add even more missing tests for simplifySelectBitTest() llvm-svn: 276024	2016-07-19 20:47:00 +00:00
David Majnemer	5246e0b2c2	[FunctionAttrs] Correct the safety analysis for inference of 'returned' We skipped over ReturnInsts which didn't return an argument which would lead us to incorrectly conclude that an argument returned by another ReturnInst was 'returned'. This reverts commit r275756. This fixes PR28610. llvm-svn: 276008	2016-07-19 18:50:26 +00:00
David Majnemer	07ea344222	Add a testcase for r275581 llvm-svn: 276002	2016-07-19 17:52:41 +00:00
Sanjay Patel	8b76ebe5b8	add tests related to PR28466 llvm-svn: 275995	2016-07-19 17:07:35 +00:00
Sanjay Patel	d2ff6d727f	add missing test for simplifySelectBitTest() llvm-svn: 275990	2016-07-19 16:49:55 +00:00
Tobias Grosser	1c38262279	[InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp)) Summary: Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one: > Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp. Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check: `if ((!isa<ICmpInst>(Cast0Src) \|\| !isa<ICmpInst>(Cast1Src)) && ...` This check seems to sort out more cases than necessary because: - the reverse transformation is obviously done for `or` instructions only - and also not every `zext icmp` pair is necessarily the result of this reverse transformation Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`). As an example, consider the following IR snippet ``` %1 = icmp sgt i64 %a, %b %2 = zext i1 %1 to i8 %3 = icmp slt i64 %a, %c %4 = zext i1 %3 to i8 %5 = and i8 %2, %4 ``` which would now be transformed to ``` %1 = icmp sgt i64 %a, %b %2 = icmp slt i64 %a, %c %3 = and i1 %1, %2 %4 = zext i1 %3 to i8 ``` This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code. Reviewers: grosser, vtjnash, majnemer Subscribers: majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D22511 Contributed-by: Matthias Reisinger llvm-svn: 275989	2016-07-19 16:39:17 +00:00
Simon Pilgrim	0ea8d275cc	[X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead. It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match). This patch changes both scalar and packed versions back to using x86-specific builtins. It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding. A companion clang patch is at D22105 Differential Revision: https://reviews.llvm.org/D22106 llvm-svn: 275981	2016-07-19 15:07:43 +00:00
George Burgess IV	5f30897b7b	[MemorySSA] Update to the new shiny walker. This patch updates MemorySSA's use-optimizing walker to be more accurate and, in some cases, faster. Essentially, this changed our core walking algorithm from a cache-as-you-go DFS to an iteratively expanded DFS, with all of the caching happening at the end. Said expansion happens when we hit a Phi, P; we'll try to do the smallest amount of work possible to see if optimizing above that Phi is legal in the first place. If so, we'll expand the search to see if we can optimize to the next phi, etc. An iteratively expanded DFS lets us potentially quit earlier (because we don't assume that we can optimize above all phis) than our old walker. Additionally, because we don't cache as we go, we can now optimize above loops. As an added bonus, this patch adds a ton of verification (if EXPENSIVE_CHECKS are enabled), so finding bugs is easier. Differential Revision: https://reviews.llvm.org/D21777 llvm-svn: 275940	2016-07-19 01:29:15 +00:00
Wei Mi	79997a24d7	Recommit the patch "Use uniforms set to populate VecValuesToIgnore". For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 The recommit fixed the testcase global_alias.ll. llvm-svn: 275936	2016-07-19 00:50:43 +00:00
Sanjoy Das	ab73c9d88e	[LoopReroll] Reroll loops with unordered atomic memory accesses Reviewers: hfinkel, jfb, reames Subscribers: mcrosier, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22385 llvm-svn: 275932	2016-07-19 00:23:54 +00:00
Dehao Chen	6132ee8502	[PM] Convert Loop Strength Reduce pass to new PM Summary: Convert Loop String Reduce pass to new PM Reviewers: davidxl, silvas Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D22468 llvm-svn: 275919	2016-07-18 21:41:50 +00:00
Teresa Johnson	2124157102	[PM] Port FunctionImport Pass to new PM Summary: Port FunctionImport Pass to new PM. Reviewers: mehdi_amini, davide Subscribers: davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D22475 llvm-svn: 275916	2016-07-18 21:22:24 +00:00
Wei Mi	f9afff71a2	Revert rL275912. llvm-svn: 275915	2016-07-18 21:14:43 +00:00
Wei Mi	1fd25726af	Use uniforms set to populate VecValuesToIgnore. For instructions in uniform set, they will not have vector versions so add them to VecValuesToIgnore. For induction vars, those only used in uniform instructions or consecutive ptrs instructions have already been added to VecValuesToIgnore above. For those induction vars which are only used in uniform instructions or non-consecutive/non-gather scatter ptr instructions, the related phi and update will also be added into VecValuesToIgnore set. The change will make the vector RegUsages estimation less conservative. Differential Revision: https://reviews.llvm.org/D20474 llvm-svn: 275912	2016-07-18 20:59:53 +00:00
Sanjay Patel	dbf44f5016	add tests for missed sext transform llvm-svn: 275908	2016-07-18 20:37:51 +00:00
Sanjay Patel	8a2bf3099f	auto-generate checks llvm-svn: 275899	2016-07-18 20:06:51 +00:00
Michael Zolotukhin	ea5b72825b	[LoopSimplify] Update LCSSA after separating nested loops. Summary: Usually LCSSA survives this transformation, but in some cases (see attached test) it doesn't: values from the original loop after separating might be used from the outer loop. Before the transformation it was the same loop, so LCSSA phis were not required. This fixes PR28272. Reviewers: sanjoy, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21665 llvm-svn: 275891	2016-07-18 19:44:19 +00:00
Michael Zolotukhin	7a3040dc83	[LCSSA] Post-process PHI-nodes created by SSAUpdate when constructing LCSSA form. Summary: SSAUpdate might insert PHI-nodes inside loops, which can break LCSSA form unless we fix it up. This fixes PR28424. Reviewers: sanjoy, chandlerc, hfinkel Subscribers: uabelho, llvm-commits Differential Revision: http://reviews.llvm.org/D21997 llvm-svn: 275883	2016-07-18 19:05:08 +00:00
Adam Nemet	d6ba0bf831	[LoopDist] This test does not require ASSERTS Only its counterpart, diagnostics-with-hotness-lazy-BFI.ll, which invokes opt with -debug-only=. llvm-svn: 275812	2016-07-18 16:37:32 +00:00
Adam Nemet	b2593f78ca	[LoopDist] Port to new PM Summary: The direct motivation for the port is to ensure that the OptRemarkEmitter tests work with the new PM. This remains a function pass because we not only create multiple loops but could also version the original loop. In the test I need to invoke opt with -passes='require<aa>,loop-distribute'. LoopDistribute does not directly depend on AA however LAA does. LAA uses getCachedResult so I think we need manually pull in 'aa'. Reviewers: davidxl, silvas Subscribers: sanjoy, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22437 llvm-svn: 275811	2016-07-18 16:29:27 +00:00
Alexander Kornienko	63dd36faa5	Revert "r275571 [DSE]Enhance shorthening MemIntrinsic based on OverlapIntervals" Causes https://llvm.org/bugs/show_bug.cgi?id=28588 llvm-svn: 275801	2016-07-18 15:51:31 +00:00
Simon Pilgrim	1b2ab113fb	[SLPVectorizer][X86] Added sqrt vectorization tests llvm-svn: 275788	2016-07-18 13:20:54 +00:00
David Majnemer	04c7c225a1	[GVNHoist] Change the key for VNtoInsns to a pair While debugging GVNHoist, I found it confusing that the entries in a VNtoInsns were not always value numbers. They _usually_ were except for StoreInst in which case they were a hash of two different value numbers. This leads to two observations: - It is more difficult to debug things when the semantic contents of VNtoInsns changes over time. - Using a single value number is not much cheaper, the value of VNtoInsns is a SmallVector. - It is not immediately clear what the algorithm would do if there were hash collisions in the StoreInst case. Using a DenseMap of std::pair sidesteps all of this. N.B. The changes in the test were due their sensitivity to the iteration order of VNtoInsns which has changed. llvm-svn: 275761	2016-07-18 06:11:37 +00:00
NAKAMURA Takumi	966bde50c3	Revert r275678, "Revert "Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute"" This reverts also r275029, "Update Clang tests after adding inference for the returned argument attribute" It broke LTO build. Seems miscompilation. llvm-svn: 275756	2016-07-18 03:23:25 +00:00
Davide Italiano	4edd54794b	[GVN] Move other PRE tests to a subdirectory. llvm-svn: 275742	2016-07-17 23:55:20 +00:00
Davide Italiano	ed8e0881c1	[GVN] Move the PRE/LOADPRE test in a subdirectory. llvm-svn: 275741	2016-07-17 23:48:18 +00:00
Davide Italiano	6a69f829bd	[GVN] Use FileCheck instead of grep for tests. llvm-svn: 275739	2016-07-17 23:21:26 +00:00
Teresa Johnson	cd21a646f6	[ThinLTO] Perform profile-guided indirect call promotion Summary: To enable profile-guided indirect call promotion in ThinLTO mode, we simply add call graph edges for each profitable target from the profile to the summaries, then the summary-guided importing will consider the callee for importing as usual. Also we need to enable the indirect call promotion pass creation in the PassManagerBuilder when PerformThinLTO=true (we are in the ThinLTO backend), so that the newly imported functions are considered for promotion in the backends. The IC promotion profiles refer to callees by GUID, which required adding GUIDs to the per-module VST in bitcode (and assigning them valueIds similar to how they are assigned valueIds in the combined index). Reviewers: mehdi_amini, xur Subscribers: mehdi_amini, davidxl, llvm-commits Differential Revision: http://reviews.llvm.org/D21932 llvm-svn: 275707	2016-07-17 14:47:01 +00:00
Dehao Chen	1a44452b11	[PM] Convert IVUsers analysis to new pass manager. Summary: Convert IVUsers analysis to new pass manager. Reviewers: davidxl, silvas Subscribers: junbuml, sanjoy, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D22434 llvm-svn: 275698	2016-07-16 22:51:33 +00:00
Sanjay Patel	79acd2a96b	[InstCombine] allow X + signbit --> X ^ signbit for vector splats llvm-svn: 275691	2016-07-16 18:29:26 +00:00
Sanjay Patel	040bd16e56	add vector test to show missing transform llvm-svn: 275690	2016-07-16 18:24:18 +00:00
Sanjay Patel	eb50476f77	update tests to use FileCheck, consolidate tests, fix comments llvm-svn: 275688	2016-07-16 18:08:22 +00:00
Sanjay Patel	e540a31f59	update test to use FileCheck llvm-svn: 275687	2016-07-16 16:31:58 +00:00
Sanjay Patel	972a53fb42	auto-generate checks llvm-svn: 275686	2016-07-16 16:27:58 +00:00
Sanjay Patel	309f98ef3a	auto-ggenerate checks llvm-svn: 275685	2016-07-16 16:24:06 +00:00
Sanjay Patel	f9d2b20daf	[InstCombine] reassociate logic ops with constants separated by a zext This is a partial implementation of a general fold for associative+commutative operators: (op (cast (op X, C2)), C1) --> (cast (op X, op (C1, C2))) (op (cast (op X, C2)), C1) --> (op (cast X), op (C1, C2)) There are 7 associative operators and 13 cast types, so this could potentially go a lot further. Differential Revision: https://reviews.llvm.org/D22421 llvm-svn: 275684	2016-07-16 15:20:19 +00:00
Hal Finkel	660096b260	Revert "Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute" This reverts commit r275042; the initial commit triggered self-hosting failures on ARM/AArch64. James Molloy identified the problematic backend code, which has been disabled in r275677. Trying again... Original commit message: Let FuncAttrs infer the 'returned' argument attribute A function can have one argument with the 'returned' attribute, indicating that the associated argument is always the return value of the function. Add FuncAttrs inference logic. llvm-svn: 275678	2016-07-16 07:21:28 +00:00
Matt Arsenault	93be6e8c0a	StructurizeCFG: Fix inverting constantexpr conditions llvm-svn: 275626	2016-07-15 22:13:16 +00:00
Jingyue Wu	2b353a9522	[ReassociateGEP] Update tests to allow missing "inbounds" on certain GEPs. With r275532 fixing miscompilation of GVN, "inbounds" on certain GEPs in these tests cannot be preserved any more. Left a TODO in the tests for future reference. llvm-svn: 275596	2016-07-15 18:47:17 +00:00
Sanjay Patel	27fefb2fcf	add tests for associative ops blocked by a cast These are more generalized versions of the cases added in r275302 and r275297. llvm-svn: 275594	2016-07-15 18:39:02 +00:00
Rong Xu	96a19d35ae	[PGO] IRPGO pre-cleanup pass changes This patch adds a selected set of cleanup passes including a pre-inline pass before LLVM IR PGO instrumentation. The inline is only intended to apply those obvious/trivial ones before instrumentation so that much less instrumentation is needed to get better profiling information. This will drastically improve the instrumented code performance for large C++ applications. Another benefit is the context sensitive counts that can potentially improve the PGO optimization. Differential Revision: http://reviews.llvm.org/D21405 llvm-svn: 275588	2016-07-15 18:10:49 +00:00
Adam Nemet	aad816083e	[OptRemark,LDist] RFC: Add hotness attribute Summary: This is the first set of changes implementing the RFC from http://thread.gmane.org/gmane.comp.compilers.llvm.devel/98334 This is a cross-sectional patch; rather than implementing the hotness attribute for all optimization remarks and all passes in a patch set, it implements it for the 'missed-optimization' remark for Loop Distribution. My goal is to shake out the design issues before scaling it up to other types and passes. Hotness is computed as an integer as the multiplication of the block frequency with the function entry count. It's only printed in opt currently since clang prints the diagnostic fields directly. E.g.: remark: /tmp/t.c:3:3: loop not distributed: use -Rpass-analysis=loop-distribute for more info (hotness: 300) A new API added is similar to emitOptimizationRemarkMissed. The difference is that it additionally takes a code region that the diagnostic corresponds to. From this, hotness is computed using BFI. The new API is exposed via an analysis pass so that it can be made dependent on LazyBFI. (Thanks to Hal for the analysis pass idea.) This feature can all be enabled by setDiagnosticHotnessRequested in the LLVM context. If this is off, LazyBFI is not calculated (D22141) so there should be no overhead. A new command-line option is added to turn this on in opt. My plan is to switch all user of emitOptimizationRemark* to use this module instead. Reviewers: hfinkel Subscribers: rcox2, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D21771 llvm-svn: 275583	2016-07-15 17:23:20 +00:00
Jun Bum Lim	a5737d8eac	[DSE]Enhance shorthening MemIntrinsic based on OverlapIntervals Summary: This change use the overlap interval map built from partial overwrite tracking to perform shortening MemIntrinsics. Add test cases which was missing opportunities before. Reviewers: hfinkel, eeckstein, mcrosier Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D21909 llvm-svn: 275571	2016-07-15 16:14:34 +00:00
Sebastian Pop	4177480aad	code hoisting pass based on GVN This pass hoists duplicated computations in the program. The primary goal of gvn-hoist is to reduce the size of functions before inline heuristics to reduce the total cost of function inlining. Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki. Important algorithmic contributions by Daniel Berlin under the form of reviews. Differential Revision: http://reviews.llvm.org/D19338 llvm-svn: 275561	2016-07-15 13:45:20 +00:00
David Majnemer	959a6623b5	XFAIL two SeparateConstOffsetFromGEP tests They appear to have relied on bugs hidden in copyIRFlags/andIRFlags. This has been filed as PR28564. llvm-svn: 275533	2016-07-15 05:37:22 +00:00
David Majnemer	92f84ccf0f	[IR] andIRFlags and copyIRFlags needs to handle GEP We didn't consider the inbounds flag on GEPs leading to downstream users introducing UB. This fixes PR28562. llvm-svn: 275532	2016-07-15 05:02:31 +00:00
Adam Nemet	74730d9ab0	[LoopDist] Fix typo in diagnostic llvm-svn: 275495	2016-07-14 22:33:46 +00:00
Ekaterina Romanova	7aea5906c0	[GVN] Fold constant expression in GVN. Fix for PR 28418. opt never finishes compiling a test when -gvn option is passed. The problem is caused by the fact that GVN fails to fold a constant expression. Differential Revision: https://reviews.llvm.org/D22185 llvm-svn: 275483	2016-07-14 22:02:25 +00:00
Matthew Simpson	65ca32b83c	[LV] Allow interleaved accesses in loops with predicated blocks This patch allows the formation of interleaved access groups in loops containing predicated blocks. However, the predicated accesses are prevented from forming groups. Differential Revision: https://reviews.llvm.org/D19694 llvm-svn: 275471	2016-07-14 20:59:47 +00:00
Sanjoy Das	13623ad009	[JumpThreading] PRE unordered loads Summary: Extend JumpThreading's PRE to unordered atomic loads. Reviewers: hfinkel, reames Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D22326 llvm-svn: 275456	2016-07-14 19:21:15 +00:00
Jun Bum Lim	c837af306e	[PM] Port Dead Loop Deletion Pass to the new PM Summary: Port Dead Loop Deletion Pass to the new pass manager. Reviewers: silvas, davide Subscribers: llvm-commits, sanjoy, mcrosier Differential Revision: https://reviews.llvm.org/D21483 llvm-svn: 275453	2016-07-14 18:28:29 +00:00
Nico Weber	755cd760cd	Revert r275401, it caused PR28551. llvm-svn: 275420	2016-07-14 14:41:25 +00:00
Matthew Simpson	3c3b4a257b	[LV] Avoid unnecessary IV scalar-to-vector-to-scalar conversions This patch prevents increases in the number of instructions, pre-instcombine, due to induction variable scalarization. An increase in instructions can lead to an increase in the compile-time required to simplify the induction variables. We now maintain a new map for scalarized induction variables to prevent us from converting between the scalar and vector forms. This patch should resolve compile-time regressions seen after r274627. llvm-svn: 275419	2016-07-14 14:36:06 +00:00
Sjoerd Meijer	716abbb2f5	This converts a signed remainder instruction to unsigned remainder, which enables the code size optimisation to fold a rem and div into a single aeabi_uidivmod call. This was not happening before because sdiv was converted but srem not, and instructions with different signedness are not combined. Differential Revision: http://reviews.llvm.org/D22214 llvm-svn: 275403	2016-07-14 12:23:48 +00:00
Sebastian Pop	63847d04e7	code hoisting pass based on GVN This pass hoists duplicated computations in the program. The primary goal of gvn-hoist is to reduce the size of functions before inline heuristics to reduce the total cost of function inlining. Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki. Important algorithmic contributions by Daniel Berlin under the form of reviews. Differential Revision: http://reviews.llvm.org/D19338 llvm-svn: 275401	2016-07-14 12:18:53 +00:00
Sjoerd Meijer	38c2cd0c14	This implements a more optimal algorithm for selecting a base constant in constant hoisting. It not only takes into account the number of uses and the cost of expressions in which constants appear, but now also the resulting integer range of the offsets. Thus, the algorithm maximizes the number of uses within an integer range that will enable more efficient code generation. On ARM, for example, this will enable code size optimisations because less negative offsets will be created. Negative offsets/immediates are not supported by Thumb1 thus preventing more compact instruction encoding. Differential Revision: http://reviews.llvm.org/D21183 llvm-svn: 275382	2016-07-14 07:44:20 +00:00
David Majnemer	666aa945a5	[InstCombine] Masked loads with undef masks can fold to normal loads We were able to fold masked loads with an all-ones mask to a normal load. However, we couldn't turn a masked load with a mask with mixed ones and undefs into a normal load. llvm-svn: 275380	2016-07-14 06:58:42 +00:00
David Majnemer	17a95aaa7b	Simplify llvm.masked.load w/ undef masks We can always pick the passthru value if the mask is undef: we are permitted to treat the mask as-if it were filled with zeros. llvm-svn: 275379	2016-07-14 06:58:37 +00:00
Davide Italiano	7dac027ed7	[IPSCCP] Constant fold struct argument/instructions when all the lattice values are constant. This now should also work with the interprocedural variant of the pass. Slightly easier now that the yak is shaved. Differential Revision: http://reviews.llvm.org/D22329 llvm-svn: 275363	2016-07-14 02:51:41 +00:00
Mehdi Amini	8484f92f7f	[Scalarizer] PR28108: Skip over nullptr rather than crashing on it. Summary: In Scalarizer::gather we see if we already have a scattered form of Op, and in that case use the new form. In the particular case of PR28108, the found ValueVector SV has size 2, where the first Value is nullptr, and the second is indeed a proper Value. The nullptr then caused an assert to blow when we tried to do cast<Instruction>(SV[I]). With this patch we check SV[I] before doing the cast, and if it's nullptr we just skip over it. I don't know the Scalarizer well enough to know if this is the best fix or if something should be done else where to prevent the nullptr from being in the ValueVector at all, but at least this avoids the crash and looking at the test case output it looks reasonable. Reviewers: hfinkel, frasercrmck, wala, mehdi_amini Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21518 llvm-svn: 275359	2016-07-14 01:31:25 +00:00
David Majnemer	7f781aba97	[ConstantFolding] Fold masked loads We can constant fold a masked load if the operands are appropriately constant. Differential Revision: http://reviews.llvm.org/D22324 llvm-svn: 275352	2016-07-14 00:29:50 +00:00
David Majnemer	f89660aba7	[ConstantFolding] Extend FoldReinterpretLoadFromConstPtr to handle negative offsets Treat loads which clip before the start of a global initializer the same way we treat clipping beyond the end of the initializer: use zeros. llvm-svn: 275345	2016-07-13 23:33:07 +00:00
Alina Sbirlea	640a61cd8b	Extended LoadStoreVectorizer to vectorize subchains. Summary: LSV used to abort vectorizing a chain for interleaved load/store accesses that alias. Allow a valid prefix of the chain to be vectorized, mark just the prefix and retry vectorizing the remaining chain. Reviewers: llvm-commits, jlebar, arsenm Subscribers: mzolotukhin Differential Revision: http://reviews.llvm.org/D22119 llvm-svn: 275317	2016-07-13 21:20:01 +00:00
Andrew Kaylor	346dd7f1bd	Reverting r275284 due to platform-specific test failures llvm-svn: 275304	2016-07-13 19:09:16 +00:00
Sanjay Patel	eff2aa70fc	add more tests for zexty xor sandwiches ...mmm sandwiches llvm-svn: 275302	2016-07-13 18:58:55 +00:00
Sanjay Patel	904a88025a	add test for zexty xor sandwich llvm-svn: 275297	2016-07-13 18:40:38 +00:00
Sanjay Patel	c00e48a3db	[InstCombine] extend vector select matching for non-splat constants In D21740, we discussed trying to make this a more general matcher. However, I didn't see a clean way to handle the regular m_Not cases and these non-splat vector patterns, so I've opted for the direct approach here. If there are other potential uses of areInverseVectorBitmasks(), we could move that helper function to a higher level. There is an open question as to which is of these forms should be considered the canonical IR: %sel = select <4 x i1> <i1 true, i1 false, i1 false, i1 true>, <4 x i32> %a, <4 x i32> %b %shuf = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 0, i32 5, i32 6, i32 3> Differential Revision: http://reviews.llvm.org/D22114 llvm-svn: 275289	2016-07-13 18:07:02 +00:00
Andrew Kaylor	12cccdd731	Fix for Bug 26903, adds support to inline __builtin_mempcpy Patch by Sunita Marathe Differential Revision: http://reviews.llvm.org/D21920 llvm-svn: 275284	2016-07-13 17:25:11 +00:00
David Majnemer	1b3db33e3d	[ConstantFolding] Don't treat negative GEP offsets as positive GEP offsets are signed, don't treat them as huge positive numbers. llvm-svn: 275251	2016-07-13 05:16:16 +00:00
Dehao Chen	9cba1f4e7e	New pass manager for LICM. Summary: Port LICM to the new pass manager. Reviewers: davidxl, silvas Subscribers: krasin, vitalybuka, silvas, davide, sanjoy, llvm-commits, mehdi_amini Differential Revision: http://reviews.llvm.org/D21772 llvm-svn: 275222	2016-07-12 22:37:48 +00:00
Michael Kuperstein	a99c46cc73	[LV] Remove wrong assumption about LCSSA The LCSSA pass itself will not generate several redundant PHI nodes in a single exit block. However, such redundant PHI nodes don't violate LCSSA form, and may be introduced by passes that preserve LCSSA, and/or preserved by the LCSSA pass itself. So, assuming a single PHI node per exit block is not safe. llvm-svn: 275217	2016-07-12 21:24:06 +00:00
Davide Italiano	0080269342	[SCCP] Constant fold structs if all the lattice value are constant. Differential Revision: http://reviews.llvm.org/D22269 llvm-svn: 275208	2016-07-12 19:54:19 +00:00
Dehao Chen	b9f8e29290	[PM] Port LoopIdiomRecognize Pass to new PM Summary: Port LoopIdiomRecognize Pass to new PM Reviewers: davidxl Subscribers: davide, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D22250 llvm-svn: 275202	2016-07-12 18:45:51 +00:00
Xinliang David Li	9eb472ba4b	[PGO] Don't include full file path in static function profile counter names Patch by Jake VanAdrighem Differential Revision: http://reviews.llvm.org/D22028 llvm-svn: 275193	2016-07-12 17:14:51 +00:00
Sanjay Patel	4a6a751dce	add tests for missing DeMorgan's Law folds llvm-svn: 275192	2016-07-12 17:05:04 +00:00
Sanjay Patel	3900191ecc	auto-generate checks llvm-svn: 275188	2016-07-12 16:21:55 +00:00
Sanjay Patel	93dffe629a	auto-generate checks llvm-svn: 275187	2016-07-12 16:17:30 +00:00
Sanjay Patel	6d1f227e6b	auto-generate checks llvm-svn: 275186	2016-07-12 16:13:04 +00:00
Vitaly Buka	204dc533c5	Revert "New pass manager for LICM." Summary: This reverts commit r275118. Subscribers: sanjoy, mehdi_amini Differential Revision: http://reviews.llvm.org/D22259 llvm-svn: 275156	2016-07-12 06:25:32 +00:00
Ivan Krasin	5474645dc8	Print remarks from WholeProgramDevirt pass for each call site. Summary: It's useful to have some visibility about which call sites are devirtualized, especially for debug purposes. Another use case is a regression test on the application side (like, Chromium). Reviewers: pcc Differential Revision: http://reviews.llvm.org/D22252 llvm-svn: 275145	2016-07-12 02:38:37 +00:00
Dehao Chen	7ef5820fa3	New pass manager for LICM. Summary: Port LICM to the new pass manager. Reviewers: davidxl, silvas Subscribers: silvas, davide, sanjoy, llvm-commits, mehdi_amini Differential Revision: http://reviews.llvm.org/D21772 llvm-svn: 275118	2016-07-11 22:45:24 +00:00
Alina Sbirlea	cbc6ac2afd	Correct ordering of loads/stores. Summary: Aiming to correct the ordering of loads/stores. This patch changes the insert point for loads to the position of the first load. It updates the ordering method for loads to insert before, rather than after. Before this patch the following sequence: "load a[1], store a[1], store a[0], load a[2]" Would incorrectly vectorize to "store a[0,1], load a[1,2]". The correctness check was assuming the insertion point for loads is at the position of the first load, when in practice it was at the last load. An alternative fix would have been to invert the correctness check. The current fix changes insert position but also requires reordering of instructions before the vectorized load. Updated testcases to reflect the changes. Reviewers: tstellarAMD, llvm-commits, jlebar, arsenm Subscribers: mzolotukhin Differential Revision: http://reviews.llvm.org/D22071 llvm-svn: 275117	2016-07-11 22:34:29 +00:00
Michael Kuperstein	f0c59330e9	[X86] Make some cast costs more precise Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106	2016-07-11 21:39:44 +00:00
Alina Sbirlea	327955e057	Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains. Add additional parameters: AddressSpace, Alignment, Fast. Reviewers: llvm-commits, jlebar Subscribers: arsenm, mzolotukhin Differential Revision: http://reviews.llvm.org/D21935 llvm-svn: 275100	2016-07-11 20:46:17 +00:00
Jingyue Wu	641cfee976	[SLSR] Call getPointerSizeInBits with the correct address space. llvm-svn: 275083	2016-07-11 18:13:28 +00:00
Davide Italiano	e8ae0b5eb4	[PM/IPO] Port LowerTypeTests to the new PassManager. There's a little bit of churn in this patch because the initialization mechanism is now shared between the old and the new PM. Other than that, it's just a pretty mechanical translation. llvm-svn: 275082	2016-07-11 18:10:06 +00:00
Dehao Chen	9232f98279	Implement callsite-hotness based inline cost for Sample-based PGO Summary: For sample-based PGO, using BFI to calculate callsite count is sometime not accurate. This is because with sampling based approach, if a callsite resides in a hot loop deeply nested in a bunch of cold branches, the callsite's BFI frequency would be inaccurately calculated due to lack of samples in the cold branch. E.g. if (A1 && A2 && A3 && ..... && A10) { for (i=0; i < 100000000; i++) { callsite(); } } Assume that A1 to A100 are all 100% taken, and callsite has 1000 samples and thus is considerred hot. Because the loop's trip count is huge, it's normal that all branches outside the loop has no sample at all. As a result, we can only use static branch probability to derive the the frequency of the loop header. Assuming that static heuristic thinks each branch is 50% taken, then the count calculated from BFI will be 1/(2^10) of the actual value. In order to get more accurate callsite count, we directly annotate the weight on the call instruction, and directly use it when checking callsite hotness. Note that this mechanism can also be shared by instrumentation based callsite hotness analysis. The side benefit is that it breaks the dependency from Inliner to BFI as call count is embedded in the IR. Reviewers: davidxl, eraman, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22118 llvm-svn: 275073	2016-07-11 16:48:54 +00:00
Dehao Chen	29d2641f52	Tune the weight propagation algorithm for sample profile. Summary: Handle the case when there is only one incoming/outgoing edge for a visited basic block: use the block weight to adjust edge weight even when the edge has been visited before. This can help reduce inaccuracies introduced by incorrect basic block profile, as shown in the updated unittest. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D22180 llvm-svn: 275072	2016-07-11 16:40:17 +00:00
Nicolai Haehnle	889a20cf40	[Sink] Don't move calls to readonly functions across stores Summary: Reviewers: hfinkel, majnemer, tstellarAMD, sunfish Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17279 llvm-svn: 275066	2016-07-11 14:11:51 +00:00
Hal Finkel	02012bcfee	Revert r275027 - Let FuncAttrs infer the 'returned' argument attribute Reverting r275027 and r275033. These seem to cause miscompiles on the AArch64 buildbot. llvm-svn: 275042	2016-07-11 04:51:23 +00:00
Hal Finkel	2cac58f604	Pointer-comparison folding should look through returned-argument functions For functions which are known to return a specific argument, pointer-comparison folding can look through the function calls as part of its analysis. Differential Revision: http://reviews.llvm.org/D9387 llvm-svn: 275039	2016-07-11 03:37:59 +00:00
Hal Finkel	6fd5e1f02b	Teach computeKnownBits to look through returned-argument functions If a function is known to return one of its arguments, we can use that in order to compute known bits of the return value. Differential Revision: http://reviews.llvm.org/D9397 llvm-svn: 275036	2016-07-11 02:25:14 +00:00
Hal Finkel	d66a7b05db	Let FuncAttrs infer the 'returned' argument attribute A function can have one argument with the 'returned' attribute, indicating that the associated argument is always the return value of the function. Add FuncAttrs inference logic. Differential Revision: http://reviews.llvm.org/D22202 llvm-svn: 275027	2016-07-10 22:02:55 +00:00
Sean Silva	db90d4d9c1	[PM] Port LoopVectorize to the new PM. llvm-svn: 275000	2016-07-09 22:56:50 +00:00
Jingyue Wu	debce55ac3	[SLSR] Fix crash on handling 128-bit integers. ConstantInt::getSExtValue may fail on >64-bit integers. Add checks to call getSExtValue only on narrow integers. As a minor aside, simplify slsr-gep.ll to remove unnecessary load instructions. llvm-svn: 274982	2016-07-09 19:13:18 +00:00
Davide Italiano	92b933a55c	[PM] Port CrossDSOCFI to the new pass manager. llvm-svn: 274962	2016-07-09 03:25:35 +00:00
Davide Italiano	cd96cfd8df	[PM] Port LoopSimplify to the new pass manager. While here move simplifyLoop() function to the new header, as suggested by Chandler in the review. Differential Revision: http://reviews.llvm.org/D21404 llvm-svn: 274959	2016-07-09 03:03:01 +00:00
Piotr Padlewski	3b77612839	Add 'thinlto_src_module' md with asserts or -enable-import-metadata Summary: This way the metadata will be only generated when asserts enabled, or when -enable-import-metadata specified FIXED missing colon on requires. Reviewers: tejohnson, eraman, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D22167 llvm-svn: 274947	2016-07-08 23:01:49 +00:00
Piotr Padlewski	d4b792346c	Revert "Add 'thinlto_src_module' md with asserts or -enable-import-metadata" Reverting because of 17463 http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17463 This reverts commit d20cb431bba2ba43b4c65a8556cff445bfefbb7c. llvm-svn: 274946	2016-07-08 22:55:48 +00:00
Anna Thomas	9ad45adfd7	Revert "InstCombine rule to fold truncs whose value is available" This reverts commit r274853. Caused failure in ppcBE build llvm-svn: 274943	2016-07-08 22:15:08 +00:00
Piotr Padlewski	d6efefa2b8	Add 'thinlto_src_module' md with asserts or -enable-import-metadata Summary: This way the metadata will be only generated when asserts enabled, or when -enable-import-metadata specified Reviewers: tejohnson, eraman, mehdi_amini Subscribers: mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D22167 llvm-svn: 274938	2016-07-08 21:25:39 +00:00
Sanjay Patel	664514f7fe	[InstCombine] don't form select from bitcasted logic ops if bitcasts have >1 use This isn't a sure thing (are 2 extra bitcasts less expensive than a logic op?), but we'll try to err on the conservative side by going with the case that has less IR instructions. Note: This question came up in http://reviews.llvm.org/D22114 , but this part is independent of that patch proposal, so I'm making this small change ahead of that one. See also: http://reviews.llvm.org/rL274926 llvm-svn: 274932	2016-07-08 21:17:51 +00:00
Sanjay Patel	5246482c7a	add another multi-use test for logic->select transform llvm-svn: 274929	2016-07-08 21:08:16 +00:00
Sanjay Patel	f4a08ede03	[InstCombine] don't form select from logic ops if it's unlikely that we'll eliminate any ops llvm-svn: 274926	2016-07-08 20:53:29 +00:00
Sanjay Patel	297a0e67b6	adjust test so it won't completely optimize away llvm-svn: 274925	2016-07-08 20:35:53 +00:00
Sanjay Patel	0733e6b61c	add tests for multi-use folding to select llvm-svn: 274922	2016-07-08 20:22:27 +00:00
Dehao Chen	429f5c735f	Remove inline hints computation from SampleProfile.cpp Summary: As we will move to use uniformed hotness check in inliner, we do not need inline hints in SampleProfile pass any more. Reviewers: dnovillo, davidxl Subscribers: eraman, llvm-commits Differential Revision: http://reviews.llvm.org/D19287 llvm-svn: 274918	2016-07-08 20:12:44 +00:00
Davide Italiano	d555bde59f	[SCCP] Fold constants as we build them whne visiting cast instructions. This should be slightly more efficient and could avoid spurious overdefined markings, as Eli pointed out. Differential Revision: http://reviews.llvm.org/D22122 llvm-svn: 274905	2016-07-08 19:13:40 +00:00
Sanjay Patel	1b6b824548	[InstCombine] check for one-use before turning simple logic op into a select llvm-svn: 274891	2016-07-08 17:26:47 +00:00
Simon Pilgrim	4ca42e232d	[SLPVectorizer][X86] Added fma vectorization tests llvm-svn: 274889	2016-07-08 17:19:13 +00:00
Sanjay Patel	910ce0d511	add test to show multi-use output llvm-svn: 274887	2016-07-08 17:12:27 +00:00
Sanjay Patel	cbfca9e8ef	[InstCombine] allow or(sext(A), B) --> A ? -1 : B transform for vectors llvm-svn: 274883	2016-07-08 17:01:15 +00:00
Sanjay Patel	647174c8a4	add vector tests to show missing transform llvm-svn: 274876	2016-07-08 16:39:53 +00:00
Sanjay Patel	46df968326	minimize tests The cmp and load aren't required. llvm-svn: 274864	2016-07-08 16:11:48 +00:00
Sanjay Patel	e1acad9b61	regenerate checks llvm-svn: 274860	2016-07-08 16:06:38 +00:00
Anna Thomas	3124f6273a	InstCombine rule to fold truncs whose value is available We can fold truncs whose operand feeds from a load, if the trunc value is available through a prior load/store. This change is from: http://reviews.llvm.org/D21246, which folded the trunc but missed the bitcast or ptrtoint/inttoptr required in the RAUW call, when the load type didnt match the prior load/store type. Differential Revision: http://reviews.llvm.org/D21791 llvm-svn: 274853	2016-07-08 15:18:56 +00:00
Simon Pilgrim	4f1877fb57	[X86][SSE] Improve constant folding tests for CVTSD/CVTSS/CVTTSD/CVTTSS As discussed on D22106, improve the testing for constant folding sse scalar conversion intrinsics to ensure we are correctly handling special/out of range cases llvm-svn: 274846	2016-07-08 13:28:34 +00:00
Davide Italiano	16284df8ec	[PM] Port InstSimplify to the new pass manager. llvm-svn: 274796	2016-07-07 21:14:36 +00:00
Anna Thomas	6a78c78a03	[DSE] Remove dead stores in end blocks containing fence We can remove dead stores in the presence of fence instructions. Fence does not change an otherwise thread local store to visible. reviewers: reames, dexonsmith, jfb Differential Revision: http://reviews.llvm.org/D22001 llvm-svn: 274795	2016-07-07 20:51:42 +00:00
Sjoerd Meijer	17c08dc701	Code size optimisation: don't rewrite fputs to fwrite when optimising for size because fwrite requires more arguments and thus extra MOVs are required. llvm-svn: 274753	2016-07-07 13:56:23 +00:00
David Majnemer	7afb46d3c8	[LoopAccessAnalysis] Fix an integer overflow We were inappropriately using 32-bit types to account for quantities that can be far larger. Fixed in PR28443. llvm-svn: 274737	2016-07-07 06:24:36 +00:00
Elena Demikhovsky	fc1e969dfc	Fixed a bug in vectorizing GEP before gather/scatter intrinsic. Vectorizing GEP was incorrect and broke SSA in some cases. The patch fixes PR27997 https://llvm.org/bugs/show_bug.cgi?id=27997. Differential revision: http://reviews.llvm.org/D22035 llvm-svn: 274735	2016-07-07 06:06:46 +00:00
Sean Silva	59fe82f4ce	[PM] Port TailCallElim llvm-svn: 274708	2016-07-06 23:48:41 +00:00
Sean Silva	b025d375a1	[PM] Port CorrelatedValuePropagation llvm-svn: 274705	2016-07-06 23:26:29 +00:00
Sanjay Patel	65a51c25c1	[InstCombine] enhance (select X, C1, C2 --> ext X) to handle vectors By replacing dyn_cast of ConstantInt with m_Zero/m_One/m_AllOnes, we allow these transforms for splat vectors. Differential Revision: http://reviews.llvm.org/D21899 llvm-svn: 274696	2016-07-06 22:23:01 +00:00
Chad Rosier	232e29ebea	[MemorySSA] Reinstate the legacy printer and verifier. Differential Revision: http://reviews.llvm.org/D22058 llvm-svn: 274679	2016-07-06 21:20:47 +00:00
Haicheng Wu	a95cd1267f	[LIR] Fix mis-compilation with unwinding. To fix PR27859, bail out if there is an instruction may throw. Differential Revision: http://reviews.llvm.org/D20638 llvm-svn: 274673	2016-07-06 21:05:40 +00:00
Piotr Padlewski	6deaa6afae	Add 'thinlto_src_module' metadata to imported function Added metadata to be able to make statistics on how many functions that have been imported have been removed. Also module name might be helpfull when debugging. Reviewers: tejohnson, eraman Subscribers: mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D21943 llvm-svn: 274668	2016-07-06 20:26:25 +00:00
Justin Bogner	a463537a36	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0 Everywhere where cuda.syncthreads or __syncthreads is used, use the properly namespaced nvvm.barrier0 instead. llvm-svn: 274664	2016-07-06 20:02:45 +00:00
Chad Rosier	dcfce2d0ec	[DSE] Avoid iterator invalidation bugs. The dse_with_dbg_value.ll test committed with r273141 is removed because this we no longer performs any type of back tracking, which is what was causing the codegen differences with and without debug information. Differential Revision: http://reviews.llvm.org/D21613 llvm-svn: 274660	2016-07-06 19:48:52 +00:00
Sean Silva	f50d4b6cdc	Work around PR28400 a bit harder. We were still crashing in the "no change" case because LVI was not getting invalidated. See the thread "Should analyses be able to hold AssertingVH to IR? (related to PR28400)" for more discussion. llvm-svn: 274656	2016-07-06 19:05:41 +00:00
Michael Kuperstein	aa71bdd3af	[TTI] The cost model should not assume vector casts get completely scalarized The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642	2016-07-06 17:30:56 +00:00
Matthew Simpson	433cb1dfe3	[LV] Don't widen trivial induction variables We currently always vectorize induction variables. However, if an induction variable is only used for counting loop iterations or computing addresses with getelementptr instructions, we don't need to do this. Vectorizing these trivial induction variables can create vector code that is difficult to simplify later on. This is especially true when the unroll factor is greater than one, and we create vector arithmetic when computing step vectors. With this patch, we check if an induction variable is only used for counting iterations or computing addresses, and if so, scalarize the arithmetic when computing step vectors instead. This allows for greater simplification. This patch addresses the suboptimal pointer arithmetic sequence seen in PR27881. Reference: https://llvm.org/bugs/show_bug.cgi?id=27881 Differential Revision: http://reviews.llvm.org/D21620 llvm-svn: 274627	2016-07-06 14:26:59 +00:00
Elena Demikhovsky	971fbfda1e	Vector GEP test: renamed + some comments Differential revision: http://reviews.llvm.org/D21957 llvm-svn: 274611	2016-07-06 08:11:23 +00:00
Daniel Berlin	fc7e651bfd	Fix handling of forward unreachable but reverse-reachable blocks in MemorySSA construction llvm-svn: 274606	2016-07-06 05:32:05 +00:00
Sanjay Patel	cbaac41856	[InstCombine] enable vector select of bools -> logic folds llvm-svn: 274465	2016-07-03 14:34:39 +00:00
Sanjay Patel	42396ae0ea	add vector bool select tests and regenerate checks for scalar bool select tests llvm-svn: 274460	2016-07-03 13:26:02 +00:00
Sean Silva	45835e731d	Remove dead TLI arg of isKnownNonNull and propagate deadness. NFC. This actually uncovered a surprisingly large chain of ultimately unused TLI args. From what I can gather, this argument is a remnant of when isKnownNonNull would look at the TLI directly. The current approach seems to be that InferFunctionAttrs runs early in the pipeline and uses TLI to annotate the TLI-dependent non-null information as return attributes. This also removes the dependence of functionattrs on TLI altogether. llvm-svn: 274455	2016-07-02 23:47:27 +00:00
Michael Kuperstein	071d8306b0	[PM] Port ConstantHoisting to the new Pass Manager Differential Revision: http://reviews.llvm.org/D21945 llvm-svn: 274411	2016-07-02 00:16:47 +00:00
Alina Sbirlea	8d8aa5dd6c	Address two correctness issues in LoadStoreVectorizer Summary: GetBoundryInstruction returns the last instruction as the instruction which follows or end(). Otherwise the last instruction in the boundry set is not being tested by isVectorizable(). Partially solve reordering of instructions. More extensive solution to follow. Reviewers: tstellarAMD, llvm-commits, jlebar Subscribers: escha, arsenm, mzolotukhin Differential Revision: http://reviews.llvm.org/D21934 llvm-svn: 274389	2016-07-01 21:44:12 +00:00
Duncan P. N. Exon Smith	e60719b3fa	Revert "add tests for bugs fixed by the GVN hoist pass" This reverts commit r274327 since the tests fail. E.g.: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17240 It looks like this commit is building on r274305, but that commit caused a miscompile and was reverted in r274320. llvm-svn: 274332	2016-07-01 04:55:13 +00:00
Sebastian Pop	196ba4f844	add tests for bugs fixed by the GVN hoist pass https://llvm.org/bugs/show_bug.cgi?id=20242 https://llvm.org/bugs/show_bug.cgi?id=22005 llvm-svn: 274327	2016-07-01 03:03:19 +00:00
Matt Arsenault	0101ecade0	LoadStoreVectorizer: Don't increase alignment with no align set If no alignment was set on the load/stores, it would vectorize to the new type even though this increases the default alignment. llvm-svn: 274323	2016-07-01 02:09:38 +00:00
Matt Arsenault	370e8226c7	LoadStoreVectorizer: Check TTI for vec reg bit width llvm-svn: 274322	2016-07-01 02:07:22 +00:00
Matt Arsenault	42ad17059a	LoadStoreVectorizer: Fix assert when merging pointer ops This needs to use inttoptr/ptrtoint if combining an int and pointer load. If a pointer is used always do an integer load. llvm-svn: 274321	2016-07-01 01:55:52 +00:00
Duncan P. N. Exon Smith	9d1f156418	Revert "code hoisting pass based on GVN" This reverts commit r274305, since it breaks self-hosting: http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/22349/ http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/17232 Note that the blamelist on lab.llvm.org:8011 is incorrect. The previous build was r274299, but somehow r274305 wasn't included in the blamelist: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules llvm-svn: 274320	2016-07-01 01:51:40 +00:00
Matt Arsenault	241f34cde8	LoadStoreVectorizer: Use AA metadata This was not passing the full instruction with metadata to the alias query. llvm-svn: 274318	2016-07-01 01:47:46 +00:00
Matt Arsenault	d7e8898bdd	LoadStoreVectorizer: if one element of a vector is integer, default to integer. Fixes issues on some architectures where we use arithmetic ops to build vectors, which can cause bad things to happen for loads/stores of mixed types. Patch by Fiona Glaser llvm-svn: 274307	2016-07-01 00:37:01 +00:00
Matt Arsenault	8a4ab5e19f	LoadStoreVectorizer: Fix crashes on sub-byte types llvm-svn: 274306	2016-07-01 00:36:54 +00:00
Sebastian Pop	5c5798c57c	code hoisting pass based on GVN This pass hoists duplicated computations in the program. The primary goal of gvn-hoist is to reduce the size of functions before inline heuristics to reduce the total cost of function inlining. Pass written by Sebastian Pop, Aditya Kumar, Xiaoyu Hu, and Brian Rzycki. Important algorithmic contributions by Daniel Berlin under the form of reviews. Differential Revision: http://reviews.llvm.org/D19338 llvm-svn: 274305	2016-07-01 00:24:31 +00:00
Matt Arsenault	079d0f19a2	LoadStoreVectorizer: Check skipFunction first. Also add test I forgot to add to r274296. llvm-svn: 274299	2016-06-30 23:50:18 +00:00
Matt Arsenault	2cbe52b990	LoadStoreVectorizer: Skip optnone functions llvm-svn: 274296	2016-06-30 23:30:29 +00:00
Matt Arsenault	08debb0244	Add LoadStoreVectorizer pass This was contributed by Apple, and I've been working on minimal cleanups and generalizing it. llvm-svn: 274293	2016-06-30 23:11:38 +00:00
Matt Arsenault	727e279ac4	SLPVectorizer: Move propagateMetadata to VectorUtils This will be re-used by the LoadStoreVectorizer. Fix handling of range metadata and testcase by Justin Lebar. llvm-svn: 274281	2016-06-30 21:17:59 +00:00
Wei Mi	95685faeee	Refine the set of UniformAfterVectorization instructions. Except the seed uniform instructions (conditional branch and consecutive ptr instructions), dependencies to be added into uniform set should only be used by existing uniform instructions or intructions outside of current loop. Differential Revision: http://reviews.llvm.org/D21755 llvm-svn: 274262	2016-06-30 18:42:56 +00:00
Jun Bum Lim	596a3bd9ec	[DSE] Fix bug in partial overwrite tracking Summary: Found cases where DSE incorrectly add partially-overwritten intervals. Please see the test case for details. Reviewers: mcrosier, eeckstein, hfinkel Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D21859 llvm-svn: 274237	2016-06-30 15:32:20 +00:00
Sanjay Patel	7c6eab5777	[InstCombine] shrink switch conditions better (PR24766) https://llvm.org/bugs/show_bug.cgi?id=24766#c2 This removes a hack that was added for the benefit of x86 codegen. It prevented shrinking the switch condition even to smaller legal (DataLayout) types. We have a safety mechanism in CGP after: http://reviews.llvm.org/rL251857 ...so we're free to use the optimal (smallest) IR type now. Differential Revision: http://reviews.llvm.org/D12965 llvm-svn: 274233	2016-06-30 14:51:21 +00:00
Sanjay Patel	7ad98babfa	[InstCombine] extend matchSelectFromAndOr() to work with i1 scalar types If the incoming types are i1, then we don't have to pattern match any sext ops. Differential Revision: http://reviews.llvm.org/D21740 llvm-svn: 274228	2016-06-30 14:18:18 +00:00
Sanjay Patel	348111f4b9	add vector tests to show missing transform llvm-svn: 274192	2016-06-30 00:09:13 +00:00
Sanjay Patel	c3701e8b92	regenerate checks llvm-svn: 274188	2016-06-29 23:58:39 +00:00
Evgeniy Stepanov	a5da256f92	StackColoring for SafeStack. This is a fix for PR27842. An IR-level implementation of stack coloring tailored to work with SafeStack. It is a bit weaker than the MI implementation in that it does not the "lifetime start at first access" logic. This can be improved in the future. This patch also replaces the naive implementation of stack frame layout with a greedy algorithm that can split existing stack slots and even fit small objects inside the alignment padding of other objects. llvm-svn: 274162	2016-06-29 20:37:43 +00:00
Tim Shen	aec68b263d	[InstCombine] Simplify and correct folding fcmps with the same children Summary: Take advantage of FCmpInst::Predicate's bit pattern and handle (fcmp , x, y) \| (fcmp , x, y) and (fcmp , x, y) & (fcmp , x, y) more consistently. Also fold more FCmpInst::FCMP_FALSE and FCmpInst::FCMP_TRUE to constants. Currently InstCombine wrongly folds (fcmp ogt, x, y) \| (fcmp ord, x, y) to (fcmp ogt, x, y); this patch also fixes that. Reviewers: spatel Subscribers: llvm-commits, iteratee, echristo Differential Revision: http://reviews.llvm.org/D21775 llvm-svn: 274156	2016-06-29 20:10:17 +00:00
Tim Shen	860a67eb4c	[InstCombine, NFC] Change the generated variable names by creating new instructions This removes some noise for D21775's test changes. llvm-svn: 274155	2016-06-29 20:10:13 +00:00
Tim Shen	4561e784f4	[InstCombine] Add full tests for FoldAndOfFCmps and FoldOrOfFCmps Summary: This adds tests for covering all cases that FoldAndOfFCmps and FoldOrOfFCmps handle. Reviewers: spatel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21844 llvm-svn: 274144	2016-06-29 17:55:11 +00:00
Elena Demikhovsky	5e21c94f25	Reverted patch 273864 llvm-svn: 274115	2016-06-29 10:01:06 +00:00
Craig Topper	df7454f94b	Revert "[ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition." This is breaking an optimizaton remark test in clang. I've identified a couple fixes for that, but want to understand it better before I commit to anything. llvm-svn: 274102	2016-06-29 04:57:00 +00:00
Craig Topper	2cc199baff	[ValueTracking] Teach computeKnownBits for PHI nodes to compute sign bit for a recurrence with a NSW addition. If a operation for a recurrence is an addition with no signed wrap and both input sign bits are 0, then the result sign bit must also be 0. Similar for the negative case. I found this deficiency while playing around with a loop in the x86 backend that contained a signed division that could be optimized into an unsigned division if we could prove both inputs were positive. One of them being the loop induction variable. With this patch we can perform the conversion for this case. One of the test cases here is a contrived variation of the loop I was looking at. Differential revision: http://reviews.llvm.org/D21493 llvm-svn: 274098	2016-06-29 03:46:47 +00:00
Eric Christopher	0c58837b1f	Revert "[InstCombine] Avoid combining the bitcast of a var that is used as both address and result of load instructions" Revert "[InstCombine] Combine A->B->A BitCast" as this appears to cause PR27996 and as discussed in http://reviews.llvm.org/D20847 This reverts commits r270135 and r263734. llvm-svn: 274094	2016-06-29 03:05:58 +00:00
Weiming Zhao	5410edddb1	[ARM] Fix 28282: cost computation for constant hoisting Summary: This fixes bug: https://llvm.org/bugs/show_bug.cgi?id=28282 Currently the cost model of constant hoisting checks the bit width of the data type of the constants. However, the actual immediate value is small enough and not need to be hoisted. This patch checks for the actual bit width needed for the constant. Reviewers: t.p.northover, rengolin Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D21668 llvm-svn: 274073	2016-06-28 22:30:45 +00:00
Sanjay Patel	3a0f2606ec	minimize regression tests and update checks llvm-svn: 274047	2016-06-28 18:40:08 +00:00
Sanjay Patel	8ce43c098b	minimize regression tests and update checks llvm-svn: 274046	2016-06-28 18:33:10 +00:00
Artur Pilipenko	7ad95ec22d	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043	2016-06-28 18:27:25 +00:00
Adam Nemet	bd861acf29	[LLE] Don't hoist conditionally executed loads If the load is conditional we can't hoist its 0-iteration instance to the preheader because that would make it unconditional. Thus we would access a memory location that the original loop did not access. llvm-svn: 273991	2016-06-28 04:02:47 +00:00
Easwaran Raman	22eb80a114	Fix size computation of array allocation in inline cost analysis Differential revision: http://reviews.llvm.org/D21690 llvm-svn: 273952	2016-06-27 22:31:53 +00:00
Sanjay Patel	59ed2ffca3	[InstCombine] shrink type of sdiv if dividend is sexted and constant divisor is small enough (PR28153) This should fix PR28153: https://llvm.org/bugs/show_bug.cgi?id=28153 Differential Revision: http://reviews.llvm.org/D21769 llvm-svn: 273951	2016-06-27 22:27:11 +00:00
Sanjay Patel	5cdf699daa	add tests for PR28153 llvm-svn: 273936	2016-06-27 20:28:59 +00:00
Elena Demikhovsky	6f2ec8104a	Fixed crash of SLP Vectorizer on KNL The bug is connected to vector GEPs. https://llvm.org/bugs/show_bug.cgi?id=28313 llvm-svn: 273919	2016-06-27 20:07:00 +00:00
Sanjay Patel	c6ada53be5	[InstCombine] use m_APInt for div --> ashr fold The APInt matcher works with splat vectors, so we get this fold for vectors too. llvm-svn: 273897	2016-06-27 17:25:57 +00:00
Artur Pilipenko	72f76b8805	Revert -r273892 "Support arbitrary addrspace pointers in masked load/store intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895	2016-06-27 16:54:33 +00:00
Easwaran Raman	1832bf6aee	[PM] Port PartialInlining to the new PM Differential revision: http://reviews.llvm.org/D21699 llvm-svn: 273894	2016-06-27 16:50:18 +00:00
Artur Pilipenko	a36aa41519	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892	2016-06-27 16:29:26 +00:00
Elena Demikhovsky	f65e865e33	Removed extra test from the prev commit. llvm-svn: 273865	2016-06-27 11:40:49 +00:00
Elena Demikhovsky	4c58b2761a	Fixed consecutive memory access detection in Loop Vectorizer. It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) to++ = from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Re-commit rL273257 - revision: http://reviews.llvm.org/D20789 llvm-svn: 273864	2016-06-27 11:19:23 +00:00
Igor Breger	7357849dca	[ConstantFolding] Fix bitcast vector of i1. Differential Revision: http://reviews.llvm.org/D21735 llvm-svn: 273845	2016-06-27 06:42:54 +00:00
Sanjay Patel	1d745384da	add tests for potential select transforms llvm-svn: 273833	2016-06-26 23:44:21 +00:00
Sanjoy Das	a37bb4a65d	[LoopUnswitch] Unswitch on conditions feeding into guards Summary: This is a straightforward extension of what LoopUnswitch does to branches to guards. That is, we unswitch ``` for (;;) { ... guard(loop_invariant_cond); ... } ``` into ``` if (loop_invariant_cond) { for (;;) { ... // There is no need to emit guard(true) ... } } else { for (;;) { ... guard(false); // SimplifyCFG will clean this up by adding an // unreachable after the guard(false) ... } } ``` Reviewers: majnemer Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D21725 llvm-svn: 273801	2016-06-26 05:10:45 +00:00
Sanjay Patel	51ff79fd82	update tests to use FileCheck llvm-svn: 273784	2016-06-25 17:39:10 +00:00
David Majnemer	e14e7bc4b8	Revert "[SimplifyCFG] Stop inserting calls to llvm.trap for UB" This reverts commit r273778, it seems to break UBSan :/ llvm-svn: 273779	2016-06-25 08:19:55 +00:00
David Majnemer	d346a37737	[SimplifyCFG] Stop inserting calls to llvm.trap for UB SimplifyCFG had logic to insert calls to llvm.trap for two very particular IR patterns: stores and invokes of undef/null. While InstCombine canonicalizes certain undefined behavior IR patterns to stores of undef, phase ordering means that this cannot be relied upon in general. There are much better tools than llvm.trap: UBSan and ASan. N.B. I could be argued into reverting this change if a clear argument as to why it is important that we synthesize llvm.trap for stores, I'd be hard pressed to see why it'd be useful for invokes... llvm-svn: 273778	2016-06-25 08:04:19 +00:00
David Majnemer	bb53d23ef8	[InstSimplify] Replace calls to null with undef Calling null is undefined behavior, we can simplify the resulting value to undef. llvm-svn: 273777	2016-06-25 07:37:30 +00:00
David Majnemer	1fea77c6fc	[SimplifyCFG] Replace calls to null/undef with unreachable Calling null is undefined behavior, a call to undef can be trivially treated as a call to null. llvm-svn: 273776	2016-06-25 07:37:27 +00:00
Sanjoy Das	f63768cbfc	[PlaceSafepoints] Don't call undef in test case; NFC llvm-svn: 273764	2016-06-25 01:40:54 +00:00
Sanjoy Das	d850068282	[LoopUnswitch] Avoid exponential behavior Summary: (No semantic change intended). Reviewers: majnemer, bogner, mzolotukhin Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D21707 llvm-svn: 273763	2016-06-25 01:14:19 +00:00
David Majnemer	0f45572761	The absence of noreturn doesn't ensure mayReturn There are two separate issues: - LLVM doesn't consider infinite loops to be side effects: we happily hoist/sink above/below loops whose bounds are unknown. - The absence of the noreturn attribute is insufficient for us to know if a function will definitely return. Relying on noreturn in the middle-end for any property is an accident waiting to happen. llvm-svn: 273762	2016-06-25 00:55:12 +00:00
Peter Collingbourne	0312f614b1	IR: Introduce llvm.type.checked.load intrinsic. This intrinsic safely loads a function pointer from a virtual table pointer using type metadata. This intrinsic is used to implement control flow integrity in conjunction with virtual call optimization. The virtual call optimization pass will optimize away llvm.type.checked.load intrinsics associated with devirtualized calls, thereby removing the type check in cases where it is not needed to enforce the control flow integrity constraint. This patch also introduces the capability to copy type metadata between global variables, and teaches the virtual call optimization pass to do so. Differential Revision: http://reviews.llvm.org/D21121 llvm-svn: 273756	2016-06-25 00:23:04 +00:00
David Majnemer	b8da3a2bb2	Reinstate r273711 r273711 was reverted by r273743. The inliner needs to know about any call sites in the inlined function. These were obscured if we replaced a call to undef with an undef but kept the call around. This fixes PR28298. llvm-svn: 273753	2016-06-25 00:04:10 +00:00
Michael Kuperstein	83b753d430	[PM] Port float2int to the new pass manager Differential Revision: http://reviews.llvm.org/D21704 llvm-svn: 273747	2016-06-24 23:32:02 +00:00
Dehao Chen	c66a06ad0e	Hookup ProfileSummary with SampleProfilerLoader Summary: Set ProfileSummary in SampleProfilerLoader. Reviewers: davidxl, eraman Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21702 llvm-svn: 273745	2016-06-24 22:57:06 +00:00
Nico Weber	ae2ef4ccd4	Revert r273711, it caused PR28298. llvm-svn: 273743	2016-06-24 22:52:39 +00:00
Peter Collingbourne	7efd750607	IR: New representation for CFI and virtual call optimization pass metadata. The bitset metadata currently used in LLVM has a few problems: 1. It has the wrong name. The name "bitset" refers to an implementation detail of one use of the metadata (i.e. its original use case, CFI). This makes it harder to understand, as the name makes no sense in the context of virtual call optimization. 2. It is represented using a global named metadata node, rather than being directly associated with a global. This makes it harder to manipulate the metadata when rebuilding global variables, summarise it as part of ThinLTO and drop unused metadata when associated globals are dropped. For this reason, CFI does not currently work correctly when both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable globals, and fails to associate metadata with the rebuilt globals. As I understand it, the same problem could also affect ASan, which rebuilds globals with a red zone. This patch solves both of those problems in the following way: 1. Rename the metadata to "type metadata". This new name reflects how the metadata is currently being used (i.e. to represent type information for CFI and vtable opt). The new name is reflected in the name for the associated intrinsic (llvm.type.test) and pass (LowerTypeTests). 2. Attach metadata directly to the globals that it pertains to, rather than using the "llvm.bitsets" global metadata node as we are doing now. This is done using the newly introduced capability to attach metadata to global variables (r271348 and r271358). See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html Differential Revision: http://reviews.llvm.org/D21053 llvm-svn: 273729	2016-06-24 21:21:32 +00:00
Michael Kuperstein	82d5da5aac	[PM] Port PreISelIntrinsicLowering to the new PM llvm-svn: 273713	2016-06-24 20:13:42 +00:00
David Majnemer	3b3e954ea2	SimplifyInstruction does not imply DCE We cannot remove an instruction with no uses just because SimplifyInstruction succeeds. It may have side effects. llvm-svn: 273711	2016-06-24 19:34:46 +00:00
Reid Kleckner	fbd5eef691	Revert "InstCombine rule to fold trunc when value available" This reverts commit r273608. Broke building code with sanitizers, where apparently these kinds of loads, casts, and truncations are common: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/24502 http://crbug.com/623099 llvm-svn: 273703	2016-06-24 18:42:58 +00:00
Sanjay Patel	f8b08f7179	[InstCombine] consolidate commutation variants of matchSelectFromAndOr() in one place; NFCI By putting all the possible commutations together, we simplify the code. Note that this is NFCI, but I'm adding tests that actually exercise each commutation pattern because we don't have this anywhere else. llvm-svn: 273702	2016-06-24 18:26:02 +00:00
Matthew Simpson	e794678404	[LV] Preserve order of dependences in interleaved accesses analysis The interleaved access analysis currently assumes that the inserted run-time pointer aliasing checks ensure the absence of dependences that would prevent its instruction reordering. However, this is not the case. Issues can arise from how code generation is performed for interleaved groups. For a load group, all loads in the group are essentially moved to the location of the first load in program order, and for a store group, all stores in the group are moved to the location of the last store. For groups having members involved in a dependence relation with any other instruction in the loop, this reordering can violate the dependence. This patch teaches the interleaved access analysis how to avoid breaking such dependences, and should fix PR27626. An assumption of the original analysis was that the accesses had been collected in "program order". The analysis was then simplified by visiting the accesses bottom-up. However, this ordering was never guaranteed for anything other than single basic block loops. Thus, this patch also enforces the desired ordering. Reference: https://llvm.org/bugs/show_bug.cgi?id=27626 Differential Revision: http://reviews.llvm.org/D19984 llvm-svn: 273687	2016-06-24 15:33:25 +00:00
Chuang-Yu Cheng	68f7f1cf00	Teaching SimplifyCFG to recognize the Or-Mask trick that InstCombine uses to reduce the number of comparisons. Specifically, InstCombine can turn: (i == 5334 \|\| i == 5335) into: ((i \| 1) == 5335) SimplifyCFG was already able to detect the pattern: (i == 5334 \|\| i == 5335) to: ((i & -2) == 5334) This patch supersedes D21315 and resolves PR27555 (https://llvm.org/bugs/show_bug.cgi?id=27555). Thanks to David and Chandler for the suggestions! Author: Thomas Jablin (tjablin) Reviewers: majnemer chandlerc halfdan cycheng http://reviews.llvm.org/D21397 llvm-svn: 273639	2016-06-24 01:59:00 +00:00
Anna Thomas	31a0b2088f	InstCombine rule to fold trunc when value available Summary: This instcombine rule folds away trunc operations that have value available from a prior load or store. This kind of code can be generated as a result of GVN widening the load or from source code as well. Reviewers: reames, majnemer, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21246 llvm-svn: 273608	2016-06-23 20:22:22 +00:00
Artur Pilipenko	80771b9ad9	Upgrade other old memset/memcpy signatures in tests causing buildbot failures with rL273568. llvm-svn: 273580	2016-06-23 16:34:52 +00:00
Michael Zolotukhin	2d3592d481	[LoopUnrollAnalyzer] Fix a bug in UnrolledInstAnalyzer::visitLoad. When simplifying a load we need to make sure that the type of the simplified value matches the type of the instruction we're processing. In theory, we can handle casts here as we deal with constant data, but since it's not implemented at the moment, we at least need to bail out. This fixes PR28262. llvm-svn: 273562	2016-06-23 14:31:31 +00:00
Hal Finkel	a1271036c5	Allow DeadStoreElimination to track combinations of partial later wrties DeadStoreElimination can currently remove a small store rendered unnecessary by a later larger one, but could not remove a larger store rendered unnecessary by a series of later smaller ones. This adds that capability. It works by keeping a map, which is used as an effective interval map, for each store later overwritten only partially, and filling in that interval map as more such stores are discovered. No additional walking or aliasing queries are used. In the map forms an interval covering the the entire earlier store, then it is dead and can be removed. The map is used as an interval map by storing a mapping between the ending offset and the beginning offset of each interval. I discovered this problem when investigating a performance issue with code like this on PowerPC: #include <complex> using namespace std; complex<float> bar(complex<float> C); complex<float> foo(complex<float> C) { return bar(C)C; } which produces this: define void @_Z4testSt7complexIfE(%"struct.std::complex" noalias nocapture sret %agg.result, i64 %c.coerce) { entry: %ref.tmp = alloca i64, align 8 %tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"* %c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32 %c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32 %0 = bitcast i32 %c.sroa.0.0.extract.trunc to float %c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32 %1 = bitcast i32 %c.sroa.2.0.extract.trunc to float call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce) %2 = bitcast %"struct.std::complex"* %agg.result to i64* %3 = load i64, i64* %ref.tmp, align 8 store i64 %3, i64* %2, align 4 ; <--- *** THIS SHOULD NOT BE HERE ** %_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0 %4 = lshr i64 %3, 32 %5 = trunc i64 %4 to i32 %6 = bitcast i32 %5 to float %_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1 %7 = trunc i64 %3 to i32 %8 = bitcast i32 %7 to float %mul_ad.i.i = fmul fast float %6, %1 %mul_bc.i.i = fmul fast float %8, %0 %mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i %mul_ac.i.i = fmul fast float %6, %0 %mul_bd.i.i = fmul fast float %8, %1 %mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4 store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4 ret void } the problem here is not just that the i64 store is unnecessary, but also that it blocks further backend optimizations of the other uses of that i64 value in the backend. In the future, we might want to add a special case for handling smaller accesses (e.g. using a bit vector) if the map mechanism turns out to be noticeably inefficient. A sorted vector is also a possible replacement for the map for small numbers of tracked intervals. Differential Revision: http://reviews.llvm.org/D18586 llvm-svn: 273559	2016-06-23 13:46:39 +00:00
David Majnemer	d1fbf48566	[SCCP] Don't assume all Constants are ConstantInt This fixes PR28269. llvm-svn: 273521	2016-06-23 00:14:29 +00:00
Sanjay Patel	a06d989552	[ValueTracking] improve ComputeNumSignBits for vector constants This is similar to the computeKnownBits improvement in rL268479. There's probably more we can do for vector logic instructions, but this should let us see non-splat constant masking ops that can become vector selects instead of and/andn/or sequences. Differential Revision: http://reviews.llvm.org/D21610 llvm-svn: 273459	2016-06-22 19:20:59 +00:00
Artur Pilipenko	1cec4fdddf	Upgrade old memset/memcpy signatures (without isVolatile argument) in tests We no longer have corresponding code in autoupgrade and the vast majority of the tests were fixed long time ago. Fix the remaining few. One of the verifier test cases is marked as XFAIL because it was passing only because the signature was incorrect. llvm-svn: 273428	2016-06-22 15:16:06 +00:00
Sanjay Patel	c6cacd6067	[InstSimplify] add ashr tests including vector types llvm-svn: 273421	2016-06-22 14:18:04 +00:00
Simon Pilgrim	bc35f9f702	[SLPVectorizer][X86] Added ceil/floor/nearbyint/rint/trunc vectorization tests llvm-svn: 273420	2016-06-22 14:07:46 +00:00
Sanjay Patel	21579bb39a	[InstSimplify] regenerate checks llvm-svn: 273419	2016-06-22 14:00:16 +00:00
Haicheng Wu	a783bac50b	[Kryo] Enable loop prefetcher. Differential Revision: http://reviews.llvm.org/D21535 llvm-svn: 273329	2016-06-21 22:47:56 +00:00
Easwaran Raman	8bceb9d210	Fix PR28219: Use profile summary from reader and not compute it Differentiaal revision: http://reviews.llvm.org/D21546 llvm-svn: 273301	2016-06-21 19:29:49 +00:00
Elena Demikhovsky	a266cf0518	reverted the prev commit due to assertion failure llvm-svn: 273258	2016-06-21 12:10:11 +00:00
Elena Demikhovsky	9823c995bc	Fixed consecutive memory access detection in Loop Vectorizer. It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) to++ = from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Differential revision: http://reviews.llvm.org/D20789 llvm-svn: 273257	2016-06-21 11:32:01 +00:00
Simon Pilgrim	356e823b51	[X86][SSE] Add cost model for BSWAP of vectors The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217	2016-06-20 23:08:21 +00:00
Sanjay Patel	9ad8fb68f7	[InstSimplify] analyze (optionally casted) icmps to eliminate obviously false logic (PR27869) By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question raised by PR27869: https://llvm.org/bugs/show_bug.cgi?id=27869 ...where InstCombine turns an icmp+zext into a shift causing us to miss the fold. Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp. Differential Revision: http://reviews.llvm.org/D21512 llvm-svn: 273200	2016-06-20 20:59:59 +00:00
Dehao Chen	071bb9d7af	Pass AssumptionCacheTracker from SampleProfileLoader to Inliner Summary: Inliner needs ACT when calling InlineFunction. Instead of nullptr, we need to pass it in from SampleProfileLoader Reviewers: davidxl Subscribers: eraman, vsk, danielcdh, llvm-commits Differential Revision: http://reviews.llvm.org/D21205 llvm-svn: 273199	2016-06-20 20:53:40 +00:00
Matt Arsenault	802ebcb4bb	InstCombine: Don't strip convergent from intrinsic callsites Specific instances of intrinsic calls may want to be convergent, such as certain register reads but the intrinsic declaration is not. llvm-svn: 273188	2016-06-20 19:04:44 +00:00
Sanjay Patel	445d7ecf89	[InstCombine] consolidate some icmp+logic tests and improve checks llvm-svn: 273186	2016-06-20 18:40:37 +00:00
Sanjay Patel	14dcb042bc	[InstCombine] update to use FileCheck with autogenerated exact checking llvm-svn: 273180	2016-06-20 18:23:40 +00:00
Sanjay Patel	06918ad79e	[InstCombine] update to use FileCheck with autogenerated exact checking llvm-svn: 273173	2016-06-20 17:56:13 +00:00
Sanjay Patel	a038240660	[InstCombine] regenerate checks llvm-svn: 273170	2016-06-20 17:48:48 +00:00
David Majnemer	c5601df9fd	Reapply "[LoopIdiom] Don't remove dead operands manually" This reverts commit r273160, reapplying r273132. RecursivelyDeleteTriviallyDeadInstructions cannot be called on a parentless Instruction. llvm-svn: 273162	2016-06-20 16:03:25 +00:00
Cong Liu	1c28b6d733	Revert "[LoopIdiom] Don't remove dead operands manually" This reverts commit r273132. Breaks multiple test under /llvm/test:Transforms (e.g. llvm/test:Transforms/LoopIdiom/basic.ll.test) under asan. llvm-svn: 273160	2016-06-20 15:22:15 +00:00
Patrik Hagglund	7205215591	Fix for PR27940 After a store has been eliminated, when making sure that the instruction iterator points to a valid instruction, dbg intrinsics are now ignored as a new instruction. Patch by Henric Karlsson. Reviewed by Daniel Berlin. Differential Revision: http://reviews.llvm.org/D21076 llvm-svn: 273141	2016-06-20 09:10:10 +00:00
David Majnemer	a705843f23	[LoopIdiom] Don't remove dead operands manually Removing dead instructions requires remembering which operands have already been removed. RecursivelyDeleteTriviallyDeadInstructions has this logic, don't partially reimplement it in LoopIdiomRecognize. This fixes PR28196. llvm-svn: 273132	2016-06-20 02:33:29 +00:00
Sanjay Patel	a4b052c7d1	[InstSimplify] add tests for PR27689; regenerate checks llvm-svn: 273128	2016-06-19 21:40:12 +00:00
David Majnemer	3119599475	[LoadCombine] Combine Loads formed from GEPS with negative indexes Change the underlying offset and comparisons to use int64_t instead of uint64_t. Patch by River Riddle! Differential Revision: http://reviews.llvm.org/D21499 llvm-svn: 273105	2016-06-19 06:14:56 +00:00
Matt Arsenault	a466a7cf62	Add looping testcase that broke in r272987 llvm-svn: 273081	2016-06-18 05:15:58 +00:00
Sanjoy Das	e8fd9561cb	[SCEV] Fix incorrect trip count computation The way we elide max expressions when computing trip counts is incorrect -- it breaks cases like this: ``` static int wrapping_add(int a, int b) { return (int)((unsigned)a + (unsigned)b); } void test() { volatile int end_buf = 2147483548; // INT_MIN - 100 int end = end_buf; unsigned counter = 0; for (int start = wrapping_add(end, 200); start < end; start++) counter++; print(counter); } ``` Note: the `NoWrap` variable that was being tested has little to do with the values flowing into the max expression; it is a property of the induction variable. test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test functionality I'm reverting in this change, so I've deleted the test fully. llvm-svn: 273079	2016-06-18 04:38:31 +00:00
Matt Arsenault	8fd5978811	Revert "Revert "Revert "InstCombine: Reduce trunc (shl x, K) width.""" This seems to be causing an infinite loop / crash in instcombine on some bots. llvm-svn: 273069	2016-06-17 23:36:38 +00:00
Adam Nemet	a9f09c6245	[LAA] Enable symbolic stride speculation for all LAA clients This is a functional change for LLE and LDist. The other clients (LV, LVerLICM) already had this explicitly enabled. The temporary boolean parameter to LAA is removed that allowed turning off speculation of symbolic strides. This makes LAA's caching interface LAA::getInfo only take the loop as the parameter. This makes the interface more friendly to the new Pass Manager. The flag -enable-mem-access-versioning is moved from LV to a LAA which now allows turning off speculation globally. llvm-svn: 273064	2016-06-17 22:35:41 +00:00
Matt Arsenault	d76efc14b9	Revert "Revert "InstCombine: Reduce trunc (shl x, K) width."" Reapply r272987. Condition should be in terms of the destination type, and the flags should not be copied. llvm-svn: 273045	2016-06-17 20:33:53 +00:00
Davide Italiano	b49aa5c0c4	[PM] Port MergedLoadStoreMotion to the new pass manager, take two. This is indeed a much cleaner approach (thanks to Daniel Berlin for pointing out), and also David/Sean for review. Differential Revision: http://reviews.llvm.org/D21454 llvm-svn: 273032	2016-06-17 19:10:09 +00:00
James Y Knight	148a6469dc	Support expanding partial-word cmpxchg to full-word cmpxchg in AtomicExpandPass. Many CPUs only have the ability to do a 4-byte cmpxchg (or ll/sc), not 1 or 2-byte. For those, you need to mask and shift the 1 or 2 byte values appropriately to use the 4-byte instruction. This change adds support for cmpxchg-based instruction sets (only SPARC, in LLVM). The support can be extended for LL/SC-based PPC and MIPS in the future, supplanting the ISel expansions those architectures currently use. Tests added for the IR transform and SPARCv9. Differential Revision: http://reviews.llvm.org/D21029 llvm-svn: 273025	2016-06-17 18:11:48 +00:00
Sanjay Patel	216d8cf720	[InstCombine] allow more than one use for vector bitcast folding with selects The motivating example for this transform is similar to D20774 where bitcasts interfere with a single cmp/select sequence, but in this case we have 2 uses of each bitcast to produce min and max ops: define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) { %cmp = fcmp olt <4 x float> %a, %b %bc1 = bitcast <4 x float> %a to <4 x i32> %bc2 = bitcast <4 x float> %b to <4 x i32> %sel1 = select <4 x i1> %cmp, <4 x i32> %bc1, <4 x i32> %bc2 %sel2 = select <4 x i1> %cmp, <4 x i32> %bc2, <4 x i32> %bc1 %bc3 = bitcast <4 x float>* %ptr1 to <4 x i32>* store <4 x i32> %sel1, <4 x i32>* %bc3 %bc4 = bitcast <4 x float>* %ptr2 to <4 x i32>* store <4 x i32> %sel2, <4 x i32>* %bc4 ret void } With this patch, we move the selects up to use the input args which allows getting rid of all of the bitcasts: define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) { %cmp = fcmp olt <4 x float> %a, %b %sel1.v = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b %sel2.v = select <4 x i1> %cmp, <4 x float> %b, <4 x float> %a store <4 x float> %sel1.v, <4 x float>* %ptr1, align 16 store <4 x float> %sel2.v, <4 x float>* %ptr2, align 16 ret void } The asm for x86 SSE then improves from: movaps %xmm0, %xmm2 cmpltps %xmm1, %xmm2 movaps %xmm2, %xmm3 andnps %xmm1, %xmm3 movaps %xmm2, %xmm4 andnps %xmm0, %xmm4 andps %xmm2, %xmm0 orps %xmm3, %xmm0 andps %xmm1, %xmm2 orps %xmm4, %xmm2 movaps %xmm0, (%rdi) movaps %xmm2, (%rsi) To: movaps %xmm0, %xmm2 minps %xmm1, %xmm2 maxps %xmm0, %xmm1 movaps %xmm2, (%rdi) movaps %xmm1, (%rsi) The TODO comments show that we're limiting this transform only to vectors and only to bitcasts because we need to improve other transforms or risk creating worse codegen. Differential Revision: http://reviews.llvm.org/D21190 llvm-svn: 273011	2016-06-17 16:46:50 +00:00
Adam Nemet	e7709d92ba	[LLE] Don't hard-code the name of the preheader in test Turns out a didn't get this right because symbolic stride versioning changes the name. Relax the matching. llvm-svn: 272992	2016-06-17 09:13:15 +00:00
Matt Arsenault	ce56f7bbaa	Revert "InstCombine: Reduce trunc (shl x, K) width." This reverts commit r272987. This might be causing crashes on some bots. llvm-svn: 272990	2016-06-17 06:28:53 +00:00
Matt Arsenault	028fd50642	InstCombine: Reduce trunc (shl x, K) width. llvm-svn: 272987	2016-06-17 04:43:22 +00:00
Evgeniy Stepanov	45fa0fd758	[safestack] Sink unsafe address computation to each use. This is a fix for PR27844. When replacing uses of unsafe allocas, emit the new location immediately after each use. Without this, the pointer stays live from the function entry to the last use, while it's usually cheaper to recalculate. llvm-svn: 272969	2016-06-16 22:34:04 +00:00
Evgeniy Stepanov	72d961a1da	[safestack] Fixup llvm.dbg.value when rewriting unsafe allocas. When moving unsafe allocas to the unsafe stack, dbg.declare intrinsics are updated to refer to the new location. This change does the same to dbg.value intrinsics. llvm-svn: 272968	2016-06-16 22:34:00 +00:00
Sanjoy Das	07c6521aed	[EarlyCSE] Fold invariant loads Redundant invariant loads can be CSE'ed with very little extra effort over what early-cse already tracks, so it looks reasonable to make early-cse handle this case. llvm-svn: 272954	2016-06-16 20:47:57 +00:00
Justin Lebar	b0bd07aff7	Fix strip-dead-debug-info test if path contains "bar". This test checks that the string 'bar' (no quotes) doesn't exist in the output after running opt. But opt embeds the absolute path to the filename, and on my machine, the filename contains the string 'jlebar', causing the test to fail. This patch changes the test to look for the string '"bar"' instead. llvm-svn: 272941	2016-06-16 19:39:55 +00:00
Davide Italiano	41315f7873	[PM] Revert the port of MergeLoadStoreMotion to the new pass manager. Daniel Berlin expressed some real concerns about the port and proposed and alternative approach. I'll revert this for now while working on a new patch, which I hope to put up for review shortly. Sorry for the churn. llvm-svn: 272925	2016-06-16 17:40:53 +00:00
Adam Nemet	776346848a	[LLE] New test to check that no versioning for symbolic strides occurs. NFC This is currently only performed in the Vectorizer. I will change this as symbolic stride collection is moved to LAA. This test will track when the actual functional change occurs. llvm-svn: 272918	2016-06-16 16:45:29 +00:00
Igor Laevsky	87f0d0e185	Revert r272891 "[JumpThreading] Prevent dangling pointer problems in BranchProbabilityInfo" It was causing failures in Profile-i386 and Profile-x86_64 tests. llvm-svn: 272912	2016-06-16 16:25:53 +00:00
Igor Laevsky	c9179fd2c2	[JumpThreading] Prevent dangling pointer problems in BranchProbabilityInfo We should update results of the BranchProbabilityInfo after removing block in JumpThreading. Otherwise we will get dangling pointer inside BranchProbabilityInfo cache. Differential Revision: http://reviews.llvm.org/D20957 llvm-svn: 272891	2016-06-16 13:28:25 +00:00
Patrik Hagglund	0acaefaf9d	PR27938: Don't remove valid DebugLoc in Scalarizer Added checks to make sure the Scalarizer::transferMetadata() don't remove valid debug locations from instructions. This is important as the verifier pass require that e.g. inlinable callsites have a valid debug location. https://llvm.org/bugs/show_bug.cgi?id=27938 Patch by Karl-Johan Karlsson Reviewers: dblaikie Differential Revision: http://reviews.llvm.org/D20807 llvm-svn: 272884	2016-06-16 10:48:54 +00:00
Chuang-Yu Cheng	dbe00d51b4	SimplifyCFG is able to detect the pattern: (i == 5334 \|\| i == 5335) to: ((i & -2) == 5334) This transformation has some incorrect side conditions. Specifically, the transformation is only applied when the right-hand side constant (5334 in the example) is a power of two not equal and not equal to the negated mask. These side conditions were added in r258904 to fix PR26323. The correct side condition is that: ((Constant & Mask) == Constant)[(5334 & -2) == 5334]. It's a little bit hard to see why these transformations are correct and what the side conditions ought to be. Here is a CVC3 program to verify them for 64-bit values: ONE : BITVECTOR(64) = BVZEROEXTEND(0bin1, 63); x : BITVECTOR(64); y : BITVECTOR(64); z : BITVECTOR(64); mask : BITVECTOR(64) = BVSHL(ONE, z); QUERY( (y & ~mask = y) => ((x & ~mask = y) <=> (x = y OR x = (y \| mask))) ); Please note that each pattern must be a dual implication (<--> or iff). One directional implication can create spurious matches. If the implication is only one-way, an unsatisfiable condition on the left side can imply a satisfiable condition on the right side. Dual implication ensures that satisfiable conditions are transformed to other satisfiable conditions and unsatisfiable conditions are transformed to other unsatisfiable conditions. Here is a concrete example of a unsatisfiable condition on the left implying a satisfiable condition on the right: mask = (1 << z) (x & ~mask) == y --> (x == y \|\| x == (y \| mask)) Substituting y = 3, z = 0 yields: (x & -2) == 3 --> (x == 3 \|\| x == 2) The version of this code before r258904 had no side-conditions and incorrectly justified itself in comments through one-directional implication. Thanks to Chandler for the suggestion! Author: Thomas Jablin (tjablin) Reviewers: chandlerc majnemer hfinkel cycheng http://reviews.llvm.org/D21417 llvm-svn: 272873	2016-06-16 04:44:25 +00:00
Eli Friedman	bd254a6f45	[InstCombine] Don't widen metadata on store-to-load forwarding The original check for load CSE or store-to-load forwarding is wrong when the forwarded stored value happened to be a load. Ref https://github.com/JuliaLang/julia/issues/16894 Differential Revision: http://reviews.llvm.org/D21271 Patch by Yichao Yu! llvm-svn: 272868	2016-06-16 02:33:42 +00:00
Xinliang David Li	1eaecefaf9	[PM] Port Add discriminator pass to new PM llvm-svn: 272847	2016-06-15 21:51:30 +00:00
David Majnemer	b62692e2e0	[TargetLibraryInfo] Teach isValidProtoForLibFunc about tan We would fail to validate the type of the tan function which would cause downstream users of isValidProtoForLibFunc to assert. This fixes PR28143. llvm-svn: 272802	2016-06-15 16:47:23 +00:00
Sean Silva	e0a9e66040	[PM] Port SLPVectorizer to the new PM This uses the "runImpl" approach to share code with the old PM. Porting to the new PM meant abandoning the anonymous namespace enclosing most of SLPVectorizer.cpp which is a bit of a bummer (but not a big deal compared to having to pull the pass class into a header which the new PM requires since it calls the constructor directly). llvm-svn: 272766	2016-06-15 08:43:40 +00:00
Sean Silva	a4c2d150d0	[PM] Port AlignmentFromAssumptions to the new PM. This uses the "runImpl" pattern to share code between the old and new PM. llvm-svn: 272757	2016-06-15 06:18:01 +00:00
Michael Kuperstein	3277a05fcf	Recommit [LV] Enable vectorization of loops where the IV has an external use r272715 broke libcxx because it did not correctly handle cases where the last iteration of one IV is the second-to-last iteration of another. Original commit message: Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. llvm-svn: 272742	2016-06-15 00:35:26 +00:00
David Majnemer	4a697c312f	[LoopUnroll] Don't crash trying to unroll loop with EH pad exit We do not support splitting cleanuppad or catchswitches. This is problematic for passes which assume that a loop is in loop simplify form (the loop would have a dedicated exit block instead of sharing it). While it isn't great that we don't support this for cleanups, we still cannot make loop-simplify form an assertable precondition because indirectbr will also disable these sorts of CFG cleanups. This fixes PR28132. llvm-svn: 272739	2016-06-15 00:19:56 +00:00
David Majnemer	cbf614a93b	Remove the ScalarReplAggregates pass Nearly all the changes to this pass have been done while maintaining and updating other parts of LLVM. LLVM has had another pass, SROA, which has superseded ScalarReplAggregates for quite some time. Differential Revision: http://reviews.llvm.org/D21316 llvm-svn: 272737	2016-06-15 00:19:09 +00:00
Matt Arsenault	f42c69206d	AMDGPU: Run pointer optimization passes llvm-svn: 272736	2016-06-15 00:11:01 +00:00
Michael Kuperstein	d4bd3ab5fe	Reverting r272715 since it broke libcxx. llvm-svn: 272730	2016-06-14 22:30:41 +00:00
Davide Italiano	d737dd2ec6	[PM] Port WholeProgramDevirt to the new pass manager. llvm-svn: 272721	2016-06-14 21:44:19 +00:00
Michael Kuperstein	23b6d6adc9	[LV] Enable vectorization of loops where the IV has an external use Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. Differential Revision: http://reviews.llvm.org/D21048 llvm-svn: 272715	2016-06-14 21:27:27 +00:00
Evgeniy Stepanov	0be3cf1d35	Add a missing test. This is a test for r272421: Disable MSan-hostile loop unswitching. llvm-svn: 272713	2016-06-14 21:24:13 +00:00
Peter Collingbourne	96efdd6107	IR: Introduce local_unnamed_addr attribute. If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709	2016-06-14 21:01:22 +00:00
Sanjoy Das	d7e8206b58	[ValueTracking] Calls to @llvm.assume always return This change teaches llvm::isGuaranteedToTransferExecutionToSuccessor that calls to @llvm.assume always terminate. Most other relevant intrinsics should be covered by the "CS.onlyReadsMemory() \|\| CS.onlyAccessesArgMemory()" bit but we were missing @llvm.assumes because we state that it clobbers memory. Added an LICM test case, but this change is not specific to LICM. llvm-svn: 272703	2016-06-14 20:23:16 +00:00
Adam Nemet	73a26957fc	[LoopVer] Update all existing PHIs in the exit block We only used to add the edge from the cloned loop to PHIs that corresponded to values defined by the loop. We need to do this for all PHIs obviously since we need a PHI operand for each incoming edge. This includes things like PHIs with a constant value or with values defined before the original loop (see the testcases). After the patch the PHIs are added to the exit block in two passes. In the first pass we ensure there is a single-operand (LCSSA) PHI for each value defined by the loop. In the second pass we loop through each (single-operand) PHI and add the value for the edge from the cloned loop. If the value is defined in the loop we'll use the cloned instruction from the cloned loop. Fixes PR28037 llvm-svn: 272649	2016-06-14 09:38:54 +00:00
Davide Italiano	cccf4f01ad	[PM] Port Mem2Reg to the new pass manager. llvm-svn: 272630	2016-06-14 03:22:22 +00:00
Sean Silva	6347df0f81	[PM] Port MemCpyOpt to the new PM. The need for all these Lookup* functions is just because of calls to getAnalysis inside methods (i.e. not at the top level) of the runOnFunction method. They should be straightforward to clean up when the old PM is gone. llvm-svn: 272615	2016-06-14 02:44:55 +00:00
Davide Italiano	5669ef1efe	Placate bots fixing a typo in AA-pipeline description. Sorry. llvm-svn: 272608	2016-06-14 01:11:12 +00:00
Sean Silva	46590d556a	Bring back "[PM] Port JumpThreading to the new PM" with a fix This reverts commit r272603 and adds a fix. Big thanks to Davide for pointing me at r216244 which gives some insight into how to fix this VS2013 issue. VS2013 can't synthesize a move constructor. So the fix here is to add one explicitly to the JumpThreadingPass class. llvm-svn: 272607	2016-06-14 00:51:09 +00:00
Davide Italiano	89ab89d6cd	[PM] Port MergedLoadStoreMotion to the new pass manager. llvm-svn: 272606	2016-06-14 00:49:23 +00:00
Sean Silva	7d5a57cbfc	Revert "[PM] Port JumpThreading to the new PM" This reverts commit r272597. Will investigate issue with VS2013 compilation and then recommit. llvm-svn: 272603	2016-06-14 00:26:31 +00:00
Sean Silva	f81328d0b4	[PM] Port JumpThreading to the new PM This follows the approach in r263208 (for GVN) pretty closely: - move the bulk of the body of the function to the new PM class. - expose a runImpl method on the new-PM class that takes the IRUnitT and pointers/references to any analyses and use that to implement the old-PM class. - use a private namespace in the header for stuff that used to be file scope llvm-svn: 272597	2016-06-13 22:52:52 +00:00
Sanjoy Das	98ac278b86	Move previously added test case to the right location In rL272580 I accidentally added a test case to test/CodeGen when test/Transforms/DeadStoreElimination/ is a better place for it. llvm-svn: 272581	2016-06-13 20:12:07 +00:00
Sean Silva	e3bb457423	[PM] Port DeadArgumentElimination to the new PM The approach taken here follows r267631. deadarghaX0r should be easy to port when the time comes to add new-PM support to bugpoint. llvm-svn: 272507	2016-06-12 09:16:39 +00:00
Sean Silva	f5080194fd	[PM] Port ReversePostOrderFunctionAttrs to the new PM Below are my super rough notes when porting. They can probably serve as a basic guide for porting other passes to the new PM. As I port more passes I'll expand and generalize this and make a proper docs/HowToPortToNewPassManager.rst document. There is also missing documentation for general concepts and API's in the new PM which will require some documentation. Once there is proper documentation in place we can put up a list of passes that have to be ported and game-ify/crowdsource the rest of the porting (at least of the middle end; the backend is still unclear). I will however be taking personal responsibility for ensuring that the LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes to be ported are (do something like `git grep "<the string in the bullet point below>"` to find the pass): General Scalar: [ ] Simplify the CFG [ ] Jump Threading [ ] MemCpy Optimization [ ] Promote Memory to Register [ ] MergedLoadStoreMotion [ ] Lazy Value Information Analysis General IPO: [ ] Dead Argument Elimination [ ] Deduce function attributes in RPO Loop stuff / vectorization stuff: [ ] Alignment from assumptions [ ] Canonicalize natural loops [ ] Delete dead loops [ ] Loop Access Analysis [ ] Loop Invariant Code Motion [ ] Loop Vectorization [ ] SLP Vectorizer [ ] Unroll loops Devirtualization / CFI: [ ] Cross-DSO CFI [ ] Whole program devirtualization [ ] Lower bitset metadata CGSCC passes: [ ] Function Integration/Inlining [ ] Remove unused exception handling info [ ] Promote 'by reference' arguments to scalars Please let me know if you are interested in working on any of the passes in the above list (e.g. reply to the post-commit thread for this patch). I'll probably be tackling "General Scalar" and "General IPO" first FWIW. Steps as I port "Deduce function attributes in RPO" --------------------------------------------------- (note: if you are doing any work based on these notes, please leave a note in the post-commit review thread for this commit with any improvements / suggestions / incompleteness you ran into!) Note: "Deduce function attributes in RPO" is a module pass. 1. Do preparatory refactoring. Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503). (TODO: give more advice here e.g. if pass holds state or something) 2. Rename the old pass class. llvm/lib/Transforms/IPO/FunctionAttrs.cpp Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM. (edit: actually wait what? The new class name will be ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is sort of useless churn). llvm/include/llvm/InitializePasses.h llvm/lib/LTO/LTOCodeGenerator.cpp llvm/lib/Transforms/IPO/IPO.cpp llvm/lib/Transforms/IPO/FunctionAttrs.cpp Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass (note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`) Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too. Note that createReversePostOrderFunctionAttrsPass does not need to be renamed since its name is not generated from the class name. 3. Add the new PM pass class. In the new PM all passes need to have their declaration in a header somewhere, so you will often need to add a header. In this case llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because PostOrderFunctionAttrsPass was already ported. The file-level comment from the .cpp file can be used as the file-level comment for the new header. You may want to tweak the wording slightly from "this file implements" to "this file provides" or similar. Add declaration for the new PM pass in this header: class ReversePostOrderFunctionAttrsPass : public PassInfoMixin<ReversePostOrderFunctionAttrsPass> { public: PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM); }; Its name should end with `Pass` for consistency (note that this doesn't collide with the names of most old PM passes). E.g. call it `<name of the old PM pass>Pass`. Also, move the doxygen comment from the old PM pass to the declaration of this class in the header. Also, include the declaration for the new PM class `llvm/Transforms/IPO/FunctionAttrs.h` at the top of the file (in this case, it was already done when the other pass in this file was ported). Now define the `run` method for the new class. The main things here are: a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()` b) If the old PM pass would have returned "false" (i.e. `Changed == false`), then you should return PreservedAnalyses::all(); c) In the old PM getAnalysisUsage method, observe the calls `AU.addPreserved<...>();`. In the case `Changed == true`, for each preserved analysis you should do call `PA.preserve<...>()` on a PreservedAnalyses object and return it. E.g.: PreservedAnalyses PA; PA.preserve<CallGraphAnalysis>(); return PA; Note that calls to skipModule/skipFunction are not supported in the new PM currently, so optnone and optimization bisect support do not work. You can just drop those calls for now. 4. Add the pass to the new PM pass registry to make it available in opt. In llvm/lib/Passes/PassBuilder.cpp add a #include for your header. `#include "llvm/Transforms/IPO/FunctionAttrs.h"` In this case there is already an include (from when PostOrderFunctionAttrsPass was ported). Add your pass to llvm/lib/Passes/PassRegistry.def In this case, I added `MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())` The string is from the `INITIALIZE_PASS*` macros used in the old pass manager. Then choose a test that uses the pass and use the new PM `-passes=...` to run it. E.g. in this case there is a test that does: ; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S \| FileCheck %s I have added the line: ; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S \| FileCheck %s The `-aa-pipeline=basic-aa` and `require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run functionattrs in the new PM (note that in the new PM "functionattrs" becomes "function-attrs" for some reason). This is just pulled from `readattrs.ll` which contains the change from when functionattrs was ported to the new PM. Adding rpo-functionattrs causes the pass that was just ported to run. llvm-svn: 272505	2016-06-12 07:48:51 +00:00
Eli Friedman	9f8031c2da	[MergedLoadStoreMotion] Use correct helper for load hoist safety. It isn't legal to hoist a load past a call which might not return; even if it doesn't throw, it could, for example, call exit(). Fixes http://llvm.org/PR27953. llvm-svn: 272495	2016-06-12 02:11:20 +00:00
Eli Friedman	f1da33e4d3	[LICM] Make isGuaranteedToExecute more accurate. Summary: Make isGuaranteedToExecute use the isGuaranteedToTransferExecutionToSuccessor helper, and make that helper a bit more accurate. There's a potential performance impact here from assuming that arbitrary calls might not return. This probably has little impact on loads and stores to a pointer because most things alias analysis can reason about are dereferenceable anyway. The other impacts, like less aggressive hoisting of sdiv by a variable and less aggressive hoisting around volatile memory operations, are unlikely to matter for real code. This also impacts SCEV, which uses the same helper. It's a minor improvement there because we can tell that, for example, memcpy always returns normally. Strictly speaking, it's also introducing a bug, but it's not any worse than everywhere else we assume readonly functions terminate. Fixes http://llvm.org/PR27857. Reviewers: hfinkel, reames, chandlerc, sanjoy Subscribers: broune, llvm-commits Differential Revision: http://reviews.llvm.org/D21167 llvm-svn: 272489	2016-06-11 21:48:25 +00:00
Simon Pilgrim	3fc09f7be6	[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets To account for the fast PSHUFB implementation now available llvm-svn: 272484	2016-06-11 19:23:02 +00:00
Matthew Simpson	12b9c5ba98	Reapply "[TTI] Refine default cost for interleaved load groups with gaps" This reapplies commit r272385 with a fix. The build was failing when compiled with gcc, but not with clang. With the fix, we now get the data layout from the current TTI implementation, which will hopefully solve the issue. llvm-svn: 272395	2016-06-10 14:33:30 +00:00
Matthew Simpson	65c7b74de4	Revert "[TTI] Refine default cost for interleaved load groups with gaps" This reverts commit r272385. This commit broke the build. I'm temporarily reverting to investigate. llvm-svn: 272391	2016-06-10 12:41:33 +00:00
Matthew Simpson	b16907f17a	[TTI] Refine default cost for interleaved load groups with gaps This patch refines the default cost for interleaved load groups having gaps. If a load group has gaps, the legalized instructions corresponding to the unused elements will be dead. Thus, we don't need to account for them in the cost model. Instead, we only need to account for the fraction of legalized loads that will actually be used. Differential Revision: http://reviews.llvm.org/D20873 llvm-svn: 272385	2016-06-10 11:27:51 +00:00
Easwaran Raman	71069cf67d	Use ProfileSummaryInfo in inline cost analysis. Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo. Differential revision: http://reviews.llvm.org/D21045 llvm-svn: 272321	2016-06-09 22:23:21 +00:00
Easwaran Raman	e12c487b8c	[PM] Port LCSSA to the new PM. Differential Revision: http://reviews.llvm.org/D21090 llvm-svn: 272294	2016-06-09 19:44:46 +00:00
Michael Kuperstein	c5edcdeb0e	[LV] Use vector phis for some secondary induction variables Previously, we materialized secondary vector IVs from the primary scalar IV, by offseting the primary to match the correct start value, and then broadcasting it - inside the loop body. Instead, we can use a real vector IV, like we do for the primary. This enables using vector IVs for secondary integer IVs whose type matches the type of the primary. Differential Revision: http://reviews.llvm.org/D20932 llvm-svn: 272283	2016-06-09 18:03:15 +00:00
Sanjoy Das	c7f69b921f	Be wary of abnormal exits from loop when exploiting UB We can safely rely on a NoWrap add recurrence causing UB down the road only if we know the loop does not have a exit expressed in a way that is opaque to ScalarEvolution (e.g. by a function call that conditionally calls exit(0)). I believe with this change PR28012 is fixed. Note: I had to change some llvm-lit tests in LoopReroll, since it looks like they were depending on this incorrect behavior. llvm-svn: 272237	2016-06-09 01:13:59 +00:00
Michael Zolotukhin	8e7e76729d	[LoopSimplify] Preserve LCSSA when merging exit blocks. Summary: This fixes PR26682. Also add LCSSA as a preserved pass to LoopSimplify, that looks correct to me and allows to write a test for the issue. Reviewers: chandlerc, bogner, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21112 llvm-svn: 272224	2016-06-08 23:13:21 +00:00
Michael Zolotukhin	987ab631fa	[SLPVectorizer] Handle GEP with differing constant index types Summary: This fixes PR27617. Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64. The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert. Reviewers: nadav, mzolotukhin Subscribers: JesperAntonsson, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D20685 Patch by Jesper Antonsson! llvm-svn: 272206	2016-06-08 21:55:16 +00:00
Evgeny Stupachenko	3e2f389a7e	The patch set unroll disable pragma when unroll with user specified count has been applied. Summary: Previously SetLoopAlreadyUnrolled() set the disable pragma only if there was some loop metadata. Now it set the pragma in all cases. This helps to prevent multiple unroll when -unroll-count=N is given. Reviewers: mzolotukhin Differential Revision: http://reviews.llvm.org/D20765 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 272195	2016-06-08 20:21:24 +00:00
Tim Shen	7aa0ad65ce	[MemCpyOpt] Do not exchange llvm.lifetime.start and llvm.memcpy Reviewers: iteratee Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21087 llvm-svn: 272192	2016-06-08 19:42:32 +00:00
Easwaran Raman	f894b5e89c	Use FileCheck instead of grepping for patterns. NFC. llvm-svn: 272065	2016-06-07 21:46:14 +00:00
Andrey Turetskiy	94c2179550	Quick fix for the test from rL272014 "[LAA] Improve non-wrapping pointer detection by handling loop-invariant case" (s couple of buildbots failed). Patch by Roman Shirokiy. llvm-svn: 272019	2016-06-07 15:52:35 +00:00
Andrey Turetskiy	9f02c58670	[LAA] Improve non-wrapping pointer detection by handling loop-invariant case. This fixes PR26314. This patch adds new helper “isNoWrap” with detection of loop-invariant pointer case. Patch by Roman Shirokiy. Ref: https://llvm.org/bugs/show_bug.cgi?id=26314 Differential Revision: http://reviews.llvm.org/D17268 llvm-svn: 272014	2016-06-07 14:55:27 +00:00
Simon Pilgrim	db9893fb90	[InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit). If the shift amount is constant we can sometimes convert these instructions to native shifts: 1 - if all shift amounts are in range then the conversion is trivial. 2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion. 3 - logical shifts just return zero if all elements have out of range shift amounts. In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts. Differential Revision: http://reviews.llvm.org/D19675 llvm-svn: 271996	2016-06-07 10:27:15 +00:00
Simon Pilgrim	91e3ac8293	[InstCombine][SSE] Add MOVMSK constant folding (PR27982) This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions. The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs. Differential Revision: http://reviews.llvm.org/D20998 llvm-svn: 271990	2016-06-07 08:18:35 +00:00
Michael Kuperstein	a0c6ae02a5	[InstCombine] scalarizePHI should not assume the code it sees has been CSE'd scalarizePHI only looked for phis that have exactly two uses - the "latch" use, and an extract. Unfortunately, we can not assume all equivalent extracts are CSE'd, since InstCombine itself may create an extract which is a duplicate of an existing one. This extends it to handle several distinct extracts from the same index. This should fix at least some of the performance regressions from PR27988. Differential Revision: http://reviews.llvm.org/D20983 llvm-svn: 271961	2016-06-06 23:38:33 +00:00
Michael Zolotukhin	19edbadfc5	[LoopUnrollAnalyzer] Fix a crash in analyzeLoopUnrollCost. In some cases, when simplifying with SCEV, we might consider pointer values as just usual integer values. Thus, we might get a different type from what we had originally in the map of simplified values, and hence we need to check types before operating on the values. This fixes PR28015. llvm-svn: 271931	2016-06-06 19:21:40 +00:00
Geoff Berry	43e5160d0e	Reapply [LSR] Create fewer redundant instructions. Summary: Fix LSRInstance::HoistInsertPosition() to check the original insert position block first for a canonical insertion point that is dominated by all inputs. This leads to SCEV being able to reuse more instructions since it currently tracks the instructions it creates for reuse by keeping a table of <Value, insert point> pairs. Originally reviewed in http://reviews.llvm.org/D18001 Reviewers: atrick Subscribers: llvm-commits, mzolotukhin, mcrosier Differential Revision: http://reviews.llvm.org/D18480 llvm-svn: 271929	2016-06-06 19:10:46 +00:00
Sanjay Patel	6a333c3ed9	[InstCombine] limit icmp transform to ConstantInt (PR28011) In r271810 ( http://reviews.llvm.org/rL271810 ), I loosened the check above this to work for any Constant rather than ConstantInt. AFAICT, that part makes sense if we can determine that the shrunken/extended constant remained equal. But it doesn't make sense for this later transform where we assume that the constant DID change. This could assert for a ConstantExpr: https://llvm.org/bugs/show_bug.cgi?id=28011 And it could be wrong for a vector as shown in the added regression test. llvm-svn: 271908	2016-06-06 16:56:57 +00:00
Sanjay Patel	027c469158	regenerate checks llvm-svn: 271904	2016-06-06 16:03:06 +00:00
Sanjay Patel	70aa568c4e	regenerate checks llvm-svn: 271903	2016-06-06 15:55:00 +00:00
Eli Friedman	ee89505799	LICM: Don't sink stores out of loops that may throw. Summary: This hasn't been caught before because it requires noalias or similarly strong alias analysis to actually reproduce. Fixes http://llvm.org/PR27952 . Reviewers: hfinkel, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20944 llvm-svn: 271858	2016-06-05 22:13:52 +00:00
Sanjoy Das	b7e861a488	Add safety check to InstCombiner::commonIRemTransforms Since FoldOpIntoPhi speculates the binary operation to potentially each of the predecessors of the PHI node (pulling it out of arbitrary control dependence in the process), we can FoldOpIntoPhi only if we know the operation doesn't have UB. This also brings up an interesting profitability question -- the way it is written today, commonIRemTransforms will hoist out work from dynamically dead code into code that will execute at runtime. Perhaps that isn't the best canonicalization? Fixes PR27968. llvm-svn: 271857	2016-06-05 21:17:04 +00:00
Sanjoy Das	0dcd1d859c	Add test case for InstCombiner::commonIRemTransforms; NFC The PHI case in commonIRemTransforms was untested; add a trivial test case. llvm-svn: 271856	2016-06-05 21:17:00 +00:00
Davide Italiano	a1cbc3f8cc	[Internalize] Test that __stack_chk_{guard, fail} are not internalized. r154645 introduced this feature without test. This should have better coverage now. llvm-svn: 271853	2016-06-05 19:08:54 +00:00
Sanjoy Das	4d4339d1e8	[PM] Port IndVarSimplify to the new pass manager Summary: There are some rough corners, since the new pass manager doesn't have (as far as I can tell) LoopSimplify and LCSSA, so I've updated the tests to run them separately in the old pass manager in the lit tests. We also don't have an equivalent for AU.setPreservesCFG() in the new pass manager, so I've left a FIXME. Reviewers: bogner, chandlerc, davide Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20783 llvm-svn: 271846	2016-06-05 18:01:19 +00:00
Sanjoy Das	f90e28d6fd	[IndVars] Remove -liv-reduce It is an off-by-default option that no one seems to use[0], and given that SCEV directly understands the overflow instrinsics there is no real need for it anymore. [0]: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098181.html llvm-svn: 271845	2016-06-05 18:01:12 +00:00
Sanjay Patel	0fab306eb5	fix checks update_test_checks.py got confused matching the variable names. llvm-svn: 271844	2016-06-05 17:54:56 +00:00
Sanjay Patel	a6fbc82392	[InstCombine] allow vector icmp bool transforms llvm-svn: 271843	2016-06-05 17:49:45 +00:00
Sanjay Patel	54d7010627	add tests to show missing vector transforms llvm-svn: 271842	2016-06-05 17:32:58 +00:00
Sanjay Patel	51dc83c052	regenerate checks llvm-svn: 271841	2016-06-05 17:29:45 +00:00
Sanjay Patel	009c3da65f	update test to use FileCheck llvm-svn: 271840	2016-06-05 17:13:09 +00:00
Sanjay Patel	f48b909f28	update test to use FileCheck llvm-svn: 271838	2016-06-05 16:41:20 +00:00
Sanjay Patel	8a3b6d0d8b	update test to FileCheck llvm-svn: 271837	2016-06-05 16:29:15 +00:00
Xinliang David Li	64dbb295b6	[PM] Port GCOVProfiler pass to the new pass manager llvm-svn: 271823	2016-06-05 05:12:23 +00:00
David Majnemer	2482e1c017	[SimplifyCFG] Don't kill empty cleanuppads with multiple uses A basic block could contain: %cp = cleanuppad [] cleanupret from %cp unwind to caller This basic block is empty and is thus a candidate for removal. However, there can be other uses of %cp outside of this basic block. This is only possible in unreachable blocks. Make our transform more correct by checking that the pad has a single user before removing the BB. This fixes PR28005. llvm-svn: 271816	2016-06-04 23:50:03 +00:00
Sanjay Patel	ea8a211169	[InstCombine] allow vector constants for cast+icmp fold This is step 1 of unknown towards fixing PR28001: https://llvm.org/bugs/show_bug.cgi?id=28001 llvm-svn: 271810	2016-06-04 22:04:05 +00:00
Sanjay Patel	8e63999bee	[InstCombine] add test for missing vector optimization llvm-svn: 271808	2016-06-04 21:41:25 +00:00
Sanjay Patel	4c42211c6f	[InstCombine] add test for missing vector optimization llvm-svn: 271806	2016-06-04 21:20:03 +00:00
Sanjay Patel	58a92a327d	[InstCombine] minimize test case and use FileCheck llvm-svn: 271805	2016-06-04 21:04:59 +00:00
Simon Pilgrim	ba319ded5e	[Analysis] Enabled BITREVERSE as a vectorizable intrinsic Allows XOP to vectorize BITREVERSE - other targets will follow as their costmodels improve. llvm-svn: 271803	2016-06-04 20:21:07 +00:00
Simon Pilgrim	fda22d66fc	[InstCombine][MMX] Extend SimplifyDemandedUseBits MOVMSK support to MMX Add the MMX implementation to the SimplifyDemandedUseBits SSE/AVX MOVMSK support added in D19614 Requires a minor tweak as llvm.x86.mmx.pmovmskb takes a x86_mmx argument - so we have to be explicit about the implied v8i8 vector type. llvm-svn: 271789	2016-06-04 13:42:46 +00:00
Sanjay Patel	6cf18af1c5	[InstCombine] look through bitcasts to find selects There was concern that creating bitcasts for the simpler potential select pattern: define <2 x i64> @vecBitcastOp1(<4 x i1> %cmp, <2 x i64> %a) { %a2 = add <2 x i64> %a, %a %sext = sext <4 x i1> %cmp to <4 x i32> %bc = bitcast <4 x i32> %sext to <2 x i64> %and = and <2 x i64> %a2, %bc ret <2 x i64> %and } might lead to worse code for some targets, so this patch is matching the larger patterns seen in the test cases. The motivating example for this patch is this IR produced via SSE intrinsics in C: define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) { %t0 = bitcast <2 x i64> %a to <4 x i32> %t1 = bitcast <2 x i64> %b to <4 x i32> %cmp = icmp sgt <4 x i32> %t0, %t1 %sext = sext <4 x i1> %cmp to <4 x i32> %t2 = bitcast <4 x i32> %sext to <2 x i64> %and = and <2 x i64> %t2, %a %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1> %neg2 = bitcast <4 x i32> %neg to <2 x i64> %and2 = and <2 x i64> %neg2, %b %or = or <2 x i64> %and, %and2 ret <2 x i64> %or } For an AVX target, this is currently: vpcmpgtd %xmm1, %xmm0, %xmm2 vpand %xmm0, %xmm2, %xmm0 vpandn %xmm1, %xmm2, %xmm1 vpor %xmm1, %xmm0, %xmm0 retq With this patch, it becomes: vpmaxsd %xmm1, %xmm0, %xmm0 Differential Revision: http://reviews.llvm.org/D20774 llvm-svn: 271676	2016-06-03 14:42:07 +00:00
Sanjay Patel	172bf6edd1	[InstCombine] change tests to show a more obvious transform possibility The original tests were intended to show a missing transform that would be solved by D20774: http://reviews.llvm.org/D20774 But it's not clear that the transform for the simpler tests is a win for all targets. Make the tests show a larger pattern that should be a win regardless of the cost of bitcast instructions. llvm-svn: 271603	2016-06-02 22:45:49 +00:00
Sanjay Patel	dba8b4c04d	transform obscured FP sign bit ops into a fabs/fneg using TLI hook This is effectively a revert of: http://reviews.llvm.org/rL249702 - [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886) and: http://reviews.llvm.org/rL249701 - [ValueTracking] teach computeKnownBits that a fabs() clears sign bits and a reimplementation as a DAG combine for targets that have IEEE754-compliant fabs/fneg instructions. This is intended to resolve the objections raised on the dev list: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html and: https://llvm.org/bugs/show_bug.cgi?id=24886#c4 In the interest of patch minimalism, I've only partly enabled AArch64. PowerPC, MIPS, x86 and others can enable later. Differential Revision: http://reviews.llvm.org/D19391 llvm-svn: 271573	2016-06-02 20:01:37 +00:00
Xinliang David Li	7008ce3f98	[profile] value profiling bug fix -- missing icall targets in profile-use Inline virtual functions has linkeonceodr linkage (emitted in comdat on supporting targets). If the vtable for the class is not emitted in the defining module, function won't be address taken thus its address is not recorded. At the mercy of the linker, if the per-func prf_data from this module (in comdat) is picked at link time, we will lose mapping from function address to its hash val. This leads to missing icall promotion. The second test case (currently disabled) in compiler_rt (r271528): instrprof-icall-prom.test demostrates the bug. The first profile-use subtest is fine due to linker order difference. With this change, no missing icall targets is found in instrumented clang's raw profile. llvm-svn: 271532	2016-06-02 16:33:41 +00:00
Xinliang David Li	0b29330612	make icall pass name consistent /NFC llvm-svn: 271467	2016-06-02 01:52:05 +00:00
Geoff Berry	b96d3b2dd8	[MemorySSA] Port to new pass manager Add support for the new pass manager to MemorySSA pass. Change MemorySSA to be computed eagerly upon construction. Change MemorySSAWalker to be owned by the MemorySSA object that creates it. Reviewers: dberlin, george.burgess.iv Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19664 llvm-svn: 271432	2016-06-01 21:30:40 +00:00
Daniel Berlin	73694bb92b	Revert "Claim NoAlias if two GEPs index different fields of the same struct" This reverts commit 2d5d6493f43eb68493a3852b8c226ac9fafdc7eb. llvm-svn: 271422	2016-06-01 18:55:32 +00:00
Daniel Berlin	e846c9dc52	Claim NoAlias if two GEPs index different fields of the same struct Patch by Taewook Oh Summary: Patch for Bug 27478. Make BasicAliasAnalysis claims NoAlias if two GEPs index different fields of the same structure. Reviewers: hfinkel, dberlin Subscribers: dberlin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20665 llvm-svn: 271415	2016-06-01 18:12:01 +00:00
Michael Kuperstein	3a3c64d23e	[LV] For some IVs, use vector phis instead of widening in the loop body Previously, whenever we needed a vector IV, we would create it on the fly, by splatting the scalar IV and adding a step vector. Instead, we can create a real vector IV. This tends to save a couple of instructions per iteration. This only changes the behavior for the most basic case - integer primary IVs with a constant step. Differential Revision: http://reviews.llvm.org/D20315 llvm-svn: 271410	2016-06-01 17:16:46 +00:00
Guozhi Wei	b994f4cdbc	[SLP] Pass in correct alignment when query memory access cost This patch fixes bug https://llvm.org/bugs/show_bug.cgi?id=27897. When query memory access cost, current SLP always passes in alignment value of 1 (unaligned), so it gets a very high cost of scalar memory access, and wrongly vectorize memory loads in the test case. It can be fixed by simply giving correct alignment. llvm-svn: 271333	2016-05-31 20:41:19 +00:00
Erik Eckstein	0c48dd8ca5	Fix a crash in MergeFunctions related to ordering of weak/strong functions The assumption, made in insert() that weak functions are always inserted after strong functions, is only true in the first round of adding functions. In subsequent rounds this is no longer guaranteed , because we might remove a strong function from the tree (because it's modified) and add it later, where an equivalent weak function already exists in the tree. This change removes the assert in insert() and explicitly enforces a weak->strong order. This also removes the need of two separate loops in runOnModule(). llvm-svn: 271299	2016-05-31 17:20:23 +00:00
Sanjoy Das	ae09b3cd4c	[IndVars] Eliminate op.with.overflow when possible (re-apply) Summary: If we can prove that an op.with.overflow intrinsic does not overflow, we can get rid of the intrinsic, and replace it with non-wrapping arithmetic. This was first checked in at r265913 but reverted in r265950 because it exposed some issues around how SCEV handled post-inc add recurrences. Those issues have now been fixed. Reviewers: atrick, regehr Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18685 llvm-svn: 271153	2016-05-29 00:36:25 +00:00
Sanjoy Das	7e4a64167d	[SCEV] Don't always add no-wrap flags to post-inc add recs Fixes PR27315. The post-inc version of an add recurrence needs to "follow the same rules" as a normal add or subtract expression. Otherwise we miscompile programs like ``` int main() { int a = 0; unsigned a_u = 0; volatile long last_value; do { a_u += 3; last_value = (long) ((int) a_u); if (will_add_overflow(a, 3)) { // Leave, and don't actually do the increment, so no UB. printf("last_value = %ld\n", last_value); exit(0); } a += 3; } while (a != 46); return 0; } ``` This patch changes SCEV to put no-wrap flags on post-inc add recurrences only when the poison from a potential overflow will go ahead to cause undefined behavior. To avoid regressing performance too much, I've assumed infinite loops without side effects is undefined behavior to prove poison<->UB equivalence in more cases. This isn't ideal, but is not new to LLVM as a whole, and far better than the situation I'm trying to fix. llvm-svn: 271151	2016-05-29 00:32:17 +00:00
Simon Pilgrim	9602d678cb	[X86][SSE] (Reapplied) Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused. Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed. Differential Revision: http://reviews.llvm.org/D20686 llvm-svn: 271131	2016-05-28 18:03:41 +00:00
Sanjay Patel	395eca8d26	[InstCombine] add tests to show bitcast interference llvm-svn: 271125	2016-05-28 16:10:37 +00:00
Sanjay Patel	f49c7b1570	regenerate checks llvm-svn: 271117	2016-05-28 15:44:28 +00:00
Sanjay Patel	cbc4aa6bdd	join RUN lines; NFC llvm-svn: 271115	2016-05-28 15:34:05 +00:00
Sean Silva	02b9d892c5	Bring back r271090 in a way that doesn't depend on r271089. llvm-svn: 271092	2016-05-28 04:05:36 +00:00
Sean Silva	9dd4b5c51d	Revert r271089 and r271090. It was triggering an msan bot. Revert "[IRPGO] Set the function entry count metadata." This reverts commit r271090. Revert "[IRPGO] Centralize the function attribute inliner hint logic. NFC." This reverts commit r271089. llvm-svn: 271091	2016-05-28 03:56:25 +00:00
Sean Silva	7884633c5b	[IRPGO] Set the function entry count metadata. llvm-svn: 271090	2016-05-28 03:02:54 +00:00
Xinliang David Li	d38392ecd6	[PM] Port the Sample FDO to new PM (part-2) llvm-svn: 271072	2016-05-27 23:20:16 +00:00
Evgeny Stupachenko	ea2aef4a1d	The patch refactors unroll pass. Summary: Unroll factor (Count) calculations moved to a new function. Early exits on pragma and "-unroll-count" defined factor added. New type of unrolling "Force" introduced (previously used implicitly). New unroll preference "AllowRemainder" introduced and set "true" by default. (should be set to false for architectures that suffers from it). Reviewers: hfinkel, mzolotukhin, zzheng Differential Revision: http://reviews.llvm.org/D19553 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 271071	2016-05-27 23:15:06 +00:00
Sanjoy Das	6fff9dc932	[GVN] Preserve !range metadata when PRE'ing loads Reviewers: dberlin, reames, george.burgess.iv Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20743 llvm-svn: 271034	2016-05-27 19:03:10 +00:00
Tim Northover	32b4d15e0a	Move test to X86 directory: I think it depends on X86 TTI. llvm-svn: 271019	2016-05-27 16:56:54 +00:00
Tim Northover	10a1e8b1fe	Vectorizer: track non-fast FP instructions through phis when finding reductions. When we traced through a phi node looking for floating-point reductions, we forgot whether we'd ever seen an instruction without fast-math flags (that would block vectorization). This propagates it through to the end. llvm-svn: 271015	2016-05-27 16:40:27 +00:00
Dehao Chen	80b16d4135	Remove sample profile dependency to instcombine, which is not a analysis pass. Summary: This patch removes dependency from sample profile pass to instcombine pass. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D20501 llvm-svn: 271009	2016-05-27 16:14:15 +00:00
Igor Laevsky	df9db45c94	[RewriteStatepointsForGC] All constant should have null base pointer Currently we consider that each constant has itself as a base value. I.e "base(const) = const". This introduces couple of problems when we are trying to avoid reporting constants in statepoint live sets: 1. When querying "base( phi(const1, const2) )" we will get "phi(const1, const2)" as a base pointer. Since it's not a constant we will record it in a stack map. However on practice we don't want this to happen (constant are never relocated). 2. base( phi(const, gc ptr) ) = phi( const, base(gc ptr) ). This particular case imposes challenge on our runtime - we don't expect to see constant base pointers other than null. This problems can be avoided by treating all constant as if they were derived from null pointer base. I.e in a first case we will not include constant pointer in a stack map at all. In a second case we will get "phi(null, base(gc ptr))" as a base pointer which is a lot more convenient. Differential Revision: http://reviews.llvm.org/D20584 llvm-svn: 270993	2016-05-27 13:13:59 +00:00
Simon Pilgrim	4642a57fbf	Revert: r270973 - [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) llvm-svn: 270976	2016-05-27 09:02:25 +00:00
Simon Pilgrim	c013e5737b	[X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused. A companion patch (D20684) removes/auto-upgrade the clang intrinsics. Differential Revision: http://reviews.llvm.org/D20686 llvm-svn: 270973	2016-05-27 08:49:15 +00:00
Pete Cooper	1929b5539a	Form objc_storeStrong in the presence of bitcasts. objc_storeStrong can be formed from a sequence such as %0 = tail call i8* @objc_retain(i8* %p) nounwind %tmp = load i8, i8* @x, align 8 store i8* %0, i8** @x, align 8 tail call void @objc_release(i8* %tmp) nounwind The code was already looking through bitcasts for most of the values involved, but had missed one case where the pointer operand for the store was a bitcast. Ultimately the pointer for the load and store have to be the same value, after stripping casts. llvm-svn: 270955	2016-05-27 02:13:53 +00:00
Michael Zolotukhin	15e745133e	[LoopUnrollAnalyzer] Bail out instead of dying with assert when facing huge index. This fixes PR27902. llvm-svn: 270946	2016-05-27 00:55:16 +00:00
Easwaran Raman	5fe04a1d8e	Attach profile summary in IR based instrumentation pass. Differential revision: http://reviews.llvm.org/D20655 llvm-svn: 270933	2016-05-26 22:57:11 +00:00
Michael Zolotukhin	1ecdedad8d	[LoopUnrollAnalyzer] Fix a crash in analyzeLoopUnrollCost. Condition might be simplified to a Constant, but it doesn't have to be ConstantInt, so we should dyn_cast, instead of cast. This fixes PR27886. llvm-svn: 270924	2016-05-26 21:42:51 +00:00
David Majnemer	d99068d26d	[MemCpyOpt] Don't perform callslot optimization across may-throw calls An exception could prevent a store from occurring but MemCpyOpt's callslot optimization would fire anyway, causing the store to occur. This fixes PR27849. llvm-svn: 270892	2016-05-26 19:24:24 +00:00
Michael Kuperstein	9a81b62a01	[BBVectorize] Don't vectorize selects with a scalar condition and vector operands. This fixes PR27879. Differential Revision: http://reviews.llvm.org/D20659 llvm-svn: 270888	2016-05-26 18:43:57 +00:00
David Majnemer	7f32420ed5	[CaptureTracking] Volatile operations capture their memory location The memory location that corresponds to a volatile operation is very special. They are observed by the machine in ways which we cannot reason about. Differential Revision: http://reviews.llvm.org/D20555 llvm-svn: 270879	2016-05-26 17:36:22 +00:00
Chad Rosier	e5819e2732	[InstCombine] Catch more bswap cases missed due to zext and truncs. Fixes PR27824. Differential Revision: http://reviews.llvm.org/D20591. llvm-svn: 270853	2016-05-26 14:58:51 +00:00
David Majnemer	474512576e	[MergedLoadStoreMotion] Don't transform across may-throw calls It is unsafe to hoist a load before a function call which may throw, the throw might prevent a pointer dereference. Likewise, it is unsafe to sink a store after a call which may throw. The caller might be able to observe the difference. This fixes PR27858. llvm-svn: 270828	2016-05-26 07:11:09 +00:00
Adam Nemet	c68534bd13	[ConstantFold] Fix incorrect index rewrites for GEPs Summary: If an index for a vector or array type is out-of-range GEP constant folding tries to factor it into preceding dimensions. The code however does not consider addressing of structure field padding which should not qualify as out-of-range index. As demonstrated by the testcase, this can occur if the indexing performed on a vector type and the preceding index is an array type. SROA generates GEPs for example involving padding bytes as it slices an alloca. My fix disables this folding if the element type is a vector type. I believe that this is the only way we can end up with padding. (We have no access to DataLayout so I am not sure if there is actual robust way of actually checking the presence of padding.) Reviewers: majnemer Subscribers: llvm-commits, Gerolf Differential Revision: http://reviews.llvm.org/D20663 llvm-svn: 270826	2016-05-26 07:08:05 +00:00
Peter Collingbourne	b9aa1f4a03	MemorySSA: Revert r269678 and r268068; replace with special casing in MemorySSA. It turns out that too many passes are relying on alias analysis results for control dependencies. Until we fix that by introducing a more accurate modelling of control dependencies, special case assume in MemorySSA instead. Also introduce tests to ensure we don't regress the FunctionAttrs or LICM passes. Differential Revision: http://reviews.llvm.org/D20658 llvm-svn: 270823	2016-05-26 04:58:46 +00:00
Sanjoy Das	a099268e85	[IRCE] Optimize conjunctions of range checks After this change, we do the expected thing for cases like ``` Check0Passed = /* range check IRCE can optimize / Check1Passed = / range check IRCE can optimize */ if (!(Check0Passed && Check1Passed)) throw_Exception(); ``` llvm-svn: 270804	2016-05-26 00:09:02 +00:00
Davide Italiano	1021c68e92	[PM] Port PartiallyInlineLibCalls to the new pass manager. llvm-svn: 270798	2016-05-25 23:38:53 +00:00
Hal Finkel	2f6886844e	Look for a loop's starting location in the llvm.loop metadata Getting accurate locations for loops is important, because those locations are used by the frontend to generate optimization remarks. Currently, optimization remarks for loops often appear on the wrong line, often the first line of the loop body instead of the loop itself. This is confusing because that line might itself be another loop, or might be somewhere else completely if the body was inlined function call. This happens because of the way we find the loop's starting location. First, we look for a preheader, and if we find one, and its terminator has a debug location, then we use that. Otherwise, we look for a location on an instruction in the loop header. The fallback heuristic is not bad, but will almost always find the beginning of the body, and not the loop statement itself. The preheader location search often fails because there's often not a preheader, and even when there is a preheader, depending on how it was formed, it sometimes carries the location of some preceeding code. I don't see any good theoretical way to fix this problem. On the other hand, this seems like a straightforward solution: Put the debug location in the loop's llvm.loop metadata. A companion Clang patch will cause Clang to insert llvm.loop metadata with appropriate locations when generating debugging information. With these changes, our loop remarks have much more accurate locations. Differential Revision: http://reviews.llvm.org/D19738 llvm-svn: 270771	2016-05-25 21:42:37 +00:00
Ahmed Bougacha	201b97f550	[TLI] Also cover Linux 64 libfunc (stat64, ...) prototype checking. My script missed those in r270750. llvm-svn: 270763	2016-05-25 21:16:33 +00:00
Ahmed Bougacha	1fe3f1ca50	[TLI] Fix NumParams==0 prototype checking typo. There was a typo in r267758. It caused invalid accesses when given something like "void @free(...)", as NumParams == 0, and we then try to look at the 0th parameter. Turns out, most of these were untested; add both attribute and missing-prototype checks for all libc libfuncs. Differential Revision: http://reviews.llvm.org/D20543 llvm-svn: 270750	2016-05-25 20:22:45 +00:00
Reid Kleckner	c0a0363d5c	[IR] Copy comdats in GlobalObject::copyAttributesFrom This is probably correct for all uses except cross-module IR linking, where we need to move the comdat from the source module to the destination module. Fixes PR27870. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D20631 llvm-svn: 270743	2016-05-25 18:36:22 +00:00
Sanjay Patel	aedc347b29	[x86] avoid code explosion from LoopVectorizer for gather loop (PR27826) By making pointer extraction from a vector more expensive in the cost model, we avoid the vectorization of a loop that is very likely to be memory-bound: https://llvm.org/bugs/show_bug.cgi?id=27826 There are still bugs related to this, so we may need a more general solution to avoid vectorizing obviously memory-bound loops when we don't have HW gather support. Differential Revision: http://reviews.llvm.org/D20601 llvm-svn: 270729	2016-05-25 17:27:54 +00:00
Craig Topper	12e322a8cf	[X86] Remove the llvm.x86.sse2.storel.dq intrinsic. It hasn't been used in a long time. llvm-svn: 270677	2016-05-25 06:56:32 +00:00
David Majnemer	124bdb7497	[FunctionAttrs] Volatile loads should disable readonly A volatile load has side effects beyond what callers expect readonly to signify. For example, it is not safe to reorder two function calls which each perform a volatile load to the same memory location. llvm-svn: 270671	2016-05-25 05:53:04 +00:00
Davide Italiano	655a145e83	[PM] Port BDCE to the new pass manager. llvm-svn: 270647	2016-05-25 01:57:04 +00:00
Michael Zolotukhin	8f7a242c7b	Re-enable "[LoopUnroll] Enable advanced unrolling analysis by default" one more time. This reverts commit r270577. llvm-svn: 270630	2016-05-24 23:00:05 +00:00
Michael Zolotukhin	7216dd4668	[LoopUnrollAnalyzer] Fix a crash in UnrolledInstAnalyzer::visitCastInst. This fixes PR27847. Now for real. llvm-svn: 270629	2016-05-24 22:59:58 +00:00
Chad Rosier	47f0148c98	[InstCombine] Clean up and FileCheckize test case. llvm-svn: 270586	2016-05-24 17:35:49 +00:00
Hans Wennborg	b64e4390a3	Revert r270518, which re-enabled "[LoopUnroll] Enable advanced unrolling analysis by default. Chromium builds are still hitting the assert in PR27874. llvm-svn: 270577	2016-05-24 16:10:12 +00:00
Sanjay Patel	23019d1006	[ValueTracking, InstSimplify] extend isKnownNonZero() to handle vector constants Similar in spirit to D20497 : If all elements of a constant vector are known non-zero, then we can say that the whole vector is known non-zero. It seems like we could extend this to FP scalar/vector too, but isKnownNonZero() says it only works for integers and pointers for now. Differential Revision: http://reviews.llvm.org/D20544 llvm-svn: 270562	2016-05-24 14:18:49 +00:00
Simon Pilgrim	0295fbe1bb	[InstCombine][X86][SSE41] The SSE41 PMOVSX intrinsics are auto upgraded now and aren't handled by InstCombine any more llvm-svn: 270561	2016-05-24 13:52:44 +00:00
Michael Zolotukhin	96c150d154	Revert "Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by default."" This reverts commit r270512 and reapplies r270478. Originally it caused PR27847, but it was fixed in r270517. llvm-svn: 270518	2016-05-24 01:22:20 +00:00
Michael Zolotukhin	3898b2b587	[LoopUnrollAnalyzer] Fix a crash in UnrolledInstAnalyzer::visitCastInst. This fixes PR27847. llvm-svn: 270517	2016-05-24 00:51:01 +00:00
Hans Wennborg	6951028b61	Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by default." This caused PR27847. llvm-svn: 270512	2016-05-23 23:42:35 +00:00
Sanjoy Das	aa83c47bab	[IRCE] Optimize "uses" not branches; NFCI This changes IRCE to optimize uses, and not branches. This change is NFCI since the uses we do inspect are in practice only ever going to be the condition use in conditional branches; but this flexibility will later allow us to analyze more complex expressions than just a direct branch on a range check. llvm-svn: 270500	2016-05-23 22:16:45 +00:00
Sanjay Patel	adcaef7238	[InstSimplify] add vector tests for isKnownNonZero llvm-svn: 270498	2016-05-23 22:09:04 +00:00
Gerolf Hoflehner	00e7092f68	[InstCombine] Fix assertion when bitcast is converted to gep When an aggregate contains an opaque type its size cannot be determined. This triggers an "Invalid GetElementPtrInst indices for type" assert in function checkGEPType. The fix suppresses the conversion in this case. http://reviews.llvm.org/D20319 llvm-svn: 270479	2016-05-23 19:23:17 +00:00
Michael Zolotukhin	be080fc51d	[LoopUnroll] Enable advanced unrolling analysis by default. Summary: This patch turns on LoopUnrollAnalyzer by default. To mitigate compile time regressions, I chose very conservative thresholds for now. Later we can make them more aggressive, but it might require being smarter in which loops we're optimizing. E.g. currently the biggest issue is that with more agressive thresholds we unroll many cold loops, which increases compile time for no performance benefit (performance of those loops is improved, but it doesn't matter since they are cold). Test results for compile time(using 4 samples to reduce noise): ``` MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19% SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect 4.19% MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow 3.39% MultiSource/Applications/JM/lencod/lencod 1.47% MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06% ``` I didn't see any performance changes in the testsuite, but it improves some internal tests. Reviewers: hfinkel, chandlerc Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D20482 llvm-svn: 270478	2016-05-23 19:10:19 +00:00
Sanjay Patel	e2e89ef936	[ValueTracking, InstCombine] extend isKnownToBeAPowerOfTwo() to handle vector splat constants We could try harder to handle non-splat vector constants too, but that seems much rarer to me. Note that the div test isn't resolved because there's a check for isIntegerTy() guarding that transform. Differential Revision: http://reviews.llvm.org/D20497 llvm-svn: 270369	2016-05-22 15:41:53 +00:00
David Majnemer	9f92f4c497	[SimplifyCFG] Remove cleanuppads which are empty except for calls to lifetime.end A cleanuppad is not cheap, they turn into many instructions and result in additional spills and fills. It is not worth keeping a cleanuppad around if all it does is hold a lifetime.end instruction. N.B. We first try to merge the cleanuppad with another cleanuppad to avoid dropping the lifetime and debug info markers. llvm-svn: 270314	2016-05-21 05:12:32 +00:00
Sanjoy Das	be6c7a12cb	[GuardWidening] Fix incorrect use of remove_if I had used `std::remove_if` under the assumption that it moves the predicate matching elements to the end, but actaully the elements remaining towards the end (after the iterator returned by `std::remove_if`) are indeterminate. Fix the bug (and make the code more straightforward) by using a temporary SmallVector, and add a test case demonstrating the issue. llvm-svn: 270306	2016-05-21 02:24:44 +00:00
Matt Arsenault	2907e51246	Fix constant folding of addrspacecast of null This should not be making assumptions on the value of the casted pointer. llvm-svn: 270293	2016-05-21 00:14:04 +00:00
Sanjay Patel	ec4d91a4d7	add test vector sdiv llvm-svn: 270285	2016-05-20 22:08:40 +00:00
Sanjay Patel	312c9afd90	add test for vector shift llvm-svn: 270284	2016-05-20 22:08:16 +00:00
Sanjay Patel	54acedf88f	add tests for vector urem llvm-svn: 270271	2016-05-20 20:55:17 +00:00
Sanjay Patel	3eded68bef	use FileCheck instead of grep for exact checking llvm-svn: 270265	2016-05-20 20:07:18 +00:00
Mark Lacey	9b5fcf65ec	Functions with differing phis should not be merged. Check that the incoming blocks of phi nodes are identical, and block function merging if they are not. rdar://problem/26255167 Differential Revision: http://reviews.llvm.org/D20462 llvm-svn: 270250	2016-05-20 18:39:11 +00:00
Sanjay Patel	75892a1543	[SimplifyCFG] eliminate switch cases based on known range of switch condition This was noted in PR24766: https://llvm.org/bugs/show_bug.cgi?id=24766#c2 We may not know whether the sign bit(s) are zero or one, but we can still optimize based on knowing that the sign bit is repeated. Differential Revision: http://reviews.llvm.org/D20275 llvm-svn: 270222	2016-05-20 14:53:09 +00:00
Easwaran Raman	bb578ef0dd	Allow -inline-threshold to override default threshold. Before r257832, the threshold used by SimpleInliner was explicitly specified or generated from opt levels and passed to the base class Inliner's constructor. There, it was first overridden by explicitly specified -inline-threshold. The refactoring in r257832 did not preserve this behavior for all opt levels. This change brings back the original behavior. Differential Revision: http://reviews.llvm.org/D20452 llvm-svn: 270153	2016-05-19 23:02:09 +00:00
Sanjoy Das	f5f0331a3b	[GuardWidening] Introduce range check merging Sequences of range checks expressed using guards, like guard((I - 2) u< L) guard((I - 1) u< L) guard((I + 0) u< L) guard((I + 1) u< L) guard((I + 2) u< L) can sometimes be combined into a smaller sequence: guard((I - 2) u< L AND (I + 2) u< L) if we can prove that (I - 2) u< L AND (I + 2) u< L implies all of checks expressed in the previous sequence. This change teaches GuardWidening to do this kind of merging when feasible. llvm-svn: 270151	2016-05-19 22:55:46 +00:00
Guozhi Wei	b1d37199cc	[InstCombine] Avoid combining the bitcast of a var that is used as both address and result of load instructions This patch fixes https://llvm.org/bugs/show_bug.cgi?id=27703. If there is a sequence of one or more load instructions, each loaded value is used as address of later load instruction, bitcast is necessary to change the value type, don't optimize it. llvm-svn: 270135	2016-05-19 21:07:01 +00:00
Wei Mi	0456d9dd18	Recommit r255691 since PR26509 has been fixed. llvm-svn: 270113	2016-05-19 20:38:03 +00:00
Peter Collingbourne	fe12d0e3e5	CodeGen: Make the global-merge pass independently testable, and add a test. llvm-svn: 270023	2016-05-19 04:38:56 +00:00
Sanjoy Das	b784ed36c0	[GuardWidening] Use getEquivalentICmp to fold constant compares `ConstantRange::getEquivalentICmp` is more general, and better factored. llvm-svn: 270019	2016-05-19 03:53:17 +00:00
Sanjoy Das	083f38939b	New pass: guard widening Summary: Implement guard widening in LLVM. Description from GuardWidening.cpp: The semantics of the `@llvm.experimental.guard` intrinsic lets LLVM transform it so that it fails more often that it did before the transform. This optimization is called "widening" and can be used hoist and common runtime checks in situations like these: ``` %cmp0 = 7 u< Length call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ] call @unknown_side_effects() %cmp1 = 9 u< Length call @llvm.experimental.guard(i1 %cmp1) [ "deopt"(...) ] ... ``` to ``` %cmp0 = 9 u< Length call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ] call @unknown_side_effects() ... ``` If `%cmp0` is false, `@llvm.experimental.guard` will "deoptimize" back to a generic implementation of the same function, which will have the correct semantics from that point onward. It is always _legal_ to deoptimize (so replacing `%cmp0` with false is "correct"), though it may not always be profitable to do so. NB! This pass is a work in progress. It hasn't been tuned to be "production ready" yet. It is known to have quadriatic running time and will not scale to large numbers of guards Reviewers: reames, atrick, bogner, apilipenko, nlewycky Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20143 llvm-svn: 269997	2016-05-18 22:55:34 +00:00
Dehao Chen	f16376b505	Follow-up patch of http://reviews.llvm.org/D19948 to handle missing profiles when simplifying CFG. Summary: Set default branch weight to 1:1 if one of the branch has profile missing when simplifying CFG. Reviewers: spatel, davidxl Subscribers: danielcdh, llvm-commits Differential Revision: http://reviews.llvm.org/D20307 llvm-svn: 269995	2016-05-18 22:41:03 +00:00
Michael Zolotukhin	d2268a73bc	[LoopUnrollAnalyzer] Take into account cost of instructions controlling branches, along with their operands. Previously, we didn't add their and their operands cost, which could've resulted in unrolling loops for no actual benefit. llvm-svn: 269985	2016-05-18 21:20:12 +00:00
Matt Arsenault	1735da460b	AMDGPU: Other sizes of popcnt are fast We can chain bcnt instructions together, so any width popcnt is pretty fast. llvm-svn: 269950	2016-05-18 16:10:19 +00:00
Matt Arsenault	71fa1f375e	AMDGPU: Fix a few slightly broken tests Fix minor bugs and uses of undef which break when pointer related optimization passes are run. llvm-svn: 269944	2016-05-18 15:48:44 +00:00
Davide Italiano	98f7e0e790	[PM] Port per-function SCCP to the new pass manager. llvm-svn: 269937	2016-05-18 15:18:25 +00:00
Justin Bogner	594e07bd78	[PM] Port DSE to the new pass manager Patch by JakeVanAdrighem. Thanks! llvm-svn: 269847	2016-05-17 21:38:13 +00:00
Sanjay Patel	22b01febd4	[InstCombine] add another test for wrong icmp constant (PR27792) It doesn't matter if the comparison is unsigned; the inc/dec is always signed. llvm-svn: 269831	2016-05-17 20:20:40 +00:00
Sanjay Patel	de96f39392	[InstCombine] add test for wrong icmp constant (PR27792) The code fix for this was checked in at r269797. llvm-svn: 269803	2016-05-17 19:25:55 +00:00
Sanjoy Das	fd67038c8b	[Guards] Add branch metadata when lowering Guards are expected to basically never fail. Reflect this in the branch probabilities in their lowered form. llvm-svn: 269791	2016-05-17 17:51:19 +00:00
Igor Laevsky	953f2d2a54	[RewriteStatepointsForGC] Remove obsolete assertion This is assertion is no longer necessary since we never record constants in the live set anyway. (They are never recorded in the initial live set, and constant bases are removed near line 2119) Differential Revision: http://reviews.llvm.org/D20293 llvm-svn: 269764	2016-05-17 13:54:10 +00:00
Benjamin Kramer	ca9a0fe2b9	[InstCombine] Don't crash when trying to take an element of a ConstantExpr. Fixes PR27786. llvm-svn: 269757	2016-05-17 12:08:55 +00:00
Sanjay Patel	e9b2c32e7f	[InstCombine] check vector elements before trying to transform LE/GE vector icmp (PR27756) Fix a bug introduced with rL269426 : [InstCombine] canonicalize* LE/GE vector integer comparisons to LT/GT (PR26701, PR26819) We were assuming that a ConstantDataVector / ConstantVector / ConstantAggregateZero operand of an ICMP was composed of ConstantInt elements, but it might have ConstantExpr or UndefValue elements. Handle those appropriately. Also, refactor this function to join the scalar and vector paths and eliminate the switches. Differential Revision: http://reviews.llvm.org/D20289 llvm-svn: 269728	2016-05-17 00:57:57 +00:00
Matthew Simpson	37ec5f914e	[LAA] Rename forwarding conflict detection option (NFC) This patch renames the option enabling the store-to-load forwarding conflict detection optimization. This change was requested in the review of D20241. llvm-svn: 269668	2016-05-16 17:00:56 +00:00
Xinliang David Li	f3c7a35238	[PM] Port indirect call promotion pass to new pass manager llvm-svn: 269660	2016-05-16 16:31:07 +00:00
Matthew Simpson	e43198dc4b	[LV] Ensure safe VF for loops with interleaved accesses The selection of the vectorization factor currently doesn't consider interleaved accesses. The vectorization factor is based on the maximum safe dependence distance computed by LAA. However, for loops with interleaved groups, we should instead base the vectorization factor on the maximum safe dependence distance divided by the maximum interleave factor of all the interleaved groups. Interleaved accesses not in a group will be scalarized. Differential Revision: http://reviews.llvm.org/D20241 llvm-svn: 269659	2016-05-16 15:08:20 +00:00
Sanjay Patel	399780f088	add test to show missing optimization llvm-svn: 269601	2016-05-15 18:41:18 +00:00
Sanjay Patel	ecdd13d788	regenerate checks llvm-svn: 269596	2016-05-15 18:05:10 +00:00
Elena Demikhovsky	ee004bc0a2	Vector GEP - fixed a crash on InstSimplify Pass. Vector GEP with mixed (vector and scalar) indices failed on the InstSimplify Pass when all indices are constants. Differential revision http://reviews.llvm.org/D20149 llvm-svn: 269590	2016-05-15 12:30:25 +00:00
Davide Italiano	9922344178	[PM] Port LowerAtomic to the new pass manager. llvm-svn: 269511	2016-05-13 22:52:35 +00:00
Michael Zolotukhin	963a6d9c69	Revert "Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."" This reverts commit r269395. Try to reapply with a fix from chapuni. llvm-svn: 269486	2016-05-13 21:23:25 +00:00
Sanjay Patel	23fa090738	regenerate checks and add a run to show missed shrinkage llvm-svn: 269449	2016-05-13 18:04:39 +00:00
Sanjay Patel	4e0cf49318	regenerate checks llvm-svn: 269447	2016-05-13 18:02:16 +00:00
Sanjay Patel	0c8f3f9332	[InstCombine] handle zero constant vectors for LE/GE comparisons too Enhancement to: http://reviews.llvm.org/rL269426 With discussion in: http://reviews.llvm.org/D17859 This should complete the fixes for: PR26701, PR26819: https://llvm.org/bugs/show_bug.cgi?id=26701 https://llvm.org/bugs/show_bug.cgi?id=26819 llvm-svn: 269439	2016-05-13 17:28:12 +00:00
Jun Bum Lim	f28beac419	[MemCpyOpt] Use MaxIntSize in byte instead of bit Summary: This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit. Reviewers: arsenm, joker.eph, mcrosier Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20176 llvm-svn: 269433	2016-05-13 16:52:24 +00:00
Sanjay Patel	b79ab27853	[InstCombine] canonicalize* LE/GE vector integer comparisons to LT/GT (PR26701, PR26819) *We don't currently handle the edge case constants (min/max values), so it's not a complete canonicalization. To fully solve the motivating bugs, we need to enhance this to recognize a zero vector too because that's a ConstantAggregateZero which is a ConstantData, not a ConstantVector or a ConstantDataVector. Differential Revision: http://reviews.llvm.org/D17859 llvm-svn: 269426	2016-05-13 15:10:46 +00:00
Michael Zolotukhin	9be3b8b9bb	Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..." This reverts commit r269388. It caused some bots to fail, I'm reverting it until I investigate the issue. llvm-svn: 269395	2016-05-13 06:32:25 +00:00
Michael Zolotukhin	b7b8052982	[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the... Summary: ...loop after the last iteration. This is really hard to do correctly. The core problem is that we need to model liveness through the induction PHIs from iteration to iteration in order to get the correct results, and we need to correctly de-duplicate the common subgraphs of instructions feeding some subset of the induction PHIs. All of this can be driven either from a side effect at some iteration or from the loop values used after the loop finishes. This patch implements this by storing the forward-propagating analysis of each instruction in a cache to recall whether it was free and whether it has become live and thus counted toward the total unroll cost. Then, at each sink for a value in the loop, we recursively walk back through every value that feeds the sink, including looping back through the iterations as needed, until we have marked the entire input graph as live. Because we cache this, we never visit instructions more than twice -- once when we analyze them and put them into the cache, and once when we count their cost towards the unrolled loop. Also, because the cache is only two bits and because we are dealing with relatively small iteration counts, we can store all of this very densely in memory to avoid this from becoming an excessively slow analysis. The code here is still pretty gross. I would appreciate suggestions about better ways to factor or split this up, I've stared too long at the algorithmic side to really have a good sense of what the design should probably look at. Also, it might seem like we should do all of this bottom-up, but I think that is a red herring. Specifically, the simplification power is much greater working top-down. We can forward propagate very effectively, even across strange and interesting recurrances around the backedge. Because we use data to propagate, this doesn't cause a state space explosion. Doing this level of constant folding, etc, would be very expensive to do bottom-up because it wouldn't be until the last moment that you could collapse everything. The current solution is essentially a top-down simplification with a bottom-up cost accounting which seems to get the best of both worlds. It makes the simplification incremental and powerful while leaving everything dead until we know it is needed. Finally, a core property of this approach is its monotonicity. At all times, the current UnrolledCost is a conservatively low estimate. This ensures that we will never early-exit from the analysis due to exceeding a threshold when if we had continued, the cost would have gone back below the threshold. These kinds of bugs can cause incredibly hard to track down random changes to behavior. We could use a techinque similar (but much simpler) within the inliner as well to avoid considering speculated code in the inline cost. Reviewers: chandlerc Subscribers: sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D11758 llvm-svn: 269388	2016-05-13 01:42:39 +00:00
Michael Zolotukhin	a59a308e8d	[LoopUnrollAnalyzer] Don't treat gep-instructions with simplified offset as simplified. Summary: Currently we consider such instructions as simplified, which is incorrect, because if their user isn't simplified, we can't actually simplify them too. This biases our estimates of profitability: for instance the analyzer expects much more gains from unrolling memcpy loops than there actually are. Reviewers: hfinkel, chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17365 llvm-svn: 269387	2016-05-13 01:42:34 +00:00
David Majnemer	96f0d383a7	[SCCP] Resolve shifts beyond the bitwidth to undef Shifts beyond the bitwidth are undef but SCCP resolved them to zero. Instead, DTRT and resolve them to undef. This reimplements the transform which caused PR27712. llvm-svn: 269269	2016-05-12 03:07:40 +00:00
Sanjoy Das	e0aa414acf	All llvm.deoptimize declarations must use the same calling convention This new verifier rule lets us unambigously pick a calling convention when creating a new declaration for `@llvm.experimental.deoptimize.<ty>`. It is also congruent with our lowering strategy -- since all calls to `@llvm.experimental.deoptimize` are lowered to calls to `__llvm_deoptimize`, it is reasonable to enforce a unique calling convention. Some of the tests that were breaking this verifier rule have had to be split up into different .ll files. The inliner was violating this rule as well, and has been fixed to avoid producing invalid IR. llvm-svn: 269261	2016-05-12 01:17:38 +00:00
Davide Italiano	cd7c84bd8b	Revert "[SCCP] Partially propagate informations when the input is not fully defined." This reverts commit r269105 as it caused PR27712. llvm-svn: 269252	2016-05-11 23:06:10 +00:00
Easwaran Raman	9b792923d0	Revert r269131 llvm-svn: 269138	2016-05-10 23:26:04 +00:00
Dehao Chen	b76e5d948a	Propagate branch metadata when some branch probability is missing. Summary: In sample profile, some branches may have profile missing due to profile inaccuracy. We want existing branch probability still valid after propagation. Reviewers: hfinkel, davidxl, spatel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19948 llvm-svn: 269137	2016-05-10 23:07:19 +00:00
Easwaran Raman	7eccf4ee0e	Reapply r266477 and r266488 llvm-svn: 269131	2016-05-10 22:03:23 +00:00
Xinliang David Li	da1955835d	[PM]: port IR based profUse pass to new pass manager llvm-svn: 269129	2016-05-10 21:59:52 +00:00
Tim Northover	3961735f03	Revert "MemCpyOpt: combine local load/store sequences into memcpy." This reverts commit r269125. It was in my tree when I ran "git svn dcommit". It's really still under review. llvm-svn: 269127	2016-05-10 21:49:40 +00:00
Tim Northover	6c65c71639	MemCpyOpt: combine local load/store sequences into memcpy. Sort of the BB-local equivalent to idiom-recognizer: if we have a basic-block that really implements a memcpy operation, backends can benefit from seeing this. llvm-svn: 269125	2016-05-10 21:48:11 +00:00
Hans Wennborg	719b26ba54	Loop unroller: set thresholds for optsize and minsize functions to zero Before r268509, Clang would disable the loop unroll pass when optimizing for size. That commit enabled it to be able to support unroll pragmas in -Os builds. However, this regressed binary size in one of Chromium's DLLs with ~100 KB. This restores the original behaviour of no unrolling at -Os, but doing it in LLVM instead of Clang makes more sense, and also allows the pragmas to keep working. Differential revision: http://reviews.llvm.org/D20115 llvm-svn: 269124	2016-05-10 21:45:55 +00:00
Lawrence Hu	e58a814c07	Enable loopreroll for sext of loop control only IV This patch extend loopreroll to allow the instruction chain of loop control only IV has sext. Differential Revision: http://reviews.llvm.org/D19820 llvm-svn: 269121	2016-05-10 21:16:49 +00:00
Lawrence Hu	fe7c87beac	Revert r26084: Enable loopreroll for sext of loop control only IV llvm-svn: 269119	2016-05-10 21:11:09 +00:00
Lawrence Hu	4c623d27b5	Revert r269093: Enable loopreroll for sext of loop control only IV llvm-svn: 269117	2016-05-10 21:04:28 +00:00
Sanjay Patel	6786bc5390	[InstSimplify] use computeKnownBits on shift amount operands Do simplifications common to all shift instructions based on the amount shifted: 1. If the shift amount is known larger than the bitwidth, the result is undefined. 2. If the valid bits of the shift amount are all known to be 0, it's a shift by zero, so the shift operand is the result. Note that we could generalize the shift-by-zero transform into a shift-by-constant if all of the valid bits in the shift amount are known, but that would have to be done in InstCombine rather than here because it would mean we need to create a new shift instruction. Differential Revision: http://reviews.llvm.org/D19874 llvm-svn: 269114	2016-05-10 20:46:54 +00:00
Chad Rosier	4e6cda2db5	[InstCombine] Fold icmp ugt/ult (udiv i32 C2, X), C1. This patch adds support for two optimizations: icmp ugt (udiv C2, X), C1 -> icmp ule X, C2/(C1+1) icmp ult (udiv C2, X), C1 -> icmp ugt X, C2/C1 Differential Revision: http://reviews.llvm.org/D20123 llvm-svn: 269109	2016-05-10 20:22:09 +00:00
Davide Italiano	7860c9bbf4	[SCCP] Partially propagate informations when the input is not fully defined. With this patch: %r1 = lshr i64 -1, 4294967296 -> undef Before this patch: %r1 = lshr i64 -1, 4294967296 -> 0 llvm-svn: 269105	2016-05-10 19:49:47 +00:00
Justin Bogner	da0fe183c3	LPM: Drop require<loops> from these tests, it's redundant. NFC The LoopPassManager needs to calculate the loops analysis in order to iterate over the loops at all. Requiring it is redundant and just adds noise to the RUN lines here. llvm-svn: 269097	2016-05-10 18:28:10 +00:00
Rafael Espindola	32483a7641	Make "@name =" mandatory for globals in .ll files. An oddity of the .ll syntax is that the "@var = " in @var = global i32 42 is optional. Writing just global i32 42 is equivalent to @0 = global i32 42 This means that there is a pretty big First set at the top level. The current implementation maintains it manually. I was trying to refactor it, but then started wondering why keep it a all. I personally find the above syntax confusing. It looks like something is missing. This patch removes the feature and simplifies the parser. llvm-svn: 269096	2016-05-10 18:22:45 +00:00
Lawrence Hu	b68f16e007	Enable loopreroll for sext of loop control only IV This patch extend loopreroll to allow the instruction chain of loop control only IV has sext. Differential Revision: http://reviews.llvm.org/D19820 llvm-svn: 269093	2016-05-10 18:00:42 +00:00
Rong Xu	b6211a0b4f	[PGO] resubmit r268969 Put the test into a target specific directory. llvm-svn: 269090	2016-05-10 17:45:33 +00:00
Lawrence Hu	8cc3b37d2c	Enable loopreroll for sext of loop control only IV This patch extend loopreroll to allow the instruction chain of loop control only IV has sext. llvm-svn: 269084	2016-05-10 17:42:27 +00:00
James Molloy	aa1d638800	Revert "[VectorUtils] Query number of sign bits to allow more truncations" This was a fairly simple patch but on closer inspection was seriously flawed and caused PR27690. This reverts commit r268921. llvm-svn: 269051	2016-05-10 12:27:23 +00:00
Chuang-Yu Cheng	175741d5a7	Update Debug Intrinsics in RewriteUsesOfClonedInstructions in LoopRotation Loop rotation clones instruction from the old header into the preheader. If there were uses of values produced by these instructions that were outside the loop, we have to insert PHI nodes to merge the two values. If the values are used by DbgIntrinsics they will be used as a MetadataAsValue of a ValueAsMetadata of the original values, and iterating all of the uses of the original value will not update the DbgIntrinsics. The new code checks if the values are used by DbgIntrinsics and if so, updates them using essentially the same logic as the original code. The attached testcase demonstrates the issue. Without the fix, the DbgIntrinic outside the loop uses values computed inside the loop, even though these values do not dominate the DbgIntrinsic. Author: Thomas Jablin (tjablin) Reviewers: dblaikie aprantl kbarton hfinkel cycheng http://reviews.llvm.org/D19564 llvm-svn: 269034	2016-05-10 09:45:44 +00:00
Arnaud A. de Grandmaison	333ef381b8	[InstCombine] Remove trivially empty va_start/va_end and va_copy/va_end ranges. When a va_start or va_copy is immediately followed by a va_end (ignoring debug information or other start/end in between), then it is safe to remove the pair. As this code shares some commonalities with the lifetime markers, this has been factored to helper functions. This InstCombine pattern kicks-in 3 times when running the LLVM test suite. llvm-svn: 269033	2016-05-10 09:24:49 +00:00
Renato Golin	d876eecf02	Revert "[PGO] Fix __llvm_profile_raw_version linkage in MACHO IR instrumentation generates a COMDAT symbol __llvm_profile_raw_version to overwrite the same symbol in profile run-time to distinguish IR profiles from Clang generated profiles. In MACHO, LinkOnceODR linkage is used due to the lack of COMDAT support." This reverts commits r268969, r268979 and r268984. They had target specific test in generic directories without the correct specifiers and made it hard for us to come up with a good solution by rapidly committing untested changes. This test needs to be in a target specific directory or have the correct REQUIRED identifier. llvm-svn: 269027	2016-05-10 08:23:57 +00:00
Elena Demikhovsky	c434d091c5	[LoopVectorize] Handling induction variable with non-constant step. Allow vectorization when the step is a loop-invariant variable. This is the loop example that is getting vectorized after the patch: int int_inc; int bar(int init, int restrict A, int N) { int x = init; for (int i=0;i<N;i++){ A[i] = x; x += int_inc; } return x; } "x" is an induction variable with loop-invariant* step. But it is not a primary induction. Primary induction variable with non-constant step is not handled yet. Differential Revision: http://reviews.llvm.org/D19258 llvm-svn: 269023	2016-05-10 07:33:35 +00:00
Sanjoy Das	12c91dc4c8	[ValueTracking] Use guards to prove non-nullness of a value Reviewers: apilipenko, majnemer, reames Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20044 llvm-svn: 269008	2016-05-10 02:35:44 +00:00
Evgeniy Stepanov	6694ec7406	Don't inline functions with different SafeStack attributes. llvm-svn: 268999	2016-05-10 00:33:07 +00:00
Adam Nemet	0a77dfad95	[LV] Hint at the new loop distribution pragma in optimization remark When we encounter unsafe memory dependencies, loop distribution could help. Even though, the diagnostics is in LAA, it's only currently emitted in the vectorizer. llvm-svn: 268987	2016-05-09 23:03:44 +00:00
Rong Xu	2c9bdd0d3d	Fix buildbot failure from r268968. llvm-svn: 268984	2016-05-09 22:45:47 +00:00
Sanjay Patel	0f153424a9	[Inliner] don't assume that a Constant alloca size is a ConstantInt (PR27277) Differential Revision: http://reviews.llvm.org/D20077 llvm-svn: 268980	2016-05-09 21:51:53 +00:00
Rong Xu	c5508046b8	Fix buildbot failure from r268968. llvm-svn: 268979	2016-05-09 21:51:50 +00:00
Rong Xu	a12f6d3c7b	[PGO] Fix __llvm_profile_raw_version linkage in MACHO IR instrumentation generates a COMDAT symbol __llvm_profile_raw_version to overwrite the same symbol in profile run-time to distinguish IR profiles from Clang generated profiles. In MACHO, LinkOnceODR linkage is used due to the lack of COMDAT support. But LinkOnceODR linkage might have .weak_def_can_be_hidden assembly directive, while the weak variable in run-time has a .weak_definition directive. Linker will not merge these two symbols even they have the same name. The end result is IR profiles are not properly flagged in MACHO. This patch changes the linkage for __llvm_profile_raw_version in each module to LinkOnceAny so that it has same .weak_definition directive as in the run-time. Differential Revision: http://reviews.llvm.org/D20078 llvm-svn: 268969	2016-05-09 21:03:06 +00:00
Chad Rosier	131a42ccdf	[InstCombine] Fold icmp eq/ne (udiv i32 A, B), 0 -> icmp ugt/ule B, A. Differential Revision: http://reviews.llvm.org/D20036 llvm-svn: 268960	2016-05-09 19:30:20 +00:00
Sanjay Patel	da7fe0c4a4	clean up; NFC llvm-svn: 268949	2016-05-09 18:54:14 +00:00
Joerg Sonnenberger	8ffe7ab7c2	Optimize a printf with a double procent to putchar. llvm-svn: 268922	2016-05-09 14:36:16 +00:00
James Molloy	5c20e27b7f	[VectorUtils] Query number of sign bits to allow more truncations When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more information and allow more truncations. llvm-svn: 268921	2016-05-09 14:32:30 +00:00
Mehdi Amini	581f0e1451	Refactor stripDebugInfo(Function) to handle intrinsic This moves the code that handles stripping debug info intrinsic from StripDebugInfo(Module) to StripDebugInfo(Function). The latter is already walking every instructions so it makes sense to do it at the same time. This makes also stripDebugInfo(Function) as an API more useful: it is really dropping every debug info in the Function. Finally the existing code is trigerring an assertion when the Module is not fully materialized. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 268847	2016-05-07 04:10:52 +00:00
Simon Pilgrim	45964c3742	[SLPVectorizer][X86] Regenerated SEXT/ZEXT cast vectorization tests Added 256-bit vector test as well llvm-svn: 268811	2016-05-06 22:22:18 +00:00
Philip Reames	6f4d0088c6	Reapply 267210 with fix for PR27490 Original Commit Message Extend load/store type canonicalization to handle unordered operations Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case. Note that the concern about lowering is now much less likely. PR27490 proved that we already were mucking with the types of ordered atomics and volatiles. As a result, this change doesn't introduce as much new behavior as originally thought. llvm-svn: 268809	2016-05-06 22:17:01 +00:00
Philip Reames	4a3c3b66d7	[GVN] PRE of unordered loads Again, fairly simple. Only change is ensuring that we actually copy the property of the load correctly. The aliasing legality constraints were already handled by the FRE patches. There's nothing special about unorder atomics from the perspective of the PRE algorithm itself. llvm-svn: 268804	2016-05-06 21:43:51 +00:00
Simon Pilgrim	2def0a878a	[SLPVectorizer][X86] Added BSWAP/BITREVERSE vectorization tests llvm-svn: 268803	2016-05-06 21:41:55 +00:00
Simon Pilgrim	a2220ea456	[SLPVectorizer][X86] Added CTPOP/CTLZ/CTTZ vectorization tests llvm-svn: 268800	2016-05-06 21:33:01 +00:00
Philip Reames	1fdce639d2	[GVN] Handle unordered atomics in cross block FRE You'll note there are essentially no code changes here. Cross block FRE heavily reuses code from the block local FRE. All of the tricky parts were done as part of the previous patch and the refactoring that removed the original code duplication. llvm-svn: 268775	2016-05-06 18:46:45 +00:00
Philip Reames	ae8997f496	[GVN] Do local FRE for unordered atomic loads This patch is the first in a small series teaching GVN to optimize unordered loads aggressively. This change just handles block local FRE because that's the simplest thing which lets me test MDA, and the AvailableValue pieces. Somewhat suprisingly, MDA appears fine and only a couple of small changes are needed in GVN. Once this is in, I'll tackle non-local FRE and PRE. The former looks like a natural extension of this, the later will require a couple of minor changes. Differential Revision: http://reviews.llvm.org/D19440 llvm-svn: 268770	2016-05-06 18:17:13 +00:00
Sanjay Patel	1cb6241a89	[SimplifyCFG] propagate branch metadata when creating select (retry r268550 / r268751 with possible fix) Retrying r268550/r268751 which were reverted at r268577/r268765 due a memory sanitizer failure. I have not been able to reproduce that failure, but I've taken another guess at fixing the problem in this version of the patch and will watch for another failure. Original commit message: Unlike earlier similar fixes, we need to recalculate the branch weights in this case. Differential Revision: http://reviews.llvm.org/D19674 llvm-svn: 268767	2016-05-06 18:07:46 +00:00
Sanjay Patel	84a0bf64a8	revert r268751 - caused same failures on msan bot llvm-svn: 268765	2016-05-06 17:51:37 +00:00
Sanjay Patel	6609510c32	[SimplifyCFG] propagate branch metadata when creating select (retry r268550 with possible fix) Retrying r268550 which was reverted at r268577 due a memory sanitizer failure. I have not been able to reproduce that failure, but I've taken a guess at fixing the problem in this version of the patch and will watch for another failure. Original commit message: Unlike earlier similar fixes, we need to recalculate the branch weights in this case. Differential Revision: http://reviews.llvm.org/D19674 llvm-svn: 268751	2016-05-06 17:07:47 +00:00
Chad Rosier	4ab37c0037	[SimplifyCFG] Prefer a simplification based on a dominating condition. Rather than merge two branches with a common destination. Differential Revision: http://reviews.llvm.org/D19743 llvm-svn: 268735	2016-05-06 14:25:14 +00:00
Xinliang David Li	8aebf44c97	[PM] port IR based PGO prof-gen pass to new pass manager llvm-svn: 268710	2016-05-06 05:49:19 +00:00
Davide Italiano	f54f2f0893	[PM] Port Interprocedural SCCP to the new pass manager. llvm-svn: 268684	2016-05-05 21:05:36 +00:00
Dehao Chen	f50c67ce7c	Revert http://reviews.llvm.org/D19926 as it breaks tests. llvm-svn: 268681	2016-05-05 20:47:53 +00:00
Dehao Chen	e48b4ee98c	Simplify CFG before assigning discriminator. Summary: We need to clean up CFG before assigning discriminator to minimize the impact of optimization on debug info. Reviewers: davidxl, dblaikie, dnovillo Subscribers: dnovillo, danielcdh, llvm-commits Differential Revision: http://reviews.llvm.org/D19926 llvm-svn: 268675	2016-05-05 20:18:49 +00:00
Chad Rosier	25cfb7dbd6	[ValueTracking] Improve isImpliedCondition for matching LHS and Imm RHSs. llvm-svn: 268636	2016-05-05 15:39:18 +00:00
Silviu Baranga	c05bab8a9c	[LV] Identify more induction PHIs by coercing expressions to AddRecExprs Summary: Some PHIs can have expressions that are not AddRecExprs due to the presence of sext/zext instructions. In order to prevent the Loop Vectorizer from bailing out when encountering these PHIs, we now coerce the SCEV expressions to AddRecExprs using SCEV predicates (when possible). We only do this when the alternative would be to not vectorize. Reviewers: mzolotukhin, anemet Subscribers: mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17153 llvm-svn: 268633	2016-05-05 15:20:39 +00:00
Davide Italiano	344e838fea	[PM] Port EliminateAvailableExternally pass to the new pass manager. llvm-svn: 268599	2016-05-05 02:37:32 +00:00
Davide Italiano	164b9bc6fe	[PM] Port ConstantMerge to the new pass manager. llvm-svn: 268582	2016-05-05 00:51:09 +00:00
Adam Nemet	3c5eabfcbc	[LoopDataPrefetch] Add optimization remark With -Rpass=loop-data-prefetch, show the memory access that got prefetched. llvm-svn: 268578	2016-05-05 00:08:15 +00:00
Vitaly Buka	fdcea9d78a	Revert "[SimplifyCFG] propagate branch metadata when creating select" MemorySanitizer: use-of-uninitialized-value 0x4910e47 in count /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/Support/MathExtras.h:159:12 0x4910e47 in countLeadingZeros<unsigned long> /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/Support/MathExtras.h:183 0x4910e47 in FitWeights /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:855 0x4910e47 in SimplifyCondBranchToCondBranch /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:2895 This reverts commit 609f4dd4bf3bc735c8c047a4d4b0a8e9e4d202e2. llvm-svn: 268577	2016-05-04 23:59:33 +00:00
Balaram Makam	569eaec5f3	"Reapply r268521 "[InstCombine] Canonicalize icmp instructions based on dominating conditions."" This reapplies commit r268521, that was reverted in r268530 due to a test failure in select-implied.ll Modified the test case to reflect the new change. llvm-svn: 268557	2016-05-04 21:32:14 +00:00
Sanjay Patel	7e8c285814	[SimplifyCFG] propagate branch metadata when creating select Unlike earlier similar fixes, we need to recalculate the branch weights in this case. Differential Revision: http://reviews.llvm.org/D19674 llvm-svn: 268550	2016-05-04 20:48:24 +00:00
Hal Finkel	e2b89118bd	[ConstantFold] Don't try to strip fp -> int bitcasts to simplify icmps ConstantFold has logic to take icmp (bitcast x to y), null and strip the bitcast. This makes sense in general, but not if x has floating-point type. In this case, we'd need a fcmp, not an icmp, and the code will assert. We normally don't see this situation because we constant fold fp -> int bitcasts, however, we'll see it for bitcasts of ppc_fp128 -> i128. This is because that bitcast is Endian-dependent, and as a result, we don't simplify it in ConstantFold (we could, but no one has yet added the necessary logic). Regardless, ConstantFold should not depend on that canonicalization for correctness. llvm-svn: 268534	2016-05-04 19:37:08 +00:00
Balaram Makam	31e7e13789	Revert "[InstCombine] Canonicalize icmp instructions based on dominating conditions." This reverts commit 573a40f79b35cf3e71db331bb00f6a84f03b835d. llvm-svn: 268530	2016-05-04 18:37:35 +00:00
Marianne Mailhot-Sarrasin	b192670279	Adding test cases showing the behavior of LoopUnrollPass according to optnone and optsize attributes The unroll pass was disabled by clang in /Os. Those new test cases shows that the pass will behave correctly even if it is not fully disabled. This patch is related in some way to the clang commit (http://reviews.llvm.org/D19827), which re-enables the pass in /Os. Differential Revision: http://reviews.llvm.org/D19870 llvm-svn: 268524	2016-05-04 17:45:40 +00:00
Balaram Makam	cf3bcb2625	[InstCombine] Canonicalize icmp instructions based on dominating conditions. Summary: This patch canonicalizes conditions based on the constant range information of the dominating branch condition. For example: %cmp = icmp slt i64 %a, 0 br i1 %cmp, label %land.lhs.true, label %lor.rhs lor.rhs: %cmp2 = icmp sgt i64 %a, 0 Would now be canonicalized into: %cmp = icmp slt i64 %a, 0 br i1 %cmp, label %land.lhs.true, label %lor.rhs lor.rhs: %cmp2 = icmp ne i64 %a, 0 Reviewers: mcrosier, gberry, t.p.northover, llvm-commits, reames, hfinkel, sanjoy, majnemer Subscribers: MatzeB, majnemer, mcrosier Differential Revision: http://reviews.llvm.org/D18841 llvm-svn: 268521	2016-05-04 17:34:20 +00:00
Hans Wennborg	0c3518e84b	[SimplifyCFG] isSafeToSpeculateStore now ignores debug info This patch fixes PR27615. @llvm.dbg.value instructions no longer count towards the maximum number of instructions to look back at in the instruction list when searching for a store instruction. This should make the output consistent between debug and non-debug build. Patch by Henric Karlsson <henric.karlsson@ericsson.com>! Differential Revision: http://reviews.llvm.org/D19912 llvm-svn: 268512	2016-05-04 15:40:57 +00:00
Igor Laevsky	fb1811d3a0	[RS4GC] Use SetVector/MapVector instead of DenseSet/DenseMap to guarantee stable ordering Goal of this change is to guarantee stable ordering of the statepoint arguments and other newly inserted values such as gc.relocates. Previously we had explicit sorting in a couple of places. However for unnamed values ordering was partial and overall we didn't have any strong invariant regarding it. This change switches all data structures to use SetVector's and MapVector's which provide possibility for deterministic iteration over them. Explicit sorting is now redundant and was removed. Differential Revision: http://reviews.llvm.org/D19669 llvm-svn: 268502	2016-05-04 14:55:36 +00:00
David Majnemer	3918cdd2a1	[ConstantFolding, ValueTracking] Fold constants involving bitcasts of ConstantVector We assumed that ConstantVectors would be rather uninteresting from the perspective of analysis. However, this is not the case due to a quirk of how LLVM handles vectors of i1. Vectors of i1 are not ConstantDataVectors like vectors of i8, i16, i32 or i64 because i1's SizeInBits differs from it's StoreSizeInBytes. This leads to it being categorized as a ConstantVector instead of a ConstantDataVector. Instead, treat ConstantVector more uniformly. This fixes PR27591. llvm-svn: 268479	2016-05-04 06:13:33 +00:00
David Majnemer	95549497ec	[GlobalDCE, Misc] Don't remove functions referenced by ifuncs We forgot to consider the target of ifuncs when considering if a function was alive or dead. N.B. Also update a few auxiliary tools like bugpoint and verify-uselistorder. This fixes PR27593. llvm-svn: 268468	2016-05-04 00:20:48 +00:00
Justin Bogner	d0d2341f30	PM: Port LoopRotation to the new loop pass manager llvm-svn: 268452	2016-05-03 22:02:31 +00:00
Justin Bogner	ab6a513b4e	PM: Port LoopSimplifyCFG to the new pass manager llvm-svn: 268446	2016-05-03 21:47:32 +00:00
Sanjoy Das	8a004551d0	[RS4GC] Add a test case around calling conventions; NFC llvm-svn: 268436	2016-05-03 20:58:10 +00:00
Davide Italiano	66228c4cf1	[IPO/GlobalDCE] Port to the new pass manager. Differential Revision: http://reviews.llvm.org/D19782 llvm-svn: 268425	2016-05-03 19:39:15 +00:00
Jack Liu	f101c0f7a1	[SROA] Function canConvertValue needs to check whether both NewTy and OldTy pointers are pointing to the same addr space. This can prevent SROA from creating a bitcast between pointers with different addr spaces. Differential Revision: http://reviews.llvm.org/D19697 llvm-svn: 268424	2016-05-03 19:30:48 +00:00
Jack Liu	430e2c2140	Revert 268409 due to missing comment. llvm-svn: 268421	2016-05-03 19:15:02 +00:00
Jack Liu	1ff4a0b7ee	(no commit message) llvm-svn: 268409	2016-05-03 18:01:43 +00:00
Sanjoy Das	4ae3920c5b	[LICM] Kill SCEV loop dispositions if needed SCEV caches whether SCEV expressions are loop invariant, variant or computable. LICM breaks this cache, almost by definition; so clear the SCEV disposition cache if LICM changed anything. llvm-svn: 268408	2016-05-03 17:50:11 +00:00
Sanjoy Das	7e7a5a050a	Use all_of instead of a raw loop; NFC Added some tests despite being NFC, since it looks like nothing was exercising the "all incoming values to exit PHIs are same" logic. llvm-svn: 268407	2016-05-03 17:50:06 +00:00
Sanjoy Das	905fc27ebf	[LoopDeletion] Clear SCEV loop dispositions `Loop::makeLoopInvariant` can hoist instructions out of loops, so loop dispositions for the loop it operated on may need to be cleared. We can be smarter here (especially around how `forgetLoopDispositions` is implemented), but let's be correct first. Fixes PR27570. llvm-svn: 268406	2016-05-03 17:50:02 +00:00
Anna Thomas	43d7e1cbff	Fold compares irrespective of whether allocation can be elided Summary When a non-escaping pointer is compared to a global value, the comparison can be folded even if the corresponding malloc/allocation call cannot be elided. We need to make sure the global value is not null, since comparisons to null cannot be folded. In future, we should also handle cases when the the comparison instruction dominates the pointer escape. Reviewers: sanjoy Subscribers s.egerton, llvm-commits Differential Revision: http://reviews.llvm.org/D19549 llvm-svn: 268390	2016-05-03 14:58:21 +00:00
Kristof Beyls	c08f70588d	Mark that SpeculativeExecution preserves Globals Alias Analysis. A few benchmarks with lots of accesses to global variables in the hot loops regressed a lot since r266399, which added the SpeculativeExecution pass to the default pipeline. The problem is that this pass doesn't mark Globals Alias Analysis as preserved. Globals Alias Analysis is computed in a module pass, whereas SpeculativeExecution is a function pass, and a lot of passes dependent on the Globals Alias Analysis to optimize these benchmarks are also function passes. As such, the Globals Alias Analysis information cannot be recomputed between SpeculativeExecution and the following function passes needing that information. SpeculativeExecution doesn't invalidate Globals Alias Analysis, so mark it as such to fix those performance regressions. Differential Revision: http://reviews.llvm.org/D19806 llvm-svn: 268370	2016-05-03 08:33:26 +00:00
Jack Liu	cd777c8b35	test commit llvm-svn: 268358	2016-05-03 04:06:24 +00:00
David Majnemer	3d90bb79c4	[LoopUnroll] Unroll loops which have exit blocks to EH pads We were overly cautious in our analysis of loops which have invokes which unwind to EH pads. The loop unroll transform is safe because it only clones blocks in the loop body, it does not try to split critical edges involving EH pads. Instead, move the necessary safety check to LoopUnswitch. N.B. The safety check for loop unswitch is covered by an existing test which fails without it. llvm-svn: 268357	2016-05-03 03:57:40 +00:00
Mehdi Amini	5b85d8d67b	ThinLTO: do not import function whose linkage prevents inlining. There is not point in importing a "weak" or a "linkonce" function since we won't be able to inline it anyway. We already had a targeted check for WeakAny, this is using the same check on GlobalValue as the inline, i.e. isMayBeOverriddenLinkage() From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 268341	2016-05-03 00:27:28 +00:00
Mehdi Amini	1e918c9cb3	Revert "ThinLTO: do not import function whose linkage prevents inlining." This reverts commit r268315, the tests are not passing. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 268317	2016-05-02 22:26:04 +00:00
Mehdi Amini	bda9b2ae9e	ThinLTO: do not import function whose linkage prevents inlining. There is not point in importing a "weak" or a "linkonce" function since we won't be able to inline it anyway. We already had a targeted check for WeakAny, this is using the same check on GlobalValue as the inline, i.e. isMayBeOverriddenLinkage() From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 268315	2016-05-02 22:11:27 +00:00
Reid Kleckner	bca59d2a43	Revert "[SimplifyCFG] Extend TryToSimplifyUncondBranchFromEmptyBlock for empty block including lifetime intrinsics" This reverts commit r268254. This change causes assertion failures while building Chromium. Reduced test case coming soon. llvm-svn: 268288	2016-05-02 19:43:22 +00:00
Hans Wennborg	b7599329fc	[SimplifyCFG] Extend TryToSimplifyUncondBranchFromEmptyBlock for empty block including lifetime intrinsics Make it possible that TryToSimplifyUncondBranchFromEmptyBlock merges empty basic block including lifetime intrinsics as well as phi nodes and unconditional branch into its successor or predecessor(s). If successor of empty block has single predecessor, all contents including lifetime intrinsics are sinked into the successor. Otherwise, they are hoisted into its predecessor(s) and then merged into the predecessor(s). Patch by Josh Yoon <josh.yoon@samsung.com>! Differential Revision: http://reviews.llvm.org/D19257 llvm-svn: 268254	2016-05-02 17:22:54 +00:00
Chad Rosier	84567343bc	Remove extra whitespace. NFC. llvm-svn: 268248	2016-05-02 16:45:00 +00:00
Sanjay Patel	ec41cd2461	remove blank lines llvm-svn: 268246	2016-05-02 15:49:09 +00:00
Sanjay Patel	ebc0faa8d4	[InstCombine] regenerate checks llvm-svn: 268245	2016-05-02 15:32:10 +00:00
Sanjay Patel	0d0181006a	[InstCombine] regenerate checks llvm-svn: 268244	2016-05-02 15:25:49 +00:00
Sanjay Patel	0b75fd81e1	[InstCombine] regenerate checks llvm-svn: 268242	2016-05-02 15:21:41 +00:00
Sanjay Patel	933f9da43d	[InstCombine] regenerate checks llvm-svn: 268241	2016-05-02 15:18:13 +00:00
Sanjay Patel	b193fe943f	[InstCombine] regenerate checks llvm-svn: 268239	2016-05-02 15:06:55 +00:00
Sanjay Patel	1540b19407	[InstCombine] regenerate checks llvm-svn: 268232	2016-05-02 14:21:55 +00:00
Davide Italiano	4f277763cf	[GlobalDCE] Modernize. Use FileCheck instead of grep. llvm-svn: 268207	2016-05-01 22:51:14 +00:00
Simon Pilgrim	ca140b17cb	[InstCombine][SSE] Added support to VPERMD/VPERMPS to shuffle combine to accept UNDEF elements. llvm-svn: 268206	2016-05-01 20:43:02 +00:00
Simon Pilgrim	c590492075	Dropped FIXME comment llvm-svn: 268205	2016-05-01 20:33:25 +00:00
Simon Pilgrim	eeacc40e27	[InstCombine][SSE] Added support to VPERMILVAR to shuffle combine to accept UNDEF elements. llvm-svn: 268204	2016-05-01 20:22:42 +00:00
Simon Pilgrim	cc7f567b6a	[InstCombine][AVX] Fixed PERMILVAR identity tests and added additional decode tests llvm-svn: 268203	2016-05-01 20:06:47 +00:00
Simon Pilgrim	e5e8c2fde0	[InstCombine][SSE] Added support to PSHUFB to shuffle combine to accept UNDEF elements. llvm-svn: 268202	2016-05-01 19:26:21 +00:00
Simon Pilgrim	cae3e70707	[InstCombine][SSE] Regenerate MOVSX/MOVZX tests llvm-svn: 268201	2016-05-01 18:28:45 +00:00
Simon Pilgrim	8cddf8b3c6	[InstCombine][AVX2] Combine VPERMD/VPERMPS intrinsics with constant masks to shufflevector. llvm-svn: 268199	2016-05-01 16:41:22 +00:00
Simon Pilgrim	c179435055	[InstCombine][AVX2] Added VPERMD/VPERMPS shuffle combining placeholder tests. For future support for VPERMD/VPERMPS to generic shuffles combines llvm-svn: 268166	2016-04-30 20:41:52 +00:00
Simon Pilgrim	8e38a5439b	[InstCombine][AVX] Split off VPERMILVAR tests and added additional tests for UNDEF mask elements llvm-svn: 268159	2016-04-30 07:32:19 +00:00
Sanjoy Das	47cf2affbd	[LowerGuardIntrinsics] Keep track of !make.implicit metadata If a guard call being lowered by LowerGuardIntrinsics has the `!make.implicit` metadata attached, then reattach the metadata to the branch in the resulting expanded form of the intrinsic. This allows us to implement null checks as guards and still get the benefit of implicit null checks. llvm-svn: 268148	2016-04-30 00:55:59 +00:00
Lawrence Hu	1befea2bdc	Reroll loops with multiple IV and negative step part 3 support multiple induction variables This patch enable loop reroll for the following case: for(int i=0; i<N; i += 2) { S += a++; S += a++; }; Differential Revision: http://reviews.llvm.org/D16550 llvm-svn: 268147	2016-04-30 00:51:22 +00:00
Sanjoy Das	52c68bb0f5	[LowerGuardIntrinsics] Preserve calling conv when lowering llvm-svn: 268142	2016-04-30 00:17:47 +00:00
Sanjay Patel	bc6fad0bdf	add minimal test to show dropped metadata llvm-svn: 268141	2016-04-30 00:12:54 +00:00
Sanjay Patel	6748ec49e9	remove the metadata added with r267827 We can demonstrate the 'select' bug and fix with a simpler test case. The merged weight values are already tested in another test. llvm-svn: 268139	2016-04-30 00:02:36 +00:00
Sanjoy Das	107aefc2fc	Mark guards on true as "trivially dead" This moves some logic added to EarlyCSE in rL268120 into `llvm::isInstructionTriviallyDead`. Adds a test case for DCE to demonstrate that passes other than EarlyCSE can now pick up on the new information. llvm-svn: 268126	2016-04-29 22:23:16 +00:00
Sanjoy Das	ee81b23fe7	[EarlyCSE] Simplify guard intrinsics Summary: This change teaches EarlyCSE some basic properties of guard intrinsics: - Guard intrinsics read all memory, but don't write to any memory - After a guard has executed, the condition it was guarding on can be assumed to be true - Guard intrinsics on a constant `true` are no-ops Reviewers: reames, hfinkel Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19578 llvm-svn: 268120	2016-04-29 21:52:58 +00:00
Chad Rosier	cd62bf5821	[InstCombine] Determine the result of a select based on a dominating condition. Differential Revision: http://reviews.llvm.org/D19550 llvm-svn: 268104	2016-04-29 21:12:31 +00:00
David Majnemer	d2a074b1f4	[ValueTracking] matchSelectPattern needs to be more careful around FP matchSelectPattern attempts to see through casts which mask min/max patterns from being more obvious. Under certain circumstances, it would misidentify a sequence of instructions as a min/max because it assumed that folding casts would preserve the result. This is not the case for floating point <-> integer casts. This fixes PR27575. llvm-svn: 268086	2016-04-29 18:40:34 +00:00
Geoff Berry	b92cd5293e	[BasicAA] Treat llvm.assume as not accessing memory in getModRefBehavior(Function) Reviewers: dberlin, chandlerc, hfinkel, reames, sanjoy Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19730 llvm-svn: 268068	2016-04-29 17:18:28 +00:00
Sanjay Patel	362dcf9615	auto-generate checks llvm-svn: 268061	2016-04-29 16:39:37 +00:00
Simon Pilgrim	07a691c706	[InstCombine][SSE] Added x86 pshufb undef mask tests FIXME: We currently don't support folding constant pshufb shuffle masks containing undef elements. llvm-svn: 268016	2016-04-29 09:13:53 +00:00
Simon Pilgrim	5779fb61b0	[InstCombine][SSE] Regenerated x86 pshufb tests llvm-svn: 268014	2016-04-29 08:53:35 +00:00
David Majnemer	1a5799fe3e	[DeadArgumentElimination] Propagate operand bundles to promoted call sites We neglected to transfer operand bundles when performing argument promotion. llvm-svn: 268008	2016-04-29 07:22:36 +00:00
Adam Nemet	c60d8f8fd0	[LoopDist] Add missing RUN line in test from r268006 llvm-svn: 268007	2016-04-29 07:16:00 +00:00
Adam Nemet	88ec491830	[LoopDist] Also emit optimization remark on success (-Rpass=) The option -Rpass=loop-distribute now reports the loops that were distributed. llvm-svn: 268006	2016-04-29 07:10:46 +00:00
David Majnemer	13d5526392	[SLPVectorizer] Add operand bundles to vectorized functions SLPVectorizing a call site should result in further propagation of its bundles. llvm-svn: 268004	2016-04-29 07:09:51 +00:00
David Majnemer	50ddc0e1b6	[LoopVectorize] Add operand bundles to vectorized functions Also, do not crash when calculating a cost model for loop-invariant token values. llvm-svn: 268003	2016-04-29 07:09:48 +00:00
Matt Arsenault	7d1b6c81af	AMDGPU: Stop reporting an addressing mode for unknown addrspace This was being treated the same as private, which has an immediate offset. For unknown, it probably means it's for a computation not actually being used for accessing memory, so it should not have a nontrivial addressing mode. llvm-svn: 268002	2016-04-29 06:25:10 +00:00
David Majnemer	cd24bb1d3a	[ArgumentPromotion] Propagate operand bundles to promoted call sites We neglected to transfer operand bundles when performing argument promotion. This fixes PR27568. llvm-svn: 267986	2016-04-29 04:56:12 +00:00
Michael Zolotukhin	1816d03b7d	[PR25281] Remove AAResultsWrapper from preserved analyses of loop vectorizer. We don't preserve AAResults, because, for one, we don't preserve SCEV-AA. That fixes PR25281. llvm-svn: 267980	2016-04-29 03:31:25 +00:00
Hal Finkel	1b66f7e3c8	[LoopVectorize] Keep hints from original loop on the vector loop We need to keep loop hints from the original loop on the new vector loop. Failure to do this meant that, for example: void foo(int *b) { #pragma clang loop unroll(disable) for (int i = 0; i < 16; ++i) b[i] = 1; } this loop would be unrolled. Why? Because we'd vectorize it, thus dropping the hints that unrolling should be disabled, and then we'd unroll it. llvm-svn: 267970	2016-04-29 01:27:40 +00:00
Adam Nemet	0ba164bbcb	[LoopDist] Emit optimization remarks (-Rpass) I closely followed the precedents set by the vectorizer: With -Rpass-missed, the loop is reported with further details pointing to -Rpass--analysis. * -Rpass-analysis reports the details why distribution has failed. * Regardless of -Rpass*, when distribution fails for a loop where distribution was forced with the pragma, a warning is produced according to -Wpass-failed. In this case the analysis info is also printed even without -Rpass-analysis. llvm-svn: 267952	2016-04-28 23:08:32 +00:00
Hal Finkel	50316d95a9	[Inliner] Preserve llvm.mem.parallel_loop_access metadata When inlining a call site with llvm.mem.parallel_loop_access metadata, this metadata needs to be propagated to all cloned memory-accessing instructions. Otherwise, inlining parts of the loop body will invalidate the annotation. With this functionality, we now vectorize the following as expected: void Body(int res, int c, int d, int p, int i) { res[i] = (p[i] == 0) ? res[i] : res[i] + d[i]; } void Test(int res, int c, int d, int p, int n) { int i; #pragma clang loop vectorize(assume_safety) for (i = 0; i < 1600; i++) { Body(res, c, d, p, i); } } llvm-svn: 267949	2016-04-28 23:00:04 +00:00
Arch D. Robison	0e61034018	[SLPVectorizer] Extend SLP Vectorizer to deal with aggregates. The refactoring portion part was done as r267748. http://reviews.llvm.org/D14185 llvm-svn: 267899	2016-04-28 16:11:45 +00:00
Simon Pilgrim	bd4a3be7d2	[InstCombine][SSE] Add MOVMSK support to SimplifyDemandedUseBits The MOVMSK instructions copies a vector elements' sign bits to the low bits of a scalar register and zeros the high bits. This patch adds MOVMSK support to SimplifyDemandedUseBits so that its aware that the upper bits are known to be zero. It also removes the call to MOVMSK if none of the lower bits are actually required and just returns zero. Differential Revision: http://reviews.llvm.org/D19614 llvm-svn: 267873	2016-04-28 12:22:53 +00:00
Sanjay Patel	21bd38a07b	Update test to use FileCheck Also, add some metadata to show what that currently looks like. llvm-svn: 267827	2016-04-28 00:29:27 +00:00
Rong Xu	a4c3f67fe8	more buildbot failure fix to r267792 __llvm_prf_nm length is embedded in llvm_used. Relax llvm_used check. llvm-svn: 267816	2016-04-27 23:23:53 +00:00
Rong Xu	6e34c490ff	[PGO] Promote indirect calls to conditional direct calls with value-profile This patch implements the transformation that promotes indirect calls to conditional direct calls when the indirect-call value profile meta-data is available. Differential Revision: http://reviews.llvm.org/D17864 llvm-svn: 267815	2016-04-27 23:20:27 +00:00
Rong Xu	4b1dc5d60b	Fix buildbot failure due to r267792 Relax the test check as some targets do not have name compression. llvm-svn: 267803	2016-04-27 22:06:35 +00:00
Rong Xu	af5aebaa32	[PGO] Prohibit address recording if the function is both internal and COMDAT Differential Revision: http://reviews.llvm.org/D19515 llvm-svn: 267792	2016-04-27 21:17:30 +00:00
Simon Pilgrim	3f595aabe2	[InstCombine][AVX2] Add AVX2 per-element vector shift tests At the moment we don't simplify PSRAV/PSRLV/PSLLV intrinsics to generic IR for constant shift amounts, but we could. llvm-svn: 267777	2016-04-27 20:25:34 +00:00
David Majnemer	0c80e2eac6	[CodeGenPrepare] Don't sink a cast past its user The sink cast machinery is supposed to sink casts as close to their user as possible. However, an EH pad is the first instruction in it's basic block. Don't sink if the user is an EH pad. This fixes PR27536. llvm-svn: 267767	2016-04-27 19:36:38 +00:00
Ahmed Bougacha	ace97c1f7d	[LIR] Set attributes on memset_pattern16. "inferattrs" will deduce the attribute, but it will be too late for many optimizations. Set it ourselves when creating the call. Differential Revision: http://reviews.llvm.org/D17598 llvm-svn: 267762	2016-04-27 19:04:50 +00:00
Ahmed Bougacha	44c19876c7	[InferAttrs] Mark memset_pattern16 params nocapture. Differential Revision: http://reviews.llvm.org/D19471 llvm-svn: 267760	2016-04-27 19:04:43 +00:00
Matthew Simpson	622b95be7b	[LV] Reallow positive-stride interleaved load groups with gaps We previously disallowed interleaved load groups that may cause us to speculatively access memory out-of-bounds (r261331). We did this by ensuring each load group had an access corresponding to the first and last member. Instead of bailing out for these interleaved groups, this patch enables us to peel off the last vector iteration, ensuring that we execute at least one iteration of the scalar remainder loop. This solution was proposed in the review of the previous patch. Differential Revision: http://reviews.llvm.org/D19487 llvm-svn: 267751	2016-04-27 18:21:36 +00:00
Gerolf Hoflehner	88017c08a6	[InstCombine] Sharpended test case in pr21210.ll llvm-svn: 267742	2016-04-27 17:19:54 +00:00
Matthew Simpson	e5dfb08fcb	[TTI] Add hook for vector extract with extension This change adds a new hook for estimating the cost of vector extracts followed by zero- and sign-extensions. The motivating example for this change is the SMOV and UMOV instructions on AArch64. These instructions move data from vector to general purpose registers while performing the corresponding extension (sign-extend for SMOV and zero-extend for UMOV) at the same time. For these operations, TargetTransformInfo can assume the extensions are free and only report the cost of the vector extract. The SLP vectorizer has been updated to make use of the new hook. Differential Revision: http://reviews.llvm.org/D18523 llvm-svn: 267725	2016-04-27 15:20:21 +00:00
Simon Pilgrim	f23aa2a9c9	[InstCombine][SSE] Regenerated vector shift tests llvm-svn: 267699	2016-04-27 12:04:44 +00:00
Simon Pilgrim	d2ea708739	[InstCombine][SSE] Added DemandedBits tests for MOVMSK instructions MOVMSK zeros the upper bits of the gpr - we should be able to use this. llvm-svn: 267686	2016-04-27 09:53:09 +00:00
Adam Nemet	d2fa414718	[LoopDist] Add llvm.loop.distribute.enable loop metadata Summary: D19403 adds a new pragma for loop distribution. This change adds support for the corresponding metadata that the pragma is translated to by the FE. As part of this I had to rethink the flag -enable-loop-distribute. My goal was to be backward compatible with the existing behavior: A1. pass is off by default from the optimization pipeline unless -enable-loop-distribute is specified A2. pass is on when invoked directly from opt (e.g. for unit-testing) The new pragma/metadata overrides these defaults so the new behavior is: B1. A1 + enable distribution for individual loop with the pragma/metadata B2. A2 + disable distribution for individual loop with the pragma/metadata The default value whether the pass is on or off comes from the initiator of the pass. From the PassManagerBuilder the default is off, from opt it's on. I moved -enable-loop-distribute under the pass. If the flag is specified it overrides the default from above. Then the pragma/metadata can further modifies this per loop. As a side-effect, we can now also use -enable-loop-distribute=0 from opt to emulate the default from the optimization pipeline. So to be precise this is the new behavior: C1. pass is off by default from the optimization pipeline unless -enable-loop-distribute or the pragma/metadata enables it C2. pass is on when invoked directly from opt unless -enable-loop-distribute=0 or the pragma/metadata disables it Reviewers: hfinkel Subscribers: joker.eph, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D19431 llvm-svn: 267672	2016-04-27 05:28:18 +00:00
Evgeny Stupachenko	23ce61b663	The patch fixes PR27392. Summary: It is incorrect to compare TripCount (which is BECount + 1) with extraiters (or Count) to check if we should enter unrolled loop or not, because TripCount can potentially overflow (when BECount is max unsigned integer). While comparing BECount with (Count - 1) is overflow safe and therefore correct. Reviewer: hfinkel Differential Revision: http://reviews.llvm.org/D19256 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 267662	2016-04-27 03:04:54 +00:00
Philip Reames	3f83dbeed9	[LVI] Reduce compile time by lazily scanning blocks if needed When encountering a non-local pointer, LVI would eagerly scan the block for dereferences of the given object to prove the pointer to be non null. That's all well and good, but then we'd go recurse through our input blocks. As a result, we could end up scanning each and every block we traverse, even if the final definition was obviously non null or we found a constant value somewhere up the chain. The previous code papered over this by using the isKnownNonNull routine from value tracking. This made the duplication less painful in the common case. Instead, we know do the block scan only after we've gotten the recursive results back. This lets us stop scanning individual blocks as soon as we've determined it to be non-null in any predecessor block and use our usual merge rules to propagate that information cheaply through successor blocks. For a pointer which can be found non-null, this does strictly less work and sometimes substaintially so. Note that the case where we can't prove something non-null is still the really expensive case. We end up scanning each and every block looking for a dereference and never end up finding one. llvm-svn: 267642	2016-04-27 00:30:55 +00:00
Justin Bogner	c2bf63d29d	PM: Port Reassociate to the new pass manager llvm-svn: 267631	2016-04-26 23:39:29 +00:00
Sanjay Patel	29dea0d230	[SimplifyCFG] propagate branch metadata when creating select llvm-svn: 267624	2016-04-26 23:15:48 +00:00
Philip Reames	053c2a6f25	[LVI] Apply transfer rule for overdefine inputs for binary operators As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well. This patch builds on 267609 which did the same thing for unary casts. llvm-svn: 267620	2016-04-26 23:10:35 +00:00
Philip Reames	e5030e85ea	[LVI] A better fix for the assertion error introduced by 267609 Essentially, I was using the wrong size function. For types which were sized, but not primitive, I wasn't getting a useful size for the operand and failed an assert. I fixed this, and also added a guard that the input is a sized type. Test case is for the original mistake. I'm not sure how to actually exercise the sized type check. llvm-svn: 267618	2016-04-26 22:52:30 +00:00
Sanjay Patel	d2d2aa52cd	[LowerExpectIntrinsic] make default likely/unlikely ratio bigger We need the default ratio to be sufficiently large that it triggers transforms based on block frequency info (BFI) and plays well with the recently introduced BranchProbability used by CGP. Differential Revision: http://reviews.llvm.org/D19435 llvm-svn: 267615	2016-04-26 22:23:38 +00:00
Philip Reames	38c87c2e50	[LVI] Infer local facts from unary expressions As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well. This patch implements only the unary operation case. Once this is in, I'll implement the same for the binary operations. Differential Revision: http://reviews.llvm.org/D19492 llvm-svn: 267609	2016-04-26 21:48:16 +00:00
David Majnemer	abb9f55c80	Revert "[SimplifyLibCalls] sprintf doesn't copy null bytes" The destination buffer that sprintf uses is restrict qualified, we do not need to worry about derived pointers referenced via format specifiers. This reverts commit r267580. llvm-svn: 267605	2016-04-26 21:04:47 +00:00
Elena Demikhovsky	308a7eb0d2	Masked Store in Loop Vectorizer - bugfix Fixed a bug in loop vectorization with conditional store. Differential Revision: http://reviews.llvm.org/D19532 llvm-svn: 267597	2016-04-26 20:18:04 +00:00
Justin Bogner	4563a06cee	PM: Port Internalize to the new pass manager llvm-svn: 267596	2016-04-26 20:15:52 +00:00
David Majnemer	8cd77baebc	[SimplifyLibCalls] sprintf doesn't copy null bytes sprintf doesn't read or copy the terminating null byte from it's string operands. sprintf will append it's own after processing all of the format specifiers. This fixes PR27526. llvm-svn: 267580	2016-04-26 18:16:49 +00:00
Dehao Chen	5d6d4841ed	Tune basic block annotation algorithm. Summary: Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary. This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19301 llvm-svn: 267519	2016-04-26 04:59:11 +00:00
Hal Finkel	e4c0c1679b	[SimplifyCFG] Preserve !llvm.mem.parallel_loop_access when merging When SimplifyCFG merges identical instructions from both sides of a diamond, it can preserve !llvm.mem.parallel_loop_access (as it does with most of the other metadata). There's no real data or control dependency change in this case. llvm-svn: 267515	2016-04-26 02:06:06 +00:00
Hal Finkel	411d31ad72	[LoopVectorize] Don't consider conditional-load dereferenceability for marked parallel loops I really thought we were doing this already, but we were not. Given this input: void Test(int res, int c, int d, int p) { for (int i = 0; i < 16; i++) res[i] = (p[i] == 0) ? res[i] : res[i] + d[i]; } we did not vectorize the loop. Even with "assume_safety" the check that we don't if-convert conditionally-executed loads (to protect against data-dependent deferenceability) was not elided. One subtlety: As implemented, it will still prefer to use a masked-load instrinsic (given target support) over the speculated load. The choice here seems architecture specific; the best option depends on how expensive the masked load is compared to a regular load. Ideally, using the masked load still reduces unnecessary memory traffic, and so should be preferred. If we'd rather do it the other way, flipping the order of the checks is easy. The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also implies that if conversion is okay. Differential Revision: http://reviews.llvm.org/D19512 llvm-svn: 267514	2016-04-26 02:00:36 +00:00
Sanjay Patel	a31b0c0ece	[CodeGenPrepare] don't convert an unpredictable select into control flow Suggested in the review of D19488: http://reviews.llvm.org/D19488 llvm-svn: 267504	2016-04-26 00:47:39 +00:00
Justin Bogner	1a07501379	PM: Port GlobalOpt to the new pass manager llvm-svn: 267499	2016-04-26 00:28:01 +00:00
Justin Bogner	6f6c5f2a02	GlobalOpt: Convert a bunch of tests from grep to FileCheck llvm-svn: 267493	2016-04-25 23:36:50 +00:00
Sanjay Patel	82059090d3	Add check for "branch_weights" with prof metadata While we're here, fix the comment and variable names to make it clear that these are raw weights, not percentages. llvm-svn: 267491	2016-04-25 23:15:16 +00:00
Arch D. Robison	be0490a6e8	Optimize store of "bitcast" from vector to aggregate. This patch is what was the "instcombine" portion of D14185, with an additional test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll). The patch causes instcombine to replace sequences of extractelement-insertvalue-store that act essentially like a bitcast followed by a store. Differential review: http://reviews.llvm.org/D14260 llvm-svn: 267482	2016-04-25 22:22:39 +00:00
Chad Rosier	bbabc85031	Fix typo from r267432. llvm-svn: 267436	2016-04-25 18:20:27 +00:00
Chad Rosier	4c4e3336b8	[ValueTracking] Add an additional test case for r266767 where one operand is a const. llvm-svn: 267432	2016-04-25 17:41:48 +00:00
Chad Rosier	e2cbd13e56	[ValueTracking] Improve isImpliedCondition when the dominating cond is false. llvm-svn: 267430	2016-04-25 17:23:36 +00:00
James Molloy	eb040cc55f	[GlobalOpt] Allow constant globals to be SRA'd The current logic assumes that any constant global will never be SRA'd. I presume this is because normally constant globals can be pushed into their uses and deleted. However, that sometimes can't happen (which is where you really want SRA, so the elements that can be eliminated, are!). There seems to be no reason why we can't SRA constants too, so let's do it. llvm-svn: 267393	2016-04-25 10:48:29 +00:00
Simon Pilgrim	4b5462f119	[InstCombine][SSE] Reduce DIVSS/DIVSD to FDIV if only first element is required As discussed on D19318, if we only demand the first element of a DIVSS/DIVSD intrinsic, then reduce to a FDIV call. This matches the existing FADD/FSUB/FMUL patterns. llvm-svn: 267359	2016-04-24 18:35:59 +00:00
Simon Pilgrim	83020942d3	[InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 2 of 2) Split from D17490. This patch improves support for determining the demanded vector elements through SSE scalar intrinsics: 1 - demanded vector element support for unary and some extra binary scalar intrinsics (RCP/RSQRT/SQRT/FRCZ and ADD/CMP/DIV/ROUND). 2 - addss/addsd get simplified to a fadd call if we aren't interested in the pass through elements 3 - if we don't need the lowest element of a scalar operation then just use the first argument (the pass through elements) directly We can add support for propagating demanded elements through any equivalent packed SSE intrinsics in a future patch (these wouldn't use the pass through patterns). Differential Revision: http://reviews.llvm.org/D19318 llvm-svn: 267357	2016-04-24 18:23:14 +00:00
Simon Pilgrim	424da1637a	[InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 1 of 2) This patch improves support for determining the demanded vector elements through SSE scalar intrinsics: 1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input) 2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input Differential Revision: http://reviews.llvm.org/D17490 llvm-svn: 267356	2016-04-24 18:12:42 +00:00
Duncan P. N. Exon Smith	a59d3e5af8	DebugInfo: Remove MDString-based type references Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around DIType*. It is no longer legal to refer to a DICompositeType by its 'identifier:', and DIBuilder no longer retains all types with an 'identifier:' automatically. Aside from the bitcode upgrade, this is mainly removing logic to resolve an MDString-based reference to an actualy DIType. The commits leading up to this have made the implicit type map in DICompileUnit's 'retainedTypes:' field superfluous. This does not remove DITypeRef, DIScopeRef, DINodeRef, and DITypeRefArray, or stop using them in DI-related metadata. Although as of this commit they aren't serving a useful purpose, there are patchces under review to reuse them for CodeView support. The tests in LLVM were updated with deref-typerefs.sh, which is attached to the thread "[RFC] Lazy-loading of debug info metadata": http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html llvm-svn: 267296	2016-04-23 21:08:00 +00:00
Nico Weber	0aa9845d15	Revert r267210, it makes clang assert (PR27490). llvm-svn: 267232	2016-04-22 22:08:42 +00:00
Peter Collingbourne	7dd8dbf486	Introduce llvm.load.relative intrinsic. This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``. LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if ``x`` is an ``unnamed_addr`` function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant. Differential Revision: http://reviews.llvm.org/D18367 llvm-svn: 267223	2016-04-22 21:18:02 +00:00
Philip Reames	5f0e36947b	[unordered] sink unordered stores at end of blocks The existing code turned out to be completely correct when auditted. Thus, only minor code changes and adding a couple of tests. llvm-svn: 267215	2016-04-22 20:53:32 +00:00
Sanjoy Das	f97229d6ba	Fold compares for distinct allocations Summary: We can fold compares to false when two distinct allocations within a function are compared for equality. Patch by Anna Thomas! Reviewers: majnemer, reames, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19390 llvm-svn: 267214	2016-04-22 20:52:25 +00:00
Philip Reames	eedef73b63	[unordered] Extend load/store type canonicalization to handle unordered operations Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case. llvm-svn: 267210	2016-04-22 20:33:48 +00:00
Justin Bogner	b93949089e	PM: Port SinkingPass to the new pass manager llvm-svn: 267199	2016-04-22 19:54:10 +00:00
Jun Bum Lim	d29a24e4fd	[DeadStoreElimination] Shorten beginning of memset overwritten by later stores Summary: This change will shorten memset if the beginning of memset is overwritten by later stores. Reviewers: hfinkel, eeckstein, dberlin, mcrosier Subscribers: mgrang, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18906 llvm-svn: 267197	2016-04-22 19:51:29 +00:00
Justin Bogner	395c2127ed	PM: Port DCE to the new pass manager Also add a very basic test, since apparently there aren't any tests for DCE whatsoever to add the new pass version to. llvm-svn: 267196	2016-04-22 19:40:41 +00:00
Adam Nemet	54053a518e	[LoopVersioningLICM] Add test coverage for llvm.loop.licm_versioning.disable In the next change, I am generalizing the function findStringMetadataForLoop and I want to make sure I don't break this. Looks like there was no coverage for this so far. llvm-svn: 267182	2016-04-22 18:34:50 +00:00
Chad Rosier	1a60159064	[SimplifyCFG] Add final missing implications to isImpliedTrueByMatchingCmp. Summary: eq imply [u\|s]ge and [u\|s]le are true. Remove redundant logic by implementing isImpliedFalseByMatchingCmp(Pred1, Pred2) as isImpliedTrueByMatchingCmp(Pred1, getInversePredicate(Pred2)). llvm-svn: 267177	2016-04-22 17:57:34 +00:00
Chad Rosier	3456cb5672	[SimplifyCFG] Add missing implications to isImpliedTrueByMatchingCmp. Summary: [u\|s]gt and [u\|s]lt imply [u\|s]ge and [u\|s]le are true, respectively. I've simplified the existing tests and added additional tests to cover the new cases mentioned above. I've also added tests for all the cases where the first compare doesn't imply anything about the second compare. llvm-svn: 267171	2016-04-22 17:14:12 +00:00
Chad Rosier	1960d13e29	[SimplifyCFG] Simplify code review by temporarily removing this test file. A followup commit will replace these tests with simplified and more inclusive tests. The diff is unreadable if this were to be done in a single commit. llvm-svn: 267170	2016-04-22 17:14:08 +00:00
David Majnemer	bfd695d591	[EarlyCSE] Don't add the overflow flags to the hash We take the intersection of overflow flags while CSE'ing. This permits us to consider two instructions with different overflow behavior to be replaceable. llvm-svn: 267153	2016-04-22 14:12:50 +00:00
Silviu Baranga	e985c76b90	[InstCombine] Preserve fast math flags when combining PHIs Summary: When optimizing PHIs which have inputs floating point binary operators, we preserve all IR flags except the fast math flags. This change removes the logic which tracked some of the IR flags (no wrap, exact) and replaces it by doing an and on the IR flags of all inputs to the PHI - which will also handle the fast math flags. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19370 llvm-svn: 267139	2016-04-22 11:21:36 +00:00
David Majnemer	d0ce8f1485	[GVN] Respect fast-math-flags on fcmps We assumed that flags were only present on binary operators. This is not true, they may also be present on calls and fcmps. llvm-svn: 267113	2016-04-22 06:37:51 +00:00
David Majnemer	9554c1339c	[EarlyCSE] Take the intersection of flags on instructions EarlyCSE had inconsistent behavior with regards to flag'd instructions: - In some cases, it would pessimize if the available instruction had different flags by not performing CSE. - In other cases, it would miscompile if it replaced an instruction which had no flags with an instruction which has flags. Fix this by being more consistent with our flag handling by utilizing andIRFlags. llvm-svn: 267111	2016-04-22 06:37:45 +00:00
Sanjoy Das	a085cfc150	Folding compares with unescaped allocations Summary: If we know that the pointer allocated within a function does not escape, we can fold away comparisons that are done with global pointers Patch by Anna Thomas! Reviewers: reames, majnemer, sanjoy Subscribers: mgrang, mcrosier, majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D19276 llvm-svn: 267035	2016-04-21 19:26:45 +00:00
Philip Reames	a98c7ead30	[instcombine][unordered] Extend load(select) transform to handle unordered loads llvm-svn: 267023	2016-04-21 17:59:40 +00:00
Philip Reames	3ac0718423	[unordered] unordered loads from null are still unreachable llvm-svn: 267019	2016-04-21 17:45:05 +00:00
Philip Reames	ac55090e96	[instcombine][unordered] Implement *-load forwarding for unordered atomics This builds on 266999 which made FindAvailableValue do the right thing. Tests included show the newly enabled transforms and those which disabled either due to conservatism or correctness requirements. llvm-svn: 267006	2016-04-21 17:03:33 +00:00
Philip Reames	92c43699bc	[unordered] Add tests and conservative handling in support of future changes [NFCI] This change adds a couple of test cases to make sure FindAvailableLoadedValue does the right thing. At the moment, the code added is dead, but separating it makes follow on changes far more obvious. llvm-svn: 266999	2016-04-21 16:51:08 +00:00
Sanjoy Das	54a3a006ca	[SimplifyCFG] Fold `llvm.guard(false)` to unreachable Summary: `llvm.guard(false)` always bails out of the current compilation unit, so we can prune any control flow following it. Reviewers: hfinkel, pcc, reames Subscribers: majnemer, reames, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19245 llvm-svn: 266955	2016-04-21 05:09:12 +00:00
Mehdi Amini	bda3c97c16	ThinLTO/ModuleLinker: add a flag to not always pull-in linkonce when performing importing Summary: The function importer already decided what symbols need to be pulled in. Also these magically added ones will not be in the export list for the source module, which can confuse the internalizer for instance. Reviewers: tejohnson, rafael Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19096 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266948	2016-04-21 01:59:39 +00:00
Nick Lewycky	762f8a8549	Add optimization for 'icmp slt (or A, B), A' and some related idioms based on knowledge of the sign bit for A and B. No matter what value you OR in to A, the result of (or A, B) is going to be UGE A. When A and B are positive, it's SGE too. If A is negative, OR'ing a value into it can't make it positive, but can increase its value closer to -1, therefore (or A, B) is SGE A. Working through all possible combinations produces this truth table: ``` A is +, -, +/- F F F + B is T F ? - ? F ? +/- ``` The related optimizations are flipping the 'slt' for 'sge' which always NOTs the result (if the result is known), and swapping the LHS and RHS while swapping the comparison predicate. There are more idioms left to implement (aren't there always!) but I've stopped here because any more would risk becoming unreasonable for reviewers. llvm-svn: 266939	2016-04-21 00:53:14 +00:00
Vedant Kumar	932866bfe7	[test/PGOProfile] Make tests independent of the raw profile version (NFC) Differential Revision: http://reviews.llvm.org/D19290 llvm-svn: 266928	2016-04-20 22:24:01 +00:00
Teresa Johnson	b35cc691ea	[ThinLTO] Prevent importing of "llvm.used" values Summary: This patch prevents importing from (and therefore exporting from) any module with a "llvm.used" local value. Local values need to be promoted and renamed when importing, and their presense on the llvm.used variable indicates that there are opaque uses that won't see the rename. One such example is a use in inline assembly. See also the discussion at: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html As part of this, move collectUsedGlobalVariables out of Transforms/Utils and into IR/Module so that it can be used more widely. There are several other places in LLVM that used copies of this code that can be cleaned up as a follow on NFC patch. Reviewers: joker.eph Subscribers: pcc, llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18986 llvm-svn: 266877	2016-04-20 14:39:45 +00:00
Mandeep Singh Grang	029a0567fa	[LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC. Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834	2016-04-19 23:51:52 +00:00
Marcin Koscielnicki	3fdc257d6a	[AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic. Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the thread pointer. I have a pending patch that does the same for SystemZ (D19054), and there are many more targets that could benefit from one. This patch merges the ARM and AArch64 intrinsics into a single target independent one that will also be used by subsequent targets. Differential Revision: http://reviews.llvm.org/D19098 llvm-svn: 266818	2016-04-19 20:51:05 +00:00
Chad Rosier	b7dfbb40a3	[ValueTracking] Improve isImpliedCondition for conditions with matching operands. This patch improves SimplifyCFG to catch cases like: if (a < b) { if (a > b) <- known to be false unreachable; } Phabricator Revision: http://reviews.llvm.org/D18905 llvm-svn: 266767	2016-04-19 17:19:14 +00:00
Simon Pilgrim	998cffa6b9	[InstCombine][X86] Added extra tests introduced for D17490 llvm-svn: 266732	2016-04-19 12:59:52 +00:00
Simon Pilgrim	74b3bfdf71	[InstCombine][X86] Regenerate SSE combine tests as part of setup for D17490 Regenerated with utils/update_test_checks.py llvm-svn: 266731	2016-04-19 12:56:46 +00:00
Tim Northover	b629c77692	ARM: use a pseudo-instruction for cmpxchg at -O0. The fast register-allocator cannot cope with inter-block dependencies without spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions where any value produced within a block is dead by the end, but not for cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded after regalloc. Fortunately this is at -O0 so we don't have to care about performance. This simplifies the various axes of expansion considerably: we assume a strong seq_cst operation and ensure ordering via the always-present DMB instructions rather than v8 acquire/release instructions. Should fix the 32-bit part of PR25526. llvm-svn: 266679	2016-04-18 21:48:55 +00:00
Chad Rosier	e30fed70e6	[ValueTracking] Correct lit test comments. NFC. llvm-svn: 266657	2016-04-18 19:11:45 +00:00
Eric Liu	d09f15ea6f	Revert "Replace the use of MaxFunctionCount module flag" This reverts commit r266477. This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData", while "ProfileData" depends on "Object", which depends on "BitCode", which depends on "Analysis". llvm-svn: 266619	2016-04-18 15:31:11 +00:00
Renato Golin	4b18a510a2	[ARM] AArch32 v8 NEON is still not IEEE-754 compliant llvm-svn: 266603	2016-04-18 12:06:47 +00:00
Sanjoy Das	99042473d0	Fix a typo in rL265762 I accidentally replaced `mayBeOverridden` with `!isInterposable`. Remove the negation and add a test case that would've caught this. Many thanks to Håkan Hjort for spotting this! llvm-svn: 266551	2016-04-17 04:30:43 +00:00
Mehdi Amini	2d28f7aa07	ThinLTO: Make aliases explicit in the summary To be able to work accurately on the reference graph when taking decision about internalizing, promoting, renaming, etc. We need to have the alias information explicit. Differential Revision: http://reviews.llvm.org/D18836 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266517	2016-04-16 06:56:44 +00:00
Evgeniy Stepanov	40cd1514cf	[cfi] Support explicit sections for functions in cfi-icall. Allow explicit section for indirectly called functions in cfi-icall. Jumptables for functions in the same type class must be contiguous, so they always go to the default text section. Fixes PR25079. llvm-svn: 266486	2016-04-15 22:55:38 +00:00
Adrian Prantl	dc75a6b517	Convert this sample-based-profiling testcase to use a NoDebug CU. llvm-svn: 266481	2016-04-15 22:05:38 +00:00
Easwaran Raman	f53baca686	Replace the use of MaxFunctionCount module flag Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count. Differential Revision: http://reviews.llvm.org/D18622 llvm-svn: 266477	2016-04-15 21:39:58 +00:00
Tim Northover	903f81ba18	ARM: don't try to hoist constant RHS out of a division. Divisions by a constant can be converted into multiplies which are usually cheaper, but this isn't possible if the constant gets separated (particularly in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is cheap. I considered making the check generic, but neither AArch64 (strangely) nor x86 showed any benefit on the tests I had. llvm-svn: 266464	2016-04-15 18:17:18 +00:00
David Majnemer	2e02ba78d5	[InstCombine] Don't transform compares of calls to functions named fabs{f,l,} InstCombine wants to optimize compares of calls to fabs with zero. However, we didn't have the necessary legality checking to verify that the function call had the same behavior as fabs. llvm-svn: 266452	2016-04-15 17:21:03 +00:00
Adrian Prantl	75819aedf6	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram. Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446	2016-04-15 15:57:41 +00:00
Sanjay Patel	f11ab05bdb	[SimplifyCFG] propagate branch metadata when creating select (PR27344) This is almost identical to: http://reviews.llvm.org/rL264527 This doesn't solve PR27344; it just allows the profile weights to survive. To solve the bug, we need to use the profile weights in the backend. llvm-svn: 266442	2016-04-15 15:32:12 +00:00
Sanjay Patel	81433e99b9	[SimplifyCFG] add metadata to show failure to propagate (PR27344) llvm-svn: 266435	2016-04-15 14:53:35 +00:00
Justin Lebar	f04e678e36	Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target. llvm-svn: 266403	2016-04-15 01:20:52 +00:00
Justin Lebar	cad81cf6b3	[Speculation] Add a SpeculativeExecution mode where the pass does nothing unless TTI::hasBranchDivergence() is true. Summary: This lets us add this pass to the IR pass manager unconditionally; it will simply not do anything on targets without branch divergence. Reviewers: tra Subscribers: llvm-commits, jingyue, rnk, chandlerc Differential Revision: http://reviews.llvm.org/D18625 llvm-svn: 266398	2016-04-15 00:32:09 +00:00
Vedant Kumar	4960fbf391	[test] Require 'asserts' for a test which uses -debug-only Without this line, bots which run check-all on Release compilers will break. llvm-svn: 266386	2016-04-14 23:32:40 +00:00
Michael Kuperstein	16f13e252b	[AliasSetTracker] Correctly handle changing the size of an entry If the size of an AST entry changes, we also need to make sure we perform necessary alias set merges, as the new size may overlap pointers in other sets. We happen to run into this with memset, because memset allows an entry for a i8* pointer to have a decidedly non-i8 size. This fixes PR27262. Differential Revision: http://reviews.llvm.org/D18939 llvm-svn: 266381	2016-04-14 22:00:11 +00:00
Renato Golin	5cb666add7	[ARM] Adding IEEE-754 SIMD detection to loop vectorizer Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin). For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags. This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normal's flag on both communitues and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP. Fixes PR16275. llvm-svn: 266363	2016-04-14 20:42:18 +00:00
Sanjay Patel	e998b91d86	[InstCombine] remove constant by inverting compare + logic (PR27105) https://llvm.org/bugs/show_bug.cgi?id=27105 We can check if all bits outside of a constant mask are set with a single constant. As noted in the bug report, although this form should be considered the canonical IR, backends may want to transform this into an 'andn' / 'andc' comparison against zero because that could be a single machine instruction. Differential Revision: http://reviews.llvm.org/D18842 llvm-svn: 266362	2016-04-14 20:17:40 +00:00
Dehao Chen	46f8fbbb1b	Update discriminator assignment algorithm to handle nested call correctly. Summary: Add discriminator for nested call correctly. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19127 llvm-svn: 266354	2016-04-14 18:37:18 +00:00
Adam Nemet	7aab648831	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. llvm-svn: 266282	2016-04-14 08:47:17 +00:00
Tim Northover	5c02f9ad28	ARM: override cost function to re-enable ConstantHoisting (& fix it). At some point, ARM stopped getting any benefit from ConstantHoisting because the pass called a different variant of getIntImmCost. Reimplementing the correct variant revealed some problems, however: + ConstantHoisting was modifying switch statements. This is simply invalid, the cases must remain integer constants no matter the notional cost. + ConstantHoisting was mangling alloca instructions in the entry block. These should be handled by FrameLowering, so constants actually have a cost of 0. Worse, the resulting bitcasts meant they became dynamic allocas. rdar://25707382 llvm-svn: 266260	2016-04-13 23:08:27 +00:00
Easwaran Raman	cbd3989742	Test case for r265852. llvm-svn: 266237	2016-04-13 19:43:31 +00:00
Betul Buyukkurt	bf8554c279	[PGO] Remove redundant VP instrumentation LLVM optimization passes may reduce a profiled target expression to a constant. Removing runtime calls at such instrumentation points would help speedup the runtime of the instrumented program. llvm-svn: 266229	2016-04-13 18:52:19 +00:00
Mehdi Amini	b5b289339b	Revert "Make aliases explicit in the summary" Inadvertently commited... This reverts commit e618ec93786d99df2ddf280ad2d5e02f5516cecf. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266215	2016-04-13 17:20:07 +00:00
Mehdi Amini	ce744a95fd	Make aliases explicit in the summary Summary: To be able to work accurately on the reference graph when taking decision about internalizing, promoting, renaming, etc. We need to have the alias information explicit. Reviewers: tejohnson Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18836 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266214	2016-04-13 17:18:42 +00:00
David L Kreitzer	752c1448fe	Simplify strlen to a subtraction for certain cases. Patch by Li Huang (li1.huang@intel.com) Differential Revision: http://reviews.llvm.org/D18230 llvm-svn: 266200	2016-04-13 14:31:06 +00:00
Petar Jovanovic	644b8c1a5d	Calculate __builtin_object_size when pointer depends on a condition This patch fixes calculating of builtin_object_size if it depends on a condition. Before this patch compiler did not know how to calculate the object size when it finds a condition that cannot be eliminated. This patch enables calculating of builtin_object_size even in case when condition cannot be eliminated by choosing minimum or maximum value as a result from condition. Choosing minimum or maximum value from condition is based on the second argument of __builtin_object_size function. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D18438 llvm-svn: 266193	2016-04-13 12:25:25 +00:00
David Majnemer	3ee5f34469	[InstCombine] We folded an fcmp to an i1 instead of a vector of i1 Remove an ad-hoc transform in InstCombine and replace it with more general machinery (ValueTracking, InstructionSimplify and VectorUtils). This fixes PR27332. llvm-svn: 266175	2016-04-13 06:55:52 +00:00
Matt Arsenault	b34eea9cb5	AMDGPU: Remove leftover ShaderType attributes in tests llvm-svn: 266155	2016-04-13 00:39:48 +00:00
Sanjay Patel	5e5056d939	[x86, InstCombine] fix masked load pass-through operand to be a zero vector This bug was introduced with: http://reviews.llvm.org/rL262269 AVX masked loads are specified to set vector lanes to zero when the high bit of the mask element for that lane is zero: "If the mask is 0, the corresponding data element is set to zero in the load form of these instructions, and unmodified in the store form." --Intel manual Differential Revision: http://reviews.llvm.org/D19017 llvm-svn: 266148	2016-04-12 23:16:23 +00:00
Mehdi Amini	d5faa267c4	Add a pass to name anonymous/nameless function Summary: For correct handling of alias to nameless function, we need to be able to refer them through a GUID in the summary. Here we name them using a hash of the non-private global names in the module. Reviewers: tejohnson Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D18883 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266132	2016-04-12 21:35:28 +00:00
Mehdi Amini	68da426eea	Move summary creation out of llvm-as into opt Summary: Let keep llvm-as "dumb": it converts textual IR to bitcode. This commit removes the dependency from llvm-as to libLLVMAnalysis. We'll add back summary in llvm-as if we get to a textual representation for it at some point. In the meantime, opt seems like a better place for that. Reviewers: tejohnson Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19032 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266131	2016-04-12 21:35:18 +00:00
James Y Knight	19f6cce4e3	Add __atomic_* lowering to AtomicExpandPass. (Recommit of r266002, with r266011, r266016, and not accidentally including an extra unused/uninitialized element in LibcallRoutineNames) AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and cmpxchg instructions to __atomic_* library calls, when the target doesn't support atomics of a given size. This is the first step towards moving all atomic lowering from clang into llvm. When all is done, the behavior of __sync_* builtins, __atomic_* builtins, and C11 atomics will be unified. Previously LLVM would pass everything through to the ISelLowering code. There, unsupported atomic instructions would turn into __sync_* library calls. Because of that behavior, Clang currently avoids emitting llvm IR atomic instructions when this would happen, and emits __atomic_* library functions itself, in the frontend. This change makes LLVM able to emit __atomic_* libcalls, and thus will eventually allow clang to depend on LLVM to do the right thing. It is advantageous to do the new lowering to atomic libcalls in AtomicExpandPass, before ISel time, because it's important that all atomic operations for a given size either lower to __atomic_* libcalls (which may use locks), or native instructions which won't. No mixing and matching. At the moment, this code is enabled only for SPARC, as a demonstration. The next commit will expand support to all of the other targets. Differential Revision: http://reviews.llvm.org/D18200 llvm-svn: 266115	2016-04-12 20:18:48 +00:00
Artur Pilipenko	dbe0bc8df4	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change. This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 266086	2016-04-12 15:58:04 +00:00
Rafael Espindola	d41b54be11	This reverts commit r266002, r266011 and r266016. They broke the msan bot. Original message: Add __atomic_* lowering to AtomicExpandPass. AtomicExpandPass can now lower atomic load, atomic store, atomicrmw,and cmpxchg instructions to __atomic_* library calls, when the target doesn't support atomics of a given size. This is the first step towards moving all atomic lowering from clang into llvm. When all is done, the behavior of __sync_* builtins, __atomic_* builtins, and C11 atomics will be unified. Previously LLVM would pass everything through to the ISelLowering code. There, unsupported atomic instructions would turn into __sync_* library calls. Because of that behavior, Clang currently avoids emitting llvm IR atomic instructions when this would happen, and emits __atomic_* library functions itself, in the frontend. This change makes LLVM able to emit __atomic_* libcalls, and thus will eventually allow clang to depend on LLVM to do the right thing. It is advantageous to do the new lowering to atomic libcalls in AtomicExpandPass, before ISel time, because it's important that all atomic operations for a given size either lower to __atomic_* libcalls (which may use locks), or native instructions which won't. No mixing and matching. At the moment, this code is enabled only for SPARC, as a demonstration. The next commit will expand support to all of the other targets. Differential Revision: http://reviews.llvm.org/D18200 llvm-svn: 266062	2016-04-12 12:30:25 +00:00
George Burgess IV	278199f615	Add the allocsize attribute to LLVM. `allocsize` is a function attribute that allows users to request that LLVM treat arbitrary functions as allocation functions. This patch makes LLVM accept the `allocsize` attribute, and makes `@llvm.objectsize` recognize said attribute. The review for this was split into two patches for ease of reviewing: D18974 and D14933. As promised on the revisions, I'm landing both patches as a single commit. Differential Revision: http://reviews.llvm.org/D14933 llvm-svn: 266032	2016-04-12 01:05:35 +00:00
JF Bastien	4f43cfd2c2	MergeFunctions: test alloca better r237193 fix handling of alloca size / align in MergeFunctions, but only tested one and didn't follow FunctionComparator::cmpOperations's usual comparison pattern. It also didn't update Instruction.cpp:haveSameSpecialState which I'll do separately. llvm-svn: 266022	2016-04-12 00:03:26 +00:00
Mehdi Amini	ae280e54a9	ThinLTO renaming: use module hash instead of position in the summary This is more robust to changes in the link ordering. Differential Revision: http://reviews.llvm.org/D18946 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266018	2016-04-11 23:26:46 +00:00
Evgeniy Stepanov	f17120a85f	[safestack] Add canary to unsafe stack frames Add StackProtector to SafeStack. This adds limited protection against data corruption in the caller frame. Current implementation treats all stack protector levels as -fstack-protector-all. llvm-svn: 266004	2016-04-11 22:27:48 +00:00
James Y Knight	b91d38c5fe	Add __atomic_* lowering to AtomicExpandPass. AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and cmpxchg instructions to __atomic_* library calls, when the target doesn't support atomics of a given size. This is the first step towards moving all atomic lowering from clang into llvm. When all is done, the behavior of __sync_* builtins, __atomic_* builtins, and C11 atomics will be unified. Previously LLVM would pass everything through to the ISelLowering code. There, unsupported atomic instructions would turn into __sync_* library calls. Because of that behavior, Clang currently avoids emitting llvm IR atomic instructions when this would happen, and emits __atomic_* library functions itself, in the frontend. This change makes LLVM able to emit __atomic_* libcalls, and thus will eventually allow clang to depend on LLVM to do the right thing. It is advantageous to do the new lowering to atomic libcalls in AtomicExpandPass, before ISel time, because it's important that all atomic operations for a given size either lower to __atomic_* libcalls (which may use locks), or native instructions which won't. No mixing and matching. At the moment, this code is enabled only for SPARC, as a demonstration. The next commit will expand support to all of the other targets. Differential Revision: http://reviews.llvm.org/D18200 llvm-svn: 266002	2016-04-11 22:22:33 +00:00
Davide Italiano	0778bec6f9	[DebugInfo/Test] Add CU as required. llvm-svn: 265999	2016-04-11 21:16:48 +00:00
Matthew Simpson	53207a99f9	[LoopUtils, LV] Fix PR27246 (first-order recurrences) This patch ensures that when we detect first-order recurrences, we reject a phi node if its previous value is also a phi node. During vectorization the initial and previous values of the recurrence are shuffled together to create the value for the current iteration. However, phi nodes are not widened like other instructions. This fixes PR27246. Differential Revision: http://reviews.llvm.org/D18971 llvm-svn: 265983	2016-04-11 19:48:18 +00:00
Davide Italiano	7469644259	[DebugInfo] Fix even more tests to include DICompileunit. llvm-svn: 265980	2016-04-11 18:53:27 +00:00
Adrian Prantl	f95164227e	Fix missing DICompileUnits in testcases llvm-svn: 265974	2016-04-11 18:15:44 +00:00
Sanjay Patel	1dd212f900	[InstCombine] consolidate tests for related bugs llvm-svn: 265973	2016-04-11 17:58:37 +00:00
Adrian Prantl	d1f52b672a	Update discriminator testcases to use proper NoDebug CUs instead of omitting !llvm.dbg.cu. llvm-svn: 265961	2016-04-11 16:58:35 +00:00
Sanjay Patel	816ec8882a	[InstCombine] don't try to shift an illegal amount (PR26760) This is the straightforward fix for PR26760: https://llvm.org/bugs/show_bug.cgi?id=26760 But we still need to make some changes to generalize this helper function and then send the lshr case into here. llvm-svn: 265960	2016-04-11 16:50:32 +00:00
Adrian Prantl	b01a4d48ac	More upgrading of old- and very-old-style debug info in testcases. llvm-svn: 265953	2016-04-11 15:53:44 +00:00
Sanjoy Das	f9d88e650b	This reverts commit r265913 and r265912 See PR27315 r265913: "[IndVars] Eliminate op.with.overflow when possible" r265912: "[SCEV] See through op.with.overflow intrinsics" llvm-svn: 265950	2016-04-11 15:26:18 +00:00
Sanjay Patel	2d2d67994c	[InstCombine] replace test that no longer works as intended This is step 1 of refactoring to solve PR26760: https://llvm.org/bugs/show_bug.cgi?id=26760 llvm-svn: 265946	2016-04-11 15:19:44 +00:00
Teresa Johnson	2d5487cf44	[ThinLTO] Move summary computation from BitcodeWriter to new pass Summary: This is the first step in also serializing the index out to LLVM assembly. The per-module summary written to bitcode is moved out of the bitcode writer and to a new analysis pass (ModuleSummaryIndexWrapperPass). The pass itself uses a new builder class to compute index, and the builder class is used directly in places where we don't have a pass manager (e.g. llvm-as). Because we are computing summaries outside of the bitcode writer, we no longer can use value ids created by the bitcode writer's ValueEnumerator. This required changing the reference graph edge type to use a new ValueInfo class holding a union between a GUID (combined index) and Value* (permodule index). The Value* are converted to the appropriate value ID during bitcode writing. Also, this enables removal of the BitWriter library's dependence on the Analysis library that was previously required for the summary computation. Reviewers: joker.eph Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D18763 llvm-svn: 265941	2016-04-11 13:58:45 +00:00
Sanjoy Das	a07ad647ee	[IndVars] Eliminate op.with.overflow when possible Summary: If we can prove that an op.with.overflow intrinsic does not overflow, we can get rid of the intrinsic, and replace it with non-wrapping arithmetic. Reviewers: atrick, regehr Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18685 llvm-svn: 265913	2016-04-10 22:50:31 +00:00
Elena Demikhovsky	751ed0a06a	Loop vectorization with uniform load Vectorization cost of uniform load wasn't correctly calculated. As a result, a simple loop that loads a uniform value wasn't vectorized. Differential Revision: http://reviews.llvm.org/D18940 llvm-svn: 265901	2016-04-10 16:53:19 +00:00
Sanjoy Das	dd77e1e6a5	Maintain calling convention when inling calls to llvm.deoptimize The behavior here was buggy -- we'd forget the calling convention after inlining a callsite calling llvm.deoptimize. llvm-svn: 265867	2016-04-09 00:22:59 +00:00
Sanjoy Das	87b9e1b727	Propagate Undef in llvm.cos Intrinsic Summary: The llvm cos intrinsic currently does not propagate undef's. This change transforms cos(undef) to null value or 0. There are 2 test cases added as well. Patch by Anna Thomas! Reviewers: sanjoy Subscribers: majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D18863 llvm-svn: 265825	2016-04-08 18:21:11 +00:00
David Majnemer	56737722e4	[InstCombine] Fix miscompile in FoldSPFofSPF We had a select of a cast of a select but attempted to replace the outer select with the inner select dispite their incompatible types. Patch by Anton Korobeynikov! This fixes PR27236. llvm-svn: 265805	2016-04-08 16:51:49 +00:00

... 26 27 28 29 30 ...

9171 Commits