llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	9c27d79520	Update tests for the patch. llvm-svn: 300351	2017-04-14 17:47:07 +00:00
Daniel Berlin	2f72b19b05	NewGVN: Don't propagate over phi backedges where undef causes us to have >1 value, unless we can prove the phi node is cycle free. Fixes PR 32607. llvm-svn: 300299	2017-04-14 02:53:37 +00:00
Richard Smith	6c2615177b	Revert accidentally-committed files in r300252. llvm-svn: 300253	2017-04-13 20:31:21 +00:00
Richard Smith	55bd375b69	Remove all allocation and divisions from GreatestCommonDivisor Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This avoids the (expensive) APInt division operation in favour of bit operations. Remove all memory allocation from within the GCD loop by tweaking our `lshr` implementation so it can operate in-place. Differential Revision: https://reviews.llvm.org/D31968 llvm-svn: 300252	2017-04-13 20:29:59 +00:00
Reid Kleckner	257cb4e099	[InstCombine] Fix !prof metadata preservation for invokes Summary: Bug noticed by inspection. Extend the test to handle invokes as well as calls, and rewrite it to not depend on the inliner and other passes. Also simplify the call site replacement code with CallSite, similar to what I did to dead arg elimination and arg promotion (rL300235 and rL300229). Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32041 llvm-svn: 300251	2017-04-13 20:26:38 +00:00
Dehao Chen	2c7ca9b5df	SamplePGO: convert callsite samples map key from callsite_location to callsite_location+callee_name Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly. Reviewers: davidxl, dnovillo Reviewed By: davidxl Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D31950 llvm-svn: 300240	2017-04-13 19:52:10 +00:00
Anna Thomas	dcdb325fee	[LV] Fix the vector code generation for first order recurrence Summary: In first order recurrences where phi's are used outside the loop, we should generate an additional vector.extract of the second last element from the vectorized phi update. This is because we require the phi itself (which is the value at the second last iteration of the vector loop) and not the phi's update within the loop. Also fix the code gen when we just unroll, but don't vectorize. Fixes PR32396. Reviewers: mssimpso, mkuper, anemet Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D31979 llvm-svn: 300238	2017-04-13 18:59:25 +00:00
Sanjay Patel	445d03bf00	[InstCombine] fold X == 0 || X == -1 to one compare (PR32524) This is effectively a retry of: https://reviews.llvm.org/rL299851 but now we have tests and an assert to make sure the bug that was exposed with that attempt will not happen again. I'll fix the code duplication and missing sibling fold next, but I want to make this change as small as possible to reduce risk since I messed it up last time. This should fix: https://bugs.llvm.org/show_bug.cgi?id=32524 llvm-svn: 300236	2017-04-13 18:47:06 +00:00
Reid Kleckner	3a1150352d	[ArgPromotion] Don't drop !prof metadata on promoted calls Noticed by inspection while doing attribute work. DAE, InstCombineCalls, and ArgPromotion have a fair amount of duplicated code for hacking on call sites, and you can find bugs by comparing them. Add a test case for this. llvm-svn: 300229	2017-04-13 18:10:30 +00:00
Brian Gesiak	0a7894d99c	[Analysis] Support bitreverse in -demanded-bits pass Summary: * Add a bitreverse case in the demanded bits analysis pass. * Add tests for the bitreverse (and bswap) intrinsic in the demanded bits pass. * Add a test case to the BDCE tests: that manipulations to high-order bits are eliminated once the bits are reversed and then right-shifted. Reviewers: mkuper, jmolloy, hfinkel, trentxintong Reviewed By: jmolloy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31857 llvm-svn: 300215	2017-04-13 16:44:25 +00:00
Sanjay Patel	104e36a0e9	[InstCombine] add/move tests for or-of-icmps; NFC If we had these tests, the bug caused by https://reviews.llvm.org/rL299851 would have been caught sooner. There's also an assert in the code that should have caught that bug, but the assert line itself has a bug. llvm-svn: 300201	2017-04-13 15:46:39 +00:00
Geoff Berry	85a530fb59	Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline." This reverts commit r296872 now that PR32153 has been fixed. llvm-svn: 300200	2017-04-13 15:36:25 +00:00
Craig Topper	e70dffeb54	[InstCombine] Add vector version of a test to show missing optimization. llvm-svn: 300161	2017-04-13 01:31:40 +00:00
Sanjay Patel	6e41018942	[InstCombine] fix wrong undef handling when converting select to shuffle As discussed in: https://bugs.llvm.org/show_bug.cgi?id=32486 ...the canonicalization of vector select to shufflevector does not hold up when undef elements are present in the condition vector. Try to make the undef handling clear in the code and the LangRef. Differential Revision: https://reviews.llvm.org/D31980 llvm-svn: 300092	2017-04-12 18:39:53 +00:00
Craig Topper	9a51c7f343	[InstCombine] Teach SimplifyDemandedInstructionBits that even if we reach an instruction that has multiple uses, if we know all the bits for the demanded bits for this context we can go ahead and create a constant. Currently if we reach an instruction with multiple uses we know we can't do any optimizations to that instruction itself since we only have the demanded bits for one of the users. But if we know all of the bits are zero/one for that one user we can still go ahead and create a constant to give to that user. This might then reduce the instruction to having a single use and allow additional optimizations on the other path. This picks up an additional case that r300075 didn't catch. Differential Revision: https://reviews.llvm.org/D31552 llvm-svn: 300084	2017-04-12 18:17:46 +00:00
Renato Golin	af3bc2089e	[SystemZ] Fix more target specific tests llvm-svn: 300081	2017-04-12 18:03:09 +00:00
Craig Topper	845033a6c9	Teach SimplifyDemandedUseBits that adding or subtracting 0s from every bit below the highest demanded bit can be simplified If we are adding/subtracting 0s below the highest demanded bit we can just use the other operand and remove the operation. My primary motivation is observing that we can call ShrinkDemandedConstant for the add/sub and create a 0 constant, rather than removing the add completely. In the case I saw, we modified the constant on an add instruction to a 0, but the add is not put into the worklist. So we didn't revisit it until the next InstCombine iteration. This caused an IR modification to remove add and a subsequent iteration to be run. With this change we bypass the add in the first iteration and prevent the second iteration from changing anything. Differential Revision: https://reviews.llvm.org/D31120 llvm-svn: 300075	2017-04-12 16:49:59 +00:00
Sanjay Patel	33439f982b	[InstCombine] morph an existing instruction instead of creating a new one One potential way to make InstCombine (very slightly?) faster is to recycle instructions when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't consider this an "InstSimplify". We could, however, make a new layer to house transforms like this if that makes InstCombine more manageable (just throwing out an idea; not sure how much opportunity is actually here). Differential Revision: https://reviews.llvm.org/D31863 llvm-svn: 300067	2017-04-12 15:11:33 +00:00
Jonas Paulsson	4707015d46	Fix a RUN line in new test. Use '2>&1 |' and not '|&' to pipe debug output to FileCheck Hopefully handles a "shell parser error" on llvm-clang-x86_64-expensive-checks-win test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll llvm-svn: 300064	2017-04-12 14:25:08 +00:00
Jonas Paulsson	22776892c9	[SLPVectorizer] Pass the right type argument to getCmpSelInstrCost() In getEntryCost(), make the scalar type for a compare instruction that of the operands, not i1. This is needed in order to call getCmpSelInstrCost() for a compare in a sensible way, the same way as the LoopVectorizer does. New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll Review: Matthew Simpson https://reviews.llvm.org/D31601 llvm-svn: 300061	2017-04-12 13:29:25 +00:00
Jonas Paulsson	592dbea779	[LoopVectorizer] Improve handling of branches during cost estimation. The cost for a branch after vectorization is very different depending on if the vectorizer will if-convert the block (branch is eliminated), or if scalarized and predicated blocks will be produced (branch duplicated before each block). There is also the case of remaining scalar branches, such as the back-edge branch. This patch handles these cases differently with TTI based cost estimates. Review: Matthew Simpson https://reviews.llvm.org/D31175 llvm-svn: 300058	2017-04-12 13:13:15 +00:00
Jonas Paulsson	da74ed42da	[LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore() Since SystemZ supports vector element load/store instructions, there is no need for extracts/inserts if a vector load/store gets scalarized. This patch lets Target specify that it supports such instructions by means of a new TTI hook that defaults to false. The use for this is in the LoopVectorizer getScalarizationOverhead() method, which will with this patch produce a smaller sum for a vector load/store on SystemZ. New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll Review: Adam Nemet https://reviews.llvm.org/D30680 llvm-svn: 300056	2017-04-12 12:41:37 +00:00
Jonas Paulsson	fccc7d66c3	[SystemZ] TargetTransformInfo cost functions implemented. getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052	2017-04-12 11:49:08 +00:00
Bjorn Pettersson	4af0593ecc	[LoadCombine] Avoid analysing dead basic blocks Summary: Dead basic blocks may be forming a loop, for which SSA form is fulfilled, but with a circular def-use chain. LoadCombine could enter an infinite loop when analysing such dead code. This patch solves the problem by simply avoiding to analyse all basic blocks that aren't forward reachable, from function entry, in LoadCombine. Fixes https://bugs.llvm.org/show_bug.cgi?id=27065 Reviewers: mehdi_amini, chandlerc, grosser, Bigcheese, davide Reviewed By: davide Subscribers: dberlin, zzheng, bjope, grandinj, Ka-Ka, materi, jholewinski, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D31032 llvm-svn: 300034	2017-04-12 08:07:55 +00:00
Bob Haarman	4075ccc717	ThinLTOBitcodeWriter: keep comdats together, rename if leader is renamed Summary: COFF requires that every comdat contain a symbol with the same name as the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this requirement to be violated. This change avoids such violations by renaming comdats if their leaders are renamed. It also keeps comdats together when splitting modules. Reviewers: pcc, mehdi_amini, tejohnson Reviewed By: pcc Subscribers: rnk, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D31963 llvm-svn: 300019	2017-04-12 01:43:07 +00:00
Zvi Rackover	30efd24d78	InstSimplify: A shuffle of a splat is always the splat itself Summary: Fold: shuffle (splat-shuffle), undef, M --> splat-shuffle Reviewers: spatel, RKSimon, craig.topper Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31527 llvm-svn: 299990	2017-04-11 21:37:02 +00:00
Anna Thomas	00dc1b74b7	[LV] Avoid vectorizing first order recurrence when phi uses are outside loop In the vectorization of first order recurrence, we vectorize such that the last element in the vector will be the one extracted to pass into the scalar remainder loop. However, this is not true when a phi (other than the primary induction variable) is used outside the loop. In such a case, we need the value from the second last iteration (i.e. the phi value), not the last iteration (which would be the phi update). I've added a test case for this. Also see PR32396. A follow up patch would generate the correct code gen for such cases, and turn this vectorization on. Differential Revision: https://reviews.llvm.org/D31910 Reviewers: mssimpso llvm-svn: 299985	2017-04-11 21:02:00 +00:00
Sanjay Patel	f0cb5a80ad	[InstSimplify] add tests for chains of shuffles; NFC llvm-svn: 299984	2017-04-11 20:54:57 +00:00
Daniel Berlin	554dcd8c89	MemorySSA: Move to Analysis, from Transforms/Utils. It's used as Analysis, it has Analysis passes, and once NewGVN is made an Analysis, this removes the cross dependency from Analysis to Transform/Utils. NFC. llvm-svn: 299980	2017-04-11 20:06:36 +00:00
Andrea Di Biagio	8e26936bfd	[AddDiscriminators] Assign discriminators to MemIntrinsic calls. Before this patch, pass AddDiscriminators always avoided to assign discriminators to intrinsic calls. This was done mainly for two reasons: 1) We wanted to minimize the number of based discriminators used. 2) We wanted to avoid non-deterministic discriminator assignment for different debug levels. Unfortunately, that approach was problematic for MemIntrinsic calls. MemIntrinsic calls can be split by SROA into loads and stores, and each new load/store instruction would obtain the debug location from the original intrinsic call. If we don't assign a discriminator to MemIntrinsic calls, then we cannot correctly set the discriminator for the newly created loads and stores. This may have a negative impact on the basic block weight computation performed by the SampleLoader. This patch fixes the issue by letting MemIntrinsic calls have a discriminator. Differential Revision: https://reviews.llvm.org/D31900 llvm-svn: 299972	2017-04-11 19:07:30 +00:00
Craig Topper	9eac2717c6	[InstCombine] Add testcases for (B&A)^A -> ~B & A and (B|A)^A -> B & ~A llvm-svn: 299971	2017-04-11 18:50:48 +00:00
Anna Thomas	98cbb067ce	[LV] Move first order recurrence test to common folder. NFC llvm-svn: 299969	2017-04-11 18:31:42 +00:00
Sanjay Patel	28611acef9	revert r299851 - [InstCombine] fix matching of or-of-icmps constants (PR32524) This is a candidate culprit for multiple bot fails, so reverting pending investigation. llvm-svn: 299955	2017-04-11 15:57:32 +00:00
Keno Fischer	30779772cf	[StripDeadDebug/DIFinder] Track inlined SPs Summary: In rL299692 I improved strip-dead-debug-info's ability to drop CUs that are not referenced from the current module. However, in doing so I neglected to realize that some SPs could be referenced entirely from inlined functions. It appears I was not the only one to make this mistake, because DebugInfoFinder doesn't find those SPs either. Fix this in DebugInfoFinder and then use that to make sure not to drop those CUs in strip-dead-debug-info. Reviewers: aprantl Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31904 llvm-svn: 299936	2017-04-11 13:32:11 +00:00
Craig Topper	18f9e424e7	[InstCombine] Support weird size element types in dyn_castNegVal. llvm-svn: 299915	2017-04-11 05:42:47 +00:00
Sanjoy Das	92ce1e76c5	[LoopUnswitch] Fix a test case (h/t to Chandler for pointing this out) The test in question was not at all testing what it was supposed to test. We do not //care// about placing `!make.implicit` in inner constant branch (since it will be folded away anyway). We care about placing `!make.implicit` in the outer branch that switches between either version of the loop. Having said that, it is _correct_ to leave behind the `!make.implicit` in the inner branch, but there is no need to do so. llvm-svn: 299912	2017-04-11 04:11:47 +00:00
Hal Finkel	b63ed91549	[LICM] Hoist fp division from the loops and replace by a reciprocal When allowed, we can hoist a division out of a loop in favor of a multiplication by the reciprocal. Fixes PR32157. Patch by vit9696! Differential Revision: https://reviews.llvm.org/D30819 llvm-svn: 299911	2017-04-11 02:22:54 +00:00
Matt Arsenault	3c1fc768ed	Allow DataLayout to specify addrspace for allocas. LLVM makes several assumptions about address space 0. However, alloca is presently constrained to always return this address space. There's no real way to avoid using alloca, so without this there is no way to opt out of these assumptions. The problematic assumptions include: - That the pointer size used for the stack is the same size as the code size pointer, which is also the maximum sized pointer. - That 0 is an invalid, non-dereferencable pointer value. These are problems for AMDGPU because alloca is used to implement the private address space, which uses a 32-bit index as the pointer value. Other pointers are 64-bit and behave more like LLVM's notion of generic address space. By changing the address space used for allocas, we can change our generic pointer type to be LLVM's generic pointer type which does have similar properties. llvm-svn: 299888	2017-04-10 22:27:50 +00:00
Dehao Chen	d4a3397861	Emit less compiler optimization remarks in samplepgo to reduce a call to findCalleeFunctionSamples which is going to be refactored. Summary: Now the SamplePGO support is more stable, we do not need so many verbose optimization remarks emitted. Reviewers: dnovillo, davidxl Reviewed By: davidxl Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D31826 llvm-svn: 299883	2017-04-10 20:49:16 +00:00
Geoff Berry	635e505675	[GVNHoist] Call isGuaranteedToTransferExecutionToSuccessor on each instruction w.r.t. https://bugs.llvm.org/show_bug.cgi?id=32153 The consensus seems to be isGuaranteedToTransferExecutionToSuccessor should be called for each function. Patch by Aditya Kumar Differential Revision: https://reviews.llvm.org/D31035 llvm-svn: 299882	2017-04-10 20:45:17 +00:00
Matt Arsenault	f10061ec70	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876	2017-04-10 20:18:21 +00:00
Matt Arsenault	daa08875b3	[MemCpyOpt] Only replace memcpy with bitcast if address spaces match Patch by James Price llvm-svn: 299866	2017-04-10 19:00:25 +00:00
Daniel Berlin	74603a68ef	MemorySSA: Make lifetime starts defs for mustaliased pointers Summary: While we don't want them aliasing with other pointers, there seems to be no point in not having them clobber must-aliased pointers. If some day, we split the aliasing and ordering chains, we'd make this not aliasing but an ordering barrier (IE it doesn't affect its memory, but we can't hoist it above it). Reviewers: hfinkel, george.burgess.iv Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D31865 llvm-svn: 299865	2017-04-10 18:46:00 +00:00
Matthew Simpson	1468d3e04e	[ARM/AArch64] Ensure valid vector element types for interleaved accesses This patch refactors and strengthens the type checks performed for interleaved accesses. The primary functional change is to ensure that the interleaved accesses have valid element types. The added test cases previously failed because the element type is f128. Differential Revision: https://reviews.llvm.org/D31817 llvm-svn: 299864	2017-04-10 18:34:37 +00:00
Craig Topper	0d830ff7bf	[InstCombine] Use commutable matchers and m_OneUse in visitSub to shorten code. Add missing test cases. In one case I removed commute handling for a multiply with a constant since we'll eventually get the constant on the right hand side. llvm-svn: 299863	2017-04-10 18:09:25 +00:00
Craig Topper	98851adc2a	[InstCombine] Use m_c_Add to shorten some code. Add testcases for this fold since they were missing. NFC llvm-svn: 299853	2017-04-10 16:59:40 +00:00
Sanjay Patel	570e35c157	[InstCombine] fix matching of or-of-icmps constants (PR32524) Also, make the same change in and-of-icmps and remove a hack for detecting that case. Finally, add some FIXME comments because the code duplication here is awful. This should fix the remaining IR problem noted in: https://bugs.llvm.org/show_bug.cgi?id=32524 llvm-svn: 299851	2017-04-10 16:55:57 +00:00
Craig Topper	3eec73e20b	[InstCombine] Support folding of add instructions with vector constants into select operations We currently only fold scalar add of constants into selects. This improves this to support vectors too. Differential Revision: https://reviews.llvm.org/D31683 llvm-svn: 299847	2017-04-10 16:40:00 +00:00
Sanjay Patel	8c1cc5abbb	[InstCombine] add test for PR32524; NFC llvm-svn: 299846	2017-04-10 16:28:08 +00:00
Craig Topper	838d13e7ee	[InstCombine] Make sure we preserve fast math flags when folding fp instructions into phi nodes Summary: I noticed in the select folding code that we copied fast math flags, but did not do the same for the similar handling in phi nodes. This patch fixes that to do the same thing as select Reviewers: spatel, davide, majnemer, hfinkel Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31690 llvm-svn: 299838	2017-04-10 07:00:10 +00:00
Craig Topper	d8840d7b10	[InstCombine] use m_c_And and m_c_Xor to handle commuted versions of a transform. llvm-svn: 299837	2017-04-10 06:53:28 +00:00
Craig Topper	7260e2f159	[InstCombine] Add test cases demonstrating missing handling for the commuted version of a transform. NFC. llvm-svn: 299836	2017-04-10 06:53:25 +00:00
Xin Tong	34888c08bc	[SCCP] Resolve indirect branch target when possible. Summary: Resolve indirect branch target when possible. This potentially eliminates more basicblocks and result in better evaluation for phi and other things. Reviewers: davide, efriedma, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30322 llvm-svn: 299830	2017-04-10 00:33:25 +00:00
Sanjay Patel	72fbb7868a	[InstCombine] remove duplicate test; NFC I moved this test to 'not.ll' in r299824 but accidentally added a copy here. llvm-svn: 299828	2017-04-09 21:45:52 +00:00
Sanjay Patel	2824927e8a	[SimplifyCFG] auto-generate better checks; NFC llvm-svn: 299825	2017-04-09 16:16:32 +00:00
Sanjay Patel	c5f963c2e5	[InstCombine] auto-generate better checks; NFC Also, move a test next to its sibling to eliminate a file with just one test. llvm-svn: 299824	2017-04-09 15:44:59 +00:00
Hal Finkel	a9d67cf601	[MemorySSA] Fix use of pointsToConstantMemory in isUseTriviallyOptimizableToLiveOnEntry In isUseTriviallyOptimizableToLiveOnEntry, pointsToConstantMemory needs to be called on the load's pointer operand, not on the result of the load (which might not even be a pointer). llvm-svn: 299823	2017-04-09 12:57:50 +00:00
Craig Topper	afa07c5ef6	[InstCombine] Extend some OR combines to support vectors. This adds support for these combines for vectors (X^C)|Y -> (X|Y)^C iff Y&C == 0 Y|(X^C) -> (X|Y)^C iff Y&C == 0 llvm-svn: 299822	2017-04-09 06:12:41 +00:00
Craig Topper	e63c21b1ba	[InstCombine] Extend a canonicalization check to apply to vector constants too. llvm-svn: 299821	2017-04-09 06:12:39 +00:00
Craig Topper	1c5af0d400	[InstCombine] Add test cases to show missing support for vectors in an OR combine. Also add the commuted versions. NFC llvm-svn: 299820	2017-04-09 06:12:36 +00:00
Gor Nishanov	bfb2a9db31	[coroutines] Make CoroSplit pass deterministic coro-split-after-phi.ll test was flaky due to non-determinism in the coroutine frame construction that was sorting the spill vector using a pointer to a def as a part of the key. The sorting was intended to make sure that spills for the same def are kept together, however, we populate the vector by processing defs in order, so the spill entries will end up together anyways. This change removes spill sorting and restores the determinism in the test. llvm-svn: 299809	2017-04-08 00:49:46 +00:00
Reid Kleckner	56a66a9794	De-flake a test that is failing due to coroutine spill insertion non-determinism llvm-svn: 299791	2017-04-07 18:02:53 +00:00
Gor Nishanov	138ad6c9c0	[coroutines] Insert spills of PHI instructions correctly Summary: Fix a bug where we were inserting a spill in between the PHIs in the beginning of the block. Consider this fragment: ``` begin: %phi1 = phi i32 [ 0, %entry ], [ 2, %alt ] %phi2 = phi i32 [ 1, %entry ], [ 3, %alt ] %sp1 = call i8 @llvm.coro.suspend(token none, i1 false) switch i8 %sp1, label %suspend [i8 0, label %resume i8 1, label %cleanup] resume: call i32 @print(i32 %phi1) ``` Unless we are spilling the argument or result of the invoke, we were always inserting the spill immediately following the instruction. The fix adds a check that if the spilled instruction is a PHI Node, select an appropriate insert point with `getFirstInsertionPt()` that skips all the PHI Nodes and EH pads. Reviewers: majnemer, rnk Reviewed By: rnk Subscribers: qcolombet, EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D31799 llvm-svn: 299771	2017-04-07 14:16:49 +00:00
Matthew Simpson	11fe2e9f2b	Reapply r298620: [LV] Vectorize GEPs This patch reapplies r298620. The original patch was reverted because of two issues. First, the patch exposed a bug in InstCombine that caused the Chromium builds to fail (PR32414). This issue was fixed in r299017. Second, the patch introduced a bug in the vectorizer's scalars analysis that caused test suite builds to fail on SystemZ. The scalars analysis was too aggressive and marked a memory instruction scalar, even though it was going to be vectorized. This issue has been fixed in the current patch and several new test cases for the scalars analysis have been added. llvm-svn: 299770	2017-04-07 14:15:34 +00:00
Craig Topper	33e0dbcc58	[InstCombine] Handle more commuted cases of ((A & B) | ~A) -> (~A | B) llvm-svn: 299747	2017-04-07 07:32:00 +00:00
Craig Topper	ccf85f24c8	[InstCombine] Add additional tests with varied commuting to show missing combines. NFC llvm-svn: 299746	2017-04-07 07:31:55 +00:00
Daniel Berlin	d952ceae2f	AliasAnalysis: Be less conservative about volatile than atomic. Summary: getModRefInfo is meant to answer the question "what impact does this instruction have on a given memory location" (not even another instruction). Long debate on this on IRC comes to the conclusion the answer should be "nothing special". That is, a noalias volatile store does not affect a memory location just by being volatile. Note: DSE and GVN and memdep currently believe this, because memdep just goes behind AA's back after it says "modref" right now. see line 635 of memdep. Prior to this patch we would get modref there, then check aliasing, and if it said noalias, we would continue. getModRefInfo already has this same AA check, it just wasn't being used because volatile was lumped in with ordering. (I am separately testing whether this code in memdep is now dead except for the invariant load case) Reviewers: jyknight, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31726 llvm-svn: 299741	2017-04-07 01:28:36 +00:00
Craig Topper	72a622cac7	[InstCombine] Add more commuted patterns to support folding ((~A & B) | A) -> (A | B). llvm-svn: 299737	2017-04-07 00:29:47 +00:00
Craig Topper	740fe1a6eb	[InstCombine] Add a few cases for OR we fail to optimize due to missing commuted patterns checks. llvm-svn: 299725	2017-04-06 23:00:22 +00:00
Eli Friedman	5fba1e53f2	Turn on -addr-sink-using-gep by default. The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here. Differential Revision: https://reviews.llvm.org/D30596 llvm-svn: 299723	2017-04-06 22:42:18 +00:00
Keno Fischer	bacc64b5fa	[StripDeadDebugInfo] Drop dead CUs entirely Summary: Prior to this while it would delete the dead DIGlobalVariables, it would leave dead DICompileUnits and everything referenced therefrom. For a big bitcode file with thousands of compile units those dead nodes easily outnumbered the real ones. Clean that up. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D31720 llvm-svn: 299692	2017-04-06 19:26:22 +00:00
Daniel Berlin	1316a94ebc	NewGVN: This patch makes memory congruence work for all types of memorydefs, not just stores. Along the way, we audit and fixup issues about how we were tracking memory leaders, and improve the verifier to notice more memory congruency issues. llvm-svn: 299682	2017-04-06 18:52:50 +00:00
Craig Topper	2f1e1c351b	[InstSimplify] Teach SimplifyMulInst to recognize vectors of i1 as And. Not just scalar i1. llvm-svn: 299665	2017-04-06 17:33:37 +00:00
Craig Topper	f7298b0ef0	[InstSimplify] Add test cases for mixing add/sub i1 with xor of i1. Seems we can simplify in one direction but not the other. llvm-svn: 299627	2017-04-06 05:48:06 +00:00
Craig Topper	aa5f524095	[InstSimplify] Teach SimplifyAddInst and SimplifySubInst that vectors of i1 can be treated as Xor too. llvm-svn: 299626	2017-04-06 05:28:41 +00:00
Keno Fischer	1ec5dd85a2	[X86 TTI] Implement LSV hook Summary: LSV wants to know the maximum size that can be loaded to a vector register. On X86, this always matches the maximum register width. Implement this accordingly and add a test to make sure that LSV can vectorize up to the maximum permissible width on X86. Reviewers: delena, arsenm Reviewed By: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31504 llvm-svn: 299589	2017-04-05 20:51:38 +00:00
Sanjay Patel	50c82c4395	[InstCombine] add fold for icmp with or mask of low bits (PR32542) We already have these 'and' folds: // X & -C == -C -> X > u ~C // X & -C != -C -> X <= u ~C // iff C is a power of 2 ...but we were missing the 'or' siblings. http://rise4fun.com/Alive/n6 This should improve: https://bugs.llvm.org/show_bug.cgi?id=32524 ...but there are 2 or more other pieces to fix still. Differential Revision: https://reviews.llvm.org/D31712 llvm-svn: 299570	2017-04-05 17:57:05 +00:00
Sanjay Patel	e7e4cc5f98	[InstCombine] add tests for missing icmp fold (PR32524) llvm-svn: 299557	2017-04-05 16:21:38 +00:00
Matthew Simpson	1a4d5c9860	[LV] Make test case more robust This test case depends on the loop being vectorized without forcing the vectorization factor. If the profitability ever changes in the future (due to cost model improvements), the test may no longer work as intended. Instead of checking the resulting IR, we should just check the instruction costs. The costs will be computed regardless if vectorization is profitable. llvm-svn: 299545	2017-04-05 14:34:13 +00:00
Sanjay Patel	8090e6f004	[InstCombine] add tests for missing add canonicalization; NFC llvm-svn: 299539	2017-04-05 13:33:10 +00:00
James Molloy	37dd4d7aaa	[LAA] Correctly return a half-open range in expandBounds This is a latent bug that's been hanging around for a while. For a loop-invariant pointer, expandBounds would return the range {Ptr, Ptr}, but this was interpreted as a half-open range, not a closed range. So we ended up planting incorrect bounds checks. Even worse, they were tautological, so we ended up incorrectly executing the optimized loop. llvm-svn: 299526	2017-04-05 09:24:26 +00:00
Akira Hatanaka	75be84f3c2	[ObjCArc] Do not dereference an invalidated iterator. Fix a bug in ARC contract pass where an iterator that pointed to a deleted instruction was dereferenced. It appears that tryToContractReleaseIntoStoreStrong was incorrectly assuming that a call to objc_retain would not immediately follow a call to objc_release. rdar://problem/25276306 llvm-svn: 299507	2017-04-05 03:44:09 +00:00
Bob Haarman	6de8134784	ThinLTOBitcodeWriter: handle aliases first in filterModule Summary: This change fixes a "local linkage requires default visibility" assert when attempting to build LLVM with ThinLTO on Windows. Reviewers: pcc, tejohnson, mehdi_amini Reviewed By: pcc Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D31632 llvm-svn: 299491	2017-04-05 00:42:07 +00:00
Craig Topper	1534495ffd	[InstCombine] Add test cases for various add/subtracts of constants(scalar, splat, and vector) with phis and selects. Improvements coming in a future commit. llvm-svn: 299476	2017-04-04 22:22:30 +00:00
Craig Topper	c745b6a1f6	[InstCombine] Turn subtract of vectors of i1 into xor like we do for scalar i1. Matches what we already do for add. llvm-svn: 299472	2017-04-04 21:44:56 +00:00
Craig Topper	86173600ec	[InstCombine] Support folding and/or/xor with a constant vector RHS into selects and phis Currently we only fold with ConstantInt RHS. This generalizes to any Constant RHS. Differential Revision: https://reviews.llvm.org/D31610 llvm-svn: 299466	2017-04-04 20:26:25 +00:00
Craig Topper	11791a723b	[InstCombine] Add test cases for missing combines of phis with and/or/xor with constant argument. NFC llvm-svn: 299460	2017-04-04 19:31:21 +00:00
Craig Topper	78cfbc1635	[InstCombine] Add more test cases for missing combines of selects with and/or/xor with constant argument. NFC llvm-svn: 299450	2017-04-04 17:48:08 +00:00
Rong Xu	48596b6f7a	[PGO] Memory intrinsic calls optimization based on profiled size This patch optimizes two memory intrinsic operations: memset and memcpy based on the profiled size of the operation. The high level transformation is like: mem_op(..., size) ==> switch (size) { case s1: mem_op(..., s1); goto merge_bb; case s2: mem_op(..., s2); goto merge_bb; ... default: mem_op(..., size); goto merge_bb; } merge_bb: Differential Revision: http://reviews.llvm.org/D28966 llvm-svn: 299446	2017-04-04 16:42:20 +00:00
Jonas Hahnfeld	1f9b00117c	Align all scalar numbers to LLVM_YAML_IS_FLOW_SEQUENCE_VECTOR Otherwise, yamlize in YAMLTraits.h might be wrongly defined. This makes some AMDGPU tests fail when LLVM_LINK_LLVM_DYLIB is set. Differential Revision: https://reviews.llvm.org/D30508 llvm-svn: 299415	2017-04-04 06:02:32 +00:00
Zvi Rackover	8f460655a2	InstSimplify: Add a hook for shufflevector Summary: Add a hook for simplification of shufflevector's with the following rules: - Constant folding - NFC, as it was already being done by the default handler. - If only one of the operands is constant, constant fold the shuffle if the mask does not select elements from the variable operand - to show the hook is firing and affecting the test-cases. Reviewers: RKSimon, craig.topper, spatel, sanjoy, nlopes, majnemer Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31525 llvm-svn: 299393	2017-04-03 22:05:30 +00:00
Matt Arsenault	b600e138cc	AMDGPU: Remove llvm.SI.vs.load.input llvm-svn: 299391	2017-04-03 21:45:13 +00:00
Craig Topper	8698fc09de	[InstCombine] Add test cases showing how we fail to fold vector constants into selects the way we do with scalars. llvm-svn: 299369	2017-04-03 17:49:15 +00:00
Daniel Berlin	07daac8a36	NewGVN: Handle coercion of constant stores, loads, memory insts. Summary: Depends on D30928. This adds support for coercion of stores and memory instructions that do not require insertion to process. Another few tests down. I added the relevant tests from rle.ll Reviewers: davide Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30929 llvm-svn: 299330	2017-04-02 13:23:44 +00:00
Nikolai Bozhenov	fca527af5c	[BypassSlowDivision] Do not bypass division of hash-like values Disable bypassing if one of the operands looks like a hash value. Slow division often occurs in hashtable implementations and fast division is never taken there because a hash value is extremely unlikely to have enough upper bits set to zero. A value is considered to be hash-like if it is produced by 1) XOR operation 2) Multiplication by a constant wider than the shorter type 3) PHI node with all incoming values being hash-like Differential Revision: https://reviews.llvm.org/D28200 llvm-svn: 299329	2017-04-02 13:14:30 +00:00
Zvi Rackover	e479980686	Add another interesting shufflevector test case for InstSimplify. NFC. Test case shows opportunity to constant fold a shuffle with one variable input vector operand. llvm-svn: 299327	2017-04-02 10:42:21 +00:00
Sanjay Patel	8b5ad3f00e	[InstSimplify] add constant folding for fdiv/frem Also, add a helper function so we don't have to repeat this code for each binop. llvm-svn: 299309	2017-04-01 19:05:11 +00:00
Sanjay Patel	ee0f5cc41f	[InstSimplify] add tests for missed constant folding; NFC llvm-svn: 299308	2017-04-01 18:44:03 +00:00
Peter Collingbourne	3420e4bf94	Fix a test to check assembly output instead of bitcode. llvm-svn: 299279	2017-03-31 23:22:19 +00:00
Craig Topper	0d8801f991	[InstCombine] Add test case demonstrating missed opportunities for removing add/sub when the LSBs of one input are known to be 0 and MSBs of the output aren't consumed. llvm-svn: 299263	2017-03-31 21:08:37 +00:00
Joerg Sonnenberger	28bed106e0	Do not translate rint into nearbyint, but truncate it like nearbyint. A common way to implement nearbyint is by fiddling with the floating point environment and calling rint. This is used at least by the BSD libm and musl. As such, canonicalizing the latter to the former will create infinite loops for libm and generally pessimize performance, at least when the generic C versions are used. This change preserves the rint in the libcall translation and also handles the domain truncation logic, so that rint with float argument will be reduced to rintf etc. llvm-svn: 299247	2017-03-31 19:58:07 +00:00
Piotr Padlewski	c050e6e91f	[MSSA] Small test fix llvm-svn: 299235	2017-03-31 17:39:07 +00:00
Dehao Chen	fed890ea3a	Fix InstCombine to preserve the VP metadata and set the correct call count. Summary: Currently the VP metadata is dropped when InstCombine converts a call to a direct call. This patch converts the VP metadata to branch_weights so that its hotness is recorded. Reviewers: eraman, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31344 llvm-svn: 299228	2017-03-31 15:59:52 +00:00
Zvi Rackover	38ba75c238	Instsimplify: Adding shufflevector test. NFC. Adding some test-cases demonstrating cases that need to be improved. To be followed by patches that improve these cases. llvm-svn: 299189	2017-03-31 07:46:02 +00:00
Mikael Holmen	79235bd4d8	[Scalarizer] Handle scalar arguments in vector GEP Summary: Triggered by commit r298620: "[LV] Vectorize GEPs". If we encounter a vector GEP with scalar arguments, we splat the scalar into a vector of appropriate size before we scatter the argument. Reviewers: arsenm, mehdi_amini, bkramer Reviewed By: arsenm Subscribers: bjope, mssimpso, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31416 llvm-svn: 299186	2017-03-31 06:29:49 +00:00
Matt Arsenault	79f837c254	AMDGPU: Add all atomicrmw fields to atomic.inc/dec Add scope, order, isVolatile llvm-svn: 299122	2017-03-30 22:21:40 +00:00
Hongbin Zheng	bfd7c38de7	[SimplifyIndvar] Replace the sdiv used by IV if we can prove both of its operands are non-negative Since there is no sdiv in SCEV, an 'udiv' is a better canonical form than an 'sdiv' as the user of induction variable Differential Revision: https://reviews.llvm.org/D31488 llvm-svn: 299118	2017-03-30 21:56:56 +00:00
Adrian Prantl	346dcaf1fa	Teach stripNonLineTableDebugInfo() to remap DILocations in !llvm.loop nodes. llvm-svn: 299107	2017-03-30 20:10:56 +00:00
Matthew Simpson	c8f0aeccda	[InstCombine] Correct the check for vector GEPs Some of the GEP combines (e.g., descaling) can't handle vector GEPs. We have an existing check that attempts to bail out if given a vector GEP. However, the check only tests the GEP's pointer operand. A GEP results in a vector of pointers if at least one of its operands is vector-typed (e.g., its pointer operand could be a scalar, but its index could be a vector). We should just check the type of the GEP itself. This should fix PR32414. Reference: https://bugs.llvm.org/show_bug.cgi?id=32414 Differential Revision: https://reviews.llvm.org/D31470 llvm-svn: 299017	2017-03-29 18:23:08 +00:00
Anna Thomas	ba04f4e925	rename instcombine test file. NFC llvm-svn: 298904	2017-03-28 08:34:07 +00:00
Matthew Simpson	b8ff4a4a70	[LV] Transform truncations of non-primary induction variables The vectorizer tries to replace truncations of induction variables with new induction variables having the smaller type. After r295063, this optimization was applied to all integer induction variables, including non-primary ones. When optimizing the truncation of a non-primary induction variable, we still need to transform the new induction so that it has the correct start value. This should fix PR32419. Reference: https://bugs.llvm.org/show_bug.cgi?id=32419 llvm-svn: 298882	2017-03-27 20:07:38 +00:00
Anna Thomas	f57ae33381	[InstCombine] Avoid incorrect folding of select into phi nodes when incoming element is a vector type Summary: We are incorrectly folding selects into phi nodes when the incoming value of a phi node is a constant vector. This optimization is done in `FoldOpIntoPhi` when the select condition is a phi node with constant incoming values. Without the fix, we are miscompiling (i.e. incorrectly folding the select into the phi node) when the vector contains non-zero elements. This patch fixes the miscompile and we will correctly fold based on the select vector operand (see added test cases). Reviewers: majnemer, sanjoy, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31189 llvm-svn: 298845	2017-03-27 13:52:51 +00:00
Serge Pavlov	b71bb80c2d	[LoopUnroll] Remap references in peeled iteration References in cloned blocks must be remapped prior to dominator calculation. Differential Revision: https://reviews.llvm.org/D31281 llvm-svn: 298811	2017-03-26 16:46:53 +00:00
Joerg Sonnenberger	fa7367428a	Split the SimplifyCFG pass into two variants. The first variant contains all current transformations except transforming switches into lookup tables. The second variant contains all current transformations. The switch-to-lookup-table conversion results in code that is more difficult to analyze and optimize by other passes. Most importantly, it can inhibit Dead Code Elimination. As such it is often beneficial to only apply this transformation very late. A common example is inlining, which can often result in range restrictions for the switch expression. Changes in execution time according to LNT: SingleSource/Benchmarks/Misc/fp-convert +3.03% MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20% MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43% and a couple of smaller changes. For perimeter it also results in a 2.6% smaller binary. Differential Revision: https://reviews.llvm.org/D30333 llvm-svn: 298799	2017-03-26 06:44:08 +00:00
Chandler Carruth	0d256c0f5d	[IR] Make SwitchInst::CaseIt almost a normal iterator. This moves it to the iterator facade utilities giving it full random access semantics, etc. It can also now be used with standard algorithms like std::all_of and std::any_of and range adaptors like llvm::reverse. Also make the semantics of iterating match what every other iterator uses and forbid decrementing past the begin iterator. This was used as a hacky way to work around iterator invalidation. However, every instance trying to do this failed to actually avoid touching invalid iterators despite the clear documentation that the removed and all subsequent iterators become invalid including the end iterator. So I've added a return of the next iterator to removeCase and rewritten the loops that were doing this to correctly follow the iterator pattern of either incrementing or removing and assigning fresh values to the iterator and the end. In one case we were trying to go backwards to make this cleaner but it doesn't actually work. I've made that code match the code we use everywhere else to remove cases as we iterate. This changes the order of cases in one test output and I moved that test to CHECK-DAG so it wouldn't care -- the order isn't semantically meaningful anyways. llvm-svn: 298791	2017-03-26 02:49:23 +00:00
Eric Christopher	0935875c40	Change the default attributes for llvm.prefetch to inaccessiblemem_or_argmemonly so that we can perform some optimizations across it. Fixes PR32365 llvm-svn: 298781	2017-03-25 20:20:23 +00:00
Ivan Krasin	c2124e185c	Revert r298620: [LV] Vectorize GEPs Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes) LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413 Original change description: [LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Original Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298735	2017-03-24 20:49:43 +00:00
Matt Arsenault	4c7795dd31	AMDGPU: Fold rcp/rsq of undef to undef llvm-svn: 298725	2017-03-24 19:04:57 +00:00
Teresa Johnson	428b9e0627	[ThinLTO] Correct counting of functions in inliner stats Summary: Declarations need to be filtered out when counting functions. Reviewers: eraman Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D31336 llvm-svn: 298720	2017-03-24 17:59:06 +00:00
Daniel Berlin	9d0796e5d0	NewGVN: Fix PR32403 - Handling of undef in phis was not quite correct due to LLVM's view of phi nodes. It would cause NewGVN not to fixpoint in some interesting edge cases. llvm-svn: 298687	2017-03-24 05:30:34 +00:00
Dehao Chen	722e94061b	Set the prof weight correctly for call instructions in DeadArgumentElimination. Summary: In DeadArgumentElimination, the call instructions will be replaced. We also need to set the prof weights so that function inlining can find the correct profile. Reviewers: eraman Reviewed By: eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31143 llvm-svn: 298660	2017-03-23 23:26:00 +00:00
Bryant Wong	def79b21e4	[MetaRenamer] Don't rename library functions. Library functions can have specific semantics that affect the behavior of certain passes. DSE, for instance, gives special treatment to malloc-ed pointers but not to pointers returned from an equivalently typed (but differently named) function. MetaRenamer ought not to alter program semantics, so library functions must remain untouched. Reviewers: mehdi_amini, majnemer, chandlerc, davide Reviewed By: davide Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D31304 llvm-svn: 298659	2017-03-23 23:21:07 +00:00
Dehao Chen	775341a14c	Use isFunctionHotInCallGraph to set the function section prefix. Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31225 llvm-svn: 298656	2017-03-23 23:14:11 +00:00
Gil Rapaport	638d4538cd	[LV] Add regression test for r297610 The new test asserts that scalarized memory operations get memcheck metadata added even if the loop is only unrolled. Differential Revision: https://reviews.llvm.org/D30972 llvm-svn: 298641	2017-03-23 20:02:23 +00:00
Teresa Johnson	0c6a4ff8dc	[ThinLTO] Add support for emitting minimized bitcode for thin link Summary: The cumulative size of the bitcode files for a very large application can be huge, particularly with -g. In a distributed build environment, all of these files must be sent to the remote build node that performs the thin link step, and this can exceed size limits. The thin link actually only needs the summary along with a bitcode symbol table. Until we have a proper bitcode symbol table, simply stripping the debug metadata results in significant size reduction. Add support for an option to additionally emit minimized bitcode modules, just for use in the thin link step, which for now just strips all debug metadata. I plan to add a cc1 option so this can be invoked easily during the compile step. However, care must be taken to ensure that these minimized thin link bitcode files produce the same index as with the original bitcode files, as these original bitcode files will be used in the backends. Specifically: 1) The module hash used for caching is typically produced by hashing the written bitcode, and we want to include the hash that would correspond to the original bitcode file. This is because we want to ensure that changes in the stripped portions affect caching. Added plumbing to emit the same module hash in the minimized thin link bitcode file. 2) The module paths in the index are constructed from the module ID of each thin linked bitcode, and typically is automatically generated from the input file path. This is the path used for finding the modules to import from, and obviously we need this to point to the original bitcode files. Added gold-plugin support to take a suffix replacement during the thin link that is used to override the identifier on the MemoryBufferRef constructed from the loaded thin link bitcode file. The assumption is that the build system can specify that the minimized bitcode file has a name that is similar but uses a different suffix (e.g. out.thinlink.bc instead of out.o). Added various tests to ensure that we get identical index files out of the thin link step. Reviewers: mehdi_amini, pcc Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D31027 llvm-svn: 298638	2017-03-23 19:47:39 +00:00
Matthew Simpson	4e7b71bc86	[LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298620	2017-03-23 16:29:58 +00:00
Matthew Simpson	1fb4064531	[LV] Delete unneeded scalar GEP creation code The code for generating scalar base pointers in vectorizeMemoryInstruction is not needed. We currently scalarize all GEPs and maintain the scalarized values in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as that in scalarizeInstruction. The test cases that changed as a result of this patch changed because we were able to reuse the scalarized GEP that we previously generated instead of cloning a new one. Differential Revision: https://reviews.llvm.org/D30587 llvm-svn: 298615	2017-03-23 16:07:21 +00:00
Dehao Chen	53a0c082d2	Do not set branch weight if the branch weight annotation is present. Summary: ThinLTO will annotate the CFG twice. If the branch weight is set by the first annotation, we should not set the branch weight again in the second annotation because the first annotation is more accurate as there is less optimization that could affect debug info accuracy. Reviewers: tejohnson, davidxl Reviewed By: tejohnson Subscribers: mehdi_amini, aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D31228 llvm-svn: 298602	2017-03-23 14:43:10 +00:00
Luqman Aden	3f807c91dc	Preserve nonnull metadata on Loads through SROA & mem2reg. Summary: https://llvm.org/bugs/show_bug.cgi?id=31142 : SROA was dropping the nonnull metadata on loads from allocas that got optimized out. This patch simply preserves nonnull metadata on loads through SROA and mem2reg. Reviewers: chandlerc, efriedma Reviewed By: efriedma Subscribers: hfinkel, spatel, efriedma, arielb1, davide, llvm-commits Differential Revision: https://reviews.llvm.org/D27114 llvm-svn: 298540	2017-03-22 19:16:39 +00:00
Sanjay Patel	2f602cea41	[InstCombine] canonicalize insertelement of scalar constant ahead of insertelement of variable insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 --> insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1 As noted in the code comment and seen in the test changes, the motivation is that by pulling constant insertion up, we may be able to constant fold some insertelement instructions. Differential Revision: https://reviews.llvm.org/D31196 llvm-svn: 298520	2017-03-22 17:10:44 +00:00
Craig Topper	ad5c2d04f7	[ValueTracking] Make sure we keep range metadata information when calculating known bits for calls to bitreverse intrinsic. llvm-svn: 298488	2017-03-22 07:22:49 +00:00
Craig Topper	07f2915ad8	[InstCombine] Teach SimplifyDemandedUseBits to shrink Constants on the left side of subtracts Summary: Subtracts can have constants on the left side, but we don't shrink them based on demanded bits. This patch fixes that to match the right hand side. Reviewers: davide, majnemer, spatel, sanjoy, hfinkel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31119 llvm-svn: 298478	2017-03-22 04:03:53 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Sanjay Patel	68c6beabe9	[InstCombine] regenerate checks; NFC llvm-svn: 298432	2017-03-21 20:14:38 +00:00
George Burgess IV	56c7e88c2c	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430	2017-03-21 20:08:59 +00:00
Sanjay Patel	00ece756c3	[InstCombine] auto-generate better checks; NFC llvm-svn: 298377	2017-03-21 14:04:44 +00:00
Matt Arsenault	6b00d40900	InstCombine: Check source value precision when reducing cast intrinsic Missed this check when porting from the libcall version. llvm-svn: 298312	2017-03-20 21:59:24 +00:00
Daniel Berlin	fa42a23cfc	Add missing updated test from VN coercion changes. Instructions were renamed. NFC llvm-svn: 298280	2017-03-20 18:04:19 +00:00
Dehao Chen	e593049fb0	Updates branch_weights annotation for call instructions during inlining. Summary: Inliner should update the branch_weights annotation to scale it to proper value. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D30767 llvm-svn: 298270	2017-03-20 16:40:44 +00:00
Craig Topper	ff2283ec0e	[InstCombine] Use update_test_checks.py to regenerate a test. NFC llvm-svn: 298227	2017-03-19 17:04:52 +00:00
Xin Tong	967e313078	Remove unused arguments. NFCI llvm-svn: 298218	2017-03-19 15:31:16 +00:00
Xin Tong	d67fb1b66e	[JumpThreading] Perform phi-translation in SimplifyPartiallyRedundantLoad. Summary: In case we are loading on a phi-load in SimplifyPartiallyRedundantLoad. Try to phi translate it into incoming values in the predecessors before we search for available loads. This needs https://reviews.llvm.org/D30524 Reviewers: davide, sanjoy, efriedma, dberlin, rengolin Reviewed By: dberlin Subscribers: junbuml, llvm-commits Differential Revision: https://reviews.llvm.org/D30543 llvm-svn: 298217	2017-03-19 15:30:53 +00:00
Brian Gesiak	1640e68728	[Analysis] bitreverse(undef) returns undef Summary: The reverse of an arbitrary bitpattern is also an arbitrary bitpattern. Reviewers: trentxintong, arsenm, majnemer Reviewed By: majnemer Subscribers: majnemer, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31118 llvm-svn: 298201	2017-03-19 04:40:42 +00:00
Daniel Berlin	41b39169e2	NewGVN: Fix PHI evaluation bug exposed by new verifier. We were checking whether the incoming block was reachable instead of whether the specific edge was reachable llvm-svn: 298187	2017-03-18 15:41:36 +00:00
Rong Xu	661ffe104e	[PGO] Add omitted test cases. llvm-svn: 298115	2017-03-17 20:05:13 +00:00
Rong Xu	e60343d6b0	[PGO] Value profile for size of memory intrinsic calls This patch annotates the valuesites profile to memory intrinsics. Differential Revision: http://reviews.llvm.org/D31002 llvm-svn: 298110	2017-03-17 18:07:26 +00:00
Stanislav Mekhanoshin	ee2dd785f6	Only unswitch loops with uniform conditions Loop unswitching can be extremely harmful for a SIMT target. If the hoisted condition is not uniform, a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets, a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior and set this field from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104	2017-03-17 17:13:41 +00:00
Sanjoy Das	c4e4dcdf64	[RSForGC] Handle vector GEPs We were not handling getelementptr instructions of vector type before. Since getelementptr instructions for vector types follow the same rule as getelementptr instructions for non-vector types, we can just handle them in the same way. llvm-svn: 298028	2017-03-17 00:55:53 +00:00
Rong Xu	60faea19f8	Resubmit r297897: [PGO] Value profile for size of memory intrinsic calls R297897 inadvertently enabled annotation for memop profiling. This new patch fixed it. llvm-svn: 297996	2017-03-16 21:15:48 +00:00
Adrian Prantl	47ea6478ed	Salvage debug info from instructions about to be deleted [Reapplies r297971 and punting on finding a better API for findDbgValues()] This patch improves debug info quality in InstCombine by looking at values that are about to be deleted, checking whether there are any dbg.value intrinsics referring to them, and potentially encoding the semantics of the deleted instruction into the dbg.value's DIExpression. In the example in the testcase (which was extracted from XNU) there is a sequence of %4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41 %5 = bitcast %struct.entry* %4 to i8*, !dbg !42 %add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43 %6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44 call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34 When these instructions are eliminated by instcombine one after another, we can still salvage the otherwise dead debug info: - Bitcasts have no effect, so have the dbg.value point to operand(0) - Loads can be expressed via a DW_OP_deref - Constant gep instructions can be replaced by DWARF expression arithmetic The API introduced by this patch is not specific to instcombine and can be useful in other places, too. rdar://problem/30725338 Differential Revision: https://reviews.llvm.org/D30919 llvm-svn: 297994	2017-03-16 21:14:09 +00:00
Michael Kuperstein	2da2bfa088	[LoopUnroll] Don't peel loops where the latch isn't the exiting block Peeling assumed this doesn't happen, but didn't check it. This fixes PR32178. Differential Revision: https://reviews.llvm.org/D30757 llvm-svn: 297993	2017-03-16 21:07:48 +00:00
Sanjay Patel	6105bb5eaf	[InstCombine] avoid breaking up bitcasted vector min/max patterns (PR32306) As the related tests show, we're not canonicalizing to this form for scalars or vectors yet, but this solves the immediate problem in: https://bugs.llvm.org/show_bug.cgi?id=32306 llvm-svn: 297989	2017-03-16 20:42:45 +00:00
Sanjay Patel	634e622069	[InstCombine] add tests for PR32306 and missed min/max canonicalization; NFC llvm-svn: 297986	2017-03-16 20:31:38 +00:00
Adrian Prantl	fa9e84eb6d	Revert commit r297971 because of issues reported by msan. llvm-svn: 297982	2017-03-16 20:11:54 +00:00
Adrian Prantl	4377314a98	Salvage debug info from instructions about to be deleted This patch improves debug info quality in InstCombine by looking at values that are about to be deleted, checking whether there are any dbg.value intrinsics referring to them, and potentially encoding the semantics of the deleted instruction into the dbg.value's DIExpression. In the example in the testcase (which was extracted from XNU) there is a sequence of %4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41 %5 = bitcast %struct.entry* %4 to i8*, !dbg !42 %add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43 %6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44 call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34 When these instructions are eliminated by instcombine one after another, we can still salvage the otherwise dead debug info: - Bitcasts have no effect, so have the dbg.value point to operand(0) - Loads can be expressed via a DW_OP_deref - Constant gep instructions can be replaced by DWARF expression arithmetic The API introduced by this patch is not specific to instcombine and can be useful in other places, too. rdar://problem/30725338 Differential Revision: https://reviews.llvm.org/D30919 llvm-svn: 297971	2017-03-16 18:22:52 +00:00
Eric Liu	971de62291	Revert "[PGO] Value profile for size of memory intrinsic calls" This commit reverts r297897 and r297909. llvm-svn: 297951	2017-03-16 13:16:35 +00:00
Chandler Carruth	814e0df1c5	[PM/Inliner] Fix a bug in r297374 where we would leave stale calls in the work queue and crash when trying to visit them after deleting the function containing those calls. llvm-svn: 297940	2017-03-16 10:45:42 +00:00
Chandler Carruth	6ef42cc6bb	[PM/Inliner] Add a test case that encapsulates the core issue addressed in r297374. I've extracted a small version of this from the C++ metaprogram Richard came up with to exercise these kinds of issues and written comments to describe both how to reproduce a fresh version of the test case and what likely failure modes are. The test case is still a bit brittle as it depends on the particular inline cost modeling and SCC visitation order, but it definitely would have caught the bug right away when developing things so it seems a really valuable test case to have. llvm-svn: 297935	2017-03-16 10:13:55 +00:00
Rong Xu	4ed52798ce	[PGO] Value profile for size of memory intrinsic calls This patch adds the value profile support to profile the size parameter of memory intrinsic calls: memcpy, memcmp, and memmove. Differential Revision: http://reviews.llvm.org/D28965 llvm-svn: 297897	2017-03-15 21:47:27 +00:00
Simon Pilgrim	06c70adcf0	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars Prep work for PR31810 llvm-svn: 297876	2017-03-15 19:34:55 +00:00
Fiona Glaser	a9bd572b6f	MemCpyOptimizer: don't create new addrspace casts This isn't safe on all targets, and since we don't have a way to know it's safe, avoid doing it for now. llvm-svn: 297788	2017-03-14 22:37:38 +00:00
Dehao Chen	4a435e0896	SamplePGO ThinLTO ICP fix for local functions. Summary: In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places where the mismatch can happen: 1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName. 2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName This patch makes a best-effort attempt to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining: 1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import. 2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName. 3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote. Reviewers: mehdi_amini, tejohnson Reviewed By: tejohnson Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30754 llvm-svn: 297757	2017-03-14 17:33:01 +00:00
Sanjay Patel	1c8c6a457d	[InstCombine] consolidate rem tests and update checks; NFC llvm-svn: 297747	2017-03-14 16:27:46 +00:00
Sanjay Patel	9deec85c34	[InstCombine] regenerate checks; NFC llvm-svn: 297746	2017-03-14 16:16:40 +00:00
Oliver Stannard	062041113f	[ValueTracking] Out of range shifts might be undef If it is possible for the RHS of a shift operation to be greater than or equal to the bit-width, then the result might be undef, and we can't report any known bits. In some cases, this was allowing a transformation in instcombine which widened an undef value from i1 to i32, increasing the range of values that a function could return. Differential revision: https://reviews.llvm.org/D30781 llvm-svn: 297724	2017-03-14 10:13:17 +00:00
Jonas Paulsson	a48ea231c0	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705	2017-03-14 06:35:36 +00:00
Daniel Berlin	620f86ff2b	Add missing condprop-xfail.ll that contains the remaining xfail'd tests llvm-svn: 297699	2017-03-14 01:46:51 +00:00
Daniel Berlin	2aa23e8881	NewGVN: We pass rle-nonlocal, we just perform the replacement in a way that keeps the old name instead of the new one llvm-svn: 297683	2017-03-13 22:43:30 +00:00
Sanjay Patel	caf369bd03	[SimplifyCFG] move tests for PR31028 from CGP Hopefully, this will make sense with a forthcoming patch. If not, we can move these back. llvm-svn: 297660	2017-03-13 19:59:14 +00:00
Matt Arsenault	d81f557fe2	AMDGPU: Fold icmp/fcmp into icmp intrinsic The typical use is a library vote function which compares to 0. Fold the user condition into the intrinsic. llvm-svn: 297650	2017-03-13 18:14:02 +00:00
Sanjay Patel	6023a2501c	[CGP] add tests for PR31028; NFC llvm-svn: 297629	2017-03-13 15:45:37 +00:00
Sanjoy Das	3f1e8e0102	Use a WeakVH for UnknownInstructions in AliasSetTracker Summary: This change solves the same problem as D30726, except that this only throws out the bathwater. AST was not correctly tracking and deleting UnknownInstructions via handles. The existing code only tracks "pointers" in its `ASTCallbackVH`, so an UnknownInstruction (that isn't also def'ing a pointer used by another memory instruction) never gets a `ASTCallbackVH`. There are two other ways to solve this problem: - Use the `PointerRec` scheme for both known and unknown instructions. - Use a `CallbackVH` that erases the offending Instruction from the UnknownInstruction list. Both of the above changes seemed to be significantly (and unnecessarily IMO) more complex than this. Reviewers: chandlerc, dberlin, hfinkel, reames Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D30849 llvm-svn: 297539	2017-03-11 01:15:48 +00:00
Peter Collingbourne	14dcf02fcb	WholeProgramDevirt: Implement export/import support for VCP. Differential Revision: https://reviews.llvm.org/D30017 llvm-svn: 297503	2017-03-10 20:13:58 +00:00
Peter Collingbourne	59675ba0f8	WholeProgramDevirt: Implement export/import support for unique ret val opt. Differential Revision: https://reviews.llvm.org/D29917 llvm-svn: 297502	2017-03-10 20:09:11 +00:00
Michael Kuperstein	5fb39a7966	[SLP] Revert everything that has to do with memory access sorting. This reverts r293386, r294027, r294029 and r296411. Turns out the SLP tree isn't actually a "tree" and we don't handle accessing the same packet of loads in several different orders well, causing miscompiles. Revert until we can fix this properly. llvm-svn: 297493	2017-03-10 18:59:07 +00:00
Matt Arsenault	a3bdd8f27b	AMDGPU: Fix insertion point when reducing load intrinsics The insertion point may be later than the next instruction, so it is necessary to set it when replacing the call. llvm-svn: 297439	2017-03-10 05:25:49 +00:00
Daniel Berlin	e3e69e1680	NewGVN: Rewrite DCE during elimination so we do it as well as old GVN did. llvm-svn: 297428	2017-03-10 00:32:33 +00:00
Sanjay Patel	962a8431ea	[InstSimplify] allow folds for bool vector div/rem llvm-svn: 297411	2017-03-09 21:56:03 +00:00
Sanjay Patel	7e56366204	[ConstantFold] vector div/rem with any zero element in divisor is undef Follow-up for: https://reviews.llvm.org/D30665 https://reviews.llvm.org/rL297390 llvm-svn: 297409	2017-03-09 20:42:30 +00:00
Matt Arsenault	efe949cc67	AMDGPU: Support for SimplifyDemandedVectorElts for load intrinsics llvm-svn: 297408	2017-03-09 20:34:27 +00:00
Sanjay Patel	bb47616aef	[InstSimplify] add tests for vector constant folding div/rem-by-0; NFC llvm-svn: 297407	2017-03-09 20:31:20 +00:00
Sanjay Patel	2b1f6f4b92	[InstSimplify] vector div/rem with any zero element in divisor is undef This was suggested as a DAG simplification in the review for rL297026 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435253.html ...but let's start with IR since we have actual docs for IR (LangRef). Differential Revision: https://reviews.llvm.org/D30665 llvm-svn: 297390	2017-03-09 16:20:52 +00:00
Chandler Carruth	20e588e1af	[PM/Inliner] Make the new PM's inliner process call edges across an entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies. This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design. Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into insanely large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use. Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue: ``` template <int N> extern const char *str; void g(const char *); template <bool K, int N> void f(bool *B, bool *E) { if (K) g(str<N>); if (B == E) return; if (*B) f<true, N + 1>(B + 1, E); else f<false, N + 1>(B + 1, E); } template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); } template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); } extern bool *arr, *end; void test() { f<false, 0>(arr, end); } ``` When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function. 
What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome. The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK. We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an awesome job here as long as it does ok. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when it is this easy to do, seems like the right short to medium term approach. I don't really have a test case that makes sense yet... 
I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here should be unobservable without snooping on debug logging. So there isn't really much to test. The test case updates come from two incidental changes: 1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases. 2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining. llvm-svn: 297374	2017-03-09 11:35:40 +00:00
Peter Collingbourne	0152c8156b	WholeProgramDevirt: Implement importing for uniform ret val opt. Differential Revision: https://reviews.llvm.org/D29854 llvm-svn: 297350	2017-03-09 01:11:15 +00:00
Peter Collingbourne	6d284fab20	WholeProgramDevirt: Implement importing for single-impl devirtualization. Differential Revision: https://reviews.llvm.org/D29844 llvm-svn: 297333	2017-03-09 00:21:25 +00:00
Changpeng Fang	1be9b9f816	AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328	2017-03-09 00:07:00 +00:00
Evgeniy Stepanov	8537d9994d	Don't merge global constants with non-dbg metadata. !type metadata can not be dropped. An alternative to this is adding !type metadata from the replaced globals to the replacement, but that may weaken type tests and make them slower at the same time. The merged global gets !dbg metadata from replaced globals, and can end up with multiple debug locations. llvm-svn: 297327	2017-03-09 00:03:37 +00:00
Matthew Simpson	3388de1349	[LV] Select legal insert point when fixing first-order recurrences Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop maps to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302	2017-03-08 18:18:20 +00:00
Matthew Simpson	8966848d17	[LV] Make the test case for PR30183 less fragile This patch also renames the PR number the test points to. The previous reference was PR29559, but that bug was somehow deleted and recreated under PR30183. llvm-svn: 297295	2017-03-08 17:03:38 +00:00
Matthew Simpson	903dd5aa9b	[LV] Add missing check labels to tests and reformat llvm-svn: 297294	2017-03-08 16:55:34 +00:00
Jun Bum Lim	ac170872b2	[JumpThread] Use AA in SimplifyPartiallyRedundantLoad() Summary: Use AA when scanning to find an available load value. Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin Reviewed By: rengolin, dberlin Subscribers: aemerson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D30352 llvm-svn: 297284	2017-03-08 15:22:30 +00:00
Sanjay Patel	62906af379	[InstCombine] avoid crashing on shuffle shrinkage when input type is not same as result type llvm-svn: 297280	2017-03-08 15:02:23 +00:00
Sam Parker	0f4db38c20	[LoopRotate] Propagate dbg.value intrinsics Recommitting patch which was previously reverted in r297159. These changes should address the casting issues. The original patch enables dbg.value intrinsics to be attached to newly inserted PHI nodes. Differential Review: https://reviews.llvm.org/D30701 llvm-svn: 297269	2017-03-08 09:56:22 +00:00
Sebastian Pop	4a4d245b19	Handle UnreachableInst in isGuaranteedToTransferExecutionToSuccessor A block with an UnreachableInst does not transfer execution to a successor. The problem was exposed by GVN-hoist. This patch fixes bug 32153. Patch by Aditya Kumar. Differential Revision: https://reviews.llvm.org/D30667 llvm-svn: 297254	2017-03-08 01:54:50 +00:00
Sanjay Patel	fe9705149b	[InstCombine] shrink truncated insertelement into undef vector This is the 2nd part of solving: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html D30123 moves the trunc ahead of the shuffle, and this moves the trunc ahead of the insertelement. We're limiting this transform to undef rather than any constant to avoid backend problems. Differential Revision: https://reviews.llvm.org/D30137 llvm-svn: 297242	2017-03-07 23:27:14 +00:00
Evgeniy Stepanov	7a5cfa9a11	Fix one-after-the-end type metadata handling in globalsplit. Itanium ABI may have an address point one byte after the end of a vtable. When such vtable global is split, the !type metadata needs to follow the right vtable. Differential Revision: https://reviews.llvm.org/D30716 llvm-svn: 297236	2017-03-07 22:18:48 +00:00
Sanjay Patel	53fa17a014	[InstCombine] shrink truncated splat shuffle (2nd try) This was committed at r297155 and reverted at r297166 because of an over-reaching clang test. That should be fixed with r297189. This is one part of solving a recent bug report: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html This keeps with our general approach: changing arbitrary shuffles is off-limits, but changing splat is ok. The transform is very similar to the existing shrinkBitwiseLogic() canonicalization. Differential Revision: https://reviews.llvm.org/D30123 llvm-svn: 297232	2017-03-07 21:45:16 +00:00
Gor Nishanov	c52006ab09	[coroutines] Add handling for unwind coro.ends Summary: The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and other code that is only relevant during the initial invocation of the coroutine and should not be present in resume and destroy parts. In landing pads coro.end is replaced with an appropriate instruction to unwind to caller. The handling of coro.end differs depending on whether the target is using landingpad or WinEH exception model. For landingpad based exception model, it is expected that frontend uses the `coro.end`_ intrinsic as follows: ``` ehcleanup: %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true) br i1 %InResumePart, label %eh.resume, label %cleanup.cont cleanup.cont: ; rest of the cleanup eh.resume: %exn = load i8*, i8** %exn.slot, align 8 %sel = load i32, i32* %ehselector.slot, align 4 %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0 %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1 resume { i8*, i32 } %lpad.val29 ``` The `CoroSplit` pass replaces `coro.end` with ``True`` in the resume functions, thus leading to immediate unwind to the caller, whereas in start function it is replaced with ``False``, thus allowing to proceed to the rest of the cleanup code that is only needed during initial invocation of the coroutine. For Windows Exception handling model, a frontend should attach a funclet bundle referring to an enclosing cleanuppad as follows: ``` ehcleanup: %tok = cleanuppad within none [] %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ] cleanupret from %tok unwind label %RestOfTheCleanup ``` The `CoroSplit` pass, if the funclet bundle is present, will insert ``cleanupret from %tok unwind to caller`` before the `coro.end`_ intrinsic and will remove the rest of the block. Reviewers: majnemer Reviewed By: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D25543 llvm-svn: 297223	2017-03-07 21:00:54 +00:00
Matthew Simpson	c86b2134c7	[LV] Consider users that are memory accesses in uniforms expansion step When expanding the set of uniform instructions beyond the seed instructions (e.g., consecutive pointers), we mark a new instruction uniform if all its loop-varying users are uniform. We should also allow users that are consecutive or interleaved memory accesses. This fixes cases where we have an instruction that is used as the pointer operand of a consecutive access but also used by a non-memory instruction that later becomes uniform as part of the expansion. llvm-svn: 297179	2017-03-07 18:47:30 +00:00
Sanjay Patel	6d30606168	revert r297155 because there's a clang test that depends on InstCombine: tools/clang/test/CodeGen/zvector.c llvm-svn: 297166	2017-03-07 17:41:45 +00:00
Adrian Prantl	d4056501fb	Revert "Strip debug info when inlining into a nodebug function." This reverts commit r296488. As noted by David Blaikie on llvm-commits, I overlooked the case of a debug function being inlined into a nodebug function being inlined into a debug function. llvm-svn: 297163	2017-03-07 17:28:57 +00:00
Nico Weber	3b2f0094d7	Revert r297132, it caused PR32171 llvm-svn: 297159	2017-03-07 17:23:52 +00:00
Sanjay Patel	defdb7bed5	[InstCombine] shrink truncated splat shuffle This is one part of solving a recent bug report: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html This keeps with our general approach: changing arbitrary shuffles is off-limits, but changing splat is ok. The transform is very similar to the existing shrinkBitwiseLogic() canonicalization. Differential Revision: https://reviews.llvm.org/D30123 llvm-svn: 297155	2017-03-07 16:10:36 +00:00
Sam Parker	6ec5fdbc94	[LoopRotate] Update dbg.value intrinsics Propagate debug info through the newly inserted PHI nodes. Differential Revision: https://reviews.llvm.org/D30190 llvm-svn: 297132	2017-03-07 09:34:25 +00:00
Sanjoy Das	30c3538e2e	[LoopUnrolling] Fix loop size check for peeling Summary: We should check if loop size allows us to peel at least one iteration before we do so. Patch by Max Kazantsev! Reviewers: sanjoy, mkuper, efriedma Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30632 llvm-svn: 297122	2017-03-07 06:03:15 +00:00
Michael Kuperstein	768d013a03	[SLP] Revert r296863 due to miscompiles. Details and reproducer are on the email thread for r296863. llvm-svn: 297103	2017-03-06 23:54:51 +00:00
Daniel Berlin	961b002714	NewGVN: We were not really failing this testcase, because the instructions it was looking for are unused. GVN value numbers unused instructions, NewGVN does not. Fix the instructions to be used, so we eliminate the redundancies it's checking for, and un-XFAIL it llvm-svn: 297058	2017-03-06 20:01:31 +00:00
Sanjay Patel	3bbee79d9e	[InstSimplify] add tests for vector div/rem with UB potential; NFC llvm-svn: 297048	2017-03-06 18:45:39 +00:00
Sanjay Patel	c494239bd8	[InstSimplify] regenerate checks; NFC llvm-svn: 297040	2017-03-06 18:13:01 +00:00
Dehao Chen	c632a393b7	Remove the sample pgo annotation heuristic that uses call count to annotate basic block count. Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks. Reviewers: davidxl, aprantl Reviewed By: aprantl Subscribers: llvm-commits, aprantl Differential Revision: https://reviews.llvm.org/D30658 llvm-svn: 297038	2017-03-06 17:49:59 +00:00
Alexey Bataev	d7db344c17	[SLP] A test for vectorization of users of extractelement instructions, NFC. llvm-svn: 297024	2017-03-06 16:26:00 +00:00
Evgeny Stupachenko	e4b0813d62	Add test missed in r296770. Differential Revision: http://reviews.llvm.org/D27004 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296962	2017-03-04 05:20:02 +00:00
Evgeny Stupachenko	d6aa0d02c2	Set option enabling LSR alternative way to resolve complex solution to false. Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296959	2017-03-04 03:14:05 +00:00
Peter Collingbourne	77a8d563a3	WholeProgramDevirt: Implement exporting for uniform ret val opt. Differential Revision: https://reviews.llvm.org/D29846 llvm-svn: 296948	2017-03-04 01:34:53 +00:00
Peter Collingbourne	2325bb34c1	WholeProgramDevirt: Implement exporting for single-impl devirtualization. Differential Revision: https://reviews.llvm.org/D29811 llvm-svn: 296945	2017-03-04 01:31:01 +00:00
Peter Collingbourne	b406baaeef	WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users. Any unsuccessful llvm.type.checked.load devirtualizations will be translated into uses of llvm.type.test, so we need to add the resulting llvm.type.test intrinsics to the function summaries so that the LowerTypeTests pass will export them. Differential Revision: https://reviews.llvm.org/D29808 llvm-svn: 296939	2017-03-04 01:23:30 +00:00
Sanjoy Das	664c925a57	[LoopUnrolling] Peel loops with invariant backedge Phi input Summary: If a loop contains a Phi node which has an invariant input from back edge, it is profitable to peel such loops (rather than unroll them) to use the advantage that this Phi is always invariant starting from 2nd iteration. After the 1st iteration is peeled, other optimizations can potentially simplify calculations with this invariant. Patch by Max Kazantsev! Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames Reviewed By: mkuper Subscribers: mkuper, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D30161 llvm-svn: 296898	2017-03-03 18:19:15 +00:00
Benjamin Kramer	9528f8c2fb	Revert "Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline."" This reverts commit r296759. Miscompiles bash. llvm-svn: 296872	2017-03-03 14:27:53 +00:00
Mohammad Shahid	bdac9f30c0	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API. This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. It also needs to recompute the proper Lane for external use of vectorizable scalars based on shuffle mask. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d llvm-svn: 296863	2017-03-03 10:02:47 +00:00
Peter Collingbourne	3baa72af7d	ThinLTOBitcodeWriter: Do not follow operand edges of type GlobalValue when looking for virtual functions. Such edges may otherwise result in infinite recursion if a pointer to a vtable is reachable from the vtable itself. This can happen in practice if a TU defines the ABI types used to implement RTTI, and is itself compiled with RTTI. Fixes PR32121. llvm-svn: 296839	2017-03-02 23:10:17 +00:00
Nikolai Bozhenov	4a04fb9e90	[BypassSlowDivision] Use ValueTracking to simplify run-time checks ValueTracking is used for more thorough analysis of operands. Based on the analysis, either run-time checks can be simplified (e.g. check only one operand instead of two) or the transformation can be avoided. For example, it is quite often the case that a divisor is promoted from a shorter type and run-time checks for it are redundant. With additional compile-time analysis of values, two special cases naturally arise and are addressed by the patch: 1) Both operands are known to be short enough. Then, the long division can be simply replaced with a short one without CFG modification. 2) If a division is unsigned and the dividend is known to be short then the long division is not needed at all. Because if the divisor is too big for short division then the quotient is obviously zero (and the remainder is equal to the dividend). Actually, the division is not needed when (divisor > dividend). Differential Revision: https://reviews.llvm.org/D29897 llvm-svn: 296832	2017-03-02 22:12:15 +00:00
Tobias Grosser	f818c3300b	Revert "Fix PR 24415 (at least), by making our post-dominator tree behavior sane." and also "clang-format GenericDomTreeConstruction.h, since the current formatting makes it look like their is a bug in the loop indentation, and there is not" This reverts commit r296535. There are still some open design questions which I would like to discuss. I revert this for Daniel (who gave the OK), as he is on vacation. llvm-svn: 296812	2017-03-02 21:08:37 +00:00
Evgeny Stupachenko	d655ec56c3	The patch fixes r296770 Summary: Extend -unroll-partial-threshold to 200 for runtime-loop3.ll test as epilogue unroll initially add 1 more IV to the loop. From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296803	2017-03-02 19:41:38 +00:00
Evgeny Stupachenko	21bef2cb3c	The patch turns on epilogue unroll for loops with constant recurrence start. Summary: Set unroll remainder to epilog if a loop contains a phi with constant parameter: loop: pn = phi [Const, PreHeader], [pn.next, Latch] ... Reviewer: hfinkel Differential Revision: http://reviews.llvm.org/D27004 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296770	2017-03-02 17:38:46 +00:00
Geoff Berry	484d756583	Re-apply "[GVNHoist] Move GVNHoist to function simplification part of pipeline." This re-applies r289696, which caused TSan perf regression, which has since been addressed in separate changes (see PR for details). See PR31382. llvm-svn: 296759	2017-03-02 16:16:47 +00:00
Bjorn Pettersson	e5027cfbcc	[InstCombine] Avoid faulty combines of select-cmp-br Summary: When InstCombine is optimizing certain select-cmp-br patterns it replaces the result of the select in uses outside of the basic block containing the select. This is only legal if the path from the select to the outside use is disjoint from all other paths out from the originating basic block. The problem found was that InstCombiner::replacedSelectWithOperand did not consider the case when both edges out from the br pointed to the same label. In that case the paths aren't disjoint and the transformation is illegal. This patch avoids the faulty rewrites by verifying that there is a single flow to the successor where we want to replace uses. Reviewers: llvm-commits, spatel, majnemer Differential Revision: https://reviews.llvm.org/D30455 llvm-svn: 296752	2017-03-02 15:18:58 +00:00
Matthew Simpson	aee9771ae2	[ARM/AArch64] Update costs for interleaved accesses with wide types After r296750, we're able to match interleaved accesses having types wider than 128 bits. This patch updates the associated TTI costs. Differential Revision: https://reviews.llvm.org/D29675 llvm-svn: 296751	2017-03-02 15:15:35 +00:00
Matthew Simpson	1bfa159db9	[ARM/AArch64] Support wide interleaved accesses This patch teaches (ARM\|AArch64)ISelLowering.cpp to match illegal vector types to interleaved access intrinsics as long as the types are multiples of the vector register width. A "wide" access will now be mapped to multiple interleave intrinsics similar to the way in which non-interleaved accesses with illegal types are legalized into multiple accesses. I'll update the associated TTI costs (in getInterleavedMemoryOpCost) as a follow-on. Differential Revision: https://reviews.llvm.org/D29466 llvm-svn: 296750	2017-03-02 15:11:20 +00:00
Matthew Simpson	455c2ee394	[LV] Consider non-consecutive but vectorizable accesses for VF selection When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an access will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized. Differential Revision: https://reviews.llvm.org/D30305 llvm-svn: 296747	2017-03-02 13:55:05 +00:00
Reid Kleckner	d80b69fa3b	[Constant Hoisting] Avoid inserting instructions before EH pads Now that terminators can be EH pads, this code needs to iterate over the immediate dominators of the EH pad to find a valid insertion point. Fix for PR32107 Patch by Robert Olliff! Differential Revision: https://reviews.llvm.org/D30511 llvm-svn: 296698	2017-03-01 22:41:12 +00:00
Sanjay Patel	3063affbed	[InstCombine] use -instnamer and auto-generate complete checks; NFC llvm-svn: 296673	2017-03-01 20:59:56 +00:00
Hans Wennborg	cc4ff78c9d	Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available" It caused miscompiles, e.g. in Chromium (PR32109). llvm-svn: 296654	2017-03-01 18:57:16 +00:00
Hans Wennborg	19c0be90f9	[GVNHoist] Don't hoist unsafe scalars at -Oz (PR31729) Based on Aditya Kumar's patch: Differential Revision: https://reviews.llvm.org/D29092 llvm-svn: 296642	2017-03-01 17:15:08 +00:00
Igor Laevsky	b40152d5d1	[DeadStoreElimination] Check function modref behavior before considering memory clobbered Differential Revision: https://reviews.llvm.org/D29996 llvm-svn: 296625	2017-03-01 14:38:29 +00:00
Igor Laevsky	37cba43604	[BasicAA] Take attributes into account when requesting modref info for a call site Differential Revision: https://reviews.llvm.org/D29989 llvm-svn: 296617	2017-03-01 13:19:51 +00:00
Alexey Bataev	4a45efa431	[SLP] Preserve IR flags when vectorizing horizontal reductions. Summary: The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar horizontal reduction instructions into vectors, just like for other vectorized instructions. It does not include IR propagation for extra arguments; we need to handle original scalar operations for extra args to propagate correct flags. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30418 llvm-svn: 296614	2017-03-01 12:43:39 +00:00
Alexey Bataev	74e5a36856	[SLP] Preserve IR flags for extra args. Summary: We should preserve IR flags for extra args. These IR flags should be taken from original scalar operations, not from the reduction operations. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30447 llvm-svn: 296613	2017-03-01 12:22:33 +00:00
Alexey Bataev	dfec81107f	[SLP] Fix for PR32038: extra add of PHI node when it is not required. Summary: If horizontal reduction tree starts from the binary operation that is used in PHI node, but this PHI is not used in horizontal reduction, we may end up with extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x i32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in reduction tree. Patch considers these PHI nodes as extra arguments and considers them in the final result iff they are really used in the reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606	2017-03-01 10:50:44 +00:00
Mikael Holmen	760dc9aba7	Remove sometimes faulty rewrite of memcpy in instcombine. Summary: Solves PR 31990. The bad rewrite could replace a memcpy of one word with store i4 -1 while it should actually be store i8 -1 Hopefully opt and llc has improved enough so the original optimization done by the code isn't needed anymore. One already existing testcase is affected. It originally tested that the memcpy was replaced with load double but since we now remove that rewrite it will be load i64 instead. Patch suggestion by Eli Friedman. Reviewers: eli.friedman, majnemer, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D30254 llvm-svn: 296585	2017-03-01 06:45:20 +00:00
Adam Nemet	15032a0455	[LV] These remarks should have been missed remarks The practice in LV is that we emit analysis remarks and then finally report either a missed or applied remark on the final decision whether vectorization is taking place. On this code path, we were closing with an analysis remark. llvm-svn: 296578	2017-03-01 04:31:15 +00:00
Mohammad Shahid	175ffa8c35	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API. This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d llvm-svn: 296575	2017-03-01 03:51:54 +00:00
Daniel Berlin	03f6938edc	Fix PR 24415 (at least), by making our post-dominator tree behavior sane. Summary: Currently, our post-dom tree tries to ignore and remove the effects of infinite loops. It fails miserably at this, because it tries to do it ahead of time, and thus can only detect self-loops, and any other type of infinite loop, it pretends doesn't exist at all. This can, in a bunch of cases, lead to wrong answers and a completely empty post-dom tree. Wrong answer: ``` declare void foo() define internal void @f() { entry: br i1 undef, label %bb35, label %bb3.i bb3.i: call void @foo() br label %bb3.i bb35.loopexit3: br label %bb35 bb35: ret void } ``` We get: ``` Inorder PostDominator Tree: [1] <<exit node>> {0,7} [2] %bb35 {1,6} [3] %bb35.loopexit3 {2,3} [3] %entry {4,5} ``` This is a trivial modification of the testcase for PR 6047 Note that we pretend bb3.i doesn't exist. We also pretend that bb35 post-dominates entry. While it's true that it does not exit in a theoretical sense, it's not really helpful to try to ignore the effect and pretend that bb35 post-dominates entry. Worse, we pretend the infinite loop does nothing (it's usually considered a side-effect), and doesn't even exist, even when it calls a function. Sadly, this makes it impossible to use when you are trying to move code safely. All compilers also create virtual or real single exit nodes (including us), and connect infinite loops there (which this patch does). In fact, others have worked around our behavior here, to the point of building their own post-dom trees: https://zneak.github.io/fcd/2016/02/17/structuring.html and pointing out the region infrastructure is near-useless for them with postdom in this state :( Completely empty post-dom tree: ``` define void @spam() #0 { bb: br label %bb1 bb1: ; preds = %bb1, %bb br label %bb1 bb2: ; No predecessors! 
ret void } ``` Printing analysis 'Post-Dominator Tree Construction' for function 'foo': =============================-------------------------------- Inorder PostDominator Tree: [1] <<exit node>> {0,1} :( (note that even if you ignore the effects of infinite loops, bb2 should be present as an exit node that post-dominates nothing). This patch changes post-dom to properly handle infinite loops and does root finding during calculation to prevent empty trees in such cases. We match gcc's (and the canonical theoretical) behavior for infinite loops (find the backedge, connect it to the exit block). Testcases coming as soon as i finish running this on a ton of random graphs :) Reviewers: chandlerc, davide Subscribers: bryant, llvm-commits Differential Revision: https://reviews.llvm.org/D29705 llvm-svn: 296535	2017-02-28 22:57:50 +00:00
Dehao Chen	a60cdd3881	Add function importing info from samplepgo profile to the module summary. Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implements this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that point to functions that need to be imported. Reviewers: mehdi_amini, tejohnson Reviewed By: tejohnson Subscribers: davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D30053 llvm-svn: 296498	2017-02-28 18:09:44 +00:00
Adrian Prantl	80d0c93436	Strip debug info when inlining into a nodebug function. The LLVM backend cannot produce any debug info for an llvm::Function without a DISubprogram attachment. When inlining a debug-info-carrying function into a nodebug function, there is therefore no reason to keep any debug info intrinsic calls or debug locations on the instructions. This fixes a problem discovered in PR32042. rdar://problem/30679307 llvm-svn: 296488	2017-02-28 16:58:13 +00:00
Michael Kuperstein	13bf8a2684	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296416	2017-02-28 00:11:34 +00:00
Michael Kuperstein	c07cca85fb	[SLP] Load sorting should not try to sort things that aren't loads. We may get a VL where the first element is a load, but the others aren't. Trying to sort such VLs can only lead to sorrow. llvm-svn: 296411	2017-02-27 23:18:11 +00:00
Matt Arsenault	cdb468c0f9	AMDGPU: Basic folds for fmed3 intrinsic Constant fold, canonicalize constants to RHS, reduce to minnum/maxnum when inputs are nan/undef. llvm-svn: 296409	2017-02-27 23:08:49 +00:00
Hans Wennborg	2d5841fa73	Revert r296366 "[InlineFunction] add nonnull assumptions based on argument attributes" It causes miscompiles e.g. during self-host of Clang (PR32082). llvm-svn: 296398	2017-02-27 22:33:02 +00:00
Alexey Bataev	a79c41cf51	[SLP] Use different flags in tests for reduction ops and extra args. llvm-svn: 296376	2017-02-27 20:22:44 +00:00
Alexey Bataev	0aadc6ef13	[SLP] Modify test to check IR flags propagation for extra args. llvm-svn: 296369	2017-02-27 19:16:09 +00:00
Sanjay Patel	40975e05eb	[InlineFunction] add nonnull assumptions based on argument attributes This was suggested in D27855: have the inliner add assumptions, so we don't lose nonnull info provided by argument attributes. This still doesn't solve PR28430 (dyn_cast), but this gets us closer. https://reviews.llvm.org/D29999 llvm-svn: 296366	2017-02-27 18:13:48 +00:00
Xin Tong	16b85a6601	Fix a bug when unswitching on partial LIV for SwitchInst Summary: Fix a bug when unswitching on partial LIV for SwitchInst. Reviewers: hfinkel, efriedma, sanjoy Reviewed By: sanjoy Subscribers: david2050, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D29107 llvm-svn: 296363	2017-02-27 18:00:13 +00:00
Alexey Bataev	cb78d09d14	[SLP] A test for a fix of PR32038. llvm-svn: 296349	2017-02-27 16:07:10 +00:00
Artur Pilipenko	0860bfc676	Loop predication expand both sides of the widened condition This is a fix for a loop predication bug which resulted in malformed IR generation. Loop invariant side of the widened condition is not guaranteed to be available in the preheader as is, so we need to expand it as well. See added unsigned_loop_0_to_n_hoist_length test for example. Reviewed By: sanjoy, mkazantsev Differential Revision: https://reviews.llvm.org/D30099 llvm-svn: 296345	2017-02-27 15:44:49 +00:00
Daniel Jasper	3ca4525612	Revert "[CGP] Split some critical edges coming out of indirect branches" This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295	2017-02-26 11:09:12 +00:00
Craig Topper	fe25988c68	[AVX-512] Fix the execution domain for AVX-512 integer broadcasts. llvm-svn: 296290	2017-02-26 06:45:51 +00:00
Sanjoy Das	39a684d117	[ValueTracking] Don't do an unchecked shift in ComputeNumSignBits Summary: Previously we used to return a bogus result, 0, for IR like `ashr %val, -1`. I've also added an assert checking that `ComputeNumSignBits` at least returns 1. That assert found an already checked in test case where we were returning a bad result for `ashr %val, -1`. Fixes PR32045. Reviewers: spatel, majnemer Reviewed By: spatel, majnemer Subscribers: efriedma, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D30311 llvm-svn: 296273	2017-02-25 20:30:45 +00:00
Rong Xu	a1a9f70537	[PGO] Directory name stripping in global identifier for static functions Current internal option -static-func-full-module-prefix keeps the full directory path in the profile counter names for static functions. The default of this option is false. This strips the directory names from the source filename which is problematic: (1) it creates linker errors for profile-generation compilation, exposed in our internal benchmarks. We are seeing messages like "warning: relocation refers to discarded section". This is due to the name conflicts after the stripping. (2) the stripping only applies to getPGOFuncName. Current Thin-LTO module importing for the indirect-calls assumes the source directory name not being stripped. Current default value for this option can potentially prevent some inter-module indirect-call-promotions. This patch turns the default value for -static-func-full-module-prefix to true. The second part of the patch is to have an alternative implementation under the internal option -static-func-strip-dirname-prefix=<value> This option specifies the level of directories to be stripped from the source filename. Using a large value as the parameter has the same effect as -static-func-full-module-prefix. Differential Revision: http://reviews.llvm.org/D29512 llvm-svn: 296206	2017-02-25 00:00:36 +00:00
Eli Friedman	c12a5a7595	[CodeGenPrepare] Make -addr-sink-using-gep work with address spaces. When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing mode. Differential Revision: https://reviews.llvm.org/D30114 llvm-svn: 296167	2017-02-24 20:51:36 +00:00
Yaxun Liu	e6d1ce59c0	[InstCombine] Fix bug in pointer replacement This optimisation was crashing when there was a chain of more than one bitcast instruction to replace, as a result of the changes in D27283. Patch by James Price. Differential Revision: https://reviews.llvm.org/D30347 llvm-svn: 296163	2017-02-24 20:27:25 +00:00
Michael Kuperstein	46b131e3f8	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296149	2017-02-24 18:41:32 +00:00
Matthew Simpson	bdc9c78880	[LV] Merge floating-point and integer induction widening code This patch merges the existing floating-point induction variable widening code into the integer induction variable widening code, creating a single set of functions for both kinds of inductions. The primary motivation for doing this is to enable vector phi node creation for floating-point induction variables. Differential Revision: https://reviews.llvm.org/D30211 llvm-svn: 296145	2017-02-24 18:20:12 +00:00
Petr Hosek	a7d5916308	[Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081	2017-02-24 03:10:10 +00:00
Michael Kuperstein	581c9f4b20	Revert r296060 to pacify bots. llvm-svn: 296064	2017-02-24 01:22:19 +00:00
Michael Kuperstein	12e79d5002	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296060	2017-02-24 00:56:21 +00:00
Xin Tong	ec6f90bec9	LoopUnswitch - Simplify based on known not to be a constant. Summary: In case we do not know what the condition is in an unswitched loop, but we know it's definitely NOT a known constant, we can perform simplifications based on this information. Reviewers: sanjoy, hfinkel, chenli, efriedma Reviewed By: efriedma Subscribers: david2050, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D28968 llvm-svn: 296041	2017-02-23 23:42:19 +00:00
Hans Wennborg	5cd9a9bf8c	Revert r282872 "CVP. Turn marking adds as no wrap on by default" While not CVP's fault, this caused miscompiles (PR31181). Reverting until those are resolved. (This also reverts the follow-ups r288154 and r288161 which removed the flag.) llvm-svn: 296030	2017-02-23 22:29:00 +00:00
Dehao Chen	cc75d2441d	Add call branch annotation for ICP promoted direct call in SamplePGO mode. Summary: SamplePGO uses branch_weight annotation to represent callsite hotness. When ICP promotes an indirect call to direct call, we need to make sure the direct call is annotated with branch_weight in SamplePGO mode, so that downstream function inliner can use hot callsite heuristic. Reviewers: davidxl, eraman, xur Reviewed By: davidxl, xur Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D30282 llvm-svn: 296028	2017-02-23 22:15:18 +00:00
Evgeniy Stepanov	ee2d77f6d6	Disable TLS for stack protector on Android API<17. The TLS slot did not exist back then. llvm-svn: 296014	2017-02-23 21:06:35 +00:00
Chad Rosier	95abfa35d6	[Reassociate] Add negated value of negative constant to the Duplicates list. In OptimizeAdd, we scan the operand list to see if there are any common factors between operands that can be factored out to reduce the number of multiplies (e.g., 'A*A+A*B*C+D' -> 'A*(A+B*C)+D'). For each operand of the operand list, we only consider unique factors (which is tracked by the Duplicate set). Now if we find a factor that is a negative constant, we add the negated value as a factor as well, because we can percolate the negate out. However, we mistakenly don't add this negated constant to the Duplicates set. Consider the expression A*2*-2 + B. Obviously, nothing to factor. For the added value A*2*-2 we over count 2 as a factor without this change, which causes the assert reported in PR30256. The problem is that this code is assuming that all the multiply operands of the add are already reassociated. This change avoids the issue by making OptimizeAdd tolerate multiplies which haven't been completely optimized; this sort of works, but we're doing wasted work: we'll end up revisiting the add later anyway. Another possible approach would be to enforce RPO iteration order more strongly. If we have RedoInsts, we process them immediately in RPO order, rather than waiting until we've finished processing the whole function. Intuitively, it seems like the natural approach: reassociation works on expression trees, so the optimization only works in one direction. That said, I'm not sure how practical that is given the current Reassociate; the "optimal" form for an expression depends on its use list (see all the uses of "user_back()"), so Reassociate is really an iterative optimization of sorts, so any changes here would probably get messy. PR30256 Differential Revision: https://reviews.llvm.org/D30228 llvm-svn: 296003	2017-02-23 18:49:03 +00:00
Dehao Chen	533bc6ea8e	Use base discriminator in sample pgo profile matching. Summary: The discriminator has been encoded, and only the base discriminator should be used during profile matching. Reviewers: dblaikie, davidxl Reviewed By: dblaikie, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30218 llvm-svn: 295999	2017-02-23 18:27:45 +00:00
Alexey Bataev	f77d1656af	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972	2017-02-23 13:37:09 +00:00
Alexey Bataev	14b370c1bf	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957	2017-02-23 11:09:35 +00:00
Alexey Bataev	7ae653285d	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956	2017-02-23 10:57:15 +00:00
Alexey Bataev	7337212e83	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951	2017-02-23 09:59:29 +00:00
Alexey Bataev	68f2402c61	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295949	2017-02-23 09:40:38 +00:00
Matt Arsenault	f0a88dbaab	LoadStoreVectorizer: Split even sized illegal chains properly Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address spaces accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32> store i32 store i32 rather than store <2 x i32> store <2 x i32> when legal. llvm-svn: 295933	2017-02-23 03:58:53 +00:00
Matt Arsenault	d4bca1e9ef	AMDGPU: Replace disabled exp inputs with undef llvm-svn: 295914	2017-02-23 00:44:03 +00:00
Michael Kuperstein	6181c62b95	Revert r295868 because it breaks a different SLP lit test. llvm-svn: 295906	2017-02-22 23:35:13 +00:00
Matt Arsenault	f5262256a1	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Daniel Berlin	fccbda967a	PredicateInfo: Support switch statements Summary: Depends on D29606 and D29682 Makes us pass GVN's edge.ll (we also will pass a few other testcases they just need cleaning up). Thoughts on the Predicate* hierarchy of classes especially welcome :) (it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something). Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29747 llvm-svn: 295889	2017-02-22 22:20:58 +00:00
Matthew Simpson	835b246d7f	[LV] Update floating-point induction test checks (NFC) llvm-svn: 295885	2017-02-22 21:56:02 +00:00
Wei Mi	74d5a90fa6	[LSR] Canonicalize formula and put recursive Reg related with current loop in ScaledReg. After rL294814, LSR formula can have multiple SCEVAddRecExprs inside of its BaseRegs. Previous canonicalization will swap the first SCEVAddRecExpr in BaseRegs with ScaledReg. But now we want to swap the SCEVAddRecExpr Reg related with current loop with ScaledReg. Otherwise, we may generate code like this: RegA + lsr.iv + RegB, where loop invariant parts RegA and RegB are not grouped together and cannot be promoted outside of loop. With this patch, it will ensure lsr.iv to be generated later in the expr: RegA + RegB + lsr.iv, so that RegA + RegB can be promoted outside of loop. Differential Revision: https://reviews.llvm.org/D26781 llvm-svn: 295884	2017-02-22 21:47:08 +00:00
Alexey Bataev	b551a81c28	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of the actual number of uses. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra arguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295868	2017-02-22 20:06:40 +00:00
Matthew Simpson	5ef66ef248	[LV] Add scalar floating-point induction test (NFC) llvm-svn: 295862	2017-02-22 19:09:38 +00:00
Davide Italiano	e122d6885a	[ModuleSummaryAnalysis] Don't crash when referencing unnamed globals. Instead, just be conservative as these are infrequent enough. Thanks to Peter Collingbourne for the discussion about this on IRC. llvm-svn: 295861	2017-02-22 18:53:38 +00:00
Karl-Johan Karlsson	6eaed7aceb	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 This reverts r295042 (re-applies r295038) with an additional fix for the buildbot problem. llvm-svn: 295858	2017-02-22 18:37:36 +00:00
Alexey Bataev	1aeb0e73e0	[SLP] Test with extra argument used several times. llvm-svn: 295853	2017-02-22 17:47:28 +00:00
Dehao Chen	920677a997	Fix an obvious bug in SampleProfileReaderGCC. Summary: The CallTargetProfile should be added to FProfile to be consistent with other profile readers. Reviewers: dnovillo, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30233 llvm-svn: 295852	2017-02-22 17:27:21 +00:00
Matt Arsenault	1f17c66890	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Michael Kuperstein	c2af82b4b7	[LoopUnroll] Enable PGO-based loop peeling by default. This enables peeling of loops with low dynamic iteration count by default, when profile information is available. Differential Revision: https://reviews.llvm.org/D27734 llvm-svn: 295796	2017-02-22 00:27:34 +00:00
Matt Arsenault	3ea06336fc	AMDGPU: Remove some uses of llvm.SI.export in tests Merge some of the old, smaller tests into more complete versions. llvm-svn: 295792	2017-02-22 00:02:21 +00:00
Sanjay Patel	cb731f1538	[InstCombine] canonicalize non-obvious forms of integer min/max This is part of trying to clean up our handling of min/max patterns in IR. By converting these to canonical form, we're more likely to recognize them because there are various places in InstCombine that don't use matchSelectPattern or m_SMax and friends. The backend fixups referenced in the now deleted TODO comment were added with: https://reviews.llvm.org/rL291392 https://reviews.llvm.org/rL289738 If there's any codegen fallout from this change, we should be able to address it in DAGCombiner or target-specific lowering. llvm-svn: 295758	2017-02-21 19:33:53 +00:00
Anna Thomas	ec36f3b79a	[InstCombine] Do not exercise nested max/min pattern on abs Summary: This is a fix for assertion failure in `getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern. We should not be invoking the simplification rule for ABS(MIN(~x,y)) or ABS(MAX(~x,y)) combinations. Added a test case which would cause an assertion failure without the patch. Reviewers: sanjoy, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30051 llvm-svn: 295719	2017-02-21 14:40:28 +00:00
Alexey Bataev	64da79424e	[SLP] Tests for shuffle/blending operations. llvm-svn: 295717	2017-02-21 13:40:55 +00:00
Evgeny Stupachenko	9909872e30	The patch introduces a new way of narrowing complex (>UINT16 variants) solutions. The new method is introduced under the "-lsr-exp-narrow" option (currently set to true). Summary: The method is based on the mathematical expectation of the number of registers and should generally be closer to the optimal solution. Please see details in the comments on the "LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function (in lib/Transforms/Scalar/LoopStrengthReduce.cpp). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 295704	2017-02-21 07:34:40 +00:00
Alexey Bataev	19b35bf7f4	[SLP] Additional test for vectorization of call/invoke args llvm-svn: 295657	2017-02-20 12:41:16 +00:00
Daniel Berlin	f7d9580a08	NewGVN: Start making use of predicateinfo pass. Summary: This begins using the predicateinfo pass in NewGVN. Reviewers: davide Subscribers: llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D29682 llvm-svn: 295583	2017-02-18 23:06:50 +00:00
Daniel Berlin	588e0be39d	PredicateInfo: Clean up predicate info a little, using insertion helpers, and fixing support for the renaming the comparison. llvm-svn: 295581	2017-02-18 23:06:38 +00:00
Sanjay Patel	53c5c3d65d	[InstCombine] add nsw/nuw X, signbit --> or X, signbit Changing to 'or' (rather than 'xor' when no wrapping flags are set) allows icmp simplifies to happen as expected. Differential Revision: https://reviews.llvm.org/D29729 llvm-svn: 295574	2017-02-18 22:20:09 +00:00
Sanjay Patel	fe67255961	[InstSimplify] add nsw/nuw (xor X, signbit), signbit --> X The change to InstCombine in: https://reviews.llvm.org/D29729 ...exposes this missing fold in InstSimplify, so adding this first to avoid a regression. llvm-svn: 295573	2017-02-18 21:59:09 +00:00
Sanjay Patel	308eb22118	[InstSimplify] add tests for add nsw/nuw (xor X, signbit), signbit --> X; NFC llvm-svn: 295572	2017-02-18 21:51:14 +00:00
Sanjay Patel	6d5dddb85f	[InstCombine] add tests for trunc(insertelement); NFC llvm-svn: 295553	2017-02-18 18:27:04 +00:00
Sanjay Patel	86554de2bd	[InstCombine] update trunc(shuffle) tests to reflect IR reality; NFC We're ok shrinking splats, but not shuffles in general. See https://reviews.llvm.org/D30123 for discussion. llvm-svn: 295547	2017-02-18 15:24:31 +00:00
Dehao Chen	7d230325ef	Increases full-unroll threshold. Summary: The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge): Performance: 403 0.11% 433 0.51% 445 0.48% 447 3.50% 453 1.49% 464 0.75% Code size: 403 0.56% 433 0.96% 445 2.16% 447 2.96% 453 0.94% 464 8.02% The compiler time overhead is similar with code size. Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc Reviewed By: hfinkel, chandlerc Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D28368 llvm-svn: 295538	2017-02-18 03:46:51 +00:00
Justin Bogner	efc3fbf6a2	Verifier: Disallow a line number without a file in DISubprogram A line number doesn't make much sense if you don't say where it's from. Add a verifier check for this and update some tests that had bogus debug info. llvm-svn: 295516	2017-02-17 23:57:42 +00:00
Sanjay Patel	f8346550bf	[InstCombine] add tests for trunc(shuffle X, C, M); NFC llvm-svn: 295513	2017-02-17 23:16:54 +00:00
Peter Collingbourne	184773d81f	WholeProgramDevirt: For VCP use a 32-bit ConstantInt for the byte offset. A future change will cause this byte offset to be inttoptr'd and then exported via an absolute symbol. On the importing end we will expect the symbol to be in range [0,2^32) so that it will fit into a 32-bit relocation. The problem is that on 64-bit architectures if the offset is negative it will not be in the correct range once we inttoptr it. This change causes us to use a 32-bit integer so that it can be inttoptr'd (which zero extends) into the correct range. Differential Revision: https://reviews.llvm.org/D30016 llvm-svn: 295487	2017-02-17 19:43:45 +00:00
Peter Collingbourne	37317f1207	WholeProgramDevirt: Examine the function body when deciding whether functions are readnone. The goal is to get an analysis result even for de-refineable functions. Differential Revision: https://reviews.llvm.org/D29803 llvm-svn: 295472	2017-02-17 18:17:04 +00:00
Peter Collingbourne	10c500ddc0	opt: Rename -default-data-layout flag to -data-layout and make it always override the layout. There isn't much point in a flag that only works if the data layout is empty. Differential Revision: https://reviews.llvm.org/D30014 llvm-svn: 295468	2017-02-17 17:36:52 +00:00
Matthew Simpson	f68e183f91	[LV] Remove constant restriction for vector phi creation We previously only created a vector phi node for an induction variable if its step had a constant integer type. However, the step actually only needs to be loop-invariant. We only handle inductions having loop-invariant steps, so this patch should enable vector phi node creation for all integer induction variables that will be vectorized. Differential Revision: https://reviews.llvm.org/D29956 llvm-svn: 295456	2017-02-17 16:09:07 +00:00
Eugene Leviant	958fcd7502	InstCombine: fix extraction when performing vector/array punning Differential revision: https://reviews.llvm.org/D29491 llvm-svn: 295429	2017-02-17 07:36:03 +00:00
Sanjoy Das	8b859c26ec	[JumpThreading] Re-enable JumpThreading for guards Summary: JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200 due to the following problem: the feature used the following algorithm for detection of diamond patterns: 1. Find a block with 2 predecessors; 2. Check that these blocks have a common single parent; 3. Check that the parent's terminator is a branch instruction. The problem is that these checks are insufficient. They may pass for a non-diamond construction in case if those two predecessors are actually the same block. This may happen if parent's terminator is a br (either conditional or unconditional) to a block that ends with "switch" instruction with exactly two branches going to one block. This patch re-enables the JumpThreading for guards and fixes this issue by adding the check that those found predecessors are actually different blocks. This guarantees that parent's terminator is a conditional branch with exactly 2 different successors, which is now ensured by assertions. It also adds two more tests for this situation (with parent's terminator being a conditional and an unconditional branch). Patch by Max Kazantsev! Reviewers: anna, sanjoy, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30036 llvm-svn: 295410	2017-02-17 04:21:14 +00:00
Matt Arsenault	c18b67745b	Bug 31948: Fix assertion when bitcasting constantexpr pointers llvm-svn: 295387	2017-02-17 00:32:19 +00:00
Wei Mi	493fb266ed	[LSR] Prevent formula with SCEVAddRecExpr type of Reg from Sibling loops In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops other than current loop. This is good for the case when induction variable of outerloop being used in expr in innerloop. But it is very bad to allow such Reg from sibling loop because we may need to add lsr.iv in other sibling loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase below, one loop can be inserted with a bunch of lsr.iv because of LSR for other loops. // The induction variable j from a loop in the middle will have initial // value generated from previous sibling loop and exit value used by its // next sibling loop. void goo(long i, long j); long cond; void foo(long N) { long i = 0; long j = 0; i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); i = 0; do { goo(i, j); i++; j++; } while (cond); } The fix is to only allow formula with SCEVAddRecExpr type of Reg from current loop or its parents. Differential Revision: https://reviews.llvm.org/D30021 llvm-svn: 295378	2017-02-16 21:27:31 +00:00
Matt Arsenault	920576042d	InstCombine: Canonicalize fast fmuladd to fmul + fadd llvm-svn: 295353	2017-02-16 18:46:24 +00:00
Craig Topper	3731f4d173	[AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit. llvm-svn: 295294	2017-02-16 07:35:23 +00:00
Peter Collingbourne	50cbd7cc90	Re-apply r295110 and r295144 with a fix for the ASan issue. llvm-svn: 295241	2017-02-15 21:56:51 +00:00
Peter Collingbourne	9421c2dc54	AssumptionCache: Disable the verifier by default, move it behind a hidden cl::opt and verify from releaseMemory(). This is a short term solution to the problem that many passes currently fail to update the assumption cache. In the long term the verifier should not be controllable with a flag. We should either fix all passes to correctly update the assumption cache and enable the verifier unconditionally or somehow arrange for the assumption list to be updated automatically by passes. Differential Revision: https://reviews.llvm.org/D30003 llvm-svn: 295236	2017-02-15 21:10:09 +00:00
Sanjay Patel	056218644b	[Inline] add tests to show attribute information loss; NFC llvm-svn: 295209	2017-02-15 17:42:58 +00:00
Anna Thomas	94c8d4976c	Revert "[JumpThreading] Thread through guards" This reverts commit r294617. We fail on an assert while trying to get a condition from an unconditional branch. llvm-svn: 295200	2017-02-15 17:08:29 +00:00
Daniel Jasper	eef9b03395	Revert r295110 and r295144. This fails under ASAN: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/798/steps/check-llvm%20asan/logs/stdio llvm-svn: 295162	2017-02-15 09:56:08 +00:00
Peter Collingbourne	0609acc10d	SimplifyCFG: Register cloned assume intrinsics with assumption cache when creating critical edge. Differential Revision: https://reviews.llvm.org/D29976 llvm-svn: 295145	2017-02-15 03:01:11 +00:00
Easwaran Raman	5a12f236c6	Fix a bug in caller's BFI update code after inlining. Multiple blocks in the callee can be mapped to a single cloned block since we prune the callee as we clone it. The existing code iterates over the value map and clones the block frequency (and eventually scales the frequencies of the cloned blocks). Value map's iteration is not deterministic and so the cloned block might get the frequency of any of the original blocks. The fix is to set the max of the original frequencies to the cloned block. The first block in the sequence must have this max frequency and, in the call context, subsequent blocks must have its frequency. Differential Revision: https://reviews.llvm.org/D29696 llvm-svn: 295115	2017-02-14 22:49:28 +00:00
Peter Collingbourne	534c0175b6	WholeProgramDevirt: Change internal vcall data structures to match summary. Group calls into constant and non-constant arguments up front, and use uint64_t instead of ConstantInt to represent constant arguments. The goal is to allow the information from the summary to fit naturally into this data structure in a future change (specifically, it will be added to CallSiteInfo). This has two side effects: - We disallow VCP for constant integer arguments of width >64 bits. - We remove the restriction that the bitwidth of a vcall's argument and return types must match those of the vfunc definitions. I don't expect either of these to matter in practice. The first case is uncommon, and the second one will lead to UB (so we can do anything we like). Differential Revision: https://reviews.llvm.org/D29744 llvm-svn: 295110	2017-02-14 22:12:23 +00:00
Taewook Oh	2e945ebb13	[BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors Summary: When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of ``` 1 typedef struct _node_t { 2 struct _node_t *next; 3 } node_t; 4 5 extern node_t *root; 6 7 int foo() { 8 node_t *node, *tmp; 9 int ret = 0; 10 11 node = tmp = root->next; 12 while (node != root) { 13 while (node) { 14 tmp = node; 15 node = node->next; 16 ret++; 17 } 18 } 19 20 return ret; 21 } ``` , below is the basicblock corresponding to line 12 after Reassociate expressions pass: ``` while.cond: ; preds = %while.cond2, %entry %node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ] %ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ] tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21 tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31 %cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33 br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35 ``` As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is ``` !21 = !DILocation(line: 9, column: 7, scope: !6) ``` , which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction. Reviewers: dblaikie, samsonov, eugenis Reviewed By: eugenis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29867 llvm-svn: 295106	2017-02-14 21:10:40 +00:00
Vedant Kumar	55891fc71e	Re-apply "[profiling] Remove dead profile name vars after emitting name data" This reverts 295092 (re-applies 295084), with a fix for dangling references from the array of coverage names passed down from frontends. I missed this in my initial testing because I only checked test/Profile, and not test/CoverageMapping as well. Original commit message: The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported. Fix the issue by throwing away the name constants as soon as we're done with them. Differential Revision: https://reviews.llvm.org/D29921 llvm-svn: 295099	2017-02-14 20:03:48 +00:00
Vedant Kumar	27ebdf4bcb	Revert "[profiling] Remove dead profile name vars after emitting name data" This reverts commit r295084. There is a test failure on: http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/ llvm-svn: 295092	2017-02-14 19:08:39 +00:00
Vedant Kumar	bb10484662	[profiling] Remove dead profile name vars after emitting name data The profile name variables passed to counter increment intrinsics are dead after we emit the finalized name data in __llvm_prf_nm. However, we neglect to erase these name variables. This causes huge size increases in the __TEXT,__const section as well as slowdowns when linker dead stripping is disabled. Some affected projects are so massive that they fail to link on Darwin, because only the small code model is supported. Fix the issue by throwing away the name constants as soon as we're done with them. Differential Revision: https://reviews.llvm.org/D29921 llvm-svn: 295084	2017-02-14 18:48:48 +00:00
Taewook Oh	f22fa72e4a	Do not apply redundant LastCallToStaticBonus Summary: As written in the comments above, LastCallToStaticBonus is already applied to the cost if Caller has only one user, so it is redundant to reapply the bonus here. If the only user is not a caller, TotalSecondaryCost will not be adjusted anyway because callerWillBeRemoved is false. If there's no caller at all, we don't need to care about TotalSecondaryCost because inliningPreventsSomeOuterInline is false. Reviewers: chandlerc, eraman Reviewed By: eraman Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D29169 llvm-svn: 295075	2017-02-14 17:30:05 +00:00
Brian Cain	6dedf65cc9	Correct a typo, s/hosting/hoisting/ llvm-svn: 295066	2017-02-14 16:41:10 +00:00
Matthew Simpson	f09d13e5cc	Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps" This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063	2017-02-14 16:28:32 +00:00
Alexey Bataev	2a2f35d59c	[SLP] Fix for PR31879: vectorize repeated scalar ops that don't get put back into a vector Previously the cost of the existing ExtractElement/ExtractValue instructions was considered as a dead cost only if it was detected that they have only one use. But these instructions may be considered dead also if users of the instructions are also going to be vectorized, like: ``` %x0 = extractelement <2 x float> %x, i32 0 %x1 = extractelement <2 x float> %x, i32 1 %x0x0 = fmul float %x0, %x0 %x1x1 = fmul float %x1, %x1 %add = fadd float %x0x0, %x1x1 ``` This can be transformed to ``` %1 = fmul <2 x float> %x, %x %2 = extractelement <2 x float> %1, i32 0 %3 = extractelement <2 x float> %1, i32 1 %add = fadd float %2, %3 ``` because though `%x0` and `%x1` have 2 users each other, these users are part of the vectorized tree and we can consider these `extractelement` instructions as dead. Differential Revision: https://reviews.llvm.org/D29900 llvm-svn: 295056	2017-02-14 15:20:48 +00:00
Alexey Bataev	4ed47342ff	[SLP] Additional tests for extractelement cost fix. llvm-svn: 295050	2017-02-14 12:52:05 +00:00
Karl-Johan Karlsson	ec21b769ec	Revert "[LoopVectorize] Added address space check when analysing interleaved accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042	2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson	2ec409cca2	[LoopVectorize] Added address space check when analysing interleaved accesses Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038	2017-02-14 08:14:06 +00:00
Mikael Holmen	ece84cd10c	[LSR] Pointers with different address spaces are considered incompatible. Summary: Function isCompatibleIVType is already used as a guard before the call to SE.getMinusSCEV(OperExpr, PrevExpr); in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions to be of the same type, so we now consider two pointers with different address spaces to be incompatible, since it is possible that the pointers in fact have different sizes. Reviewers: qcolombet, eli.friedman Reviewed By: qcolombet Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D29885 llvm-svn: 295033	2017-02-14 06:37:42 +00:00
Peter Collingbourne	002c2d5380	ThinLTOBitcodeWriter: Write available_externally copies of VCP eligible functions to merged module. Differential Revision: https://reviews.llvm.org/D29701 llvm-svn: 295021	2017-02-14 03:42:38 +00:00
Philip Reames	b2bca7e309	[LICM] Make store promotion work in the face of unordered atomics Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled. Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct. It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following: while(b) { load i128 ... if (can lower i128 atomic) store atomic i128 ... else store i128 } It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically. It's not clear we need to be this conservative - arguably the program above is broken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result. Differential Revision: https://reviews.llvm.org/D15592 llvm-svn: 295015	2017-02-14 01:38:31 +00:00
Sanjay Patel	4f74216da0	[FunctionAttrs] try to extend nonnull-ness of arguments from a callsite back to its parent function As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html ...we should be able to propagate 'nonnull' info from a callsite back to its parent. The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430), but this won't solve that problem. The transform is currently disabled by default while we wait for clang to work-around potential security problems: http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html Differential Revision: https://reviews.llvm.org/D27855 llvm-svn: 294998	2017-02-13 23:10:51 +00:00
Peter Collingbourne	2b33f65317	IR: Type ID summary extensions for WPD; thread summary into WPD pass. Make the whole thing testable by adding YAML I/O support for the WPD summary information and adding some negative tests that exercise the YAML support. Differential Revision: https://reviews.llvm.org/D29782 llvm-svn: 294981	2017-02-13 19:26:18 +00:00
Alexey Bataev	7bed48e7a3	[SLP] Test for extractelement cost fix. llvm-svn: 294980	2017-02-13 19:08:19 +00:00
Matthew Simpson	659f92e2aa	Revert "[LV] Extend trunc optimization to all IVs with constant integer steps" This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973	2017-02-13 18:02:35 +00:00
Matthew Simpson	7b7f40297f	[LV] Extend trunc optimization to all IVs with constant integer steps This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967	2017-02-13 16:48:00 +00:00
Alexey Bataev	e8b1536e21	[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions. Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like: ``` float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } ``` Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D29727 llvm-svn: 294934	2017-02-13 08:01:26 +00:00
Daniel Berlin	1bcd504a88	NewGVN: Update a number of xfailed tests to either be correct or note why they fail. llvm-svn: 294928	2017-02-12 23:28:06 +00:00
Daniel Berlin	2ef385d019	NewGVN: We really pass TBAA if we enable DCE and fix the test. Note that GVN eliminates no-use readonly/readnone calls, even if they are not marked nounwind. NewGVN only eliminates them if they are marked nounwind, and thus, trivially dead. llvm-svn: 294927	2017-02-12 23:24:47 +00:00
Daniel Berlin	86eab15f2b	NewGVN: Apply the fast math flags fix in r267113 to NewGVN as well. llvm-svn: 294922	2017-02-12 22:25:20 +00:00
Daniel Berlin	dbe8264c93	PredicateInfo: Handle critical edges Summary: This adds support for placing predicateinfo such that it affects critical edges. This fixes the issues mentioned by Nuno on the mailing list. Depends on D29519 Reviewers: davide, nlopes Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29606 llvm-svn: 294921	2017-02-12 22:12:20 +00:00
Sanjay Patel	45b7e69fef	[InstCombine] fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2) I found one special case of this transform for 'slt 0', so I removed that and added the general transform. Alive code to check correctness: Name: slt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp slt %a, C1 => %b = icmp slt %x, C1 - C2 Name: sgt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp sgt %a, C1 => %b = icmp sgt %x, C1 - C2 http://rise4fun.com/Alive/MH Differential Revision: https://reviews.llvm.org/D29774 llvm-svn: 294898	2017-02-12 16:40:30 +00:00
Sanjay Patel	97e4b98749	[ValueTracking] use nonnull argument attribute to eliminate null checks Enhancing value tracking's analysis of null-ness was suggested in D27855, so here's a first attempt at that. This is part of solving: https://llvm.org/bugs/show_bug.cgi?id=28430 Differential Revision: https://reviews.llvm.org/D28204 llvm-svn: 294897	2017-02-12 15:35:34 +00:00
Dorit Nuzman	eac89d736c	[LV/LoopAccess] Check statically if an unknown dependence distance can be proven larger than the loop-count This fixes PR31098: Try to resolve statically data-dependences whose compile-time-unknown distance can be proven larger than the loop-count, instead of resorting to runtime dependence checking (which are not always possible). For vectorization it is sufficient to prove that the dependence distance is >= VF; But in some cases we can prune unknown dependence distances early, and even before selecting the VF, and without a runtime test, by comparing the distance against the loop iteration count. Since the vectorized code will be executed only if LoopCount >= VF, proving distance >= LoopCount also guarantees that distance >= VF. This check is also equivalent to the Strong SIV Test. Reviewers: mkuper, anemet, sanjoy Differential Revision: https://reviews.llvm.org/D28044 llvm-svn: 294892	2017-02-12 09:32:53 +00:00
Daniel Berlin	b79f53669a	NewGVN: Clean up how we handle the INITIAL class so that everything in it is dead or unreachable, as it should be. This also makes the leader of INITIAL undef, enabling us to handle irreducibility properly. Summary: This lets us verify, more than we do now, that we didn't screw up value numbering. Reviewers: davide Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D29842 llvm-svn: 294844	2017-02-11 12:48:50 +00:00
Evgeny Stupachenko	5f3d9b6c09	The patch fixes r294821 Summary: Update register match for windows testing From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294825	2017-02-11 05:39:00 +00:00
Evgeny Stupachenko	fe6f548d2d	Fix PR23384 (under "-lsr-insns-cost" option) Summary: The patch adds instructions number generated by a solution to LSR cost under "-lsr-insns-cost" option. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D28307 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294821	2017-02-11 02:57:43 +00:00
Ahmed Bougacha	8425f453ef	[ARM] Make f16 interleaved accesses expensive. There are no vldN/vstN f16 variants, even with +fullfp16. We could use the i16 variants, but, in practice, even with +fullfp16, the f16 sequence leading to the i16 shuffle usually gets scalarized. We'd need to improve our support for f16 codegen before getting there. Teach the cost model to consider f16 interleaved operations as expensive. Otherwise, we are all but guaranteed to end up with a large block of scalarized vector code. llvm-svn: 294819	2017-02-11 01:53:04 +00:00
Ahmed Bougacha	fc979dc9dd	[ARM] Don't lower f16 interleaved accesses. There are no vldN/vstN f16 variants, even with +fullfp16. We could use the i16 variants, but, in practice, even with +fullfp16, the f16 sequence leading to the i16 shuffle usually gets scalarized. We'd need to improve our support for f16 codegen before getting there. Reject f16 interleaved accesses. If we try to emit the f16 intrinsics, we'll just end up with a selection failure. llvm-svn: 294818	2017-02-11 01:53:00 +00:00
Ahmed Bougacha	f37fb89edc	[ARM] Unique some redundant CHECK lines. NFC. llvm-svn: 294817	2017-02-11 01:52:57 +00:00
Wei Mi	8f20e63a20	[LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with outerloop. The recommit includes some changes of testcases. No functional change to the patch. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 294814	2017-02-11 00:50:23 +00:00
Yaxun Liu	ba01ed00fe	Fix invalid addrspacecast due to combining alloca with global var For function-scope variables with large initialisation list, FE usually generates a global variable to hold the initializer, then generates memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst identifies such allocas which are accessed only by reading and replaces them with the global variable. This is done by casting the global variable to the type of the alloca and replacing all references. However, when the global variable is in a different address space which is disjoint with addr space 0 (e.g. for IR generated from OpenCL, global variable cannot be in private addr space i.e. addr space 0), casting the global variable to addr space 0 results in invalid IR for certain targets (e.g. amdgpu). To fix this issue, when the global variable is not in addr space 0, instead of casting it to addr space 0, this patch chases down the uses of alloca until reaching the load instructions, then replaces load from alloca with load from the global variable. If during the chasing bitcast and GEP are encountered, new bitcast and GEP based on the global variable are generated and used in the load instructions. Differential Revision: https://reviews.llvm.org/D27283 llvm-svn: 294786	2017-02-10 21:46:07 +00:00
Dehao Chen	fb02f7140a	Encode duplication factor from loop vectorization and loop unrolling to discriminator. Summary: This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations. The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default. Reviewers: probinson, aprantl, davidxl, hfinkel, echristo Reviewed By: hfinkel Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26420 llvm-svn: 294782	2017-02-10 21:09:07 +00:00
Matthew Simpson	df124a7569	[LV] Remove type restriction for vector phi creation We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable. Differential Revision: https://reviews.llvm.org/D29776 llvm-svn: 294755	2017-02-10 16:15:26 +00:00
Philip Reames	578dafbd8b	[LoopUnswitch] Remove BFI usage (dead code) Chandler mentioned at the last social that the need for BFI in the new pass manager was causing a slight hiccup for this pass. Given this code has been checked in, but off for over a year, it makes sense to just remove it for now. Note that there's nothing wrong with the general idea - it's actually a quite good one - and once we have the infrastructure in place to implement this without the full recomputation on every loop, we absolutely should. llvm-svn: 294715	2017-02-10 06:12:06 +00:00
Michael J. Spencer	788b10ecbc	[LoadCombine] Change test to not use instcombine. llvm-svn: 294682	2017-02-10 00:44:08 +00:00
Davide Italiano	fc0d442cf1	[NewGVN] Fix test so that it doesn't rely on InstCombine anymore. llvm-svn: 294668	2017-02-09 23:48:10 +00:00
Chandler Carruth	addcda483e	[PM] Port ArgumentPromotion to the new pass manager. Now that the call graph supports efficient replacement of a function and spurious reference edges, we can port ArgumentPromotion to the new pass manager very easily. The old PM-specific bits are sunk into callbacks that the new PM simply doesn't use. Unlike the old PM, the new PM simply does argument promotion and afterward does the update to LCG reflecting the promoted function. Differential Revision: https://reviews.llvm.org/D29580 llvm-svn: 294667	2017-02-09 23:46:27 +00:00
Peter Collingbourne	17febdbb25	WholeProgramDevirt: Check that VCP candidate functions are defined before evaluating them. This was crashing before. llvm-svn: 294666	2017-02-09 23:46:26 +00:00
Sanjay Patel	f38bab73aa	[InstCombine] allow (X * C2) << C1 --> X * (C2 << C1) for vectors This fold already existed for vectors but only when 'C1' was a splat constant (but 'C2' could be any constant). There were no tests for any vector constants, so I'm adding a test that shows non-splat constants for both operands. llvm-svn: 294650	2017-02-09 23:13:04 +00:00
Michael J. Spencer	714d9d22ad	[LoadCombine] Fix combining of loads which span an aliasing store. Fixes PR31517 Differential Revision: https://reviews.llvm.org/D28922 llvm-svn: 294632	2017-02-09 21:46:49 +00:00
Sanjay Patel	ae3b43e488	[InstCombine] use m_APInt to allow demanded bits analysis on splat constants llvm-svn: 294628	2017-02-09 21:43:06 +00:00
Sanjay Patel	5bcb2d97f0	[InstCombine] add test for demanded bits with splat vector constants; NFC llvm-svn: 294625	2017-02-09 21:33:19 +00:00
Sanjoy Das	74bda4d591	[JumpThreading] Thread through guards Summary: This patch allows JumpThreading to also thread through guards. Virtually, guard(cond) is equivalent to the following construction: if (cond) { do something } else {deoptimize} Yet it is not explicitly converted into IFs before lowering. This patch enables early threading through guards in simple cases. Currently it covers the following situation: if (cond1) { // code A } else { // code B } // code C guard(cond2) // code D If there is an implication cond1 => cond2 or !cond1 => cond2, we can transform this construction into the following: if (cond1) { // code A // code C } else { // code B // code C guard(cond2) } // code D This removes the guard from one of the execution branches. Patch by Max Kazantsev! Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29620 llvm-svn: 294617	2017-02-09 19:40:22 +00:00
Sanjay Patel	b36e1f0223	[InstCombine] add tests for icmp with add nsw; NFC llvm-svn: 294601	2017-02-09 18:12:39 +00:00
Peter Collingbourne	58c90c0c80	LowerTypeTests: Change a few vtable globals in tests to constants. It turns out that some of our negative tests were not in fact providing the test coverage we expected: they were passing because the vtables were failing an early check that they were constant. Fix this by changing the globals in these tests to constants. llvm-svn: 294550	2017-02-09 01:48:24 +00:00
Sanjay Patel	a62bc44f67	[InstCombine] add tests to show information-losing add nsw/nuw transforms; NFC llvm-svn: 294524	2017-02-08 22:14:11 +00:00
Peter Collingbourne	28ffd3261f	ThinLTOBitcodeWriter: Strip debug info from merged module. This module will contain nothing but vtable definitions and (soon) available_externally function definitions, so there is no point in keeping debug info in the module. Differential Revision: https://reviews.llvm.org/D28913 llvm-svn: 294511	2017-02-08 20:44:00 +00:00
Alexey Bataev	0674fe39e5	[SLP] Additional test to check correct work of horizontal reductions, NFC. llvm-svn: 294505	2017-02-08 19:52:46 +00:00
Elena Demikhovsky	5267edd3e3	[Loop Vectorizer] Cost-based decision for vectorization form of memory instruction. Make the cost model select between the Interleave, GatherScatter or Scalar vectorization form of a memory instruction. The right decision should be made for non-consecutive memory access instructions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of the Load/Store vector form and chooses the best option between Widening, Interleave, GatherScatter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on the CM decision map of Loads/Stores vectorization form. - Vectorization of memory instructions is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503	2017-02-08 19:25:23 +00:00
Sanjay Patel	d11a03b263	[InstCombine] add test for missed vector icmp fold; NFC Also, move the related existing scalar test to a renamed file where I'm planning to add more icmp-add tests. llvm-svn: 294487	2017-02-08 17:37:17 +00:00
Igor Laevsky	900ffa34c8	[InstCombineCalls] Unfold element atomic memcpy instruction Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294453	2017-02-08 14:32:04 +00:00
Chandler Carruth	1497710f52	[ArgPromote] Delete a test that makes no sense (any more). This test is under 'ArgumentPromotion' but there are no arguments that get promoted in the test case, so there seems to be no point. Also, there are no assertions about the output at all, so this seems like something we should just delete given the low value. llvm-svn: 294428	2017-02-08 08:54:08 +00:00
Chandler Carruth	2af523f20c	[ArgPromote] Clean up a crash test case by rinsing it through opt, renaming things to at least have somewhat spelled out names, and even have meaningful names where I could guess at what they should be. Also add FileCheck assertions that we're actually doing what we set out to do for some of the tests, for example not promoting a type that would result in infinite promotion. llvm-svn: 294426	2017-02-08 08:47:35 +00:00
Chandler Carruth	102fa92b4e	[ArgPromote] Actually add FileCheck to a test that I actually updated to have nice CHECK patterns instead of relying on a coarse 'not grep' check. Sorry that I missed this the first time through. llvm-svn: 294422	2017-02-08 08:04:02 +00:00
Chandler Carruth	9e44e08953	[ArgPromote] Actually run FileCheck on this test. The CHECK lines are already there, just waiting to, well, be checked. =] llvm-svn: 294421	2017-02-08 08:01:14 +00:00
Matt Arsenault	cb3fa37c7e	LSR: Check atomic instruction pointer operands llvm-svn: 294410	2017-02-08 06:44:58 +00:00
Craig Topper	e0ac7f3beb	[X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405	2017-02-08 05:45:39 +00:00
Sanjoy Das	ec892139bd	[IRCE] Add a missing invariant check Currently IRCE relies on the loops it transforms to be (semantically) of the form: for (i = START; i < END; i++) ... or for (i = START; i > END; i--) ... However, we were not verifying the presence of the START < END entry check (i.e. check before the first iteration). We were only verifying that the backedge was guarded by (i + 1) < END. Usually this would work "fine" since (especially in Java) most loops do actually have the START < END check, but of course that is not guaranteed. llvm-svn: 294375	2017-02-07 23:59:07 +00:00
Daniel Berlin	439042b7ad	Add PredicateInfo utility and printing pass Summary: This patch adds a utility to build extended SSA (see "ABCD: eliminating array bounds checks on demand"), and an intrinsic to support it. This is then used to get functionality equivalent to propagateEquality in GVN, in NewGVN (without having to replace instructions as we go). It would work similarly in SCCP or other passes. This has been talked about a few times, so I built a real implementation and tried to productionize it. Copies are inserted for operands used in assumes and conditional branches that are based on comparisons (see below for more). Every use affected by the predicate is renamed to the appropriate intrinsic result. E.g. %cmp = icmp eq i32 %x, 50 br i1 %cmp, label %true, label %false true: ret i32 %x false: ret i32 1 will become %cmp = icmp eq i32 %x, 50 br i1 %cmp, label %true, label %false true: ; Has predicate info ; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 } %x.0 = call i32 @llvm.ssa_copy.i32(i32 %x) ret i32 %x.0 false: ret i32 1 (you can use -print-predicateinfo to get an annotated-with-predicateinfo dump) This enables us to easily determine what operations are affected by a given predicate, and how operations are affected by a chain of predicates. Reviewers: davide, sanjoy Subscribers: mgorny, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D29519 Update for review comments Fix a bug Nuno noticed where we are giving information about and/or on edges where the info is not useful and easy to use wrong Update for review comments llvm-svn: 294351	2017-02-07 21:10:46 +00:00
Matthew Simpson	3877f397cd	[LV] Add new ARM/AArch64 interleaved access cost model tests (NFC) llvm-svn: 294342	2017-02-07 19:34:24 +00:00
Matthew Simpson	1cd02f13a5	[LV] Simplify ARM/AArch64 interleaved access cost model tests (NFC) This patch removes unneeded instructions from the existing ARM/AArch64 interleaved access cost model tests. I'll be adding a similar set of tests in a follow-on patch to increase coverage. llvm-svn: 294336	2017-02-07 19:17:44 +00:00
Reid Kleckner	828f3179c2	Fix my GVNHoist test case from r294317 llvm-svn: 294319	2017-02-07 17:35:53 +00:00
Reid Kleckner	79e37d517c	Revert "[GVNHoist] Merge DebugLoc metadata on hoisted instructions" This reverts commit r294250. It caused PR31891. Add a test case that shows that inlinable calls retain location information with an accurate scope. llvm-svn: 294317	2017-02-07 17:31:13 +00:00
Dehao Chen	4a9dd70213	Fix the samplepgo indirect call promotion bug: we should not promote a direct call. Summary: Checking CS.getCalledFunction() == nullptr does not necessarily indicate an indirect call. We also need to check that CS.getCalledValue() is not a constant. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29570 llvm-svn: 294260	2017-02-06 23:33:15 +00:00
Paul Robinson	383c5c228f	Merge DebugLoc on combined stores; in this case, when combining stores from the end of two blocks, merge instead of arbitrarily picking one. Differential Revision: http://reviews.llvm.org/D29504 llvm-svn: 294251	2017-02-06 22:19:04 +00:00
Taewook Oh	44a856f7d5	[GVNHoist] Merge DebugLoc metadata on hoisted instructions Summary: When instructions are hoisted, the current implementation keeps the DebugLoc metadata of the instruction that is chosen as Repl (and its GEP operand if Repl is a load or a store). However, DebugLoc metadata should be updated to the 'merged' location across all hoisted instructions. See the following example code: ``` 1: typedef struct { 2: int a[10]; 3: } S1; 4: 5: extern S1 s1[10]; 6: 7: void foo(int x, int y, int i) { 8: if (y) 9: s1[i]->a[i] = x + y; 10: else 11: s1[i]->a[i] = x; 12: } ``` Below is LLVM IR representation of the program before gvn-hoist: ``` %struct.S1 = type { [10 x i32] } @s1 = external local_unnamed_addr global [10 x %struct.S1], align 16 define void @foo(i32 %x, i32 %y, i32 %i) !dbg !4 { entry: %tobool = icmp ne i32 %y, 0, !dbg !8 br i1 %tobool, label %if.then, label %if.else, !dbg !10 if.then: ; preds = %entry %add = add nsw i32 %x, %y, !dbg !11 %idxprom = sext i32 %i to i64, !dbg !12 %arrayidx = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !12 %0 = load %struct.S1, %struct.S1* %arrayidx, align 8, !dbg !12, !tbaa !13 %a = getelementptr inbounds %struct.S1, %struct.S1* %0, i32 0, i32 0, !dbg !17 br label %if.end, !dbg !12 if.else: ; preds = %entry %idxprom3 = sext i32 %i to i64, !dbg !18 %arrayidx4 = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom3, !dbg !18 %1 = load %struct.S1, %struct.S1* %arrayidx4, align 8, !dbg !18, !tbaa !13 %a5 = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !19 br label %if.end if.end: ; preds = %if.else, %if.then %a5.sink = phi [10 x i32]* [ %a5, %if.else ], [ %a, %if.then ] %.sink = phi i32 [ %x, %if.else ], [ %add, %if.then ] %idxprom6 = sext i32 %i to i64 %arrayidx7 = getelementptr inbounds [10 x i32], [10 x i32]* %a5.sink, i64 0, i64 %idxprom6 store i32 %.sink, i32* %arrayidx7, align 4, !tbaa !20 ret void, !dbg !22 } ``` where
``` !11 = !DILocation(line: 9, column: 18, scope: !9) !12 = !DILocation(line: 9, column: 5, scope: !9) !18 = !DILocation(line: 11, column: 5, scope: !9) !19 = !DILocation(line: 11, column: 9, scope: !9) ``` . And below is after gvn-hoist: ``` define void @foo(i32 %x, i32 %y, i32 %i) !dbg !4 { entry: %tobool = icmp ne i32 %y, 0, !dbg !8 %idxprom = sext i32 %i to i64, !dbg !10 %0 = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !10 %1 = load %struct.S1, %struct.S1* %0, align 8, !dbg !10, !tbaa !11 br i1 %tobool, label %if.then, label %if.else, !dbg !15 if.then: ; preds = %entry %add = add nsw i32 %x, %y, !dbg !16 %arrayidx = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !10 %a = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !17 br label %if.end, !dbg !10 if.else: ; preds = %entry %arrayidx4 = getelementptr inbounds [10 x %struct.S1], [10 x %struct.S1]* @s1, i64 0, i64 %idxprom, !dbg !18 %a5 = getelementptr inbounds %struct.S1, %struct.S1* %1, i32 0, i32 0, !dbg !19 br label %if.end if.end: ; preds = %if.else, %if.then %a5.sink = phi [10 x i32]* [ %a5, %if.else ], [ %a, %if.then ] %.sink = phi i32 [ %x, %if.else ], [ %add, %if.then ] %arrayidx7 = getelementptr inbounds [10 x i32], [10 x i32]* %a5.sink, i64 0, i64 %idxprom store i32 %.sink, i32* %arrayidx7, align 4, !tbaa !20 ret void, !dbg !22 } ``` As you see, loads and their GEPs have been hoisted from the if.then/if.else blocks to the entry block. However, the DebugLoc metadata of these new instructions is still the same as that of the instructions in the if.then block, as they are moved/cloned from the if.then block. This may result in incorrect stepping and imprecise sample profile results. Reviewers: majnemer, pcc, sebpop Reviewed By: sebpop Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29377 llvm-svn: 294250	2017-02-06 22:05:04 +00:00
Michael Kuperstein	7a86bb2589	[SLP] Revert "Allow using of extra values in horizontal reductions." This breaks when one of the extra values is also a scalar that participates in the same vectorization tree which we'll end up reducing. llvm-svn: 294245	2017-02-06 21:50:59 +00:00
Peter Collingbourne	e69e73c7b8	IR: Consider two DISubprograms to be odr-equal if they have the same template parameters. In ValueMapper we create new operands for MDNodes and rely on MDNode::replaceWithUniqued to create a new MDNode with the specified operands. However this doesn't always actually happen correctly for DISubprograms because when we uniquify the new node, we only odr-compare it with existing nodes (MDNodeSubsetEqualImpl<DISubprogram>::isDeclarationOfODRMember). Although the TemplateParameters field can refer to a distinct DICompileUnit via DITemplateTypeParameter::type -> DICompositeType::scope -> DISubprogram::unit, it is not currently included in the odr comparison. As a result, we can end up getting our original DISubprogram back, which means we will have a cloned module referring to the DICompileUnit in the original module, which causes a verification error. The fix I implemented was to consider TemplateParameters to be one of the odr-equal properties. But I'm a little uncomfortable with this. In general it seems unsound to rely on distinct MDNodes never being reachable from nodes which we only check odr-equality of. My only long term suggestion would be to separate odr-uniquing from full uniquing. Differential Revision: https://reviews.llvm.org/D29240 llvm-svn: 294240	2017-02-06 21:23:03 +00:00
Sanjay Patel	54656ca7db	[ValueTracking] emit a remark when we detect a conflicting assumption (PR31809) This is a follow-up to D29395 where we try to be good citizens and let the user know that we've probably gone off the rails. This should allow us to resolve: https://llvm.org/bugs/show_bug.cgi?id=31809 Differential Revision: https://reviews.llvm.org/D29404 llvm-svn: 294208	2017-02-06 18:26:06 +00:00
Dehao Chen	c81483d79c	Fix the bug of samplepgo indirect call promotion when type casting of the return value is needed. Summary: When type casting of the return value is needed, promoteIndirectCall will return the type casting instruction instead of the direct call. This patch changes it to return the direct call instruction instead. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29569 llvm-svn: 294205	2017-02-06 18:10:36 +00:00
Chandler Carruth	044d1e0dcf	[ArgPromote] Replace all the grep-based testing with precise FileCheck tests. This also removes the use of instcombine as we can match the patterns produced by argument promotion directly with the more powerful tools in FileCheck. llvm-svn: 294174	2017-02-06 08:43:11 +00:00
Davide Italiano	ec49313b11	[IPCP] Don't propagate return value for naked functions. This is pretty much the same change made in SCCP. llvm-svn: 294098	2017-02-04 19:44:14 +00:00
Sanjay Patel	0fe32ac256	[InstCombine] treat i1 as a special type in shouldChangeType() This patch is based on the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html Folding to i1 should always be desirable because that's better for value tracking and we have special folds for i1 types. I checked for other users of shouldChangeType() where this might have an effect, but we already handle the i1 case differently than other types in all of those cases. Side note: the default datalayout includes i1, so it seems we only find this gap in shouldChangeType + phi folding for the case when there is (1) an explicit datalayout without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming casted operands (as Björn mentioned). Differential Revision: https://reviews.llvm.org/D29336 llvm-svn: 294066	2017-02-03 23:13:11 +00:00
Sanjay Patel	73fc8ddb06	[InstCombine] fix operand-complexity-based canonicalization (PR28296) The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg) operators from arguments. Adding another level to the weighting scheme provides more structure and can help simplify the pattern matching in InstCombine and other places. I fixed regressions that would have shown up from this change in: rL290067 rL290127 But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests. Should fix: https://llvm.org/bugs/show_bug.cgi?id=28296 Differential Revision: https://reviews.llvm.org/D27933 llvm-svn: 294049	2017-02-03 21:43:34 +00:00
Sanjay Patel	62f175f01e	[InstCombine] auto-generate better test checks; NFC llvm-svn: 294040	2017-02-03 20:56:38 +00:00
Sanjay Patel	d08bded092	[InstCombine] auto-generate better test checks; NFC llvm-svn: 294034	2017-02-03 20:19:33 +00:00
Matt Arsenault	d9cd736585	AMDGPU: Don't unroll for private with dynamic allocas This won't be eliminated, so this will just bloat code if/when these are ever used/supported. llvm-svn: 294030	2017-02-03 19:36:00 +00:00
Michael Kuperstein	723999d4aa	[SLP] Use SCEV to sort memory accesses. This generalizes memory access sorting to use differences between SCEVs, instead of relying on constant offsets. That allows us to properly do SLP vectorization of non-sequentially ordered loads within loop bodies. Differential Revision: https://reviews.llvm.org/D29425 llvm-svn: 294027	2017-02-03 19:09:45 +00:00
Alexey Bataev	a16cfe6fa9	[SLP] Fix for PR31690: Allow using of extra values in horizontal reductions. Currently LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also it supports a vectorization of instructions with some extra arguments, like: float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D28961 llvm-svn: 293994	2017-02-03 08:08:50 +00:00
Stanislav Mekhanoshin	f29602df65	[AMDGPU] Unroll preferences improvements Exit loop analysis early if suitable private access found. Do not account for GEPs which are invariant to loop induction variable. Do not account for Allocas which are too big to fit into register file anyway. Add option for tuning: -amdgpu-unroll-threshold-private. Differential Revision: https://reviews.llvm.org/D29473 llvm-svn: 293991	2017-02-03 02:20:05 +00:00
Peter Collingbourne	37e2459186	FunctionImport: Remove the -disable-force-link-odr flag and change importFunctions to never force link. This removes some functionality that was only being used by tests. Differential Revision: https://reviews.llvm.org/D29439 llvm-svn: 293919	2017-02-02 18:42:25 +00:00
Teresa Johnson	825221192f	[ThinLTO] Resolve old FIXME for alias importing in test This FIXME was added with r265941 and should have been resolved with r266517. llvm-svn: 293901	2017-02-02 15:58:06 +00:00
Jun Bum Lim	180bc5a021	[JumpThread] Enhance finding partial redundant loads by continuing scanning single predecessor Summary: While scanning predecessors to find an available loaded value, if the predecessor has a single predecessor, we can continue scanning through the single predecessor. Reviewers: mcrosier, rengolin, reames, davidxl, haicheng Reviewed By: rengolin Subscribers: zzheng, llvm-commits Differential Revision: https://reviews.llvm.org/D29200 llvm-svn: 293896	2017-02-02 15:12:34 +00:00
Anna Thomas	7f4b26e189	[LICM] Hoist loads that are dominated by invariant.start intrinsic, and are invariant in the loop. Summary: We can hoist out loads that are dominated by invariant.start, to the preheader. We conservatively assume the load is variant, if we see a corresponding use of invariant.start (it could be an invariant.end or an escaping call). Reviewers: mkuper, sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29331 llvm-svn: 293887	2017-02-02 13:22:03 +00:00
Adam Nemet	0bf1b863b9	[LV] Also port failure remarks to new OptimizationRemarkEmitter API llvm-svn: 293866	2017-02-02 05:41:51 +00:00
Matt Arsenault	300836098f	InferAddressSpaces: Handle more cases with constant select operands llvm-svn: 293859	2017-02-02 03:37:22 +00:00
Davide Italiano	cb68f37184	[IPSCCP] Restore the old behaviour (pre r293799). It's not clear the change I made is a good idea, and it definitely needs further discussion. Thanks to Eli for pointing this out. llvm-svn: 293846	2017-02-02 00:46:54 +00:00
Sanjay Patel	52e4e6594e	[ValueTracking] remove a FIXME for something we don't want to do; NFC The comment was added with: https://reviews.llvm.org/rL293773 ...but there would be a cost to implement this and possibly no payoff. llvm-svn: 293823	2017-02-01 22:27:34 +00:00
Sanjay Patel	2b0cd30ce5	fix typos; NFC llvm-svn: 293816	2017-02-01 21:38:32 +00:00
Sanjay Patel	c56d1ccd79	[InstCombine] move folds for shift-shift pairs; NFCI Although this is 'no-functional-change-intended', I'm adding tests for shl-shl and lshr-lshr pairs because there is no existing test coverage for those folds. It seems like we should be able to remove some code from foldShiftedShift() at this point because we're handling those patterns on the general path. llvm-svn: 293814	2017-02-01 21:31:34 +00:00
Davide Italiano	a76357deef	[SCCP] Make sure we get this case right without noinline. Thanks to Hal for pointing out in the post-commit review of r293727. llvm-svn: 293801	2017-02-01 19:03:46 +00:00
Davide Italiano	6849f20d85	[IPSCCP] Don't propagate return values of functions marked as noinline. This tries to address what Hal described (in the post-commit review of r293727) as a long-standing problem with noinline, where we end up de facto inlining trivial functions e.g. __attribute__((noinline)) int patatino(void) { return 5; } because of return value propagation. llvm-svn: 293799	2017-02-01 18:52:20 +00:00
Sanjoy Das	e0e5795f6b	[InstCombine] Allow InstCombine to merge adjacent guards Summary: If there are two adjacent guards with different conditions, we can remove one of them and include its condition into the condition of another one. This patch allows InstCombine to merge them by the following pattern: guard(a); guard(b) -> guard(a & b). Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29378 llvm-svn: 293778	2017-02-01 16:34:55 +00:00
Sanjay Patel	25f6d710d9	[ValueTracking] avoid crashing from bad assumptions (PR31809) A program may contain llvm.assume info that disagrees with other analysis. This may be caused by UB in the program, so we must not crash because of that. As noted in the code comments: https://llvm.org/bugs/show_bug.cgi?id=31809 ...we can do better, but this at least avoids the assert/crash in the bug report. Differential Revision: https://reviews.llvm.org/D29395 llvm-svn: 293773	2017-02-01 15:41:32 +00:00
Davide Italiano	7343b9f340	[IPSCCP] Teach how to not propagate return values of naked functions. Differential Revision: https://reviews.llvm.org/D29360 llvm-svn: 293727	2017-02-01 01:01:22 +00:00
Matt Arsenault	bdd59e6879	InferAddressSpaces: Handle select This fails to handle some cases where one of the inputs is a constant; to be fixed in a later commit. llvm-svn: 293723	2017-02-01 00:08:53 +00:00
Matt Arsenault	2a46d81038	InferAddressSpaces: Fix broken casting of constants llvm-svn: 293718	2017-01-31 23:48:40 +00:00
Taewook Oh	75acec8a14	Do not propagate DebugLoc across basic blocks Summary: DebugLoc shouldn't be propagated across basic blocks to prevent incorrect stepping and imprecise sample profile result. rL288903 addressed the wrong DebugLoc propagation issue by limiting the copy of DebugLoc when GVN removes a fully redundant load that is dominated by some other load. However, DebugLoc is still incorrectly propagated in the following example: ``` 1: extern int g; 2: 3: void foo(int x, int y, int z) { 4: if (x) 5: g = 0; 6: else 7: g = 1; 8: 9: int i = 0; 10: for ( ; i < y ; i++) 11: if (i > z) 12: g++; 13: } ``` Below is LLVM IR representation of the program before GVN: ``` @g = external local_unnamed_addr global i32, align 4 ; Function Attrs: nounwind uwtable define void @foo(i32 %x, i32 %y, i32 %z) local_unnamed_addr #0 !dbg !4 { entry: %not.tobool = icmp eq i32 %x, 0, !dbg !8 %.sink = zext i1 %not.tobool to i32, !dbg !8 store i32 %.sink, i32* @g, align 4, !tbaa !9 %cmp8 = icmp sgt i32 %y, 0, !dbg !13 br i1 %cmp8, label %for.body.preheader, label %for.end, !dbg !17 for.body.preheader: ; preds = %entry br label %for.body, !dbg !19 for.body: ; preds = %for.body.preheader, %for.inc %i.09 = phi i32 [ %inc4, %for.inc ], [ 0, %for.body.preheader ] %cmp1 = icmp sgt i32 %i.09, %z, !dbg !19 br i1 %cmp1, label %if.then2, label %for.inc, !dbg !21 if.then2: ; preds = %for.body %0 = load i32, i32* @g, align 4, !dbg !22, !tbaa !9 %inc = add nsw i32 %0, 1, !dbg !22 store i32 %inc, i32* @g, align 4, !dbg !22, !tbaa !9 br label %for.inc, !dbg !23 for.inc: ; preds = %for.body, %if.then2 %inc4 = add nuw nsw i32 %i.09, 1, !dbg !24 %exitcond = icmp ne i32 %inc4, %y, !dbg !13 br i1 %exitcond, label %for.body, label %for.end.loopexit, !dbg !17 for.end.loopexit: ; preds = %for.inc br label %for.end, !dbg !26 for.end: ; preds = %for.end.loopexit, %entry ret void, !dbg !26 } ``` where ``` !21 = !DILocation(line: 11, column: 9, scope: !15) !22 = !DILocation(line: 12, column: 8, scope: !20) !23 = 
!DILocation(line: 12, column: 7, scope: !20) !24 = !DILocation(line: 10, column: 20, scope: !25) ``` And below is after GVN: ``` @g = external local_unnamed_addr global i32, align 4 define void @foo(i32 %x, i32 %y, i32 %z) local_unnamed_addr !dbg !4 { entry: %not.tobool = icmp eq i32 %x, 0, !dbg !8 %.sink = zext i1 %not.tobool to i32, !dbg !8 store i32 %.sink, i32* @g, align 4, !tbaa !9 %cmp8 = icmp sgt i32 %y, 0, !dbg !13 br i1 %cmp8, label %for.body.preheader, label %for.end, !dbg !17 for.body.preheader: ; preds = %entry br label %for.body, !dbg !19 for.body: ; preds = %for.inc, %for.body.preheader %0 = phi i32 [ %1, %for.inc ], [ %.sink, %for.body.preheader ], !dbg !21 %i.09 = phi i32 [ %inc4, %for.inc ], [ 0, %for.body.preheader ] %cmp1 = icmp sgt i32 %i.09, %z, !dbg !19 br i1 %cmp1, label %if.then2, label %for.inc, !dbg !22 if.then2: ; preds = %for.body %inc = add nsw i32 %0, 1, !dbg !21 store i32 %inc, i32* @g, align 4, !dbg !21, !tbaa !9 br label %for.inc, !dbg !23 for.inc: ; preds = %if.then2, %for.body %1 = phi i32 [ %inc, %if.then2 ], [ %0, %for.body ] %inc4 = add nuw nsw i32 %i.09, 1, !dbg !24 %exitcond = icmp ne i32 %inc4, %y, !dbg !13 br i1 %exitcond, label %for.body, label %for.end.loopexit, !dbg !17 for.end.loopexit: ; preds = %for.inc br label %for.end, !dbg !26 for.end: ; preds = %for.end.loopexit, %entry ret void, !dbg !26 } ``` As you see, GVN removes the load in the if.then2 block and creates a phi instruction in for.body for it. The problem is that the DebugLoc of the removed load instruction is propagated to the newly created phi instruction, which is wrong. rL288903 cannot handle this case because ValuesPerBlock.size() is not 1 in this example when the load is removed. Reviewers: aprantl, andreadb, wolfgangp Reviewed By: andreadb Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D29254 llvm-svn: 293688	2017-01-31 20:57:13 +00:00
Matthias Braun	01fa962226	InterleaveAccessPass: Avoid constructing invalid shuffle masks Fix a bug where we would construct shufflevector instructions addressing invalid elements. Differential Revision: https://reviews.llvm.org/D29313 llvm-svn: 293673	2017-01-31 18:37:53 +00:00
Davide Italiano	aec4617dc8	[Instcombine] Combine consecutive identical fences Differential Revision: https://reviews.llvm.org/D29314 llvm-svn: 293661	2017-01-31 18:09:05 +00:00
Arnold Schwaighofer	c368563bd6	Don't combine stores to a swifterror pointer operand to a different type llvm-svn: 293658	2017-01-31 17:53:49 +00:00
Dehao Chen	274df5ea41	Explicitly promote indirect calls before sample profile annotation. Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 293657	2017-01-31 17:49:37 +00:00
Sanjay Patel	c0abce81c1	[InstCombine] add test for possible zext-phi transform; NFC The datalayout doesn't include i1, so we don't do a potential shrink and sink transform. Example based on discussion here: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html llvm-svn: 293656	2017-01-31 17:43:00 +00:00
Silviu Baranga	c6d21eba0e	[InstCombine] Make sure that LHS and RHS have the same type in transformToIndexedCompare If they don't have the same type, the size of the constant index would need to be adjusted (and this wouldn't be always possible). Alternatively we could try the analysis with the initial RHS value, which would guarantee that the two sides have the same type. However it is unlikely that in practice this would pass our transformation requirements. Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808). llvm-svn: 293629	2017-01-31 14:04:15 +00:00
Matt Arsenault	72f259b8eb	InferAddressSpaces: Handle icmp llvm-svn: 293593	2017-01-31 02:17:32 +00:00
Matt Arsenault	6d5a8d48fd	InferAddressSpaces: Support memory intrinsics llvm-svn: 293587	2017-01-31 01:56:57 +00:00
Matt Arsenault	6c907a9bb3	InferAddressSpaces: Support atomics llvm-svn: 293584	2017-01-31 01:40:38 +00:00
Matt Arsenault	d89a6e11a7	InferAddressSpaces: Don't replace volatile users llvm-svn: 293582	2017-01-31 01:30:16 +00:00
Matt Arsenault	b6491cc854	AMDGPU: Implement hook for InferAddressSpaces For now just port some of the existing NVPTX tests and from an old HSAIL optimization pass which approximately did the same thing. Don't enable the pass yet until more testing is done. llvm-svn: 293580	2017-01-31 01:20:54 +00:00
Sanjay Patel	8c5f236197	[InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors with splat constants llvm-svn: 293570	2017-01-30 23:35:52 +00:00
Sanjay Patel	abbb118a78	[InstCombine] add vector test for (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2); NFC llvm-svn: 293566	2017-01-30 23:26:17 +00:00
Sanjay Patel	0c39d56a60	[InstCombine] enable more lshr(shl X, C1), C2 folds for vectors with splat constants llvm-svn: 293562	2017-01-30 23:01:05 +00:00
Dehao Chen	6217fa44b8	Revert r292979 which causes compile time failure. llvm-svn: 293557	2017-01-30 22:26:05 +00:00
Sanjay Patel	98cc841421	[InstCombine] add tests for more shift-shift patterns; NFC llvm-svn: 293555	2017-01-30 22:24:36 +00:00
Matt Arsenault	1f2ca66317	LSR: Don't drop address space when type doesn't match For targets with different addressing modes in each address space, if this is dropped querying isLegalAddressingMode later with this will give a nonsense result, breaking the isLegalUse assertions. This is a candidate for the 4.0 release branch. llvm-svn: 293542	2017-01-30 19:50:17 +00:00
Sanjay Patel	373db5ba6c	[InstCombine] enable (X >>?exact C1) << C2 --> X >>?exact (C1-C2) for vectors with splat constants llvm-svn: 293524	2017-01-30 18:40:23 +00:00
Sanjay Patel	1a86607d38	[InstCombine] add vector splat tests for (X >>?exact C1) << C2 --> X >>?exact (C1-C2); NFC llvm-svn: 293517	2017-01-30 18:17:14 +00:00
Daniel Berlin	a53a72243a	NewGVN: Instead of changeToUnreachable, insert an instruction SimplifyCFG will turn into unreachable when it runs llvm-svn: 293515	2017-01-30 18:12:56 +00:00
Sanjay Patel	77732d5033	[InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1-C2) for vectors with splat constants llvm-svn: 293507	2017-01-30 17:19:32 +00:00
Daniel Berlin	e60a791748	Update pr31758.ll for unreachable revert llvm-svn: 293502	2017-01-30 17:08:06 +00:00
Daniel Berlin	e19f0e01a8	Revert "NewGVN: Make unreachable blocks be marked with unreachable" This reverts commit r293196 Besides making things look nicer, ATM, we'd like to preserve analysis more than we'd like to destroy the CFG. We'll probably revisit in the future llvm-svn: 293501	2017-01-30 17:06:55 +00:00
Sanjay Patel	8e644c08ee	[InstCombine] fixed to propagate 'exact' on lshr The original shift is bigger, so this may qualify as 'obvious', but here's an attempt at an Alive-based proof: Name: exact Pre: (C1 u< C2) %a = shl i8 %x, C1 %b = lshr exact i8 %a, C2 => %c = lshr exact i8 %x, C2 - C1 %b = and i8 %c, ((1 << width(C1)) - 1) u>> C2 Optimization is correct! llvm-svn: 293498	2017-01-30 16:53:03 +00:00
Sanjay Patel	5d6687da99	[InstCombine] add 'exact' to lshr to show that it got dropped; NFC llvm-svn: 293496	2017-01-30 16:38:49 +00:00
Adam Nemet	e7bdf227f6	[Inliner] Fold analysis remarks into missed remarks This significantly reduces the noise level of these messages. llvm-svn: 293492	2017-01-30 16:22:45 +00:00
Sanjay Patel	1196d7cd7f	[InstCombine] enable lshr(shl X, C1), C2 folds for vectors with splat constants llvm-svn: 293489	2017-01-30 16:11:40 +00:00
Sanjay Patel	127d64065a	[InstCombine] add tests for shift-shift patterns; NFC llvm-svn: 293487	2017-01-30 15:54:50 +00:00
Sanjay Patel	062adaab83	[InstCombine] enable (X >>?,exact C1) << C2 --> X << (C2 - C1) for vectors with splats llvm-svn: 293435	2017-01-29 17:11:18 +00:00
Sanjay Patel	c00574830f	[InstCombine] add tests for shl(shr X, C1), C2 transforms; NFC llvm-svn: 293434	2017-01-29 16:52:59 +00:00
Mohammad Shahid	3121334d32	[SLP] Vectorize loads of consecutive memory accesses, accessed in non-consecutive (jumbled) way. The jumbled scalar loads will be sorted while building the tree and these accesses will be marked to generate shufflevector after the vectorized load with proper mask. Reviewers: hfinkel, mssimpso, mkuper Differential Revision: https://reviews.llvm.org/D26905 Change-Id: I9c0c8e6f91a00076a7ee1465440a3f6ae092f7ad llvm-svn: 293386	2017-01-28 17:59:44 +00:00
Taewook Oh	505a25aec5	[InstCombine] Merge DebugLoc when speculatively hoisting store instruction Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice. Reviewers: aprantl, andreadb, danielcdh Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29062 llvm-svn: 293372	2017-01-28 07:05:43 +00:00
Sanjay Patel	febcb9ce54	[InstCombine] move icmp transforms that might be recognized as min/max and inf-loop (PR31751) This is a minimal patch to avoid the infinite loop in: https://llvm.org/bugs/show_bug.cgi?id=31751 But the general problem is bigger: we're not canonicalizing all of the min/max forms reported by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own inline definitions which may be subtly different from any of the above. The reason that the test cases in this patch need a cast op to trigger is because we don't (yet) canonicalize all min/max forms based on matchSelectPattern() in canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on matchSelectPattern() in visitSelectInst(). The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so I'm moving those behind the min/max fence in visitICmpInst() as the quick fix. llvm-svn: 293345	2017-01-27 23:26:27 +00:00
Matthew Simpson	3650df13be	[ARM/AArch64] Relocate and update InterleavedAccessPass tests (NFC) The interleaved access pass is an IR-to-IR transformation that runs before code generation. It matches interleaved memory operations to target-specific intrinsics (that are later lowered to load and store multiple instructions on ARM/AArch64). We place tests for similar passes (e.g., GlobalMergePass) under test/Transforms. This patch moves the InterleavedAccessPass tests out of test/CodeGen and into target-specific directories under test/Transforms/InterleavedAccess. Although the pass is an IR pass, many of the existing tests were llc tests rather than opt tests. For example, the tests would check for ldN/stN instructions generated by llc rather than the intrinsic calls the pass actually inserts. Thus, this patch updates all tests to be opt tests that check for the inserted intrinsics. We already have separate CodeGen tests that ensure we lower the interleaved access intrinsics to their corresponding ldN/stN instructions. In addition to migrating the tests to opt, this patch also performs some minor clean-up (to ensure consistent naming, etc.). Differential Revision: https://reviews.llvm.org/D29184 llvm-svn: 293309	2017-01-27 17:33:16 +00:00
Chandler Carruth	fd2d7c72fc	[LICM] When we are recomputing the alias sets for a subloop, we cannot skip sub-subloops. The logic to skip subloops dated from when this code was shared with the cached case. Once it was factored out to only run in the case of recomputed subloops it became a dangerous bug. If a subsubloop contained an interfering instruction it would be silently skipped from the alias sets for LICM. With the old pass manager this was extremely hard to trigger as it would require failing to visit these subloops with the LICM pass but then visiting the outer loop somehow. I've not yet contrived any test case that actually manages to trigger this. But with the new pass manager we don't do the cross-loop caching hack that the old PM does and so we recompute alias set information from first principles. While this seems much cleaner and simpler it exposed this bug and would subtly miscompile code due to failing to correctly model the aliasing constraints of deeply nested loops. llvm-svn: 293273	2017-01-27 10:27:32 +00:00
Daniel Berlin	c479686af2	NewGVN: Add basic dead and redundant store elimination Summary: This adds basic dead and redundant store elimination to NewGVN. Unlike our current DSE, it will happily do cross-block DSE if it meets our requirements. We get a bunch of DSE's simple.ll cases, and some stuff it doesn't. Unlike DSE, however, we only try to eliminate stores of the same value to the same memory location, not just general stores to the same memory location. Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29149 llvm-svn: 293258	2017-01-27 02:37:11 +00:00
Xin Tong	e5f8d643d4	Constant fold switch inst when looking for trivial conditions to unswitch on. Summary: Constant fold switch inst when looking for trivial conditions to unswitch on. Reviewers: sanjoy, chenli, hfinkel, efriedma Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D29037 llvm-svn: 293250	2017-01-27 01:42:20 +00:00
Chandler Carruth	baabda9317	[PM] Port LoopLoadElimination to the new pass manager and wire it into the main pipeline. This is a very straightforward port. Nothing weird or surprising. This brings the number of missing passes from the new PM's pipeline down to three. llvm-svn: 293249	2017-01-27 01:32:26 +00:00
Justin Lebar	698c31b8db	[NVPTX] Upgrade NVVM intrinsics in InstCombineCalls. Summary: There are many NVVM intrinsics that we can't entirely get rid of, but that nonetheless often correspond to target-generic LLVM intrinsics. For example, if flush denormals to zero (ftz) is enabled, we can convert @llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a non-ftz PTX instruction. In this case, we can, however, simplify the non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32. These transformations are particularly useful because they let us constant fold instructions that appear in libdevice, the bitcode library that ships with CUDA and essentially functions as its libm. Reviewers: tra Subscribers: hfinkel, majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D28794 llvm-svn: 293244	2017-01-27 00:58:58 +00:00
Sanjoy Das	7516192a71	Revert a couple of InstCombine/Guard checkins This change reverts: r293061: "[InstCombine] Canonicalize guards for NOT OR condition" r293058: "[InstCombine] Canonicalize guards for AND condition" They miscompile cases like: ``` declare void @llvm.experimental.guard(i1, ...) define void @test_guard_not_or(i1 %A, i1 %B) { %C = or i1 %A, %B %D = xor i1 %C, true call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ] ret void } ``` because they do not transfer the `i32 20, i32 30` parameters to newly created guard instructions. llvm-svn: 293227	2017-01-26 23:38:11 +00:00
Daniel Berlin	1ea5f324bd	NewGVN: Fix bug exposed by PR31761 Summary: This does not actually fix the testcase in PR31761 (discussion is ongoing on the testcase), but does fix a bug it exposes, where stores were not properly clobbering loads. We accomplish this by unifying the memory equivalence infrastructure back into the normal congruence infrastructure, and then properly destroying congruence classes when memory state leaders disappear. Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29195 llvm-svn: 293216	2017-01-26 22:21:48 +00:00
Sanjay Patel	50753f02c2	[InstCombine] fold (X >>u C) << C --> X & (-1 << C) We already have this fold when the lshr has one use, but it doesn't need that restriction. We may be able to remove some code from foldShiftedShift(). Also, move the similar: (X << C) >>u C --> X & (-1 >>u C) ...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst(). That whole function seems questionable since it is called by commonShiftTransforms(), but there's really not much in common if we're checking the shift opcodes for every fold. llvm-svn: 293215	2017-01-26 22:08:10 +00:00
Sanjay Patel	b0d96d327e	[InstCombine] use m_APInt to allow (X << C) >>u C --> X & (-1 >>u C) with splat vectors llvm-svn: 293208	2017-01-26 20:52:27 +00:00
Sanjay Patel	0ca3f64c4d	[InstCombine] add tests for shift-shift folds; NFC llvm-svn: 293205	2017-01-26 20:10:55 +00:00
Daniel Berlin	66e3a3d0ac	NewGVN: Fix output of pr31578 testcase now that we mark unreachable blocks as unreachable llvm-svn: 293198	2017-01-26 18:49:03 +00:00
Daniel Berlin	2b83492eee	NewGVN: Make unreachable blocks be marked with unreachable llvm-svn: 293196	2017-01-26 18:30:29 +00:00
Chandler Carruth	6f4ed077d0	[LV] Fix an issue where forming LCSSA in the place that we did would change the set of uniform instructions in the loop causing an assert failure. The problem is that the legalization checking also builds data structures mapping various facts about the loop body. The immediate cause was the set of uniform instructions. If these then change when LCSSA is formed, the data structures would already have been built and become stale. The included test case triggered an assert in loop vectorize that was reduced out of the new PM's pipeline. The solution is to form LCSSA early enough that no information is cached across the changes made. The only really obvious position is outside of the main logic to vectorize the loop. This also has the advantage of removing one case where forming LCSSA could mutate the loop but we wouldn't track that as a "Changed" state. If it is significantly advantageous to do some legalization checking prior to this, we can do a more careful positioning but it seemed best to just back off to a safe position first. llvm-svn: 293168	2017-01-26 10:41:09 +00:00
Alexey Bataev	7a7510ea97	[SLP] Add one more reduction operation for extra argument test to make it vectorizable. llvm-svn: 293162	2017-01-26 09:18:41 +00:00
Alexey Bataev	7046a852b3	[SLP] Fixed test for extra arguments in horizontal reductions. llvm-svn: 293153	2017-01-26 06:19:52 +00:00
Chandler Carruth	eab3b90a14	[PM] Simplify the new PM interface to the loop unroller and expose two factory functions for the two modes the loop unroller is actually used in in-tree: simplified full-unrolling and the entire thing including partial unrolling. I've also wired these up to nice names so you can express both of these being in a pipeline easily. This is a precursor to actually enabling these parts of the O2 pipeline. Differential Revision: https://reviews.llvm.org/D28897 llvm-svn: 293136	2017-01-26 02:13:50 +00:00
Michael Kuperstein	5dd55e8405	[LoopUnroll] Properly update loopinfo for runtime unrolling by 2 Even when we don't create a remainder loop (that is, when we unroll by 2), we may duplicate nested loops into the remainder. This is complicated by the fact the remainder may itself be either inserted into an outer loop, or at the top level. In the latter case, we may need to create new top-level loops. Differential Revision: https://reviews.llvm.org/D29156 llvm-svn: 293124	2017-01-26 01:04:11 +00:00
Davide Italiano	ccbbc8313f	[NewGVN] Skip uses in unreachable blocks. Otherwise we ask for a domtree node that's not there, and we crash. Differential Revision: https://reviews.llvm.org/D29145 llvm-svn: 293122	2017-01-26 00:42:42 +00:00
Peter Collingbourne	1df6e858ef	LowerTypeTests: Ignore external globals with type metadata. Thanks to Davide Italiano for finding the problem and providing a test case. llvm-svn: 293119	2017-01-26 00:32:15 +00:00
Justin Lebar	7e3184c412	[ValueTracking] Implement SignBitMustBeZero correctly for sqrt. Summary: Previously we assumed that the result of sqrt(x) always had 0 as its sign bit. But sqrt(-0) == -0. Reviewers: hfinkel, efriedma, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28928 llvm-svn: 293115	2017-01-26 00:10:26 +00:00
Alexey Bataev	1da8ba2adc	[SLP] Extra test for functionality with extra args. llvm-svn: 293076	2017-01-25 17:24:31 +00:00
Artur Pilipenko	8fb3d57e67	[Guards] Introduce loop-predication pass This patch introduces guard based loop predication optimization. The new LoopPredication pass tries to convert loop variant range checks to loop invariant by widening checks across loop iterations. For example, it will convert for (i = 0; i < n; i++) { guard(i < len); ... } to for (i = 0; i < n; i++) { guard(n - 1 < len); ... } After this transformation the condition of the guard is loop invariant, so loop-unswitch can later unswitch the loop by this condition which basically predicates the loop by the widened condition: if (n - 1 < len) for (i = 0; i < n; i++) { ... } else deoptimize This patch relies on an NFC change to make ScalarEvolution::isMonotonicPredicate public (revision 293062). Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D29034 llvm-svn: 293064	2017-01-25 16:00:44 +00:00
Artur Pilipenko	b85f7a5d99	[InstCombine] Canonicalize guards for NOT OR condition This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29075 Patch by Maxim Kazantsev. llvm-svn: 293061	2017-01-25 14:45:12 +00:00
Simon Pilgrim	6f6b279109	[InstCombine][SSE] Add support for PACKSS/PACKUS constant folding Differential Revision: https://reviews.llvm.org/D28949 llvm-svn: 293060	2017-01-25 14:37:24 +00:00
Artur Pilipenko	4df4c4a4aa	[InstCombine] Canonicalize guards for AND condition This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29074 Patch by Maxim Kazantsev. llvm-svn: 293058	2017-01-25 14:20:52 +00:00
Artur Pilipenko	e812ca00bb	[InstCombine] Allow InstrCombine to remove one of adjacent guards if they are equivalent This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: majnemer, apilipenko Differential Revision: https://reviews.llvm.org/D29071 Patch by Maxim Kazantsev. llvm-svn: 293056	2017-01-25 14:12:12 +00:00
Alexey Bataev	d28ab559a7	[SLP] Improve horizontal vectorization for non-power-of-2 number of instructions. If the number of instructions in a horizontal reduction list is not a power of 2, then only the last PowerOf2Floor(NumberOfInstructions) elements are actually vectorized; the other instructions remain scalar. This patch tries to vectorize the remaining elements as well. Differential Revision: https://reviews.llvm.org/D28959 llvm-svn: 293042	2017-01-25 09:54:38 +00:00
whitequark	16f1e5f1ca	Mark @llvm.powi.* as safe to speculatively execute. Floating point intrinsics in LLVM are generally not speculatively executed, since most of them are defined to behave the same as libm functions, which set errno. However, the @llvm.powi.* intrinsics do not correspond to any libm function, and lack any defined error handling semantics in LangRef. They most certainly do not alter errno. llvm-svn: 293041	2017-01-25 09:32:30 +00:00
Mohammed Agabaria	20caee95e1	[X86] enable memory interleaving for X86\SLM arch. Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040	2017-01-25 09:14:48 +00:00
Akira Hatanaka	4ec7b20ef6	[SimplifyCFG] Do not sink and merge inline-asm instructions. Conservatively disable sinking and merging inline-asm instructions as doing so can potentially create arguments that cannot satisfy the inline-asm constraints. For example, SimplifyCFG used to do the following transformation: (before) if.then: %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8) br label %if.end if.else: %1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6) br label %if.end (after) %.sink = select i1 %tobool, i32 6, i32 8 %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink) This would result in a crash in the backend since only immediate integer operands are permitted for constraint "n". rdar://problem/30110806 Differential Revision: https://reviews.llvm.org/D29111 llvm-svn: 293025	2017-01-25 06:21:51 +00:00
Gerolf Hoflehner	22921338b1	[InstCombine] Added regression test to narrow-switch.ll llvm-svn: 293018	2017-01-25 04:34:59 +00:00
Chandler Carruth	ce40fa13ce	[PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls loops. We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure. I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane. Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure. Differential Revision: https://reviews.llvm.org/D28848 llvm-svn: 293011	2017-01-25 02:49:01 +00:00
Gor Nishanov	df3d71a7a9	[coroutines] Spill the result of the invoke instruction correctly Summary: When we decide that the result of the invoke instruction needs to be spilled, we need to insert the spill into a block that is on the normal edge coming out of the invoke instruction. (Prior to this change the code would insert the spill immediately after the invoke instruction, which breaks the IR, since invoke is a terminator instruction). In the following example, we will split the edge going into %cont and insert the spill there. ``` %r = invoke double @print(double 0.0) to label %cont unwind label %pad cont: %0 = call i8 @llvm.coro.suspend(token none, i1 false) switch i8 %0, label %suspend [i8 0, label %resume i8 1, label %cleanup] resume: call double @print(double %r) ``` Reviewers: majnemer Reviewed By: majnemer Subscribers: mehdi_amini, llvm-commits, EricWF Differential Revision: https://reviews.llvm.org/D29102 llvm-svn: 293006	2017-01-25 02:25:54 +00:00
Dehao Chen	a5eb1689dc	Explicitly promote indirect calls before sample profile annotation. Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 292979	2017-01-24 21:05:51 +00:00
Daniel Berlin	390dfde0f3	Remove the load hoisting code of MLSM, it is completely subsumed by GVNHoist Summary: GVNHoist performs all the optimizations that MLSM does to loads, in a more general way, and in a faster time bound (MLSM is N^3 in most cases, N^4 in a few edge cases). This disables the load portion. Note that the way ld_hoist_st_sink.ll is written makes one think that the loads should be moved to the while.preheader block, but 1. Neither MLSM nor GVNHoist do it (they both move them to identical places). 2. MLSM couldn't possibly do it anyway, as the while.preheader block is not the head of the diamond, while.body is. (GVNHoist could do it if it was legal). 3. At a glance, it's not legal anyway because the in-loop loads conflict with the in-loop store, so the loads must stay in-loop. I am happy to update the test to use update_test_checks so that checking is tighter, just was going to do it as a followup. Note that I can find no particular benefit to the store portion on any real testcase/benchmark I have (even size-wise). If we really still want it, I am happy to commit to writing a targeted store sinker, just taking the code from the MemorySSA port of MergedLoadStoreMotion (which is N^2 worst case, and N most of the time). We can do what it does in a much better time bound. We also should be both hoisting and sinking stores, not just sinking them, anyway, since whether we should hoist or sink to merge depends basically on luck of the draw of where the blockers are placed. Nonetheless, I have left it alone for now. Reviewers: chandlerc, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29079 llvm-svn: 292971	2017-01-24 19:55:36 +00:00
Sanjay Patel	562272536a	[InstSimplify] try to eliminate icmp Pred (add nsw X, C1), C2 I was surprised to see that we're missing icmp folds based on 'add nsw' in InstCombine, but we should handle the InstSimplify cases first because that could make the InstCombine code simpler. Here are Alive-based proofs for the logic: Name: add_neg_constant Pre: C1 < 0 && (C2 > ((1<<(width(C1)-1)) + C1)) %a = add nsw i7 %x, C1 %b = icmp sgt %a, C2 => %b = false Name: add_pos_constant Pre: C1 > 0 && (C2 < ((1<<(width(C1)-1)) + C1 - 1)) %a = add nsw i6 %x, C1 %b = icmp slt %a, C2 => %b = false Name: nuw Pre: C1 u>= C2 %a = add nuw i11 %x, C1 %b = icmp ult %a, C2 => %b = false Differential Revision: https://reviews.llvm.org/D29053 llvm-svn: 292952	2017-01-24 17:03:24 +00:00
Chandler Carruth	6acdca78a0	[PH] Replace uses of AssertingVH from members of analysis results with a lazy-asserting PoisoningVH. AssertingVH is fundamentally incompatible with cache-invalidation of analysis results. The invalidation happens after the AssertingVH has already fired. Instead, use a PoisoningVH that will assert if the dangling handle is ever used rather than merely be assigned or destroyed. This patch also removes all of the (numerous) doomed attempts to work around this fundamental incompatibility. It is a pretty significant simplification IMO. The most interesting change is in the Inliner where we still do some clearing because we don't want to rely on the coarse grained invalidation strategy of the containing pass manager. However, I prefer the approach that contains this logic to the cleanup phase of the Inliner, and I think we could enhance the CGSCC analysis management layer to make this even better in the future if desired. The rest is straight cleanup. I've also added a test for one of the harder cases to work around: when a module analysis contains many AssertingVHes pointing at functions. Differential Revision: https://reviews.llvm.org/D29006 llvm-svn: 292928	2017-01-24 12:55:57 +00:00
Alexey Bataev	992ac2d5c2	[SLP] Additional test for checking that instruction with extra args is not reconstructed. llvm-svn: 292911	2017-01-24 10:44:00 +00:00
Serge Pavlov	098ee2fe02	Update domtree incrementally in loop peeling. With this change dominator tree remains in sync after each step of loop peeling. Differential Revision: https://reviews.llvm.org/D29029 llvm-svn: 292895	2017-01-24 06:58:39 +00:00
Matt Arsenault	954a624fb9	SimplifyLibCalls: Replace more unary libcalls with intrinsics llvm-svn: 292855	2017-01-23 23:55:08 +00:00
Michael Kuperstein	461aa57ad3	[LoopUnroll] First form LCSSA, then loop-simplify Running non-LCSSA-preserving LoopSimplify followed by LCSSA on (roughly) the same loop is incorrect, since LoopSimplify may break LCSSA arbitrarily higher in the loop nest. Instead, run LCSSA first, and then run LCSSA-preserving LoopSimplify on the result. This fixes PR31718. Differential Revision: https://reviews.llvm.org/D29055 llvm-svn: 292854	2017-01-23 23:45:42 +00:00
Sanjay Patel	ce9d6faed6	[InstSimplify] add tests to show missing folds from 'icmp (add nsw)'; NFC llvm-svn: 292841	2017-01-23 22:42:55 +00:00
Alexey Bataev	95d176242b	[SLP] Additional test with extra args in horizontal reductions. llvm-svn: 292821	2017-01-23 19:28:23 +00:00
Piotr Padlewski	4040d6918e	[MemorySSA] Add new tests for invariant.groups Summary: Next round of extra tests for MSSA. I have a prototype invariant.group handling implementation that fixes all the FIXMEs, and I think it will be easier to see what is the difference if I firstly post this, and then only fix fixits. Reviewers: george.burgess.iv, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29022 llvm-svn: 292797	2017-01-23 16:38:10 +00:00
Simon Pilgrim	f6f3a36159	[InstCombine][X86] Add MULDQ/MULUDQ constant folding support llvm-svn: 292793	2017-01-23 15:22:59 +00:00
Simon Pilgrim	bb13fdabec	[InstCombine][X86] MULDQ/MULUDQ undef -> zero Match generic mul behaviour so that <X x i64> multiply and muldq/muludq pattern act the same llvm-svn: 292784	2017-01-23 12:07:32 +00:00
Alexey Bataev	61d8e0003c	[SLP] Additional test for SLP vectorizer with 31 reduction elements. llvm-svn: 292783	2017-01-23 11:53:16 +00:00
Simon Pilgrim	cdc62929f4	[InstCombine][SSE] Tests showing missed opportunities to constant fold PMULDQ/PMULUDQ llvm-svn: 292782	2017-01-23 10:57:39 +00:00
Chandler Carruth	d501b18990	This test apparently requires an x86 target and is failing on numerous bots ever since d0k fixed the CHECK lines so that it did something at all. It isn't actually testing SCEV directly but LSR, so move it into LSR and the x86-specific tree of tests that already exists there. Target dependence is common and unavoidable with the current design of LSR. llvm-svn: 292774	2017-01-23 08:33:29 +00:00
Chandler Carruth	e8c66b2766	[PM] Replace the hard invalidate in JumpThreading for LVI with correct invalidation of deleted functions in GlobalDCE. This was always testing a bug really triggered in GlobalDCE. Right now we have analyses with asserting value handles into IR. As long as those remain, when deleting an IR unit, we cannot wait for the normal invalidation scheme to kick in even though it was designed to work correctly in the face of these kinds of deletions. Instead, the pass needs to directly handle invalidating the analysis results pointing at that IR unit. I've taught the Inliner about this and this patch teaches GlobalDCE. This will handle the asserting VH case in the existing test as well as other issues of the same fundamental variety. I've moved the test into the GlobalDCE directory and added a comment explaining what is going on. Note that we cannot simply require LVI here because LVI is too lazy. llvm-svn: 292773	2017-01-23 08:33:24 +00:00
Chandler Carruth	5144703664	[PM] Add a dedicated test case for the issue fixed in r292770. While this is covered by a clang test case, we should have something locally to LLVM that immediately checks the inliner doesn't leave analyses to dangling IR bodies. llvm-svn: 292772	2017-01-23 07:53:20 +00:00
Benjamin Kramer	db9e0b659d	Fix some broken CHECK lines. The colon is important. llvm-svn: 292761	2017-01-22 20:28:56 +00:00
Chandler Carruth	b698d5964d	[PM] Fix a really nasty bug introduced when adding PGO support to the new PM's inliner. The bug happens when we refine an SCC after having computed a proxy for the FunctionAnalysisManager, and then proceed to compute fresh analyses for functions in the new SCC using the manager provided by the old SCC's proxy. And when we manage to mutate a function in this new SCC in a way that invalidates those analyses. This can be... challenging to reproduce. I've managed to contrive a set of functions that trigger this and added a test case, but it is a bit brittle. I've directly checked that the passes run in the expected ways to help avoid the test just becoming silently irrelevant. This gets the new PM back to passing the LLVM test suite after the PGO improvements landed. llvm-svn: 292757	2017-01-22 10:34:01 +00:00
Piotr Padlewski	772d253e85	[MemorySSA] Remove deprecated comment from test llvm-svn: 292733	2017-01-21 22:14:02 +00:00
Piotr Padlewski	bff575fe29	[MemorySSA] Fix invariant.group test and add new Summary: This test had a bug: !llvm.invariant.group instead of !invariant.group. Also add some new tests for future development. All tests pass; when MSSA supports invariant.group, only the lines with FIXIT should need to change. Reviewers: dberlin, george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28969 llvm-svn: 292730	2017-01-21 21:56:56 +00:00
Sanjay Patel	478a83c905	[InstCombine] use m_APInt to allow ashr folds for vectors with splat constants We may be able to assert that no shl-shl or lshr-lshr pairs ever get here because we should have already handled those in foldShiftedShift(). llvm-svn: 292726	2017-01-21 17:59:59 +00:00
Sanjay Patel	842a3b7000	[InstCombine] add tests for ashr-ashr; NFC llvm-svn: 292724	2017-01-21 17:43:06 +00:00
Chandler Carruth	17350de1ca	[PM] Teach the loop PM to run LoopSimplify prior to the loop pipeline. This adds the last remaining core feature of the loop pass pipeline in the new PM and removes the last of the really egregious hacks in the LICM tests. Sadly, this requires really substantial changes in the unittests in order to provide and maintain simplified loops. This is particularly hard because for example LoopSimplify will try to fold undef branches to an ideal direction and simplify the loop accordingly. Differential Revision: https://reviews.llvm.org/D28766 llvm-svn: 292709	2017-01-21 03:48:51 +00:00
Anmol P. Paralkar	910dc8de3f	MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info Summary: Under option -mergefunc-preserve-debug-info we: - Do not create a new function for a thunk. - Retain the debug info for a thunk's parameters (and associated instructions for the debug info) from the entry block. Note: -debug will display the algorithm at work. - Create debug-info for the call (to the shared implementation) made by a thunk and its return value. - Erase the rest of the function, retaining the (minimally sized) entry block to create a thunk. - Preserve a thunk's call site to point to the thunk even when both occur within the same translation unit, to aid debugability. Note that this behaviour differs from the underlying -mergefunc implementation which modifies the thunk's call site to point to the shared implementation when both occur within the same translation unit. Reviewers: echristo, eeckstein, dblaikie, aprantl, friss Reviewed By: aprantl Subscribers: davide, fhahn, jfb, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D28075 llvm-svn: 292702	2017-01-21 02:02:56 +00:00
Justin Lebar	ba4041ac4f	[ConstantFold] Remove test checking that we don't constant-fold sqrt(-2). This depended on libm's errno behavior (we constant fold iff libm's sqrt(-2) does not set errno) and was breaking on mac. llvm-svn: 292701	2017-01-21 02:02:27 +00:00
Justin Lebar	8b18a347b6	[ConstantFolding] Constant-fold llvm.sqrt(x) like other intrinsics. Summary: Currently we return undef, but we're in the process of changing the LangRef so that llvm.sqrt behaves like the other math intrinsics, matching the return value of the standard libcall but not setting errno. This change is legal even without the LangRef change because currently calling llvm.sqrt(x) where x is negative is spec'ed to be UB. But in practice it's also safe because we're simply constant-folding fewer inputs: Inputs >= -0 get constant-folded as before, but inputs < -0 now aren't constant-folded, because ConstantFoldFP aborts if the host math function raises an fp exception. Reviewers: hfinkel, efriedma, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28929 llvm-svn: 292692	2017-01-21 00:59:57 +00:00
Sanjay Patel	26c353835f	[InstCombine] auto-generate checks; NFC llvm-svn: 292682	2017-01-20 23:39:01 +00:00
Peter Collingbourne	67addbcacf	LowerTypeTests: Simplify; always create SizeM1 with type IntPtrTy, move initialization out of if statement. llvm-svn: 292674	2017-01-20 23:22:28 +00:00
Dehao Chen	77079003dd	Add indirect call promotion to SamplePGO Summary: This patch adds metadata for indirect call promotion in the sample profile loader. Reviewers: xur, davidxl, dnovillo Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28923 llvm-svn: 292672	2017-01-20 22:56:07 +00:00
Easwaran Raman	12585b0148	Improve PGO support for the new inliner This adds the following to the new PM based inliner in PGO mode: * Use block frequency analysis to derive callsite's profile count and use that to adjust thresholds of hot and cold callsites. * Incrementally update the BFI of the caller after a callee gets inlined into it. This incremental update is only within an invocation of the run method - BFI is not preserved across calls to run. Update the function entry count of the callee after inlining it into a caller. * I've tuned the thresholds for the hot and cold callsites using a hacked up version of the old inliner that explicitly computes BFI on a set of internal benchmarks and spec. Once the new PM based pipeline stabilizes (IIRC Chandler mentioned there are known issues) I'll benchmark this again and adjust the thresholds if required. Inliner PGO support. Differential revision: https://reviews.llvm.org/D28331 llvm-svn: 292666	2017-01-20 22:44:04 +00:00
Sanjay Patel	0c1c70aef4	[ValueTracking] recognize variations of 'clamp' to improve codegen (PR31693) By enhancing value tracking, we allow an existing min/max canonicalization to kick in and improve codegen for several targets that have min/max instructions. Unfortunately, recognizing min/max in value tracking may cause us to hit a hack in InstCombiner::visitICmpInst() more often: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html ...but I'm hoping we can remove that soon. Correctness proofs based on Alive: Name: smaxmin Pre: C1 < C2 %cmp2 = icmp slt i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp slt i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %min => %cmp2 = icmp slt i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp sgt i8 %min, C1 %r = select i1 %cmp1, i8 %min, i8 C1 Name: sminmax Pre: C1 > C2 %cmp2 = icmp sgt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp sgt i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %max => %cmp2 = icmp sgt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp slt i8 %max, C1 %r = select i1 %cmp1, i8 %max, i8 C1 ---------------------------------------- Optimization: smaxmin Done: 1 Optimization is correct! ---------------------------------------- Optimization: sminmax Done: 1 Optimization is correct! Name: umaxmin Pre: C1 u< C2 %cmp2 = icmp ult i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp ult i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %min => %cmp2 = icmp ult i8 %x, C2 %min = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp ugt i8 %min, C1 %r = select i1 %cmp1, i8 %min, i8 C1 Name: uminmax Pre: C1 u> C2 %cmp2 = icmp ugt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp3 = icmp ugt i8 %x, C1 %r = select i1 %cmp3, i8 C1, i8 %max => %cmp2 = icmp ugt i8 %x, C2 %max = select i1 %cmp2, i8 %x, i8 C2 %cmp1 = icmp ult i8 %max, C1 %r = select i1 %cmp1, i8 %max, i8 C1 ---------------------------------------- Optimization: umaxmin Done: 1 Optimization is correct! 
---------------------------------------- Optimization: uminmax Done: 1 Optimization is correct! llvm-svn: 292660	2017-01-20 22:18:47 +00:00
Sanjay Patel	5eb113d2af	[InstCombine] add tests to show missed canonicalization of min/max; NFC Unfortunately, recognizing these in value tracking may cause us to hit a hack in InstCombiner::visitICmpInst() more often: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109340.html ...but besides being the obviously Right Thing To Do, there's a clear codegen win from identifying these patterns for several targets. llvm-svn: 292655	2017-01-20 21:49:41 +00:00
Peter Collingbourne	f04a390099	LowerTypeTests: Implement importing of type identifiers. To import a type identifier we read the summary and create external references to the symbols defined when exporting. Differential Revision: https://reviews.llvm.org/D28546 llvm-svn: 292654	2017-01-20 21:49:34 +00:00
Daniel Berlin	68af279533	NewGVN: Remove pr31686.ll, it is tested by pr31594.ll, which is much smaller and simpler llvm-svn: 292649	2017-01-20 21:04:58 +00:00
Daniel Berlin	26addef1a0	NewGVN: Fix PR 31686 and PR 31698 by rewriting store leader handling. Summary: This rewrites store expression/leader handling. We no longer use the value operand as the leader, instead, we store it separately. We also now store the stored value as part of the expression, and compare it when comparing stores for equality. This enables us to get rid of a bunch of our previous hacks and machinations, as the existing machinery takes care of everything except updating the stored value on classes. The only time we have to update it is if the storecount goes to 0, and when we do, we destroy it. Since we no longer use the value operand as the leader, during elimination, we have to use the value operand. Doing this also fixes a bunch of store forwarding cases we were missing. Any value operand we use is guaranteed to either be updated by previous eliminations, or minimized by future ones. (IE the fact that we don't use the most dominating value operand when it's not a constant does not affect anything). Sadly, this change also exposes that we didn't pay attention to the output of the pr31594.ll test, as it also very clearly exposes the same store leader bug we are fixing here. (I added pr31682.ll anyway, but maybe we think that's too large to be useful) On the plus side, propagate-ir-flags.ll now passes due to the corrected store forwarding. This change was 3 stage'd on darwin and linux, with the full test-suite. Reviewers: davide Subscribers: llvm-commits llvm-svn: 292648	2017-01-20 21:04:30 +00:00
Haicheng Wu	201b191b82	Recommit "[InlineCost] Use TTI to check if GEP is free." #3 This is the third attempt to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292633	2017-01-20 18:51:22 +00:00
Alexey Bataev	4fe77b9329	[SLP] Initial test for fix of PR31690. llvm-svn: 292631	2017-01-20 18:40:21 +00:00
Simon Pilgrim	a50a93fcd0	[InstCombine][X86] Add MULDQ/MULUDQ undef handling llvm-svn: 292627	2017-01-20 18:20:30 +00:00
Alexey Bataev	f5677329a6	[SLP] A new test for horizontal vectorization for non-power-of-2 instructions. llvm-svn: 292626	2017-01-20 18:04:29 +00:00
Simon Pilgrim	06f125230f	[InstCombine][SSE] Tests showing missed opportunities to handle muldq/muludq with undef arguments Fixed a typo in existing test names at the same time llvm-svn: 292619	2017-01-20 17:06:38 +00:00
Haicheng Wu	71ef5bc0ff	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free." #2" This reverts commit r292616 because the test case still has problem. llvm-svn: 292618	2017-01-20 16:52:22 +00:00
Haicheng Wu	8f34ae2aae	Recommit "[InlineCost] Use TTI to check if GEP is free." #2 This is the second attempt to recommit r292526. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292616	2017-01-20 16:36:34 +00:00
Simon Pilgrim	2817b476e8	[InstCombine][SSE] Tests showing missed opportunities to constant fold packss/packus llvm-svn: 292609	2017-01-20 13:21:30 +00:00
Simon Pilgrim	8942722cbc	[InstCombine][SSE] Tests showing missed opportunities to handle packss/packus with undef arguments llvm-svn: 292601	2017-01-20 11:28:07 +00:00
Simon Pilgrim	51b3b98e3a	[InstCombine][SSE] Add DemandedElts support for PACKSS/PACKUS instructions Simplify a packss/packus truncation based on the elements of the mask that are actually demanded. Differential Revision: https://reviews.llvm.org/D28777 llvm-svn: 292591	2017-01-20 09:28:21 +00:00
Chandler Carruth	e9b18e3d34	[PM] Port LoopSink to the new pass manager. Like several other loop passes (the vectorizer, etc) this pass doesn't really fit the model of a loop pass. The critical distinction is that it isn't intended to be pipelined together with other loop passes. I plan to add some documentation to the loop pass manager to make this more clear on that side. LoopSink is also different because it doesn't really need a lot of the infrastructure of our loop passes. For example, if there aren't loop invariant instructions causing a preheader to exist, there is no need to form a preheader. It also doesn't need LCSSA because this pass is only involved in sinking invariant instructions from a preheader into the loop, not reasoning about live-outs. This allows some nice simplifications to the pass in the new PM where we can directly walk the loops once without restructuring them. Differential Revision: https://reviews.llvm.org/D28921 llvm-svn: 292589	2017-01-20 08:42:19 +00:00
Daniel Berlin	89fea6fd9d	NewGVN: Fix PR 31682, an overactive assert. Part of the assert has been left active for further debugging. The other part has been turned into a stat for tracking for the moment. llvm-svn: 292583	2017-01-20 06:38:41 +00:00
Mohammad Shahid	5dc021bf45	[SLP] Add a base test for jumbled store Change-Id: I905ce08a02c76a6896dcfd9629547417c99adc4a llvm-svn: 292581	2017-01-20 06:05:33 +00:00
Haicheng Wu	8f2aca388b	Revert "Recommit "[InlineCost] Use TTI to check if GEP is free."" This reverts commit r292570. The test still has problem. llvm-svn: 292572	2017-01-20 03:40:41 +00:00
Haicheng Wu	1af1f071ea	Recommit "[InlineCost] Use TTI to check if GEP is free." This recommits r292526 which is reverted in r292529 after fixing the test case. The original summary: Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. llvm-svn: 292570	2017-01-20 03:09:11 +00:00
Anna Thomas	698f0deea9	[AliasAnalysis] Fences do not modify constant memory location Summary: Fence instructions are currently marked as `ModRef` for all memory locations. We can improve this for constant memory locations (such as constant globals), since fence instructions cannot modify these locations. This helps us to forward constant loads across fences (added test case in GVN). There were no changes in behaviour for similar test cases in early-cse and licm. Reviewers: dberlin, sanjoy, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28914 llvm-svn: 292546	2017-01-20 00:21:33 +00:00
Davide Italiano	6c2c3e07bf	[SCCP] Teach the pass how to handle `div` with overdefined operands. This can prove that: extern int f; int g() { int x = 0; for (int i = 0; i < 365; ++i) { x /= f; } return x; } always returns zero. Thanks to Sanjoy for confirming this transformation actually made sense (bugs are mine). llvm-svn: 292531	2017-01-19 23:07:51 +00:00
Haicheng Wu	e036df4723	Revert "[InlineCost] Use TTI to check if GEP is free." This reverts commit r292526. The test case has problem. llvm-svn: 292529	2017-01-19 22:51:03 +00:00
Haicheng Wu	da556345dc	[InlineCost] Use TTI to check if GEP is free. Currently, a GEP is considered free only if its indices are all constant. TTI::getGEPCost() can give target-specific more accurate analysis. TTI is already used for the cost of many other instructions. Differential Revision: https://reviews.llvm.org/D28693 llvm-svn: 292526	2017-01-19 22:28:34 +00:00

9171 Commits
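
Note on commit 0c1c70aef4 (ValueTracking 'clamp' recognition): the Alive proof quoted in that message claims that, under the precondition C1 < C2, the nested-select form of a signed clamp equals the min-then-max form. The following is a minimal brute-force sketch of that "smaxmin" claim only, not the LLVM implementation. The function names and the configurable integer range are illustrative assumptions; Python's integer comparison stands in for `icmp slt`/`icmp sgt` on values that fit the chosen width.

```python
from itertools import product


def clamp_select_form(x, c1, c2):
    """Source pattern from the 'smaxmin' proof (hypothetical helper name)."""
    mn = x if x < c2 else c2      # %min = select (icmp slt %x, C2), %x, C2
    return c1 if x < c1 else mn   # %r   = select (icmp slt %x, C1), C1, %min


def clamp_minmax_form(x, c1, c2):
    """Target pattern from the 'smaxmin' proof (hypothetical helper name)."""
    mn = x if x < c2 else c2      # %min = select (icmp slt %x, C2), %x, C2
    return mn if mn > c1 else c1  # %r   = select (icmp sgt %min, C1), %min, C1


def find_counterexample(lo=-128, hi=127):
    """Return (x, c1, c2) where the two forms differ with c1 < c2, else None.

    The default range matches i8 as used in the proof; a narrower range
    keeps the exhaustive search fast.
    """
    values = range(lo, hi + 1)
    for c1, c2 in product(values, repeat=2):
        if not c1 < c2:           # Pre: C1 < C2
            continue
        for x in values:
            if clamp_select_form(x, c1, c2) != clamp_minmax_form(x, c1, c2):
                return (x, c1, c2)
    return None


if __name__ == "__main__":
    # Exhaustive over a 4-bit signed range; pass no arguments for full i8.
    print(find_counterexample(-8, 7))
```

The precondition matters: with C1 = 5, C2 = 3 and x = 10 the two forms disagree, which is why the proof states `Pre: C1 < C2`.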