llvm-project

Commit Graph

Author	SHA1	Message	Date
Aaron Ballman	f9a1897c72	Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition. llvm-svn: 229340	2015-02-15 22:54:22 +00:00
Hal Finkel	8626ed2eae	[ADCE] Convert another loop for a range-based for We can use a range-based for for the operands loop too; NFC. llvm-svn: 229319	2015-02-15 15:51:25 +00:00
Hal Finkel	92fb2d3803	[ADCE] Use inst_range and range-based fors Convert a few loops to range-based fors; NFC. llvm-svn: 229318	2015-02-15 15:51:23 +00:00
Hal Finkel	c6035cff55	[ADCE] Fix formatting of pointer types We prefer to put the * with the variable, not with the type; NFC. llvm-svn: 229317	2015-02-15 15:47:52 +00:00
Hal Finkel	234d8fea7b	[ADCE] Fix capitalization of another local variable Bring another local variable in compliance with our naming conventions, NFC. llvm-svn: 229316	2015-02-15 15:45:30 +00:00
Hal Finkel	75901293a1	[ADCE] Fix capitalization of some local variables Bring some local variables in compliance with our naming conventions, NFC. llvm-svn: 229315	2015-02-15 15:45:28 +00:00
Andrea Di Biagio	f54432388f	[optnone] Skip pass Constant Hoisting on optnone functions. Added test CodeGen/X86/constant-hoisting-optnone.ll to verify that pass Constant Hoisting is not run on optnone functions. llvm-svn: 229258	2015-02-14 15:11:48 +00:00
Duncan P. N. Exon Smith	2c79ad974c	Transforms: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229202	2015-02-14 01:11:29 +00:00
Chandler Carruth	30d69c2e36	[PM] Remove the old 'PassManager.h' header file at the top level of LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094	2015-02-13 10:01:29 +00:00
Chandler Carruth	1fbc316534	[unroll] Concede defeat and disable the unroll analyzer for now. The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state. llvm-svn: 229064	2015-02-13 05:31:46 +00:00
Chandler Carruth	6c03dff7cc	[unroll] Merge the simplification and DCE estimation methods on the UnrollAnalyzer. Now they share a single worklist and have less implicit state between them. There was no real benefit to separating these two things out. I'm going to subsequently refactor things to share even more code. llvm-svn: 229062	2015-02-13 04:39:05 +00:00
Chandler Carruth	d9591d8922	[unroll] Remove pointless dyn_cast<>s to Instruction - the users of an instruction must by definition be instructions. llvm-svn: 229061	2015-02-13 04:33:21 +00:00
Chandler Carruth	5457e20d27	[unroll] Don't check the loop set for whether an instruction is contained in it each time we try to add it to the worklist, just check this when pulling it off the worklist. That way we do it at most once per instruction with the cost of the worklist set we would need to pay anyways. llvm-svn: 229060	2015-02-13 04:30:44 +00:00
Chandler Carruth	e5c30e4e10	[unroll] Change the other worklist in the unroll analyzer to be a set vector. In addition to dramatically reducing the work required for contrived example loops, this also has to correct some serious latent bugs in the cost computation. Previously, we might add an instruction onto the worklist once for every load which it used and was simplified. Then we would visit it many times and accumulate "savings" each time. I mean, fortunately this couldn't matter for things like calls with 100s of operands, but even for binary operators this code seems like it must be double counting the savings. I just noticed this by inspection and due to the runtime problems it can introduce, I don't have any test cases for cases where the cost produced by this routine is unacceptable. llvm-svn: 229059	2015-02-13 04:27:50 +00:00
Chandler Carruth	7824bc9241	[unroll] Replace a boolean, for loop, condition, and break with std::all_of and a lambda. Much cleaner, no functionality changed. llvm-svn: 229058	2015-02-13 04:18:14 +00:00
Chandler Carruth	06d537cdd6	[unroll] Directly query for dead instructions. In the unroll analyzer, it is checking each user to see if that user will become dead. However, it first checked if that user was missing from the simplified values map, and then if was also missing from the dead instructions set. We add everything from the simplified values map to the dead instructions set, so the first step is completely subsumed by the second. Moreover, the first step requires inserting something into the simplified value map which isn't what we want at all. This also replaces a dyn_cast with a cast as an instruction cannot be used by a non-instruction. llvm-svn: 229057	2015-02-13 04:14:05 +00:00
Chandler Carruth	82cb30f10c	[unroll] Replace a linear time check for no uses with a constant time check. Also hoist this into the enqueue process as it is faster even than testing the worklist set, we should just directly filter these out much like we filter out constants and such. llvm-svn: 229056	2015-02-13 04:06:08 +00:00
Chandler Carruth	3b057b3216	[unroll] Rather than an operand set, use a setvector for the worklist. We don't just want to handle duplicate operands within an instruction, but also duplicates across operands of different instructions. I should have gone straight to this, but I had convinced myself that it wasn't going to be necessary briefly. I've come to my senses after chatting more with Nick, and am now happier here. llvm-svn: 229054	2015-02-13 03:57:40 +00:00
Chandler Carruth	17a0496b5a	[unroll] Extract the code to enqueue operansd for the worklist in the unroll analysis into a lambda and call it. That's much simpler than duplicating all the code. llvm-svn: 229053	2015-02-13 03:49:41 +00:00
Chandler Carruth	8c86375a10	[unroll] Use a small set to de-duplicate operands prior to putting them into the worklist. This avoids allocating lots of worklist memory for them when there are large numbers of repeated operands. llvm-svn: 229052	2015-02-13 03:48:38 +00:00
Chandler Carruth	93063e6191	[unroll] Make the unroll cost analysis terminate deterministically and reasonably quickly. I don't have a reduced test case, but for a version of FFMPEG, this makes the loop unroller start finishing at all (after over 15 minutes of running, it hadn't terminated for me, no idea if it was a true infloop or just exponential work). The key thing here is to check the DeadInstructions set when pulling things off the worklist. Without this, we would re-walk the user list of already dead instructions again and again and again. Consider phi nodes with many, many operands and other patterns. The other important aspect of this is that because we would keep re-visiting instructions that were already known dead, we kept adding their cost savings to this! This would cause our cost savings to be insanely inflated from this. While I was here, I also rotated the operand walk out of the worklist loop to make the code easier to read. There is still work to be done to minimize worklist traffic because we don't de-duplicate operands. This means we may add the same instruction onto the worklist 1000s of times if it shows up in 1000s of operansd to a PHI node for example. Still, with this patch, the ffmpeg testcase I have finishes quickly and I can't measure the runtime impact of the unroll analysis any more. I'll probably try to do a few more cleanups to this code, but not sure how much cleanup I can justify right now. llvm-svn: 229038	2015-02-13 03:40:58 +00:00
Chandler Carruth	dd6029fc6e	[unroll] Make range based for loops a bit more explicit and more readable. The biggest thing that was causing me problems is recognizing the references vs. poniters here. I also found that for maps naming the loop variable as KeyValue helps make it obvious why you don't actually use it directly. Finally, using 'auto' instead of 'User *' doesn't seem like a good tradeoff. Much like with the other cases, I like to know its a pointer, and 'User' is just as long and tells the reader a lot more. llvm-svn: 229033	2015-02-13 02:45:17 +00:00
Chandler Carruth	415f41258f	[unroll] Avoid the "Insn" abbreviation of Instruction. This is quite hard to type and read for me, and is inconsistent with the other abbreviation in the base class "Inst". For most of these (where they are used widely) I prefer just spelling it out as Instruction. I've changed two of the short-lived variables to use "Inst" to match the base class. llvm-svn: 229028	2015-02-13 02:17:39 +00:00
Chandler Carruth	302a133b1e	[unroll] Tidy up the integer we use to accumululate the number of instructions optimized. NFC, just separating this out from the functionality changing commit. llvm-svn: 229026	2015-02-13 02:10:56 +00:00
Chandler Carruth	10a9926ab5	[unroll] Don't use a map from pointer to bool. Use a set. This is much more efficient. In particular, the query with the user instruction has to insert a false for every missing instruction into the set. This is just a cleanup a long the way to fixing the underlying algorithm problems here. llvm-svn: 228994	2015-02-13 00:29:39 +00:00
Michael Zolotukhin	1b48019751	Prevent division by 0. When we try to estimate number of potentially removed instructions in loop unroller, we analyze first N iterations and then scale the computed number by TripCount/N. We should bail out early if N is 0. llvm-svn: 228988	2015-02-13 00:17:03 +00:00
Chandler Carruth	186ad60815	[unroll] Update the new analysis logic from r228265 to use modern coding conventions for function names consistently. Some were already using this but not all. llvm-svn: 228987	2015-02-13 00:00:24 +00:00
James Molloy	e805ad95dc	[LoopRerolling] Be more forgiving with instruction order. We can't solve the full subgraph isomorphism problem. But we can allow obvious cases, where for example two instructions of different types are out of order. Due to them having different types/opcodes, there is no ambiguity. llvm-svn: 228931	2015-02-12 15:54:14 +00:00
Mehdi Amini	9730116bd6	Reassociate: cannot negate a INT_MIN value Summary: When trying to canonicalize negative constants out of multiplication expressions, we need to check that the constant is not INT_MIN which cannot be negated. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7286 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 228872	2015-02-11 19:54:44 +00:00
James Molloy	f147359376	[LoopReroll] Introduce the concept of DAGRootSets. A DAGRootSet models an induction variable being used in a rerollable loop. For example: x[i3+0] = y1 x[i3+1] = y2 x[i3+2] = y3 Base instruction -> i3 +---+----+ / \| \ ST[y1] +1 +2 <-- Roots \| \| ST[y2] ST[y3] There may be multiple DAGRootSets, for example: x[i2+0] = ... (1) x[i2+1] = ... (1) x[i2+4] = ... (2) x[i2+5] = ... (2) x[(i+1234)2+5678] = ... (3) x[(i+1234)2+5679] = ... (3) This concept is similar to the "Scale" member used previously, but allows multiple independent sets of roots based off the same induction variable. llvm-svn: 228821	2015-02-11 09:19:47 +00:00
Zachary Turner	3bd47cee78	Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects. This allows IDEs to recognize the entire set of header files for each of the core LLVM projects. Differential Revision: http://reviews.llvm.org/D7526 Reviewed By: Chris Bieneman llvm-svn: 228798	2015-02-11 03:28:02 +00:00
David Majnemer	7679300d93	EarlyCSE: It isn't safe to CSE across synchronization boundaries This fixes PR22514. llvm-svn: 228760	2015-02-10 23:09:43 +00:00
Philip Reames	7e7dc3e9df	Adjust how we avoid poll insertion inside the poll function (NFC) I realized that my early fix for this was overly complicated. Rather than scatter checks around in a bunch of places, just exit early when we visit the poll function itself. Thinking about it a bit, the whole inlining mechanism used with gc.safepoint_poll could probably be cleaned up a bit. Originally, poll insertion was fused with gc relocation rewriting. It might be worth going back to see if we can simplify the chain of events now that these two are seperated. As one thought, maybe it makes sense to rewrite calls inside the helper function before inlining it to the many callers. This would require us to visit the poll function before any other functions though.. llvm-svn: 228634	2015-02-10 00:04:53 +00:00
Adrian Prantl	34e7590e0d	Debug info: When updating debug info during SROA, do not emit debug info for any padding introduced by SROA. In particular, do not emit debug info for an alloca that represents only the padding introduced by a previous iteration. Fixes PR22495. llvm-svn: 228632	2015-02-09 23:57:22 +00:00
Adrian Prantl	27bd01f71c	Debug info: Use DW_OP_bit_piece instead of DW_OP_piece in the intermediate representation. This - increases consistency by using the same granularity everywhere - allows for pieces < 1 byte - DW_OP_piece didn't actually allow storing an offset. Part of PR22495. llvm-svn: 228631	2015-02-09 23:57:15 +00:00
Ramkumar Ramachandra	3edf74fe29	[Statepoint] Improve two asserts, fix some style (NFC) Summary: It's important that our users immediately know what gc.safepoint_poll is. Also fix the style of the declaration of CreateGCStatepoint, in preparation for another change that will wrap it. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7517 llvm-svn: 228626	2015-02-09 23:02:10 +00:00
Ramkumar Ramachandra	2e4b9e0a37	PlaceSafepoints: modernize gc.result.* -> gc.result Differential Revision: http://reviews.llvm.org/D7516 llvm-svn: 228625	2015-02-09 23:00:40 +00:00
Philip Reames	d4a912fefd	Update file comment to clarify points highlighted in review (NFC) llvm-svn: 228621	2015-02-09 22:44:03 +00:00
Philip Reames	a29de87ea4	Use range for loops in PlaceSafepoints (NFC) llvm-svn: 228620	2015-02-09 22:26:11 +00:00
Philip Reames	b1ed02f728	Add basic tests for PlaceSafepoints This is just adding really simple tests which should have been part of the original submission. When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission. I fixed two such bugs in this submission. llvm-svn: 228610	2015-02-09 21:48:05 +00:00
Benjamin Kramer	f094d77de8	LoopIdiom: Use utility functions. The only difference between deleteIfDeadInstruction and RecursivelyDeleteTriviallyDeadInstructions is that the former also manually invalidates SCEV. That's unnecessary because SCEV automatically gets informed when an instruction is deleted via a ValueHandle. NFC. llvm-svn: 228508	2015-02-07 21:37:08 +00:00
Bjorn Steinbrink	71bf3b800a	Properly update AA metadata when performing call slot optimization Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7482 llvm-svn: 228500	2015-02-07 17:54:36 +00:00
Michael Zolotukhin	7af83c1f39	Use estimated number of optimized insns in unroll-threshold computation. If complete-unroll could help us to optimize away N% of instructions, we might want to do this even if the final size would exceed loop-unroll threshold. However, we don't want to unroll huge loop, and we are add AbsoluteThreshold to avoid that - this threshold will never be crossed, even if we expect to optimize 99% instructions after that. llvm-svn: 228434	2015-02-06 20:20:40 +00:00
Michael Zolotukhin	4e8598eee3	[InstSimplify] Add SimplifyFPBinOp function. It is a variation of SimplifyBinOp, but it takes into account FastMathFlags. It is needed in inliner and loop-unroller to accurately predict the transformation's outcome (previously we dropped the flags and were too conservative in some cases). Example: float foo(float a, float b) { float r; if (a[1] b) r = /* a lot of expensive computations /; else r = 1; return r; } float boo(float a) { return foo(a, 0.0); } Without this patch, we don't inline 'foo' into 'boo'. llvm-svn: 228432	2015-02-06 20:02:51 +00:00
Benjamin Kramer	970eac40bf	Make helper functions/classes/globals static. NFC. llvm-svn: 228410	2015-02-06 17:51:54 +00:00
Benjamin Kramer	39f76acb5c	IRCE: Demote template to ArrayRef and SmallVector to array. NFC. llvm-svn: 228398	2015-02-06 14:43:49 +00:00
Aaron Ballman	94d4d33a38	Removing an unused variable warning I accidentally introduced with my last warning fix; NFC. llvm-svn: 228295	2015-02-05 13:52:42 +00:00
Aaron Ballman	1b072b340b	Silencing an MSVC warning about a switch statement with no cases; NFC. llvm-svn: 228294	2015-02-05 13:40:04 +00:00
Michael Zolotukhin	a9aadd2903	Implement new heuristic for complete loop unrolling. Complete loop unrolling can make some loads constant, thus enabling a lot of other optimizations. To catch such cases, we look for loads that might become constants and estimate number of instructions that would be simplified or become dead after substitution. Example: Suppose we have: int a[] = {0, 1, 0}; v = 0; for (i = 0; i < 3; i ++) v += b[i]a[i]; If we completely unroll the loop, we would get: v = b[0]a[0] + b[1]a[1] + b[2]a[2] Which then will be simplified to: v = b[0]* 0 + b[1]* 1 + b[2]* 0 And finally: v = b[1] llvm-svn: 228265	2015-02-05 02:34:00 +00:00
Tom Stellard	080209d573	StructurizeCFG: Remove obsolete fix for loop backedge detection This is no longer needed now that we are using a reverse post-order traversal. llvm-svn: 228187	2015-02-04 20:49:47 +00:00
Tom Stellard	071ec90b68	StructurizeCFG: Use a reverse post-order traversal We were previously doing a post-order traversal and operating on the list in reverse, however this would occasionaly cause backedges for loops to be visited before some of the other blocks in the loop. We know use a reverse post-order traversal, which avoids this issue. The reverse post-order traversal is not completely ideal, so we need to manually fixup the list to ensure that inner loop backedges are visited before outer loop backedges. llvm-svn: 228186	2015-02-04 20:49:44 +00:00
Aaron Ballman	34c325e749	Fixing a -Wsign-compare warning; NFC llvm-svn: 228142	2015-02-04 14:01:08 +00:00
Philip Reames	72634d6af0	Fix a warning in non-asserts builds llvm-svn: 228114	2015-02-04 05:11:20 +00:00
Philip Reames	5a9685dba6	Clang format of a file introduced in 228090 (NFC) llvm-svn: 228091	2015-02-04 00:39:57 +00:00
Philip Reames	47cc673e1f	Add a pass for inserting safepoints into (nearly) arbitrary IR This pass is responsible for figuring out where to place call safepoints and safepoint polls. It doesn't actually make the relocations explicit; that's the job of the RewriteStatepointsForGC pass (http://reviews.llvm.org/D6975). Note that this code is not yet finalized. Its moving in tree for incremental development, but further cleanup is needed and will happen over the next few days. It is not yet part of the standard pass order. Planned changes in the near future: - I plan on restructuring the statepoint rewrite to use the functions add to the IRBuilder a while back. - In the current pass, the function "gc.safepoint_poll" is treated specially but is not an intrinsic. I plan to make identifying the poll function a property of the GCStrategy at some point in the near future. - As follow on patches, I will be separating a collection of test cases we have out of tree and submitting them upstream. - It's not explicit in the code, but these two patches are introducing a new state for a statepoint which looks a lot like a patchpoint. There's no a transient form which doesn't yet have the relocations explicitly represented, but does prevent reordering of memory operations. Once this is in, I need to update actually make this explicit by reserving the 'unused' argument of the statepoint as a flag, updating the docs, and making the code explicitly check for such a thing. This wasn't really planned, but once I split the two passes - which was done for other reasons - the intermediate state fell out. Just reminds us once again that we need to merge statepoints and patchpoints at some point in the not that distant future. Future directions planned: - Identifying more cases where a backedge safepoint isn't required to ensure timely execution of a safepoint poll. - Tweaking the insertion process to generate easier to optimize IR. (For example, investigating making SplitBackedge) the default. - Adding opt-in flags for a GCStrategy to use this pass. Once done, add this pass to the actual pass ordering. Differential Revision: http://reviews.llvm.org/D6981 llvm-svn: 228090	2015-02-04 00:37:33 +00:00
Daniel Berlin	487aed0d77	Allow PRE to insert no-cost phi nodes llvm-svn: 228024	2015-02-03 20:37:08 +00:00
Jingyue Wu	d7966ff3b9	Add straight-line strength reduction to LLVM Summary: Straight-line strength reduction (SLSR) is implemented in GCC but not yet in LLVM. It has proven to effectively simplify statements derived from an unrolled loop, and can potentially benefit many other cases too. For example, LLVM unrolls #pragma unroll foo (int i = 0; i < 3; ++i) { sum += foo((b + i) * s); } into sum += foo(b * s); sum += foo((b + 1) * s); sum += foo((b + 2) * s); However, no optimizations yet reduce the internal redundancy of the three expressions: b * s (b + 1) * s (b + 2) * s With SLSR, LLVM can optimize these three expressions into: t1 = b * s t2 = t1 + s t3 = t2 + s This commit is only an initial step towards implementing a series of such optimizations. I will implement more (see TODO in the file commentary) in the near future. This optimization is enabled for the NVPTX backend for now. However, I am more than happy to push it to the standard optimization pipeline after more thorough performance tests. Test Plan: test/StraightLineStrengthReduce/slsr.ll Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick Reviewed By: jholewinski, atrick Subscribers: karthikthecool, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7310 llvm-svn: 228016	2015-02-03 19:37:06 +00:00
Jingyue Wu	49a766e468	Resurrect the assertion removed by r227717 Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*. Test Plan: no regression Reviewers: mkuper Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7327 llvm-svn: 227853	2015-02-02 20:41:11 +00:00
Chandler Carruth	21fc195c13	[multiversion] Kill FunctionTargetTransformInfo, TTI itself is now per-function and supports the exact desired interface. llvm-svn: 227743	2015-02-01 14:37:03 +00:00
Benjamin Kramer	6ab86b1bb6	EarlyCSE: Replace custom hash mixing with Hashing.h Brings it in line with the other hashes in EarlyCSE. llvm-svn: 227733	2015-02-01 12:30:59 +00:00
Chandler Carruth	fdb9c573f7	[multiversion] Thread a function argument through all the callers of the getTTI method used to get an actual TTI object. No functionality changed. This just threads the argument and ensures code like the inliner can correctly look up the callee's TTI rather than using a fixed one. The next change will use this to implement per-function subtarget usage by TTI. The changes after that should eliminate the need for FTTI as that will have become the default. llvm-svn: 227730	2015-02-01 12:01:35 +00:00
Chandler Carruth	fdffd87d68	[PM] Port SimplifyCFG to the new pass manager. This should be sufficient to replace the initial (minor) function pass pipeline in Clang with the new pass manager. I'll probably add an (off by default) flag to do that just to ensure we can get extra testing. llvm-svn: 227726	2015-02-01 11:34:21 +00:00
Chandler Carruth	e8c686aa86	[PM] Port EarlyCSE to the new pass manager. I've added RUN lines both to the basic test for EarlyCSE and the target-specific test, as this serves as a nice test that the TTI layer in the new pass manager is in fact working well. llvm-svn: 227725	2015-02-01 10:51:23 +00:00
Jingyue Wu	6c26bb63fe	[SeparateConstOffsetFromGEP] skip optnone functions llvm-svn: 227705	2015-02-01 02:34:41 +00:00
Jingyue Wu	6e091c8eab	[SeparateConstOffsetFromGEP] set PreservesCFG flag SeparateConstOffsetFromGEP does not change the shape of the control flow graph. llvm-svn: 227704	2015-02-01 02:33:02 +00:00
Jingyue Wu	0220df0dfd	[NVPTX] Emit .pragma "nounroll" for loops marked with nounroll Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable is not unrolled by CUDA driver, we need to emit .pragma "nounroll" at the header of that loop. This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function. Test Plan: test/CodeGen/NVPTX/nounroll.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7041 llvm-svn: 227703	2015-02-01 02:27:45 +00:00
Adrian Prantl	152ac396db	Fix PR22393. When recursively replacing an aggregate with a smaller aggregate or scalar, the debug info needs to refer to the absolute offset (relative to the entire variable) instead of storing the offset inside the smaller aggregate. llvm-svn: 227702	2015-02-01 00:58:04 +00:00
Chandler Carruth	705b185f90	[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669	2015-01-31 03:43:40 +00:00
James Molloy	64419d414b	[LoopReroll] Alter the data structures used during reroll validation. The validation algorithm used an incremental approach, building each iteration's data structures temporarily, validating them, then adding them to a global set. This does not scale well to having multiple sets of Root nodes, as the set of instructions used in each iteration is the union over all the root nodes. Therefore, refactor the logic to create a single, simple container to which later logic then refers. This makes it simpler control-flow wise to make the creation of the container more complex with the addition of multiple root sets. llvm-svn: 227499	2015-01-29 21:52:03 +00:00
Sanjay Patel	4f07a56958	[GVN] don't propagate equality comparisons of FP zero (PR22376) In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities to allow some simple value range optimizations. But that introduced a bug when comparing to -0.0 or 0.0: these compare equal even though they are not bitwise identical. This patch disallows propagating zero constants in equality comparisons. Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376 Differential Revision: http://reviews.llvm.org/D7257 llvm-svn: 227491	2015-01-29 20:51:49 +00:00
James Molloy	5f255eb48f	[LoopReroll] Refactor most of reroll() into a helper class reroll() was slightly monolithic and a pain to modify. Refactor a bunch of its state from local variables to member variables of a helper class, and do some trivial simplification while we're there. llvm-svn: 227439	2015-01-29 13:48:05 +00:00
Philip Reames	9198b33b48	Teach SplitBlockPredecessors how to handle landingpad blocks. Patch by: Igor Laevsky <igor@azulsystems.com> "Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors." Differential Revision: http://reviews.llvm.org/D7157 llvm-svn: 227390	2015-01-28 23:06:47 +00:00
Chandler Carruth	b81dfa6378	[LPM] Stop using the string based preservation API. It is an abomination. For starters, this API is incredibly slow. In order to lookup the name of a pass it must take a memory fence to acquire a pointer to the managed static pass registry, and then potentially acquire locks while it consults this registry for information about what passes exist by that name. This stops the world of LLVMs in your process no matter how little they cared about the result. To make this more joyful, you'll note that we are preserving many passes which do not exist any more, or are not even analyses which one might wish to have be preserved. This means we do all the work only to say "nope" with no error to the user. String-based APIs are a bad idea. String-based APIs that cannot produce any meaningful error are an even worse idea. =/ I have a patch that simply removes this API completely, but I'm hesitant to commit it as I don't really want to perniciously break out-of-tree users of the old pass manager. I'd rather they just have to migrate to the new one at some point. If others disagree and would like me to kill it with fire, just say the word. =] llvm-svn: 227294	2015-01-28 04:57:56 +00:00
Sanjoy Das	dcf2651043	Teach IRCE to look at branch weights when recognizing range checks Splitting a loop to make range checks redundant is profitable only if the range check "never" fails. Make this fact a part of recognizing a range check -- a branch is a range check only if it is expected to pass (via branch_weights metadata). Differential Revision: http://reviews.llvm.org/D7192 llvm-svn: 227249	2015-01-27 21:38:12 +00:00
Eric Christopher	e38c8d4aa9	Migrate SeparateConstOffsetFromGEP to use a Function with getSubtarget. llvm-svn: 227172	2015-01-27 07:16:37 +00:00
David Majnemer	4c82daea60	LoopRotate: Don't walk the uses of a Constant LoopRotate wanted to avoid live range interference by looking at the uses of a Value in the loop latch and seeing if any lied outside of the loop. We would wrongly perform this operation on Constants. This fixes PR22337. llvm-svn: 227171	2015-01-27 06:21:43 +00:00
Chandler Carruth	d649c0ad56	[PM] Refactor the core logic to run EarlyCSE over a function into an object that manages a single run of this pass. This was already essentially how it worked. Within the run function, it would point members at stack local allocations that were only live for a single run. Instead, it seems much cleaner to have a utility object whose lifetime is clearly bounded by the run of the pass over the function and can use member variables in a more direct way. This also makes it easy to plumb the analyses used into it from the pass and will make it re-usable with the new pass manager. No functionality changed here, its just a refactoring. llvm-svn: 227162	2015-01-27 01:34:14 +00:00
Chad Rosier	f9327d6fe9	Commoning of target specific load/store intrinsics in Early CSE. Phabricator revision: http://reviews.llvm.org/D7121 Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! llvm-svn: 227149	2015-01-26 22:51:15 +00:00
Chandler Carruth	9dea5cdb8e	[PM] General doxygen and comment cleanup for this pass. llvm-svn: 227001	2015-01-24 11:44:32 +00:00
Chandler Carruth	7253bba458	[PM] Reformat this code with clang-format so that I can use clang-format when refactoring for the new pass manager without introducing too many formatting changes into meaning full diffs. llvm-svn: 227000	2015-01-24 11:33:55 +00:00
Chandler Carruth	43e590e51f	[PM] Port LowerExpectIntrinsic to the new pass manager. This just lifts the logic into a static helper function, sinks the legacy pass to be a trivial wrapper of that helper fuction, and adds a trivial wrapper for the new PM as well. Not much to see here. I switched a test case to run in both modes, but we have to strip the dead prototypes separately as that pass isn't in the new pass manager (yet). llvm-svn: 226999	2015-01-24 11:13:02 +00:00
Chandler Carruth	c3bf5bd8cf	[PM] Change LowerExpectIntrinsic to actually return true when it has changed the IR. This is particularly easy as we can just look for the existence of any expect intrinsic at all to know whether we've changed the IR. llvm-svn: 226998	2015-01-24 11:12:57 +00:00
Chandler Carruth	6eb60eb5c9	[PM] Use a more appropriate name for the statistics variable in lower-expect, as we don't have 'if's in the IR and we use it for switches as well. llvm-svn: 226997	2015-01-24 10:57:25 +00:00
Chandler Carruth	d12741e0a9	[PM] Switch tihs code to use a range based for loop over the function. We can't switch the loop over the instructions because it needs to early-increment the iterator. llvm-svn: 226996	2015-01-24 10:57:19 +00:00
Chandler Carruth	3f5e7b1fb6	[PM] Use a SmallVector instead of std::vector to avoid heap allocations for small switches, and avoid using a complex loop to set up the weights. We know what the baseline weights will be so we can just resize the vector to contain all that value and clobber the one slot that is likely. This seems much more direct than the previous code that tested at every iteration, and started off by zeroing the vector. llvm-svn: 226995	2015-01-24 10:47:13 +00:00
Chandler Carruth	0012c778a4	[PM] Pull the two helpers for this pass into static functions. There are no members for them to use. Also, make them accept references as there is no possibility of a null pointer. llvm-svn: 226994	2015-01-24 10:39:24 +00:00
Chandler Carruth	579c5c45c2	[PM] Add a basic doxygen comment for this pass. llvm-svn: 226993	2015-01-24 10:32:53 +00:00
Chandler Carruth	0ea746bf9f	[PM] Clean up the formatting of the LowerExpectIntrinsic pass prior to refactoring its code. llvm-svn: 226992	2015-01-24 10:30:14 +00:00
Chandler Carruth	72793727cc	[PM] Move the LowerExpectIntrinsic pass to the Scalar library. It was already in the Scalar header and referenced extensively as being in this library, the source file was just in the utils directory for some reason. No actual functionality changed. I noticed as it didn't make sense to add a pass header to the utils headers. llvm-svn: 226991	2015-01-24 10:18:47 +00:00
Sanjoy Das	351db05308	[NFC] Introduce a 'struct Range' for IRCE Use the struct instead of a std::pair<Value , Value >. This makes a Range an obviously immutable object, and we can now assert that a range is well-typed (Begin->getType() == End->getType()) on its construction. llvm-svn: 226804	2015-01-22 09:32:02 +00:00
Sanjoy Das	d1fb13ce4c	Fix crashes in IRCE caused by mismatched types There are places where the inductive range check elimination pass depends on two llvm::Values or llvm::SCEVs to be of the same llvm::Type when they do not need to be. This patch relaxes those restrictions (by bailing out of the optimization if the types mismatch), and adds test cases to trigger those paths. These issues were found by bootstrapping clang with IRCE running in the -O3 pass ordering. Differential Revision: http://reviews.llvm.org/D7082 llvm-svn: 226793	2015-01-22 08:29:18 +00:00
Adrian Prantl	565cc18d8f	Reapply: Teach SROA how to update debug info for fragmented variables. This reapplies r225379. ChangeLog: - The assertion that this commit previously ran into about the inability to handle indirect variables has since been removed and the backend can handle this now. - Testcases were upgrade to the new MDLocation format. - Instead of keeping a DebugDeclares map, we now use llvm::FindAllocaDbgDeclare(). Original commit message follows. Debug info: Teach SROA how to update debug info for fragmented variables. This allows us to generate debug info for extremely advanced code such as typedef struct { long int a; int b;} S; int foo(S s) { return s.b; } which at -O1 on x86_64 is codegen'd into define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 { ret i32 %s.coerce1, !dbg !24 } with this patch we emit the following debug info for this TAG_formal_parameter [3] AT_location( 0x00000000 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 ) AT_name( "s" ) AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" ) Thanks to chandlerc, dblaikie, and echristo for their feedback on all previous iterations of this patch! llvm-svn: 226598	2015-01-20 19:42:22 +00:00
Chandler Carruth	d450056c78	[PM] Replace the Pass argument to SplitEdge with specific analyses used and updated. This may appear to remove handling for things like alias analysis when splitting critical edges here, but in fact no callers of SplitEdge relied on this. Similarly, all of them wanted to preserve LCSSA if there was any update of the loop info. That makes the interface much simpler. With this, all of BasicBlockUtils.h is free of Pass arguments and prepared for the new pass manager. This is tho majority of utilities that relied on pass arguments. llvm-svn: 226459	2015-01-19 12:36:53 +00:00
Chandler Carruth	f8753fc48d	[PM] Cleanup a dead option to critical edge splitting that I noticed while refactoring this API for the new pass manager. No functionality changed here, the code didn't actually support this option. llvm-svn: 226457	2015-01-19 12:12:00 +00:00
Chandler Carruth	37df2cfbf8	[PM] Remove the Pass argument from all of the critical edge splitting APIs and replace it and numerous booleans with an option struct. The critical edge splitting API has a really large surface of flags and so it seems worth burning a small option struct / builder. This struct can be constructed with the various preserved analyses and then flags can be flipped in a builder style. The various users are now responsible for directly passing along their analysis information. This should be enough for the critical edge splitting to work cleanly with the new pass manager as well. This API is still pretty crufty and could be cleaned up a lot, but I've focused on this change just threading an option struct rather than a pass through the API. llvm-svn: 226456	2015-01-19 12:09:11 +00:00
Chandler Carruth	0eae112009	[PM] Lift the analyses into the interface for SplitLandingPadPredecessors and remove the Pass argument from its interface. Another step to the utilities being usable with both old and new pass managers. llvm-svn: 226426	2015-01-19 03:03:39 +00:00
Chandler Carruth	b5797b659f	[PM] Pull the analyses used for another utility routine into its API rather than relying on the pass object. This one is a bit annoying, but will pay off. First, supporting this one will make the next one much easier, and for utilities like LoopSimplify, this is moving them (slowly) closer to not having to pass the pass object around throughout their APIs. llvm-svn: 226396	2015-01-18 09:21:15 +00:00
Chandler Carruth	32c52c7e04	[PM] Sink the specific analyses preserved by SplitBlock into its interface, removing Pass from its interface. This also makes those analyses optional so that passes which don't even preserve these (or use them) can skip the logic entirely. llvm-svn: 226394	2015-01-18 02:39:37 +00:00
Chandler Carruth	b5c115357c	[PM] Replace another Pass argument with specific analyses that are optionally updated by MergeBlockIntoPredecessors. No functionality changed, just refactoring to clear the way for the new pass manager. llvm-svn: 226392	2015-01-18 02:11:23 +00:00
Chandler Carruth	94209094a5	[PM] Refactor how the LoopRotation pass access the DominatorTree. Instead of querying the pass every where we need to, do that once and cache a pointer in the pass object. This is both simpler and I'm about to add yet another place where we need to dig out that pointer. llvm-svn: 226391	2015-01-18 02:08:05 +00:00
Chandler Carruth	691addc25f	[PM] Now that LoopInfo isn't in the Pass type hierarchy, it is much cleaner to derive from the generic base. Thise removes a ton of boiler plate code and somewhat strange and pointless indirections. It also remove a bunch of the previously needed friend declarations. To fully remove these, I also lifted the verify logic into the generic LoopInfoBase, which seems good anyways -- it is generic and useful logic even for the machine side. llvm-svn: 226385	2015-01-18 01:25:51 +00:00
Chandler Carruth	4f8f307c77	[PM] Split the LoopInfo object apart from the legacy pass, creating a LoopInfoWrapperPass to wire the object up to the legacy pass manager. This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration. llvm-svn: 226373	2015-01-17 14:16:18 +00:00
Mehdi Amini	590a2700fc	Fix Reassociate handling of constant in presence of undef float http://reviews.llvm.org/D6993 llvm-svn: 226245	2015-01-16 03:00:58 +00:00
Sanjoy Das	a1837a342d	Add a new pass "inductive range check elimination" IRCE eliminates range checks of the form 0 <= A * I + B < Length by splitting a loop's iteration space into three segments in a way that the check is completely redundant in the middle segment. As an example, IRCE will convert len = < known positive > for (i = 0; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } to len = < known positive > limit = smin(n, len) // no first segment for (i = 0; i < limit; i++) { if (0 <= i && i < len) { // this check is fully redundant do_something(); } else { throw_out_of_bounds(); } } for (i = limit; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } IRCE can deal with multiple range checks in the same loop (it takes the intersection of the ranges that will make each of them redundant individually). Currently IRCE does not do any profitability analysis. That is a TODO. Please note that the status of this pass is experimental, and it is not part of any default pass pipeline. Having said that, I will love to get feedback and general input from people interested in trying this out. This pass was originally r226201. It was reverted because it used C++ features not supported by MSVC 2012. Differential Revision: http://reviews.llvm.org/D6693 llvm-svn: 226238	2015-01-16 01:03:22 +00:00
Sanjoy Das	7f62ac8e4d	Revert r226201 (Add a new pass "inductive range check elimination") The change used C++11 features not supported by MSVC 2012. I will fix the change to use things supported MSVC 2012 and recommit shortly. llvm-svn: 226216	2015-01-15 22:18:10 +00:00
David Majnemer	f1f72c9e43	InductiveRangeCheckElimination: Remove extra ';' This silences a GCC warning. llvm-svn: 226215	2015-01-15 21:55:16 +00:00
Sanjoy Das	7059e2959d	Add a new pass "inductive range check elimination" IRCE eliminates range checks of the form 0 <= A * I + B < Length by splitting a loop's iteration space into three segments in a way that the check is completely redundant in the middle segment. As an example, IRCE will convert len = < known positive > for (i = 0; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } to len = < known positive > limit = smin(n, len) // no first segment for (i = 0; i < limit; i++) { if (0 <= i && i < len) { // this check is fully redundant do_something(); } else { throw_out_of_bounds(); } } for (i = limit; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } IRCE can deal with multiple range checks in the same loop (it takes the intersection of the ranges that will make each of them redundant individually). Currently IRCE does not do any profitability analysis. That is a TODO. Please note that the status of this pass is experimental, and it is not part of any default pass pipeline. Having said that, I will love to get feedback and general input from people interested in trying this out. Differential Revision: http://reviews.llvm.org/D6693 llvm-svn: 226201	2015-01-15 20:45:46 +00:00
Chandler Carruth	b98f63dbdb	[PM] Separate the TargetLibraryInfo object from the immutable pass. The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157	2015-01-15 10:41:28 +00:00
NAKAMURA Takumi	24ebfcb619	Update libdeps since TLI was moved from Target to Analysis in r226078. llvm-svn: 226126	2015-01-15 05:21:00 +00:00
Chandler Carruth	62d4215baa	[PM] Move TargetLibraryInfo into the Analysis library. While the term "Target" is in the name, it doesn't really have to do with the LLVM Target library -- this isn't an abstraction which LLVM targets generally need to implement or extend. It has much more to do with modeling the various runtime libraries on different OSes and with different runtime environments. The "target" in this sense is the more general sense of a target of cross compilation. This is in preparation for porting this analysis to the new pass manager. No functionality changed, and updates inbound for Clang and Polly. llvm-svn: 226078	2015-01-15 02:16:27 +00:00
Ramkumar Ramachandra	40c3e03e27	Standardize {pred,succ,use,user}_empty() The functions {pred,succ,use,user}_{begin,end} exist, but many users have to check _begin() with _end() by hand to determine if the BasicBlock or User is empty. Fix this with a standard *_empty(), demonstrating a few usecases. llvm-svn: 225760	2015-01-13 03:46:47 +00:00
Sanjay Patel	db8e6f472e	fix typo; NFC llvm-svn: 225753	2015-01-13 01:51:52 +00:00
Sanjay Patel	06d5589a84	80-cols; NFC llvm-svn: 225700	2015-01-12 21:21:28 +00:00
Sanjay Patel	5f1d9eaad3	GVN: propagate equalities for floating point compares Allow optimizations based on FP comparison values in the same way as integers. This resolves PR17713: http://llvm.org/bugs/show_bug.cgi?id=17713 Differential Revision: http://reviews.llvm.org/D6911 llvm-svn: 225660	2015-01-12 19:29:48 +00:00
Hal Finkel	38dd590861	[LoopUnroll] Fix the partial unrolling threshold for small loop sizes When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation. llvm-svn: 225565	2015-01-10 00:30:55 +00:00
Tim Northover	eb16112e97	Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" It's not really expected to stick around, last time it provoked a weird LTO build failure that I can't reproduce now, and the bot logs are long gone. I'll re-revert it if the failures recur. Original description: Perform Scalar PRE on gep indices that feed loads before doing Load PRE. llvm-svn: 225536	2015-01-09 19:19:56 +00:00
Philip Reames	567feb98f0	[Refactor] Have getNonLocalPointerDependency take the query instruction Previously, MemoryDependenceAnalysis::getNonLocalPointerDependency was taking a list of properties about the instruction being queried. Since I'm about to need one more property to be passed down through the infrastructure - I need to know a query instruction is non-volatile in an inner helper - fix the interface once and for all. I also added some assertions and behaviour clarifications around volatile and ordered field accesses. At the moment, this is mostly to document expected behaviour. The only non-standard instructions which can currently reach this are atomic, but unordered, loads and stores. Neither ordered or volatile accesses can reach here. The call in GVN is protected by an isSimple check when it first considers the load. The calls in MemDepPrinter are protected by isUnordered checks. Both utilities also check isVolatile for loads and stores. llvm-svn: 225481	2015-01-09 00:04:22 +00:00
Adrian Prantl	2561bb8831	Revert "Reapply: Teach SROA how to update debug info for fragmented variables." This reverts commit r225379 while investigating an assertion failure reported by Alexey. llvm-svn: 225424	2015-01-08 02:02:00 +00:00
Adrian Prantl	72b8ee708f	Reapply: Teach SROA how to update debug info for fragmented variables. The two buildbot failures were addressed in LLVM r225378 and CFE r225359. This rapplies commit 225272 without modifications. llvm-svn: 225379	2015-01-07 20:52:22 +00:00
Adrian Prantl	52f943b536	Revert "Reapply: Teach SROA how to update debug info for fragmented variables." because of a tsan buildbot failure. This reverts commit 225272. Fix should be coming soon. llvm-svn: 225288	2015-01-06 19:47:27 +00:00
Adrian Prantl	8335a5724a	Reapply: Teach SROA how to update debug info for fragmented variables. This also rolls in the changes discussed in http://reviews.llvm.org/D6766. Defers migrating the debug info for new allocas until after all partitions are created. Thanks to Chandler for reviewing! llvm-svn: 225272	2015-01-06 17:14:10 +00:00
Chandler Carruth	73b0164fe5	[SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an assert out of the new pre-splitting in SROA. This fix makes the code do what was originally intended -- when we have a store of a load both dealing in the same alloca, we force them to both be pre-split with identical offsets. This is really quite hard to do because we can keep discovering problems as we go along. We have to track every load over the current alloca which for any resaon becomes invalid for pre-splitting, and go back to remove all stores of those loads. I've included a couple of test cases derived from PR22093 that cover the different ways this can happen. While that PR only really triggered the first of these two, its the same fundamental issue. The other challenge here is documented in a FIXME now. We end up being quite a bit more aggressive for pre-splitting when loads and stores don't refer to the same alloca. This aggressiveness comes at the cost of introducing potentially redundant loads. It isn't clear that this is the right balance. It might be considerably better to require that we only do pre-splitting when we can presplit every load and store involved in the entire operation. That would give more consistent if conservative results. Unfortunately, it requires a non-trivial change to the actual pre-splitting operation in order to correctly handle cases where we end up pre-splitting stores out-of-order. And it isn't 100% clear that this is the right direction, although I'm starting to suspect that it is. llvm-svn: 225149	2015-01-05 04:17:53 +00:00
Chandler Carruth	66b3130cda	[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the many utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131	2015-01-04 12:03:27 +00:00
Chandler Carruth	24ac830d7c	[SROA] Teach SROA to be more aggressive in splitting now that we have a pre-splitting pass over loads and stores. Historically, splitting could cause enough problems that I hamstrung the entire process with a requirement that splittable integer loads and stores must cover the entire alloca. All smaller loads and stores were unsplittable to prevent chaos from ensuing. With the new pre-splitting logic that does load/store pair splitting I introduced in r225061, we can now very nicely handle arbitrarily splittable loads and stores. In order to fully benefit from these smarts, we need to mark all of the integer loads and stores as splittable. However, we don't actually want to rewrite partitions with all integer loads and stores marked as splittable. This will fail to extract scalar integers from aggregates, which is kind of the point of SROA. =] In order to resolve this, what we really want to do is only do pre-splitting on the alloca slices with integer loads and stores fully splittable. This allows us to uncover all non-integer uses of the alloca that would benefit from a split in an integer load or store (and where introducing the split is safe because it is just memory transfer from a load to a store). Once done, we make all the non-whole-alloca integer loads and stores unsplittable just as they have historically been, repartition and rewrite. The result is that when there are integer loads and stores anywhere within an alloca (such as from a memcpy of a sub-object of a larger object), we can split them up if there are non-integer components to the aggregate hiding beneath. I've added the challenging test cases to demonstrate how this is able to promote to scalars even a case where we have even partially overlapping loads and stores. This restores the single-store behavior for small arrays of i8s which is really nice. I've restored both the little endian testing and big endian testing for these exactly as they were prior to r225061. It also forced me to be more aggressive in an alignment test to actually defeat SROA. =] Without the added volatiles there, we actually split up the weird i16 loads and produce nice double allocas with better alignment. This also uncovered a number of bugs where we failed to handle splittable load and store slices which didn't have a begininng offset of zero. Those fixes are included, and without them the existing test cases explode in glorious fireworks. =] I've kept support for leaving whole-alloca integer loads and stores as splittable even for the purpose of rewriting, but I think that's likely no longer needed. With the new pre-splitting, we might be able to remove all the splitting support for loads and stores from the rewriter. Not doing that in this patch to try to isolate any performance regressions that causes in an easy to find and revert chunk. llvm-svn: 225074	2015-01-02 03:55:54 +00:00
Chandler Carruth	5986b541d4	[SROA] Make the computation of adjusted pointers not leak GEP instructions. I noticed this when working on dialing up how aggressively we can pre-split loads and stores. My test case wasn't passing because dead GEPs into the allocas persisted when they were built by this routine. This isn't terribly harmful, we still rewrote and promoted the alloca and I can't conceive of how to cause this to happen in a case where we will keep the exact same alloca but rewrite and promote the uses of it. If that ever happened, we'd get an assert out of mem2reg. So I don't have a direct test case yet, but the subsequent commit's test case wouldn't pass without this. There are other problems fixed by this patch that I spotted purely by inspection such as the fact that getAdjustedPtr could have actually deleted dead base pointers. I don't know how to get a base pointer to go into getAdjustedPtr today, so I think this bug could never have manifested (and I certainly can't write a test case for it) but, it wasn't the intent of the code. The code really just wanted to GC the new instructions built. That can be done more directly by comparing with the base pointer which is the only non-new instruction that this code can return. llvm-svn: 225073	2015-01-02 02:47:38 +00:00
Chandler Carruth	29c22fae46	[SROA] Fix the loop exit placement to be prior to indexing the splits array. This prevents it from walking out of bounds on the splits array. Bug found with the existing tests by ASan and by the MSVC debug build. llvm-svn: 225069	2015-01-02 00:10:22 +00:00
Chandler Carruth	c39eaa5041	[SROA] Fix two total think-os in r225061 that should have been caught on a +asserts bootstrap, but my bootstrap had asserts off. Oops. Anyways, in some places it is reasonable to cast (as a sanity check) the pointer operand to a load or store to an instruction within SROA -- namely when the pointer operand is expected to be derived from an alloca, and thus always an instruction. However, the pre-splitting code also deals with loads and stores to non-alloca pointers and there we need to just use the Value*. Nothing about the code relied on the instruction cast, it was only there essentially as an invariant assertion. Remove the two that don't actually hold. This should fix the proximate issue in PR22080, but I'm also doing an asserts bootstrap myself to see if there are other issues lurking. I'll craft a reduced test case in a moment, but I wanted to get the tree healthy as quickly as possible. llvm-svn: 225068	2015-01-01 23:26:16 +00:00
Chandler Carruth	6044c0bc78	[SROA] Switch to using a more direct debug logging technique in one part of my new load and store splitting, and fix a bug where it logged a totally irrelevant slice rather than the actual slice in question. The logging here previously worked because we used to place new slices onto the back of the core sequence, but that caused other problems. I updated the actual code to store new slices in their own vector but didn't update the logging. There isn't a good way to reuse the logging any more, and frankly it wasn't needed. We can directly log this bit more easily. llvm-svn: 225063	2015-01-01 12:56:47 +00:00
Chandler Carruth	994cde8869	[SROA] Fix formatting with clang-format which I managed to fail to do prior to committing r225061. Sorry for that. llvm-svn: 225062	2015-01-01 12:01:03 +00:00
Chandler Carruth	0715cba02d	[SROA] Teach SROA how to much more intelligently handle split loads and stores. When there are accesses to an entire alloca with an integer load or store as well as accesses to small pieces of the alloca, SROA splits up the large integer accesses. In order to do that, it uses bit math to merge the small accesses into large integers. While this is effective, it produces insane IR that can cause significant problems in the rest of the optimizer: - It can cause load and store mismatches with GVN on the non-alloca side where we end up loading an i64 (or some such) rather than loading specific elements that are stored. - We can't always get rid of the integer bit math, which is why we can't always fix the loads and stores to work well with GVN. - This is especially bad when we have operations that mix poorly with integer bit math such as floating point operations. - It will block things like the vectorizer which might be able to handle the scalar stores that underly the aggregate. At the same time, we can't just directly split up these loads and stores in all cases. If there is actual integer arithmetic involved on the values, then using integer bit math is actually the perfect lowering because we can often combine it heavily with the surrounding math. The solution this patch provides is to find places where SROA is partitioning aggregates into small elements, and look for splittable loads and stores that it can split all the way to some other adjacent load and store. These are uniformly the cases where failing to split the loads and stores hurts the optimizer that I have seen, and I've looked extensively at the code produced both from more and less aggressive approaches to this problem. However, it is quite tricky to actually do this in SROA. We may have loads and stores to the same alloca, or other complex patterns that are hard to handle. This complexity leads to the somewhat subtle algorithm implemented here. We have to do this entire process as a separate pass over the partitioning of the alloca, and split up all of the loads prior to splitting the stores so that we can handle safely the cases of overlapping, including partially overlapping, loads and stores to the same alloca. We also have to reconstitute the post-split slice configuration so we can avoid iterating again over all the alloca uses (the slow part of SROA). But we also have to ensure that when we split up loads and stores to other allocas, we do re-iterate over them in SROA to adapt to the more refined partitioning now required. With this, I actually think we can fix a long-standing TODO in SROA where I avoided splitting as many loads and stores as probably should be splittable. This limitation historically mitigated the fallout of all the bad things mentioned above. Now that we have more intelligent handling, I plan to remove the FIXME and more aggressively mark integer loads and stores as splittable. I'll do that in a follow-up patch to help with bisecting any fallout. The net result of this change should be more fine-grained and accurate scalars being formed out of aggregates. At the very least, Clang now generates perfect code for this high-level test case using std::complex<float>: #include <complex> void g1(std::complex<float> &x, float a, float b) { x += std::complex<float>(a, b); } void g2(std::complex<float> &x, float a, float b) { x -= std::complex<float>(a, b); } void foo(const std::complex<float> &x, float a, float b, std::complex<float> &x1, std::complex<float> &x2) { std::complex<float> l1 = x; g1(l1, a, b); std::complex<float> l2 = x; g2(l2, a, b); x1 = l1; x2 = l2; } This code isn't just hypothetical either. It was reduced out of the hot inner loops of essentially every part of the Eigen math library when using std::complex<float>. Those loops would consistently and pervasively hop between the floating point unit and the integer unit due to bit math extraction and insertion of floating point values that were "stored" in a 64-bit integer register around the loop backedge. So far, this change has passed a bootstrap and I have done some other testing and so far, no issues. That doesn't mean there won't be though, so I'll be prepared to help with any fallout. If you performance swings in particular, please let me know. I'm very curious what all the impact of this change will be. Stay tuned for the follow-up to also split more integer loads and stores. llvm-svn: 225061	2015-01-01 11:54:38 +00:00
Philip Reames	b35f46ce06	Refine the notion of MayThrow in LICM to include a header specific version In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up. This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs. define void @nothrow_header(i64 %x, i64 %y, i1 %cond) { ; CHECK-LABEL: nothrow_header ; CHECK-LABEL: entry ; CHECK: %div = udiv i64 %x, %y ; CHECK-LABEL: loop ; CHECK: call void @use(i64 %div) entry: br label %loop loop: ; preds = %entry, %for.inc %div = udiv i64 %x, %y br i1 %cond, label %loop-if, label %exit loop-if: call void @use(i64 %div) br label %loop exit: ret void } The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load. The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata. Differential Revision: http://reviews.llvm.org/D6725 llvm-svn: 224965	2014-12-29 23:00:57 +00:00
Chandler Carruth	ffb7ce56a6	[SROA] Update the documentation and names for accessing the slices within a partition of an alloca in SROA. This reflects the fact that the organization of the slices isn't really ideal for analysis, but is the naive way in which the slices are available while we're processing them in the core partitioning algorithm. It is possible we could improve matters, and I've left a FIXME with one of my ideas for how to do this, but it is a lot of work, the benefit is somewhat minor, and it isn't clear that it would be strictly better. =/ Not really satisfying, but I'm out of really good ideas. This also improves one place where the debug logging failed to mark some split partitions. Now we log in one place, slightly later, and with accurate information about whether the slice is split by the partition being rewritten. llvm-svn: 224800	2014-12-24 01:48:09 +00:00
Chandler Carruth	5031bbe86a	[SROA] Refactor the integer and vector promotion testing logic to operate in terms of the new Partition class, and generally have a more clear set of arguments. No functionality changed. The most notable improvements here are consistently using the terminology of 'partition' for a collection of slices that will be rewritten together and 'slice' for a region of an alloca that is used by a particular instruction. This also makes it more clear that the split things are actually slices as well, just ones that will be split by the proposed partition. This doesn't yet address the confusing aspects of the partition's interface where slices that will be split by the partition and start prior to the partition are accesssed via Partition::splitSlices() while the core range of slices exposed by a Partition includes both unsplit slices and slices which will be split by the end, but started within the offset range of the partition. This is particularly hard to address because the algorithm which computes partitions quite literally doesn't know which slices these will end up being until too late. I'm looking at whether I can fix that or not, but I'm not optimistic. I'll update the comments and/or names to further explain this either way. I've also added one FIXME in this patch relating to this confusion so that I don't forget about it. llvm-svn: 224798	2014-12-24 01:05:14 +00:00
Chandler Carruth	c7d1e24b34	Revert r224739: Debug info: Teach SROA how to update debug info for fragmented variables. This caused codegen to start crashing when we built somewhat large programs with debug info and optimizations. 'check-msan' hit in, and I suspect a bootstrap would as well. I mailed a test case to the review thread. llvm-svn: 224750	2014-12-23 02:58:14 +00:00
Chandler Carruth	e2f66ceed9	[SROA] Lift the logic for traversing the alloca slices one partition at a time into a partition iterator and a Partition class. There is a lot of knock-on simplification that this enables, largely stemming from having a Partition object to refer to in lots of helpers. I've only done a minimal amount of that because enoguh stuff is changing as-is in this commit. This shouldn't change any observable behavior. I've worked hard to preserve the exact traversal semantics which were originally present even though some of them make no sense. I'll be changing some of this in subsequent commits now that the logic is carefully factored into a reusable place. The primary motivation for this change is to break the rewriting into phases in order to support more intelligent rewriting. For example, I'm planning to change how split loads and stores are rewritten to remove the significant overuse of integer bit packing in the resulting code and allow more effective secondary splitting of aggregates. For any of this to work, they have to share the exact traversal logic. llvm-svn: 224742	2014-12-22 22:46:00 +00:00
Bruno Cardoso Lopes	bad65c3b70	[LCSSA] Handle PHI insertion in disjoint loops Take two disjoint Loops L1 and L2. LoopSimplify fails to simplify some loops (e.g. when indirect branches are involved). In such situations, it can happen that an exit for L1 is the header of L2. Thus, when we create PHIs in one of such exits we are also inserting PHIs in L2 header. This could break LCSSA form for L2 because these inserted PHIs can also have uses in L2 exits, which are never handled in the current implementation. Provide a fix for this corner case and test that we don't assert/crash on that. Differential Revision: http://reviews.llvm.org/D6624 rdar://problem/19166231 llvm-svn: 224740	2014-12-22 22:35:46 +00:00
Adrian Prantl	a47ace5901	Debug info: Teach SROA how to update debug info for fragmented variables. This allows us to generate debug info for extremely advanced code such as typedef struct { long int a; int b;} S; int foo(S s) { return s.b; } which at -O1 on x86_64 is codegen'd into define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 { ret i32 %s.coerce1, !dbg !24 } with this patch we emit the following debug info for this TAG_formal_parameter [3] AT_location( 0x00000000 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 ) AT_name( "s" ) AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" ) Thanks to chandlerc, dblaikie, and echristo for their feedback on all previous iterations of this patch! llvm-svn: 224739	2014-12-22 22:26:00 +00:00
Chandler Carruth	113dc64c67	[SROA] Run clang-format over the entire SROA pass as I wrote it before much of the glory of clang-format, and now any time I touch it I risk introducing formatting changes as part of a functional commit. Also, clang-format is way better at formatting my code than I am. Most of this is a huge improvement although I reverted a couple of places where I hit a clang-format bug with lambdas that has been filed but not (fully) fixed. llvm-svn: 224666	2014-12-20 02:39:18 +00:00
Chandler Carruth	68ea415d04	[SROA] Cleanup - remove the use of std::mem_fun_ref nonsense and use a lambda now that we have them. llvm-svn: 224500	2014-12-18 05:19:47 +00:00
Elena Demikhovsky	a5599bfd72	Sink store based on alias analysis - by Ella Bolshinsky The alias analysis is used define whether the given instruction is a barrier for store sinking. For 2 identical stores, following instructions are checked in the both basic blocks, to determine whether they are sinking barriers. http://reviews.llvm.org/D6420 llvm-svn: 224247	2014-12-15 14:09:53 +00:00
Chad Rosier	78943bcc18	[Reassociate] Use dbgs() instead of errs(). llvm-svn: 224125	2014-12-12 14:44:12 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Tom Stellard	1f0dded057	StructurizeCFG: Use LoopInfo analysis for better loop detection We were assuming that each back-edge in a region represented a unique loop, which is not always the case. We need to use LoopInfo to correctly determine which back-edges are loops. llvm-svn: 223199	2014-12-03 04:28:32 +00:00
Bruno Cardoso Lopes	d035fbb96f	[LICM] Avoind store sinking if no preheader is available Load instructions are inserted into loop preheaders when sinking stores and later removed if not used by the SSA updater. Avoid sinking if the loop has no preheader and avoid crashes. This fixes one more side effect of not handling indirectbr instructions properly on LoopSimplify. llvm-svn: 223119	2014-12-02 14:22:34 +00:00
Bruno Cardoso Lopes	46d5bf2982	[LICM] Store sink and indirectbr instructions Loop simplify skips exit-block insertion when exits contain indirectbr instructions. This leads to an assertion in LICM when trying to sink stores out of non-dedicated loop exits containing indirectbr instructions. This patch fix this issue by re-checking for dedicated exits in LICM prior to store sink attempts. Differential Revision: http://reviews.llvm.org/D6414 rdar://problem/18943047 llvm-svn: 222927	2014-11-28 19:47:46 +00:00
Chandler Carruth	1a3c2c414c	Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite clearly only exactly equal width ptrtoint and inttoptr casts are no-op casts, it says so right there in the langref. Make the code agree. Original log from r220277: Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 222739	2014-11-25 08:20:27 +00:00
David Majnemer	1f44142e4e	This Reassociate change unintentionally slipped in r222499 llvm-svn: 222500	2014-11-21 02:37:38 +00:00
David Majnemer	c0a313b57c	SROA: The alloca type isn't a candidate promotion type for vectors The alloca's type is irrelevant, only those types which are used in a load or store of the exact size of the slice should be considered. This manifested as an assertion failure when we compared the various types: we had a size mismatch. This fixes PR21480. llvm-svn: 222499	2014-11-21 02:34:55 +00:00
Chad Rosier	90a2f9b110	Revert "[Reassociate] As the expression tree is rewritten make sure the operands are" This reverts commit r222142. This is causing/exposing an execution-time regression in spec2006/gcc and coremark on AArch64/A57/Ofast. Conflicts: test/Transforms/Reassociate/optional-flags.ll llvm-svn: 222398	2014-11-19 23:21:20 +00:00
Arnaud A. de Grandmaison	7b9dc28060	Fix tail recursion elimination When the BasicBlock containing the return instrution has a PHI with 2 incoming values, FoldReturnIntoUncondBranch will remove the no longer used incoming value and remove the no longer needed phi as well. This leaves us with a BB that no longer has a PHI, but the subsequent call to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not remove the return instruction (which still uses the result of the call instruction). This prevents EliminateRecursiveTailCall to remove the value, as it is still being used in a basicblock which has no predecessors. The basicblock can not be erased on the spot, because its iterator is still being used in runTRE. This issue was exposed when removing the threshold on size for lifetime marker insertion for named temporaries in clang. The testcase is a much reduced version of peelOffOuterExpr(const Expr, const ExplodedNode ) from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp. llvm-svn: 222354	2014-11-19 13:32:51 +00:00
David Blaikie	70573dcd9f	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
Hao Liu	1d2a061bd8	[SeparateConstOffsetFromGEP] Allow SeparateConstOffsetFromGEP pass to lower GEPs. If LowerGEP is enabled, it can lower a GEP with multiple indices into GEPs with a single index or arithmetic operations. Lowering GEPs can always extract structure indices. Lowering GEPs can also give use more optimization opportunities. It can benefit passes like CSE, LICM and CGP. Reviewed in http://reviews.llvm.org/D5864 llvm-svn: 222328	2014-11-19 06:24:44 +00:00
Manman Ren	c67109313c	Revert r222039 because of bot failure. http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/ Hopefully, bot will be green. If not, we will re-submit the commit. llvm-svn: 222287	2014-11-19 00:13:26 +00:00
Chad Rosier	e53e8c8e58	[Reassociate] Rename local variable to not use same name as a member variable. NFC. llvm-svn: 222248	2014-11-18 20:21:54 +00:00
Philip Reames	018dbf18c4	Tweak EarlyCSE to recognize series of dead stores EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!). Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D6301 llvm-svn: 222241	2014-11-18 17:46:32 +00:00
David Majnemer	9a91e4a18a	IndVarSimplify: Allow LFTR to fire more often I added a pessimization in r217102 to prevent miscompiles when the incremented induction variable was used in a comparison; it would be poison. Try to use the incremented induction variable more often when we can be sure that the increment won't end in poison. Differential Revision: http://reviews.llvm.org/D6222 llvm-svn: 222213	2014-11-18 02:20:58 +00:00
Chad Rosier	bc0b869be9	[Reassociate] As the expression tree is rewritten make sure the operands are emitted in canonical form. llvm-svn: 222142	2014-11-17 16:33:50 +00:00
Chad Rosier	9a1ac6e494	[Reassociate] Canonicalize constants to RHS operand. Fix a thinko where the RHS was already a constant. llvm-svn: 222139	2014-11-17 15:52:51 +00:00
Chad Rosier	1ff4c0bf0b	Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" This commit updates the failing test in Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll The failing test is sensitive to the order in which we process loads. This version turns on the RPO traversal instead of the while DT traversal in GVN. The new test code is functionally same just the order of loads that are eliminated is swapped. This new version also fixes an issue where GVN splits a critical edge and potentially invalidate the RPO/DT iterator. llvm-svn: 222039	2014-11-14 21:09:13 +00:00
Chad Rosier	df8f2a23cb	[Reassociate] Canonicalize the operands of all binary operators. llvm-svn: 222008	2014-11-14 17:09:19 +00:00
Chad Rosier	d99df68e19	[Reassociate] Canonicalize operands of vector binary operators. Prior to this commit fmul and fadd binary operators were being canonicalized for both scalar and vector versions. We now canonicalize add, mul, and, or, and xor vector instructions. llvm-svn: 222006	2014-11-14 17:08:15 +00:00
Chad Rosier	f8b55f1bc5	[Reassociate] Canonicalize constants to RHS operand. llvm-svn: 222005	2014-11-14 17:05:59 +00:00
Chad Rosier	f59e548ba7	[Reassociate] Improve rank debug information. NFC. llvm-svn: 221999	2014-11-14 15:01:38 +00:00
Chad Rosier	8716b58583	Revert "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE." This reverts commit r221924. It appears the commit was a bit premature and is causing bot failures that need further investigation. llvm-svn: 221939	2014-11-13 22:54:59 +00:00
Chad Rosier	dd526665fc	[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE. Phabricator Revision: http://reviews.llvm.org/D6103 Patch by "Balaram Makam" <bmakam@codeaurora.org>! llvm-svn: 221924	2014-11-13 21:17:58 +00:00
Chad Rosier	9074b18785	[Reassociate] Update comment. NFC. llvm-svn: 221894	2014-11-13 15:40:20 +00:00
Jingyue Wu	8a12cea5f1	Disable indvar widening if arithmetics on the wider type are more expensive Summary: Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test was run whether NVPTX was enabled or not. IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. This test is run only when NVPTX is enabled. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221799	2014-11-12 18:09:15 +00:00
Jingyue Wu	a48273390c	Reverts r221772 which fails tests llvm-svn: 221773	2014-11-12 07:19:25 +00:00
Jingyue Wu	635a9b14fa	Disable indvar widening if arithmetics on the wider type are more expensive Summary: IndVarSimplify should not widen an indvar if arithmetics on the wider indvar are more expensive than those on the narrower indvar. For instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is twice as expensive as that on i32, because the hardware needs to simulate a 64-bit integer using two 32-bit integers. Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo. Fixes PR21148. Test Plan: Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics on the wider type are more expensive. Reviewers: jholewinski, eliben, meheff, atrick Reviewed By: atrick Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6196 llvm-svn: 221772	2014-11-12 06:58:45 +00:00
Chad Rosier	f53f07046b	[Reassociate] Canonicalize negative constants out of expressions. Add support for FDiv, which was regressed by the previous commit. llvm-svn: 221738	2014-11-11 23:36:42 +00:00
Chad Rosier	094ac7735b	[Reassociate] Canonicalize negative constants out of expressions. This is a reapplication of r221171, but we only perform the transformation on expressions which include a multiplication. We do not transform rem/div operations as this doesn't appear to be safe in all cases. llvm-svn: 221721	2014-11-11 22:58:35 +00:00
Duncan P. N. Exon Smith	de36e8040f	Revert "IR: MDNode => Value" Instead, we're going to separate metadata from the Value hierarchy. See PR21532. This reverts commit r221375. This reverts commit r221373. This reverts commit r221359. This reverts commit r221167. This reverts commit r221027. This reverts commit r221024. This reverts commit r221023. This reverts commit r220995. This reverts commit r220994. llvm-svn: 221711	2014-11-11 21:30:22 +00:00
Chad Rosier	b3eb452e83	[Reassociate] Better preserve NSW/NUW flags. Part of PR12985. Phabricator Revision: http://reviews.llvm.org/D6172 llvm-svn: 221555	2014-11-07 22:12:57 +00:00
David Majnemer	2098b86f64	SCCP: overdefined calls cannot become constant We would attempt to fold away a call instruction which had been marked overdefined. However, it's not valid to transition to constant from overdefined. This fixes PR21512. llvm-svn: 221513	2014-11-07 08:54:19 +00:00
Chad Rosier	ac6a2f532c	[Reassociate] Don't reassociate when mixing regular and fast-math FP instructions. Inlining might cause such cases and it's not valid to reassociate floating-point instructions without the unsafe algebra flag. Patch by Mehdi Amini <mehdi_amini@apple.com>! llvm-svn: 221462	2014-11-06 16:46:37 +00:00
Reid Kleckner	941e93e9a8	Revert "[Reassociate] Canonicalize negative constants out of expressions." This reverts commit r221171. It performs this invalid transformation: - %div.i = urem i64 -1, %add - %sub.i = sub i64 -2, %div.i + %div.i = urem i64 1, %add + %sub.i1 = add i64 %div.i, -2 llvm-svn: 221317	2014-11-04 23:42:45 +00:00
Reid Kleckner	dd3f3edafa	Revert "Transforms: reapply SVN r219899" This reverts commit r220811 and r220839. It made an incorrect change to musttail handling. llvm-svn: 221226	2014-11-04 02:02:14 +00:00
Hal Finkel	840257a49c	Use AA in LoadCombine LoadCombine can be smarter about aborting when a writing instruction is encountered, instead of aborting upon encountering any writing instruction, use an AliasSetTracker, and only abort when encountering some write that might alias with the loads that could potentially be combined. This was originally motivated by comments made (and a test case provided) by David Majnemer in response to PR21448. It turned out that LoadCombine was not responsible for that PR, but LoadCombine should also be improved so that unrelated stores (and @llvm.assume) don't interrupt load combining. llvm-svn: 221203	2014-11-03 23:19:16 +00:00
Hal Finkel	1e16fa302e	EarlyCSE should ignore calls to @llvm.assume EarlyCSE uses a simple generation scheme for handling memory-based dependencies, and calls to @llvm.assume (which are marked as writing to memory to ensure the preservation of control dependencies) disturb that scheme unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative (adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that). Fixes PR21448. llvm-svn: 221175	2014-11-03 20:21:32 +00:00
Chad Rosier	005505b027	[Reassociate] Canonicalize negative constants out of expressions. This gives CSE/GVN more options to eliminate duplicate expressions. This is a follow up patch to http://reviews.llvm.org/D4904. http://reviews.llvm.org/D5363 llvm-svn: 221171	2014-11-03 19:11:30 +00:00
Duncan P. N. Exon Smith	3d5a02f677	IR: MDNode => Value: Instruction::getAllMetadataOtherThanDebugLoc() Change `Instruction::getAllMetadataOtherThanDebugLoc()` from a vector of `MDNode` to one of `Value`. Part of PR21433. llvm-svn: 221167	2014-11-03 18:13:57 +00:00
Diego Novillo	fcd556074c	Use ErrorOr for the ::create factory on instrumented and sample profilers. Summary: As discussed in http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141027/242445.html, the creation of reader and writer instances is better done using ErrorOr. There are no functional changes, but several callers needed to be adjusted. Reviewers: bogner Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6076 llvm-svn: 221120	2014-11-03 00:51:45 +00:00
Elena Demikhovsky	27152aea88	Use Alias Analysis to hoist 2 loads from diamond to the common predecessor basic block. Alias Analysis allows to detect real barriers for load hoisting. Review in http://reviews.llvm.org/D5991 llvm-svn: 221091	2014-11-02 08:03:05 +00:00
Diego Novillo	77a5a5fcda	Fix Twine corruption problem with diagnostics. This fixes the autobuilders I broke with a recent patch. Thanks echristo and dblaikie for beating me with a clue stick. llvm-svn: 220918	2014-10-30 18:48:41 +00:00
Diego Novillo	c572e92c76	Add profile writing capabilities for sampling profiles. Summary: This patch finishes up support for handling sampling profiles in both text and binary formats. The new binary format uses uleb128 encoding to represent numeric values. This makes profiles files about 25% smaller. The profile writer class can write profiles in the existing text and the new binary format. In subsequent patches, I will add the capability to read (and perhaps write) profiles in the gcov format used by GCC. Additionally, I will be adding support in llvm-profdata to manipulate sampling profiles. There was a bit of refactoring needed to separate some code that was in the reader files, but is actually common to both the reader and writer. The new test checks that reading the same profile encoded as text or raw, produces the same results. Reviewers: bogner, dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6000 llvm-svn: 220915	2014-10-30 18:00:06 +00:00
Yi Jiang	ab19fff4d8	Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance llvm-svn: 220872	2014-10-29 20:19:47 +00:00
Saleem Abdulrasool	d178ada55e	Transforms: reapply SVN r219899 This restores the commit from SVN r219899 with an additional change to ensure that the CodeGen is correct for the case that was identified as being incorrect (originally PR7272). In the case that during inlining we need to synthesize a value on the stack (i.e. for passing a value byval), then any function involving that alloca must be stripped of its tailness as the restriction that it does not access the parent's stack no longer holds. Unfortunately, a single alloca can cause a rippling effect through out the inlining as the value may be aliased or may be mutated through an escaped external call. As such, we simply track if an alloca has been introduced in the frame during inlining, and strip any tail calls. llvm-svn: 220811	2014-10-28 18:27:37 +00:00
NAKAMURA Takumi	d0e13af22c	Reformat partially, where I touched for whitespace changes. llvm-svn: 220773	2014-10-28 11:54:52 +00:00
NAKAMURA Takumi	5af50a5470	LoopRerollPass.cpp: Use range-based loop. NFC. llvm-svn: 220772	2014-10-28 11:54:05 +00:00
NAKAMURA Takumi	335a7bcf1e	Untabify and whitespace cleanups. llvm-svn: 220771	2014-10-28 11:53:30 +00:00
Andrew Trick	dd925ad218	LSR: Minor cleanup after Daniel's patch. Combine the Inserted an Done sets into a Visited set. llvm-svn: 220623	2014-10-25 19:59:30 +00:00
Andrew Trick	9ccbed5a12	Fix LSR compile time. This is a simple fix that brings the compilation time from 5min to 5s on a specific real-world example. It's a large chain of computation in a crypto routine (always a problem for SCEV). A unit test is not feasible and there would be no way to check it. The fix is just basic good practice for dealing with SCEVs, there's no risk of regression. Patch by Daniel Reynaud! llvm-svn: 220622	2014-10-25 19:42:07 +00:00
Jingyue Wu	fe72fcebf6	[SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo The dividend in "signed % unsigned" is treated as unsigned instead of signed, causing unexpected behavior such as -64 % (uint64_t)24 == 0. Added a regression test in split-gep.ll Patched by Hao Liu. llvm-svn: 220618	2014-10-25 18:34:03 +00:00
Jingyue Wu	b723152379	[SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions The two operands of the new OR expression should be NextInChain and TheOther instead of the two original operands. Added a regression test in split-gep.ll. Hao Liu reported this bug, and provded the test case and an initial patch. Thanks! llvm-svn: 220615	2014-10-25 17:36:21 +00:00
Timur Iskhodzhanov	eb229ca928	Make getDISubprogram(const Function *F) available in LLVM Reviewed at http://reviews.llvm.org/D5950 llvm-svn: 220536	2014-10-23 23:46:28 +00:00
Diego Novillo	19e7b7e27c	Shorten auto iterators for function basic blocks. Use consistent naming for basic block instances. No functional changes. llvm-svn: 220404	2014-10-22 18:39:50 +00:00
Diego Novillo	b368b7d558	Use auto iteration in lib/Transforms/Scalar/SampleProfile.cpp. No functional changes. llvm-svn: 220394	2014-10-22 16:51:50 +00:00
Diego Novillo	a67c0b43e1	Change error to warning when a profile cannot be found. When the profile for a function cannot be applied, we use to emit an error. This seems extreme. The compiler can continue, it's just that the optimization opportunities won't include profile information. llvm-svn: 220386	2014-10-22 13:36:35 +00:00
Diego Novillo	8027b80b41	Support using sample profiles with partial debug info. Summary: When using a profile, we used to require the use -gmlt so that we could get access to the line locations. This is used to match line numbers in the input profile to the line numbers in the function's IR. But this is actually not necessary. The driver can provide source location tracking without the emission of debug information. In these cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the actual line location annotations are still present. This patch adds a new way of looking for the start of the current function. Instead of looking through the compile units in llvm.dbg.cu, we can walk up the scope for the first instruction in the function with a debug loc. If that describes the function, we use it. Otherwise, we keep looking until we find one. If no such instruction is found, we then give up and produce an error. Reviewers: echristo, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5887 llvm-svn: 220382	2014-10-22 12:59:00 +00:00
Hans Wennborg	0b39fc0d16	Revert "Teach the load analysis to allow finding available values which require" (r220277) This seems to have caused PR21330. llvm-svn: 220349	2014-10-21 23:49:52 +00:00
Chandler Carruth	aa72a6dd3b	Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 220277	2014-10-21 09:00:40 +00:00
Philip Reames	5a3f5f751b	Introduce enum values for previously defined metadata types. (NFC) Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types: + MD_nontemporal = 9, // "nontemporal" + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access" + MD_nonnull = 11 // "nonnull" I went through an updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248	2014-10-21 00:13:20 +00:00
Chandler Carruth	6665d62117	Fix a somewhat subtle pair of issues with JumpThreading I introduced in r220178. First, the creation routine doesn't insert prior to the terminator of the basic block provided, but really at the end of the basic block. Instead, get the terminator and insert before that. The next issue was that we need to ensure multiple PHI node entries for a single predecessor re-use the same cast instruction rather than creating new ones. All of the logic here was without tests previously. I've reduced and added a test case from the test suite that crashed without both of these fixes. llvm-svn: 220186	2014-10-20 05:34:36 +00:00
Chandler Carruth	eeec35ae1c	Teach the load analysis driving core instcombine logic and other bits of logic to look through pointer casts, making them trivially stronger in the face of loads and stores with intervening pointer casts. I've included a few test cases that demonstrate the kind of folding instcombine can do without pointer casts and then variations which obfuscate the logic through bitcasts. Without this patch, the variations all fail to optimize fully. This is more important now than it has been in the past as I've started moving the load canonicialization to more closely follow the value type requirements rather than the pointer type requirements and thus this needs to be prepared for more pointer casts. When I made the same change to stores several test cases regressed without logic along these lines so I wanted to systematically improve matters first. llvm-svn: 220178	2014-10-20 00:24:14 +00:00
Chandler Carruth	a801dd5799	Fix a long-standing miscompile in the load analysis that was uncovered by my refactoring of this code. The method isSafeToLoadUnconditionally assumes that the load will proceed with the preferred type alignment. Given that, it has to ensure that the alloca or global is at least that aligned. It has always done this historically when a datalayout is present, but has never checked it when the datalayout is absent. When I refactored the code in r220156, I exposed this path when datalayout was present and that turned the latent bug into a patent bug. This fixes the issue by just removing the special case which allows folding things without datalayout. This isn't worth the complexity of trying to tease apart when it is or isn't safe without actually knowing the preferred alignment. llvm-svn: 220161	2014-10-19 08:17:50 +00:00
Chandler Carruth	2dc9682e59	[SROA] Change how SROA does vector-based promotion of allocas to handle cases where the alloca type, the load types, and the store types used all disagree. Previously, the only way that vector-based promotion occured was if the alloca type was a vector type. This was one of the very few remaining uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns out it was a bad idea. The alloca type can change very easily based on the mixture of types loaded and stored to that alloca. We shouldn't be relying on it as a signal for very much. Instead, the source of truth should be loads and stores. We should canonicalize the loads and stores as much as possible and then rely on them exclusively in SROA. When looking and loads and stores, we may find many different candidate vector types. This change will let SROA try all of them to find a vector type which is a viable way to promote the entire alloca to a vector register. With this change, it becomes possible to do better canonicalization and optimization of loads and stores without breaking SROA in random ways, and that should allow fixing a core source of performance loss in hot numerical loops such as those in Eigen. llvm-svn: 220116	2014-10-18 00:44:02 +00:00
Rafael Espindola	7da1ea83a9	Revert "TRE: make TRE a bit more aggressive" This reverts commit r219899. This also updates byval-tail-call.ll to make it clear what was breaking. Adding r219899 again will cause the load/store to disappear. llvm-svn: 220093	2014-10-17 21:25:48 +00:00
Hal Finkel	dd38c0b876	[DSE] Remove no-data-layout-only type-based overlap checking DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage. Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout. llvm-svn: 220035	2014-10-17 11:56:00 +00:00
Chandler Carruth	8393406f05	[SROA] Switch the common variable name for the 'AllocaSlices' class to 'AS'. Using 'S' as this was a terrible idea. Arguably, 'AS' is not much better, but it at least follows the idea of using initialisms and removes active confusion about the AllocaSlices variable and a Slice variable. llvm-svn: 219963	2014-10-16 21:11:55 +00:00
Chandler Carruth	61747042c1	[SROA] More range-based cleanups to SROA, these brought to you by clang-modernize. I did have to clean up the variable types and whitespace a bit because the use of auto made the code much less readable here. llvm-svn: 219962	2014-10-16 21:05:14 +00:00
Chandler Carruth	57d4cae202	[SROA] Switch a couple of overly complex iterator accessors to just be ArrayRef accessors. I think this even came up in review that this was over-engineered, and indeed it was. Time to un-build it. llvm-svn: 219958	2014-10-16 20:42:08 +00:00
Chandler Carruth	c659df9389	[SROA] Start more deeply moving SROA to use ranges rather than just iterators. There are a ton of places where it essentially wants ranges rather than just iterators. This is just the first step that adds the core slice range typedefs and uses them in a couple of places. I still have to explicitly construct them because they've not been punched throughout the entire set of code. More range-based cleanups incoming. llvm-svn: 219955	2014-10-16 20:24:07 +00:00
Bjorn Steinbrink	d20816fde9	Allow call-slop optzn for destinations with a suitable dereferenceable attribute Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5832 llvm-svn: 219950	2014-10-16 19:43:08 +00:00
Saleem Abdulrasool	7f52921976	TRE: make TRE a bit more aggressive Make tail recursion elimination a bit more aggressive. This allows us to get tail recursion on functions that are just branches to a different function. The fact that the function takes a byval argument does not restrict it from being optimised into just a tail call. llvm-svn: 219899	2014-10-16 03:27:30 +00:00
Chris Bieneman	5c4e9551c9	Fixing the build failure due to compiler warnings and unnecessary disambiguation. llvm-svn: 219861	2014-10-15 23:11:35 +00:00
Chris Bieneman	732e0aa9fb	Defining a new API for debug options that doesn't rely on static global cl::opts. Summary: This is based on the discussions from the LLVMDev thread: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html Reviewers: chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5389 llvm-svn: 219854	2014-10-15 21:54:35 +00:00
Chandler Carruth	6666c27e99	[SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose. I hit a test case that crashed even without these asserts due to passing a non-exiting latch to the ExitingBlock parameter of the trip count computation machinery. However, when I add the nice asserts, it turns out we have plenty of coverage of these bugs, they just didn't manifest in crashers. The core problem seems to stem from an assumption that the latch is the exiting block. While this is often true, and somewhat the "normal" way to think about loops, it isn't necessarily true. The correct way to call the trip count routines in a generic fashion (that is, without a particular exit in mind) is to just use the loop's single exiting block if it has one. The trip count can't be computed generically unless it does. This works great for the loop vectorizer. The loop unroller actually wants to select the latch when it has to chose between multiple exits because for unrolling it is the latch trips that matter. But if this is the desire, it needs to explicitly guard for non-exiting latches and check for the generic trip count in that case. I've added the asserts, and added convenience APIs for querying the trip count generically that check for a single exit block. I've kept the APIs consistent between computing trip count and trip multiples. Thansk to Mark for the help debugging and tracking down the right fix here! llvm-svn: 219550	2014-10-11 00:12:11 +00:00
Chad Rosier	bd64d46188	[Reassociate] Don't canonicalize X - undef to X + (-undef). Phabricator Revision: http://reviews.llvm.org/D5674 PR21205 llvm-svn: 219434	2014-10-09 20:06:29 +00:00
Owen Anderson	8373d338f6	Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions. Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical. Patch by Dmitri Shtilman. llvm-svn: 219092	2014-10-05 23:41:26 +00:00
Benjamin Kramer	c6cc58e703	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
Zinovy Nis	ccc3e3733b	[BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]' thus inverting index expression. This patch fixes it. Thanks to Jörg Sonnenberger for pointing. Differential Revision: http://reviews.llvm.org/D5576 llvm-svn: 218867	2014-10-02 13:01:15 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buldbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Jingyue Wu	fc0296704c	[SimplifyCFG] threshold for folding branches with common destination Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int input, int output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i \| c ) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i \| c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i \| c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i \| c ) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711	2014-09-30 22:23:38 +00:00
Chad Rosier	aab5d7bd33	[IndVarSimplify] Widen loop unsigned compares. This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 llvm-svn: 218659	2014-09-30 03:17:42 +00:00
Chad Rosier	7b974b73ae	[IndVar] Don't widen loop compare unless IV user is sign extended. PR21030 llvm-svn: 218539	2014-09-26 20:05:35 +00:00
David Peixotto	0d4d5e64ec	Fix assertion in LICM doFinalization() The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 llvm-svn: 218387	2014-09-24 16:48:31 +00:00
Michael Liao	d120916ca7	Allow BB duplication threshold to be adjusted through JumpThreading's ctor - BB duplication may not be desired on targets where there is no or small branch penalty and code duplication needs restrict control. llvm-svn: 218375	2014-09-24 04:59:06 +00:00
Lenny Maiorani	9eefc81219	Using a deque to manage the stack of nodes is faster here. Vector is slow due to many reallocations as the size regularly changes in unpredictable ways. See the investigation provided on the mailing list for more information: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120116/135228.html llvm-svn: 218182	2014-09-20 13:29:20 +00:00
Eric Christopher	d85ffb1fc0	Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004	2014-09-18 00:34:14 +00:00
Chad Rosier	307b50b0f6	[IndVarSimplify] Partially revert r217953 to see if this fixes the bots. Specifically, disable widening of unsigned compare instructions. llvm-svn: 217962	2014-09-17 16:35:09 +00:00
Chad Rosier	bb99f40530	[IndVarSimplify] Widen loop compare instructions. This improves other optimizations such as LSR. A sext may be added to the compare's other operand, but this can often be hoisted outside of the loop. llvm-svn: 217953	2014-09-17 14:10:33 +00:00
Juergen Ributzka	14ae60407d	[C API] Make the 'lower switch' pass available via the C API. llvm-svn: 217630	2014-09-11 21:32:32 +00:00
Hal Finkel	f83e1f7f66	[AlignmentFromAssumptions] Don't crash just because the target is 32-bit We used to crash processing any relevant @llvm.assume on a 32-bit target (because we'd ask SE to subtract expressions of differing types). I've copied our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get some meaningful test coverage here. llvm-svn: 217574	2014-09-11 08:40:17 +00:00
Hal Finkel	71b7084112	[AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment The routine that determines an alignment given some SCEV returns zero if the answer is unknown. In a case where we could determine the increment of an AddRec but not the starting alignment, we would compute the integer modulus by zero (which is illegal and traps). Prevent this by returning early if either the start or increment alignment is unknown (zero). llvm-svn: 217544	2014-09-10 21:05:52 +00:00
Gerolf Hoflehner	24815d9b8f	[MergedLoadStoreMotion] Move pass enabling option to PassManagerBuilder llvm-svn: 217538	2014-09-10 19:55:29 +00:00
Gerolf Hoflehner	e4f6684d1b	Removed misleading comment. llvm-svn: 217527	2014-09-10 17:54:50 +00:00
NAKAMURA Takumi	1ab0cf0e28	SampleProfile.cpp: Prune a stray \param added in r217437. [-Wdocumentation] llvm-svn: 217465	2014-09-09 22:44:30 +00:00
NAKAMURA Takumi	bb4fac9050	ScalarOpts/LLVMBuild.txt: Prune unused dependency to IPA. llvm-svn: 217448	2014-09-09 15:00:38 +00:00
NAKAMURA Takumi	37ffecf06b	ScalarOpts/LLVMBuild.txt: Reorder. llvm-svn: 217447	2014-09-09 15:00:26 +00:00
Diego Novillo	de1ab26f52	Re-factor sample profile reader into lib/ProfileData. Summary: This patch moves the profile reading logic out of the Sample Profile transformation into a generic profile reader facility in lib/ProfileData. The intent is to use this new reader to implement a sample profile reader/writer that can be used to convert sample profiles from external sources into LLVM. This first patch introduces no functional changes. It moves the profile reading code from lib/Transforms/SampleProfile.cpp into lib/ProfileData/SampleProfReader.cpp. In subsequent patches I will: - Add a bitcode format for sample profiles to allow for more efficient encoding of the profile. - Add a writer for both text and bitcode format profiles. - Add a 'convert' command to llvm-profdata to be able to convert between the two (and serve as entry point for other sample profile formats). Reviewers: bogner, echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5250 llvm-svn: 217437	2014-09-09 12:40:50 +00:00
Andrew Trick	8fc3c6c093	Add a comment to getNewAlignmentDiff. llvm-svn: 217350	2014-09-07 23:16:24 +00:00
Hal Finkel	7e1844940e	Make use of @llvm.assume from LazyValueInfo This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with the known-bits change (r217342), this requires feeding a "context" instruction pointer through many functions. Aside from a little refactoring to reuse the logic that turns predicates into constant ranges in LVI, the only new code is that which can 'merge' the range from an assumption into that otherwise computed. There is also a small addition to JumpThreading so that it can have LVI use assumptions in the same block as the comparison feeding a conditional branch. With this patch, we can now simplify this as expected: int foo(int a) { __builtin_assume(a > 5); if (a > 3) { bar(); return 1; } return 0; } llvm-svn: 217345	2014-09-07 20:29:59 +00:00
Hal Finkel	d67e463901	Add an AlignmentFromAssumptions Pass This adds a ScalarEvolution-powered transformation that updates load, store and memory intrinsic pointer alignments based on invariant((a+q) & b == 0) expressions. Many of the simple cases we can get with ValueTracking, but we still need something like this for the more complicated cases (such as those with an offset) that require some algebra. Note that gcc's __builtin_assume_aligned's optional third argument provides exactly for this kind of 'misalignment' offset for which this kind of logic is necessary. The primary motivation is to fixup alignments for vector loads/stores after vectorization (and unrolling). This pass is added to the optimization pipeline just after the SLP vectorizer runs (which, admittedly, does not preserve SE, although I imagine it could). Regardless, I actually don't think that the preservation matters too much in this case: SE computes lazily, and this pass won't issue any SE queries unless there are any assume intrinsics, so there should be no real additional cost in the common case (SLP does preserve DT and LoopInfo). llvm-svn: 217344	2014-09-07 20:05:11 +00:00
Hal Finkel	60db05896a	Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.) This change, which allows @llvm.assume to be used from within computeKnownBits (and other associated functions in ValueTracking), adds some (optional) parameters to computeKnownBits and friends. These functions now (optionally) take a "context" instruction pointer, an AssumptionTracker pointer, and also a DomTree pointer, and most of the changes are just to pass this new information when it is easily available from InstSimplify, InstCombine, etc. As explained below, the significant conceptual change is that known properties of a value might depend on the control-flow location of the use (because we care that the @llvm.assume dominates the use because assumptions have control-flow dependencies). This means that, when we ask if bits are known in a value, we might get different answers for different uses. The significant changes are all in ValueTracking. Two main changes: First, as with the rest of the code, new parameters need to be passed around. To make this easier, I grouped them into a structure, and I made internal static versions of the relevant functions that take this structure as a parameter. The new code does as you might expect, it looks for @llvm.assume calls that make use of the value we're trying to learn something about (often indirectly), attempts to pattern match that expression, and uses the result if successful. By making use of the AssumptionTracker, the process of finding @llvm.assume calls is not expensive. Part of the structure being passed around inside ValueTracking is a set of already-considered @llvm.assume calls. This is to prevent a query using, for example, the assume(a == b), to recurse on itself. The context and DT params are used to find applicable assumptions. An assumption needs to dominate the context instruction, or come after it deterministically. In this latter case we only handle the specific case where both the assumption and the context instruction are in the same block, and we need to exclude assumptions from being used to simplify their own ephemeral values (those which contribute only to the assumption) because otherwise the assumption would prove its feeding comparison trivial and would be removed. This commit adds the plumbing and the logic for a simple masked-bit propagation (just enough to write a regression test). Future commits add more patterns (and, correspondingly, more regression tests). llvm-svn: 217342	2014-09-07 18:57:58 +00:00
Hal Finkel	57f03dda49	Add functions for finding ephemeral values This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or indirectly), and thus will be removed prior to code generation, implying that they should be considered free for certain purposes (like inlining). The inliner's cost analysis, and a few other passes, have been updated to account for ephemeral values using the provided functionality. This functionality is important for the usability of @llvm.assume, because it limits the "non-local" side-effects of adding llvm.assume on inlining, loop unrolling, etc. (these are hints, and do not generate code, so they should not directly contribute to estimates of execution cost). llvm-svn: 217335	2014-09-07 13:49:57 +00:00
Hal Finkel	74c2f355d2	Add an Assumption-Tracking Pass This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale functions and intrinsics out of the map, and it relies on any code that creates new @llvm.assume calls to notify it of the new instructions. The benefit is that code needing to find @llvm.assume intrinsics can do so directly, without scanning the function, thus allowing the cost of @llvm.assume handling to be negligible when none are present. The current design is intended to be lightweight. We don't keep track of anything until we need a list of assumptions in some function. The first time this happens, we scan the function. After that, we add/remove @llvm.assume calls from the cache in response to registration calls and ValueHandle callbacks. There are no new direct test cases for this pass, but because it calls it validation function upon module finalization, we'll pick up detectable inconsistencies from the other tests that touch @llvm.assume calls. This pass will be used by follow-up commits that make use of @llvm.assume. llvm-svn: 217334	2014-09-07 12:44:26 +00:00
Tilmann Scheller	faabbb5fb6	[GVN] Format variable name. Local variables need to start with an upper case letter. llvm-svn: 217133	2014-09-04 06:38:00 +00:00
David Majnemer	13046deef3	IndVarSimplify: Address review comments for r217102 No functional change intended, just some cleanups and comments added. llvm-svn: 217115	2014-09-04 00:23:13 +00:00
David Majnemer	c6ab01ecca	IndVarSimplify: Don't let LFTR compare against a poison value LinearFunctionTestReplace tries to use the next indvar to compare against when possible. However, it may be the case that the calculation for the next indvar has NUW/NSW flags and that it may only be safely used inside the loop. Using it in a comparison to calculate the exit condition could result in observing poison. This fixes PR20680. Differential Revision: http://reviews.llvm.org/D5174 llvm-svn: 217102	2014-09-03 23:03:18 +00:00
Benjamin Kramer	89854ebe8e	Make some helpers static or move into the llvm namespace. llvm-svn: 217077	2014-09-03 21:04:12 +00:00
David Majnemer	49428105aa	LICM: Don't crash when an instruction is used by an unreachable BB Summary: BBs might contain non-LCSSA'd values after the LCSSA pass is run if they are unreachable from the entry block. Normally, the users of the instruction would be PHIs but the unreachable BBs have normal users; rewrite their uses to be undef values. An alternative fix could involve fixing this at LCSSA but that would require this invariant to hold after subsequent transforms. If a BB created an unreachable block, they would be in violation of this. This fixes PR19798. Differential Revision: http://reviews.llvm.org/D5146 llvm-svn: 216911	2014-09-02 16:22:00 +00:00
David Majnemer	d4cffcf073	SROA: Don't insert instructions before a PHI SROA may decide that it needs to insert a bitcast and would set it's insertion point before a PHI. This will create an invalid module right quick. Instead, choose the first insertion point in the basic block that holds our PHI. This fixes PR20822. Differential Revision: http://reviews.llvm.org/D5141 llvm-svn: 216891	2014-09-01 21:20:14 +00:00
Chandler Carruth	18cee1defc	Fix a really bad miscompile introduced in r216865 - the else-if logic chain became completely broken here as all intrinsic users ended up being skipped, and the ones that seemed to be singled out were actually the exact wrong set. This is a great example of why long else-if chains can be easily confusing. Switch the entire code to use early exits and early continues to have simpler (and more importantly, correct) logic here, as well as fixing the reversed logic for detecting and continuing on lifetime intrinsics. I've also significantly cleaned up the test case and added another test case demonstrating an example where the optimization is not (trivially) safe to perform. llvm-svn: 216871	2014-09-01 10:09:18 +00:00
Nick Lewycky	fc243d54d2	Ignore lifetime intrinsics in use list for MemCpyOptimizer. Patch by Luqman Aden, review by Hal Finkel. llvm-svn: 216865	2014-09-01 06:03:11 +00:00
Craig Topper	e1d1294853	Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created. llvm-svn: 216525	2014-08-27 05:25:25 +00:00
Craig Topper	4627679cec	Use range based for loops to avoid needing to re-mention SmallPtrSet size. llvm-svn: 216351	2014-08-24 23:23:06 +00:00
Jingyue Wu	ec33fa9aca	[SROA] Fold a PHI node if all its incoming values are the same Summary: Fixes PR20425. During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes alloca's used by PHI nodes easier to promote. Test Plan: Added three more tests in phi-and-select.ll Reviewers: nlewycky, eliben, meheff, chandlerc Reviewed By: chandlerc Subscribers: zinovy.nis, hfinkel, baldrick, llvm-commits Differential Revision: http://reviews.llvm.org/D4659 llvm-svn: 216299	2014-08-22 22:45:57 +00:00
Reid Kleckner	c36f48f08a	SROA: Handle a case of store size being smaller than allocation size In this case, we are creating an x86_fp80 slice for a union from C where the padding bytes may contain real data. An x86_fp80 alloca is 16 bytes, and that's just fine. We can't, however, use regular loads and stores to access the slice, because the store size is only 10 bytes / 80 bits. Instead, use memcpy and memset. Fixes PR18726. Reviewed By: chandlerc Differential Revision: http://reviews.llvm.org/D5012 llvm-svn: 216248	2014-08-22 00:09:56 +00:00
Zinovy Nis	33406da5f4	[CLNUP] Remove return after llvm_unreachable. Thanks to Hal Finkel for pointing. llvm-svn: 216176	2014-08-21 13:30:05 +00:00
Erik Verbruggen	2b98bd2a80	Reassociate x + -0.1234 * y into x - 0.1234 * y This does not require -ffast-math, and it gives CSE/GVN more options to eliminate duplicate expressions in, e.g.: return ((x + 0.1234 * y) * (x - 0.1234 * y)); Differential Revision: http://reviews.llvm.org/D4904 llvm-svn: 216169	2014-08-21 10:45:30 +00:00
Zinovy Nis	0a36cba29d	[INDVARS] Extend using of widening of induction variables for the cases of "sub nsw" and "mul nsw" instructions. Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like: int N = 100; float *A; void foo(int x0, int x1) { float A_cur = &A[0][0]; float * A_next = &A[1][0]; for(int x = x0; x < x1; ++x). { // Currently only [x+N] case is widened. Others 2 cases lead to sext. // This patch fixes it, so all 3 cases do not need sext. const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N]; A_next[x] = div; } } ... > clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o - Differential Revision: http://reviews.llvm.org/D4695 llvm-svn: 216160	2014-08-21 08:25:45 +00:00
Craig Topper	71b7b68b74	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 216158	2014-08-21 05:55:13 +00:00
Craig Topper	6230691c91	Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size." Getting a weird buildbot failure that I need to investigate. llvm-svn: 215870	2014-08-18 00:24:38 +00:00
Craig Topper	5229cfd163	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 215868	2014-08-17 23:47:00 +00:00
Rafael Espindola	ea46c32f81	Introduce a helper to combine instruction metadata. Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use it. Patch by Björn Steinbrink! llvm-svn: 215723	2014-08-15 15:46:38 +00:00
Chad Rosier	11ab941644	[Reassociation] Add support for reassociation with unsafe algebra. Vector instructions are (still) not supported for either integer or floating point. Hopefully, that work will be landed shortly. llvm-svn: 215647	2014-08-14 15:23:01 +00:00
Jan Vesely	5a956d49f7	Initialize FlattenCFG pass Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215573	2014-08-13 20:31:52 +00:00
Gerolf Hoflehner	ea96a3d336	Fix for multi-line comment warning llvm-svn: 215169	2014-08-07 23:19:55 +00:00
Owen Anderson	6c19ab1b5d	Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In this case, the code path dealing with vector promotion was missing the explicit checks for lifetime intrinsics that were present on the corresponding integer promotion path. llvm-svn: 215148	2014-08-07 21:07:35 +00:00
Rui Ueyama	c487f7728e	Revert "r214897 - Remove dead zero store to calloc initialized memory" It broke msan. llvm-svn: 214989	2014-08-06 19:30:38 +00:00
JF Bastien	ac8b66b32c	Fix typos in comments and doc Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com) llvm-svn: 214934	2014-08-05 23:27:34 +00:00
Philip Reames	00c9b6461f	Remove dead zero store to calloc initialized memory Optimize the following IR: %1 = tail call noalias i8* @calloc(i64 1, i64 4) %2 = bitcast i8* %1 to i32* ; This store is dead and should be removed store i32 0, i32* %2, align 4 Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable. Reviewed By: nicholas Differential Revision: http://reviews.llvm.org/D3942 llvm-svn: 214897	2014-08-05 17:48:20 +00:00
Peter Collingbourne	e52646cd80	PartiallyInlineLibCalls: Check sqrt result type before transforming it. Some configure scripts declare this with the wrong prototype, which can lead to an assertion failure. llvm-svn: 214593	2014-08-01 23:21:21 +00:00
Rafael Espindola	3f6481d0d3	Remove some calls to std::move. Instead of moving out the data in a ErrorOr<std::unique_ptr<Foo>>, get a reference to it. Thanks to David Blaikie for the suggestion. llvm-svn: 214516	2014-08-01 14:31:55 +00:00
Aaron Ballman	573f3b5313	Fixing a few -Woverloaded-virtual warnings by exposing the hidden virtual function as well. No functional changes intended. llvm-svn: 214325	2014-07-30 19:23:59 +00:00
Mark Heffernan	8ec1474f7f	After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor hint) the loop unroller replaces the llvm.loop.unroll.count metadata with llvm.loop.unroll.disable metadata to prevent any subsequent unrolling passes from unrolling more than the hint indicates. This patch fixes an issue where loop unrolling could be disabled for other loops as well which share the same llvm.loop metadata. llvm-svn: 213900	2014-07-24 22:36:40 +00:00
Hal Finkel	9414665a3b	Add scoped-noalias metadata This commit adds scoped noalias metadata. The primary motivations for this feature are: 1. To preserve noalias function attribute information when inlining 2. To provide the ability to model block-scope C99 restrict pointers Neither of these two abilities are added here, only the necessary infrastructure. In fact, there should be no change to existing functionality, only the addition of new features. The logic that converts noalias function parameters into this metadata during inlining will come in a follow-up commit. What is added here is the ability to generally specify noalias memory-access sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA nodes: !scope0 = metadata !{ metadata !"scope of foo()" } !scope1 = metadata !{ metadata !"scope 1", metadata !scope0 } !scope2 = metadata !{ metadata !"scope 2", metadata !scope0 } !scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 } !scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 } Loads and stores can be tagged with an alias-analysis scope, and also, with a noalias tag for a specific scope: ... = load %ptr1, !alias.scope !{ !scope1 } ... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 } When evaluating an aliasing query, if one of the instructions is associated with an alias.scope id that is identical to the noalias scope associated with the other instruction, or is a descendant (in the scope hierarchy) of the noalias scope associated with the other instruction, then the two memory accesses are assumed not to alias. Note that is the first element of the scope metadata is a string, then it can be combined accross functions and translation units. The string can be replaced by a self-reference to create globally unqiue scope identifiers. [Note: This overview is slightly stylized, since the metadata nodes really need to just be numbers (!0 instead of !scope0), and the scope lists are also global unnamed metadata.] Existing noalias metadata in a callee is "cloned" for use by the inlined code. This is necessary because the aliasing scopes are unique to each call site (because of possible control dependencies on the aliasing properties). For example, consider a function: foo(noalias a, noalias b) { a = b; } that gets inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } -- now just because we know that a1 does not alias with b1 at the first call site, and a2 does not alias with b2 at the second call site, we cannot let inlining these functons have the metadata imply that a1 does not alias with b2. llvm-svn: 213864	2014-07-24 14:25:39 +00:00
Hal Finkel	cc39b67530	AA metadata refactoring (introduce AAMDNodes) In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859	2014-07-24 12:16:19 +00:00
Mark Heffernan	9e112443b6	Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full). llvm-svn: 213789	2014-07-23 20:05:44 +00:00
Mark Heffernan	e6b4ba1c41	In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full". llvm-svn: 213772	2014-07-23 17:31:37 +00:00
Nick Lewycky	aba900c252	We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405! llvm-svn: 213726	2014-07-23 06:24:49 +00:00
Duncan P. N. Exon Smith	6c99015fe2	Revert "[C++11] Add predecessors(BasicBlock ) / successors(BasicBlock ) iterator ranges." This reverts commit r213474 (and r213475), which causes a miscompile on a stage2 LTO build. I'll reply on the list in a moment. llvm-svn: 213562	2014-07-21 17:06:51 +00:00
Gerolf Hoflehner	ae1ec299df	Fix for regression: [Bug 20369] wrong code at -O3 on x86_64-linux-gnu in 64-bit mode Prevents hoisting of loads above stores and sinking of stores below loads in MergedLoadStoreMotion.cpp (rdar://15991737) llvm-svn: 213497	2014-07-21 03:02:46 +00:00
Manuel Jacob	d11beffef4	[C++11] Add predecessors(BasicBlock ) / successors(BasicBlock ) iterator ranges. Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended. Test Plan: All tests (make check-all) still pass. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4481 llvm-svn: 213474	2014-07-20 09:10:11 +00:00
Matt Arsenault	1b8d83796d	Templatify RegionInfo so it works on MachineBasicBlocks llvm-svn: 213456	2014-07-19 18:29:29 +00:00
NAKAMURA Takumi	ab184fb88d	MergedLoadStoreMotion.cpp: Fix msc17 build. Member initializer is unavailable. llvm-svn: 213448	2014-07-19 03:29:25 +00:00
Mark Heffernan	f3764da8ec	Fix build breakage introduced with r213412. llvm-svn: 213414	2014-07-18 21:29:41 +00:00
Mark Heffernan	053a68688a	Remove unroll pragma metadata after it is used. llvm-svn: 213412	2014-07-18 21:04:33 +00:00
Gerolf Hoflehner	f27ae6cdcf	MergedLoadStoreMotion pass Merges equivalent loads on both sides of a hammock/diamond and hoists into into the header. Merges equivalent stores on both sides of a hammock/diamond and sinks it to the footer. Can enable if conversion and tolerate better load misses and store operand latencies. llvm-svn: 213396	2014-07-18 19:13:09 +00:00
Jingyue Wu	0bdc027e31	Partially revert r210444 due to performance regression Summary: Converting outermost zext(a) to sext(a) causes worse code when the computation of zext(a) could be reused. For example, after converting ... = array[zext(a)] ... = array[zext(a) + 1] to ... = array[sext(a)] ... = array[zext(a) + 1], the program computes sext(a), which is actually unnecessary. I added one test in split-gep-and-gvn.ll to illustrate this scenario. Also, with r211281 and r211084, we annotate more "nuw" tags to computation involving CUDA intrinsics such as threadIdx.x. These annotations help with splitting GEP a lot, rendering the benefit we get from this reverted optimization only marginal. Test Plan: make check-all Reviewers: eliben, meheff Reviewed By: meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D4542 llvm-svn: 213209	2014-07-16 23:25:00 +00:00
Nick Lewycky	703e488ed9	Don't eliminate memcpy's when the address of the pointer may itself be relevant. Fixes PR18304. Patch by David Wiberg! llvm-svn: 212970	2014-07-14 18:52:02 +00:00
Owen Anderson	a8d1c3e74e	Fix an issue with the MergeBasicBlockIntoOnlyPred() helper function where it did not properly handle the case where the predecessor block was the entry block to the function. The only in-tree client of this is JumpThreading, which worked around the issue in its own code. This patch moves the solution into the helper so that JumpThreading (and other clients) do not have to replicate the same fix everywhere. llvm-svn: 212875	2014-07-12 07:12:47 +00:00
Hal Finkel	511fea7acd	Feeding isSafeToSpeculativelyExecute its DataLayout pointer (in Sink) This is the one remaining place I see where passing isSafeToSpeculativelyExecute a DataLayout pointer might matter (at least for loads) -- I think I got the others in r212720. Most of the other remaining callers of isSafeToSpeculativelyExecute only use it for call sites (or otherwise exclude loads). llvm-svn: 212730	2014-07-10 16:07:11 +00:00
Hal Finkel	a995f92627	Feeding isSafeToSpeculativelyExecute its DataLayout pointer isSafeToSpeculativelyExecute can optionally take a DataLayout pointer. In the past, this was mainly used to make better decisions regarding divisions known not to trap, and so was not all that important for users concerned with "cheap" instructions. However, now it also helps look through bitcasts for dereferencable loads, and will also be important if/when we add a dereferencable pointer attribute. This is some initial work to feed a DataLayout pointer through to callers of isSafeToSpeculativelyExecute, generally where one was already available. llvm-svn: 212720	2014-07-10 14:41:31 +00:00
Hal Finkel	2e42c34d05	Allow isDereferenceablePointer to look through some bitcasts isDereferenceablePointer should not give up upon encountering any bitcast. If we're casting from a pointer to a larger type to a pointer to a small type, we can continue by examining the bitcast's operand. This missing capability was noted in a comment in the function. In order for this to work, isDereferenceablePointer now takes an optional DataLayout pointer (essentially all callers already had such a pointer available). Most code uses isDereferenceablePointer though isSafeToSpeculativelyExecute (which already took an optional DataLayout pointer), and to enable the LICM test case, LICM needs to actually provide its DL pointer to isSafeToSpeculativelyExecute (which it was not doing previously). llvm-svn: 212686	2014-07-10 05:27:53 +00:00
Rafael Espindola	adf21f2a56	Update the MemoryBuffer API to use ErrorOr. llvm-svn: 212405	2014-07-06 17:43:13 +00:00
Arnold Schwaighofer	ed988fb97d	GVN: Preserve invariant.load metadata If both instructions to be replaced are marked invariant the resulting instruction is invariant. rdar://13358910 Fix by Erik Eckstein! llvm-svn: 211801	2014-06-26 19:51:19 +00:00
Eli Bendersky	5d5e18da3e	Rename loop unrolling and loop vectorizer metadata to have a common prefix. [LLVM part] These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.' prefix. Metadata name changes: llvm.vectorizer.* => llvm.loop.vectorizer.* llvm.loopunroll.* => llvm.loop.unroll.* This was a suggestion from an earlier review (http://reviews.llvm.org/D4090) which added the loop unrolling metadata. Patch by Mark Heffernan. llvm-svn: 211710	2014-06-25 15:41:00 +00:00
Evgeniy Stepanov	d99cca2c7a	Factor out part of LICM::sink into a helper function. llvm-svn: 211678	2014-06-25 09:17:21 +00:00
Evgeniy Stepanov	10280dac1d	[LICM] Don't create more than one copy of an instruction per loop exit block when sinking. Fixes exponential compilation complexity in PR19835, caused by LICM::sink not handling the following pattern well: f = op g e = op f, g d = op e c = op d, e b = op c a = op b, c When an instruction with N uses is sunk, each of its operands gets N new uses (all of them - phi nodes). In the example above, if a had 1 use, c would have 2, e would have 4, and g would have 8. llvm-svn: 211673	2014-06-25 07:54:58 +00:00
Dinesh Dwivedi	8bb5fb0661	Updated comments as suggested by Rafael. Thanks. llvm-svn: 211268	2014-06-19 14:11:53 +00:00
Dinesh Dwivedi	657105e582	Fixed jump threading going to infinite loop. This patch add code to remove unreachable blocks from function as they may cause jump threading to stuck in infinite loop. Differential Revision: http://reviews.llvm.org/D3991 llvm-svn: 211103	2014-06-17 14:34:19 +00:00
Duncan P. N. Exon Smith	73686d305a	SROA: Only split loads on byte boundaries r199771 accidently broke the logic that makes sure that SROA only splits load on byte boundaries. If such a split happens, some bits get lost when reassembling loads of wider types, causing data corruption. Move the width check up to reject such splits early, avoiding the corruption. Fixes PR19250. Patch by: Björn Steinbrink <bsteinbr@gmail.com> llvm-svn: 211082	2014-06-17 00:19:35 +00:00
Eli Bendersky	ff90324599	Teach LoopUnrollPass to respect loop unrolling hints in metadata. [This is resubmitting r210721, which was reverted due to suspected breakage which turned out to be unrelated]. Some extra review comments were addressed. See D4090 and D4147 for more details. The Clang change that produces this metadata was committed in r210667 Patch by Mark Heffernan. llvm-svn: 211076	2014-06-16 23:53:02 +00:00
Nick Lewycky	b06a796051	Remove extra whitespace in function declaration. No functionality change. llvm-svn: 210965	2014-06-14 03:48:29 +00:00
Jiangning Liu	96e92c1d75	Move GlobalMerge from Transform to CodeGen. This patch is to move GlobalMerge pass from Transform/Scalar to CodeGen, because GlobalMerge depends on TargetMachine. In the mean time, the macro INITIALIZE_TM_PASS is also moved to CodeGen/Passes.h. With this fix we can avoid making libScalarOpts depend on libCodeGen. llvm-svn: 210951	2014-06-13 22:57:59 +00:00
Tim Northover	6bf04e4512	SCCP: update for cmpxchg returning { iN, i1 } now. I accidentally missed this one since its use looked OK locally. llvm-svn: 210909	2014-06-13 14:54:09 +00:00
Tim Northover	420a216817	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Rafael Espindola	db4ed0bdab	Remove 'using std::errro_code' from lib. llvm-svn: 210871	2014-06-13 02:24:39 +00:00
Rafael Espindola	3acea39853	Don't use 'using std::error_code' in include/llvm. This should make sure that most new uses use the std prefix. llvm-svn: 210835	2014-06-12 21:46:39 +00:00
Duncan P. N. Exon Smith	fd5c553f54	GVN: Enable value forwarding for calloc Enable value forwarding for loads from `calloc()` without an intervening store. This change extends GVN to handle the following case: %1 = tail call noalias i8* @calloc(i64 1, i64 4) %2 = bitcast i8* %1 to i32* ; This load is trivially constant zero %3 = load i32* %2, align 4 This is analogous to the handling for `malloc()` in the same places. `malloc()` returns `undef`; `calloc()` returns a zero value. Note that it is correct to return zero even for out of bounds GEPs since the result of such a GEP would be undefined. Patch by Philip Reames! llvm-svn: 210828	2014-06-12 21:16:19 +00:00
Eli Bendersky	dc6de2ce29	Revert r210721 as it causes breakage in internal builds (and possibly GDB). llvm-svn: 210807	2014-06-12 18:05:39 +00:00
Eli Bendersky	899bef099f	Teach LoopUnrollPass to respect loop unrolling hints in metadata. See http://reviews.llvm.org/D4090 for more details. The Clang change that produces this metadata was committed in r210667 Patch by Mark Heffernan. llvm-svn: 210721	2014-06-11 23:15:35 +00:00
Jiangning Liu	d623c528c5	Create macro INITIALIZE_TM_PASS. Pass initialization requires to initialize TargetMachine for back-end specific passes. This commit creates a new macro INITIALIZE_TM_PASS to simplify this kind of initialization. llvm-svn: 210641	2014-06-11 07:04:37 +00:00
Jiangning Liu	b2ae37fb67	Global merge for global symbols. This commit is to improve global merge pass and support global symbol merge. The global symbol merge is not enabled by default. For aarch64, we need some more back-end fix to make it really benifit ADRP CSE. llvm-svn: 210640	2014-06-11 06:44:53 +00:00
Jiangning Liu	3e5b855a51	Rename global-merge to enable-global-merge. llvm-svn: 210639	2014-06-11 06:35:26 +00:00
Eric Christopher	2af33756c7	We already have a reference to the TargetMachine, use that. llvm-svn: 210580	2014-06-10 20:39:39 +00:00
Jingyue Wu	5c7b1aed5d	[SeparateConstOffsetFromGEP] inbounds zext => sext for better splitting For each array index that is in the form of zext(a), convert it to sext(a) if we can prove zext(a) <= max signed value of typeof(a). The conversion helps to split zext(x + y) into sext(x) + sext(y). Reviewed in http://reviews.llvm.org/D4060 llvm-svn: 210444	2014-06-08 23:49:34 +00:00
Jingyue Wu	01ceeb190d	[SeparateConstOffsetFromGEP] Fix an illegitimate optimization on zext zext(a + b) != zext(a) + zext(b) even if a + b >= 0 && b >= 0. e.g., a = i4 0b1111, b = i4 0b0001 zext a + b to i8 = zext 0b0000 to i8 = 0b00000000 (zext a to i8) + (zext b to i8) = 0b00001111 + 0b00000001 = 0b00010000 llvm-svn: 210439	2014-06-08 20:19:38 +00:00
Jingyue Wu	48a5abeec0	Refactor canonicalizing array indices to a helper function No functionality changes. llvm-svn: 210438	2014-06-08 20:15:45 +00:00
Jingyue Wu	84465473e7	Fixed several correctness issues in SeparateConstOffsetFromGEP Most issues are on mishandling s/zext. Fixes: 1. When rebuilding new indices, s/zext should be distributed to sub-expressions. e.g., sext(a +nsw (b +nsw 5)) = sext(a) + sext(b) + 5 but not sext(a + b) + 5. This also affects the logic of recursively looking for a constant offset, we need to include s/zext into the context of the searching. 2. Function find should return the bitwidth of the constant offset instead of always sign-extending it to i64. 3. Stop shortcutting zext'ed GEP indices. LLVM conceptually sign-extends GEP indices to pointer-size before computing the address. Therefore, gep base, zext(a + b) != gep base, a + b Improvements: 1. Add an optimization for splitting sext(a + b): if a + b is proven non-negative (e.g., used as an index of an inbound GEP) and one of a, b is non-negative, sext(a + b) = sext(a) + sext(b) 2. Function Distributable checks whether both sext and zext can be distributed to operands of a binary operator. This helps us split zext(sext(a + b)) to zext(sext(a) + zext(sext(b)) when a + b does not signed or unsigned overflow. Refactoring: Merge some common logic of handling add/sub/or in find. Testing: Add many tests in split-gep.ll and split-gep-and-gvn.ll to verify the changes we made. llvm-svn: 210291	2014-06-05 22:07:33 +00:00
Benjamin Kramer	4968944376	[Reassociate] Similar to "X + -X" -> "0", added code to handle "X + ~X" -> "-1". Handle "X + ~X" -> "-1" in the function Value Reassociate::OptimizeAdd(Instruction I, SmallVectorImpl<ValueEntry> &Ops); This patch implements: TODO: We could handle "X + ~X" -> "-1" if we wanted, since "-X = ~X+1". Patch by Rahul Jain! Differential Revision: http://reviews.llvm.org/D3835 llvm-svn: 209973	2014-05-31 15:01:54 +00:00
Michael J. Spencer	289067cc3d	Add LoadCombine pass. This pass is disabled by default. Use -combine-loads to enable in -O[1-3] Differential revision: http://reviews.llvm.org/D3580 llvm-svn: 209791	2014-05-29 01:55:07 +00:00
Jingyue Wu	80a738dc62	Distribute sext/zext to the operands of and/or/xor This is an enhancement to SeparateConstOffsetFromGEP. With this patch, we can extract a constant offset from "s/zext and/or/xor A, B". Added a new test @ext_or to verify this enhancement. Refactoring the code, I also extracted some common logic to function Distributable. llvm-svn: 209670	2014-05-27 18:00:00 +00:00
Owen Anderson	115aa160e6	Make the LoopRotate pass's maximum header size configurable both programmatically and via the command line, mirroring similar functionality in LoopUnroll. In situations where clients used custom unrolling thresholds, their intent could previously be foiled by LoopRotate having a hardcoded threshold. llvm-svn: 209617	2014-05-26 08:58:51 +00:00
Jingyue Wu	bbb6e4a885	Add the extracted constant offset using GEP Fixed a TODO in r207783. Add the extracted constant offset using GEP instead of ugly ptrtoint+add+inttoptr. Using GEP simplifies future optimizations and makes IR easier to understand. Updated all affected tests, and added a new test in split-gep.ll to cover a corner case where emitting uglygep is necessary. llvm-svn: 209537	2014-05-23 18:39:40 +00:00
Diego Novillo	7f8af8bf91	Add support for missed and analysis optimization remarks. Summary: This adds two new diagnostics: -pass-remarks-missed and -pass-remarks-analysis. They take the same values as -pass-remarks but are intended to be triggered in different contexts. -pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed, which passes call when they tried to apply a transformation but couldn't. -pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis, which passes call when they want to inform the user about analysis results. The patch also: 1- Adds support in the inliner for the two new remarks and a test case. 2- Moves emitOptimizationRemark* functions to the llvm namespace. 3- Adds an LLVMContext argument instead of making them member functions of LLVMContext. Reviewers: qcolombet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3682 llvm-svn: 209442	2014-05-22 14:19:46 +00:00
Quentin Colombet	c88baa5c10	[LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN. This commit introduces a canonical representation for the formulae. Basically, as soon as a formula has more that one base register, the scaled register field is used for one of them. The register put into the scaled register is preferably a loop variant. The commit refactors how the formulae are built in order to produce such representation. This yields a more accurate, but still perfectible, cost model. <rdar://problem/16731508> llvm-svn: 209230	2014-05-20 19:25:04 +00:00
Matt Arsenault	04b67ceeeb	Use range for llvm-svn: 209147	2014-05-19 17:52:48 +00:00
Rafael Espindola	5a52b9f139	Revert "Implement global merge optimization for global variables." This reverts commit r208934. The patch depends on aliases to GEPs with non zero offsets. That is not supported and fairly broken. The good news is that GlobalAlias is being redesigned and will have support for offsets, so this patch should be a nice match for it. llvm-svn: 208978	2014-05-16 13:02:18 +00:00
Jiangning Liu	932e1c3924	Implement global merge optimization for global variables. This commit implements two command line switches -global-merge-on-external and -global-merge-aligned, and both of them are false by default, so this optimization is disabled by default for all targets. For ARM64, some back-end behaviors need to be tuned to get this optimization further enabled. llvm-svn: 208934	2014-05-15 23:45:42 +00:00
Alp Toker	beaca19c7c	Fix typos llvm-svn: 208839	2014-05-15 01:52:21 +00:00
Jay Foad	a0653a3e6c	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Benjamin Kramer	3f085ba6d6	GVN: Fix non-determinism in map iteration. Iterating over a DenseMaop is non-deterministic and results to unpredictable IR output. Based on a patch by Daniel Reynaud! llvm-svn: 208728	2014-05-13 21:06:40 +00:00
Benjamin Kramer	d97f95e2f2	GVN: rangify a couple of loops. No functionality change. llvm-svn: 208727	2014-05-13 21:06:36 +00:00
Nick Lewycky	54a824b758	Improve wording to make it sounds more like a change than an analysis. llvm-svn: 208370	2014-05-08 23:04:46 +00:00
Richard Smith	c45f3f7433	Simplify and fix incorrect comment. No functionality change. llvm-svn: 208272	2014-05-08 01:08:43 +00:00
Nick Lewycky	7185b5d60c	Detabify. llvm-svn: 208019	2014-05-06 00:46:20 +00:00
Nick Lewycky	5ef6bc8815	Improve 'tail' call marking in TRE. A bootstrap of clang goes from 375k calls marked tail in the IR to 470k, however this improvement does not carry into an improvement of the call/jmp ratio on x86. The most common pattern is a tail call + br to a block with nothing but a 'ret'. The number of tail call to loop conversions remains the same (1618 by my count). The new algorithm does a local scan over the use-def chains to identify local "alloca-derived" values, as well as points where the alloca could escape. Then, a visit over the CFG marks blocks as being before or after the allocas have escaped, and annotates the calls accordingly. llvm-svn: 208017	2014-05-05 23:59:03 +00:00
Benjamin Kramer	9130cb8547	LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling. Otherwise we use the same threshold as for complete unrolling, which is way too high. This made us unroll any loop smaller than 150 instructions by 8 times, but only if someone specified -march=core2 or better, which happens to be the default on darwin. llvm-svn: 207940	2014-05-04 19:12:38 +00:00
Akira Hatanaka	f76388dd7e	[GVN] Pass the phi-translated address of a load instead of the untranslated address to AnalyzeLoadFromClobberingLoad. This fixes a bug in load-PRE where PRE is applied to a load that is not partially redundant. <rdar://problem/16638765>. llvm-svn: 207853	2014-05-02 17:59:17 +00:00
Benjamin Kramer	cd1a98bf74	Update and sort CMakeLists. llvm-svn: 207785	2014-05-01 18:59:11 +00:00
Eli Bendersky	a108a65df2	Add an optimization that does CSE in a group of similar GEPs. This optimization merges the common part of a group of GEPs, so we can compute each pointer address by adding a simple offset to the common part. The optimization is currently only enabled for the NVPTX backend, where it has a large payoff on some benchmarks. Review: http://reviews.llvm.org/D3462 Patch by Jingyue Wu. llvm-svn: 207783	2014-05-01 18:38:36 +00:00
NAKAMURA Takumi	99aa6e156a	ConstantHoisting.cpp: Add <tuple> for std::tie, since r207593 removed FileSystem.h, it includes <tuple>. llvm-svn: 207614	2014-04-30 06:44:50 +00:00
Jim Grosbach	708f80f783	Tidy up. llvm-svn: 207585	2014-04-29 22:41:58 +00:00
Jim Grosbach	4a7d496059	Spelling. llvm-svn: 207584	2014-04-29 22:41:55 +00:00
Adam Nemet	deab6f945c	Reapply r207271 without the testcase PR19608 was filed to find a suitable testcase. llvm-svn: 207569	2014-04-29 18:25:28 +00:00
Chandler Carruth	c71b2c3c7f	Revert r207271 for now. This commit introduced a test case that ran clang directly from the LLVM test suite! That doesn't work. I've followed up on the review thread to try and get a viable solution sorted out, but trying to get the tree clean here. llvm-svn: 207462	2014-04-28 23:07:49 +00:00

... 5 6 7 8 9 ...

6659 Commits