llvm-project

Commit Graph

Author	SHA1	Message	Date
David Majnemer	135ca40a7d	[InstCombine] Don't divide by zero when evaluating a potential transform Trivial multiplication by zero may survive the worklist. We tried to reassociate the multiplication with a division instruction, causing us to divide by zero; bail out instead. This fixes PR24726. llvm-svn: 246939	2015-09-06 06:49:59 +00:00
David Majnemer	daa24b9789	[InstCombine] Don't assume m_Mul gives back an Instruction This fixes PR24713. llvm-svn: 246933	2015-09-05 20:44:56 +00:00
Andrew Kaylor	a89baa21b8	Fixing bad test syntax. llvm-svn: 246897	2015-09-04 23:47:34 +00:00
Andrew Kaylor	50e4e86c26	[WinEH] Teach SimplifyCFG to eliminate empty cleanup pads. Differential Revision: http://reviews.llvm.org/D12434 llvm-svn: 246896	2015-09-04 23:39:40 +00:00
Silviu Baranga	44077da1b7	Simplify testcase added in r246759. NFC llvm-svn: 246848	2015-09-04 11:37:20 +00:00
Hal Finkel	4a7be23976	[PowerPC] Enable interleaved-access vectorization This adds a basic cost model for interleaved-access vectorization (and a better default for shuffles), and enables interleaved-access vectorization by default. The relevant difference from the default cost model for interleaved-access vectorization, is that on PPC, the shuffles that end up being used are much cheaper than modeling the process with insert/extract pairs (which are quite expensive, especially on older cores). llvm-svn: 246824	2015-09-04 00:10:41 +00:00
Hal Finkel	75afa2b6b6	[PowerPC] Always use aggressive interleaving on the A2 On the A2, with an eye toward QPX unaligned-load merging, we should always use aggressive interleaving. It is generally superior to only using concatenation unrolling. llvm-svn: 246819	2015-09-03 23:23:00 +00:00
Silviu Baranga	d0f83d15a3	Fix IRBuilder CreateBitOrPointerCast for vector types Summary: This function was not taking into account that the input type could be a vector, and wasn't properly working for vector types. This caused an assert when building spec2k6 perlbmk for armv8. Reviewers: rengolin, mzolotukhin Subscribers: silviu.baranga, mzolotukhin, rengolin, eugenis, jmolloy, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D12559 llvm-svn: 246759	2015-09-03 11:36:39 +00:00
Joseph Tremoulet	9ce71f76b9	[WinEH] Add cleanupendpad instruction Summary: Add a `cleanupendpad` instruction, used to mark exceptional exits out of cleanups (for languages/targets that can abort a cleanup with another exception). The `cleanupendpad` instruction is similar to the `catchendpad` instruction in that it is an EH pad which is the target of unwind edges in the handler and which itself has an unwind edge to the next EH action. The `cleanupendpad` instruction, similar to `cleanupret`, has a `cleanuppad` argument indicating which cleanup it exits. The unwind successors of a `cleanuppad`'s `cleanupendpad`s must agree with each other and with its `cleanupret`s. Update WinEHPrepare (and docs/tests) to accommodate `cleanupendpad`. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12433 llvm-svn: 246751	2015-09-03 09:09:43 +00:00
Philip Reames	dab35f317d	[RewriteStatepointsForGC] Improve debug output [NFC] llvm-svn: 246713	2015-09-02 21:11:44 +00:00
Piotr Padlewski	0c7d8fc1f6	assume(X) handling in GVN bugfix. There was an infinite loop because it was trying to change assume(true) into assume(true). Also added handling for when assume(false) appears. http://reviews.llvm.org/D12516 llvm-svn: 246697	2015-09-02 20:00:03 +00:00
Piotr Padlewski	28ffcbe1cc	Constant propagation after hitting assume(cmp) bugfix. Last time the code ran into the assertion `BBE.isSingleEdge()` in lib/IR/Dominators.cpp:200. http://reviews.llvm.org/D12170 llvm-svn: 246696	2015-09-02 19:59:59 +00:00
Piotr Padlewski	14e815c22b	Constant propagation after hitting llvm.assume. After hitting @llvm.assume(X) we can: - propagate the equality X == true - if X is an icmp/fcmp (with an eq operation) and one of the operands is a constant, replace the variable with that constant within the same BasicBlock http://reviews.llvm.org/D11918 llvm-svn: 246695	2015-09-02 19:59:53 +00:00
Chad Rosier	b684e381c9	Add newline to test. NFC. llvm-svn: 246653	2015-09-02 14:06:16 +00:00
James Molloy	1e583704f5	[LV] Don't bail to MiddleBlock if a runtime check fails, bail to ScalarPH instead We were bailing to two places if our runtime checks failed. If the initial overflow check failed, we'd go to ScalarPH. If any other check failed, we'd go to MiddleBlock. This caused us to have to have an extra PHI per induction and reduction as the vector loop's exit block was not dominated by its latch. There's no need to have this behavior - if we just always go to ScalarPH we can get rid of a bunch of complexity. llvm-svn: 246637	2015-09-02 10:15:39 +00:00
James Molloy	cba9230507	[LV] Refactor all runtime check emissions into helper functions. This reduces the complexity of createEmptyBlock() and will open the door to further refactoring. The test change is simply because we're now constant folding a trivial test. llvm-svn: 246634	2015-09-02 10:15:22 +00:00
James Molloy	ff623dce39	[LV] Pull creation of trip counts into a helper function. ... and do a tad of tidyup while we're at it. Because StartIdx must now be zero, there's no difference between Count and EndIdx. llvm-svn: 246633	2015-09-02 10:15:16 +00:00
James Molloy	a860a2216a	[LV] Never widen an induction variable. There's no need to widen canonical induction variables. It's just as efficient to create a new, wide, induction variable. Consider, if we widen an indvar, then we'll have to truncate it before its uses anyway (1 trunc). If we create a new indvar instead, we'll have to truncate that instead (1 trunc) [besides which IndVars should go and clean up our mess after us anyway on principle]. This lets us remove a ton of special-casing code. llvm-svn: 246631	2015-09-02 10:15:05 +00:00
James Molloy	c07701b017	[LV] Switch to using canonical induction variables. Vectorized loops only ever have one induction variable. All induction PHIs from the scalar loop are rewritten to be in terms of this single indvar. We were trying very hard to pick an indvar that already existed, even if that indvar wasn't canonical (didn't start at zero). But trying so hard is really fruitless - creating a new, canonical, indvar only results in one extra add in the worst case and that add is trivially easy to push through the PHI out of the loop by instcombine. If we try and be less clever here and instead let instcombine clean up our mess (as we do in many other places in LV), we can remove unneeded complexity. llvm-svn: 246630	2015-09-02 10:14:54 +00:00
Hans Wennborg	dada1d20ba	DeadArgElim: don't eliminate arguments from naked functions Differential Revision: http://reviews.llvm.org/D12534 llvm-svn: 246564	2015-09-01 18:06:46 +00:00
Silviu Baranga	755ec0e027	[AArch64] Turn on by default interleaved access vectorization Summary: This change turns on by default interleaved access vectorization for AArch64. We also clean up some tests which were specifically enabling this behaviour. Reviewers: rengolin Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12149 llvm-svn: 246542	2015-09-01 11:26:46 +00:00
Silviu Baranga	e748c9ef55	[ARM] Turn on by default interleaved access vectorization Summary: This change turns on by default interleaved access vectorization on ARM, as it has shown to be beneficial on ARM. Reviewers: rengolin Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12146 llvm-svn: 246541	2015-09-01 11:19:15 +00:00
Hans Wennborg	4a61370b8f	Fix CHECK directives that weren't checking. llvm-svn: 246485	2015-08-31 21:10:35 +00:00
Philip Reames	a88caeab6c	[FunctionAttr] Infer nonnull attributes on returns Teach FunctionAttr to infer the nonnull attribute on return values of functions which never return a potentially null value. This is done both via a conservative local analysis for the function itself and a optimistic per-SCC analysis. If no function in the SCC returns anything which could be null (other than values from other functions in the SCC), we can conclude no function returned a null pointer. Even if some function within the SCC returns a null pointer, we may be able to locally conclude that some don't. Differential Revision: http://reviews.llvm.org/D9688 llvm-svn: 246476	2015-08-31 19:44:38 +00:00
Philip Reames	bb11d62a5a	[LazyValueInfo] Look through Phi nodes when trying to prove a predicate If asked to prove a predicate about a value produced by a PHI node, LazyValueInfo was unable to do so even if the predicate was known to be true for each input to the PHI. This prevented JumpThreading from eliminating a provably redundant branch. The problematic test case looks something like this: ListNode *p = ...; while (p != null) { if (!p) return; x = g->x; // unrelated p = p->next } The null check at the top of the loop is redundant since the value of 'p' is null checked on entry to the loop and before executing the backedge. This resulted in us a) executing an extra null check per iteration and b) not being able to LICM unrelated loads after the check since we couldn't prove they would execute or that their dereferenceability wasn't effected by the null check on the first iteration. Differential Revision: http://reviews.llvm.org/D12383 llvm-svn: 246465	2015-08-31 18:31:48 +00:00
Jingyue Wu	e84f671830	[JumpThreading] make jump threading respect convergent annotation. Summary: JumpThreading shouldn't duplicate a convergent call, because that would move a convergent call into a control-inequivalent location. For example, if (cond) { ... } else { ... } convergent_call(); if (cond) { ... } else { ... } should not be optimized to if (cond) { ... convergent_call(); ... } else { ... convergent_call(); ... } Test Plan: test/Transforms/JumpThreading/basic.ll Patch by Xuetian Weng. Reviewers: resistor, arsenm, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12484 llvm-svn: 246415	2015-08-31 06:10:27 +00:00
Duncan P. N. Exon Smith	0bd73bb58b	DI: Update tests before adding !dbg subprogram attachments I'm working on adding !dbg attachments to functions (PR23367), which we'll use to determine the canonical subprogram for a function (instead of the `subprograms:` array in the compile units). This updates a few old tests in preparation. Transforms/Mem2Reg/ConvertDebugInfo2.ll had an old-style grep+count based test that would start to fail because I've added an extra line with `!dbg`. Instead, explicitly `CHECK` for what I think the test actually cares about. All three testcases have subprograms with a valid `function:` reference -- which means my upgrade script will add a `!dbg` attachment -- but that aren't referenced from any compile unit. I suspect these testcases were handreduced over-zealously (or have bitrotted?). Add a reference from the compile unit so that upcoming Verifier checks won't fail here. llvm-svn: 246351	2015-08-28 23:32:00 +00:00
David Majnemer	0a92f86fe6	Revert r246232 and r246304. This reverts isSafeToSpeculativelyExecute's use of ReadNone until we split ReadNone into two pieces: one attribute which reasons about how the function reasons about memory and another attribute which determines how it may be speculated, CSE'd, trap, etc. llvm-svn: 246331	2015-08-28 21:13:39 +00:00
Duncan P. N. Exon Smith	814b8e91c7	DI: Require subprogram definitions to be distinct As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change. While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility. I updated almost all the IR with the following script: git grep -l -E -e '= !DISubprogram$.* isDefinition: true' \| grep -v test/Bitcode \| xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true$/= distinct \1/' Likely some variant of would work for out-of-tree testcases. llvm-svn: 246327	2015-08-28 20:26:49 +00:00
Sanjoy Das	6f5dca70ed	[InstCombine] Fix PR24605. PR24605 is caused due to an incorrect insert point in instcombine's IR builder. When simplifying %t = add X Y ... %m = icmp ... %t the replacement for %t should be placed before %t, not before %m, as there could be a use of %t between %t and %m. llvm-svn: 246315	2015-08-28 19:09:31 +00:00
Chad Rosier	dc65532fd9	Optimize memcmp(x,y,n)==0 for small n and suitably aligned x/y. http://reviews.llvm.org/D6952 PR20673 llvm-svn: 246313	2015-08-28 18:30:18 +00:00
Vedant Kumar	3171cabb34	[test] (NFC) Simplify Transforms/ConstProp/calls.ll Differential Revision: http://reviews.llvm.org/D12421 llvm-svn: 246312	2015-08-28 18:04:20 +00:00
JF Bastien	f5aa1ca655	Remove Merge Functions pointer comparisons Summary: This patch removes two remaining places where pointer value comparisons are used to order functions: comparing range annotation metadata, and comparing block address constants. (These are both rare cases, and so no actual non-determinism was observed from either case). The fix for range metadata is simple: the annotation always consists of a pair of integers, so we just order by those integers. The fix for block addresses is more subtle. Two constants are the same if they are the same basic block in the same function, or if they refer to corresponding basic blocks in each respective function. Note that in the first case, merging is trivially correct. In the second, the correctness of merging relies on the fact that the the values of block addresses cannot be compared. This change is actually an enhancement, as these functions could not previously be merged (see merge-block-address.ll). There is still a problem with cross function block addresses, in that constants pointing to a basic block in a merged function is not updated. This also more robustly compares floating point constants by all fields of their semantics, and fixes a dyn_cast/cast mixup. Author: jrkoenig Reviewers: dschuff, nlewycky, jfb Subscribers llvm-commits Differential revision: http://reviews.llvm.org/D12376 llvm-svn: 246305	2015-08-28 16:49:09 +00:00
Chandler Carruth	4b682f6f24	[SROA] Fix PR24463, a crash I introduced in SROA by allowing it to handle more allocas with loads past the end of the alloca. I suspect there are some related crashers with slightly different patterns, but I'll fix those and add test cases as I find them. Thanks to David Majnemer for the excellent test case reduction here. Made this super simple to debug and fix. llvm-svn: 246289	2015-08-28 09:03:52 +00:00
Steven Wu	61db34d12e	Revert r246244 and r246243 These two commits cause clang/llvm bootstrap to hang. llvm-svn: 246279	2015-08-28 06:52:00 +00:00
Piotr Padlewski	3f81ec1e38	Constant propagation after hitting assume(cmp) bugfix. Last time the code ran into the assertion `BBE.isSingleEdge()` in lib/IR/Dominators.cpp:200. http://reviews.llvm.org/D12170 llvm-svn: 246244	2015-08-28 01:02:00 +00:00
Piotr Padlewski	63cc5d4627	Constant propagation after hitting llvm.assume. After hitting @llvm.assume(X) we can: - propagate the equality X == true - if X is an icmp/fcmp (with an eq operation) and one of the operands is a constant, replace the variable with that constant within the same BasicBlock http://reviews.llvm.org/D11918 llvm-svn: 246243	2015-08-28 01:01:57 +00:00
David Majnemer	0293704be2	[ValueTracking] readnone CallInsts are fair game for speculation Any call which is side effect free is trivially OK to speculate. We already had similar logic in EarlyCSE and GVN but we were missing it from isSafeToSpeculativelyExecute. This fixes PR24601. llvm-svn: 246232	2015-08-27 23:03:01 +00:00
Tyler Nowicki	8f88546575	Fix test introduced in r246187 that failed on some systems. llvm-svn: 246207	2015-08-27 20:43:29 +00:00
Erik Schnetter	5e93e28d8b	Enable constant propagation for more math functions Constant propagation for single precision math functions (such as tanf) is already working, but was not enabled. This patch enables these for many single-precision functions, and adds respective test cases. Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf llvm-svn: 246194	2015-08-27 19:56:57 +00:00
Erik Schnetter	ed6eab32b3	Revert 246186; still breaks on some systems llvm-svn: 246191	2015-08-27 19:34:14 +00:00
Tyler Nowicki	5eaa5a9d26	Improve vectorization diagnostic messages and extend vectorize(enable) pragma. This patch changes the analysis diagnostics produced when loops with floating-point recurrences or memory operations are identified. The new messages say "cannot prove it is safe to reorder * operations; allow reordering by specifying #pragma clang loop vectorize(enable)". Depending on the type of diagnostic the message will include additional options such as ffast-math or __restrict__. This patch also allows the vectorize(enable) pragma to override the low pointer memory check threshold. When the hint is given a higher threshold is used. See the clang patch for the options produced for each diagnostic. llvm-svn: 246187	2015-08-27 18:56:49 +00:00
Erik Schnetter	05845d31c9	Enable constant propagation for more math functions Constant propagation for single precision math functions (such as tanf) is already working, but was not enabled. This patch enables these for many single-precision functions, and adds respective test cases. Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf llvm-svn: 246186	2015-08-27 18:56:23 +00:00
Erik Schnetter	a23672626d	Revert r246158 since it breaks LLVM.Transforms/ConstProp.calls.ll llvm-svn: 246166	2015-08-27 17:24:01 +00:00
Erik Schnetter	694bf5c9b5	Enable constant propagation for more math functions Constant propagation for single precision math functions (such as tanf) is already working, but was not enabled. This patch enables these for many single-precision functions, and adds respective test cases. Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf llvm-svn: 246158	2015-08-27 16:36:37 +00:00
Chad Rosier	dc8c48924a	[LoopVectorize] Move test from r246149 into a target-specific folder to appease bots. llvm-svn: 246154	2015-08-27 15:24:47 +00:00
Chad Rosier	c94f8e2906	[LoopVectorize] Add Support for Small Size Reductions. Unlike scalar operations, we can perform vector operations on element types that are smaller than the native integer types. We type-promote scalar operations if they are smaller than a native type (e.g., i8 arithmetic is promoted to i32 arithmetic on Arm targets). This patch detects and removes type-promotions within the reduction detection framework, enabling the vectorization of small size reductions. In the legality phase, we look through the ANDs and extensions that InstCombine creates during promotion, keeping track of the smaller type. In the profitability phase, we use the smaller type and ignore the ANDs and extensions in the cost model. Finally, in the code generation phase, we truncate the result of the reduction to allow InstCombine to rewrite the entire expression in the smaller type. This fixes PR21369. http://reviews.llvm.org/D12202 Patch by Matt Simpson <mssimpso@codeaurora.org>! llvm-svn: 246149	2015-08-27 14:12:17 +00:00
Pete Cooper	6b716218fa	isKnownNonNull needs to consider globals in non-zero address spaces. Globals in address spaces other than one may have 0 as a valid address, so we should not assume that they can be null. Reviewed by Philip Reames. llvm-svn: 246137	2015-08-27 03:16:29 +00:00
Philip Reames	dfd890dd3a	Allow value forwarding past release fences in EarlyCSE A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence. We do need to make sure that stores before the fence can't be eliminated even if there's another store to the same location after the fence. In theory, we could reorder the second store above the fence and then eliminate the former, but we can't do this if the stores are on opposite sides of the fence. Note: While more aggressive then what's there, this patch is still implementing a really conservative ordering. In particular, I'm not trying to exploit undefined behavior via races, or the fact that the LangRef says only 'atomic' accesses are ordered w.r.t. fences. Differential Revision: http://reviews.llvm.org/D11434 llvm-svn: 246134	2015-08-27 01:32:33 +00:00
Philip Reames	abcdc5e3a8	[RewriteStatepointsForGC] Reduce the number of new instructions for base pointers When computing base pointers, we introduce new instructions to propagate the base of existing instructions which might not be bases. However, the algorithm doesn't make any effort to recognize when the new instruction to be inserted is the same as an existing one already in the IR. Since this is happening immediately before rewriting, we don't really have a chance to fix it after the pass runs without teaching loop passes about statepoints. I'm really not thrilled with this patch. I've rewritten it 4 different ways now, but this is the best I've come up with. The case where the new instruction is just the original base defining value could be merged into the existing algorithm with some complexity. The problem is that we might have something like an extractelement from a phi of two vectors. It may be trivially obvious that the base of the 0th element is an existing instruction, but I can't see how to make the algorithm itself figure that out. Thus, I resort to the call to SimplifyInstruction instead. Note that we can only adjust the instructions we've inserted ourselves. The live sets are still being tracked in side structures at this point in the code. We can't easily muck with instructions which might be in them. Long term, I'm really thinking we need to materialize the live pointer sets explicitly in the IR somehow rather than using side structures to track them. Differential Revision: http://reviews.llvm.org/D12004 llvm-svn: 246133	2015-08-27 01:02:28 +00:00
Tyler Nowicki	e0f400feaa	Improved printing of analysis diagnostics in the loop vectorizer. This patch ensures that every analysis diagnostic produced by the vectorizer will be printed if the loop has a vectorization hint on it. The condition has also been improved to prevent printing when a disabling hint is specified. llvm-svn: 246132	2015-08-27 01:02:04 +00:00
Philip Reames	98a2dabc08	[SimplifyCFG] Prune code from a provably unreachable switch default As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine. Note: Duplicate case values are disallowed by the LangRef and verifier. Differential Revision: http://reviews.llvm.org/D11995 llvm-svn: 246125	2015-08-26 23:56:46 +00:00
Chandler Carruth	748d095ff0	[SROA] Rip out all support for SSAUpdater in SROA. This was only added to preserve the old ScalarRepl's use of SSAUpdater which was originally to avoid use of dominance frontiers. Now, we only need a domtree, and we'll need a domtree right after this pass as well and so it makes perfect sense to always and only use the dom-tree powered mem2reg. This was flag-flipper earlier and has stuck reasonably so I wanted to gut the now-dead code out of SROA before we waste more time with it. Among other things, this will make passmanager porting easier. llvm-svn: 246028	2015-08-26 09:09:29 +00:00
JF Bastien	9dc042a0b6	Comparing operands should not require the same ValueID Summary: When comparing basic blocks, there is an additional check that two Value*'s should have the same ID, which interferes with merging equivalent constants of different kinds (such as a ConstantInt and a ConstantPointerNull in the included testcase). The cmpValues function already ensures that the two values in each function are the same, so removing this check should not cause incorrect merging. Also, the type comparison is redundant, based on reviewing the code and testing on the test suite and several large LTO bitcodes. Author: jrkoenig Reviewers: nlewycky, jfb, dschuff Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D12302 llvm-svn: 246001	2015-08-26 03:02:58 +00:00
Wei Mi	edae87d819	The patch replace the overflow check in loop vectorization with the minimum loop iterations check. The loop minimum iterations check below ensures the loop has enough trip count so the generated vector loop will likely be executed, and it covers the overflow check. Differential Revision: http://reviews.llvm.org/D12107. llvm-svn: 245952	2015-08-25 16:43:47 +00:00
Piotr Padlewski	9b33e28270	assume.ll test fixup llvm-svn: 245920	2015-08-25 01:48:49 +00:00
Piotr Padlewski	4e7f752bb8	Assume intrinsic handling in global opt It doesn't solve the problem, when for example we load something, and then assume that it is the same as some constant value, because globalopt will fail on unknown load instruction. The proposed solution would be to skip some instructions that we can't evaluate and they are safe to skip (f.e. load, assume and many others) and see if they are required to perform optimization (f.e. we don't care about ephemeral instructions that may appear using @llvm.assume()) http://reviews.llvm.org/D12266 llvm-svn: 245919	2015-08-25 01:34:15 +00:00
David Blaikie	0732e16e69	Update test case so it passes the verifier Some debug info was drastically out of date, from the days where we used to emit a list of length one (with a single null entry) rather than an empty list (or, more recently, no list at all) for list fields that have no elements. llvm-svn: 245796	2015-08-22 22:38:44 +00:00
JF Bastien	057292a76c	Improve the determinism of MergeFunctions Summary: Merge functions previously relied on unsigned comparisons of pointer values to order functions. This caused observable non-determinism in the compiler for large bitcode programs. Basically, opt -mergefuncs program.bc \| md5sum produces different hashes when run repeatedly on the same machine. Differing output was observed on three large bitcodes, but it was less frequent on the smallest file. It is possible that this only manifests on the large inputs, hence remaining undetected until now. This patch fixes this by removing (almost, see below) all places where comparisons between pointers are used to order functions. Most of these changes are local, but the comparison of global values requires assigning an identifier to each local in the order it is visited. This is very similar to the way the comparison function identifies Value's defined within a function. Because the order of visiting the functions and their subparts is deterministic, the identifiers assigned to the globals will be as well, and the order of functions will be deterministic. With these changes, there is no more observed non-determinism. There is also only minor slowdowns (negligible to 4%) compared to the baseline, which is likely a result of the fact that global comparisons involve hash lookups and not just pointer comparisons. The one caveat so far is that programs containing BlockAddress constants can still be non-deterministic. It is not clear what the right solution is here. In particular, even if the global numbers are used to order by function, we still need a way to order the BasicBlock's. Unfortunately, we cannot just bail out and fail to order the functions or consider them equal, because we require a total order over functions. Note that programs with BlockAddress constants are relatively rare, so the impact of leaving this in is minor as long as this pass is opt-in. Author: jrkoenig Reviewers: nlewycky, jfb, dschuff Subscribers: jevinskie, llvm-commits, chapuni Differential revision: http://reviews.llvm.org/D12168 llvm-svn: 245762	2015-08-21 23:27:24 +00:00
Adam Nemet	4e533ef7a9	[LAA] Hold bounds via ValueHandles during SCEV expansion SCEV expansion can invalidate previously expanded values. For example in SCEVExpander::ReuseOrCreateCast, if we already have the requested cast value but it's not at the desired location, a new cast is inserted and the old cast will be invalidated. Therefore, when expanding the bounds for the pointers, a later entry can invalidate the IR value for an earlier one. The fix is to store a value handle rather than the value itself. The newly added test has a more detailed description of how the bug triggers. This bug can have a negative but potentially highly variable performance impact in Loop Distribution. Because one of the bound values was invalidated and is an undef expression now, InstCombine is free to transform the array overlap check: Start0 <= End1 && Start1 <= End0 into: Start0 <= End1 So depending on the runtime location of the arrays, we would detect a conflict and fall back on the original loop of the versioned loop. Also tested compile time with SPEC2006 LTO bc files. llvm-svn: 245760	2015-08-21 23:19:57 +00:00
Sanjoy Das	c86c162a58	Re-apply r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0" The original checkin was buggy, this change has a fix. Original commit message: [InstCombine] Transform A & (L - 1) u< L --> L != 0 Summary: This transform is never a pessimization at the IR level (since it replaces an `icmp` with another), and has potential payoffs: 1. It may make the `icmp` fold away or become loop invariant. 2. It may make the `A & (L - 1)` computation dead. This shows up in Java, in range checks generated by array accesses of the form `a[i & (a.length - 1)]`. Reviewers: reames, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12210 llvm-svn: 245753	2015-08-21 22:22:37 +00:00
Simon Pilgrim	76b91d9084	Line endings fix. llvm-svn: 245736	2015-08-21 21:09:51 +00:00
NAKAMURA Takumi	6a6232818d	Revert r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0" It caused miscompilation in clang. llvm-svn: 245678	2015-08-21 07:46:07 +00:00
Michael Zolotukhin	6002295c6a	[SLP] Add one more test case for propagating 'nontemporal' attributes. llvm-svn: 245644	2015-08-21 00:08:39 +00:00
David Majnemer	2df38cd0c4	[InstSimplify] add nuw %x, C2 must be at least C2 Use the fact that add nuw always creates a larger bit pattern when trying to simplify comparisons. llvm-svn: 245638	2015-08-20 23:01:41 +00:00
Sanjoy Das	e472d8a57a	[InstCombine] Transform A & (L - 1) u< L --> L != 0 Summary: This transform is never a pessimization at the IR level (since it replaces an `icmp` with another), and has potential payoffs: 1. It may make the `icmp` fold away or become loop invariant. 2. It may make the `A & (L - 1)` computation dead. This shows up in Java, in range checks generated by array accesses of the form `a[i & (a.length - 1)]`. Reviewers: reames, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12210 llvm-svn: 245635	2015-08-20 22:31:55 +00:00
Michael Zolotukhin	51b00e6d82	[SLP] Propagate 'nontemporal' attribute into vectorized instructions. llvm-svn: 245633	2015-08-20 22:28:15 +00:00
Michael Zolotukhin	2a3d99fedf	[LoopVectorize] Propagate 'nontemporal' attribute into vectorized instructions. llvm-svn: 245632	2015-08-20 22:27:38 +00:00
Jingyue Wu	ca3ef11a9b	[NVPTX] truncating 64-bit to 32-bit is free Summary: Add an LSR test that exercises isTruncateFree. Without this change, LSR creates another indvar representing the truncated value. Reviewers: jholewinski, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D12058 llvm-svn: 245611	2015-08-20 20:59:02 +00:00
Adrian Prantl	baf90fc265	Fix a bug that caused SimplifyCFG to drop DebugLocs. Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all metadata in KnownSet, but the condition for DebugLocs was inverted. Most users of dropUnknownMetadata() actually worked around this by not adding LLVMContext::MD_dbg to their list of KnownIDs. This is now made explicit. llvm-svn: 245589	2015-08-20 18:24:02 +00:00
Balaram Makam	ccf59731e3	Optimize bitwise even/odd test (-x&1 -> x&1) to not use negation. Summary: We know that -x & 1 is equivalent to x & 1, avoid using negation for testing if a negative integer is even or odd. Reviewers: majnemer Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D12156 llvm-svn: 245569	2015-08-20 15:35:00 +00:00
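The equivalence `-x & 1 == x & 1` holds because two's-complement negation never changes the lowest bit. A small sketch, assuming a 32-bit width (the width and helper names are illustrative assumptions):

```python
BITS = 32                     # assumed integer width
MASK = (1 << BITS) - 1


def neg(x):
    """Two's-complement negation of a BITS-wide value."""
    return (-x) & MASK


def low_bit_survives_negation(samples):
    """True if (-x & 1) == (x & 1) for every sample."""
    return all((neg(x) & 1) == (x & 1) for x in samples)
```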
Bjorn Steinbrink	2e2f66557e	Revert "[DSE] Enable removal of lifetime intrinsics in terminating blocks" llvm-svn: 245543	2015-08-20 08:58:47 +00:00
Bjorn Steinbrink	cc7e8a9705	[DSE] Enable removal of lifetime intrinsics in terminating blocks Usually DSE is not supposed to remove lifetime intrinsics, but it's actually ok to remove them for dead objects in terminating blocks, because they convey no extra information there. Until we hit a lifetime start that cannot be removed, that is. Because from that point on the lifetime intrinsics become interesting again, e.g. for stack coloring. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11710 llvm-svn: 245542	2015-08-20 08:25:28 +00:00
Eric Christopher	0efe9f60bb	Revert "Fix PR24469 resulting from r245025 and re-enable dead store elimination across basicblocks." This is causing bootstrap problems, e.g.: http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/2960 This reverts r245195. llvm-svn: 245402	2015-08-19 02:15:13 +00:00
Hal Finkel	a8d205f145	Make ScalarEvolution::isKnownPredicate a little smarter Here we make ScalarEvolution::isKnownPredicate, indirectly, a little smarter. Given some relational comparison operator OP, and two AddRec SCEVs, {I,+,S} OP {J,+,T}, we can reduce this to the comparison I OP J when S == T, both AddRecs are for the same loop, and both are known not to wrap. As it turns out, because of the way that backedge-guard expressions can be leveraged when computing known predicates, this allows indvars to simplify the if-statement comparison in this loop: void foo (int *a, int *b, int n) { for (int i = 0; i < n; ++i) { if (i > n) a[i] = b[i] + 1; } } which, somewhat surprisingly, we were not previously optimizing away. llvm-svn: 245400	2015-08-19 01:51:51 +00:00
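The reduction of `{I,+,S} OP {J,+,T}` to `I OP J` can be illustrated numerically. This is a rough model rather than ScalarEvolution itself: `addrec` is a hypothetical helper, and Python's unbounded integers stand in for the "known not to wrap" precondition.

```python
import operator


def addrec(start, step, k):
    """Value of the recurrence {start,+,step} on iteration k (no wrapping)."""
    return start + step * k


def reduces_to_start_comparison(op, i, j, step, trips=50):
    """True if {i,+,step} OP {j,+,step} equals i OP j on every iteration."""
    expected = op(i, j)
    return all(
        op(addrec(i, step, k), addrec(j, step, k)) == expected
        for k in range(trips)
    )
```

With different steps the comparison can change over time, for example `{0,+,1} < {5,+,0}` flips at iteration 5, which is why the commit requires `S == T`.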
David Majnemer	c6bb0e2a51	[InstSimplify] Don't assume getAggregateElement will succeed It isn't always possible to get a value from getAggregateElement. This fixes PR24488. llvm-svn: 245365	2015-08-18 22:07:25 +00:00
Justin Bogner	9f00ebaeda	Revert "Constant propagation after hiting llvm.assume" This was also failing bootstrap: http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto_build This reverts r245265. llvm-svn: 245269	2015-08-18 07:00:34 +00:00
Piotr Padlewski	94ca3783b8	Constant propagation after hiting llvm.assume After hitting @llvm.assume(X) we can: - propagate the equality X == true - if X is an icmp/fcmp (with an eq predicate) and one of its operands is a constant, replace the variable with that constant within the same BasicBlock http://reviews.llvm.org/D11918 llvm-svn: 245265	2015-08-18 03:55:30 +00:00
Silviu Baranga	b322aa6f53	[CostModel][AArch64] Increase cost of vector insert element and add missing cast costs Summary: Increase the estimated costs for insert/extract element operations on AArch64. This is motivated by results from benchmarking interleaved accesses. Add missing costs for zext/sext/trunc instructions and some integer to floating point conversions. These costs were previously calculated by scalarizing these operation and were affected by the cost increase of the insert/extract element operations. Reviewers: rengolin Subscribers: mcrosier, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D11939 llvm-svn: 245226	2015-08-17 16:05:09 +00:00
Karthik Bhat	3af28945b9	Fix PR24469 resulting from r245025 and re-enable dead store elimination across basicblocks. PR24469 resulted because DeleteDeadInstruction in handleNonLocalStoreDeletion was deleting the next basic block iterator. Fixed the same by resetting the basic block iterator post call to DeleteDeadInstruction. llvm-svn: 245195	2015-08-17 05:51:39 +00:00
David Majnemer	8ed559ad22	Revert "[InstCombinePHI] Partial simplification of identity operations." This reverts commit r244887, it caused PR24470. llvm-svn: 245194	2015-08-17 03:11:26 +00:00
Sanjay Patel	57fd1dc5db	transform fmin/fmax calls when possible (PR24314) If we can ignore NaNs, fmin/fmax libcalls can become compare and select (this is what we turn std::min / std::max into). This IR should then be optimized in the backend to whatever is best for any given target. Eg, x86 can use minss/maxss instructions. This should solve PR24314: https://llvm.org/bugs/show_bug.cgi?id=24314 Differential Revision: http://reviews.llvm.org/D11866 llvm-svn: 245187	2015-08-16 20:18:19 +00:00
David Majnemer	e04443baff	Revert "Add support for cross block dse. This patch enables dead stroe elimination across basicblocks." This reverts commit r245025, it caused PR24469. llvm-svn: 245172	2015-08-16 07:11:59 +00:00
David Majnemer	dfa3b09541	[InstCombine] Replace an and+icmp with a trunc+icmp Bitwise arithmetic can obscure a simple sign-test. Replacing the mask with a truncate is preferable if the type is legal because it permits us to rephrase the comparison more explicitly. llvm-svn: 245171	2015-08-16 07:09:17 +00:00
JF Bastien	5e4303dc14	Accelerate MergeFunctions with hashing This patch makes the Merge Functions pass faster by calculating and comparing a hash value which captures the essential structure of a function before performing a full function comparison. The hash is calculated by hashing the function signature, then walking the basic blocks of the function in the same order as the main comparison function. The opcode of each instruction is hashed in sequence, which means that different functions according to the existing total order cannot have the same hash, as the comparison requires the opcodes of the two functions to be the same order. The hash function is a static member of the FunctionComparator class because it is tightly coupled to the exact comparison function used. For example, functions which are equivalent modulo a single variant callsite might be merged by a more aggressive MergeFunctions, and the hash function would need to be insensitive to these differences in order to exploit this. The hashing function uses a utility class which accumulates the values into an internal state using a standard bit-mixing function. Note that this is a different interface than a regular hashing routine, because the values to be hashed are scattered amongst the properties of a llvm::Function, not linear in memory. This scheme is fast because only one word of state needs to be kept, and the mixing function is a few instructions. The main runOnModule function first computes the hash of each function, and only further processes functions which do not have a unique function hash. The hash is also used to order the sorted function set. If the hashes differ, their values are used to order the functions, otherwise the full comparison is done. Both of these are helpful in speeding up MergeFunctions. Together they result in speedups of 9% for mysqld (a mostly C application with little redundancy), 46% for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) 
In all three cases, the new speed of MergeFunctions is about half that of the module verifier, making it relatively inexpensive even for large LTO builds with hundreds of thousands of functions. The same functions are merged, so this change is free performance. Author: jrkoenig Reviewers: nlewycky, dschuff, jfb Subscribers: llvm-commits, aemerson Differential revision: http://reviews.llvm.org/D11923 llvm-svn: 245140	2015-08-15 01:18:18 +00:00
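The hash-then-compare scheme described in this commit can be sketched as follows. Everything here is an assumption made for illustration: functions are modelled as `(signature, blocks)` tuples whose instructions are `(opcode, operand)` pairs, the mixing constants are arbitrary, and none of it mirrors the actual `FunctionComparator` code.

```python
import zlib
from collections import defaultdict

MASK64 = (1 << 64) - 1


class HashAccumulator:
    """Folds values into one word of state with a cheap mixing step."""

    def __init__(self):
        self.state = 0x6ACAA36BEF8325C5      # arbitrary seed (assumption)

    def add(self, text):
        word = zlib.crc32(text.encode())
        self.state = ((self.state ^ word) * 0x9E3779B97F4A7C15) & MASK64
        self.state ^= self.state >> 29


def function_hash(fn):
    """Hash the signature, then every opcode in block order (operands ignored)."""
    signature, blocks = fn
    acc = HashAccumulator()
    acc.add(signature)
    for block in blocks:
        for opcode, _operand in block:
            acc.add(opcode)
    return acc.state


def find_mergeable(functions):
    """Group names of identical functions, comparing fully only within a hash bucket."""
    buckets = defaultdict(list)
    for name, fn in functions.items():
        buckets[function_hash(fn)].append(name)
    groups = []
    for names in buckets.values():
        if len(names) < 2:
            continue                          # unique hash: skip the full comparison
        seen = []                             # (representative, member names)
        for name in names:
            for rep, members in seen:
                if functions[name] == rep:    # stand-in for the full comparison
                    members.append(name)
                    break
            else:
                seen.append((functions[name], [name]))
        groups.extend(sorted(m) for _, m in seen if len(m) > 1)
    return sorted(groups)
```

Because the hash covers only data that the full comparison also inspects, equal functions always share a bucket, so skipping singleton buckets cannot lose a merge. Functions that differ only in operands still collide and are separated by the full comparison.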
Matt Arsenault	427a0fd22e	LoopStrengthReduce: Try to pass address space to isLegalAddressingMode This seems to only work some of the time. In some situations, this seems to use a nonsensical type and isn't actually aware of the memory being accessed. e.g. if branch condition is an icmp of a pointer, it checks the addressing mode of i1. llvm-svn: 245137	2015-08-15 00:53:06 +00:00
Nick Lewycky	8075fd22b9	Fix a crash where a utility function wasn't aware of fcmp vectors and created a value with the wrong type. Fixes PR24458! llvm-svn: 245119	2015-08-14 22:46:49 +00:00
Bjarke Hammersholt Roune	9791ed4705	[SCEV] Apply NSW and NUW flags via poison value analysis for sub, mul and shl Summary: http://reviews.llvm.org/D11212 made Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs for add instructions. This patch expands that to sub, mul and shl instructions. This change makes LSR able to generate pointer induction variables for loops like these, where the index is 32 bit and the pointer is 64 bit: for (int i = 0; i < numIterations; ++i) sum += ptr[i - offset]; for (int i = 0; i < numIterations; ++i) sum += ptr[i * stride]; for (int i = 0; i < numIterations; ++i) sum += ptr[3 * (i << 7)]; Reviewers: atrick, sanjoy Subscribers: sanjoy, majnemer, hfinkel, llvm-commits, meheff, jingyue, eliben Differential Revision: http://reviews.llvm.org/D11860 llvm-svn: 245118	2015-08-14 22:45:26 +00:00
Chad Rosier	67dca908fe	Cleanup test whitespace or lack thereof. NFC. llvm-svn: 245065	2015-08-14 16:34:15 +00:00
Karthik Bhat	ddc2a86a00	Add support for cross block dse. This patch enables dead stroe elimination across basicblocks. Example: define void @test_02(i32 %N) { %1 = alloca i32 store i32 %N, i32* %1 store i32 10, i32* @x %2 = load i32, i32* %1 %3 = icmp ne i32 %2, 0 br i1 %3, label %4, label %5 ; <label>:4 store i32 5, i32* @x br label %7 ; <label>:5 %6 = load i32, i32* @x store i32 %6, i32* @y br label %7 ; <label>:7 store i32 15, i32* @x ret void } In the above example dead store "store i32 5, i32* @x" is now eliminated. Differential Revision: http://reviews.llvm.org/D11143 llvm-svn: 245025	2015-08-14 04:17:23 +00:00
Jingyue Wu	1238f341ba	[SeparateConstOffsetFromGEP] sext(a)+sext(b) => sext(a+b) when a+b can't sign-overflow. Summary: This patch implements my promised optimization to reunite certain sexts from operands after we extract the constant offset. See the header comment of reuniteExts for its motivation. One key building block that enables this optimization is Bjarke's poison value analysis (D11212). That helps to prove "a +nsw b" can't overflow. Reviewers: broune Subscribers: jholewinski, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12016 llvm-svn: 245003	2015-08-14 02:02:05 +00:00
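The rewrite `sext(a)+sext(b) => sext(a+b)` is only valid when the narrow add cannot sign-overflow, which a brute-force check makes concrete. This sketch assumes an 8-bit to 16-bit extension; all helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret a `bits`-wide bit pattern as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sext(value, from_bits, to_bits):
    """Sign-extend a `from_bits` pattern to a `to_bits` pattern."""
    return to_signed(value, from_bits) & ((1 << to_bits) - 1)


def add(a, b, bits):
    """Wrapping add of two `bits`-wide patterns."""
    return (a + b) & ((1 << bits) - 1)


def add_overflows_signed(a, b, bits):
    """True if the signed sum does not fit in `bits` bits."""
    total = to_signed(a, bits) + to_signed(b, bits)
    return not (-(1 << (bits - 1)) <= total < (1 << (bits - 1)))


def reunite_is_safe(narrow=8, wide=16):
    """Check sext(a)+sext(b) == sext(a+b) for every non-overflowing pair."""
    for a in range(1 << narrow):
        for b in range(1 << narrow):
            if add_overflows_signed(a, b, narrow):
                continue
            wide_sum = add(sext(a, narrow, wide), sext(b, narrow, wide), wide)
            if wide_sum != sext(add(a, b, narrow), narrow, wide):
                return False
    return True
```

For an overflowing pair such as 127 + 1 the two sides differ, which is why the pass needs a proof that `a +nsw b` holds.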
Davide Italiano	a195386ca1	[SimplifyLibCalls] Correctly set the is_zero_undef flag for llvm.cttz If <src> is non-zero we can safely set the flag to true, and this results in less code generated for, e.g. ffs(x) + 1 on FreeBSD. Thanks to majnemer for suggesting the fix and reviewing. Code generated before the patch was applied: 0: 0f bc c7 bsf %edi,%eax 3: b9 20 00 00 00 mov $0x20,%ecx 8: 0f 45 c8 cmovne %eax,%ecx b: 83 c1 02 add $0x2,%ecx e: b8 01 00 00 00 mov $0x1,%eax 13: 85 ff test %edi,%edi 15: 0f 45 c1 cmovne %ecx,%eax 18: c3 retq Code generated after the patch was applied: 0: 0f bc cf bsf %edi,%ecx 3: 83 c1 02 add $0x2,%ecx 6: 85 ff test %edi,%edi 8: b8 01 00 00 00 mov $0x1,%eax d: 0f 45 c1 cmovne %ecx,%eax 10: c3 retq It seems we can still use cmove and save another 'test' instruction, but that can be tackled separately. Differential Revision: http://reviews.llvm.org/D11989 llvm-svn: 244947	2015-08-13 20:34:26 +00:00
Jingyue Wu	13a80eaceb	[SeparateConstOffsetFromGEP] strengthen the inbounds attribute We used to be over-conservative about preserving inbounds. Actually, the second GEP (which applies the constant offset) can inherit the inbounds attribute of the original GEP, because the resultant pointer is equivalent to that of the original GEP. For example, x = GEP inbounds a, i+5 => y = GEP a, i // inbounds removed x = GEP inbounds y, 5 // inbounds preserved llvm-svn: 244937	2015-08-13 18:48:49 +00:00
Igor Laevsky	30143aee11	Emit argmemonly attribute for intrinsics. Differential Revision: http://reviews.llvm.org/D11352 llvm-svn: 244920	2015-08-13 17:40:04 +00:00
Erik Eckstein	11fc8175d9	[DeadStoreElimination] remove a redundant store even if the load is in a different block. DeadStoreElimination does eliminate a store if it stores a value which was loaded from the same memory location. So far this worked only if the store is in the same block as the load. Now we can also handle stores which are in a different block than the load. Example: define i32 @test(i1, i32*) { entry: %l2 = load i32, i32* %1, align 4 br i1 %0, label %bb1, label %bb2 bb1: br label %bb3 bb2: ; This store is redundant store i32 %l2, i32* %1, align 4 br label %bb3 bb3: ret i32 0 } Differential Revision: http://reviews.llvm.org/D11854 llvm-svn: 244901	2015-08-13 15:36:11 +00:00
Charlie Turner	6153698f26	[InstCombinePHI] Partial simplification of identity operations. Consider this code: BB: %i = phi i32 [ 0, %if.then ], [ %c, %if.else ] %add = add nsw i32 %i, %b ... In this common case the add can be moved to the %if.else basic block, because adding zero is an identity operation. If we go through the %if.then branch it's always a win, because the add is not executed; if not, the number of instructions stays the same. This pattern applies also to other instructions like sub, shl, shr, ashr \| 0, mul, sdiv, div \| 1. Patch by Jakub Kuderski! llvm-svn: 244887	2015-08-13 12:38:58 +00:00
Simon Pilgrim	becd5e8abd	[InstCombine] SSE/AVX vector shifts demanded shift amount bits Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this. I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code, it means that SimplifyX86immshift now (re)decodes the type of shift. Differential Revision: http://reviews.llvm.org/D11938 llvm-svn: 244872	2015-08-13 07:39:03 +00:00
Philip Reames	971dc3a82a	[RewriteStatepointsForGC] Avoid using unrelocated pointers after safepoints To be clear: this is an optimization not a correctness change. CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuse many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll. One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust. I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're potentially extending the live range of its inputs. Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly. 
Differential Revision: http://reviews.llvm.org/D11819 llvm-svn: 244821	2015-08-12 22:11:45 +00:00
Philip Reames	9ac4e38a16	[RewriteStatepointsForGC] Handle extractelement fully in the base pointer algorithm When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :) This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed. Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement. That change will follow in the near future. llvm-svn: 244808	2015-08-12 21:00:20 +00:00
Simon Pilgrim	8c049d5c03	[InstCombine] Move SSE/AVX vector blend folding to instcombiner As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely). InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask). I also moved all the relevant combine tests into InstCombine/blend_x86.ll Differential Revision: http://reviews.llvm.org/D11934 llvm-svn: 244723	2015-08-12 08:08:56 +00:00
Adam Nemet	e2f6d34d21	[LoopDist] Add test for missing coverage Add a testcase to ensure that if we can't find bounds for a necessary memcheck we don't distribute. llvm-svn: 244703	2015-08-12 00:21:59 +00:00
Sanjoy Das	827529e7a0	Fix PR24354. `InstCombiner::OptimizeOverflowCheck` was asserting an invariant (operands to binary operations are ordered by decreasing complexity) that wasn't really an invariant. Fix this by instead having `InstCombiner::OptimizeOverflowCheck` establish the invariant if it does not hold. llvm-svn: 244676	2015-08-11 21:33:55 +00:00
Chen Li	10f01bd4d3	[LowerSwitch] Fix a bug when LowerSwitch deletes the default block Summary: LowerSwitch crashed with the attached test case after deleting the default block. This happened because the current implementation of deleting dead blocks is wrong. After the default block is deleted, it contains no instruction or terminator, and it should not be traversed anymore. However, since the iterator is advanced before the processSwitchInst() function is executed, the block advanced to could be deleted inside processSwitchInst(). The deleted block would then be visited next and crash dyn_cast<SwitchInst>(Cur->getTerminator()) because Cur->getTerminator() returns a nullptr. This patch fixes this problem by recording dead default blocks into a list, and deleting them after all processSwitchInst() calls have been done. It is still possible to visit dead default blocks and waste time processing them. But it is a compile time issue, and I plan to have another patch to add support to skip dead blocks. Reviewers: kariddi, resistor, hans, reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11852 llvm-svn: 244642	2015-08-11 18:12:26 +00:00
Sanjay Patel	cdd5ec47ed	fix typos; NFC llvm-svn: 244619	2015-08-11 16:10:41 +00:00
Sanjay Patel	fec7965b36	fix minsize detection: minsize attribute implies optimizing for size llvm-svn: 244617	2015-08-11 15:56:31 +00:00
Mehdi Amini	b10555cc61	Fix InstCombine test: invalid CHECK line slipped in r231270 I incorrectly wrote CHECK-NEXT without following it with ':', so the check was ignored by FileCheck. The non-inbound GEP is folded here because the DataLayout is no longer optional, the fold was originally guarded with a comment that said: We need TD information to know the pointer size unless this is inbounds. Now we always have "TD information" and perform the fold. Thanks Jonathan Roelofs for noticing. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 244613	2015-08-11 15:31:17 +00:00
Sanjay Patel	b5c0c58737	remove unnecessary settings/attributes from test case llvm-svn: 244612	2015-08-11 15:30:53 +00:00
James Molloy	134bec2722	Add support for floating-point minnum and maxnum The select pattern recognition in ValueTracking (as used by InstCombine and SelectionDAGBuilder) only knew about integer patterns. This teaches it about minimum and maximum operations. matchSelectPattern() has been extended to return a struct containing the existing Flavor and a new enum defining the pattern's behavior when given one NaN operand. C minnum() is defined to return the non-NaN operand in this case, but the idiomatic C "a < b ? a : b" would return the NaN operand. ARM and AArch64 at least have different instructions for these different cases. llvm-svn: 244580	2015-08-11 09:12:57 +00:00
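The NaN distinction drawn in this commit can be seen with a few lines of arithmetic. A minimal sketch, with `select_min` and `minnum` as illustrative stand-ins rather than LLVM's `matchSelectPattern()`; signed-zero handling is deliberately ignored.

```python
import math


def select_min(a, b):
    """The idiomatic select form: a < b ? a : b."""
    return a if a < b else b


def minnum(a, b):
    """minnum-style semantics: with exactly one NaN operand, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b
```

Any comparison against NaN is false, so the select form always yields its second operand when one input is NaN. That result is NaN whenever `b` is the NaN operand, while `minnum` returns the non-NaN operand regardless of position.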
Tyler Nowicki	c94d6ad241	Print vectorization analysis when loop hint is specified. This patch and a related clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwaysPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>'. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints. llvm-svn: 244555	2015-08-11 01:09:15 +00:00
Sanjoy Das	7742b8ba15	Address post-commit review from r243378. This checks that bork_directive occurs exactly twice in the test output. llvm-svn: 244543	2015-08-11 00:20:24 +00:00
Tyler Nowicki	652b0dabe6	Extend late diagnostics to include late test for runtime pointer checks. This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that informs the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization. llvm-svn: 244523	2015-08-10 23:01:55 +00:00
Tyler Nowicki	655e573dc5	Make fp vectorization test X86-specific to avoid cost-model related problems on arm-thumb and hexagon. llvm-svn: 244505	2015-08-10 21:14:38 +00:00
Simon Pilgrim	a3a72b41de	[InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations. Differential Revision: http://reviews.llvm.org/D11886 llvm-svn: 244495	2015-08-10 20:21:15 +00:00
Jonathan Roelofs	f45295c366	Fix a few more cases of 'CHECK[^:]*$'. NFCI llvm-svn: 244491	2015-08-10 19:56:39 +00:00
Tyler Nowicki	c1a86f5866	Late evaluation of the fast-math vectorization requirement. This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options that would allow floating-point commutativity. Specifically those are enabling fast-math or specifying a loop hint. llvm-svn: 244489	2015-08-10 19:51:46 +00:00
Tyler Nowicki	4d62f2e039	Modify diagnostic messages to clearly indicate why interleaving wasn't done. Sometimes interleaving is not beneficial, as determined by the cost-model, and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done. llvm-svn: 244485	2015-08-10 19:14:16 +00:00
Jonathan Roelofs	49e46ce8e2	Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines. llvm-svn: 244481	2015-08-10 19:01:27 +00:00
Mark Heffernan	8939154a22	Add new llvm.loop.unroll.enable metadata. This change adds the unroll metadata "llvm.loop.unroll.enable" which directs the optimizer to unroll a loop fully if the trip count is known at compile time, and unroll partially if the trip count is not known at compile time. This differs from "llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not known at compile time. The "llvm.loop.unroll.enable" is intended to be added for loops annotated with "#pragma unroll". llvm-svn: 244466	2015-08-10 17:28:08 +00:00
Fraser Cormack	e29ab2bfab	Prevent the scalarizer from caching incorrect entries The scalarizer can cache incorrect entries when walking up a chain of insertelement instructions. This occurs when it encounters more than one instruction that it is not actively searching for, as it unconditionally caches every element it finds. The fix is to only cache the first element that it isn't searching for so we don't overwrite correct entries. Reviewers: hfinkel Differential Revision: http://reviews.llvm.org/D11559 llvm-svn: 244448	2015-08-10 14:48:47 +00:00
David Majnemer	4232fb3f8d	[PHITransAddr] Don't assume that instruction operands are translatable We can only PHI translate instructions. In our attempt to PHI translate a bitcast, we attempt to translate its operand; however, the operand might be an argument or a global instead of an instruction. Benignly bail out when this happens. This fixes PR24397. Differential Revision: http://reviews.llvm.org/D11879 llvm-svn: 244418	2015-08-09 15:43:02 +00:00
Chen Li	eafbc9dc47	[ConstantFoldTerminator] Preserve make.implicit metadata when converting SwitchInst to BranchInst Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion. Reviewers: sanjoy, weimingz, chenli Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D11841 llvm-svn: 244348	2015-08-07 19:30:12 +00:00
Simon Pilgrim	3815c16bf8	[InstCombine] Fix SSE2/AVX2 vector logical shift by constant This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes. e.g. %1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>) In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) \| 15) - giving a zero result. In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821). Differential Revision: http://reviews.llvm.org/D11760 llvm-svn: 244341	2015-08-07 18:22:50 +00:00
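The shift amount quoted in this commit follows from how the count operand is read. The sketch below is a simplified model of `psrl.d` written for illustration: `shift_count` and `psrl_d` are hypothetical names, and the count is taken from the low 64 bits of a `<4 x i32>` vector, with counts above 31 producing zero.

```python
def shift_count(count_vec):
    """Low 64 bits of a <4 x i32> count vector: element 0 plus element 1 << 32."""
    low = count_vec[0] & 0xFFFFFFFF
    high = count_vec[1] & 0xFFFFFFFF
    return low | (high << 32)


def psrl_d(values, count_vec):
    """Logical right shift of each 32-bit lane by the 64-bit count."""
    count = shift_count(count_vec)
    if count > 31:
        return [0] * len(values)             # oversized counts clear every lane
    return [(v & 0xFFFFFFFF) >> count for v in values]
```

A splat of 15 therefore encodes the count `(15 << 32) | 15`, not 15, while a count vector of `<15, 0, 0, 0>` does shift each lane by 15.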
Sanjoy Das	366acc175e	[IndVars] Fix PR24356. Unsigned predicates increase or decrease agnostic of the signs of their increments. llvm-svn: 244265	2015-08-06 20:43:41 +00:00
Quentin Colombet	6443cce233	[Reassociation] Fix miscompile for va_arg arguments. isUnmovableInstruction() had a list of instructions hardcoded which are considered unmovable. The list lacked (at least) an entry for the va_arg and cmpxchg instructions. Fix this by introducing a new Instruction::mayBeMemoryDependent() instead of maintaining another instruction list. Patch by Matthias Braun <matze@braunis.de>. Differential Revision: http://reviews.llvm.org/D11577 rdar://problem/22118647 llvm-svn: 244244	2015-08-06 18:44:34 +00:00
Richard Diamond	bd753c9315	Fix an alignment error in `llvm::expandAtomicRMWToCmpXchg` without breaking the build where X86 isn't enabled. Summary: Divide the primitive size in bits by eight so the initial load's alignment is in bytes as expected. Tested with the included unit test. Reviewers: rengolin, jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11804 llvm-svn: 244229	2015-08-06 16:55:03 +00:00
Renato Golin	a02ac60469	Revert "Divide the primitive size in bits by eight so the initial load's alignment is in bytes as expected. Tested with the included unit test." This reverts commit r244155, as it was breaking the buildbots for too long. Should be reapplied with proper fix. llvm-svn: 244205	2015-08-06 10:37:59 +00:00
Richard Diamond	559c1d72a9	Divide the primitive size in bits by eight so the initial load's alignment is in bytes as expected. Tested with the included unit test. llvm-svn: 244155	2015-08-05 22:10:57 +00:00
Chen Li	50efd9220a	[LoopUnswitch] Preserve make.implicit metadata for unswitched conditions Summary: This patch adds support to preserve make.implicit metadata for unswitched conditions in loop pre-header. Reviewers: sanjoy, weimingz Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D11769 llvm-svn: 244132	2015-08-05 21:13:26 +00:00
Simon Pilgrim	42c611b9ae	[InstCombine] Added more specific SSE2/AVX2 vector shift tests. llvm-svn: 244022	2015-08-05 08:21:38 +00:00
Simon Pilgrim	d19b9d8229	[InstCombine] Split off SSE2/AVX2 vector shift tests. These aren't vector demanded bits tests. More tests to follow. llvm-svn: 243963	2015-08-04 08:05:27 +00:00
Mehdi Amini	c8d5783114	Update test suite to make "ninja check" succeed without native backend builtin Requires "native" feature in most places that were failing. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 243960	2015-08-04 06:32:54 +00:00
Sanjoy Das	215df9ed98	Revert "[LSR] Generate and use zero extends" This reverts commit r243348 and r243357. They caused PR24347. llvm-svn: 243939	2015-08-04 01:52:05 +00:00
Chandler Carruth	87adb7a2e2	[Unroll] Improve the brute force loop unroll estimate by propagating through PHI nodes across iterations. This patch teaches the new advanced loop unrolling heuristics to propagate constants into the loop from the preheader and around the backedge after simulating each iteration. This lets us brute force solve simple recurrences that aren't modeled effectively by SCEV. It also makes it more clear why we need to process the loop in-order rather than bottom-up which might otherwise make much more sense (for example, for DCE). This came out of an attempt I'm making to develop a principled way to account for dead code in the unroll estimation. When I implemented a forward-propagating version of that it produced incorrect results due to failing to propagate cost between loop iterations through the PHI nodes, and it occurred to me we really should at least propagate simplifications across those edges, and it is quite easy thanks to the loop being in canonical and LCSSA form. Differential Revision: http://reviews.llvm.org/D11706 llvm-svn: 243900	2015-08-03 20:32:27 +00:00
Duncan P. N. Exon Smith	55ca964e94	DI: Disallow uniquable DICompileUnits Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s. The backend is liable to start relying on that (if it hasn't already), so make uniquable `DICompileUnit`s illegal and automatically upgrade old bitcode. This is a nice cleanup, since we can remove an unnecessary `DenseSet` (and the associated uniquing info) from `LLVMContextImpl`. Almost all the testcases were updated with this script: git grep -e '= !DICompileUnit' -l -- test \| grep -v test/Bitcode \| xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,' I imagine something similar should work for out-of-tree testcases. llvm-svn: 243885	2015-08-03 17:26:41 +00:00
Duncan P. N. Exon Smith	ed013cd221	DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags, using `DW_TAG_variable` in their place. Stop exposing the `tag:` field at all in the assembly format for `DILocalVariable`. Most of the testcase updates were generated by the following sed script: find test/ -name "*.ll" -o -name "*.mir" | xargs grep -l 'DILocalVariable' | xargs sed -i '' \ -e 's/tag: DW_TAG_arg_variable, //' \ -e 's/tag: DW_TAG_auto_variable, //' There were only a handful of tests in `test/Assembly` that I needed to update by hand. (Note: a follow-up could change `DILocalVariable::DILocalVariable()` to set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable` (as appropriate), instead of having that logic magically in the backend in `DbgVariable`. I've added a FIXME to that effect.) llvm-svn: 243774	2015-07-31 18:58:39 +00:00
Wei Mi	d6f7252e2e	[SLP vectorizer]: Choose the best consecutive candidate to pair with a store instruction. The patch changes SLPVectorizer::vectorizeStores to choose the immediately succeeding or preceding candidate for a store instruction when it has multiple consecutive candidates. In this way it has a better chance of finding more SLP vectorization opportunities. Differential Revision: http://reviews.llvm.org/D10445 llvm-svn: 243666	2015-07-30 17:40:39 +00:00
Michael Zolotukhin	9f06ef76d3	[Unroll] Handle SwitchInst properly. Previously successor selection was simply wrong. llvm-svn: 243545	2015-07-29 18:10:33 +00:00
Michael Zolotukhin	3a7d55b623	[Unroll] Don't crash when simplified branch condition is undef. llvm-svn: 243544	2015-07-29 18:10:29 +00:00
Michael Zolotukhin	a2069d36ce	Rename test full-unroll-bad-geps.ll to full-unroll-crashers.ll. No reason to limit it only to GEP-related crashes. More tests are to come here. llvm-svn: 243543	2015-07-29 18:10:23 +00:00
Sanjoy Das	cfe41f050c	[Statepoints] Let patchable statepoints have a symbolic call target. Summary: As added initially, statepoints required their call targets to be a constant pointer null if ``numPatchBytes`` was non-zero. This turns out to be a problem ergonomically, since there is no way to mark patchable statepoints as calling a (readable) symbolic value. This change removes the restriction of requiring ``null`` call targets for patchable statepoints, and changes PlaceSafepoints to maintain the symbolic call target through its transformation. Reviewers: reames, swaroop.sridhar Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11550 llvm-svn: 243502	2015-07-28 23:50:30 +00:00
Jingyue Wu	42f1d67a45	[SCEV] Apply NSW and NUW flags via poison value analysis Summary: Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX. There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial. This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this: for (int i = 0; i < limit; ++i) { sum += ptr[i + offset]; } Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX. There are more details in this discussion on llvmdev. June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234 July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392 Patch by Bjarke Roune Reviewers: eliben, atrick, sanjoy Subscribers: majnemer, hfinkel, jingyue, meheff, llvm-commits Differential Revision: http://reviews.llvm.org/D11212 llvm-svn: 243460	2015-07-28 18:22:40 +00:00
Chih-Hung Hsieh	1e859582d6	Implement target independent TLS compatible with glibc's emutls.c. The 'common' section TLS is not implemented. Current C/C++ TLS variables are not placed in common section. DWARF debug info to get the address of TLS variables is not generated yet. clang and driver changes in http://reviews.llvm.org/D10524 Added -femulated-tls flag to select the emulated TLS model, which will be used for old targets like Android that do not support ELF TLS models. Added TargetLowering::LowerToTLSEmulatedModel as a target-independent function to convert a SDNode of TLS variable address to a function call to __emutls_get_address. Added into lib/Target//ISelLowering.cpp to call LowerToTLSEmulatedModel for TLSModel::Emulated. Although all targets supporting ELF TLS models are enhanced, emulated TLS model has been tested only for Android ELF targets. Modified AsmPrinter.cpp to print the emutls_v.* and emutls_t.* variables for emulated TLS variables. Modified DwarfCompileUnit.cpp to skip some DIE for emulated TLS variables. TODO: Add proper DIE for emulated TLS variables. Added new unit tests with emulated TLS. Differential Revision: http://reviews.llvm.org/D10522 llvm-svn: 243438	2015-07-28 16:24:05 +00:00
Sanjoy Das	6c7a186599	FileCheck'ify some wc/grep based tests; NFCI. llvm-svn: 243378	2015-07-28 03:50:09 +00:00
Sanjoy Das	3895a57b32	[LSR] Move X86 specific test case to X86/ rL243348 added the test case in the wrong directory. llvm-svn: 243357	2015-07-28 00:13:42 +00:00
Sanjoy Das	93b3504aa8	[LSR] Generate and use zero extends Summary: If a scale or a base register can be rewritten as "Zext({A,+,1})" then LSR will now consider a formula of that form in its normal cost computation. Depends on D9180 Reviewers: qcolombet, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9181 llvm-svn: 243348	2015-07-27 23:27:51 +00:00
Sanjoy Das	5dab205ced	[IndVars] Make loop varying predicates loop invariant. Summary: Was D9784: "Remove loop variant range check when induction variable is strictly increasing" This change re-implements D9784 with two differences: 1. It does not use SCEVExpander and does not generate new instructions. Instead, it does a quick local search for existing `llvm::Value`s that it needs when modifying the `icmp` instruction. 2. It is more general -- it deals with both increasing and decreasing induction variables. I've added all of the tests included with D9784, and two more. As an example of what this change does (copied from D9784): Given C code: ``` for (int i = M; i < N; i++) { /* i is known not to overflow */ if (i < 0) break; a[i] = 0; } ``` This transformation produces: ``` for (int i = M; i < N; i++) { if (M < 0) break; a[i] = 0; } ``` Which can be unswitched into: ``` if (!(M < 0)) for (int i = M; i < N; i++) { a[i] = 0; } ``` I went back and forth on whether the top level logic should live in `SimplifyIndvar::eliminateIVComparison` or be put into its own routine. Right now I've put it under `eliminateIVComparison` because even though the `icmp` is not eliminated, it no longer is an IV comparison. I'm open to putting it in its own helper routine if you think that is better. Reviewers: reames, nicholas, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11278 llvm-svn: 243331	2015-07-27 21:42:49 +00:00
Simon Pilgrim	15c0a59463	[InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code. Differential Revision: http://reviews.llvm.org/D11503 llvm-svn: 243303	2015-07-27 18:52:15 +00:00
Matt Arsenault	95365ca482	Fix assert when inlining a constantexpr addrspacecast The pointer size of the addrspacecasted pointer might not have matched, so this would have hit an assert in accumulateConstantOffset. I think this was here to allow constant folding of a load of an addrspacecasted constant. Accumulating the offset through the addrspacecast doesn't make much sense, so something else is necessary to allow folding the load through this cast. llvm-svn: 243300	2015-07-27 18:31:03 +00:00
Silviu Baranga	de38070587	The tests added in r243270 require asserts to be enabled llvm-svn: 243274	2015-07-27 15:22:49 +00:00
Silviu Baranga	65bdb6788b	Fix the tests added in r243270. Use 2>&1 instead of |& llvm-svn: 243273	2015-07-27 15:08:55 +00:00
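
The loop rewrite described in the 5dab205ced entry ([IndVars] Make loop varying predicates loop invariant) can be spot-checked with a small C sketch. This is only an illustration of the three source-level forms quoted in that commit message, not the pass itself; the names `fill_original`, `fill_invariant`, `fill_unswitched`, `forms_agree` and the buffer size `LEN` are hypothetical and do not come from LLVM. It assumes `N <= LEN` and that `i` does not overflow.

```c
#include <assert.h>
#include <string.h>

enum { LEN = 8 }; /* hypothetical buffer size for this sketch */

/* Original form: the exit test depends on the induction variable i. */
static void fill_original(int *a, int M, int N) {
    for (int i = M; i < N; i++) {
        if (i < 0) break;
        a[i] = 0;
    }
}

/* After the rewrite: the test uses the loop-invariant start value M. */
static void fill_invariant(int *a, int M, int N) {
    for (int i = M; i < N; i++) {
        if (M < 0) break;
        a[i] = 0;
    }
}

/* After unswitching: the invariant test is hoisted out of the loop. */
static void fill_unswitched(int *a, int M, int N) {
    if (!(M < 0)) {
        for (int i = M; i < N; i++)
            a[i] = 0;
    }
}

/* Returns 1 when all three forms leave identical buffers for (M, N). */
static int forms_agree(int M, int N) {
    int a[LEN], b[LEN], c[LEN];
    assert(N <= LEN); /* keep every a[i] write inside the buffer */
    for (int k = 0; k < LEN; k++)
        a[k] = b[k] = c[k] = 1;
    fill_original(a, M, N);
    fill_invariant(b, M, N);
    fill_unswitched(c, M, N);
    return memcmp(a, b, sizeof a) == 0 && memcmp(a, c, sizeof a) == 0;
}
```

Because `i` starts at `M` and only increases, `i < 0` can be true only on the first iteration, and then exactly when `M < 0`. That is why replacing the test with `M < 0` preserves behavior in this sketch.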

5712 Commits