Commit Graph

5712 Commits

Author SHA1 Message Date
James Molloy e805ad95dc [LoopRerolling] Be more forgiving with instruction order.
We can't solve the full subgraph isomorphism problem. But we can
allow obvious cases, where for example two instructions of different
types are out of order. Due to them having different types/opcodes,
there is no ambiguity.

llvm-svn: 228931
2015-02-12 15:54:14 +00:00
Andrea Di Biagio b08862c4f0 [TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target.
Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl
how to query TLI in order to get a more accurate cost for truncates and
zero-extends.

Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase
would have conservatively returned a 'default' TCC_Basic for all zero-extends,
and TCC_Free for truncates on native types.

This patch improves the heuristic so that we query TLI (if available) to get
more accurate answers. If TLI is available, then methods 'isZExtFree' and
'isTruncateFree' can be used to check if a zext/trunc is free for the target.

Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll.
With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz
immediately followed by a free zext/trunc.

Differential Revision: http://reviews.llvm.org/D7585

llvm-svn: 228923
2015-02-12 14:17:24 +00:00
Chandler Carruth 63aaa98d94 [slp] Fix a nasty bug in the SLP vectorizer that Joerg pointed out.
Apparently some code finally started to tickle this after my
canonicalization changes to instcombine.

The bug stems from trying to form a vector type out of scalars that
aren't compatible at all. In this example, from x86_mmx values. The code
in the vectorizer that checks for reasonable types whas checking for
aggregates or vectors, but there are lots of other types that should
just never reach the vectorizer.

Debugging this was made more confusing by the lie in an assert in
VectorType::get() -- it isn't that the types are *primitive*. The types
must be integer, pointer, or floating point types. No other types are
allowed.

I've improved the assert and added a helper to the vectorizer to handle
the element type validity checks. It now re-uses the VectorType static
function and then further excludes weird target-specific types that we
probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of
these are really reachable anyways (neither 80-bit nor 128-bit things
will get vectorized) but it seems better to just eagerly exclude such
nonesense.

I've added a test case, but while it definitely covers two of the paths
through this code there may be more paths that would benefit from test
coverage. I'm not familiar enough with the SLP vectorizer to synthesize
test cases for all of these, but was able to update the code itself by
inspection.

llvm-svn: 228899
2015-02-12 02:30:56 +00:00
Tim Northover 02438033e8 DeadArgElim: aggregate Return assessment properly.
I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
actually depends on the index too, which means we need to be careful about how
the results are combined before return. In particular if a single Use returns
Live, that counts for the entire object, at the granularity we're considering.

llvm-svn: 228885
2015-02-11 23:13:11 +00:00
Mehdi Amini 9730116bd6 Reassociate: cannot negate a INT_MIN value
Summary:
When trying to canonicalize negative constants out of
multiplication expressions, we need to check that the
constant is not INT_MIN which cannot be negated.

Reviewers: mcrosier

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7286

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 228872
2015-02-11 19:54:44 +00:00
Andrea Di Biagio 2a0e435db1 [TTI] Improved cost heuristic for cttz/ctlz calls.
This patch is a follow-up of r228826 (see code-review: D7506).

Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we 
have to fix the cost heuristic for intrinsic calls to cttz/ctlz.

This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl
queries TLI to check if a call to cttz/ctlz is cheap for the target.

Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86,
SimplifyCFG only speculates a call to cttz/ctlz if it is cheap.

Differential Revision: http://reviews.llvm.org/D7554

llvm-svn: 228829
2015-02-11 14:22:18 +00:00
James Molloy 7c336576a5 [SimplifyCFG] Swap to using TargetTransformInfo for cost
analysis.

We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.

Test changes:
  * combine-comparisons-by-cse.ll: Removed unneeded branch check.
  * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
  * coalesce-subregs.ll: Superfluous block check.
  * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
  * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
  * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.

llvm-svn: 228826
2015-02-11 12:15:41 +00:00
James Molloy f147359376 [LoopReroll] Introduce the concept of DAGRootSets.
A DAGRootSet models an induction variable being used in a rerollable
loop. For example:

   x[i*3+0] = y1
   x[i*3+1] = y2
   x[i*3+2] = y3

   Base instruction -> i*3
                    +---+----+
                   /    |     \
               ST[y1]  +1     +2  <-- Roots
                        |      |
                      ST[y2] ST[y3]

There may be multiple DAGRootSets, for example:

   x[i*2+0] = ...   (1)
   x[i*2+1] = ...   (1)
   x[i*2+4] = ...   (2)
   x[i*2+5] = ...   (2)
   x[(i+1234)*2+5678] = ... (3)
   x[(i+1234)*2+5679] = ... (3)

This concept is similar to the "Scale" member used previously, but allows
multiple independent sets of roots based off the same induction variable.

llvm-svn: 228821
2015-02-11 09:19:47 +00:00
Reid Kleckner b3775df32e Fix invalid LLVM IR in PruneEH tests
llvm-svn: 228786
2015-02-11 02:06:47 +00:00
Reid Kleckner 96d011315a Don't promote asynch EH invokes of nounwind functions to calls
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.

Also add some landingpads to invalid LLVM IR test cases that lack them.

Over-the-shoulder reviewed by David Majnemer.

llvm-svn: 228782
2015-02-11 01:23:16 +00:00
David Majnemer 44a5f22fbc EarlyCSE: Add check lines for test added in r228760
llvm-svn: 228761
2015-02-10 23:11:02 +00:00
David Majnemer 7679300d93 EarlyCSE: It isn't safe to CSE across synchronization boundaries
This fixes PR22514.

llvm-svn: 228760
2015-02-10 23:09:43 +00:00
Tim Northover 43c0d2db50 DeadArgElim: arguments affect all returned sub-values by default.
Unless we meet an insertvalue on a path from some value to a return, that value
will be live if *any* of the return's components are live, so all of those
components must be added to the MaybeLiveUses.

Previously we were deleting arguments if sub-value 0 turned out to be dead.

llvm-svn: 228731
2015-02-10 19:49:18 +00:00
Michael Zolotukhin 03e3518c91 Add a test case for new unrolling heuristics.
THe heuristics were added in r228265 and r228434.

llvm-svn: 228713
2015-02-10 17:54:54 +00:00
Chandler Carruth 2496910325 Revert r228556: InstCombine: propagate nonNull through assume
This commit isn't using the correct context, and is transfoming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.

llvm-svn: 228677
2015-02-10 08:07:32 +00:00
Ramkumar Ramachandra 2e4b9e0a37 PlaceSafepoints: modernize gc.result.* -> gc.result
Differential Revision: http://reviews.llvm.org/D7516

llvm-svn: 228625
2015-02-09 23:00:40 +00:00
Philip Reames 0edbf2e407 Introduce more tests for PlaceSafepoints
These tests the two optimizations for backedge insertion currently implemented and the split backedge flag which is currently off by default.

llvm-svn: 228617
2015-02-09 22:10:15 +00:00
Philip Reames 5fc82fd6b5 Minor test cleanup
a) add gc attribute
b) remove unused param

llvm-svn: 228612
2015-02-09 21:50:31 +00:00
Philip Reames b1ed02f728 Add basic tests for PlaceSafepoints
This is just adding really simple tests which should have been part of the original submission.  When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission.  I fixed two such bugs in this submission.

llvm-svn: 228610
2015-02-09 21:48:05 +00:00
Tim Northover 705d2af9e1 DeadArgElim: fix mismatch in accounting of array return types.
Some parts of DeadArgElim were only considering the individual fields
of StructTypes separately, but others (where insertvalue &
extractvalue instructions occur) also looked into ArrayTypes.

This one is an actual bug; the mismatch can lead to an argument being
considered used by a return sub-value that isn't being tracked (and
hence is dead by default). It then gets incorrectly eliminated.

llvm-svn: 228559
2015-02-09 01:21:00 +00:00
Tim Northover 854c927de5 DeadArgElim: assess uses of entire return value aggregate.
Previously, a non-extractvalue use of an aggregate return value meant
the entire return was considered live (the algorithm gave up
entirely). This was correct, but conservative. It's better to actually
look at that Use, making the analysis results apply to all sub-values
under consideration.

E.g.

  %val = call { i32, i32 } @whatever()
  [...]
  ret { i32, i32 } %val

The return is using the entire aggregate (sub-values 0 and 1). We can
still simplify @whatever if we can prove that this return is itself
unused.

Also unifies the logic slightly between aggregate and non-aggregate
cases..

llvm-svn: 228558
2015-02-09 01:20:53 +00:00
Ramkumar Ramachandra a021ee62ca InstCombine: propagate nonNull through assume
Make assume (load (call|invoke) != null) set nonNull return attribute
for the call and invoke. Also include tests.

Differential Revision: http://reviews.llvm.org/D7107

llvm-svn: 228556
2015-02-09 01:13:13 +00:00
Bjorn Steinbrink 5ec7522771 Correctly combine alias.scope metadata by a union instead of intersecting
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7490

llvm-svn: 228525
2015-02-08 17:07:14 +00:00
Benjamin Kramer 17d9015d27 ValueTracking: Make isBytewiseValue simpler and more powerful at the same time.
Turns out there is a simpler way of checking that all bytes in a word are equal
than binary decomposition.

llvm-svn: 228503
2015-02-07 19:29:02 +00:00
Bjorn Steinbrink 71bf3b800a Properly update AA metadata when performing call slot optimization
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7482

llvm-svn: 228500
2015-02-07 17:54:36 +00:00
Matthias Braun 2e404597f4 InstCombine: Combine select sequences into a single select
Normalize
select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)

This normal form may enable further combines on the And/Or and shortens
paths for the values. Many targets prefer the other but can go back
easily in CodeGen.

Differential Revision: http://reviews.llvm.org/D7399

llvm-svn: 228409
2015-02-06 17:49:36 +00:00
Michael Kuperstein d2b6fdbc31 Teach isDereferenceablePointer() to look through bitcast constant expressions.
This fixes a LICM regression due to the new load+store pair canonicalization.

Differential Revision: http://reviews.llvm.org/D7411

llvm-svn: 228284
2015-02-05 09:15:37 +00:00
Cameron Esfahani 17177d1e84 Value soft float calls as more expensive in the inliner.
Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation.  By default, it's not an expensive operation.  This keeps the default behavior the same as before.  The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.

Reviewers: chandlerc, echristo

Reviewed By: echristo

Subscribers: t.p.northover, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D6936

llvm-svn: 228263
2015-02-05 02:09:33 +00:00
Tom Stellard 071ec90b68 StructurizeCFG: Use a reverse post-order traversal
We were previously doing a post-order traversal and operating on the
list in reverse, however this would occasionaly cause backedges for
loops to be visited before some of the other blocks in the loop.

We know use a reverse post-order traversal, which avoids this issue.

The reverse post-order traversal is not completely ideal, so we need
to manually fixup the list to ensure that inner loop backedges are
visited before outer loop backedges.

llvm-svn: 228186
2015-02-04 20:49:44 +00:00
Renato Golin 2a5c0a51ce Reverting VLD1/VST1 base-updating/post-incrementing combining
This reverts patches 223862, 224198, 224203, and 224754, which were all
related to the vector load/store combining and were reverted/reaplied
a few times due to the same alignment problems we're seeing now.

Further tests, mainly self-hosting Clang, will be needed to reapply this
patch in the future.

llvm-svn: 228129
2015-02-04 10:11:59 +00:00
Daniel Berlin 487aed0d77 Allow PRE to insert no-cost phi nodes
llvm-svn: 228024
2015-02-03 20:37:08 +00:00
Jingyue Wu d7966ff3b9 Add straight-line strength reduction to LLVM
Summary:
Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,

LLVM unrolls

  #pragma unroll
  foo (int i = 0; i < 3; ++i) {
    sum += foo((b + i) * s);
  }

into

  sum += foo(b * s);
  sum += foo((b + 1) * s);
  sum += foo((b + 2) * s);

However, no optimizations yet reduce the internal redundancy of the three
expressions:

  b * s
  (b + 1) * s
  (b + 2) * s

With SLSR, LLVM can optimize these three expressions into:

  t1 = b * s
  t2 = t1 + s
  t3 = t2 + s

This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.

Test Plan: test/StraightLineStrengthReduce/slsr.ll

Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick

Reviewed By: jholewinski, atrick

Subscribers: karthikthecool, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7310

llvm-svn: 228016
2015-02-03 19:37:06 +00:00
Erik Eckstein 7330e358f6 Fix: SLPVectorizer crashes with assertion when vectorizing a cmp instruction.
The commit r225977 uncovered this bug. The problem was that the vectorizer tried to
read the second operand of an already deleted instruction.
The bug didn't show up before r225977 because the freed memory still contained a non-null pointer.
With r225977 deletion of instructions is delayed and the read operand pointer is always null.

llvm-svn: 227800
2015-02-02 12:45:34 +00:00
Chandler Carruth fdffd87d68 [PM] Port SimplifyCFG to the new pass manager.
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.

llvm-svn: 227726
2015-02-01 11:34:21 +00:00
Chandler Carruth e8c686aa86 [PM] Port EarlyCSE to the new pass manager.
I've added RUN lines both to the basic test for EarlyCSE and the
target-specific test, as this serves as a nice test that the TTI layer
in the new pass manager is in fact working well.

llvm-svn: 227725
2015-02-01 10:51:23 +00:00
Adrian Prantl 3e2659eb92 Inliner: Use replaceDbgDeclareForAlloca() instead of splicing the
instruction and generalize it to optionally dereference the variable.
Follow-up to r227544.

llvm-svn: 227604
2015-01-30 19:37:48 +00:00
Hao Liu 6bd67c08fa Move the target specific test case arbitrary-induction-step.ll to test/Transforms/LoopVectorize/AArch64 folder.
llvm-svn: 227561
2015-01-30 07:33:31 +00:00
Hao Liu 8de4f8b1b5 [LoopVectorize] Induction variables: support arbitrary constant step.
Previously, only -1 and +1 step values are supported for induction variables. This patch extends LV to support
arbitrary constant steps.
Initial patch by Alexey Volkov. Some bug fixes are added in the following version.

Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193

llvm-svn: 227557
2015-01-30 05:02:21 +00:00
Adrian Prantl 4d365250ec Fix PR22386. The inliner moves static allocas to the entry basic block
so we need to move the dbg.declare intrinsics that describe them, too.

llvm-svn: 227544
2015-01-30 01:55:25 +00:00
Sanjay Patel 4f07a56958 [GVN] don't propagate equality comparisons of FP zero (PR22376)
In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities
to allow some simple value range optimizations. But that introduced a bug
when comparing to -0.0 or 0.0: these compare equal even though they are not
bitwise identical.

This patch disallows propagating zero constants in equality comparisons. 
Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376

Differential Revision: http://reviews.llvm.org/D7257

llvm-svn: 227491
2015-01-29 20:51:49 +00:00
Philip Reames 9198b33b48 Teach SplitBlockPredecessors how to handle landingpad blocks.
Patch by: Igor Laevsky <igor@azulsystems.com>

"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."

Differential Revision: http://reviews.llvm.org/D7157

llvm-svn: 227390
2015-01-28 23:06:47 +00:00
Michael Kuperstein 951995821a [X86] Reduce some 32-bit imuls into lea + shl
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.

Differential Revision: http://reviews.llvm.org/D7196

llvm-svn: 227308
2015-01-28 14:08:22 +00:00
Elena Demikhovsky 45f0448081 Fold fcmp in cases where value is provably non-negative. By Arch Robison.
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
 - the predicate is olt or uge
 - the first operand is provably not less than zero
 - the second operand is zero
The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y)..

http://reviews.llvm.org/D6972

llvm-svn: 227298
2015-01-28 08:03:58 +00:00
Reid Kleckner 4af6415237 Move EH personality type classification to Analysis/LibCallSemantics.h
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7214

llvm-svn: 227284
2015-01-28 01:17:38 +00:00
Ahmed Bougacha 1ac9356524 [SimplifyLibCalls] Don't confuse strcpy_chk for stpcpy_chk.
This was introduced in a faulty refactoring (r225640, mea culpa):
the tests weren't testing the return values, so, for both
__strcpy_chk and __stpcpy_chk, we would return the end of the
buffer (matching stpcpy) instead of the beginning (for strcpy).

The root cause was the prefix "__" being ignored when comparing,
which made us always pick LibFunc::stpcpy_chk.
Pass the LibFunc::Func directly to avoid this kind of error.
Also, make the testcases as explicit as possible to prevent this.

The now-useful testcases expose another, entangled, stpcpy problem,
with the further simplification.  This was introduced in a
refactoring (r225640) to match the original behavior.

However, this leads to problems when successive simplifications
generate several similar instructions, none of which are removed
by the custom replaceAllUsesWith.

For instance, InstCombine (the main user) doesn't erase the
instruction in its custom RAUW.  When trying to simplify say
__stpcpy_chk:
- first, an stpcpy is created (fortified simplifier),
- second, a memcpy is created (normal simplifier), but the
  stpcpy call isn't removed.
- third, InstCombine later revisits the instructions,
  and simplifies the first stpcpy to a memcpy.  We now have
  two memcpys.

llvm-svn: 227250
2015-01-27 21:52:16 +00:00
Sanjoy Das dcf2651043 Teach IRCE to look at branch weights when recognizing range checks
Splitting a loop to make range checks redundant is profitable only if
the range check "never" fails. Make this fact a part of recognizing a
range check -- a branch is a range check only if it is expected to
pass (via branch_weights metadata).

Differential Revision: http://reviews.llvm.org/D7192

llvm-svn: 227249
2015-01-27 21:38:12 +00:00
Andrea Di Biagio 086cbc37ad [InstCombine] Teach how to fold a select into a cttz/ctlz with the 'is_zero_undef' flag.
This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed by
a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared.

Added test InstCombine/select-cmp-cttz-ctlz.ll.

llvm-svn: 227197
2015-01-27 15:58:14 +00:00
David Majnemer 4c82daea60 LoopRotate: Don't walk the uses of a Constant
LoopRotate wanted to avoid live range interference by looking at the
uses of a Value in the loop latch and seeing if any lied outside of the
loop.  We would wrongly perform this operation on Constants.

This fixes PR22337.

llvm-svn: 227171
2015-01-27 06:21:43 +00:00
Chad Rosier f9327d6fe9 Commoning of target specific load/store intrinsics in Early CSE.
Phabricator revision: http://reviews.llvm.org/D7121
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

llvm-svn: 227149
2015-01-26 22:51:15 +00:00
Philip Reames 219b15e85f Add test cases for PRE w/volatile loads
These tests check that the combination of 227110 (cross block query inst) and 227112 (volatile load semantics) work together properly to allow PRE in cases where a loop contains a volatile access.

llvm-svn: 227146
2015-01-26 22:40:44 +00:00
Hans Wennborg b64cb271dc SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable
The range check would get optimized away later, but we might as well not emit
them in the first place.

http://reviews.llvm.org/D6471

llvm-svn: 227126
2015-01-26 19:52:34 +00:00
Hans Wennborg 6800008f04 SimplifyCFG: don't remove unreachable default switch destinations
An unreachable default destination can be exploited by other optimizations and
allows for more efficient lowering. Both the SDag switch lowering and
LowerSwitch can exploit unreachable defaults.

Also make TurnSwitchRangeICmp handle switches with unreachable default.
This is kind of separate change, but it cannot be tested without the change
above, and I don't want to land the change above without this since that would
regress other tests.

Differential Revision: http://reviews.llvm.org/D6471

llvm-svn: 227125
2015-01-26 19:52:32 +00:00
Philip Reames a7ad6a589c Refine memory dependence's notion of volatile semantics
According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile.

The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes.  I will be submitting a PRE w/volatiles test case seperately in the near future.

Differential Revision: http://reviews.llvm.org/D6901

llvm-svn: 227112
2015-01-26 18:54:27 +00:00
Philip Reames 32351455f6 Pass QueryInst down through non-local dependency calculation
This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block.

Worth noting, is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path.) In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that.

Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself. 

Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well.

Differential Revision: http://reviews.llvm.org/D6895

llvm-svn: 227110
2015-01-26 18:39:52 +00:00
Erik Eckstein 98df6da740 SLPVectorizer: fix wrong scheduling of atomic load/stores.
This fixes PR22306.

llvm-svn: 227077
2015-01-26 09:07:04 +00:00
Chandler Carruth 43e590e51f [PM] Port LowerExpectIntrinsic to the new pass manager.
This just lifts the logic into a static helper function, sinks the
legacy pass to be a trivial wrapper of that helper fuction, and adds
a trivial wrapper for the new PM as well. Not much to see here.

I switched a test case to run in both modes, but we have to strip the
dead prototypes separately as that pass isn't in the new pass manager
(yet).

llvm-svn: 226999
2015-01-24 11:13:02 +00:00
Chandler Carruth 83ba269e4b [PM] Port instcombine to the new pass manager!
This is exciting as this is a much more involved port. This is
a complex, existing transformation pass. All of the core logic is shared
between both old and new pass managers. Only the access to the analyses
is separate because the actual techniques are separate. This also uses
a bunch of different and interesting analyses and is the first time
where we need to use an analysis across an IR layer.

This also paves the way to expose instcombine utility functions. I've
got a static function that implements the core pass logic over
a function which might be mildly interesting, but more interesting is
likely exposing a routine which just uses instructions *already in* the
worklist and combines until empty.

I've switched one of my favorite instcombine tests to run with both as
well to make sure this keeps working.

llvm-svn: 226987
2015-01-24 04:19:17 +00:00
Hans Wennborg ae9c971a2f LowerSwitch: replace unreachable default with popular case destination
SimplifyCFG currently does this transformation, but I'm planning to remove that
to allow other passes, such as this one, to exploit the unreachable default.

This patch takes care to keep track of what case values are unreachable even
after the transformation, allowing for more efficient lowering.

Differential Revision: http://reviews.llvm.org/D6697

llvm-svn: 226934
2015-01-23 20:43:51 +00:00
Reid Kleckner f12b33454f Revert "Don't remove a landing pad if the invoke requires a table entry."
This reverts commit r176827.

Björn Steinbrink pointed out that this didn't actually fix the bug
(PR15555) it was attempting to fix.

With this reverted, we can now remove landingpad cleanups that
immediately resume unwinding, converting the invoke to a call.

llvm-svn: 226850
2015-01-22 19:29:46 +00:00
Sanjoy Das d1fb13ce4c Fix crashes in IRCE caused by mismatched types
There are places where the inductive range check elimination pass
depends on two llvm::Values or llvm::SCEVs to be of the same
llvm::Type when they do not need to be. This patch relaxes those
restrictions (by bailing out of the optimization if the types
mismatch), and adds test cases to trigger those paths.

These issues were found by bootstrapping clang with IRCE running in
the -O3 pass ordering.

Differential Revision: http://reviews.llvm.org/D7082

llvm-svn: 226793
2015-01-22 08:29:18 +00:00
Elena Demikhovsky 079b2d8c0c Fixed a bug in masked load/store in reversed loop.
Added a test.

The bug was submitted to bugzilla:
http://llvm.org/bugs/show_bug.cgi?id=22225

llvm-svn: 226791
2015-01-22 08:20:06 +00:00
Chandler Carruth cd8522ef44 [canonicalize] Teach InstCombine to canonicalize loads which are only
ever stored to always use a legal integer type if one is available.

Regardless of whether this particular type is good or bad, it ensures we
don't get weird differences in generated code (and resulting
performance) from "equivalent" patterns that happen to end up using
a slightly different type.

After some discussion on llvmdev it seems everyone generally likes this
canonicalization. However, there may be some parts of LLVM that handle
it poorly and need to be fixed. I have at least verified that this
doesn't impede GVN and instcombine's store-to-load forwarding powers in
any obvious cases. Subtle cases are exactly what we need te flush out if
they remain.

Also note that this IR pattern should already be hitting LLVM from Clang
at least because it is exactly the IR which would be produced if you
used memcpy to copy a pointer or floating point between memory instead
of a variable.

llvm-svn: 226781
2015-01-22 05:08:12 +00:00
David Blaikie df706288fb DebugInfo: Use distinct inlinedAt MDLocations to avoid separate inlined calls being coalesced
When two calls from the same MDLocation are inlined they currently get
treated as one inlined function call (creating difficulty debugging,
duplicate variables, etc).

Clang worked around this by including column information on inline calls
which doesn't address LTO inlining or calls to the same function from
the same line and column (such as through a macro). It also didn't
address ctor and member function calls.

By making the inlinedAt locations distinct, every call site has an
explicitly distinct location that cannot be coalesced with any other
call.

This can produce linearly (2x in the worst case where every call is
inlined and the call instruction has a non-call instruction at the same
location) more debug locations. Any increase beyond that are in cases
where the Clang workaround was insufficient and the new scheme is
creating necessary distinct nodes that were being erroneously coalesced
previously.

After this change to LLVM the incomplete workarounds in Clang. That
should reduce the number of debug locations (in a build without column
info, the default on Darwin, not the default on Linux) by not creating
pseudo-distinct locations for every call to an inline function.

(oh, and I made the inlined-at chain rebuilding iterative instead of
recursive because I was having trouble wrapping my head around it the
way it was - open to discussion on the right design for that function
(including going back to a recursive solution))

llvm-svn: 226736
2015-01-21 22:57:29 +00:00
David Majnemer 4c0a6e918a InstCombine: Don't strip bitcasts off of callsites marked 'thunk'
The return type of a thunk is meaningless, we just want the arguments
and return value to be forwarded.

llvm-svn: 226708
2015-01-21 22:32:04 +00:00
Alexander Potapenko 4ac461c35f Use a smaller pragma unroll threshold to reduce test execution time.
When opt is compiled with AddressSanitizer it takes more than 30 seconds
to unroll the loop in unroll_1M().

llvm-svn: 226660
2015-01-21 13:52:02 +00:00
Karthik Bhat 0b0f4660fa Fix Operandreorder logic in SLPVectorizer to generate longer vectorizable chain.
This patch fixes 2 issues in reorderInputsAccordingToOpcode
1) AllSameOpcodeLeft and AllSameOpcodeRight was being calculated incorrectly resulting in code not being vectorized in few cases.
2) Adds logic to reorder operands if we get longer chain of consecutive loads enabling vectorization. Handled the same for cases were we have AltOpcode.
Thanks Michael for inputs and review.
Review: http://reviews.llvm.org/D6677

llvm-svn: 226547
2015-01-20 06:11:00 +00:00
Mehdi Amini 590a2700fc Fix Reassociate handling of constant in presence of undef float
http://reviews.llvm.org/D6993

llvm-svn: 226245
2015-01-16 03:00:58 +00:00
Sanjoy Das a1837a342d Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

This pass was originally r226201.  It was reverted because it used C++
features not supported by MSVC 2012.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226238
2015-01-16 01:03:22 +00:00
Sanjoy Das 7f62ac8e4d Revert r226201 (Add a new pass "inductive range check elimination")
The change used C++11 features not supported by MSVC 2012.  I will fix
the change to use things supported MSVC 2012 and recommit shortly.

llvm-svn: 226216
2015-01-15 22:18:10 +00:00
Sanjoy Das 7059e2959d Add a new pass "inductive range check elimination"
IRCE eliminates range checks of the form

  0 <= A * I + B < Length

by splitting a loop's iteration space into three segments in a way
that the check is completely redundant in the middle segment.  As an
example, IRCE will convert

  len = < known positive >
  for (i = 0; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }

to

  len = < known positive >
  limit = smin(n, len)
  // no first segment
  for (i = 0; i < limit; i++) {
    if (0 <= i && i < len) { // this check is fully redundant
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }
  for (i = limit; i < n; i++) {
    if (0 <= i && i < len) {
      do_something();
    } else {
      throw_out_of_bounds();
    }
  }


IRCE can deal with multiple range checks in the same loop (it takes
the intersection of the ranges that will make each of them redundant
individually).

Currently IRCE does not do any profitability analysis.  That is a
TODO.

Please note that the status of this pass is *experimental*, and it is
not part of any default pass pipeline.  Having said that, I will love
to get feedback and general input from people interested in trying
this out.

Differential Revision: http://reviews.llvm.org/D6693

llvm-svn: 226201
2015-01-15 20:45:46 +00:00
Sanjoy Das 8c252bde36 Fix PR22222
The bug was introduced in r225282. r225282 assumed that sub X, Y is
the same as add X, -Y. This is not correct if we are going to upgrade
the sub to sub nuw. This change fixes the issue by making the
optimization ignore sub instructions.

Differential Revision: http://reviews.llvm.org/D6979

llvm-svn: 226075
2015-01-15 01:46:09 +00:00
Richard Smith e78bb1249e For PR21145: recognise a builtin call to a known deallocation function even if
it's defined in the current module. Clang generates this situation for the
C++14 sized deallocation functions, because it generates a weak definition in
case one isn't provided by the C++ runtime library.

llvm-svn: 226069
2015-01-15 01:00:33 +00:00
Ramkumar Ramachandra dba7329ebb [GC] CodeGenPrep transform: simplify offsetable relocate
The transform is somewhat involved, but the basic idea is simple: find
derived pointers that have been offset from the base pointer using gep
and replace the relocate of the derived pointer with a gep to the
relocated base pointer (with the same offset).

llvm-svn: 226060
2015-01-14 23:27:07 +00:00
Duncan P. N. Exon Smith 9885469922 IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

llvm-svn: 226048
2015-01-14 22:27:36 +00:00
David Majnemer a0afb55ff9 InstCombine: Don't take A-B<0 into A<B if A-B has other uses
This fixes PR22226.

llvm-svn: 226023
2015-01-14 19:26:56 +00:00
Ahmed Bougacha 71d7b18e3d [SimplifyLibCalls] Don't try to simplify indirect calls.
It turns out, all callsites of the simplifier are guarded by a check for
CallInst::getCalledFunction (i.e., to make sure the callee is direct).

This check wasn't done when trying to further optimize a simplified fortified
libcall, introduced by a refactoring in r225640.

Fix that, add a testcase, and document the requirement.

llvm-svn: 225895
2015-01-14 00:55:05 +00:00
Sanjay Patel 5f1d9eaad3 GVN: propagate equalities for floating point compares
Allow optimizations based on FP comparison values in the same way
as integers. 

This resolves PR17713:
http://llvm.org/bugs/show_bug.cgi?id=17713

Differential Revision: http://reviews.llvm.org/D6911

llvm-svn: 225660
2015-01-12 19:29:48 +00:00
Hal Finkel 611b127ad8 [PowerPC] Readjust the loop unrolling threshold
Now that the way that the partial unrolling threshold for small loops is used
to compute the unrolling factor as been corrected, a slightly smaller threshold
is preferable. This is expected; other targets may need to re-tune as well.

llvm-svn: 225566
2015-01-10 00:31:10 +00:00
Hal Finkel 38dd590861 [LoopUnroll] Fix the partial unrolling threshold for small loop sizes
When we compute the size of a loop, we include the branch on the backedge and
the comparison feeding the conditional branch. Under normal circumstances,
these don't get replicated with the rest of the loop body when we unroll. This
led to the somewhat surprising behavior that really small loops would not get
unrolled enough -- they could be unrolled more and the resulting loop would be
below the threshold, because we were assuming they'd take
(LoopSize * UnrollingFactor) instructions after unrolling, instead of
(((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation.

llvm-svn: 225565
2015-01-10 00:30:55 +00:00
Hans Wennborg dcc6e5bc03 SimplifyCFG: check uses of constant-foldable instrs in switch destinations (PR20210)
The previous code assumed that such instructions could not have any uses
outside CaseDest, with the motivation that the instruction could not
dominate CommonDest because CommonDest has phi nodes in it. That simply
isn't true; e.g., CommonDest could have an edge back to itself.

llvm-svn: 225552
2015-01-09 22:13:31 +00:00
Tim Northover eb16112e97 Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before
doing Load PRE"

It's not really expected to stick around, last time it provoked a weird LTO
build failure that I can't reproduce now, and the bot logs are long gone. I'll
re-revert it if the failures recur.

Original description: Perform Scalar PRE on gep indices that feed loads before
doing Load PRE.

llvm-svn: 225536
2015-01-09 19:19:56 +00:00
Hal Finkel b359b735d6 [PowerPC] Enable late partial unrolling on the POWER7
The P7 benefits from not have really-small loops so that we either have
multiple dispatch groups in the loop and/or the ability to form more-full
dispatch groups during scheduling. Setting the partial unrolling threshold to
44 seems good, empirically, for the P7. Compared to using no late partial
unrolling, this yields the following test-suite speedups:

SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding
	-66.3253% +/- 24.1975%
SingleSource/Benchmarks/Misc-C++/oopack_v1p8
	-44.0169% +/- 29.4881%
SingleSource/Benchmarks/Misc/pi
	-27.8351% +/- 12.2712%
SingleSource/Benchmarks/Stanford/Bubblesort
	-30.9898% +/- 22.4647%

I've speculatively added a similar setting for the P8. Also, I've noticed that
the unroller does not quite calculate the unrolling factor correctly for really
tiny loops because it neglects to account for the fact that not every loop body
replicant contains an ending branch and counter increment. I'll fix that later.

llvm-svn: 225522
2015-01-09 15:51:16 +00:00
Duncan P. N. Exon Smith 090a19bd3c IR: Add 'distinct' MDNodes to bitcode and assembly
Propagate whether `MDNode`s are 'distinct' through the other types of IR
(assembly and bitcode).  This adds the `distinct` keyword to assembly.

Currently, no one actually calls `MDNode::getDistinct()`, so these nodes
only get created for:

  - self-references, which are never uniqued, and
  - nodes whose operands are replaced that hit a uniquing collision.

The concept of distinct nodes is still not quite first-class, since
distinct-ness doesn't yet survive across `MapMetadata()`.

Part of PR22111.

llvm-svn: 225474
2015-01-08 22:38:29 +00:00
Matt Arsenault b935d9df4c Fix fcmp + fabs instcombines when using the intrinsic
This was only handling the libcall. This is another example
of why only the intrinsic should ever be used when it exists.

llvm-svn: 225465
2015-01-08 20:09:34 +00:00
Matt Arsenault 2458393104 Fix using wrong intrinsic in test
This is a leftover from renaming the intrinsic.
It's surprising the unknown llvm. intrinsic wasn't rejected.

llvm-svn: 225304
2015-01-06 23:00:33 +00:00
Rafael Espindola 83a362cde8 Change the .ll syntax for comdats and add a syntactic sugar.
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.

Just dropping the $name causes problems for

@foo = globabl i32 0, comdat
$bar = comdat ...

and

declare void @foo() comdat
$bar = comdat ...

So the syntax is changed to

@g1 = globabl i32 0, comdat($c1)
@g2 = globabl i32 0, comdat

and

declare void @foo() comdat($c1)
declare void @foo() comdat

llvm-svn: 225302
2015-01-06 22:55:16 +00:00
Sanjoy Das 7c0ce26614 This patch teaches IndVarSimplify to add nuw and nsw to certain kinds
of operations that provably don't overflow. For example, we can prove
%civ.inc below does not sign-overflow. With this change,
IndVarSimplify changes %civ.inc to an add nsw.

  define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
   entry:
    %length = load i32* %length_ptr, !range !0
    %len.sub.1 = sub i32 %length, 1
    %upper = icmp slt i32 %init, %len.sub.1
    br i1 %upper, label %loop, label %exit
  
   loop:
    %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
    %civ.inc = add i32 %civ, 1
    %cmp = icmp slt i32 %civ.inc, %length
    br i1 %cmp, label %latch, label %break
  
   latch:
    store i32 0, i32* %array
    %check = icmp slt i32 %civ.inc, %len.sub.1
    br i1 %check, label %loop, label %break
  
   break:
    ret i32 %civ.inc
  
   exit:
    ret i32 42
  }

Differential Revision: http://reviews.llvm.org/D6748

llvm-svn: 225282
2015-01-06 19:02:56 +00:00
Matt Arsenault 55e7312cd8 Convert fcmp with 0.0 from casted integers to icmp
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.

This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.

Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.

Also fold cases that aren't integers to true / false.

llvm-svn: 225265
2015-01-06 15:50:59 +00:00
David Majnemer 9b6b822814 InstCombine: Bitcast call arguments from/to pointer/integer type
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.

llvm-svn: 225254
2015-01-06 08:41:31 +00:00
Michael Kuperstein 6ae456b0d7 Fix broken test from r225159.
llvm-svn: 225164
2015-01-05 12:34:01 +00:00
Jiangning Liu 40c1b35292 Fixed a bug in memory dependence checking module of loop vectorization. The following loop should not be vectorized with current algorithm.
{code}
// loop body
   ... = a[i]          (1)
    ... = a[i+1]       (2)
 .......
a[i+1] = ....          (3)
   a[i] = ...          (4)
{code}

The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed.

For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized.

The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly.

llvm-svn: 225159
2015-01-05 10:08:58 +00:00
Chandler Carruth 73b0164fe5 [SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an
assert out of the new pre-splitting in SROA.

This fix makes the code do what was originally intended -- when we have
a store of a load both dealing in the same alloca, we force them to both
be pre-split with identical offsets. This is really quite hard to do
because we can keep discovering problems as we go along. We have to
track every load over the current alloca which for any resaon becomes
invalid for pre-splitting, and go back to remove all stores of those
loads. I've included a couple of test cases derived from PR22093 that
cover the different ways this can happen. While that PR only really
triggered the first of these two, its the same fundamental issue.

The other challenge here is documented in a FIXME now. We end up being
quite a bit more aggressive for pre-splitting when loads and stores
don't refer to the same alloca. This aggressiveness comes at the cost of
introducing potentially redundant loads. It isn't clear that this is the
right balance. It might be considerably better to require that we only
do pre-splitting when we can presplit every load and store involved in
the entire operation. That would give more consistent if conservative
results. Unfortunately, it requires a non-trivial change to the actual
pre-splitting operation in order to correctly handle cases where we end
up pre-splitting stores out-of-order. And it isn't 100% clear that this
is the right direction, although I'm starting to suspect that it is.

llvm-svn: 225149
2015-01-05 04:17:53 +00:00
David Majnemer 087dc8b831 InstCombine: match can find ConstantExprs, don't assume we have a Value
We assumed the output of a match was a Value, this would cause us to
assert because we would fail a cast<>.  Instead, use a helper in the
Operator family to hide the distinction between Value and Constant.

This fixes PR22087.

llvm-svn: 225127
2015-01-04 07:36:02 +00:00
David Majnemer 6ee8d17bc6 ValueTracking: ComputeNumSignBits should tolerate misshapen phi nodes
PHI nodes can have zero operands in the middle of a transform.  It is
expected that utilities in Analysis don't freak out when this happens.

Note that it is considered invalid to allow these misshapen phi nodes to
make it to another pass.

This fixes PR22086.

llvm-svn: 225126
2015-01-04 07:06:53 +00:00
David Majnemer c8a576b5c0 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow.

llvm-svn: 225077
2015-01-02 07:29:47 +00:00
Chandler Carruth 24ac830d7c [SROA] Teach SROA to be more aggressive in splitting now that we have
a pre-splitting pass over loads and stores.

Historically, splitting could cause enough problems that I hamstrung the
entire process with a requirement that splittable integer loads and
stores must cover the entire alloca. All smaller loads and stores were
unsplittable to prevent chaos from ensuing. With the new pre-splitting
logic that does load/store pair splitting I introduced in r225061, we
can now very nicely handle arbitrarily splittable loads and stores. In
order to fully benefit from these smarts, we need to mark all of the
integer loads and stores as splittable.

However, we don't actually want to rewrite partitions with all integer
loads and stores marked as splittable. This will fail to extract scalar
integers from aggregates, which is kind of the point of SROA. =] In
order to resolve this, what we really want to do is only do
pre-splitting on the alloca slices with integer loads and stores fully
splittable. This allows us to uncover all non-integer uses of the alloca
that would benefit from a split in an integer load or store (and where
introducing the split is safe because it is just memory transfer from
a load to a store). Once done, we make all the non-whole-alloca integer
loads and stores unsplittable just as they have historically been,
repartition and rewrite.

The result is that when there are integer loads and stores anywhere
within an alloca (such as from a memcpy of a sub-object of a larger
object), we can split them up if there are non-integer components to the
aggregate hiding beneath. I've added the challenging test cases to
demonstrate how this is able to promote to scalars even a case where we
have even *partially* overlapping loads and stores.

This restores the single-store behavior for small arrays of i8s which is
really nice. I've restored both the little endian testing and big endian
testing for these exactly as they were prior to r225061. It also forced
me to be more aggressive in an alignment test to actually defeat SROA.
=] Without the added volatiles there, we actually split up the weird i16
loads and produce nice double allocas with better alignment.

This also uncovered a number of bugs where we failed to handle
splittable load and store slices which didn't have a begininng offset of
zero. Those fixes are included, and without them the existing test cases
explode in glorious fireworks. =]

I've kept support for leaving whole-alloca integer loads and stores as
splittable even for the purpose of rewriting, but I think that's likely
no longer needed. With the new pre-splitting, we might be able to remove
all the splitting support for loads and stores from the rewriter. Not
doing that in this patch to try to isolate any performance regressions
that causes in an easy to find and revert chunk.

llvm-svn: 225074
2015-01-02 03:55:54 +00:00
Chandler Carruth e65ae89327 [SROA] Add a test case for r225068 / PR22080.
llvm-svn: 225070
2015-01-02 00:34:29 +00:00
Chandler Carruth 0715cba02d [SROA] Teach SROA how to much more intelligently handle split loads and
stores.

When there are accesses to an entire alloca with an integer
load or store as well as accesses to small pieces of the alloca, SROA
splits up the large integer accesses. In order to do that, it uses bit
math to merge the small accesses into large integers. While this is
effective, it produces insane IR that can cause significant problems in
the rest of the optimizer:

- It can cause load and store mismatches with GVN on the non-alloca side
  where we end up loading an i64 (or some such) rather than loading
  specific elements that are stored.
- We can't always get rid of the integer bit math, which is why we can't
  always fix the loads and stores to work well with GVN.
- This is especially bad when we have operations that mix poorly with
  integer bit math such as floating point operations.
- It will block things like the vectorizer which might be able to handle
  the scalar stores that underly the aggregate.

At the same time, we can't just directly split up these loads and stores
in all cases. If there is actual integer arithmetic involved on the
values, then using integer bit math is actually the perfect lowering
because we can often combine it heavily with the surrounding math.

The solution this patch provides is to find places where SROA is
partitioning aggregates into small elements, and look for splittable
loads and stores that it can split all the way to some other adjacent
load and store. These are uniformly the cases where failing to split the
loads and stores hurts the optimizer that I have seen, and I've looked
extensively at the code produced both from more and less aggressive
approaches to this problem.

However, it is quite tricky to actually do this in SROA. We may have
loads and stores to the same alloca, or other complex patterns that are
hard to handle. This complexity leads to the somewhat subtle algorithm
implemented here. We have to do this entire process as a separate pass
over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can handle safely the cases of
overlapping, including partially overlapping, loads and stores to the
same alloca. We also have to reconstitute the post-split slice
configuration so we can avoid iterating again over all the alloca uses
(the slow part of SROA). But we also have to ensure that when we split
up loads and stores to *other* allocas, we *do* re-iterate over them in
SROA to adapt to the more refined partitioning now required.

With this, I actually think we can fix a long-standing TODO in SROA
where I avoided splitting as many loads and stores as probably should be
splittable. This limitation historically mitigated the fallout of all
the bad things mentioned above. Now that we have more intelligent
handling, I plan to remove the FIXME and more aggressively mark integer
loads and stores as splittable. I'll do that in a follow-up patch to
help with bisecting any fallout.

The net result of this change should be more fine-grained and accurate
scalars being formed out of aggregates. At the very least, Clang now
generates perfect code for this high-level test case using
std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot
inner loops of essentially every part of the Eigen math library when
using std::complex<float>. Those loops would consistently and
pervasively hop between the floating point unit and the integer unit due
to bit math extraction and insertion of floating point values that were
"stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and I have done some other
testing and so far, no issues. That doesn't mean there won't be though,
so I'll be prepared to help with any fallout. If you performance swings
in particular, please let me know. I'm very curious what all the impact
of this change will be. Stay tuned for the follow-up to also split more
integer loads and stores.

llvm-svn: 225061
2015-01-01 11:54:38 +00:00
Sanjay Patel e68f71574f InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731

llvm-svn: 225050
2014-12-31 22:14:05 +00:00
David Majnemer f89dc3edc9 InstCombine: try to transform A-B < 0 into A < B
We are allowed to move the 'B' to the right hand side if we an prove
there is no signed overflow and if the comparison itself is signed.

llvm-svn: 225034
2014-12-31 04:21:41 +00:00
Philip Reames 9db26ffc9a Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

    If a relocated value isn't used, it doesn't need to be relocated.
    If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.)
    If the value being relocated is undef, the relocation is meaningless.
    If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Philip Reames b35f46ce06 Refine the notion of MayThrow in LICM to include a header specific version
In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up.

This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs.

define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
; CHECK-LABEL: nothrow_header
; CHECK-LABEL: entry
; CHECK: %div = udiv i64 %x, %y
; CHECK-LABEL: loop
; CHECK: call void @use(i64 %div)
entry:
  br label %loop
loop: ; preds = %entry, %for.inc
  %div = udiv i64 %x, %y
  br i1 %cond, label %loop-if, label %exit
loop-if:
  call void @use(i64 %div)
  br label %loop
exit:
  ret void
}

The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load.  The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata.

Differential Revision: http://reviews.llvm.org/D6725

llvm-svn: 224965
2014-12-29 23:00:57 +00:00
Philip Reames 5ad26c353c Loading from null is valid outside of addrspace 0
This patches fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not neccessarily legal in other address spaces.

We really should introduce a hook to control this property on a per target per address space basis.  We may be loosing valuable optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.

llvm-svn: 224961
2014-12-29 22:46:21 +00:00
David Majnemer b1296ec0fd InstCombine: Infer nuw for multiplies
A multiply cannot unsigned wrap if there are bitwidth, or more, leading
zero bits between the two operands.

llvm-svn: 224849
2014-12-26 09:50:35 +00:00
David Majnemer 54c2ca2539 InstCombe: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics, it makes
sense to reuse it for normal multiplies as well.

llvm-svn: 224847
2014-12-26 09:10:14 +00:00
Michael Kuperstein be8032c875 [ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits()
GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. 

This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects.

Differential Revision: http://reviews.llvm.org/D6758

llvm-svn: 224765
2014-12-23 11:33:41 +00:00
Michael Liao 5313da3263 [SimplifyCFG] Revise common code sinking
- Fix the case where more than 1 common instructions derived from the same
  operand cannot be sunk. When a pair of value has more than 1 derived values
  in both branches, only 1 derived value could be sunk.
- Replace BB1 -> (BB2, PN) map with joint value map, i.e.
  map of (BB1, BB2) -> PN, which is more accurate to track common ops.

llvm-svn: 224757
2014-12-23 08:26:55 +00:00
Bruno Cardoso Lopes bad65c3b70 [LCSSA] Handle PHI insertion in disjoint loops
Take two disjoint Loops L1 and L2.

LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one of such exits we are
also inserting PHIs in L2 header.

This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.

Differential Revision: http://reviews.llvm.org/D6624

rdar://problem/19166231

llvm-svn: 224740
2014-12-22 22:35:46 +00:00
David Majnemer 6eed0e0d20 This should have been part of r224676.
llvm-svn: 224677
2014-12-20 04:48:34 +00:00
David Majnemer b0362e4ee6 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
David Majnemer 0b6a0b0257 InstSimplify: Optimize away pointless comparisons
(X & INT_MIN) ? X & INT_MAX : X  into  X & INT_MAX
(X & INT_MIN) ? X : X & INT_MAX  into  X
(X & INT_MIN) ? X | INT_MIN : X  into  X
(X & INT_MIN) ? X : X | INT_MIN  into  X | INT_MIN

llvm-svn: 224669
2014-12-20 03:04:38 +00:00
Bruno Cardoso Lopes f6cf8ad4e5 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix code to also return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Sanjay Patel ea3c802887 use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes 3be15b2fa6 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes c9005f2f2b [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
David Majnemer 824e011ad7 ConstantFold: Shifting undef by zero results in undef
llvm-svn: 224553
2014-12-18 23:54:43 +00:00
Suyog Sarda 43fae93da8 Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it." 

This was re-ordering floating point data types resulting in mismatch in output.

llvm-svn: 224424
2014-12-17 10:34:27 +00:00
Elena Demikhovsky 028e966a54 Added 5 more tests related to sink store revision 224247
- by Ella Bolshinsky

http://reviews.llvm.org/D6420

llvm-svn: 224418
2014-12-17 08:12:59 +00:00
Erik Eckstein a451b9b0b5 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
David Majnemer 65c52ae8ca InstSimplify: shl nsw/nuw undef, %V -> undef
We can always choose an value for undef which might cause %V to shift
out an important bit except for one case, when %V is zero.

However, shl behaves like an identity function when the right hand side
is zero.

llvm-svn: 224405
2014-12-17 01:54:33 +00:00
Elena Demikhovsky f5b72afff4 Masked Load and Store Intrinsics in loop vectorizer.
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.

http://reviews.llvm.org/D6527

llvm-svn: 224334
2014-12-16 11:50:42 +00:00
Sanjoy Das 4555b6d870 Teach ScalarEvolution to exploit min and max expressions when proving
isKnownPredicate.

The motivation for this change is to optimize away checks in loops
like this:

    limit = min(t, len)
    for (i = 0 to limit)
      if (i >= len || i < 0) throw_array_of_of_bounds();
      a[i] = ...

Differential Revision: http://reviews.llvm.org/D6635

llvm-svn: 224285
2014-12-15 22:50:15 +00:00
Duncan P. N. Exon Smith be7ea19b58 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
Elena Demikhovsky bf74736290 Added a test related to 224247 revision
llvm-svn: 224248
2014-12-15 14:14:10 +00:00
Suyog Sarda 2b27fc78a2 Typo Correction in Test Case. NFC.
llvm-svn: 224244
2014-12-15 12:19:46 +00:00
Ahmed Bougacha 0cb861634b Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores."
r223862 tried to also combine base-updating load/stores.
r224198 reverted it, as "it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown."
Reapply, with a fix to ignore non-normal load/stores.
Truncstores are handled elsewhere (you can actually write a pattern for
those, whereas for postinc loads you can't, since they return two values),
but it should be possible to also combine extloads base updates, by checking
that the memory (rather than result) type is of the same size as the addend.

Original commit message:
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 224203
2014-12-13 23:22:12 +00:00
Renato Golin df8f9b6dc9 Revert "[ARM] Combine base-updating/post-incrementing vector load/stores."
This reverts commit r223862, as it created a regression on the test-suite
on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order
in which the words are shown. We'll investigate the issue and re-apply
when safe.

llvm-svn: 224198
2014-12-13 20:23:18 +00:00
David Majnemer 9b6097586c ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume
Respect the MaxDepth recursion limit, doing otherwise will trigger an
assert in computeKnownBits.

This fixes PR21891.

llvm-svn: 224168
2014-12-12 23:59:29 +00:00
Suyog Sarda 384095e65c This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it. 
 
 Test case :
 
       float hadd(float* a) {
           return (a[0] + a[1]) + (a[2] + a[3]);
        }
 
 
 AArch64 assembly before patch :
 
        ldp	s0, s1, [x0]
 	ldp	s2, s3, [x0, #8]
 	fadd	s0, s0, s1
 	fadd	s1, s2, s3
 	fadd	s0, s0, s1
 	ret
 
 AArch64 assembly after patch :
 
        ldp	d0, d1, [x0]
 	fadd	v0.2s, v0.2s, v1.2s
 	faddp	s0, v0.2s
 	ret

Reviewed Link : http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html

llvm-svn: 224119
2014-12-12 12:53:44 +00:00
Steven Wu 881916dea5 Fix another infinite loop in InstCombine
Summary:
InstCombine infinite-loops for the testcase added
It is because InstCombine is generating instructions that can be
optimized by itself. Fix by not optimizing frem if the optimized
type is the same as original type.
rdar://problem/19150820

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6634

llvm-svn: 224097
2014-12-12 04:34:07 +00:00
Andrea Di Biagio 72b05aa59c [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length field' (3rd operand) is set to zero, and if the sum between
field 'length' and 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.

Differential Revision: http://reviews.llvm.org/D6583

llvm-svn: 224054
2014-12-11 20:44:59 +00:00
David Majnemer f532fcb889 InstSimplify: Remove usesless %a parameter from tests
No functional change intended.

llvm-svn: 224016
2014-12-11 12:56:17 +00:00
Michael Kuperstein fffb6996c9 The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value.
Patch by Amjad Aboud

Differential Revision: http://reviews.llvm.org/D6525

llvm-svn: 224015
2014-12-11 12:41:10 +00:00
David Majnemer 5a7717e498 ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0
Zero is usually a nicer constant to have than -1.

llvm-svn: 223969
2014-12-10 21:58:15 +00:00
David Majnemer 89cf6d79eb ConstantFold: an undef shift amount results in undef
X shifted by undef results in undef because the undef value can
represent values greater than the width of the operands.

llvm-svn: 223968
2014-12-10 21:38:05 +00:00
David Majnemer 7b86b77248 ConstantFold: div undef, 0 should fold to undef, not zero
Dividing by zero yields an undefined value.

llvm-svn: 223924
2014-12-10 09:14:55 +00:00
David Majnemer ae707582c0 InstSimplify: [al]shr exact undef, %X -> undef
Exact shifts always keep the non-zero bits of their input.  This means
it keeps it's undef bits.

llvm-svn: 223923
2014-12-10 09:14:52 +00:00
David Majnemer 71dc8fb867 InstSimplify: div %X, 0 -> undef
We already optimized rem %X, 0 to undef, we should do the same for div.

llvm-svn: 223919
2014-12-10 07:52:18 +00:00
Ahmed Bougacha 7efbac74ec [ARM] Combine base-updating/post-incrementing vector load/stores.
We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD
when the base pointer is incremented after the load/store.

We can do the same thing for generic load/stores.

Note that we can only combine the first load/store+adds pair in
a sequence (as might be generated for a v16f32 load for instance),
because other combines turn the base pointer addition chain (each
computing the address of the next load, from the address of the last
load) into independent additions (common base pointer + this load's
offset).

Differential Revision: http://reviews.llvm.org/D6585

llvm-svn: 223862
2014-12-10 00:07:37 +00:00
Chandler Carruth a7f247ea56 Revert r223764 which taught instcombine about integer-based elment extraction
patterns.

This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.

This reverts commit r223764.

Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and tsores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaching InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

llvm-svn: 223813
2014-12-09 19:21:16 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Sonam Kumari 72ccc3c428 Removal Of Duplicate Test Cases and Addition Of Missing Check Statements
llvm-svn: 223768
2014-12-09 10:46:38 +00:00
Ankur Garg 51eeba70da [test/Transforms/InstCombine/shift.ll] Removed duplicate test cases. NFC.
Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll.

test54 and test57 were duplicates of each other.
test55 and test58 were duplicates of each other.

(Removed test57 and test58)

llvm-svn: 223767
2014-12-09 10:35:19 +00:00
Chandler Carruth 7415205113 Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and tsores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaching InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

Differential Revision: http://reviews.llvm.org/D6548

llvm-svn: 223764
2014-12-09 08:55:32 +00:00
David Majnemer d5b3aa49ac InstSimplify: Try to bring back the rest of r223583
This reverts r223624 with a small tweak, hopefully this will make stage3
equivalent.

llvm-svn: 223679
2014-12-08 18:30:43 +00:00
Sonam Kumari 90d266c0a9 Removal Of Duplicate Test Case from shift.ll file
llvm-svn: 223648
2014-12-08 09:40:43 +00:00
NAKAMURA Takumi 2b6e662672 Revert a part of r223583, for now. It seems causing different emission between stage2(gcc-clang) and stage3 clang. Investigating.
llvm-svn: 223624
2014-12-08 02:07:22 +00:00
David Majnemer 1af36e5baf InstSimplify: Optimize away useless unsigned comparisons
Code like X < Y && Y == 0 should always be folded away to false.

llvm-svn: 223583
2014-12-06 10:51:40 +00:00
Duncan P. N. Exon Smith da41af9e94 IR: Disallow complicated function-local metadata
Disallow complex types of function-local metadata.  The only valid
function-local metadata is an `MDNode` whose sole argument is a
non-metadata function-local value.

Part of PR21532.

llvm-svn: 223564
2014-12-06 01:26:49 +00:00
Hans Wennborg 67ead59108 Add some tests for SimplifyCFG's TurnSwitchRangeIntoICmp(). NFC.
llvm-svn: 223396
2014-12-04 22:19:28 +00:00
Hans Wennborg 1096539335 Add some tests for SimplifyCFG's ConstantFoldTerminator(). NFC.
llvm-svn: 223395
2014-12-04 22:19:25 +00:00
Philip Reames 5b3ce71b62 Add a test case for argument type coercion in an invoke of a vararg function
This would have caught the bug I fixed in 223370.  

llvm-svn: 223378
2014-12-04 19:13:45 +00:00
Hal Finkel aa19bafc9c Revert "r223364 - Revert r223347 which has caused crashes on bootstrap bots."
Reapply r223347, with a fix to not crash on uninserted instructions (or more
precisely, instructions in uninserted blocks). bugpoint was able to reduce the
test case somewhat, but it is still somewhat large (and relies on setting
things up to be simplified during inlining), so I've not included it here.
Nevertheless, it is clear what is going on and why.

Original commit message:

Restrict somewhat the memory-allocation pointer cmp opt from r223093

Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.

llvm-svn: 223371
2014-12-04 17:45:19 +00:00
Alexander Potapenko 76770e4930 Revert r223347 which has caused crashes on bootstrap bots.
llvm-svn: 223364
2014-12-04 14:22:27 +00:00
Simon Pilgrim be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Hal Finkel 8b24b32c44 Restrict somewhat the memory-allocation pointer cmp opt from r223093
Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.

llvm-svn: 223347
2014-12-04 09:22:28 +00:00
Matthias Braun d34e4d2354 [SimplifyLibCalls] Improve double->float shrinking to consider constants
This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x);

rdar://19049359

Differential Revision: http://reviews.llvm.org/D6496

llvm-svn: 223270
2014-12-03 21:46:33 +00:00
Matthias Braun 892c923c46 [SimplifyLibCalls] Enable double to float shrinking for copysign
rdar://19049359

Differential Revision: http://reviews.llvm.org/D6495

llvm-svn: 223269
2014-12-03 21:46:29 +00:00
Nick Lewycky 8ede1ef09b Fix test to use the right metadata node (reapply r223239 plus a fix) and also to use the correct path to the GCNO file.
llvm-svn: 223244
2014-12-03 17:32:44 +00:00
Alexander Potapenko ff5326f0d1 Revert r223239, which broke some bots.
llvm-svn: 223240
2014-12-03 16:03:08 +00:00
Alexander Potapenko 1b42ad9b43 Fix the metadata number used by llvm.gcov to match the number of the inserted metadata node.
llvm-svn: 223239
2014-12-03 15:15:58 +00:00
Erik Eckstein d181752be0 InstCombine: simplify signed range checks
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n

llvm-svn: 223224
2014-12-03 10:39:15 +00:00
Tom Stellard 1f0dded057 StructurizeCFG: Use LoopInfo analysis for better loop detection
We were assuming that each back-edge in a region represented a unique
loop, which is not always the case.  We need to use LoopInfo to
correctly determine which back-edges are loops.

llvm-svn: 223199
2014-12-03 04:28:32 +00:00
Nick Lewycky 2e8a6219fc Emit the entry block first and the exit block second, then all the blocks in between afterwards. This is what gcc always does, and some out of tree tools depend on that.
llvm-svn: 223193
2014-12-03 02:45:01 +00:00
Michael Zolotukhin ea8327b80f PR21302. Vectorize only bottom-tested loops.
rdar://problem/18886083

llvm-svn: 223171
2014-12-02 22:59:06 +00:00
Michael Zolotukhin 540580ca06 Apply loop-rotate to several vectorizer tests.
Such loops shouldn't be vectorized due to the loops form.
After applying loop-rotate (+simplifycfg) the tests again start to check
what they are intended to check.

llvm-svn: 223170
2014-12-02 22:59:02 +00:00
Bruno Cardoso Lopes 15520db9ad [SwitchLowering] Handle destinations on multiple phi instructions
Follow up from r222926. Also handle multiple destinations from merged
cases on multiple and subsequent phi instructions.

rdar://problem/19106978

llvm-svn: 223135
2014-12-02 18:31:53 +00:00
Bruno Cardoso Lopes d035fbb96f [LICM] Avoind store sinking if no preheader is available
Load instructions are inserted into loop preheaders when sinking stores
and later removed if not used by the SSA updater. Avoid sinking if the
loop has no preheader and avoid crashes. This fixes one more side effect
of not handling indirectbr instructions properly on LoopSimplify.

llvm-svn: 223119
2014-12-02 14:22:34 +00:00
Sonam Kumari f2eacabd66 [signext.ll] Removal Of Duplicate Test Cases
Removed the duplicate test case existing in signext.ll file.

llvm-svn: 223109
2014-12-02 05:29:47 +00:00
Hal Finkel afcd8dbbcf Simplify pointer comparisons involving memory allocation functions
System memory allocation functions, which are identified at the IR level by the
noalias attribute on the return value, must return a pointer into a memory region
disjoint from any other memory accessible to the caller. We can use this
property to simplify pointer comparisons between allocated memory and local
stack addresses and the addresses of global variables. Neither the stack nor
global variables can overlap with the region used by the memory allocator.

Fixes PR21556.

llvm-svn: 223093
2014-12-01 23:38:06 +00:00
Reid Kleckner 35fc363ce8 Parse 'ghccc' in .ll files as the GHC convention (cc 10)
Previously we just used "cc 10" in the .ll files, but that isn't very
human readable.

llvm-svn: 223076
2014-12-01 21:04:44 +00:00
Hans Wennborg 5bef5b522b Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg 269ebb612e SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable
They would get optimized away later, but we might as well not emit them.

llvm-svn: 223051
2014-12-01 17:08:38 +00:00
Hans Wennborg 5a1e5c05d8 SimplifyCFG: don't remove unreachable default switch destinations
An unreachable default destination can be exploited by other optimizations, and
SDag lowering is now prepared to handle them efficiently.

For example, branches to the unreachable destination will be optimized away,
such as in the case of range checks for switch lookup tables.

On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and
Chromium by 30 kB).

llvm-svn: 223050
2014-12-01 17:08:35 +00:00
Sonam Kumari 237cfa9916 Removed extra whitespace. (Testing commit access). NFC.
llvm-svn: 222994
2014-12-01 09:27:46 +00:00
Rafael Espindola a4b2ee4548 Relax an assert a bit to avoid a crash on unreachable code.
Patch by Duncan Exon Smith with a small tweak by me.

llvm-svn: 222984
2014-12-01 02:55:24 +00:00
Duncan P. N. Exon Smith 910f05d181 DebugIR: Delete -debug-ir
llvm-svn: 222945
2014-11-29 03:15:47 +00:00
Duncan P. N. Exon Smith 9bc81fbe92 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
David Majnemer 3d6f80b619 InstCombine: FoldOrOfICmps harder
We may be in a situation where the icmps might not be near each other in
a tree of or instructions.  Try to dig out related compare instructions
and see if they combine.

N.B.  This won't fire on deep trees of compares because rewritting the
tree might end up creating a net increase of IR.  We may have to resort
to something more sophisticated if this is a real problem.

llvm-svn: 222928
2014-11-28 19:58:29 +00:00
Bruno Cardoso Lopes 46d5bf2982 [LICM] Store sink and indirectbr instructions
Loop simplify skips exit-block insertion when exits contain indirectbr
instructions. This leads to an assertion in LICM when trying to sink
stores out of non-dedicated loop exits containing indirectbr
instructions. This patch fix this issue by re-checking for dedicated
exits in LICM prior to store sink attempts.

Differential Revision: http://reviews.llvm.org/D6414

rdar://problem/18943047

llvm-svn: 222927
2014-11-28 19:47:46 +00:00
Bruno Cardoso Lopes bc7ba2c766 [SwitchLowering] Handle multiple destinations on condensed case stmts
Switch cases statements with sequential values that branch to the same
destination BB may often be handled together in a single new source BB.
In this scenario we need to remove remaining incoming values from PHI
instructions in the destination BB, as to match the number of source
branches.

Differential Revision: http://reviews.llvm.org/D6415

rdar://problem/19040894

llvm-svn: 222926
2014-11-28 19:47:33 +00:00
Erik Eckstein 0d86c7623f reinstate r222872: Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
Fixed missing dominance check.
Original commit message:

This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
   if (idx < tablesize)
      r = table[idx]; // table does not contain default_value
   else
      r = default_value;
   if (r != default_value)
      ...
Is optimized to:
   cond = idx < tablesize;
   if (cond)
      r = table[idx];
   else
      r = default_value;
   if (cond)
      ...
Jump threading will then eliminate the second if(cond).

llvm-svn: 222891
2014-11-27 15:13:14 +00:00
Suyog Sarda f8516e1662 Use FileCheck instead of grep. Change by Ankur Garg.
Differential Revision: http://reviews.llvm.org/D6430

llvm-svn: 222879
2014-11-27 11:22:49 +00:00
Erik Eckstein 2190cd9ffa Revert "Peephole optimization in switch table lookup: reuse the guarding table comparison if possible."
It is breaking the clang bootstrag.

llvm-svn: 222877
2014-11-27 10:59:08 +00:00
Suyog Sarda c3024c75e0 Use FileCheck instead of grep. Change by Sonam.
Differential Revision: http://reviews.llvm.org/D6432

llvm-svn: 222876
2014-11-27 10:57:24 +00:00
Erik Eckstein e73e308ab9 Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
    if (idx < tablesize)
       r = table[idx]; // table does not contain default_value
    else
       r = default_value;
    if (r != default_value)
       ...
Is optimized to:
    cond = idx < tablesize;
    if (cond)
       r = table[idx];
    else
       r = default_value;
    if (cond)
       ...
\endcode
Jump threading will then eliminate the second if(cond).

llvm-svn: 222872
2014-11-27 08:33:51 +00:00
David Majnemer 40157d5c4d InstCombine: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) == 0 ? X ^ C : X  into  X | C
(X & C) != 0 ? X ^ C : X  into  X & ~C

llvm-svn: 222871
2014-11-27 07:25:21 +00:00
David Majnemer c6a5e1dd4f InstSimplify: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) ? X & ~C : X  into  X & ~C
(X & C) ? X : X & ~C  into  X
(X & C) ? X | C : X  into  X
(X & C) ? X : X | C  into  X | C

llvm-svn: 222868
2014-11-27 06:32:46 +00:00
David Majnemer 5468e86469 Revert "Added inst combine transforms for single bit tests from Chris's note"
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.

A test has been added to ensure that we don't regress again.

I'll work on a rewrite of what the optimization was trying to do later.

llvm-svn: 222856
2014-11-26 23:00:38 +00:00
Hans Wennborg bda193edff Remove useless rdar:// comment from switch_to_lookup_table.ll test.
llvm-svn: 222772
2014-11-25 18:45:23 +00:00
Hans Wennborg 45172aceb3 LazyValueInfo: Actually re-visit partially solved block-values in solveBlockValue()
If solveBlockValue() needs results from predecessors that are not already
computed, it returns false with the intention of resuming when the dependencies
have been resolved. However, the computation would never be resumed since an
'overdefined' result had been placed in the cache, preventing any further
computation.

The point of placing the 'overdefined' result in the cache seems to have been
to break cycles, but we can check for that when inserting work items in the
BlockValue stack instead. This makes the "stop and resume" mechanism of
solveBlockValue() work as intended, unlocking more analysis.

Using this patch shaves 120 KB off a 64-bit Chromium build on Linux.

I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in
compile time.

Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me.

Differential Revision: http://reviews.llvm.org/D6397

llvm-svn: 222768
2014-11-25 17:23:05 +00:00
Chandler Carruth 816d26fe5e [InstCombine] Change LLVM To canonicalize toward the value type being
stored rather than the pointer type.

This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.

With this change, loads and stores should no longer be made with
nonsensical types for the values that tehy load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.

I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.

llvm-svn: 222748
2014-11-25 10:09:51 +00:00
Suyog Sarda 99c9c1f2b0 Change the test case file to use FileCheck instead of grep. NFC.
Change by Ankur Garg.

Differential Revision: http://reviews.llvm.org/D6382

llvm-svn: 222740
2014-11-25 08:44:56 +00:00
Chandler Carruth 1a3c2c414c Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts, it says so right there in the langref. Make the code agree.

Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 222739
2014-11-25 08:20:27 +00:00
David Majnemer bd9ce4ea51 InstSimplify: Handle some simple tautological comparisons
This handles cases where we are comparing a masked value against itself.
The analysis could be further improved by making it recursive but such
expense is not currently justified.

llvm-svn: 222716
2014-11-25 02:55:48 +00:00
Matt Arsenault 238ff1ad1e Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons
llvm-svn: 222705
2014-11-24 23:15:18 +00:00
Matt Arsenault ea515d33c9 Convert test to FileCheck and use CHECK-LABEL
llvm-svn: 222704
2014-11-24 23:03:17 +00:00
David Majnemer 8e6f6a98b5 InstCombine: Don't create an unused instruction
We would create an instruction but not inserting it.
Not inserting the unused instruction would lead us to verification
failure.

This fixes PR21653.

llvm-svn: 222659
2014-11-24 16:41:13 +00:00
David Majnemer b2a6e7458d InstCombine: Don't assume DataLayout is always available
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout.  This resulted in opt crashing.

This fixes PR21651.

llvm-svn: 222645
2014-11-24 07:26:20 +00:00
Elena Demikhovsky 9e5089a938 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
David Majnemer fb3805576b InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2)
llvm-svn: 222625
2014-11-22 20:00:41 +00:00
David Majnemer ec6e481bc5 InstCombine: Propagate exact for (sdiv X, Y) -> (udiv X, Y)
llvm-svn: 222624
2014-11-22 20:00:38 +00:00
David Majnemer fa4699e65f InstCombine: Propagate exact for (sdiv -X, C) -> (sdiv X, -C)
llvm-svn: 222623
2014-11-22 20:00:34 +00:00
David Majnemer a3aeb15613 InstCombine: Propagate exact in (udiv (lshr X,C1),C2) -> (udiv x,C1<<C2)
llvm-svn: 222620
2014-11-22 18:16:54 +00:00
David Majnemer 546f81064c InstCombine: Propagate NSW/NUW for X*(1<<Y) -> X<<Y
llvm-svn: 222613
2014-11-22 08:57:02 +00:00
David Majnemer 8279a7506d InstCombine: Propagate NSW for -X * -Y -> X * Y
llvm-svn: 222612
2014-11-22 07:25:19 +00:00
David Majnemer 4efa9ff8ca InstSimplify: Simplify (sub 0, X) -> X if it's NUW
This is a generalization of the X - (0 - Y) -> X transform.

llvm-svn: 222611
2014-11-22 07:15:16 +00:00
David Majnemer 80c8f627db InstCombine: Preserve nsw when folding X*(2^C) -> X << C
llvm-svn: 222606
2014-11-22 04:52:55 +00:00
David Majnemer fd4a6d2b7a InstCombine: Preserve nsw/nuw for ((X << C2)*C1) -> (X * (C1 << C2))
llvm-svn: 222605
2014-11-22 04:52:52 +00:00
David Majnemer 027bc80928 InstCombine: Preserve nsw for (mul %V, -1) -> (sub 0, %V)
llvm-svn: 222604
2014-11-22 04:52:38 +00:00
Gerolf Hoflehner ec6217c929 [InstCombine] Re-commit of r218721 (Optimize icmp-select-icmp sequence)
Fixes the self-host fail. Note that this commit activates dominator
analysis in the combiner by default (like the original commit did).

llvm-svn: 222590
2014-11-21 23:36:44 +00:00
David Majnemer c0a313b57c SROA: The alloca type isn't a candidate promotion type for vectors
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.

This manifested as an assertion failure when we compared the various
types: we had a size mismatch.

This fixes PR21480.

llvm-svn: 222499
2014-11-21 02:34:55 +00:00
Michael Zolotukhin 0dcae71449 Fix a trip-count overflow issue in LoopUnroll.
Currently LoopUnroll generates a prologue loop before the main loop
body to execute first N%UnrollFactor iterations. Also, this loop is
used if trip-count can overflow - it's determined by a runtime check.

However, we've been mistakenly optimizing this loop to a linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip-count overflows.

llvm-svn: 222451
2014-11-20 20:19:55 +00:00
Chad Rosier 90a2f9b110 Revert "[Reassociate] As the expression tree is rewritten make sure the operands are"
This reverts commit r222142.  This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.

Conflicts:

	test/Transforms/Reassociate/optional-flags.ll

llvm-svn: 222398
2014-11-19 23:21:20 +00:00
Suyog Sarda aba97f4aba Vectorize a reduction chain feeding into a 'return' statement.
e.x 
return (a[0]+b[0]) + (a[1]+b[1])

Differential Revision: http://reviews.llvm.org/D6227

llvm-svn: 222364
2014-11-19 16:07:38 +00:00
Arnaud A. de Grandmaison 7b9dc28060 Fix tail recursion elimination
When the BasicBlock containing the return instrution has a PHI with 2
incoming values, FoldReturnIntoUncondBranch will remove the no longer
used incoming value and remove the no longer needed phi as well. This
leaves us with a BB that no longer has a PHI, but the subsequent call
to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not
remove the return instruction (which still uses the result of the call
instruction). This prevents EliminateRecursiveTailCall to remove
the value, as it is still being used in a basicblock which has no
predecessors.

The basicblock can not be erased on the spot, because its iterator is
still being used in runTRE.

This issue was exposed when removing the threshold on size for lifetime
marker insertion for named temporaries in clang. The testcase is a much
reduced version of peelOffOuterExpr(const Expr*, const ExplodedNode *)
from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp.

llvm-svn: 222354
2014-11-19 13:32:51 +00:00
David Majnemer b7adf34ee0 AliasSetTracker: UnknownInsts should contribute to the refcount
AliasSetTracker::addUnknown may create an AliasSet devoid of pointers
just to contain an instruction if no suitable AliasSet already exists.
It will then AliasSet::addUnknownInst and we will be done.

However, it's possible for addUnknown to choose an existing AliasSet to
addUnknownInst.
If this were to occur, we are in a bit of a pickle: removing pointers
from the AliasSet can cause the entire AliasSet to become destroyed,
taking our unknown instructions out with them.

Instead, keep track whether or not our AliasSet has any unknown
instructions.

This fixes PR21582.

llvm-svn: 222338
2014-11-19 09:41:05 +00:00
Manman Ren c67109313c Revert r222039 because of bot failure.
http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/
Hopefully, bot will be green. If not, we will re-submit the commit.

llvm-svn: 222287
2014-11-19 00:13:26 +00:00
David Majnemer c6b8e20a5c InstCombine: Fix another infinite loop caused by visitFPTrunc
We would attempt to replace an frem's operand with the same operand.
This would cause InstCombine to think real work was done, causing
InstCombine to enter an infinite loop.

This fixes the second part of PR21576.

llvm-svn: 222265
2014-11-18 22:06:45 +00:00
David Majnemer b32eaddf11 Revert "Revert r222040 because of bot failure."
This reverts commit r222203, reverting r222040 didn't end up turning the
bot green.

llvm-svn: 222261
2014-11-18 21:30:02 +00:00
Chad Rosier b83c6d9c08 [Reassociate] Use test cases that can actually be optimized to verify optional
flags are cleared.  The reassociation pass was just reordering the leaf nodes
in the previous test cases.

llvm-svn: 222250
2014-11-18 20:34:01 +00:00
Philip Reames 018dbf18c4 Tweak EarlyCSE to recognize series of dead stores
EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!).

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D6301

llvm-svn: 222241
2014-11-18 17:46:32 +00:00
David Majnemer 6fdb6b8fd4 InstCombine: Fold away tautological masked compares
It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true.

While this sort of reasoning should normally live in InstSimplify,
the machinery that derives this result is not trivial to split out.

llvm-svn: 222230
2014-11-18 09:31:41 +00:00
David Majnemer 9a91e4a18a IndVarSimplify: Allow LFTR to fire more often
I added a pessimization in r217102 to prevent miscompiles when the
incremented induction variable was used in a comparison; it would be
poison.

Try to use the incremented induction variable more often when we can be
sure that the increment won't end in poison.

Differential Revision: http://reviews.llvm.org/D6222

llvm-svn: 222213
2014-11-18 02:20:58 +00:00
Manman Ren a64bd44fd8 Revert r222040 because of bot failure.
http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/
Hopefully, bot will be green.

llvm-svn: 222203
2014-11-18 00:33:22 +00:00
Juergen Ributzka c9591e9bdb [SimplifyCFG] Make the value type of the hole check bitmask a power-of-2.
When converting a switch to a lookup table we might have to generate a bitmaks
to encode and check for holes in the original switch statement.

The type of this mask depends on the number of switch statements, which can
result in illegal types for pretty much all architectures.

To avoid unnecessary type legalization and help FastISel this commit increases
the size of the bitmask to next power-of-2 value when necessary.

This fixes rdar://problem/18984639.

llvm-svn: 222168
2014-11-17 19:39:56 +00:00
Chad Rosier bc0b869be9 [Reassociate] As the expression tree is rewritten make sure the operands are
emitted in canonical form.

llvm-svn: 222142
2014-11-17 16:33:50 +00:00
Chad Rosier 9a1ac6e494 [Reassociate] Canonicalize constants to RHS operand.
Fix a thinko where the RHS was already a constant.

llvm-svn: 222139
2014-11-17 15:52:51 +00:00
Erik Eckstein 105374fe5e Optimize switch lookup tables with linear mapping.
This is a simple optimization for switch table lookup:
It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output.
Example:

int f1(int x) {
  switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
  }
  return 0;
}

generates:

define i32 @f1(i32 %x) #0 {
entry:
  %0 = icmp ult i32 %x, 4
  br i1 %0, label %switch.lookup, label %return

switch.lookup:
  %switch.offset = add i32 %x, 10
  ret i32 %switch.offset

return:
  ret i32 0
}

llvm-svn: 222121
2014-11-17 09:13:57 +00:00
Rafael Espindola a3b5b60753 Add back r222061 with a fix.
This adds back r222061, but now calls initializePAEvalPass from the correct
library to avoid link problems.

Original message:

Don't make assumptions about the name of private global variables.

Private variables are can be renamed, so it is not reliable to make
decisions on the name.

The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how llvm makes a
decision (var name) and how the linker makes a decision (section it is
in).

This patch changes one case where we were looking at the variable name to use
the section instead.

Test tuning by Michael Gottesman.

llvm-svn: 222117
2014-11-17 02:28:27 +00:00
Reid Kleckner 007239863e Revert "Don't make assumptions about the name of private global variables."
This reverts commit r222061.

It's causing linker errors.

llvm-svn: 222077
2014-11-15 02:03:53 +00:00
Rafael Espindola 2fc723099f Don't make assumptions about the name of private global variables.
Private variables are can be renamed, so it is not reliable to make
decisions on the name.

The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how llvm makes a
decision (var name) and how the linker makes a decision (section it is
in).

This patch changes one case where we were looking at the variable name to use
the section instead.

Test tuning by Michael Gottesman.

llvm-svn: 222061
2014-11-14 23:17:47 +00:00
David Majnemer 8c3d92e7e5 InstCombine: Fix infinite loop caused by visitFPTrunc
We would attempt to replace a fptrunc of an frem with an identical
fptrunc.  This would cause the new fptrunc to be added to the worklist.
Of course, this results in an infinite loop because we will keep
visiting the newly created fptruncs.

This fixes PR21576.

llvm-svn: 222040
2014-11-14 21:21:15 +00:00
Chad Rosier 1ff4c0bf0b Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before
doing Load PRE"

This commit updates the failing test in
Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll

The failing test is sensitive to the order in which we process loads.  This
version turns on the RPO traversal instead of the while DT traversal in GVN.
The new test code is functionally same just the order of loads that are
eliminated is swapped.

This new version also fixes an issue where GVN splits a critical edge and
potentially invalidate the RPO/DT iterator.

llvm-svn: 222039
2014-11-14 21:09:13 +00:00
Chad Rosier df8f2a23cb [Reassociate] Canonicalize the operands of all binary operators.
llvm-svn: 222008
2014-11-14 17:09:19 +00:00
Chad Rosier d99df68e19 [Reassociate] Canonicalize operands of vector binary operators.
Prior to this commit fmul and fadd binary operators were being canonicalized for
both scalar and vector versions.  We now canonicalize add, mul, and, or, and xor
vector instructions.

llvm-svn: 222006
2014-11-14 17:08:15 +00:00
Chad Rosier f8b55f1bc5 [Reassociate] Canonicalize constants to RHS operand.
llvm-svn: 222005
2014-11-14 17:05:59 +00:00
Reid Kleckner 29d880bdc7 Relax the gcov version.ll test to check '.' instead of '\*'
The escaping of the '\*' doesn't work with my combination of testing
tools.

llvm-svn: 221944
2014-11-13 23:07:55 +00:00
Chad Rosier 8716b58583 Revert "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE."
This reverts commit r221924.  It appears the commit was a bit premature and is causing
bot failures that need further investigation.

llvm-svn: 221939
2014-11-13 22:54:59 +00:00
Chad Rosier dd526665fc [GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE.
Phabricator Revision: http://reviews.llvm.org/D6103
Patch by "Balaram Makam" <bmakam@codeaurora.org>!

llvm-svn: 221924
2014-11-13 21:17:58 +00:00
Sanjoy Das c5676df3ec Teach ScalarEvolution to sharpen range information.
If x is known to have the range [a, b), in a loop predicated by (icmp
ne x, a) its range can be sharpened to [a + 1, b).  Get
ScalarEvolution and hence IndVars to exploit this fact.

This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.

This change was originally landed in r219834 but had a bug and broke
ASan. It was reverted in r219878, and is now being re-landed after
fixing the original bug.

phabricator: http://reviews.llvm.org/D5639
reviewed by: atrick

llvm-svn: 221839
2014-11-13 00:00:58 +00:00
Ahmed Bougacha 0788d49a40 [CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead.
Fixes PR21548.  Related to PR20474.

llvm-svn: 221820
2014-11-12 22:16:55 +00:00
Sanjay Patel 4c219fd248 CGSCC should not treat intrinsic calls like function calls (PR21403)
Make the handling of calls to intrinsics in CGSCC consistent: 
they are not treated like regular function calls because they
are never lowered to function calls.

Without this patch, we can get dangling pointer asserts from
the subsequent loop that processes callsites because it already
ignores intrinsics.

See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion.

Differential Revision: http://reviews.llvm.org/D6124

llvm-svn: 221802
2014-11-12 18:25:47 +00:00
Jingyue Wu 8a12cea5f1 Disable indvar widening if arithmetics on the wider type are more expensive
Summary:
Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test
was run whether NVPTX was enabled or not.

IndVarSimplify should not widen an indvar if arithmetics on the wider
indvar are more expensive than those on the narrower indvar. For
instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is
twice as expensive as that on i32, because the hardware needs to
simulate a 64-bit integer using two 32-bit integers.

Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo.

Fixes PR21148.

Test Plan:
Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics
on the wider type are more expensive. This test is run only when NVPTX is
enabled.

Reviewers: jholewinski, eliben, meheff, atrick

Reviewed By: atrick

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6196

llvm-svn: 221799
2014-11-12 18:09:15 +00:00
Jingyue Wu a48273390c Reverts r221772 which fails tests
llvm-svn: 221773
2014-11-12 07:19:25 +00:00
Jingyue Wu 635a9b14fa Disable indvar widening if arithmetics on the wider type are more expensive
Summary:
IndVarSimplify should not widen an indvar if arithmetics on the wider
indvar are more expensive than those on the narrower indvar. For
instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is
twice as expensive as that on i32, because the hardware needs to
simulate a 64-bit integer using two 32-bit integers.

Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo.

Fixes PR21148.

Test Plan:
Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics
on the wider type are more expensive.

Reviewers: jholewinski, eliben, meheff, atrick

Reviewed By: atrick

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6196

llvm-svn: 221772
2014-11-12 06:58:45 +00:00
Bill Schmidt 729547847f [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.

New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td.  These are patterned after the similar
intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.

At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp.  This will open up more
optimization opportunities while still allowing the correct
instructions to be generated.  (Similar code exists for aligned
Altivec loads and stores.)

The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().

There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number.  It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.

I ended up having to modify vsx-fma-m.ll.  It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it.  The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands.  Commutativity allows
either to be chosen.  If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result.  Hopefully this doesn't change often.

There is a companion patch for Clang.

llvm-svn: 221767
2014-11-12 04:19:40 +00:00
Chad Rosier f53f07046b [Reassociate] Canonicalize negative constants out of expressions.
Add support for FDiv, which was regressed by the previous commit.

llvm-svn: 221738
2014-11-11 23:36:42 +00:00
Philip Reames 66c6de61ee Canonicalize an assume(load != null) into !nonnull metadata
We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the second in to the former. This avoids a need to implement optimizations using both forms.

We should probably extend this basic idea to metadata of other forms; in particular, range metadata. We view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5951

llvm-svn: 221737
2014-11-11 23:33:19 +00:00
Chad Rosier 094ac7735b [Reassociate] Canonicalize negative constants out of expressions.
This is a reapplication of r221171, but we only perform the transformation
on expressions which include a multiplication.  We do not transform rem/div
operations as this doesn't appear to be safe in all cases.

llvm-svn: 221721
2014-11-11 22:58:35 +00:00
Suyog Sarda beb064bd94 Addition to r216371 (SLP and Loop Vectorization) and r218607 where
cost model for signed division by power of 2 was improved for AArch64.
The revision r218607 missed test case for Loop Vectorization.
Adding it in this revision.

Differential Revision: http://reviews.llvm.org/D6181

llvm-svn: 221674
2014-11-11 07:39:27 +00:00
Juergen Ributzka d441725d3d [SwitchLowering] Fix the "fixPhis" function.
Switch statements may have more than one incoming edge into the same BB if they
all have the same value. When the switch statement is converted these incoming
edges are now coming from multiple BBs. Updating all incoming values to be from
a single BB is incorrect and would generate invalid LLVM IR.

The fix is to only update the first occurrence of an incoming value. Switch
lowering will perform subsequent calls to this helper function for each incoming
edge with a new basic block - updating all edges in the process.

This fixes rdar://problem/18916275.

llvm-svn: 221627
2014-11-10 21:05:27 +00:00
Chad Rosier b3eb452e83 [Reassociate] Better preserve NSW/NUW flags.
Part of PR12985.

Phabricator Revision: http://reviews.llvm.org/D6172

llvm-svn: 221555
2014-11-07 22:12:57 +00:00
David Majnemer 2098b86f64 SCCP: overdefined calls cannot become constant
We would attempt to fold away a call instruction which had been marked
overdefined.  However, it's not valid to transition to constant from
overdefined.

This fixes PR21512.

llvm-svn: 221513
2014-11-07 08:54:19 +00:00
David Majnemer bf93e7c7d3 LoopVectorize: Don't assume pointees are sized
A pointer's pointee might not be sized: the pointee could be a function.

Report this as IK_NoInduction when calculating isInductionVariable.

This fixes PR21508.

llvm-svn: 221501
2014-11-07 00:31:14 +00:00
David Majnemer c1eca5ad7c InstCombine: Rely on cmpxchg's return code when it's strong
Comparing the result of a cmpxchg instruction can be replaced with an
extractvalue of the cmpxchg success indicator.

llvm-svn: 221498
2014-11-06 23:23:30 +00:00
Chad Rosier ac6a2f532c [Reassociate] Don't reassociate when mixing regular and fast-math FP
instructions.  Inlining might cause such cases and it's not valid to
reassociate floating-point instructions without the unsafe algebra flag.

Patch by Mehdi Amini <mehdi_amini@apple.com>!

llvm-svn: 221462
2014-11-06 16:46:37 +00:00
Justin Bogner 58e41344f9 GCOV: Make sure that function idents in the .gcda and .gcno match
When generating gcov compatible profiling, we sometimes skip emitting
data for functions for one reason or another. However, this was
emitting different function IDs in the .gcno and .gcda files, because
the .gcno case was using the loop index before skipping functions and
the .gcda the array index after. This resulted in completely invalid
gcov data.

This fixes the problem by making the .gcno loop track the ID
separately from the loop index.

llvm-svn: 221441
2014-11-06 06:55:02 +00:00
NAKAMURA Takumi 61b2f48453 llvm/test/Transforms/GCOVProfiling: Avoid to parse backslashes in MDString. Use %/T instead of %T.
LLVM Parser decodes "\bb" as hex in "C:\bb-win7\buildername\build...", with MDString.

See also, http://llvm.org/docs/LangRef.html#metadata-nodes-and-metadata-strings

This reverts r221270, "Disable 3 tests in llvm/test/Transforms/GCOVProfiling/ for now. Investigating."

FIXME: Please check EC in GCOVProfiler::emitProfileNotes().
llvm-svn: 221334
2014-11-05 06:29:05 +00:00
David Majnemer bf7550e7ec InstSimplify: Exact shifts of X by Y are X if X has the lsb set
Exact shifts may not shift out any non-zero bits. Use computeKnownBits
to determine when this occurs and just return the left hand side.

This fixes PR21477.

llvm-svn: 221325
2014-11-05 00:59:59 +00:00
David Majnemer f20d7c4c61 Analysis: Make isSafeToSpeculativelyExecute fire less for divides
Divides and remainder operations do not behave like other operations
when they are given poison: they turn into undefined behavior.

It's really hard to know if the operands going into a div are or are not
poison.  Because of this, we should only choose to speculate if there
are constant operands which we can easily reason about.

This fixes PR21412.

llvm-svn: 221318
2014-11-04 23:49:08 +00:00
Reid Kleckner 941e93e9a8 Revert "[Reassociate] Canonicalize negative constants out of expressions."
This reverts commit r221171.

It performs this invalid transformation:
-  %div.i = urem i64 -1, %add
-  %sub.i = sub i64 -2, %div.i
+  %div.i = urem i64 1, %add
+  %sub.i1 = add i64 %div.i, -2

llvm-svn: 221317
2014-11-04 23:42:45 +00:00
NAKAMURA Takumi 14e3fc6ad4 Disable 3 tests in llvm/test/Transforms/GCOVProfiling/ for now. Investigating.
llvm-svn: 221270
2014-11-04 14:41:53 +00:00
NAKAMURA Takumi 06ac98299f Remove "REQUIRES:shell" from tests. They work for me.
llvm-svn: 221269
2014-11-04 13:41:33 +00:00
NAKAMURA Takumi 217ee5bf92 llvm/test/Transforms/GCOVProfiling/linezero.ll: Use %/T instead of %T in regex. This works on win32.
llvm-svn: 221262
2014-11-04 13:00:48 +00:00
David Majnemer d28edfea03 Minimize test case further
No functional change intended.

llvm-svn: 221237
2014-11-04 05:17:58 +00:00
Reid Kleckner dd3f3edafa Revert "Transforms: reapply SVN r219899"
This reverts commit r220811 and r220839. It made an incorrect change to
musttail handling.

llvm-svn: 221226
2014-11-04 02:02:14 +00:00
Hal Finkel 840257a49c Use AA in LoadCombine
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.

This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.

llvm-svn: 221203
2014-11-03 23:19:16 +00:00
David Majnemer 7e2b9882b1 InstCombine: Remove infinite loop caused by FoldOpIntoPhi
FoldOpIntoPhi could create an infinite loop if the PHI could potentially
reach a BB it was considering inserting instructions into.  The
instructions it would insert would eventually lead to other combines
firing which would, again, lead to FoldOpIntoPhi firing.

The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to
insert instructions that the PHI might reach.

This fixes PR21377.

llvm-svn: 221187
2014-11-03 21:55:12 +00:00
Hal Finkel 1e16fa302e EarlyCSE should ignore calls to @llvm.assume
EarlyCSE uses a simple generation scheme for handling memory-based
dependencies, and calls to @llvm.assume (which are marked as writing to memory
to ensure the preservation of control dependencies) disturb that scheme
unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative
(adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that).

Fixes PR21448.

llvm-svn: 221175
2014-11-03 20:21:32 +00:00
Chad Rosier 005505b027 [Reassociate] Canonicalize negative constants out of expressions.
This gives CSE/GVN more options to eliminate duplicate expressions.
This is a follow up patch to http://reviews.llvm.org/D4904.

http://reviews.llvm.org/D5363

llvm-svn: 221171
2014-11-03 19:11:30 +00:00
Paul Robinson ad06e430ce Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
David Majnemer 72a643dc8f InstCombine: Combine (X | Y) - X to (~X & Y)
This implements the transformation from (X | Y) - X to (~X & Y).

Differential Revision: http://reviews.llvm.org/D5791

llvm-svn: 221129
2014-11-03 05:53:55 +00:00
Elena Demikhovsky 27152aea88 Use Alias Analysis to hoist 2 loads from diamond to the common predecessor basic block.
Alias Analysis allows to detect real barriers for load hoisting.

Review in http://reviews.llvm.org/D5991

llvm-svn: 221091
2014-11-02 08:03:05 +00:00
David Majnemer 634ca236dc InstCombine: Don't assume that m_ZExt matches an Instruction
m_ZExt might bind against a ConstantExpr instead of an Instruction.
Assuming this, using cast<Instruction>, results in InstCombine crashing.

Instead, introduce ZExtOperator to bridge both Instruction and
ConstantExpr ZExts.

This fixes PR21445.

llvm-svn: 221069
2014-11-01 23:46:05 +00:00
David Majnemer 549f4f2510 InstCombine: Combine (X+cst) < 0 --> X < -cst
This can happen pretty often in code that looks like:
int foo = bar - 1;
if (foo < 0)
  do stuff

In this case, bar < 1 is an equivalent condition.

This transform requires that the add instruction be annotated with nsw.

llvm-svn: 221045
2014-11-01 09:09:51 +00:00
Michael Zolotukhin 9b9624de0c Correctly update dom-tree after loop vectorizer.
llvm-svn: 221009
2014-10-31 22:28:03 +00:00
Bradley Smith 9992b167ae [SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags
In a case where we have a no {un,}signed wrap flag on the increment, if
RHS - Start is constant then we can avoid inserting a max operation bewteen
the two, since we can statically determine which is greater.

This allows us to unroll loops such as:

 void testcase3(int v) {
   for (int i=v; i<=v+1; ++i)
     f(i);
 }

llvm-svn: 220960
2014-10-31 11:40:32 +00:00
NAKAMURA Takumi d9913e6d35 llvm/test/Transforms/SampleProfile/syntax.ll: Relax MISSING-FILE not to
check locale-aware message catalog.

llvm-svn: 220934
2014-10-30 22:28:46 +00:00
Philip Reames 4cb4d3e048 Add handling for range metadata in ValueTracking isKnownNonZero
If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes.  This helps to remove redundant checks and canonicalize checks for other optimization passes.  This particular patch checks whether a value is known to be non-zero from the range metadata.

Currently, these tests are against InstCombine.  In theory, all of these should be InstSimplify since we're not inserting any new instructions.  Moving the code may follow in a separate change.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5947

llvm-svn: 220925
2014-10-30 20:25:19 +00:00
Diego Novillo c572e92c76 Add profile writing capabilities for sampling profiles.
Summary:
This patch finishes up support for handling sampling profiles in both
text and binary formats. The new binary format uses uleb128 encoding to
represent numeric values. This makes profiles files about 25% smaller.

The profile writer class can write profiles in the existing text and the
new binary format. In subsequent patches, I will add the capability to
read (and perhaps write) profiles in the gcov format used by GCC.

Additionally, I will be adding support in llvm-profdata to manipulate
sampling profiles.

There was a bit of refactoring needed to separate some code that was in
the reader files, but is actually common to both the reader and writer.

The new test checks that reading the same profile encoded as text or
raw, produces the same results.

Reviewers: bogner, dexonsmith

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6000

llvm-svn: 220915
2014-10-30 18:00:06 +00:00
NAKAMURA Takumi 27b6d47f36 llvm/test/Transforms/LoopRotate/nosimplifylatch.ll: Fix possibly mis-repeatedly-pasted test.
llvm-svn: 220880
2014-10-29 23:05:12 +00:00
Yi Jiang 323d57336c Test Case for r220872:Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance
llvm-svn: 220873
2014-10-29 20:20:33 +00:00
Yi Jiang ab19fff4d8 Do not simplifyLatch for loops where hoisting increments couldresult in extra live range interferance
llvm-svn: 220872
2014-10-29 20:19:47 +00:00
Saleem Abdulrasool 56ade2991b test: tweak inlined-allocs test
Remove pointless checks for storage of uninteresting values.  Ensure that we
perform basic alias analysis to make the test more correct.  Finally, apply a
stylistic change to the test.

llvm-svn: 220839
2014-10-29 06:31:11 +00:00
Saleem Abdulrasool d178ada55e Transforms: reapply SVN r219899
This restores the commit from SVN r219899 with an additional change to ensure
that the CodeGen is correct for the case that was identified as being incorrect
(originally PR7272).

In the case that during inlining we need to synthesize a value on the stack
(i.e. for passing a value byval), then any function involving that alloca must
be stripped of its tailness as the restriction that it does not access the
parent's stack no longer holds.  Unfortunately, a single alloca can cause a
rippling effect through out the inlining as the value may be aliased or may be
mutated through an escaped external call.  As such, we simply track if an alloca
has been introduced in the frame during inlining, and strip any tail calls.

llvm-svn: 220811
2014-10-28 18:27:37 +00:00
David Majnemer c8bdd23acf InstCombine: Fix a combine assuming that icmp operands were integers
An icmp may have pointer arguments, it isn't limited to integers or
vectors of integers.

This fixes PR21388.

llvm-svn: 220664
2014-10-27 05:47:49 +00:00
Jingyue Wu fe72fcebf6 [SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo
The dividend in "signed % unsigned" is treated as unsigned instead of signed,
causing unexpected behavior such as -64 % (uint64_t)24 == 0.

Added a regression test in split-gep.ll

Patched by Hao Liu.

llvm-svn: 220618
2014-10-25 18:34:03 +00:00
Jingyue Wu b723152379 [SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions
The two operands of the new OR expression should be NextInChain and TheOther
instead of the two original operands.

Added a regression test in split-gep.ll.

Hao Liu reported this bug, and provded the test case and an initial patch.
Thanks! 

llvm-svn: 220615
2014-10-25 17:36:21 +00:00
Sanjay Patel 848309da7c Handle sqrt() shrinking in SimplifyLibCalls like any other call
This patch removes a chunk of special case logic for folding 
(float)sqrt((double)x) -> sqrtf(x)
in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls.

No functional change intended, but I loosened the restriction on the existing
sqrt testcases to allow for this optimization even without unsafe-fp-math because
that's the existing behavior.

I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic
in case the result is used as a double.

Differential Revision: http://reviews.llvm.org/D5919

llvm-svn: 220514
2014-10-23 21:52:45 +00:00
Justin Bogner 72d1f2b61b test: Make this test runnable in directories with @ in their names
Jenkins likes to use directories with names involving the '@'
character, which breaks the sed expression in this test. Switch to use
'|' on the assumption that it's less likely to show up in a path.

llvm-svn: 220401
2014-10-22 18:18:54 +00:00
Sanjay Patel a92fa44740 Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850)
When a call to a double-precision libm function has fast-math semantics 
(via function attribute for now because there is no IR-level FMF on calls), 
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.

We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.

I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.

Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.

This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850

Differential Revision: http://reviews.llvm.org/D5893

llvm-svn: 220390
2014-10-22 15:29:23 +00:00
Diego Novillo a67c0b43e1 Change error to warning when a profile cannot be found.
When the profile for a function cannot be applied, we use to emit an
error. This seems extreme. The compiler can continue, it's just that the
optimization opportunities won't include profile information.

llvm-svn: 220386
2014-10-22 13:36:35 +00:00
Diego Novillo 8027b80b41 Support using sample profiles with partial debug info.
Summary:
When using a profile, we used to require the use -gmlt so that we could
get access to the line locations. This is used to match line numbers in
the input profile to the line numbers in the function's IR.

But this is actually not necessary. The driver can provide source
location tracking without the emission of debug information. In these
cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the
actual line location annotations are still present.

This patch adds a new way of looking for the start of the current
function. Instead of looking through the compile units in llvm.dbg.cu,
we can walk up the scope for the first instruction in the function with
a debug loc. If that describes the function, we use it. Otherwise, we
keep looking until we find one.

If no such instruction is found, we then give up and produce an error.

Reviewers: echristo, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5887

llvm-svn: 220382
2014-10-22 12:59:00 +00:00
Bruno Cardoso Lopes c29520c5b3 [InstSimplify] Support constant folding to vector of pointers
ConstantFolding crashes when trying to InstSimplify the following load:

@a = private unnamed_addr constant %mst {
     i8* inttoptr (i64 -1 to i8*),
     i8* inttoptr (i64 -1 to i8*)
}, align 8

%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8

This patch fix this by adding support to this type of folding:

%x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8
==> gets folded to:
  %x = <2 x i8*> <i8* inttoptr (i64 -1 to i8*), i8* inttoptr (i64 -1 to i8*)>

llvm-svn: 220380
2014-10-22 12:18:48 +00:00
Hans Wennborg 0b39fc0d16 Revert "Teach the load analysis to allow finding available values which require" (r220277)
This seems to have caused PR21330.

llvm-svn: 220349
2014-10-21 23:49:52 +00:00
Matt Arsenault d6511b49ac Add minnum / maxnum intrinsics
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.

llvm-svn: 220341
2014-10-21 23:00:20 +00:00
David Majnemer d205602a0b InstCombine: Simplify FoldICmpCstShrCst
This function was complicated by the fact that it tried to perform
canonicalizations that were already preformed by InstSimplify.  Remove
this extra code and move the tests over to InstSimplify.  Add asserts to
make sure our preconditions hold before we make any assumptions.

llvm-svn: 220314
2014-10-21 19:51:55 +00:00
Chandler Carruth aa72a6dd3b Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 220277
2014-10-21 09:00:40 +00:00
Philip Reames cdb72f369f Introduce a 'nonnull' metadata on Load instructions.
The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns.  Long term, it would be nice to combine these into a single construct.   The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull.

Reviewed by: Hal Finkel
Differential Revision:  http://reviews.llvm.org/D5220

llvm-svn: 220240
2014-10-20 22:40:55 +00:00
Chandler Carruth a32038b006 Fix a miscompile introduced in r220178.
The original code had an implicit assumption that if the test for
allocas or globals was reached, the two pointers were not equal. With my
changes to make the pointer analysis more powerful here, I also had to
guard against circumstances where the results weren't useful. That in
turn violated the assumption and gave rise to a circumstance in which we
could have a store with both the queried pointer and stored pointer
rooted at *the same* alloca. Clearly, we cannot ignore such a store.
There are other things we might do in this code to better handle the
case of both pointers ending up at the same alloca or global, but it
seems best to at least make the test explicit in what it intends to
check.

I've added tests for both the alloca and global case here.

llvm-svn: 220190
2014-10-20 10:03:01 +00:00
Chandler Carruth 6665d62117 Fix a somewhat subtle pair of issues with JumpThreading I introduced in
r220178. First, the creation routine doesn't insert prior to the
terminator of the basic block provided, but really at the end of the
basic block. Instead, get the terminator and insert before that. The
next issue was that we need to ensure multiple PHI node entries for
a single predecessor re-use the same cast instruction rather than
creating new ones.

All of the logic here was without tests previously. I've reduced and
added a test case from the test suite that crashed without both of these
fixes.

llvm-svn: 220186
2014-10-20 05:34:36 +00:00
Chandler Carruth eeec35ae1c Teach the load analysis driving core instcombine logic and other bits of
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.

I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.

This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.

llvm-svn: 220178
2014-10-20 00:24:14 +00:00
Chandler Carruth b5f4c32830 Add a datalayout string to this test so that it exercises the full gamut
of InstCombine rather than just the bits enabled when datalayout is
optional.

The primary fixes here are because now things are little endian.

In good news, silliness like this seems like it will be going away as
we've got pretty stong consensus on dropping optional datalayout
entirely.

llvm-svn: 220176
2014-10-20 00:11:31 +00:00
Chandler Carruth bc6378defb Do a better and more complete job of preserving metadata when combining
loads.

This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.

I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[

llvm-svn: 220165
2014-10-19 10:46:46 +00:00
Chandler Carruth 5b8cd2f73c Move previously dead code to handle computing the known bits of an alias
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.

This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.

llvm-svn: 220164
2014-10-19 09:06:56 +00:00
David Majnemer 312c3e5f39 InstCombine: (sub (or A B) (xor A B)) --> (and A B)
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).

Patch by Ankur Garg!

Differential Revision: http://reviews.llvm.org/D5719

llvm-svn: 220163
2014-10-19 08:32:32 +00:00
David Majnemer 59939acd26 InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1

Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))

This handles only the equality operators for now. Other operators need
to be handled.

Patch by Ankur Garg!

llvm-svn: 220162
2014-10-19 08:23:08 +00:00
Chandler Carruth a801dd5799 Fix a long-standing miscompile in the load analysis that was uncovered
by my refactoring of this code.

The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.

This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.

llvm-svn: 220161
2014-10-19 08:17:50 +00:00
Chandler Carruth be9dccd64d Preserve AA metadata when combining (cast (load (...))) -> (load (cast
(...))).

llvm-svn: 220141
2014-10-18 11:00:12 +00:00
Chandler Carruth 2f75fcfef3 [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load
...)) and (load (cast ...)): canonicalize toward the former.

Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.

Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.

There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.

I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.

The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.

llvm-svn: 220138
2014-10-18 06:36:22 +00:00
Chandler Carruth 71009cad95 Remove a test that was ported from the old llvm-gcc frontend test suite.
This test is pretty awesome. It is claiming to test devirtualization.
However, the code in question is not in fact devirtualized by LLVM. If
you take the original C++ test case and run it through Clang at -O3 we
fail to devirtualize it completely. It also isn't a sufficiently focused
test case.

The *reason* we fail to devirtualize it isn't because of any missing
instcombine though. Instead, it is because we fail to emit an available
externally vtable and thus the vtable is just an external and completely
opaque. If I cause the vtable to be emitted, we successfully
devirtualize things.

Anyways, I'm just removing it because it is providing negative value at
this point: it isn't representative of the output of Clang really, LLVM
isn't doing the transform it claims to be testing, LLVM's failure to do
the transform isn't actually an LLVM bug at all and we shouldn't be
testing for it here, and finally the test is written in such a way that
it will trivially pass even when the point of the test is failing.

llvm-svn: 220137
2014-10-18 06:36:18 +00:00
Chandler Carruth 2dc9682e59 [SROA] Change how SROA does vector-based promotion of allocas to handle
cases where the alloca type, the load types, and the store types used
all disagree.

Previously, the only way that vector-based promotion occured was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.

The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.

When looking and loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.

With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.

llvm-svn: 220116
2014-10-18 00:44:02 +00:00
Rafael Espindola 7da1ea83a9 Revert "TRE: make TRE a bit more aggressive"
This reverts commit r219899.

This also updates byval-tail-call.ll to make it clear what was breaking.
Adding r219899 again will cause the load/store to disappear.

llvm-svn: 220093
2014-10-17 21:25:48 +00:00
Hal Finkel dd38c0b876 [DSE] Remove no-data-layout-only type-based overlap checking
DSE's overlap checking contained special logic, used only when no DataLayout
was available, which inferred a complete overwrite when the pointee types were
equal. This logic seems fine for regular loads/stores, but does not work for
memcpy and friends. Instead of fixing this, I'm just removing it.
Philosophically, transformations should not contain enhanced behavior used only
when data layout is lacking (data layout should be strictly additive), and
maintaining these rarely-tested code paths seems not worthwhile at this stage.

Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case
(slightly reduced from that provided by Aliaksei) replaces the original
contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few
other tests have been updated to have a data layout.

llvm-svn: 220035
2014-10-17 11:56:00 +00:00
Rafael Espindola 11aaaeebe0 Delete -std-compile-opts.
These days -std-compile-opts was just a silly alias for -O3.

llvm-svn: 219951
2014-10-16 20:00:02 +00:00
Bjorn Steinbrink d20816fde9 Allow call-slop optzn for destinations with a suitable dereferenceable attribute
Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5832

llvm-svn: 219950
2014-10-16 19:43:08 +00:00
Sanjay Patel c699a6117b fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944
2014-10-16 18:48:17 +00:00
Akira Hatanaka 5c221ef98f Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.

llvm-svn: 219902
2014-10-16 06:00:46 +00:00
Saleem Abdulrasool 7f52921976 TRE: make TRE a bit more aggressive
Make tail recursion elimination a bit more aggressive.  This allows us to get
tail recursion on functions that are just branches to a different function.  The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.

llvm-svn: 219899
2014-10-16 03:27:30 +00:00
Akira Hatanaka 40c2cf4afc Revert r219832.
llvm-svn: 219884
2014-10-16 01:17:02 +00:00
Sanjoy Das 360b1ed5f2 Revert "r219834 - Teach ScalarEvolution to sharpen range information"
This change breaks the asan buildbots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468

llvm-svn: 219878
2014-10-15 23:46:04 +00:00
Hal Finkel 68dc3c7ab2 Preserve non-byval pointer alignment attributes using @llvm.assume when inlining
For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.

llvm-svn: 219876
2014-10-15 23:44:41 +00:00
Sanjoy Das 90c2f1455a Teach ScalarEvolution to sharpen range information.
If x is known to have the range [a, b) in a loop predicated by (icmp
ne x, a), its range can be sharpened to [a + 1, b).  Get
ScalarEvolution and hence IndVars to exploit this fact.
    
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.

phabricator: http://reviews.llvm.org/D5639
llvm-svn: 219834
2014-10-15 19:25:28 +00:00
Akira Hatanaka 5bb9346a45 InstCombine: Narrow switch instructions using known bits.
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.

rdar://problem/17720004

llvm-svn: 219832
2014-10-15 19:05:50 +00:00
Hal Finkel 3b7fc86677 [SLPVectorize] Basic ephemeral-value awareness
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).

Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).

llvm-svn: 219816
2014-10-15 17:35:01 +00:00
Hal Finkel 1a600faba0 [LoopVectorize] Ignore @llvm.assume for cost estimates and legality
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).

Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).

llvm-svn: 219741
2014-10-14 22:59:49 +00:00
Sanjay Patel 0ca42bb5a8 Optimize away fabs() calls when input is squared (known positive).
Eliminate library calls and intrinsic calls to fabs when the input 
is a squared value.

Note that no unsafe-math / fast-math assumptions are needed for
this optimization.

Differential Revision: http://reviews.llvm.org/D5777

llvm-svn: 219717
2014-10-14 20:43:11 +00:00
David Majnemer dad2103801 InstCombine: Don't miscompile X % ((Pow2 << A) >>u B)
We assumed that A must be greater than B because the right hand side of
a remainder operator must be nonzero.

However, it is possible for A to be less than B if Pow2 is a power of
two greater than 1.

Take for example:
i32 %A = 0
i32 %B = 31
i32 Pow2 = 2147483648

((Pow2 << 0) >>u 31) is non-zero but A is less than B.

This fixes PR21274.

llvm-svn: 219713
2014-10-14 20:28:40 +00:00
Hal Finkel 171c2ec008 Revert "r216914 - Revert: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'"
Reapply r216913, a fix for PR20832 by Andrea Di Biagio. The commit was reverted
because of buildbot failures, and credit goes to Ulrich Weigand for isolating
the underlying issue (which can be confirmed by Valgrind, which does helpfully
light up like the fourth of July). Uli explained the problem with the original
patch as:

  It seems the problem is calling multiplySignificand with an addend of category
  fcZero; that is not expected by this routine.  Note that for fcZero, the
  significand parts are simply uninitialized, but the code in (or rather, called
  from) multiplySignificand will unconditionally access them -- in effect using
  uninitialized contents.

This version avoids using a category == fcZero addend within
multiplySignificand, which avoids this problem (the Valgrind output is also now
clean).

Original commit message:

[APFloat] Fixed a bug in method 'fusedMultiplyAdd'.

When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.

Example:
  define double @test() {
    %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
    ret double %1
  }

Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).

Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.

This fixes PR20832.

llvm-svn: 219708
2014-10-14 19:23:07 +00:00
Hal Finkel a3f23e3725 [LVI] Check for @llvm.assume dominating the edge branch
When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value
constraints, we should check for intrinsics that dominate the edge's branch,
not just any potential context instructions. An assumption that dominates the
edge's branch represents a truth on that edge. This is specifically useful, for
example, if multiple predecessors assume a pointer to be nonnull, allowing us
to simplify a later null comparison.

The test case, and an initial patch, were provided by Philip Reames. Thanks!

llvm-svn: 219688
2014-10-14 16:04:49 +00:00
Marcello Maggioni 5bbe3df63f Switch to select optimization for two-case switches
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.

llvm-svn: 219656
2014-10-14 01:58:26 +00:00
David Majnemer db0773089f InstCombine: Fix miscompile in X % -Y -> X % Y transform
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number.  This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.

This fixes PR21256.

llvm-svn: 219639
2014-10-13 22:37:51 +00:00
David Majnemer a252138942 InstCombine: Don't miscompile (x lshr C1) udiv C2
We have a transform that changes:
  (x lshr C1) udiv C2
into:
  x udiv (C2 << C1)

However, it is unsafe to do so if C2 << C1 discards any of C2's bits.

This fixes PR21255.

llvm-svn: 219634
2014-10-13 21:48:30 +00:00
Joerg Sonnenberger 5ca10d0edb Revert r219223, it creates invalid PHI nodes.
llvm-svn: 219587
2014-10-12 17:16:04 +00:00
Benjamin Kramer 240b85eec5 InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1)
llvm-svn: 219585
2014-10-12 14:02:34 +00:00
David Majnemer fe7fccff11 InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X
Consider the case where X is 2.  (2 <<s 31)/s-2147483648 is zero but we
would fold to X.  Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.

This fixes PR21245.

llvm-svn: 219568
2014-10-11 10:20:04 +00:00
David Majnemer cb9d596655 InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow
consider:
C1 = INT_MIN
C2 = -1

C1 * C2 overflows without a doubt but consider the following:
%x = i32 INT_MIN

This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.

N. B.  Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.

This fixes PR21243.

llvm-svn: 219567
2014-10-11 10:20:01 +00:00
David Majnemer 3cac85e071 InstCombine: mul to shl shouldn't preserve nsw
consider:
mul i32 nsw %x, -2147483648

this instruction will not result in poison if %x is 1

however, if we transform this into:
shl i32 nsw %x, 31

then we will be generating poison because we just shifted into the sign
bit.

This fixes PR21242.

llvm-svn: 219566
2014-10-11 10:19:52 +00:00
Sanjay Patel ad8b666624 Return undef on FP <-> Int conversions that overflow (PR21330).
The LLVM Lang Ref states for signed/unsigned int to float conversions:
"If the value cannot fit in the floating point value, the results are undefined."

And for FP to signed/unsigned int:
"If the value cannot fit in ty2, the results are undefined."

This matches the C definitions.

The existing behavior pins to infinity or a max int value, but that may just
lead to more confusion as seen in:
http://llvm.org/bugs/show_bug.cgi?id=21130

Returning undef will hopefully lead to a less silent failure.

Differential Revision: http://reviews.llvm.org/D5603

llvm-svn: 219542
2014-10-10 23:00:21 +00:00
Sanjoy Das 1f05c51e5e This patch teaches ScalarEvolution to pick and use !range metadata.
It also makes it more aggressive in querying range information by
adding a call to isKnownPredicateWithRanges to
isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond.

phabricator: http://reviews.llvm.org/D5638

Reviewed by: atrick, hfinkel

llvm-svn: 219532
2014-10-10 21:22:34 +00:00
Mark Heffernan 2beab5f0b4 This patch de-pessimizes the calculation of loop trip counts in
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.

ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.

llvm-svn: 219517
2014-10-10 17:39:11 +00:00
Arnold Schwaighofer d7d010eb2a SimplifyCFG: Don't convert phis into selects if we could remove undef behavior
instead

We used to transform this:

  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2

  bb1:
    br label %bb2

  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

into this:

  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).

The existing transformation that removes unreachable control flow in simplifycfg
is:

  /// If BB has an incoming value that will always trigger undefined behavior
  /// (eg. null pointer dereference), remove the branch leading here.
  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462
2014-10-10 01:27:02 +00:00
Chad Rosier bd64d46188 [Reassociate] Don't canonicalize X - undef to X + (-undef).
Phabricator Revision: http://reviews.llvm.org/D5674
PR21205

llvm-svn: 219434
2014-10-09 20:06:29 +00:00
Andrea Di Biagio 458a669f49 [InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values.
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.

This fixes PR21222.

Differential Revision: http://reviews.llvm.org/D5700

llvm-svn: 219406
2014-10-09 12:41:49 +00:00
David Majnemer ac07703842 Inliner: Non-local functions in COMDATs shouldn't be dropped
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well.  This sort of thing is already handled by GlobalOpt/GlobalDCE.

This fixes PR21206.

llvm-svn: 219335
2014-10-08 19:32:32 +00:00
Justin Bogner 894eff7a9f Revert "[InstCombine] re-commit r218721 with fix for pr21199"
This seems to cause a miscompile when building clang, which causes a
bootstrapped clang to fail or crash in several of its tests.

See:
  http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184
  http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813

This reverts commit r219282.

llvm-svn: 219317
2014-10-08 16:30:22 +00:00
David Majnemer 1b3b70e371 GlobalOpt: Don't drop unused memberes of a Comdat
A linkonce_odr member of a COMDAT shouldn't be dropped if we need to
keep the entire COMDAT group.

This fixes PR21191.

llvm-svn: 219283
2014-10-08 07:23:31 +00:00
Gerolf Hoflehner e2ff5b9223 [InstCombine] re-commit r218721 with fix for pr21199
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explictly. This commit also includes the test case for pr21199.

llvm-svn: 219282
2014-10-08 06:42:19 +00:00
Hans Wennborg 1256198bbc Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization
This seems to have caused PR21199.

llvm-svn: 219264
2014-10-08 01:05:57 +00:00
Duncan P. N. Exon Smith c46cfcbbc6 LoopUnroll: Create sub-loops in LoopInfo
`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so.  In
particular, tell `LoopInfo` about copies of inner loops when unrolling
the outer loop.

Conservatively, also tell `ScalarEvolution` to forget about the original
versions of these loops, since their inputs may have changed.

Fixes PR20987.

llvm-svn: 219241
2014-10-07 21:19:00 +00:00
Marcello Maggioni 963bc87dbd Two case switch to select optimization
This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending if the default block is reachable or not).

The typical case this optimization wants to be able to optimize is this one:

Example:
switch (a) {
  case 10:                %0 = icmp eq i32 %a, 10
    return 10;            %1 = select i1 %0, i32 10, i32 4
  case 20:        ---->   %2 = icmp eq i32 %a, 20
    return 2;             %3 = select i1 %2, i32 2, i32 %1
  default:
    return 4;
}

It also sets the base for further optimizations that are planned and being reviewed.

llvm-svn: 219223
2014-10-07 18:16:44 +00:00
David Blaikie 17364d4e05 DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are updated even when DAE removes both varargs and non-varargs arguments on the same function.
After some stellar (& inspired) help from Reid Kleckner providing a test
case for some rather unstable undefined behavior showing up as
assertions produced by r214761, I was able to fix this issue in DAE
involving the application of both varargs removal, followed by normal
argument removal.

Indeed I introduced this same bug into ArgumentPromotion (r212128) by
copying the code from DAE, and when I fixed the bug in ArgPromo
(r213805) and commented in that patch that I didn't need to address the
same issue in DAE because it was a single pass. Turns out it's two pass,
one for the varargs and one for the normal arguments, so the same fix is
needed (at least during varargs removal). So here it is.

(the observable/net effect of this bug, even when it didn't result in
assertion failure, is that debug info would describe the DAE'd function
in the abstract, but wouldn't provide high/low_pc, variable locations,
line table, etc (it would appear as though the function had been
entirely optimized away), see the original PR14016 for details of the
general problem)

I'm not recommitting the assertion just yet, as there's been another
regression of it since I last tried. It might just be a few test cases
weren't adequately updated after Adrian or Duncan's recent schema
changes.

llvm-svn: 219210
2014-10-07 15:10:23 +00:00
Suyog Sarda 181cc9a029 Remove Extra lines. NFC.
llvm-svn: 219201
2014-10-07 11:31:31 +00:00
David Majnemer e025321d36 GlobalDCE: Don't drop any COMDAT members
If we require a single member of a comdat, require all of the other
members as well.

This fixes PR20981.

llvm-svn: 219191
2014-10-07 07:07:19 +00:00
Gerolf Hoflehner c0b4c20e5e [InstCombine] re-commit r218721 icmp-select-icmp optimization
Takes care of the assert that caused build fails.
Rather than asserting the code checks now that the definition
and use are in the same block, and does not attempt
to optimize when that is not the case.

llvm-svn: 219175
2014-10-07 00:16:12 +00:00
Owen Anderson 8373d338f6 Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions.
Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical.

Patch by Dmitri Shtilman.

llvm-svn: 219092
2014-10-05 23:41:26 +00:00
Hal Finkel 04a156139e [InstCombine] Remove redundant @llvm.assume intrinsics
For any @llvm.assume intrinsic, if there is another which dominates it and uses
the same condition, then it is redundant and can be removed. While this does
not alter the semantics of the @llvm.assume intrinsics, it makes subsequent
handling more efficient (and the resulting IR easier to read).

llvm-svn: 219067
2014-10-04 21:27:06 +00:00
Richard Smith 1ed4229f6f PR21145: Teach LLVM about C++14 sized deallocation functions.
C++14 adds new builtin signatures for 'operator delete'. This change allows
new/delete pairs to be removed in C++14 onwards, as they were in C++11 and
before.

llvm-svn: 219014
2014-10-03 20:17:06 +00:00
Duncan P. N. Exon Smith 176b691d32 Revert "Revert "DI: Fold constant arguments into a single MDString""
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 219010
2014-10-03 20:01:09 +00:00
James Molloy cb7449d058 Revert r215343.
This was contentious and needs invesigation.

llvm-svn: 218971
2014-10-03 09:29:24 +00:00
Duncan P. N. Exon Smith 786cd049fc Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith 571f97bd90 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Sanjay Patel 13a657819b Remove unused function attribute params.
llvm-svn: 218909
2014-10-02 21:12:04 +00:00
Sanjay Patel 12d1ce5408 Optimize square root squared (PR21126).
When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.

This can happen in the real world when calculating x ** 3/2. This occurs
in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.

Differential Revision: http://reviews.llvm.org/D5584

llvm-svn: 218906
2014-10-02 21:10:54 +00:00
Zinovy Nis ccc3e3733b [BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions
My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]'
thus inverting index expression. This patch fixes it. 
Thanks to Jörg Sonnenberger for pointing.

Differential Revision: http://reviews.llvm.org/D5576

llvm-svn: 218867
2014-10-02 13:01:15 +00:00
Sanjay Patel 7b2cd9ad86 Make the sqrt intrinsic return undef for a negative input.
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html

And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html

The sqrt of a negative number when using the llvm intrinsic is undefined. 
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.

This change should not affect any code that isn't using "no-nans-fp-math"; 
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.

Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and 
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. 
We knowingly approve of this difference with the other compilers in an attempt to flag code 
that is invoking undefined behavior.

A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093

Differential Revision: http://reviews.llvm.org/D5527

llvm-svn: 218803
2014-10-01 20:36:33 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Evgeniy Stepanov 815f2869ad Revert r218721, r218735.
Failing bootstrap on Linux (arm, x86).

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470
http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518

llvm-svn: 218752
2014-10-01 10:07:28 +00:00
Gerolf Hoflehner 08cc4b950c [InstCombine] Optimize icmp-select-icmp
In special cases select instructions can be eliminated by
replacing them with a cheaper bitwise operation even when the
select result is used outside its home block. The instances implemented
are patterns like
    %x=icmp.eq
    %y=select %x,%r, null
    %z=icmp.eq|neq %y, null
    br %z,true, false
==> %x=icmp.ne
    %y=icmp.eq %r,null
    %z=or %x,%y
    br %z,true,false
The optimization is integrated into the instruction
combiner and performed only when all uses of the select result can
be replaced by the select operand proper. For this dominator information
is used and dominance is now a required analysis pass in the combiner.
The optimization itself is iterative. The critical step is to replace the
select result with the non-constant select operand. So the select becomes
local and the combiner iteratively works out simpler code pattern and
eventually eliminates the select.

rdar://17853760

llvm-svn: 218721
2014-10-01 00:13:22 +00:00
Jingyue Wu fc0296704c [SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711
2014-09-30 22:23:38 +00:00
Chad Rosier aab5d7bd33 [IndVarSimplify] Widen loop unsigned compares.
This patch extends r217953 to handle unsigned comparison.
Phabricator revision: http://reviews.llvm.org/D5526

llvm-svn: 218659
2014-09-30 03:17:42 +00:00
Chad Rosier 70d54ac848 [AArch64] Improve cost model to handle sdiv by a pow-of-two.
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.

http://reviews.llvm.org/D5469
PR20714

llvm-svn: 218607
2014-09-29 13:59:31 +00:00
Kevin Qin fc02e3c363 Use a loop to simplify the runtime unrolling prologue.
Runtime unrolling will create a prologue to execute the extra
iterations which is can't divided by the unroll factor. It
generates an if-then-else sequence to jump into a factor -1
times unrolled loop body, like

    extraiters = tripcount % loopfactor
    if (extraiters == 0) jump Loop:
    if (extraiters == loopfactor) jump L1
    if (extraiters == loopfactor-1) jump L2
    ...
    L1:  LoopBody;
    L2:  LoopBody;
    ...
    if tripcount < loopfactor jump End
    Loop:
    ...
    End:

It means if the unroll factor is 4, the loop body will be 7
times unrolled, 3 are in loop prologue, and 4 are in the loop.
This commit is to use a loop to execute the extra iterations
in prologue, like

        extraiters = tripcount % loopfactor
        if (extraiters == 0) jump Loop:
        else jump Prol
 Prol:  LoopBody;
        extraiters -= 1                 // Omitted if unroll factor is 2.
        if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
        if (tripcount < loopfactor) jump End
 Loop:
 ...
 End:

Then when unroll factor is 4, the loop body will be copied by
only 5 times, 1 in the prologue loop, 4 in the original loop.
And if the unroll factor is 2, new loop won't be created, just
as the original solution.

llvm-svn: 218604
2014-09-29 11:15:00 +00:00
Chad Rosier 7b974b73ae [IndVar] Don't widen loop compare unless IV user is sign extended.
PR21030

llvm-svn: 218539
2014-09-26 20:05:35 +00:00
David Peixotto 472b05b36c Ignore annotation function calls in cost computation
The annotation instructions are dropped during codegen and have no
impact on size.  In some cases, the annotations were preventing the
unroller from unrolling a loop because the annotation calls were
pushing the cost over the unrolling threshold.

Differential Revision: http://reviews.llvm.org/D5335

llvm-svn: 218525
2014-09-26 17:48:40 +00:00
David Peixotto 0d4d5e64ec Fix assertion in LICM doFinalization()
The doFinalization method checks that the LoopToAliasSetMap is
empty. LICM populates that map as it runs through the loop nest,
deleting the entries for child loops as it goes. However, if a child
loop is deleted by another pass (e.g. unrolling) then the loop will
never be deleted from the map because LICM walks the loop nest to
find entries it can delete.

The fix is to delete the loop from the map and free the alias set
when the loop is deleted from the loop nest.

Differential Revision: http://reviews.llvm.org/D5305

llvm-svn: 218387
2014-09-24 16:48:31 +00:00
Reid Kleckner 78927e884b GlobalOpt: Preserve comdats of unoptimized initializers
Rather than slurping in and splatting out the whole ctor list, preserve
the existing array entries without trying to understand them.  Only
remove the entries that we know we can optimize away.  This way we don't
need to wire through priority and comdats or anything else we might add.

Fixes a linker issue where the .init_array or .ctors entry would point
to discarded initialization code if the comdat group from the TU with
the faulty global_ctors entry was dropped.

llvm-svn: 218337
2014-09-23 22:33:01 +00:00
Chad Rosier 307b50b0f6 [IndVarSimplify] Partially revert r217953 to see if this fixes the bots.
Specifically, disable widening of unsigned compare instructions.

llvm-svn: 217962
2014-09-17 16:35:09 +00:00
Chad Rosier bb99f40530 [IndVarSimplify] Widen loop compare instructions.
This improves other optimizations such as LSR.  A sext may be added to the
compare's other operand, but this can often be hoisted outside of the loop.

llvm-svn: 217953
2014-09-17 14:10:33 +00:00
Andrea Di Biagio 5b92b4971a [InstCombine] Fix wrong folding of constant comparison involving ahsr and negative quantities (PR20945).
Example:
define i1 @foo(i32 %a) {
  %shr = ashr i32 -9, %a
  %cmp = icmp ne i32 %shr, -5
  ret i1 %cmp
}

Before this fix, the instruction combiner wrongly thought that %shr
could have never been equal to -5. Therefore, %cmp was always folded to 'true'.
However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore,
in this example, it is not valid to fold %cmp to 'true'.
The problem was only affecting the case where the comparison was between
negative quantities where one of the quantities was obtained from arithmetic
shift of a negative constant.

This patch fixes the problem with the wrong folding (fixes PR20945).
With this patch, the 'icmp' from the example is now simplified to a
comparison between %a and 1. This still allows us to get rid of the arithmetic
shift (%shr).

llvm-svn: 217950
2014-09-17 11:32:31 +00:00
David Majnemer b435a4214e InstSimplify: Don't allow (x srem y) urem y -> x srem y
Let's consider the case where:
%x i16 = 32768
%y i16 = 384

%x srem %y = 65408
(%x srem %y) urem %y = 128

llvm-svn: 217939
2014-09-17 04:16:35 +00:00
David Majnemer ac717f0972 InstSimplify: ((X % Y) % Y) -> (X % Y)
Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D5350

llvm-svn: 217937
2014-09-17 03:34:34 +00:00
Tilmann Scheller 40fc9595c8 [InstCombine] Remove redundant test case.
Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D5284

llvm-svn: 217865
2014-09-16 08:50:10 +00:00
David Majnemer a315bd80c2 InstSimplify: Simplify trivial and/or of icmps
Some ICmpInsts when anded/ored with another ICmpInst trivially reduces
to true or false depending on whether or not all integers or no integers
satisfy the intersected/unioned range.

This sort of trivial looking code can come about when InstCombine
performs a range reduction-type operation on sdiv and the like.

This fixes PR20916.

llvm-svn: 217750
2014-09-15 08:15:28 +00:00
Chad Rosier e668f61076 FileCheckize. NFC.
llvm-svn: 217698
2014-09-12 17:55:16 +00:00
Hal Finkel f83e1f7f66 [AlignmentFromAssumptions] Don't crash just because the target is 32-bit
We used to crash processing any relevant @llvm.assume on a 32-bit target
(because we'd ask SE to subtract expressions of differing types). I've copied
our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get
some meaningful test coverage here.

llvm-svn: 217574
2014-09-11 08:40:17 +00:00
Hal Finkel 71b7084112 [AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment
The routine that determines an alignment given some SCEV returns zero if the
answer is unknown. In a case where we could determine the increment of an
AddRec but not the starting alignment, we would compute the integer modulus by
zero (which is illegal and traps). Prevent this by returning early if either
the start or increment alignment is unknown (zero).

llvm-svn: 217544
2014-09-10 21:05:52 +00:00
Sanjay Patel b653de1ada Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066

llvm-svn: 217528
2014-09-10 17:58:16 +00:00
Bjorn Steinbrink 3c33150801 Add a test for hoisting instructions with metadata out of then/else blocks
Test for the bug fixed in r215723.

llvm-svn: 217453
2014-09-09 17:10:21 +00:00
Hal Finkel 93873cc10e Check for all known bits on ret in InstCombine
From a combination of @llvm.assume calls (and perhaps through other means, such
as range metadata), it is possible that all bits of a return value might be
known. Previously, InstCombine did not check for this (which is understandable
given assumptions of constant propagation), but means that we'd miss simple
cases where assumptions are involved.

llvm-svn: 217346
2014-09-07 21:28:34 +00:00
Hal Finkel 7e1844940e Make use of @llvm.assume from LazyValueInfo
This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with
the known-bits change (r217342), this requires feeding a "context" instruction
pointer through many functions. Aside from a little refactoring to reuse the
logic that turns predicates into constant ranges in LVI, the only new code is
that which can 'merge' the range from an assumption into that otherwise
computed. There is also a small addition to JumpThreading so that it can have
LVI use assumptions in the same block as the comparison feeding a conditional
branch.

With this patch, we can now simplify this as expected:
int foo(int a) {
  __builtin_assume(a > 5);
  if (a > 3) {
    bar();
    return 1;
  }
  return 0;
}

llvm-svn: 217345
2014-09-07 20:29:59 +00:00
Hal Finkel d67e463901 Add an AlignmentFromAssumptions Pass
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly for this
kind of 'misalignment' offset for which this kind of logic is necessary.

The primary motivation is to fixup alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could).  Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).

llvm-svn: 217344
2014-09-07 20:05:11 +00:00
Hal Finkel 15aeaaf24a Add additional patterns for @llvm.assume in ValueTracking
This builds on r217342, which added the infrastructure to compute known bits
using assumptions (@llvm.assume calls). That original commit added only a few
patterns (to catch common cases related to determining pointer alignment); this
change adds several other patterns for simple cases.

r217342 contained that, for assume(v & b = a), bits in the mask
that are known to be one, we can propagate known bits from the a to v. It also
had a known-bits transfer for assume(a = b). This patch adds:

assume(~(v & b) = a) : For those bits in the mask that are known to be one, we
                       can propagate inverted known bits from the a to v.

assume(v | b = a) :    For those bits in b that are known to be zero, we can
                       propagate known bits from the a to v.

assume(~(v | b) = a):  For those bits in b that are known to be zero, we can
                       propagate inverted known bits from the a to v.

assume(v ^ b = a) :    For those bits in b that are known to be zero, we can
		       propagate known bits from the a to v. For those bits in
		       b that are known to be one, we can propagate inverted
                       known bits from the a to v.

assume(~(v ^ b) = a) : For those bits in b that are known to be zero, we can
		       propagate inverted known bits from the a to v. For those
		       bits in b that are known to be one, we can propagate
                       known bits from the a to v.

assume(v << c = a) :   For those bits in a that are known, we can propagate them
                       to known bits in v shifted to the right by c.

assume(~(v << c) = a) : For those bits in a that are known, we can propagate
                        them inverted to known bits in v shifted to the right by c.

assume(v >> c = a) :   For those bits in a that are known, we can propagate them
                       to known bits in v shifted to the right by c.

assume(~(v >> c) = a) : For those bits in a that are known, we can propagate
                        them inverted to known bits in v shifted to the right by c.

assume(v >=_s c) where c is non-negative: The sign bit of v is zero

assume(v >_s c) where c is at least -1: The sign bit of v is zero

assume(v <=_s c) where c is negative: The sign bit of v is one

assume(v <_s c) where c is non-positive: The sign bit of v is one

assume(v <=_u c): Transfer the known high zero bits

assume(v <_u c): Transfer the known high zero bits (if c is know to be a power
                 of 2, transfer one more)

A small addition to InstCombine was necessary for some of the test cases. The
problem is that when InstCombine was simplifying and, or, etc. it would fail to
check the 'do I know all of the bits' condition before checking less specific
conditions and would not fully constant-fold the result. I'm not sure how to
trigger this aside from using assumptions, so I've just included the change
here.

llvm-svn: 217343
2014-09-07 19:21:07 +00:00
Hal Finkel 60db05896a Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.

As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.

The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.

Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.

This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).

llvm-svn: 217342
2014-09-07 18:57:58 +00:00
Hal Finkel 57f03dda49 Add functions for finding ephemeral values
This adds a set of utility functions for collecting 'ephemeral' values. These
are LLVM IR values that are used only by @llvm.assume intrinsics (directly or
indirectly), and thus will be removed prior to code generation, implying that
they should be considered free for certain purposes (like inlining). The
inliner's cost analysis, and a few other passes, have been updated to account
for ephemeral values using the provided functionality.

This functionality is important for the usability of @llvm.assume, because it
limits the "non-local" side-effects of adding llvm.assume on inlining, loop
unrolling, etc. (these are hints, and do not generate code, so they should not
directly contribute to estimates of execution cost).

llvm-svn: 217335
2014-09-07 13:49:57 +00:00
David Majnemer 6fe6ea740c InstCombine: Remove a special case pattern
The special case did not work when run under -reassociate and can easily
be expressed by a further generalization of an existing pattern.

llvm-svn: 217227
2014-09-05 06:09:24 +00:00
David Majnemer c6ab01ecca IndVarSimplify: Don't let LFTR compare against a poison value
LinearFunctionTestReplace tries to use the *next* indvar to compare
against when possible.  However, it may be the case that the calculation
for the next indvar has NUW/NSW flags and that it may only be safely
used inside the loop.  Using it in a comparison to calculate the exit
condition could result in observing poison.

This fixes PR20680.

Differential Revision: http://reviews.llvm.org/D5174

llvm-svn: 217102
2014-09-03 23:03:18 +00:00
Robin Morisset a47cb411dc Use target-dependent emitLeading/TrailingFence instead of the target-independent insertLeading/TrailingFence (in AtomicExpandPass)
Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
    It is not even clear if this is correct on ARM swift processors (where release fences are
    DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
    as it is not clear whether it is incorrect. I would love to get documentation stating
    whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).

See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.

I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.

This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64), other architectures will use this
infrastructure in later patches.

llvm-svn: 217076
2014-09-03 21:01:03 +00:00
Sanjay Patel 9433a28845 Preserve IR flags (nsw, nuw, exact, fast-math) in SLP vectorizer (PR20802).
The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.

The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015
http://llvm.org/viewvc/llvm-project?view=revision&revision=211339

The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.

Differential Revision: http://reviews.llvm.org/D5172

llvm-svn: 217051
2014-09-03 17:40:30 +00:00
Yi Jiang 77a609b556 Generate extract for in-tree uses if the use is scalar operand in vectorized instruction. radar://18144665
llvm-svn: 216946
2014-09-02 21:00:39 +00:00
Matt Arsenault 9d412ed41e Fix crash when looking up the addrspace of GEPs with vector types
Patch by Björn Steinbrink

llvm-svn: 216930
2014-09-02 18:47:54 +00:00
Andrea Di Biagio b9de900788 Revert: [APFloat] Fixed a bug in method 'fusedMultiplyAdd'.
This reverts revision 216913; the new test added at revision 216913
caused regression failures on a couple of buildbots.

llvm-svn: 216914
2014-09-02 17:22:49 +00:00
Andrea Di Biagio 7676fe1878 [APFloat] Fixed a bug in method 'fusedMultiplyAdd'.
When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.

Example:
  define double @test() {
    %1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
    ret double %1
  }

Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).

Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.

This fixes PR20832.

Differential Revision: http://reviews.llvm.org/D5152

llvm-svn: 216913
2014-09-02 16:44:56 +00:00
David Majnemer 49428105aa LICM: Don't crash when an instruction is used by an unreachable BB
Summary:
BBs might contain non-LCSSA'd values after the LCSSA pass is run if they
are unreachable from the entry block.

Normally, the users of the instruction would be PHIs but the unreachable
BBs have normal users; rewrite their uses to be undef values.

An alternative fix could involve fixing this at LCSSA but that would
require this invariant to hold after subsequent transforms.  If a BB
created an unreachable block, they would be in violation of this.

This fixes PR19798.

Differential Revision: http://reviews.llvm.org/D5146

llvm-svn: 216911
2014-09-02 16:22:00 +00:00
David Majnemer d4cffcf073 SROA: Don't insert instructions before a PHI
SROA may decide that it needs to insert a bitcast and would set it's
insertion point before a PHI.  This will create an invalid module
right quick.

Instead, choose the first insertion point in the basic block that holds
our PHI.

This fixes PR20822.

Differential Revision: http://reviews.llvm.org/D5141

llvm-svn: 216891
2014-09-01 21:20:14 +00:00
David Majnemer d2df50196f Revert "Revert two GEP-related InstCombine commits"
This reverts commit r216698 which reverted r216523 and r216598.

We would attempt to perform the transformation even if the match()
failed because, as a side effect, it would set V.  This would trick us
into believing that we correctly found a place to correctly apply the
transform.

An additional test case was added to getelementptr.ll so that we might
not regress in the future.

llvm-svn: 216890
2014-09-01 21:10:02 +00:00
Sanjay Patel 5ad239e15a Add a convenience method to copy wrapping, exact, and fast-math flags (NFC).
The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions.
This patch adds a convenience method to make that operation easier because we need to do this
in the loop vectorizer, SLP vectorizer, and possibly other places.

Although this is a 'no functional change' patch, I've added a testcase to verify that the exact
flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked
in existing testcases.

Differential Revision: http://reviews.llvm.org/D5138

llvm-svn: 216886
2014-09-01 18:44:57 +00:00
Chandler Carruth 18cee1defc Fix a really bad miscompile introduced in r216865 - the else-if logic
chain became completely broken here as *all* intrinsic users ended up
being skipped, and the ones that seemed to be singled out were actually
the exact wrong set.

This is a great example of why long else-if chains can be easily
confusing. Switch the entire code to use early exits and early continues
to have simpler (and more importantly, correct) logic here, as well as
fixing the reversed logic for detecting and continuing on lifetime
intrinsics.

I've also significantly cleaned up the test case and added another test
case demonstrating an example where the optimization is not (trivially)
safe to perform.

llvm-svn: 216871
2014-09-01 10:09:18 +00:00
Renato Golin 86a6c3f269 Small refactor on VectorizerHint for deduplication
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.

This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.

Re-applying again due to MSVC 2013 being minimum requirement, and this patch
having C++11 that MSVC 2012 didn't support.

Fixes PR20655.

llvm-svn: 216870
2014-09-01 10:00:17 +00:00
Hal Finkel 0c083024f0 Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata
This feeds AA through the IFI structure into the inliner so that
AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
functions only access their arguments (instead of just hard-coding some
knowledge of memory intrinsics). Most of the information is only available from
BasicAA; this is important for preserving alias scoping information for
target-specific intrinsics when doing the noalias parameter attribute to
metadata conversion.

llvm-svn: 216866
2014-09-01 09:01:39 +00:00
Nick Lewycky fc243d54d2 Ignore lifetime intrinsics in use list for MemCpyOptimizer. Patch by Luqman Aden, review by Hal Finkel.
llvm-svn: 216865
2014-09-01 06:03:11 +00:00
Hal Finkel cbb85f249e Fix AddAliasScopeMetadata again - alias.scope must be a complete description
I thought that I had fixed this problem in r216818, but I did not do a very
good job. The underlying issue is that when we add alias.scope metadata we are
asserting that this metadata completely describes the aliasing relationships
within the current aliasing scope domain, and so in the context of translating
noalias argument attributes, the pointers must all be based on noalias
arguments (as underlying objects) and have no other kind of underlying object.
In r216818 excluding appropriate accesses from getting alias.scope metadata is
done by looking for underlying objects that are not identified function-local
objects -- but that's wrong because allocas, etc. are also function-local
objects and we need to explicitly check that all underlying objects are the
noalias arguments for which we're adding metadata aliasing scopes.

This fixes the underlying-object check for adding alias.scope metadata, and
does some refactoring of the related capture-checking eligibility logic (and
adds more comments; hopefully making everything a bit clearer).

Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the
feature is still disabled by default).

llvm-svn: 216863
2014-09-01 04:26:40 +00:00
Hal Finkel a3708df41a Fix AddAliasScopeMetadata to not add scopes when deriving from unknown pointers
The previous implementation of AddAliasScopeMetadata, which adds noalias
metadata to preserve noalias parameter attribute information when inlining had
a flaw: it would add alias.scope metadata to accesses which might have been
derived from pointers other than noalias function parameters. This was
incorrect because even some access known not to alias with all noalias function
parameters could easily alias with an access derived from some other pointer.
Instead, when deriving from some unknown pointer, we cannot add alias.scope
metadata at all. This fixes a miscompile of the test-suite's tramp3d-v4.
Furthermore, we cannot add alias.scope to functions unless we know they
access only argument-derived pointers (currently, we know this only for
memory intrinsics).

Also, we fix a theoretical problem with using the NoCapture attribute to skip
the capture check. This is incorrect (as explained in the comment added), but
would not matter in any code generated by Clang because we get only inferred
nocapture attributes in Clang-generated IR.

This functionality is not yet enabled by default.

llvm-svn: 216818
2014-08-30 12:48:33 +00:00
David Majnemer 5e96f1b4c8 InstCombine: Try harder to combine icmp instructions
consider: (and (icmp X, Y), (and Z, (icmp A, B)))
It may be possible to combine (icmp X, Y) with (icmp A, B).
If we successfully combine, create an 'and' instruction with Z.

This fixes PR20814.

N.B. There is room for improvement after this change but I'm not
convinced it's worth chasing yet.

llvm-svn: 216814
2014-08-30 06:18:20 +00:00
Robin Morisset 163ef0402a Relax the constraint more in MemoryDependencyAnalysis.cpp
Even loads/stores that have a stronger ordering than monotonic can be safe.
The rule is no release-acquire pair on the path from the QueryInst, assuming that
the QueryInst is not atomic itself.

llvm-svn: 216771
2014-08-29 20:32:58 +00:00
Matt Arsenault 85cbc7e371 Make fabs safe to speculatively execute
llvm-svn: 216736
2014-08-29 16:01:17 +00:00
David Majnemer 400e725bde Revert two GEP-related InstCombine commits
This reverts commit r216523 and r216598; people have reported
regressions.

llvm-svn: 216698
2014-08-29 00:06:43 +00:00
Reid Kleckner febb279c9c Don't promote byval pointer arguments when padding matters
Don't promote byval pointer arguments when when their size in bits is
not equal to their alloc size in bits. This can happen for x86_fp80,
where the size in bits is 80 but the alloca size in bits in 128.
Promoting these types can break passing unions of x86_fp80s and other
types.

Patch by Thomas Jablin!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D5057

llvm-svn: 216693
2014-08-28 22:42:00 +00:00
Erik Eckstein 8354cfaf95 Fix: SLPVectorizer tried to move an instruction which was replaced by a vector instruction.
For a detailed description of the problem see the comment in the test file.
The problematic moveBefore() calls are not required anymore because the new
scheduling algorithm ensures a correct ordering anyway.

llvm-svn: 216656
2014-08-28 07:04:02 +00:00
David Majnemer 76d06bc613 InstSimplify: Move a transform from InstCombine to InstSimplify
Several combines involving icmp (shl C2, %X) C1 can be simplified
without introducing any new instructions.  Move them to InstSimplify;
while we are at it, make them more powerful.

llvm-svn: 216642
2014-08-28 03:34:28 +00:00
David Majnemer 22ccfc4484 InstCombine: Combine gep X, (Y-X) to Y
We try to perform this transform in InstSimplify but we aren't always
able to.  Sometimes, we need to insert a bitcast if X and Y don't have
the same time.

llvm-svn: 216598
2014-08-27 20:08:37 +00:00
David Majnemer 11ca2971e8 InstSimplify: Don't simplify gep X, (Y-X) to Y if types differ
It's incorrect to perform this simplification if the types differ.
A bitcast would need to be inserted for this to work.

This fixes PR20771.

llvm-svn: 216597
2014-08-27 20:08:34 +00:00
Nico Weber 48c82400ed Reland r216439 215441, majnemer has a real fix for PR20771.
llvm-svn: 216586
2014-08-27 20:06:19 +00:00
Nico Weber 7b343e3cc6 Revert r216439 (and r216441, else the former doesn't revert cleanly).
It caused PR 20771. I'll land a test on the clang side.

llvm-svn: 216582
2014-08-27 20:00:13 +00:00
David Majnemer d6d1671c1e InstSimplify: Compute comparison ranges for left shift instructions
'shl nuw CI, x' produces [CI, CI << CLZ(CI)]
'shl nsw CI, x' produces [CI << CLO(CI)-1, CI] if CI is negative
'shl nsw CI, x' produces [CI, CI << CLZ(CI)-1] if CI is non-negative

llvm-svn: 216570
2014-08-27 18:03:46 +00:00
Michael Zolotukhin 5dc466b863 [SLP] Re-enable vectorization of GEP expressions (re-apply r210342 with a fix).
llvm-svn: 216549
2014-08-27 15:01:18 +00:00
David Majnemer 54e97d5dc0 InstCombine: Optimize GEP's involving ptrtoint better
We supported transforming:
(gep i8* X, -(ptrtoint Y))

to:
(inttoptr (sub (ptrtoint X), (ptrtoint Y)))

However, this only fired if 'X' had type i8*.  Generalize this to
support various types of different sizes.  This results in much better
CodeGen, especially for pointers to packed structs.

llvm-svn: 216523
2014-08-27 05:16:04 +00:00
Joerg Sonnenberger cb5674b9c2 Revert r210342 and r210343, add test case for the crasher.
PR 20642.

llvm-svn: 216475
2014-08-26 19:06:41 +00:00
David Majnemer 788d0ab8c8 InstSimplify: Fold gep X, (sub 0, ptrtoint(X)) to null
Save InstCombine some work if we can perform this fold during
InstSimplify.

llvm-svn: 216441
2014-08-26 07:08:03 +00:00
David Majnemer bc4981323f InstSimplify: Simplify trivial pointer expressions like b + (e - b)
consider:
long long *f(long long *b, long long *e) {
  return b + (e - b);
}

we would lower this to something like:
define i64* @f(i64* %b, i64* %e) {
  %1 = ptrtoint i64* %e to i64
  %2 = ptrtoint i64* %b to i64
  %3 = sub i64 %1, %2
  %4 = ashr exact i64 %3, 3
  %5 = getelementptr inbounds i64* %b, i64 %4
  ret i64* %5
}

This should fold away to just 'e'.

N.B.  This adds m_SpecificInt as a convenient way to match against a
particular 64-bit integer when using LLVM's match interface.

llvm-svn: 216439
2014-08-26 05:55:16 +00:00
Reid Kleckner 3715461b48 musttail: Don't eliminate varargs packs if there is a forwarding call
Also clean up and beef up this grep test for the feature.

llvm-svn: 216425
2014-08-26 00:59:51 +00:00
Reid Kleckner 8349864dbd Declare that musttail calls in variadic functions forward the ellipsis
Summary:
There is no functionality change here except in the way we assemble and
dump musttail calls in variadic functions. There's really no need to
separate out the bits for musttail and "is forwarding varargs" on call
instructions. A musttail call by definition has to forward the ellipsis
or it would fail verification.

Reviewers: chandlerc, nlewycky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4892

llvm-svn: 216423
2014-08-26 00:33:28 +00:00
Reid Kleckner e6e88f99b3 ArgPromotion: Don't touch variadic functions
Adding, removing, or changing non-pack parameters can change the ABI
classification of pack parameters. Clang and other frontends encode the
classification in the IR of the call site, but the callee side
determines it dynamically based on the number of registers consumed so
far. Changing the prototype affects the number of registers consumed
would break such code.

Dead argument elimination performs a similar task and already has a
similar check to avoid this problem.

Patch by Thomas Jablin!

llvm-svn: 216421
2014-08-25 23:58:48 +00:00
Bruno Cardoso Lopes e2a1fa35df Remove dangling initializers in GlobalDCE
GlobalDCE deletes global vars and updates their initializers to nullptr
while leaving underlying constants to be cleaned up later by its uses.
The clean up may never happen, fix this by forcing it every time it's
safe to destroy constants.

Final patch by Rafael Espindola
http://reviews.llvm.org/D4931

<rdar://problem/17523868>

llvm-svn: 216390
2014-08-25 17:51:14 +00:00
Karthik Bhat 7f33ff7dea Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)

llvm-svn: 216371
2014-08-25 04:56:54 +00:00
David Majnemer 0ffccf7fb5 InstCombine: Properly optimize or'ing bittests together
CFE, with -03, would turn:
bool f(unsigned x) {
  bool a = x & 1;
  bool b = x & 2;
  return a | b;
}

into:
  %1 = lshr i32 %x, 1
  %2 = or i32 %1, %x
  %3 = and i32 %2, 1
  %4 = icmp ne i32 %3, 0

This sort of thing exposes a nasty pathology in GCC, ICC and LLVM.

Instead, we would rather want:
  %1 = and i32 %x, 3
  %2 = icmp ne i32 %1, 0

Things get a bit more interesting in the following case:
  %1 = lshr i32 %x, %y
  %2 = or i32 %1, %x
  %3 = and i32 %2, 1
  %4 = icmp ne i32 %3, 0

Replacing it with the following sequence is better:
  %1 = shl nuw i32 1, %y
  %2 = or i32 %1, 1
  %3 = and i32 %2, %x
  %4 = icmp ne i32 %3, 0

This sequence is preferable because %1 doesn't involve %x and could
potentially be hoisted out of loops if it is invariant; only perform
this transform in the non-constant case if we know we won't increase
register pressure.

llvm-svn: 216343
2014-08-24 09:10:57 +00:00
Yunzhong Gao 300bdb35d4 Add a test case for SROA where the store size is bigger than slice size. The
test case was fixed in r216248.

llvm-svn: 216303
2014-08-22 23:27:04 +00:00
Jingyue Wu ec33fa9aca [SROA] Fold a PHI node if all its incoming values are the same
Summary:
Fixes PR20425.

During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes alloca's used by PHI nodes easier to promote.

Test Plan: Added three more tests in phi-and-select.ll

Reviewers: nlewycky, eliben, meheff, chandlerc

Reviewed By: chandlerc

Subscribers: zinovy.nis, hfinkel, baldrick, llvm-commits

Differential Revision: http://reviews.llvm.org/D4659

llvm-svn: 216299
2014-08-22 22:45:57 +00:00
David Majnemer 49775e0173 InstCombine: Don't unconditionally preserve 'nuw' when shrinking constants
Consider:
  %add = add nuw i32 %a, -16777216
  %and = and i32 %add, 255

Regardless of whether or not we demand the sign bit of %add, we cannot
replace -16777216 with 2130706432 without also removing 'nuw' from the
instruction.

llvm-svn: 216273
2014-08-22 17:11:04 +00:00
David Majnemer 0e6c986696 InstCombine: sub nsw %x, C -> add nsw %x, -C if C isn't INT_MIN
We can preserve nsw during this transform if -C won't overflow.

llvm-svn: 216269
2014-08-22 16:41:23 +00:00
David Majnemer 42b83a5e36 InstCombine: Don't unconditionally preserve 'nsw' when shrinking constants
Consider:
  %add = add nsw i32 %a, -16777216
  %and = and i32 %add, 255

Regardless of whether or not we demand the sign bit of %add, we cannot
replace -16777216 with 2130706432 without also removing 'nsw' from the
instruction.

This fixes PR20377.

llvm-svn: 216261
2014-08-22 07:56:32 +00:00
Erik Eckstein b49d7abb7b fix: SLPVectorizer crashes for unreachable blocks containing not schedulable instructions.
In unreachable blocks it's legal to have instructions like "%x = op %x".
Such instuctions are not schedulable. Therefore the SLPVectorizer has to check for
unreachable blocks and ignore them.

Fixes bug 20646.

llvm-svn: 216256
2014-08-22 01:18:39 +00:00
David Majnemer 97ddca3224 ValueTracking: Figure out more bits when looking at add/sub
Given something like X01XX + X01XX, we know that the result must look
like X1XXX.

Adapted from a patch by Richard Smith, test-case written by me.

llvm-svn: 216250
2014-08-22 00:40:43 +00:00
Reid Kleckner c36f48f08a SROA: Handle a case of store size being smaller than allocation size
In this case, we are creating an x86_fp80 slice for a union from C where
the padding bytes may contain real data. An x86_fp80 alloca is 16 bytes,
and that's just fine. We can't, however, use regular loads and stores to
access the slice, because the store size is only 10 bytes / 80 bits.
Instead, use memcpy and memset.

Fixes PR18726.

Reviewed By: chandlerc

Differential Revision: http://reviews.llvm.org/D5012

llvm-svn: 216248
2014-08-22 00:09:56 +00:00
David Blaikie 2f3f76fdb1 Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks.
Somewhat unnoticed in the original implementation of discriminators, but
it could cause instructions to end up in new, small,
DW_TAG_lexical_blocks due to the use of DILexicalBlock to track
discriminator changes.

Instead, use DILexicalBlockFile which we already use to track file
changes without introducing new scopes, so it works well to track
discriminator changes in the same way.

llvm-svn: 216239
2014-08-21 22:45:21 +00:00
Robin Morisset 59c23cd946 Rename AtomicExpandLoadLinked into AtomicExpand
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of
a group that aim at making it more target-independent. See
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html
for details

The command line option is "atomic-expand"

llvm-svn: 216231
2014-08-21 21:50:01 +00:00
Erik Verbruggen 2b98bd2a80 Reassociate x + -0.1234 * y into x - 0.1234 * y
This does not require -ffast-math, and it gives CSE/GVN more options to
eliminate duplicate expressions in, e.g.:

  return ((x + 0.1234 * y) * (x - 0.1234 * y));

Differential Revision: http://reviews.llvm.org/D4904

llvm-svn: 216169
2014-08-21 10:45:30 +00:00
Zinovy Nis 0a36cba29d [INDVARS] Extend using of widening of induction variables for the cases of "sub nsw" and "mul nsw" instructions.
Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like:

int N = 100;
float **A;

void foo(int x0, int x1)
{
        float * A_cur = &A[0][0];
        float * A_next = &A[1][0];
        for(int x = x0; x < x1; ++x).
        {
          // Currently only [x+N] case is widened. Others 2 cases lead to sext.
          // This patch fixes it, so all 3 cases do not need sext.
          const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
          A_next[x] = div;
        }
}
...
> clang++ test.cpp -march=core-avx2 -Ofast  -fno-unroll-loops -fno-tree-vectorize -S -o -

Differential Revision: http://reviews.llvm.org/D4695

llvm-svn: 216160
2014-08-21 08:25:45 +00:00
David Majnemer 5d1aeba2ea InstCombine: Fold ((A | B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1
Adapted from a patch by Richard Smith, test-case written by me.

llvm-svn: 216157
2014-08-21 05:14:48 +00:00
Jiangning Liu 950844fadb Fix a bug around truncating vector in const prop.
In constant folding stage, "TRUNC" can't handle vector data type.

llvm-svn: 216149
2014-08-21 02:12:35 +00:00
Yi Jiang 1a4e73d7bf New InstCombine pattern: (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2), C3) to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3) under certain condition
llvm-svn: 216135
2014-08-20 22:55:40 +00:00
David Majnemer 42158f3eea InstCombine: Annotate sub with nuw when we prove it's safe
We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.

llvm-svn: 216045
2014-08-20 07:17:31 +00:00
David Majnemer 57d5bc8849 InstCombine: Annotate sub with nsw when we prove it's safe
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands is the same.
- Both operands have more than 1 sign bit.

The subtraction cannot be a signed overflow in either case.

llvm-svn: 216037
2014-08-19 23:36:30 +00:00
Renato Golin 06d601fb3e Revert "Small refactor on VectorizerHint for deduplication"
This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness.

llvm-svn: 215999
2014-08-19 18:08:50 +00:00
Renato Golin dd6394d833 Small refactor on VectorizerHint for deduplication
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.

This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.

llvm-svn: 215994
2014-08-19 17:30:43 +00:00
Mayur Pandey 960507beb4 InstCombine: ((A & ~B) ^ (~A & B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A & ~B),(~A & B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Differential Revision: http://reviews.llvm.org/D4898

llvm-svn: 215974
2014-08-19 08:19:19 +00:00
Robin Morisset 9e98e7f7fc Answer to Philip Reames comments
- add check for volatile (probably unneeded, but I agree that we should be conservative about it).
- strengthen condition from isUnordered() to isSimple(), as I don't understand well enough Unordered semantics (and it also matches the comment better this way) to be confident in the previous behaviour (thanks for catching that one, I had missed the case Monotonic/Unordered).
- separate a condition in two.
- lengthen comment about aliasing and loads
- add tests in GVN/atomic.ll

llvm-svn: 215943
2014-08-18 22:18:14 +00:00
Robin Morisset 4ffe8aaa69 Weak relaxing of the constraints on atomics in MemoryDependencyAnalysis
Monotonic accesses do not have to kill the analysis, as long as the QueryInstr is not
itself atomic.

llvm-svn: 215942
2014-08-18 22:18:11 +00:00
Owen Anderson a4428aa484 Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set.
While this might seem like an obvious canonicalization, there is one subtle problem with it.  The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN.  This means that the
new expression is strictly more defined than the original one.  One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it.  Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it.  Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.

llvm-svn: 215825
2014-08-17 03:51:29 +00:00
David Majnemer f9a095d606 InstCombine: Combine mul with div.
We can combne a mul with a div if one of the operands is a multiple of
the other:

%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
  =>
%ret = mul nsw %a, (C1 / C2)

This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.

Consider this small example:

define i32 @f(i32 %a) {
  %mul = mul nuw i32 %a, 14
  %div = udiv exact i32 %mul, 7
  ret i32 %div
}

which gets CodeGen'd to:

    imull       $14, %edi, %eax
    imulq       $613566757, %rax, %rcx
    shrq        $32, %rcx
    subl        %ecx, %eax
    shrl        %eax
    addl        %ecx, %eax
    shrl        $2, %eax
    retq

We can now transform this into:
define i32 @f(i32 %a) {
  %shl = shl nuw i32 %a, 1
  ret i32 %shl
}

which gets CodeGen'd to:

    leal        (%rdi,%rdi), %eax
    retq

This fixes PR20681.

llvm-svn: 215815
2014-08-16 08:55:06 +00:00
Hal Finkel 61c386126b Copy noalias metadata from call sites to inlined instructions
When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only those that might access
memory because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.

This should complete the enhancements requested in PR20500.

llvm-svn: 215676
2014-08-14 21:09:37 +00:00
Hal Finkel d2dee16c27 Add noalias metadata for general calls (not just memory intrinsics) during inlining
When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.

This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.

llvm-svn: 215657
2014-08-14 16:44:03 +00:00
Chad Rosier 11ab941644 [Reassociation] Add support for reassociation with unsafe algebra.
Vector instructions are (still) not supported for either integer or floating
point.  Hopefully, that work will be landed shortly.

llvm-svn: 215647
2014-08-14 15:23:01 +00:00
David Majnemer 698dca0b95 InstCombine: ((A | ~B) ^ (~A | B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A | ~B),(~A |B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Patch by Mayur Pandey!

Differential Revision: http://reviews.llvm.org/D4883

llvm-svn: 215621
2014-08-14 06:46:25 +00:00
David Majnemer f1eda23514 Added InstCombine Transform for ((B | C) & A) | B -> B | (A & C)
Transform ((B | C) & A) | B --> B | (A & C)

Z3 Link: http://rise4fun.com/Z3/hP6p

Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D4865

llvm-svn: 215619
2014-08-14 06:41:38 +00:00
Jan Vesely 0cd3ec6cfa utils: Fix segfault in flattencfg
v2: continue iterating through the rest of the bb
    use for loop

v3: initialize FlattenCFG pass in ScalarOps
    add test

v4: split off initializing flattencfg to a separate patch
    add comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215574
2014-08-13 20:31:53 +00:00
Chandler Carruth 0fb998110a [optnone] Make the optnone attribute effective at suppressing function
attribute and function argument attribute synthesizing and propagating.

As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.

llvm-svn: 215538
2014-08-13 10:49:33 +00:00
Karthik Bhat a4a4db91be InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (add %a, %b)
Correctness proof of the transform using CVC3-

$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR(A | B, BVXOR(A,B) ) = A & B;

$ cvc3 t.cvc
Valid.

llvm-svn: 215524
2014-08-13 05:13:14 +00:00
Matt Arsenault 4815f09bbe Allwo bitcast + struct GEP transform to work with addrspacecast
llvm-svn: 215467
2014-08-12 19:46:13 +00:00
David Majnemer ab07f00c64 InstCombine: Combine (add (and %a, %b) (or %a, %b)) to (add %a, %b)
What follows bellow is a correctness proof of the transform using CVC3.

$ < t.cvc
A, B : BITVECTOR(32);

QUERY BVPLUS(32, A & B, A | B) = BVPLUS(32, A, B);

$ cvc3 < t.cvc
Valid.

llvm-svn: 215400
2014-08-11 22:32:02 +00:00
Jiangning Liu 40b04fd994 In LVI(Lazy Value Info), originally value on a BB can only be caculated once,
and the lattice will be updated to be a state other than "undefined". This
limiation could miss some opportunities of lowering "overdefined" to be an
even accurate value. So this patch ask the algorithm to try to lower the
lattice value again even if the value has been lowered to be "overdefined".

llvm-svn: 215343
2014-08-11 05:02:04 +00:00
James Molloy 65b08f5e46 [LoopVectorizer] Enable support for floating-point subtraction reductions
llvm-svn: 215200
2014-08-08 12:41:08 +00:00
David Majnemer fe8c7540b0 GlobalOpt: Optimize in the face of insertvalue/extractvalue
GlobalOpt didn't know how to simulate InsertValueInst or
ExtractValueInst.  Optimizing these is pretty straightforward.

N.B. This came up when looking at clang's IRGen for MS ABI member
pointers; they are represented as aggregates.

llvm-svn: 215184
2014-08-08 05:50:43 +00:00
Arnold Schwaighofer 4fb3c47456 SLPVectorizer: Use the type of the value loaded/stored to get the ABI alignment
We were using the pointer type which is incorrect.

llvm-svn: 215162
2014-08-07 22:47:27 +00:00
Owen Anderson 6c19ab1b5d Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.

llvm-svn: 215148
2014-08-07 21:07:35 +00:00
Rui Ueyama c487f7728e Revert "r214897 - Remove dead zero store to calloc initialized memory"
It broke msan.

llvm-svn: 214989
2014-08-06 19:30:38 +00:00
Philip Reames 00c9b6461f Remove dead zero store to calloc initialized memory
Optimize the following IR:

%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4

Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store.  If the store is to an out of bounds address, it is undefined and thus also removable.

Reviewed By: nicholas

Differential Revision: http://reviews.llvm.org/D3942

llvm-svn: 214897
2014-08-05 17:48:20 +00:00
James Molloy 2b8933c354 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Manman Ren 062f58d550 [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion.
When we have a covered lookup table, make sure we don't delete PHINodes that
are cached in PHIs.

rdar://17887153

llvm-svn: 214642
2014-08-02 23:41:54 +00:00
Erik Eckstein 26a1bf7d84 fix bug 20513 - Crash in SLP Vectorizer
llvm-svn: 214638
2014-08-02 19:39:42 +00:00
Tyler Nowicki 064896bbc5 Add diagnostics to the vectorizer cost model.
When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision.

Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive.

Reviewed by Arnold Schwaighofer

llvm-svn: 214599
2014-08-02 00:14:03 +00:00
Peter Collingbourne e52646cd80 PartiallyInlineLibCalls: Check sqrt result type before transforming it.
Some configure scripts declare this with the wrong prototype, which can lead
to an assertion failure.

llvm-svn: 214593
2014-08-01 23:21:21 +00:00
Erik Eckstein c80e1dc081 SLPVectorizer: improved scheduling algorithm.
llvm-svn: 214494
2014-08-01 09:20:42 +00:00
Suyog Sarda 56c9a87035 This patch implements transform for pattern "(A & ~B) ^ (~A) -> ~(A & B)".
Differential Revision: http://reviews.llvm.org/D4653

llvm-svn: 214479
2014-08-01 05:07:20 +00:00
Suyog Sarda 1c6c2f69f7 This patch implements transform for pattern "(A | B) & ((~A) ^ B) -> (A & B)".
Differential Revision: http://reviews.llvm.org/D4628

llvm-svn: 214478
2014-08-01 04:59:26 +00:00
Suyog Sarda 52324c82cc This patch implements transform for pattern "( A & (~B)) | (A ^ B) -> (A ^ B)"
Differential Revision: http://reviews.llvm.org/D4652

llvm-svn: 214477
2014-08-01 04:50:31 +00:00
Suyog Sarda 16d646594e This patch implements transform for pattern "(A & B) | ((~A) ^ B) -> (~A ^ B)".
Patch Credit to Ankit Jain !

Differential Revision: http://reviews.llvm.org/D4655

llvm-svn: 214476
2014-08-01 04:41:43 +00:00
Tyler Nowicki b5a65395cc Improve the remark generated for -Rpass-missed.
The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss.

Reviewed by Arnold Schwaighofer`

llvm-svn: 214445
2014-07-31 21:22:22 +00:00
Tyler Nowicki 9fe497fcac Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable.
Reviewed by Arnold Schwaighofer

llvm-svn: 214440
2014-07-31 21:02:40 +00:00
David Majnemer a92687d636 InstCombine: Correctly propagate NSW/NUW for x-(-A) -> x+A
We can only propagate the nsw bits if both subtraction instructions are
marked with the appropriate bit.

N.B.  We only propagate the nsw bit in InstCombine because the nuw case
is already handled in InstSimplify.

This fixes PR20189.

llvm-svn: 214385
2014-07-31 04:49:29 +00:00
David Majnemer cd4fbcd1bb InstSimplify: Simplify (X - (0 - Y)) if the second sub is NUW
If the NUW bit is set for 0 - Y, we know that all values for Y other
than 0 would produce a poison value.  This allows us to replace (0 - Y)
with 0 in the expression (X - (0 - Y)) which will ultimately leave us
with X.

This partially fixes PR20189.

llvm-svn: 214384
2014-07-31 04:49:18 +00:00
Rafael Espindola 464fe024c5 Use "weak alias" instead of "alias weak"
Before this patch we had

@a = weak global ...
but
@b = alias weak ...

The patch changes aliases to look more like global variables.

Looking at some really old code suggests that the reason was that the old
bison based parser had a reduction for alias linkages and another one for
global variable linkages. Putting the alias first avoided the reduce/reduce
conflict.

The days of the old .ll parser are long gone. The new one parses just "linkage"
and a later check is responsible for deciding if a linkage is valid in a
given context.

llvm-svn: 214355
2014-07-30 22:51:54 +00:00
David Majnemer 42af3601c2 InstCombine: Simplify (A ^ B) or/and (A ^ B ^ C)
While we can already transform A | (A ^ B) into A | B, things get bad
once we have (A ^ B) | (A ^ B ^ Cst) because reassociation will morph
this into (A ^ B) | ((A ^ Cst) ^ B).  Our existing patterns fail once
this happens.

To fix this, we add a new pattern which looks through the tree of xor
binary operators to see that, in fact, there exists a redundant xor
operation.

What follows bellow is a correctness proof of the transform using CVC3.

$ cat t.cvc
A, B, C : BITVECTOR(64);

QUERY BVXOR(A, B) | BVXOR(BVXOR(B, C), A) = BVXOR(A, B) | C;
QUERY BVXOR(BVXOR(A, C), B) | BVXOR(A, B) = BVXOR(A, B) | C;

QUERY BVXOR(A, B) & BVXOR(BVXOR(B, C), A) = BVXOR(A, B) & ~C;
QUERY BVXOR(BVXOR(A, C), B) & BVXOR(A, B) = BVXOR(A, B) & ~C;

$ cvc3 < t.cvc
Valid.
Valid.
Valid.
Valid.

llvm-svn: 214342
2014-07-30 21:26:37 +00:00
Chad Rosier 78f41b3ca7 SLP Vectorizer: Canonicalize tree operands of commutitive binary operands.
llvm-svn: 214338
2014-07-30 21:07:56 +00:00
Rafael Espindola d07cf400ab SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics.
The lifetime intrinsics need some work in order to make it clear which
optimizations are or are not valid.

For now dropping this optimization avoids a miscompilation.

Patch by Björn Steinbrink.

llvm-svn: 214336
2014-07-30 21:04:00 +00:00
Tim Northover e2239ff3eb CodeGenPrep: fall back to MVT::Other if instruction's type isn't an EVT.
The test being performed is just an approximation anyway, so it really
shouldn't crash when things don't go entirely as expected.

Should fix PR20474.

llvm-svn: 214177
2014-07-29 10:20:22 +00:00
Hal Finkel f5867a79c5 Canonicalization for @llvm.assume
Adds simple logical canonicalization of assumption intrinsics to instcombine,
currently:
 - invariant(a && b) -> invariant(a); invariant(b)
 - invariant(!(a || b)) -> invariant(!a); invariant(!b)

llvm-svn: 213977
2014-07-25 21:45:17 +00:00
Hal Finkel 930469107d Add @llvm.assume, lowering, and some basic properties
This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:

 - llvm.invariant(true) is dead.
 - llvm.invariant(false) is unreachable (this directly corresponds to the
   documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).

The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.

llvm-svn: 213973
2014-07-25 21:13:35 +00:00
Hal Finkel ff0bcb60c9 Convert noalias parameter attributes into noalias metadata during inlining
This functionality is currently turned off by default.

Part of the motivation for introducing scoped-noalias metadata is to enable the
preservation of noalias parameter attribute information after inlining.
Sometimes this can be inferred from the code in the caller after inlining, but
often we simply lose valuable information.

The overall process if fairly simple:
 1. Create a new unqiue scope domain.
 2. For each (used) noalias parameter, create a new alias scope.
 3. For each pointer, collect the underlying objects. Add a noalias scope for
    each noalias parameter from which we're not derived (and has not been
    captured prior to that point).
 4. Add an alias.scope for each noalias parameter from which we might be
    derived (or has been captured before that point).

Note that the capture checks apply only if one of the underlying objects is not
an identified function-local object.

llvm-svn: 213949
2014-07-25 15:50:08 +00:00
Mark Heffernan 8ec1474f7f After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor
hint) the loop unroller replaces the llvm.loop.unroll.count metadata with
llvm.loop.unroll.disable metadata to prevent any subsequent unrolling
passes from unrolling more than the hint indicates.  This patch fixes
an issue where loop unrolling could be disabled for other loops as well which
share the same llvm.loop metadata.

llvm-svn: 213900
2014-07-24 22:36:40 +00:00
Manman Ren 29a2005596 Try to fix the bots again by moving test to X86 directory.
llvm-svn: 213884
2014-07-24 17:57:09 +00:00
Manman Ren a8bc9a4c36 Try to fix the bots. If this does not work, I am going to move it to X86 directory.
llvm-svn: 213880
2014-07-24 17:18:33 +00:00
Hal Finkel 9414665a3b Add scoped-noalias metadata
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
  1. To preserve noalias function attribute information when inlining
  2. To provide the ability to model block-scope C99 restrict pointers

Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.

What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:

!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }

Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:

... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }

When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.

Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.

[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]

Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.

llvm-svn: 213864
2014-07-24 14:25:39 +00:00
Manman Ren edc60376ed SimplifyCFG: fix a bug in switch to table conversion
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.

For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx

It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext

rdar://17735071

llvm-svn: 213815
2014-07-23 23:13:23 +00:00
David Blaikie 8e9cfa5497 ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion.
While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.

To address this, ensure that the map is updated after each argument
promotion.

In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.

llvm-svn: 213805
2014-07-23 22:09:29 +00:00
David Blaikie f997c6f90b Test debug info in arg promotion with an actual promotion case, rather than a degenerate arg promotion that's actually DAE performed by ArgPromo
Also the debug location I had here was bogus, describing the location of
the call site as in the callee - and unnecessary, so just drop it.

llvm-svn: 213803
2014-07-23 21:30:59 +00:00
Mark Heffernan 9e112443b6 Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full).
llvm-svn: 213789
2014-07-23 20:05:44 +00:00
Mark Heffernan e6b4ba1c41 In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full".
llvm-svn: 213772
2014-07-23 17:31:37 +00:00
Nick Lewycky aba900c252 We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405!
llvm-svn: 213726
2014-07-23 06:24:49 +00:00
Suyog Sarda 3a8c2c1e6c This patch implements optimization as mentioned in PR19753: Optimize comparisons with "ashr/lshr exact" of a constanst.
It handles the errors which were seen in PR19958 where wrong code was being emitted due to earlier patch.
Added code for lshr as well as non-exact right shifts.

It implements : 
(icmp eq/ne (ashr/lshr const2, A), const1)" ->
(icmp eq/ne A, Log2(const2/const1)) ->
(icmp eq/ne A, Log2(const2) - Log2(const1))

Differential Revision: http://reviews.llvm.org/D4068
 

llvm-svn: 213678
2014-07-22 19:19:36 +00:00
Suyog Sarda b60ec909ca Added InstCombine transform for pattern "(A & B) ^ (A ^ B) -> (A | B)"
Patch idea by Ankit Jain !

Differential Revision: http://reviews.llvm.org/D4618

llvm-svn: 213677
2014-07-22 18:30:54 +00:00
Suyog Sarda d64faf6cae Added InstCombine Transform for patterns:
"((~A & B) | A) -> (A | B)" and "((A & B) | ~A) -> (~A | B)"

Original Patch credit to Ankit Jain !!

Differential Revision: http://reviews.llvm.org/D4591

llvm-svn: 213676
2014-07-22 18:09:41 +00:00
Hal Finkel ccc7090671 Make use of the align parameter attribute for all pointer arguments
We previously supported the align attribute on all (pointer) parameters, but we
only used it for byval parameters. However, it is completely consistent at the
IR level to treat 'align n' on all pointer parameters as an alignment
assumption on the pointer, and now we wll. Specifically, this causes
computeKnownBits to use the align attribute on all pointer parameters, not just
byval parameters. I've also added an explicit parameter attribute test for this
to test/Bitcode/attributes.ll.

And I've updated the LangRef to document the align parameter attribute (as it
turns out, it was not documented at all previously, although the byval
documentation mentioned that it could be used).

There are (at least) two benefits to doing this:
 - It allows enhancing alignment based on the pointer alignment after inlining callees.
 - It allows simplification of pointer arithmetic.

llvm-svn: 213670
2014-07-22 16:58:55 +00:00
Suyog Sarda 521237cad6 This patch implements transform for pattern "(A | B) ^ (~A) -> (A | ~B)".
Patch Credit to Ankit Jain !!

Differential Revision: http://reviews.llvm.org/D4588

llvm-svn: 213662
2014-07-22 15:37:39 +00:00
Mark Heffernan 9d20e42765 Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave.
llvm-svn: 213588
2014-07-21 23:11:03 +00:00
Hal Finkel 7ae00a1282 [LoopVectorize] Use AA to partition potential dependency checks
Prior to this change, the loop vectorizer did not make use of the alias
analysis infrastructure. Instead, it performed memory dependence analysis using
ScalarEvolution-based linear dependence checks within equivalence classes
derived from the results of ValueTracking's GetUnderlyingObjects.

Unfortunately, this meant that:
  1. The loop vectorizer had logic that essentially duplicated that in BasicAA
     for aliasing based on identified objects.
  2. The loop vectorizer could not partition the space of dependency checks
     based on information only easily available from within AA (TBAA metadata is
     currently the prime example).

This means, for example, regardless of whether -fno-strict-aliasing was
provided, the vectorizer would only vectorize this loop with a runtime
memory-overlap check:

void foo(int *a, float *b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i];
}

This is suboptimal because the TBAA metadata already provides the information
necessary to show that this check unnecessary. Of course, the vectorizer has a
limit on the number of such checks it will insert, so in practice, ignoring
TBAA means not vectorizing more-complicated loops that we should.

This change causes the vectorizer to use an AliasSetTracker to keep track of
the pointers in the loop. The resulting alias sets are then used to partition
the space of dependency checks, and potential runtime checks; this results in
more-efficient vectorizations.

When pointer locations are added to the AliasSetTracker, two things are done:
  1. The location size is set to UnknownSize (otherwise you'd not catch
     inter-iteration dependencies)
  2. For instructions in blocks that would need to be predicated, TBAA is
     removed (because the metadata might have a control dependency on the condition
     being speculated).

For non-predicated blocks, you can leave the TBAA metadata. This is safe
because you can't have an iteration dependency on the TBAA metadata (if you
did, and you unrolled sufficiently, you'd end up with the same pointer value
used by two accesses that TBAA says should not alias, and that would yield
undefined behavior).

llvm-svn: 213486
2014-07-20 23:07:52 +00:00
Hal Finkel 4f7d55aac8 [LoopVectorize] Propagate known metadata to vectorized instructions
There are some kinds of metadata that are safe to propagate from the scalar
instructions to the vector instructions (fpmath and tbaa currently).

Regarding TBAA, one might worry about propagating it on if-converted loads and
stores, because the metadata might have had a control dependency on the
condition, and thus actually aliased with some other non-speculated memory
access when the condition was false. However, this would be caught by the
runtime overlap checks.

llvm-svn: 213452
2014-07-19 13:33:16 +00:00
Hal Finkel 9e440c08a9 Make Value::isDereferenceablePointer handle offsets to pointer types with dereferenceable attributes
When we have a parameter (or call site return) with a dereferenceable
attribute, it can specify the size of an array pointed to by that parameter. If
we have a value for which we can accumulate a constant offset to such a
parameter, then we can use that offset in a direct comparison with the size
specified by the dereferenceable attribute.

This enables us to handle cases like this:

  int foo(int a[static 3]) {
    return a[2]; /* this is always dereferenceable */
  }

llvm-svn: 213447
2014-07-19 03:25:16 +00:00
Mark Heffernan 053a68688a Remove unroll pragma metadata after it is used.
llvm-svn: 213412
2014-07-18 21:04:33 +00:00
Gerolf Hoflehner f27ae6cdcf MergedLoadStoreMotion pass
Merges equivalent loads on both sides of a hammock/diamond
and hoists into into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks it to the footer.
Can enable if conversion and tolerate better load misses
and store operand latencies.

llvm-svn: 213396
2014-07-18 19:13:09 +00:00
Hal Finkel b0407ba071 Add a dereferenceable attribute
This attribute indicates that the parameter or return pointer is
dereferenceable. Practically speaking, loads from such a pointer within the
associated byte range are safe to speculatively execute. Such pointer
parameters are common in source languages (C++ references, for example).

llvm-svn: 213385
2014-07-18 15:51:28 +00:00
Matt Arsenault 3dd43fc75d R600: Implement TTI:getPopcntSupport
The test is just copied from X86, and I don't know of a better
way to test it.

llvm-svn: 213351
2014-07-18 06:07:13 +00:00
Suyog Sarda 68862414b5 Move ashr optimization from InstCombineShift to InstSimplify.
Refactor code, no functionality change, test case moved from instcombine to instsimplify.

Differential Revision: http://reviews.llvm.org/D4102
 

llvm-svn: 213231
2014-07-17 06:28:15 +00:00
Hal Finkel 354e23b029 Improve BasicAA CS-CS queries (redux)
This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it
causes PR20303." with a fix for the bug in pr20303. As it turned out, the
relevant code was both wrong and over-conservative (because, as with the code
it replaced, it would return the overall ModRef mask even if just Ref had been
implied by the argument aliasing results). Hopefully, this correctly fixes both
problems.

Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've
cleaned up a little and added in DSE's test directory). The BasicAA test has
also been updated to check for this error.

Original commit message:

BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.

Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.

This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.

Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).

llvm-svn: 213219
2014-07-17 01:28:25 +00:00
Jingyue Wu 0bdc027e31 Partially revert r210444 due to performance regression
Summary:
Converting outermost zext(a) to sext(a) causes worse code when the
computation of zext(a) could be reused. For example, after converting

... = array[zext(a)]
... = array[zext(a) + 1]

to

... = array[sext(a)]
... = array[zext(a) + 1],

the program computes sext(a), which is actually unnecessary. I added one
test in split-gep-and-gvn.ll to illustrate this scenario.

Also, with r211281 and r211084, we annotate more "nuw" tags to
computation involving CUDA intrinsics such as threadIdx.x. These
annotations help with splitting GEP a lot, rendering the benefit we get
from this reverted optimization only marginal.

Test Plan: make check-all

Reviewers: eliben, meheff

Reviewed By: meheff

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D4542

llvm-svn: 213209
2014-07-16 23:25:00 +00:00
Justin Holewinski 3e037d98e6 [NVPTX] Rename registers %fl -> %fd and %rl -> %rd
This matches the internal behavior of NVIDIA tools like libnvvm.

llvm-svn: 213168
2014-07-16 16:26:58 +00:00
Tyler Nowicki 641d8a06bd Emit warnings if vectorization is forced and fails.
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.

Reviewed by: Arnold Schwaighofer

llvm-svn: 213110
2014-07-16 00:36:00 +00:00
Stepan Dyatkovskiy dee612d4f6 MergeFunc patch from Björn Steinbrink.
Phabricator ticket: D4246, Don't merge functions with different range metadata on call/invoke.
Thanks!

llvm-svn: 213060
2014-07-15 10:46:51 +00:00
Matt Arsenault f1a7e62033 Teach computeKnownBits to look through addrspacecast.
This fixes inferring alignment through an addrspacecast.

llvm-svn: 213030
2014-07-15 01:55:03 +00:00
Matt Arsenault 70f4db88d5 Teach GetUnderlyingObject / BasicAA about addrspacecast
llvm-svn: 213025
2014-07-15 00:56:40 +00:00
Matt Arsenault ee54aaef0c Convert test to FileCheck.
Check the individual test functions for more useful failure errors.

llvm-svn: 213021
2014-07-15 00:07:27 +00:00
Matt Arsenault caa9c71f32 Look through addrspacecast in IsConstantOffsetFromGlobal
llvm-svn: 213000
2014-07-14 22:39:26 +00:00
Matt Arsenault fd78d0c934 Look through addrspacecast in GetPointerBaseWithConstantOffset
llvm-svn: 212999
2014-07-14 22:39:22 +00:00
Matt Arsenault eb9e5f41a6 Convert test to FileCheck
llvm-svn: 212992
2014-07-14 21:59:26 +00:00
David Majnemer b8f435ca70 Fix a test broken in r212981
@icmp_sdiv_neg1 should have referred to %a instead of %call, it was
renamed at the last second.

llvm-svn: 212983
2014-07-14 20:46:04 +00:00
David Majnemer af9180fd04 InstSimplify: Correct sdiv x / -1
Determining the bounds of x/ -1 would start off with us dividing it by
INT_MIN.  Suffice to say, this would not work very well.

Instead, handle it upfront by checking for -1 and mapping it to the
range: [INT_MIN + 1, INT_MAX.  This means that the result of our
division can be any value other than INT_MIN.

llvm-svn: 212981
2014-07-14 20:38:45 +00:00
David Majnemer 5ea4fc0b33 InstSimplify: The upper bound of X / C was missing a rounding step
Summary:
When calculating the upper bound of X / -8589934592, we would perform
the following calculation: Floor[INT_MAX / 8589934592]

However, flooring the result would make us wrongly come to the
conclusion that 1073741824 was not in the set of possible values.
Instead, use the ceiling of the result.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4502

llvm-svn: 212976
2014-07-14 19:49:57 +00:00
Matt Arsenault 199b39e063 Look through addrspacecast when checking isDereferenceablePointer
llvm-svn: 212971
2014-07-14 18:54:12 +00:00
Nick Lewycky 703e488ed9 Don't eliminate memcpy's when the address of the pointer may itself be relevant. Fixes PR18304. Patch by David Wiberg!
llvm-svn: 212970
2014-07-14 18:52:02 +00:00
Aditya Nandakumar 0b5a674243 When we sink an instruction, this can open up opportunity for the operands to be sunk - add them to the worklist
llvm-svn: 212847
2014-07-11 21:49:39 +00:00
Marcello Maggioni 83442bb8b2 Added test for commit r212802 that was missing
llvm-svn: 212803
2014-07-11 10:36:00 +00:00
Duncan P. N. Exon Smith 04934b0fec InstCombine: Fix a crash in Descale for multiply-by-zero
Fix a crash in `InstCombiner::Descale()` when a multiply-by-zero gets
created as an argument to a GEP partway through an iteration, causing
-instcombine to optimize the GEP before the multiply.

rdar://problem/17615671

llvm-svn: 212742
2014-07-10 17:13:27 +00:00
Hal Finkel a71fe078c8 A test case for not asserting in isDereferenceablePointer upon unsized types
This is the test case for r212687.

llvm-svn: 212688
2014-07-10 07:04:37 +00:00
Hal Finkel 2e42c34d05 Allow isDereferenceablePointer to look through some bitcasts
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a small type, we
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.

In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer though
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).

llvm-svn: 212686
2014-07-10 05:27:53 +00:00
Adam Nemet 2820a5b9e9 [X86] AVX512: Enable it in the Loop Vectorizer
This lets us experiment with 512-bit vectorization without passing
force-vector-width manually.

The code generated for a simple integer memset loop is properly vectorized.
Disassembly is still broken for it though :(.

llvm-svn: 212634
2014-07-09 18:22:33 +00:00
Sanjay Patel 7ae7a831b9 removed duplicate testcase
llvm-svn: 212632
2014-07-09 17:49:58 +00:00
Sanjay Patel 58814445d4 Fix for PR20059 (instcombine reorders shufflevector after instruction that may trap)
In PR20059 ( http://llvm.org/pr20059 ), instcombine eliminates shuffles that are necessary before performing an operation that can trap (srem).

This patch calls isSafeToSpeculativelyExecute() and bails out of the optimization in SimplifyVectorOp() if needed.

Differential Revision: http://reviews.llvm.org/D4424

llvm-svn: 212629
2014-07-09 16:34:54 +00:00
Pete Cooper 91e4ba2f88 Revert "GlobalDCE: Delete available_externally initializers if it allows removing the value the initializer is referring to."
This reverts commit 5b55a47e94e28fbb56d0cd5d72c3db9105c15b4c.

A test case was found to crash after this was applied.  I'll file a bug to track fixing this with the test case needed.

llvm-svn: 212550
2014-07-08 17:06:03 +00:00
Sanjay Patel a932da8f35 Fix for PR17073 ( http://llvm.org/pr17073 ), simplifycfg illegally hoists an operation in a phi node that can trap.
This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those.

The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected.

llvm-svn: 212490
2014-07-07 21:19:00 +00:00
Tim Northover 55beb64bd0 CodeGen: it turns out that NAND is not the same thing as BIC. At all.
We've been performing the wrong operation on ARM for "atomicrmw nand" for
years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b".
This bled over into the generic expansion pass.

So I assume no-one has ever actually tried to do an atomic nand in the real
world. Oh well.

llvm-svn: 212443
2014-07-07 09:06:35 +00:00
David Majnemer d1bea693e2 IR: Fold away compares between GV GEPs and GVs
A GEP of a non-weak global variable will not be equivalent to another
non-weak global variable or a GEP of such a variable.

Differential Revision: http://reviews.llvm.org/D4238

llvm-svn: 212360
2014-07-04 22:05:26 +00:00
Benjamin Kramer 3c5b126239 GlobalDCE: Delete available_externally initializers if it allows removing the value the initializer is referring to.
This is useful for functions that are not actually available externally but
referenced by a vtable of some kind. Clang emits functions like this for the MS
ABI.

PR20182.

llvm-svn: 212337
2014-07-04 12:36:05 +00:00
Benjamin Kramer a420df2999 InstCombine: Strength reduce sadd.with.overflow into a regular nsw add if we can prove that it cannot overflow.
PR20194

llvm-svn: 212331
2014-07-04 10:22:21 +00:00
David Majnemer 651ed5e8fd InstSimplify: Fix a bug when INT_MIN is in a sdiv
When INT_MIN is the numerator in a sdiv, we would not properly handle
overflow when calculating the bounds of possible values; abs(INT_MIN) is
not a meaningful number.

Instead, check and handle INT_MIN by reasoning that the largest value is
INT_MIN/-2 and the smallest value is INT_MIN.

This fixes PR20199.

llvm-svn: 212307
2014-07-04 00:23:39 +00:00
Richard Trieu f2a795241a Add new lines to debugging information.
Differential Revision: http://reviews.llvm.org/D4262

llvm-svn: 212250
2014-07-03 02:11:49 +00:00
David Majnemer f28e2a4282 InstCombine: Optimize x/INT_MIN to x==INT_MIN
The result of x/INT_MIN is either 0 or 1, we can just use an icmp
instead.

llvm-svn: 212167
2014-07-02 06:42:13 +00:00
David Majnemer e18d302ef4 InstCombine: Add a vector variant test for PR20186
No functional change, just adding more test coverage that was meant to
go in with r212164.

llvm-svn: 212165
2014-07-02 06:14:13 +00:00
David Majnemer bdeef602e9 InstCombine: Don't turn -(x/INT_MIN) -> x/INT_MIN
It is not safe to negate the smallest signed integer, doing so yields
the same number back.

This fixes PR20186.

llvm-svn: 212164
2014-07-02 06:07:09 +00:00
David Blaikie e844cd5305 DebugInfo: Keep track of subprograms who's arguments have been promoted.
Matching behavior with DeadArgumentElimination (and leveraging some
now-common infrastructure), keep track of the function from debug info
metadata if arguments are promoted.

This may produce interesting debug info - since the arguments may be
missing or of different types... but at least backtraces, inlining, etc,
will be correct.

llvm-svn: 212128
2014-07-01 21:13:37 +00:00
David Majnemer 5c92115972 GlobalOpt: Don't swap private for internal linkage
There were transforms whose *intent* was to downgrade the linkage of
external objects to have internal linkage.

However, it fired on things with private linkage as well.

llvm-svn: 212104
2014-07-01 15:26:50 +00:00
David Majnemer 9797abb0bf GlobalOpt: FileCheck-ize test
No functionality change.

llvm-svn: 212103
2014-07-01 15:26:47 +00:00
David Majnemer 0e2cc2a519 GlobalOpt: Handle non-zero offsets for aliases
An alias with an aliasee of a non-zero GEP is not trivially replacable
with it's aliasee.

llvm-svn: 212079
2014-07-01 00:30:56 +00:00
Gerolf Hoflehner 734f4c8984 Suppress inlining when the block address is taken
Inlining functions with block addresses can cause many problem and requires a
rich infrastructure to support including escape analysis.  At this point the
safest approach to address these problems is by blocking inlining from
happening.

Background:
There have been reports on Ruby segmentation faults triggered by inlining
functions with block addresses like

//Ruby code snippet
vm_exec_core() {
    finish_insn_seq_0 = &&INSN_LABEL_finish;
    INSN_LABEL_finish:
      ;
}

This kind of scenario can also happen when LLVM picks a subset of blocks for
inlining, which is the case with the actual code in the Ruby environment.

LLVM suppresses inlining for such functions when there is an indirect branch.
The attached patch does so even when there is no indirect branch.  Note that
user code like above would not make much sense: using the global for jumping
across function boundaries would be illegal.

Why was there a segfault:

In the snipped above the block with the label is recognized as dead So it is
eliminated. Instead of a block address the cloner stores a constant (sic!) into
the global resulting in the segfault (when the global is used in a goto).

Why had it worked in the past then:

By luck. In older versions vm_exec_core was also inlined but the label address
used was the block label address in vm_exec_core.  So the global jump ended up
in the original function rather than in the caller which accidentally happened
to work.

Test case ./tools/clang/test/CodeGen/indirect-goto.c will fail as a result
of this commit.

rdar://17245966

llvm-svn: 212077
2014-07-01 00:19:34 +00:00
Reid Kleckner cdb4e64a20 Convert some byval argpromotion grep tests to FileCheck
Surprisingly, the i32* byval parameter is not transformed by
argpromotion.

llvm-svn: 212067
2014-06-30 20:44:28 +00:00
David Blaikie 644d2eee59 DebugInfo: Preserve debug location information when transforming a call into an invoke during inlining.
This both improves basic debug info quality, but also fixes a larger
hole whenever we inline a call/invoke without a location (debug info for
the entire inlining is lost and other badness that the debug info
emission code is currently working around but shouldn't have to).

llvm-svn: 212065
2014-06-30 20:30:39 +00:00
David Blaikie ba405c22b8 Remove unnecessary datalayout string from a test case.
llvm-svn: 212063
2014-06-30 20:26:12 +00:00
Erik Eckstein 5b18a09748 test commit: add a comment line in GVN test file
llvm-svn: 212019
2014-06-30 07:19:02 +00:00
Dinesh Dwivedi adc07739a9 Added instruction combine to transform few more negative values addition to subtraction (Part 3)
This patch enables transforms for

(x + (~(y | c) + 1) --> x - (y | c) if c is odd

Differential Revision: http://reviews.llvm.org/D4210

llvm-svn: 211881
2014-06-27 07:47:35 +00:00
David Majnemer 9930398c5c GlobalOpt: Fix constantfold-initializers.ll test
The test added in r211762 was sloppy, the correct initializer wasn't
added to @llvm.global_ctors

Spotted by Pasi Parviainen!

llvm-svn: 211879
2014-06-27 07:36:26 +00:00
David Blaikie b0cdf530c3 ArgumentPromotion: Propagate debug locations on calls for which arguments are promoted.
llvm-svn: 211872
2014-06-27 05:32:09 +00:00
Arnold Schwaighofer ed988fb97d GVN: Preserve invariant.load metadata
If both instructions to be replaced are marked invariant the resulting
instruction is invariant.

rdar://13358910

Fix by Erik Eckstein!

llvm-svn: 211801
2014-06-26 19:51:19 +00:00
Dinesh Dwivedi 99281a0615 This patch removed duplicate code for matching patterns
which are now handled in SimplifyUsingDistributiveLaws() 
(after r211261)

Differential Revision: http://reviews.llvm.org/D4253

llvm-svn: 211768
2014-06-26 08:57:33 +00:00
Dinesh Dwivedi a716173581 Added instruction combine to transform few more negative values addition to subtraction (Part 2)
This patch enables transforms for

(x + (~(y | c) + 1)   -->   x - (y | c) if c is even

Differential Revision: http://reviews.llvm.org/D4209

llvm-svn: 211765
2014-06-26 05:40:22 +00:00
David Majnemer 6098b2f519 GlobalOpt: Don't optimize thread_local for initializers
Folding a reference to a thread_local variable into another global
variable's initializer is very problematic, there is no relocation that
exists to represent such an access.

llvm-svn: 211762
2014-06-26 03:02:19 +00:00
Hans Wennborg b03ebfb77e Don't build switch tables for dllimport and TLS variables in GEPs
This is a follow-up to r211331, which failed to notice that we were
returning early from ValidLookupTableConstant for GEPs.

llvm-svn: 211753
2014-06-26 00:30:52 +00:00
Tyler Nowicki 4b07b00786 Add Rpass-missed and Rpass-analysis reports to the loop vectorizer. The remarks give the vector width of vectorized loops and a brief analysis of loops that fail to be vectorized. For example, an analysis will be generated for loops containing control flow that cannot be simplified to a select. The optimization remarks also give the debug location of expressions that cannot be vectorized, for example the location of an unvectorizable call.
Reviewed by: Arnold Schwaighofer

llvm-svn: 211721
2014-06-25 17:50:15 +00:00
Eli Bendersky 5d5e18da3e Rename loop unrolling and loop vectorizer metadata to have a common prefix.
[LLVM part]

These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix.  Metadata name
changes:

llvm.vectorizer.* => llvm.loop.vectorizer.*
llvm.loopunroll.* => llvm.loop.unroll.*

This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata. 

Patch by Mark Heffernan.

llvm-svn: 211710
2014-06-25 15:41:00 +00:00
Evgeniy Stepanov 10280dac1d [LICM] Don't create more than one copy of an instruction per loop exit block when sinking.
Fixes exponential compilation complexity in PR19835, caused by
LICM::sink not handling the following pattern well:

f = op g
e = op f, g
d = op e
c = op d, e
b = op c
a = op b, c

When an instruction with N uses is sunk, each of its operands gets N
new uses (all of them - phi nodes). In the example above, if a had 1
use, c would have 2, e would have 4, and g would have 8.

llvm-svn: 211673
2014-06-25 07:54:58 +00:00
Diego Novillo 56653fdada Add new debug kind LocTrackingOnly.
Summary:
This new debug emission kind supports emitting line location
information in all instructions, but stops code generation
from emitting debug info to the final output.

This mode is useful when the backend wants to track source
locations during code generation, but it does not want to
produce debug info. This is currently used by optimization
remarks (-pass-remarks, -pass-remarks-missed and
-pass-remarks-analysis).

To prevent debug info emission, DIBuilder never inserts the
annotation 'llvm.dbg.cu' when LocTrackingOnly is enabled.

Reviewers: echristo, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4234

llvm-svn: 211609
2014-06-24 17:02:03 +00:00
Benjamin Kramer c96a7f88b9 InstCombine: Disable umul.with.overflow recognition for vectors.
It doesn't make a lot on most targets and the code isn't ready for it. PR20113.

llvm-svn: 211583
2014-06-24 10:47:52 +00:00
Benjamin Kramer 6de786666a InstCombine: Don't try to reorder shuffles where the mask is a ConstantExpr.
We can't analyze the individual values of a vector expression. PR20114.

llvm-svn: 211581
2014-06-24 10:38:10 +00:00
David Majnemer 23fc9afa4d GlobalOpt: Don't optimize dllimport for initializers
Referencing a dllimport variable requires actually instructions, not
just a relocation.  This fixes PR19955.

Differential Revision: http://reviews.llvm.org/D4249

llvm-svn: 211571
2014-06-24 06:53:45 +00:00
Benjamin Kramer f504ec2298 Add a description to the test from r211433 explaining why it's written that way.
llvm-svn: 211465
2014-06-22 12:22:04 +00:00
Arnold Schwaighofer c11107cb1e LoopVectorizer: Fix a dominance issue
The induction variables start value needs to be defined before we branch
(overflow check) to the scalar preheader where we used it.

llvm-svn: 211460
2014-06-22 03:38:59 +00:00
Benjamin Kramer 0bf086f80f LoopUnrollRuntime: Check for overflow in the trip count calculation.
Fixes PR19823.

llvm-svn: 211436
2014-06-21 13:46:25 +00:00
Benjamin Kramer 8dd637aa04 SCEVExpander: Fold constant PHIs harder. The logic below only understands proper IVs.
PR20093.

llvm-svn: 211433
2014-06-21 11:47:18 +00:00
Stepan Dyatkovskiy 6baeb8805c Commited patch from Björn Steinbrink:
Summary:
Different range metadata can lead to different optimizations in later
passes, possibly breaking the semantics of the merged function. So range
metadata must be taken into consideration when comparing Load
instructions.

Thanks!

llvm-svn: 211391
2014-06-20 19:11:56 +00:00
Karthik Bhat e03a25da70 Add Support to Recognize and Vectorize NON SIMD instructions in SLPVectorizer.
This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and
vectorizes them as vector shuffles if they are profitable.
These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86.
Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015 

llvm-svn: 211339
2014-06-20 04:32:48 +00:00
Hans Wennborg 4dc895164a Don't build switch lookup tables for dllimport or TLS variables
We would previously put dllimport variables in switch lookup tables, which
doesn't work because the address cannot be used in a constant initializer.
This is basically the same problem that we have in PR19955.

Putting TLS variables in switch tables also desn't work, because the
address of such a variable is not constant.

Differential Revision: http://reviews.llvm.org/D4220

llvm-svn: 211331
2014-06-20 00:38:12 +00:00
Jingyue Wu 37fcb5919d [ValueTracking] Extend range metadata to call/invoke
Summary:
With this patch, range metadata can be added to call/invoke including
IntrinsicInst. Previously, it could only be added to load.

Rename computeKnownBitsLoad to computeKnownBitsFromRangeMetadata because
range metadata is not only used by load.

Update the language reference to reflect this change.

Test Plan:
Add several tests in range-2.ll to confirm the verifier is happy with
having range metadata on call/invoke.

Add two tests in AddOverFlow.ll to confirm annotating range metadata to
call/invoke can benefit InstCombine.

Reviewers: meheff, nlewycky, reames, hfinkel, eliben

Reviewed By: eliben

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4187

llvm-svn: 211281
2014-06-19 16:50:16 +00:00
Dinesh Dwivedi 562fd7534c Added instruction combine to transform few more negative values addition to subtraction (Part 1)
This patch enables transforms for following patterns.
  (x + (~(y & c) + 1)   -->   x - (y & c)
  (x + (~((y >> z) & c) + 1)   -->   x - ((y>>z) & c)

Differential Revision: http://reviews.llvm.org/D3733

llvm-svn: 211266
2014-06-19 10:36:52 +00:00
Dinesh Dwivedi b62e52e1b5 Refactored and updated SimplifyUsingDistributiveLaws() to
* Find factorization opportunities using identity values.
 * Find factorization opportunities by treating shl(X, C) as mul (X, shl(C))
 * Keep NSW flag while simplifying instruction using factorization.

This fixes PR19263.

Differential Revision: http://reviews.llvm.org/D3799

llvm-svn: 211261
2014-06-19 08:29:18 +00:00
David Majnemer 6cf6c05322 InstCombine: Stop two transforms dueling
InstCombineMulDivRem has:
// Canonicalize (X+C1)*CI -> X*CI+C1*CI.

InstCombineAddSub has:
// W*X + Y*Z --> W * (X+Z)  iff W == Y

These two transforms could fight with each other if C1*CI would not fold
away to something simpler than a ConstantExpr mul.

The InstCombineMulDivRem transform only acted on ConstantInts until
r199602 when it was changed to operate on all Constants in order to
let it fire on ConstantVectors.

To fix this, make this transform more careful by checking to see if we
actually folded away C1*CI.

This fixes PR20079.

llvm-svn: 211258
2014-06-19 07:14:33 +00:00
Nick Lewycky 8561a49c27 Move optimization of some cases of (A & C1)|(B & C2) from instcombine to instsimplify. Patch by Rahul Jain, plus some last minute changes by me -- you can blame me for any bugs.
llvm-svn: 211252
2014-06-19 03:51:46 +00:00
Nick Lewycky c961030ac2 Make instsimplify's analysis of icmp eq/ne use computeKnownBits to determine whether the icmp is always true or false. Patch by Suyog Sarda!
llvm-svn: 211251
2014-06-19 03:35:49 +00:00
Matt Arsenault a0050b0961 R600/SI: Add intrinsics for various math instructions.
These will be used for custom lowering and for library
implementations of various math functions, so it's useful
to expose these as builtins.

llvm-svn: 211247
2014-06-19 01:19:19 +00:00
Dinesh Dwivedi 657105e582 Fixed jump threading going to infinite loop.
This patch add code to remove unreachable blocks from function
as they may cause jump threading to stuck in infinite loop.

Differential Revision: http://reviews.llvm.org/D3991

llvm-svn: 211103
2014-06-17 14:34:19 +00:00
Jingyue Wu 33bd53df7f [InstCombine] mark ADD with nuw if no unsigned overflow
Summary:
As a starting step, we only use one simple heuristic: if the sign bits
of both a and b are zero, we can prove "add a, b" do not unsigned
overflow, and thus convert it to "add nuw a, b".

Updated all affected tests and added two new tests (@zero_sign_bit and
@zero_sign_bit2) in AddOverflow.ll

Test Plan: make check-all

Reviewers: eliben, rafael, meheff, chandlerc

Reviewed By: chandlerc

Subscribers: chandlerc, llvm-commits

Differential Revision: http://reviews.llvm.org/D4144

llvm-svn: 211084
2014-06-17 00:42:07 +00:00
Duncan P. N. Exon Smith 73686d305a SROA: Only split loads on byte boundaries
r199771 accidently broke the logic that makes sure that SROA only splits
load on byte boundaries.  If such a split happens, some bits get lost
when reassembling loads of wider types, causing data corruption.

Move the width check up to reject such splits early, avoiding the
corruption.  Fixes PR19250.

Patch by: Björn Steinbrink <bsteinbr@gmail.com>

llvm-svn: 211082
2014-06-17 00:19:35 +00:00
Eli Bendersky ff90324599 Teach LoopUnrollPass to respect loop unrolling hints in metadata.
[This is resubmitting r210721, which was reverted due to suspected breakage
which turned out to be unrelated].

Some extra review comments were addressed. See D4090 and D4147 for more details.

The Clang change that produces this metadata was committed in r210667

Patch by Mark Heffernan.

llvm-svn: 211076
2014-06-16 23:53:02 +00:00
Jim Grosbach fff5663d48 LowerSwitch: track bounding range for the condition tree.
When LowerSwitch transforms a switch instruction into a tree of ifs it
is actually performing a binary search into the various case ranges, to
see if the current value falls into one cases range of values.

So, if we have a program with something like this:

switch (a) {
case 0:
  do0();
  break;
case 1:
  do1();
  break;
case 2:
  do2();
  break;
default:
  break;
}

the code produced is something like this:

  if (a < 1) {
    if (a == 0) {
      do0();
    }
  } else {
    if (a < 2) {
      if (a == 1) {
        do1();
      }
    } else {
      if (a == 2) {
        do2();
      }
    }
  }

This code is inefficient because the check (a == 1) to execute do1() is
not needed.

The reason is that because we already checked that (a >= 1) initially by
checking that also  (a < 2) we basically already inferred that (a == 1)
without the need of an extra basic block spawned to check if actually (a
== 1).

The patch addresses this problem by keeping track of already
checked bounds in the LowerSwitch algorithm, so that when the time
arrives to produce a Leaf Block that checks the equality with the case
value / range the algorithm can decide if that block is really needed
depending on the already checked bounds .

For example, the above with "a = 1" would work like this:

the bounds start as LB: NONE , UB: NONE
as (a < 1) is emitted the bounds for the else path become LB: 1 UB:
NONE. This happens because by failing the test (a < 1) we know that the
value "a" cannot be smaller than 1 if we enter the else branch.
After the emitting the check (a < 2) the bounds in the if branch become
LB: 1 UB: 1. This is because by checking that "a" is smaller than 2 then
the upper bound becomes 2 - 1 = 1.

When it is time to emit the leaf block for "case 1:" we notice that 1
can be squeezed exactly in between the LB and UB, which means that if we
arrived to that block there is no need to emit a block that checks if (a
== 1).

Patch by: Marcello Maggioni <hayarms@gmail.com>

llvm-svn: 211038
2014-06-16 16:55:20 +00:00
Jingyue Wu baabe5091c Canonicalize addrspacecast ConstExpr between different pointer types
As a follow-up to r210375 which canonicalizes addrspacecast
instructions, this patch canonicalizes addrspacecast constant
expressions.

Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast
cosntant expressions, this patch is also a step towards having the
frontend emit canonicalized addrspacecasts.

Piggyback a minor refactor in InstCombineCasts.cpp

Update three affected tests in addrspacecast-alias.ll,
access-non-generic.ll and constant-fold-gep.ll and added one new test in
constant-fold-address-space-pointer.ll

llvm-svn: 211004
2014-06-15 21:40:57 +00:00
Jiangning Liu 96e92c1d75 Move GlobalMerge from Transform to CodeGen.
This patch is to move GlobalMerge pass from Transform/Scalar                                                           
to CodeGen, because GlobalMerge depends on TargetMachine.
In the mean time, the macro INITIALIZE_TM_PASS is also moved
to CodeGen/Passes.h. With this fix we can avoid making
libScalarOpts depend on libCodeGen.

llvm-svn: 210951
2014-06-13 22:57:59 +00:00
Tim Northover 20b9f739eb Atomics: make use of the "cmpxchg weak" instruction.
This also simplifies the IR we create slightly: instead of working out
where success & failure should go manually, it turns out we can just
always jump to a success/failure block created for the purpose. Later
phases will sort out the mess without much difficulty.

llvm-svn: 210917
2014-06-13 16:45:52 +00:00
Tim Northover d039abdeeb Atomics: switch direction of cmpxchg comparison
This has two benefits: it makes the result more suitable for direct
insertaion into the struct to emulate the new cmpxchg, and it means
the name we give the instruction matches its actual effect better.

llvm-svn: 210916
2014-06-13 16:45:36 +00:00
Tim Northover 6bf04e4512 SCCP: update for cmpxchg returning { iN, i1 } now.
I accidentally missed this one since its use looked OK locally.

llvm-svn: 210909
2014-06-13 14:54:09 +00:00
Tim Northover 420a216817 IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.

As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.

At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.

By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.

Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.

Summary for out of tree users:
------------------------------

+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.

llvm-svn: 210903
2014-06-13 14:24:07 +00:00
Duncan P. N. Exon Smith fd5c553f54 GVN: Enable value forwarding for calloc
Enable value forwarding for loads from `calloc()` without an intervening
store.

This change extends GVN to handle the following case:

    %1 = tail call noalias i8* @calloc(i64 1, i64 4)
    %2 = bitcast i8* %1 to i32*
    ; This load is trivially constant zero
    %3 = load i32* %2, align 4

This is analogous to the handling for `malloc()` in the same places.
`malloc()` returns `undef`; `calloc()` returns a zero value.  Note that
it is correct to return zero even for out of bounds GEPs since the
result of such a GEP would be undefined.

Patch by Philip Reames!

llvm-svn: 210828
2014-06-12 21:16:19 +00:00
Eli Bendersky dc6de2ce29 Revert r210721 as it causes breakage in internal builds (and possibly GDB).
llvm-svn: 210807
2014-06-12 18:05:39 +00:00
Dinesh Dwivedi 95f0d51bd3 This removes TODO added in http://reviews.llvm.org/D3658
The patch transforms

ABS(NABS(X)) -> ABS(X)
NABS(ABS(X)) -> NABS(X)

Differential Revision: http://reviews.llvm.org/D4040

llvm-svn: 210782
2014-06-12 14:06:00 +00:00
Eli Bendersky 899bef099f Teach LoopUnrollPass to respect loop unrolling hints in metadata.
See http://reviews.llvm.org/D4090 for more details.

The Clang change that produces this metadata was committed in r210667

Patch by Mark Heffernan.

llvm-svn: 210721
2014-06-11 23:15:35 +00:00
Chad Rosier 5ea14e09e9 [Reassociate] FileCheckize and cleanup a few testcases. No functional change
intended.

llvm-svn: 210685
2014-06-11 18:28:45 +00:00
Jiangning Liu b2ae37fb67 Global merge for global symbols.
This commit is to improve global merge pass and support global symbol merge.
The global symbol merge is not enabled by default. For aarch64, we need some
more back-end fix to make it really benifit ADRP CSE.

llvm-svn: 210640
2014-06-11 06:44:53 +00:00
Jiangning Liu 3e5b855a51 Rename global-merge to enable-global-merge.
llvm-svn: 210639
2014-06-11 06:35:26 +00:00
Juergen Ributzka b2e4edb5c8 [ConstantHoisting][X86] Improve the cost model for small constants with large types (i64 and above).
This improves the X86 cost model for small constants with large types. Before
this commit we would even hoist trivial constants such as i96 2.

This is related to <rdar://problem/17070936>

llvm-svn: 210504
2014-06-10 00:32:29 +00:00
Alp Toker d3d017cf00 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Matt Arsenault 44f60d0a60 Look through addrspacecasts when turning ptr comparisons into
index comparisons.

llvm-svn: 210488
2014-06-09 19:20:29 +00:00
Stepan Dyatkovskiy 297c826385 Added functions cross-reference test.
Originally this similar was initiated by Björn Steinbrink here:
http://reviews.llvm.org/D3437

Bug itself has been fixed by principal changes in MergeFunctions. Though
special checks for functions merging are still actual. And the test has
been accepted with slight modifications.

llvm-svn: 210486
2014-06-09 19:03:02 +00:00
Jingyue Wu 5c7b1aed5d [SeparateConstOffsetFromGEP] inbounds zext => sext for better splitting
For each array index that is in the form of zext(a), convert it to sext(a)
if we can prove zext(a) <= max signed value of typeof(a). The conversion
helps to split zext(x + y) into sext(x) + sext(y).

Reviewed in http://reviews.llvm.org/D4060

llvm-svn: 210444
2014-06-08 23:49:34 +00:00
Jingyue Wu f2b85881d4 [SeparateConstOffsetFromGEP] make two tests more strict
inbounds are not necessary in these two tests. zext(a +nuw b) = zext(a) +
zext(b) should hold with or without inbounds.

llvm-svn: 210437
2014-06-08 20:01:42 +00:00
Rafael Espindola 4ba22f0813 Revert 209903 and 210040.
The messages were

 "PR19753: Optimize comparisons with "ashr exact" of a constanst."
 "Added support to optimize comparisons with "lshr exact" of a constant."

They were not correctly handling signed/unsigned operation differences,
causing pr19958.

llvm-svn: 210393
2014-06-07 04:12:35 +00:00
Jingyue Wu 77145d9410 InstCombine: Canonicalize addrspacecast between different element types
addrspacecast X addrspace(M)* to Y addrspace(N)*

-->

bitcast X addrspace(M)* to Y addrspace(M)*
addrspacecast Y addrspace(M)* to Y addrspace(N)*

Updat all affected tests and add several new tests in addrspacecast.ll.

This patch is based on http://reviews.llvm.org/D2186 (authored by Matt
Arsenault) with fixes and more tests.

llvm-svn: 210375
2014-06-06 21:52:55 +00:00
Michael Zolotukhin 07ee478829 Fix typo in a test from r210342.
llvm-svn: 210343
2014-06-06 15:49:47 +00:00
Michael Zolotukhin 1c51612d42 [SLP] Enable vectorization of GEP expressions.
The use cases look like the following:
    x->a = y->a + 10
    x->b = y->b + 12

llvm-svn: 210342
2014-06-06 15:34:24 +00:00
Dinesh Dwivedi 3217b6c661 Added select flavour for ABS and NEG(ABS)
This patch can identify 
  ABS(X) ==> (X >s 0) ? X : -X and (X >s -1) ? X : -X
  ABS(X) ==> (X <s 0) ? -X : X and (X <s 1) ? -X : X
  NABS(X) ==> (X >s 0) ? -X : X and (X >s -1) ? -X : X
  NABS(X) ==> (X <s 0) ? X : -X and (X <s 1) ? X : -X
  
and can transform
  ABS(ABS(X)) -> ABS(X)
  NABS(NABS(X)) -> NABS(X)
  
Differential Revision: http://reviews.llvm.org/D3658

llvm-svn: 210312
2014-06-06 06:54:45 +00:00
Karthik Bhat bf56d44cab Fix PR19657 (scalar loads not combined into vector load)
If we have common uses on separate paths in the tree; process the one with greater common depth first.
This makes sure that we do not assume we need to extract a load when it is actually going to be part of a vectorized tree.

Review: http://reviews.llvm.org/D3800
llvm-svn: 210310
2014-06-06 06:20:08 +00:00
Rafael Espindola 42a4c9f9e0 Allow aliases to be unnamed_addr.
Alias with unnamed_addr were in a strange state. It is stored in GlobalValue,
the language reference talks about "unnamed_addr aliases" but the verifier
was rejecting them.

It seems natural to allow unnamed_addr in aliases:

* It is a property of how it is accessed, not of the data itself.
* It is perfectly possible to write code that depends on the address
of an alias.

This patch then makes unname_addr legal for aliases. One side effect is that
the syntax changes for a corner case: In globals, unnamed_addr is now printed
before the address space.

llvm-svn: 210302
2014-06-06 01:20:28 +00:00
Jingyue Wu 84465473e7 Fixed several correctness issues in SeparateConstOffsetFromGEP
Most issues are on mishandling s/zext.

Fixes:

1. When rebuilding new indices, s/zext should be distributed to
sub-expressions. e.g., sext(a +nsw (b +nsw 5)) = sext(a) + sext(b) + 5 but not
sext(a + b) + 5. This also affects the logic of recursively looking for a
constant offset, we need to include s/zext into the context of the searching.

2. Function find should return the bitwidth of the constant offset instead of
always sign-extending it to i64.

3. Stop shortcutting zext'ed GEP indices. LLVM conceptually sign-extends GEP
indices to pointer-size before computing the address. Therefore, gep base,
zext(a + b) != gep base, a + b

Improvements:

1. Add an optimization for splitting sext(a + b): if a + b is proven
non-negative (e.g., used as an index of an inbound GEP) and one of a, b is
non-negative, sext(a + b) = sext(a) + sext(b)

2. Function Distributable checks whether both sext and zext can be distributed
to operands of a binary operator. This helps us split zext(sext(a + b)) to
zext(sext(a) + zext(sext(b)) when a + b does not signed or unsigned overflow.

Refactoring:

Merge some common logic of handling add/sub/or in find.

Testing:

Add many tests in split-gep.ll and split-gep-and-gvn.ll to verify the changes
we made.

llvm-svn: 210291
2014-06-05 22:07:33 +00:00
Rafael Espindola c286f4bf2a Add a testcase where there is an overflow when combining two constants.
I noticed that a proposed optimization would have prevented this.

llvm-svn: 210287
2014-06-05 21:29:49 +00:00
Nick Lewycky ff114dae5a Fix coverage for files with global constructors again. Adds a testcase to the commit from r206671, as requested by David Blaikie.
llvm-svn: 210239
2014-06-05 04:31:43 +00:00
Alexey Samsonov ad81f0f419 Use AArch64 instead of now removed ARM64 in test configs
llvm-svn: 210229
2014-06-05 00:25:30 +00:00
Rafael Espindola 04c2258624 InstCombine: Improvement to check if signed addition overflows.
This patch implements two things:

1. If we know one number is positive and another is negative, we return true as
    signed addition of two opposite signed numbers will never overflow.

2. Implemented TODO : If one of the operands only has one non-zero bit, and if
    the other operand has a known-zero bit in a more significant place than it
    (not including the sign bit) the ripple may go up to and fill the zero, but
    won't change the sign. e.x -  (x & ~4) + 1

We make sure that we are ignoring 0 at MSB.

Patch by Suyog Sarda.

llvm-svn: 210186
2014-06-04 15:39:14 +00:00
Nick Lewycky 5f53ddd0cc Ignore line numbers on debug intrinsics. Add an assert to ensure that we aren't emitting line number zero, the .gcno format uses this to indicate that the next field is a filename.
llvm-svn: 210068
2014-06-03 04:25:36 +00:00
Rafael Espindola 64c1e18033 Allow alias to point to an arbitrary ConstantExpr.
This  patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is
up to MC (or the system assembler) to decide if that expression is valid or not.

This reduces our ability to diagnose invalid uses and how early we can spot
them, but it also lets us do things like

@test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32),
                                 i32 ptrtoint (i32* @bar to i32)) to i32*)

An important implication of this patch is that the notion of aliased global
doesn't exist any more. The alias has to encode the information needed to
access it in its metadata (linkage, visibility, type, etc).

Another consequence to notice is that getSection has to return a "const char *".
It could return a NullTerminatedStringRef if there was such a thing, but when
that was proposed the decision was to just uses "const char*" for that.

llvm-svn: 210062
2014-06-03 02:41:57 +00:00
Rafael Espindola d1a2c2d905 Add back commit r210029.
The code was actually correct. Sorry for the confusion. I have expanded the
comment saying why the analysis is valid to avoid me misunderstaning it
again in the future.

llvm-svn: 210052
2014-06-02 22:01:04 +00:00
Rafael Espindola 80546be566 Convert test to FileCheck.
llvm-svn: 210049
2014-06-02 21:23:54 +00:00
Rafael Espindola 582c890fbe Revert "Add the nsw flag when we detect that an add will not signed overflow."
This reverts commit r210029.

It was not correctly handling cases where LHS and RHS had multiple but different
sign bits.

llvm-svn: 210048
2014-06-02 21:12:19 +00:00
Rafael Espindola 6b04ef785e Added support to optimize comparisons with "lshr exact" of a constant.
Patch by Rahul Jain.

llvm-svn: 210040
2014-06-02 19:19:04 +00:00
Rafael Espindola 82899febf0 Add the nsw flag when we detect that an add will not signed overflow.
We already had a function for checking this, we were just using it only in
specialized cases.

llvm-svn: 210029
2014-06-02 14:32:58 +00:00
Dinesh Dwivedi ce5d35a9d0 Added inst combine tarnsform for (1 << X) & C pattrens where C is (some PowerOf2 - 1)
This patch can handles following cases from http://nondot.org/sabre/LLVMNotes/InstCombine.txt
  "((1 << X) & 7) == 0" ==> "X > 2"
  "((1 << X) & 7) != 0" ==> "X < 3".

Differential Revision: http://reviews.llvm.org/D3678

llvm-svn: 210007
2014-06-02 07:57:24 +00:00
Dinesh Dwivedi 43e127bded Added inst combine transforms for single bit tests from Chris's note
if ((x & C) == 0) x |= C becomes x |= C
if ((x & C) != 0) x ^= C becomes x &= ~C
if ((x & C) == 0) x ^= C becomes x |= C
if ((x & C) != 0) x &= ~C becomes x &= ~C
if ((x & C) == 0) x &= ~C becomes nothing

Differential Revision: http://reviews.llvm.org/D3777

llvm-svn: 210006
2014-06-02 07:24:36 +00:00
Benjamin Kramer 4968944376 [Reassociate] Similar to "X + -X" -> "0", added code to handle "X + ~X" -> "-1".
Handle "X + ~X" -> "-1" in the function Value *Reassociate::OptimizeAdd(Instruction *I, SmallVectorImpl<ValueEntry> &Ops);
This patch implements:
TODO: We could handle "X + ~X" -> "-1" if we wanted, since "-X = ~X+1".

Patch by Rahul Jain!

Differential Revision: http://reviews.llvm.org/D3835

llvm-svn: 209973
2014-05-31 15:01:54 +00:00
Matt Arsenault c8fc08c31b Make bitcast, extractelement, and insertelement considered cheap for speculation.
This helps more branches into selects. On R600,
vectors are cheap and anything that helps
remove branches is very good.

llvm-svn: 209914
2014-05-30 18:34:43 +00:00
Rafael Espindola c323952cb4 PR19753: Optimize comparisons with "ashr exact" of a constanst.
Patch by suyog sarda.

llvm-svn: 209903
2014-05-30 15:54:32 +00:00
Tim Northover b4ddc0845a ARM & AArch64: make use of common cmpxchg idioms after expansion
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.

When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.

This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.

I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.

rdar://problem/16227836

llvm-svn: 209883
2014-05-30 10:09:59 +00:00
Karthik Bhat 5ab7795649 Allow vectorization of intrinsics such as powi,cttz and ctlz in Loop and SLP Vectorizer.
This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in Vectorizer. These intrinsics are different from other
intrinsics as second argument to these function must be same in order to vectorize them and it should be represented as a scalar.
Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857

llvm-svn: 209873
2014-05-30 04:31:24 +00:00
Nick Lewycky 59633cb478 When analyzing params/args for readnone/readonly, don't forget to consider that a pointer argument may be passed through a callsite to the return, and that we may need to analyze it. Fixes a bug reported on llvm-dev: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073098.html
llvm-svn: 209870
2014-05-30 02:31:27 +00:00
Arnold Schwaighofer e2067680a6 LoopVectorizer: Add a check that the backedge taken count + 1 does not overflow
The loop vectorizer instantiates be-taken-count + 1 as the loop iteration count.
If this expression overflows the generated code was invalid.

In case of overflow the code now jumps to the scalar loop.

Fixes PR17288.

llvm-svn: 209854
2014-05-29 22:10:01 +00:00
Louis Gerbarg c6b506a0ae Add support for combining GEPs across PHI nodes
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:

  GEP1--\
        |-->PHI1-->GEP3
  GEP2--/

This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):

  GEP1--\                     --\                           --\
        |-->PHI1-->GEP3  ==>    |-->PHI2->GEP12->GEP3 == >    |-->PHI2->GEP123
  GEP2--/                     --/                           --/

This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.

Tests included.

llvm-svn: 209843
2014-05-29 20:29:47 +00:00
Rafael Espindola a248f536b3 Revert "Revert "Revert "InstCombine: Improvement to check if signed addition overflows."""
This reverts commit r209776.

It was miscompiling llvm::SelectionDAGISel::MorphNode.

llvm-svn: 209817
2014-05-29 14:39:16 +00:00
Dinesh Dwivedi d266cb1a0b LCSSA should be performed on the outermost affected loop while unrolling loop.
During loop-unroll, loop exits from the current loop may end up in in different
outer loop. This requires to re-form LCSSA recursively for one level down from
the outer most loop where loop exits are landed during unroll. This fixes PR18861.

Differential Revision: http://reviews.llvm.org/D2976

llvm-svn: 209796
2014-05-29 06:47:23 +00:00
Michael J. Spencer 289067cc3d Add LoadCombine pass.
This pass is disabled by default. Use -combine-loads to enable in -O[1-3]

Differential revision: http://reviews.llvm.org/D3580

llvm-svn: 209791
2014-05-29 01:55:07 +00:00
Rafael Espindola 6196b7430e Revert "Revert "InstCombine: Improvement to check if signed addition overflows.""
This reverts commit r209762, bringing back r209746. It was not responsible for the libc++ build failure

llvm-svn: 209776
2014-05-28 21:43:52 +00:00
Rafael Espindola 910528a3eb Revert "Add support for combining GEPs across PHI nodes"
This reverts commit r209755.

it was the real cause of the libc++ build failure.

llvm-svn: 209775
2014-05-28 21:41:21 +00:00
Rafael Espindola fb59b05ca4 Revert "InstCombine: Improvement to check if signed addition overflows."
This reverts commit r209746.

It looks it is causing a crash while building libcxx. I am trying to get a
reduced testcase.

llvm-svn: 209762
2014-05-28 18:48:10 +00:00
Louis Gerbarg 727f1cbb17 Add support for combining GEPs across PHI nodes
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:

  GEP1--\
        |-->PHI1-->GEP3
  GEP2--/

This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):

  GEP1--\                     --\                           --\
        |-->PHI1-->GEP3  ==>    |-->PHI2->GEP12->GEP3 == >    |-->PHI2->GEP123
  GEP2--/                     --/                           --/

This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.

Tests included.

llvm-svn: 209755
2014-05-28 17:38:31 +00:00
Rafael Espindola 085b57941f InstCombine: Improvement to check if signed addition overflows.
This patch implements two things:

1. If we know one number is positive and another is negative, we return true as
   signed addition of two opposite signed numbers will never overflow.

2. Implemented TODO : If one of the operands only has one non-zero bit, and if
   the other operand has a known-zero bit in a more significant place than it
   (not including the sign bit) the ripple may go up to and fill the zero, but
   won't change the sign. e.x -  (x & ~4) + 1

We make sure that we are ignoring 0 at MSB.

Patch by Suyog Sarda.

llvm-svn: 209746
2014-05-28 15:30:40 +00:00
Arnaud A. de Grandmaison de5ff26865 No need for those tests to go thru llvm-as and/or llvm-dis.
opt can handle them by itself.

llvm-svn: 209689
2014-05-27 22:03:28 +00:00
Jingyue Wu 76cbea6b6d Fixed a test in r209670
The test was outdated with r209537.

llvm-svn: 209671
2014-05-27 18:12:55 +00:00
Jingyue Wu 80a738dc62 Distribute sext/zext to the operands of and/or/xor
This is an enhancement to SeparateConstOffsetFromGEP. With this patch, we can
extract a constant offset from "s/zext and/or/xor A, B".

Added a new test @ext_or to verify this enhancement.

Refactoring the code, I also extracted some common logic to function
Distributable. 

llvm-svn: 209670
2014-05-27 18:00:00 +00:00
Filipe Cabecinhas e8d6a1e82f Post-commit fixes for r209643
Detected by Daniel Jasper, Ilia Filippov, and Andrea Di Biagio
Fixed the argument order to select (the mask semantics to blendv* are the
inverse of select) and fixed the tests
Added parenthesis to the assert condition
Ran clang-format

llvm-svn: 209667
2014-05-27 16:54:33 +00:00
Filipe Cabecinhas 82ac07c283 Convert some X86 blendv* intrinsics into IR.
Summary:
Implemented an InstCombine transformation that takes a blendv* intrinsic
call and translates it into an IR select, if the mask is constant.

This will eventually get lowered into blends with immediates if possible,
or pblendvb (with an option to further optimize if we can transform the
pblendvb into a blend+immediate instruction, depending on the selector).
It will also enable optimizations by the IR passes, which give up on
sight of the intrinsic.

Both the transformation and the lowering of its result to asm got shiny
new tests.

The transformation is a bit convoluted because of blendvp[sd]'s
definition:

Its mask is a floating point value! This forces us to convert it and get
the highest bit. I suppose this happened because the mask has type
__m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin.

I will send an email to llvm-dev to discuss if we want to change this or
not.

Reviewers: grosbach, delena, nadav

Differential Revision: http://reviews.llvm.org/D3859

llvm-svn: 209643
2014-05-27 03:42:20 +00:00
Tim Northover 3b0846e8f7 AArch64/ARM64: move ARM64 into AArch64's place
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.

"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple, those from AArch64 use an aarch64
triple. Both should be equivalent though.

This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.

llvm-svn: 209577
2014-05-24 12:50:23 +00:00
Tim Northover cc08e1fe1b AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64.
I'm doing this in two phases for a better "git blame" record. This
commit removes the previous AArch64 backend and redirects all
functionality to ARM64. It also deduplicates test-lines and removes
orphaned AArch64 tests.

The next step will be "git mv ARM64 AArch64" and rewire most of the
tests.

Hopefully LLVM is still functional, though it would be even better if
no-one ever had to care because the rename happens straight
afterwards.

llvm-svn: 209576
2014-05-24 12:42:26 +00:00
Michael Zolotukhin d4c724625a Implement sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformation in Scalar
Evolution.

That helps SLP-vectorizer to recognize consecutive loads/stores.

<rdar://problem/14860614>

llvm-svn: 209568
2014-05-24 08:09:57 +00:00
Nico Rieck bd15945ab2 Fix broken FileCheck prefixes
llvm-svn: 209538
2014-05-23 19:06:24 +00:00
Jingyue Wu bbb6e4a885 Add the extracted constant offset using GEP
Fixed a TODO in r207783.

Add the extracted constant offset using GEP instead of ugly
ptrtoint+add+inttoptr. Using GEP simplifies future optimizations and makes IR
easier to understand. 

Updated all affected tests, and added a new test in split-gep.ll to cover a
corner case where emitting uglygep is necessary.

llvm-svn: 209537
2014-05-23 18:39:40 +00:00
Justin Bogner cbb8438bb3 ScalarEvolution: Fix handling of AddRecs in isKnownPredicate
ScalarEvolution::isKnownPredicate() can wrongly reduce a comparison
when both the LHS and RHS are SCEVAddRecExprs. This checks that both
LHS and RHS are guarded in the case when both are SCEVAddRecExprs.

The test case is against indvars because I could not find a way to
directly test SCEV.

Patch by Sanjay Patel!

llvm-svn: 209487
2014-05-23 00:06:56 +00:00
Diego Novillo 7f8af8bf91 Add support for missed and analysis optimization remarks.
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.

-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they tried to apply a transformation but
couldn't.

-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.

The patch also:

1- Adds support in the inliner for the two new remarks and a
   test case.

2- Moves emitOptimizationRemark* functions to the llvm namespace.

3- Adds an LLVMContext argument instead of making them member functions
   of LLVMContext.

Reviewers: qcolombet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3682

llvm-svn: 209442
2014-05-22 14:19:46 +00:00
Eli Bendersky f13a05607c Similar to bitcast, treat addrspacecast as a foldable operand.
Added a test sink-addrspacecast.ll to verify this change.

Patch by Jingyue Wu.

llvm-svn: 209343
2014-05-22 00:02:52 +00:00
Nick Lewycky ec373545b8 Teach isKnownNonNull that a nonnull return is not null. Add a test for this case as well as the case of a nonnull attribute (already handled but not tested).
llvm-svn: 209193
2014-05-20 05:13:21 +00:00
Juergen Ributzka 431761771c [ConstantHoisting][X86] Change the cost model to never hoist constants for types larger than i128.
Currently the X86 backend doesn't support types larger than i128 very well. For
example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted.

This fix changes the cost model to never hoist constants for types larger than
i128. Once the codegen issues have been resolved, the cost model can be updated
to allow also larger types.

This is related to <rdar://problem/16954938>

llvm-svn: 209162
2014-05-19 21:00:53 +00:00
Peter Collingbourne 68a889757d Check the alwaysinline attribute on the call as well as on the caller.
Differential Revision: http://reviews.llvm.org/D3815

llvm-svn: 209150
2014-05-19 18:25:54 +00:00
Benjamin Kramer 6dd790c617 Flip on vectorization of bswap intrinsics.
The cost model conservatively assumes that it will always get scalarized and
that's about as good as we can get with the generic TTI; reasoning whether a
shuffle with an efficient lowering is available is hard. We can override that
conservative estimate for some targets in the future.

llvm-svn: 209125
2014-05-19 13:48:08 +00:00
Dinesh Dwivedi f82f16e3e6 Added inst-combine for 'MIN(MIN(A, 97), 23)' and 'MAX(MAX(A, 23), 97)'
This removes TODO added in r208849 [http://reviews.llvm.org/D3629]

MIN(MIN(A, 97), 23) -> MIN(A, 23)
MAX(MAX(A, 23), 97) -> MAX(A, 97)

Differential Revision: http://reviews.llvm.org/D3785

llvm-svn: 209110
2014-05-19 07:08:32 +00:00
NAKAMURA Takumi 7ef81a4f98 Revert r209049 and r209065, "Add support for combining GEPs across PHI nodes"
It broke clang selfhosting even after r209065.

llvm-svn: 209067
2014-05-17 14:39:21 +00:00
Louis Gerbarg 8d2a43e9be Add support for combining GEPs across PHI nodes
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:

  GEP1--\
        |-->PHI1-->GEP3
  GEP2--/

This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):

  GEP1--\                     --\                           --\
        |-->PHI1-->GEP3  ==>    |-->PHI2->GEP12->GEP3 == >    |-->PHI2->GEP123
  GEP2--/                     --/                           --/

This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.

Tests included.

rdar://15547484

llvm-svn: 209049
2014-05-16 23:47:24 +00:00
Reid Kleckner fceb76f5f9 Add comdat key field to llvm.global_ctors and llvm.global_dtors
This allows us to put dynamic initializers for weak data into the same
comdat group as the data being initialized.  This is necessary for MSVC
ABI compatibility.  Once we have comdats for guard variables, we can use
the combination to help GlobalOpt fire more often for weak data with
guarded initialization on other platforms.

Reviewers: nlewycky

Differential Revision: http://reviews.llvm.org/D3499

llvm-svn: 209015
2014-05-16 20:39:27 +00:00
Rafael Espindola 6b238633b7 Fix most of PR10367.
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.

To avoid changing all alias related tests in this patches, I kept the common
syntax

@foo = alias i32* @bar

to mean the same as now. The cases that used to use cast now use the more
general syntax

@foo = alias i16, i32* @bar.

Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.

For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.

One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.

A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.

llvm-svn: 209007
2014-05-16 19:35:39 +00:00
David Majnemer 78910fc4da InstSimplify: Improve handling of ashr/lshr
Summary:
Analyze the range of values produced by ashr/lshr cst, %V when it is
being used in an icmp.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3774

llvm-svn: 209000
2014-05-16 17:14:03 +00:00
David Majnemer ea8d5dbf24 InstSimplify: Optimize using dividend in sdiv
Summary:
The dividend in an sdiv tells us the largest and smallest possible
results.  Use this fact to optimize comparisons against an sdiv with a
constant dividend.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3795

llvm-svn: 208999
2014-05-16 16:57:04 +00:00
Rafael Espindola 5a52b9f139 Revert "Implement global merge optimization for global variables."
This reverts commit r208934.

The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.

The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.

llvm-svn: 208978
2014-05-16 13:02:18 +00:00
Jiangning Liu 932e1c3924 Implement global merge optimization for global variables.
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.

For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.

llvm-svn: 208934
2014-05-15 23:45:42 +00:00
Reid Kleckner 900d46ff39 Don't insert lifetime.end markers between a musttail call and ret
The allocas going out of scope are immediately killed by the return
instruction.

This is a resend of r208912, which was committed accidentally.

Reviewers: chandlerc

Differential Revision: http://reviews.llvm.org/D3792

llvm-svn: 208920
2014-05-15 21:10:46 +00:00
Reid Kleckner b16564109b Revert "Don't insert lifetime.end markers between a musttail call and ret"
This reverts commit r208912.

It was committed accidentally without review.

llvm-svn: 208914
2014-05-15 20:41:05 +00:00
Reid Kleckner 26ab7ead45 Don't insert lifetime.end markers between a musttail call and ret
The allocas going out of scope are immediately killed by the return
instruction.

Reviewers: chandlerc

Differential Revision: http://reviews.llvm.org/D3630

llvm-svn: 208912
2014-05-15 20:39:13 +00:00
Reid Kleckner f0915aa0e6 Teach the inliner how to preserve musttail invariants
The interesting case is what happens when you inline a musttail call
through a musttail call site.  In this case, we can't break perfect
forwarding or allow any stack growth.

Instead of merging control flow from the inlined return instruction
after a musttail call into the body of the caller, leave the inlined
return instruction in the caller so that the musttail call stays in the
tail position.

More work is required in http://reviews.llvm.org/D3630 to handle the
case where the inlined function has dynamic allocas or byval arguments.

Reviewers: chandlerc

Differential Revision: http://reviews.llvm.org/D3491

llvm-svn: 208910
2014-05-15 20:11:28 +00:00
Chandler Carruth a0e5695ad9 Teach the constant folder to look through bitcast constant expressions
much more effectively when trying to constant fold a load of a constant.
Previously, we only handled bitcasts by trying to find a totally generic
byte representation of the constant and use that. Now, we look through
the bitcast to see what constant we might fold the load into, and then
try to form a constant expression cast of the found value that would be
equivalent to loading the value.

You might wonder why on earth this actually matters. Well, turns out
that the Itanium ABI causes us to create a single array for a vtable
where the first elements are virtual base offsets, followed by the
virtual function pointers. Because the array is homogenous the element
type is consistently i8* and we inttoptr the virtual base offsets into
the initial elements.

Then constructors bitcast these pointers to i64 pointers prior to
loading them. Boom, no more constant folding of virtual base offsets.
This is the first fix to LLVM to address the *insane* performance Eric
Niebler discovered with Clang on his range comprehensions[1]. There is
more to come though, this doesn't *really* fix the problem fully.

[1]: http://ericniebler.com/2014/04/27/range-comprehensions/

llvm-svn: 208856
2014-05-15 09:56:28 +00:00
Dinesh Dwivedi 83c11da849 Reverting r208848, reason: build failure: sanitizer-x86_64-linux-bootstrap/builds/3399
llvm-svn: 208852
2014-05-15 08:22:55 +00:00
Dinesh Dwivedi f675f4201b Added instcombine for 'MIN(MIN(A, 27), 93)' and 'MAX(MAX(A, 93), 27)'
MIN(MIN(A, 23), 97) -> MIN(A, 23)
MAX(MAX(A, 97), 23) -> MAX(A, 97)

Differential Revision: http://reviews.llvm.org/D3629

llvm-svn: 208849
2014-05-15 06:13:40 +00:00
Dinesh Dwivedi 837c16097e Added inst combine transforms for single bit tests from Chris's note
if ((x & C) == 0) x |= C becomes x |= C
if ((x & C) != 0) x ^= C becomes x &= ~C
if ((x & C) == 0) x ^= C becomes x |= C
if ((x & C) != 0) x &= ~C becomes x &= ~C
if ((x & C) == 0) x &= ~C becomes nothing

Z3 Verifications code for above transform
http://rise4fun.com/Z3/Pmsh

Differential Revision: http://reviews.llvm.org/D3717

llvm-svn: 208848
2014-05-15 06:01:33 +00:00
David Majnemer 186c94244c InstCombine: Optimize -x s< cst
Summary:
This gets rid of a sub instruction by moving the negation to the
constant when valid.

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3773

llvm-svn: 208827
2014-05-15 00:02:20 +00:00
David Majnemer 2d6c023576 InstSimplify: Optimize signed icmp of -(zext V)
Summary:
We know that -(zext V) will always be <= zero, simplify signed icmps
that have these.

Uncovered using http://www.cs.utah.edu/~regehr/souper/

Reviewers: nicholas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3754

llvm-svn: 208809
2014-05-14 20:16:28 +00:00
Serge Pavlov e6de9e39a8 Fix the case when reordering shuffle and binop produces a constant.
This resolves PR19737.

llvm-svn: 208762
2014-05-14 09:05:09 +00:00
Nick Lewycky f0cf8fa941 Optimize integral reciprocal (udiv 1, x and sdiv 1, x) to not use division. This fires exactly once in a clang bootstrap, but covers a few different results from http://www.cs.utah.edu/~regehr/souper/
llvm-svn: 208750
2014-05-14 03:03:05 +00:00
Serge Pavlov b575ee8294 Fix type of shuffle resulted from shuffle merge.
This fix resolves PR19730.

llvm-svn: 208666
2014-05-13 06:07:21 +00:00
Rafael Espindola cae6590f30 Convert test to FileCheck.
llvm-svn: 208658
2014-05-13 00:31:31 +00:00