Commit Graph

10365 Commits

Author SHA1 Message Date
Arnold Schwaighofer 3610139ac5 LoopVectorizer: Improve reduction variable identification
The two nested loops were confusing and also conservative in identifying
reduction variables. This patch replaces them by a worklist based approach.

llvm-svn: 181369
2013-05-07 21:55:37 +00:00
Arnold Schwaighofer e78b76fbed LoopVectorize: getConsecutiveVector must respect signed arithmetic
We were passing an i32 to ConstantInt::get where an i64 was needed and we must
also pass the sign if we pass negatives numbers. The start index passed to
getConsecutiveVector must also be signed.

Should fix PR15882.

llvm-svn: 181286
2013-05-07 04:37:05 +00:00
David Majnemer 70f286d95f InstCombine: (X ^ signbit) + C -> X + (signbit ^ C)
llvm-svn: 181249
2013-05-06 21:21:31 +00:00
Andrew Trick 9c72b071fe Rotate multi-exit loops even if the latch was simplified.
Test case by Michele Scandale!

Fixes PR10293: Load not hoisted out of loop with multiple exits.

There are few regressions with this patch, now tracked by
rdar:13817079, and a roughly equal number of improvements. The
regressions are almost certainly back luck because LoopRotate has very
little idea of whether rotation is profitable. Doing better requires a
more comprehensive solution.

This checkin is a quick fix that lacks generality (PR10293 has
a counter-example). But it trivially fixes the case in PR10293 without
interfering with other cases, and it does satify the criteria that
LoopRotate is a loop canonicalization pass that should avoid
heuristics and special cases.

I can think of two approaches that would probably be better in
the long run. Ultimately they may both make sense.

(1) LoopRotate should check that the current header would make a good
loop guard, and that the loop does not already has a sufficient
guard. The artifical SimplifiedLoopLatch check would be unnecessary,
and the design would be more general and canonical. Two difficulties:

- We need a strong guarantee that we won't endlessly rotate, so the
  analysis would need to be precise in order to avoid the
  SimplifiedLoopLatch precondition.

- Analysis like this are usually based on SCEV, which we don't want to
  rely on.

(2) Rotate on-demand in late loop passes. This could even be done by
shoving the loop back on the queue after the optimization that needs
it. This could work well when we find LICM opportunities in
multi-branch loops. This requires some work, and it doesn't really
solve the problem of SCEV wanting a loop guard before the analysis.

llvm-svn: 181230
2013-05-06 17:58:18 +00:00
Jean-Luc Duprat 3e4fc3ef24 Provide InstCombines for the following 3 cases:
A * (1 - (uitofp i1 C)) -> select C, 0, A
B * (uitofp i1 C) -> select C, B, 0
select C, 0, A + select C, B, 0 -> select C, B, A

These come up in code that has been hand-optimized from a select to a linear blend, 
on platforms where that may have mattered. We want to undo such changes 
with the following transform:
A*(1 - uitofp i1 C) + B*(uitofp i1 C) -> select C, A, B

llvm-svn: 181216
2013-05-06 16:55:50 +00:00
Nadav Rotem 632b25b743 Update the comment to mention that we use TTI.
llvm-svn: 181178
2013-05-06 03:06:36 +00:00
Nadav Rotem c70ef4e93c Revert r164763 because it introduces new shuffles.
Thanks Nick Lewycky for pointing this out.

llvm-svn: 181177
2013-05-06 02:39:09 +00:00
Rafael Espindola c229a4fff4 Fix const merging when an alias of a const is llvm.used.
We used to disable constant merging not only if a constant is llvm.used, but
also if an alias of a constant is llvm.used. This change fixes that.

llvm-svn: 181175
2013-05-06 01:48:55 +00:00
Benjamin Kramer 3e3f2a4b8d LoopVectorize: Print values instead of pointers in debug output.
llvm-svn: 181157
2013-05-05 14:54:52 +00:00
Arnold Schwaighofer d96e427eac LoopVectorize: Add support for floating point min/max reductions
Add support for min/max reductions when "no-nans-float-math" is enabled. This
allows us to assume we have ordered floating point math and treat ordered and
unordered predicates equally.

radar://13723044

llvm-svn: 181144
2013-05-05 01:54:48 +00:00
Arnold Schwaighofer f5183729db LoopVectorizer: Cleanup of miminimum/maximum pattern match code
No need for setting the operands. The pointers are going to be bound by the
matcher.

radar://13723044

llvm-svn: 181142
2013-05-05 01:54:44 +00:00
Arnold Schwaighofer a670a0a3aa LoopVectorize: We don't need an identity element for min/max reductions
We can just use the initial element that feeds the reduction.

  max(max(x, y), z) == max(max(x,y), max(x,z))

radar://13723044

llvm-svn: 181141
2013-05-05 01:54:42 +00:00
Dmitri Gribenko 3238fb7595 Add ArrayRef constructor from None, and do the cleanups that this constructor enables
Patch by Robert Wilhelm.

llvm-svn: 181138
2013-05-05 00:40:33 +00:00
Nick Lewycky 881e9d62e2 Tabs to spaces. No functionality change.
llvm-svn: 181082
2013-05-04 01:08:15 +00:00
Shuxin Yang 637b9bebd4 Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper functions. No function change.
This function consists of following steps:
   1. Collect dependent memory accesses.
   2. Analyze availability.
   3. Perform fully redundancy elimination, or 
   4. Perform PRE, depending on the availability

 Step 2, 3 and 4 are now moved to three helper routines.

llvm-svn: 181047
2013-05-03 19:17:26 +00:00
Nadav Rotem 4ce060b3da LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values.
By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements.

We can now vectorize this loop:

int foo(int *A, int *B, int n) {
  for (int i=0; i < n; i++) {
    int x = 9;
    if (A[i] > B[i]) {
      if (A[i] > 19) {
        x = 3;
      } else if (B[i] < 4 ) {
        x = 4;
      } else {
        x = 5;
      }
    }
    A[i] = x;
  }
}

llvm-svn: 181037
2013-05-03 17:42:55 +00:00
Shuxin Yang af2c3ddf0d [GV] Remove dead code which is really difficult to decipher.
Actually it took me couple of hours trying to make sense of them and
only to find they are dead code.  I guess the original author used
"allSingleSucc" to indicate if there are any critial edge emanating
from some blocks, and tried to perform code motion (actually speculation)
in the presence of these critical edges; but later on he/she changed mind
and decided to perform edge-splitting first.

llvm-svn: 180951
2013-05-02 21:14:31 +00:00
Filip Pizlo dec20e43c0 This patch breaks up Wrap.h so that it does not have to include all of
the things, and renames it to CBindingWrapping.h.  I also moved 
CBindingWrapping.h into Support/.

This new file just contains the macros for defining different wrap/unwrap 
methods.

The calls to those macros, as well as any custom wrap/unwrap definitions 
(like for array of Values for example), are put into corresponding C++ 
headers.

Doing this required some #include surgery, since some .cpp files relied 
on the fact that including Wrap.h implicitly caused the inclusion of a 
bunch of other things.

This also now means that the C++ headers will include their corresponding 
C API headers; for example Value.h must include llvm-c/Core.h.  I think 
this is harmless, since the C API headers contain just external function 
declarations and some C types, so I don't believe there should be any 
nasty dependency issues here.

llvm-svn: 180881
2013-05-01 20:59:00 +00:00
Nadav Rotem 1e211913b5 SROA: Generate selects instead of shuffles when blending values because this is the cannonical form.
Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often.

llvm-svn: 180875
2013-05-01 19:53:30 +00:00
Jim Grosbach d11584a7f7 Revert "InstCombine: Fold more shuffles of shuffles."
This reverts commit r180802

There's ongoing discussion about whether this is the right place to make
this transformation. Reverting for now while we figure it out.

llvm-svn: 180834
2013-05-01 00:25:27 +00:00
Richard Trieu 624c2ebcbb Fix a use after free. RI is freed before the call to getDebugLoc(). To
prevent this, capture the location before RI is freed.

llvm-svn: 180824
2013-04-30 22:45:10 +00:00
Nadav Rotem 9feda6071a Fix a typo
llvm-svn: 180806
2013-04-30 21:04:51 +00:00
Jim Grosbach 0b914fe839 InstCombine: Fold more shuffles of shuffles.
Always fold a shuffle-of-shuffle into a single shuffle when there's only one
input vector in the first place. Continue to be more conservative when there's
multiple inputs.

rdar://13402653
PR15866

llvm-svn: 180802
2013-04-30 20:43:52 +00:00
Adrian Prantl 8beccf9e6d Spelling. Thanks, Eric.
llvm-svn: 180794
2013-04-30 17:33:32 +00:00
Adrian Prantl 0941638a1b Set debug locations for branch instructions created during inlining, even
the inlined function has multiple returns.

rdar://problem/12415623

llvm-svn: 180793
2013-04-30 17:08:16 +00:00
David Majnemer d73f37bb83 Fix a bug in foldSelectICmpAndOr.
Differences in bitwidth between X and Y could exist even if C1 and C2 have
the same Log2 representation.

llvm-svn: 180779
2013-04-30 10:36:33 +00:00
David Majnemer 8d048d0482 Fix "Combine bit test + conditional or into simple math"
This fixes the optimization introduced in r179748 and reverted in r179750.

While the optimization was sound, it did not properly respect differences in
bit-width.

llvm-svn: 180777
2013-04-30 08:57:58 +00:00
Arnold Schwaighofer 474df6d3ed SimplifyCFG: If convert single conditional stores
This resurrects r179957, but adds code that makes sure we don't touch
atomic/volatile stores:

This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:

 a[i] =
 may-alias with a[i] load
 if (cond)
   a[i] = Y

into an unconditional store.

 a[i] = X
 may-alias with a[i] load
 tmp = cond ? Y : X;
 a[i] = tmp

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case where the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

llvm-svn: 180731
2013-04-29 21:28:24 +00:00
Michael Gottesman 03cf3c8966 Add in some conditional compilation in order to silence an unused variable warning.
llvm-svn: 180700
2013-04-29 07:29:08 +00:00
Michael Gottesman 214ca90f8e [objc-arc] Apply the RV optimization to retains next to calls in ObjCARCContract instead of ObjCARCOpts.
Turning retains into retainRV calls disrupts the data flow analysis in
ObjCARCOpts. Thus we move it as late as we can by moving it into
ObjCARCContract.

We leave in the conversion from retainRV -> retain in ObjCARCOpt since
it enables the dataflow analysis.

rdar://10813093

llvm-svn: 180698
2013-04-29 06:53:53 +00:00
Michael Gottesman 9c11815978 Added statistics to count the number of retains/releases before/after optimization.
llvm-svn: 180697
2013-04-29 06:16:57 +00:00
Michael Gottesman 8005ad3f3e Removed trailing whitespace.
llvm-svn: 180696
2013-04-29 06:16:55 +00:00
Michael Gottesman 3e3977c49f Fix for r180693. = /.
llvm-svn: 180694
2013-04-29 05:25:39 +00:00
Michael Gottesman a87bb8f50b [objc-arc-annotations] Moved the disabling of call movement to ConnectTDBUTraversals so that I can prevent Changed = true from being set. This prevents an infinite loop.
llvm-svn: 180693
2013-04-29 05:13:13 +00:00
Shuxin Yang 04a4fd43aa Fix a XOR reassociation bug.
When Reassociator optimize "(x | C1)" ^ "(X & C2)", it may swap the two
subexpressions, however, it forgot to swap cached constants (of C1 and C2)
accordingly.

rdar://13739160

llvm-svn: 180676
2013-04-27 18:02:12 +00:00
Adrian Prantl d00333a4b2 fix a typo that due to cu&paste quadrupled itself
rdar://problem/13056109

llvm-svn: 180618
2013-04-26 18:10:50 +00:00
Adrian Prantl 29b9de7bf1 Bugfix for the debug intrinsic handling in InstCombiner:
Since we can't guarantee that the original dbg.declare instrinsic
is removed by LowerDbgDeclare(), we need to make sure that we are
not inserting the same dbg.value intrinsic over and over.
This removes tons of redundant DIEs when compiling optimized code.

rdar://problem/13056109

llvm-svn: 180615
2013-04-26 17:48:33 +00:00
Nadav Rotem 13306816fc LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes.
llvm-svn: 180593
2013-04-26 05:08:59 +00:00
Michael Gottesman 47cf8a4c12 Revert "[objc-arc] Added ImpreciseAutoreleaseSet to track autorelease calls that were once autoreleaseRV instructions."
This reverts commit r180222.

I think this might tie in with a different problem which will require a
different approach potentially. I am reverting this in the case I need to go
down that second path.

My apologies for the noise. = /.

llvm-svn: 180590
2013-04-26 01:12:18 +00:00
Nadav Rotem f43cbeee15 LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers.
llvm-svn: 180570
2013-04-25 19:55:03 +00:00
Michael Gottesman fdb497a9b2 [objc-arc] Added ImpreciseAutoreleaseSet to track autorelease calls that were once autoreleaseRV instructions.
Due to the semantics of ARC, we must be extremely conservative with autorelease
calls inserted by the frontend since ARC gaurantees that said object will be in
the autorelease pool after that point, an optimization invariant that the
optimizer must respect.

On the other hand, we are allowed significantly more flexibility with
autoreleaseRV instructions.

Often times though this flexibility is disrupted by early transformations which
transform objc_autoreleaseRV => objc_autorelease if said instruction is no
longer being used as part of an RV pair (generally due to inlining). Since we
can not tell the difference in between an autorelease put into place by the
frontend and one created through said ``strength reduction'' we can not perform
these optimizations.

The addition of this set gets around said issues by allowing us to differentiate
in between said two cases.

rdar://problem/13697741.

llvm-svn: 180222
2013-04-24 22:18:18 +00:00
Michael Gottesman cd5b02701c Fixed comment typo.
llvm-svn: 180221
2013-04-24 22:18:15 +00:00
Arnold Schwaighofer 3fa801fbc2 LoopVectorizer: Change variable name Stride to ConsecutiveStride
This makes it easier to read the code.

No functionality change.

llvm-svn: 180197
2013-04-24 16:16:03 +00:00
Arnold Schwaighofer a6578f7056 LoopVectorize: Scalarize padded types
This patch disables memory-instruction vectorization for types that need padding
bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on
x86_64. Because the load/store vectorization is performed by the bit casting to
a packed vector, which has incompatible memory layout due to the lack of padding
bytes, the present vectorizer produces inconsistent result for memory
instructions of those types.
This patch checks an equality of the AllocSize of a scalar type and allocated
size for each vector element, to ensure that there is no padding bytes and the
array can be read/written using vector operations.

Patch by Daisuke Takahashi!

Fixes PR15758.

llvm-svn: 180196
2013-04-24 16:16:01 +00:00
Arnold Schwaighofer 23a0589bce LoopVectorizer: Bail out if we don't have datalayout we need it
llvm-svn: 180195
2013-04-24 16:15:58 +00:00
Adrian Prantl 15db52bf6d Make sure the instruction right after an inlined function has a
debug location. This solves a problem where range of an inlined
subroutine is emitted wrongly.
Patch by Manman Ren.

Fixes rdar://problem/12415623

llvm-svn: 180140
2013-04-23 19:56:03 +00:00
Nadav Rotem 71c9d6d333 LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order.
This fixes a miscompilation in FreeBSD's regex library.

llvm-svn: 180121
2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen d3c90e132a Call the potentially costly isAnnotatedParallel() only once.
Made the uniform write test's checks a bit stricter.

llvm-svn: 180119
2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen 6f2f66b63f Refuse to (even try to) vectorize loops which have uniform writes,
even if erroneously annotated with the parallel loop metadata.

Fixes Bug 15794: 
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"

llvm-svn: 180081
2013-04-23 08:08:51 +00:00
Eric Christopher 04d4e9312c Move C++ code out of the C headers and into either C++ headers
or the C++ files themselves. This enables people to use
just a C compiler to interoperate with LLVM.

llvm-svn: 180063
2013-04-22 22:47:22 +00:00
Anat Shemer 10260a75e3 Changed back (relative to commit 179786) the operations executed when extract(cast) is transformed to cast(extract). It uses the Builder class as before. In addition the result node is added to the Worklist, so all the previous extract users will become the new scalar cast users.
llvm-svn: 180045
2013-04-22 20:51:10 +00:00
Rafael Espindola 74f2e46eef Clarify that llvm.used can contain aliases.
Also add a check for llvm.used in the verifier and simplify clients now that
they can assume they have a ConstantArray.

llvm-svn: 180019
2013-04-22 14:58:02 +00:00
Benjamin Kramer 0212dc27ed SROA: Don't crash on a select with two identical operands.
This is an edge case that can happen if we modify a chain of multiple selects.
Update all operands in that case and remove the assert. PR15805.

llvm-svn: 179982
2013-04-21 17:48:39 +00:00
Arnold Schwaighofer 6eb32b31bd Revert "SimplifyCFG: If convert single conditional stores"
There is the temptation to make this tranform dependent on target information as
it is not going to be beneficial on all (sub)targets. Therefore, we should
probably do this in MI Early-Ifconversion.

This reverts commit r179957. Original commit message:

"SimplifyCFG: If convert single conditional stores

This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:

a[i] =
may-alias with a[i] load
if (cond)
    a[i] = Y
into an unconditional store.

a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case were the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

I am going to watch performance numbers across the builtbots and will revert
this if anything unexpected comes up."

llvm-svn: 179980
2013-04-21 13:09:04 +00:00
Nadav Rotem c57af326a4 SLPVectorize: Add support for vectorization of casts.
llvm-svn: 179975
2013-04-21 08:05:59 +00:00
Nadav Rotem 98ad5f0f4c SLPVectorizer: Fix a bug in the code that scans the tree in search of nodes with multiple users.
We did not terminate the switch case and we executed the search routine twice.

llvm-svn: 179974
2013-04-21 07:37:56 +00:00
Michael Gottesman 3eab2e43d2 When we strength reduce an objc_retainBlock call to objc_retain, increment NumPeeps and make sure that Changed is set to true.
llvm-svn: 179968
2013-04-21 00:50:27 +00:00
Michael Gottesman 1e43004295 Fixed comment typo.
llvm-svn: 179967
2013-04-21 00:44:46 +00:00
Michael Gottesman df110ac9ec [objc-arc] Fixed typo in debug message.
llvm-svn: 179966
2013-04-21 00:30:50 +00:00
Michael Gottesman cdb7c15ce8 [objc-arc] Fixed comment typo.
llvm-svn: 179965
2013-04-21 00:25:04 +00:00
Michael Gottesman fb9ece9a7c [objc-arc] Refactored OptimizeReturns so that it uses continue instead of a large multi-level nested if statement.
llvm-svn: 179964
2013-04-21 00:25:01 +00:00
Michael Gottesman 01338a442a [objc-arc] Added debug statement saying when we are resetting a sequence's progress.
This will make it clearer when we are actually resetting a sequence's progress
vs just changing state. This is an important distinction because the former case
clears any pointers that we are tracking while the later does not.

llvm-svn: 179963
2013-04-20 23:36:57 +00:00
Nadav Rotem 8aca44a623 Fix PR15800. Do not try to vectorize vectors and structs.
llvm-svn: 179960
2013-04-20 22:29:43 +00:00
Arnold Schwaighofer 3546ccf465 SimplifyCFG: If convert single conditional stores
This transformation will transform a conditional store with a preceeding
uncondtional store to the same location:

 a[i] =
 may-alias with a[i] load
 if (cond)
   a[i] = Y

into an unconditional store.

 a[i] = X
 may-alias with a[i] load
 tmp = cond ? Y : X;
 a[i] = tmp

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outway the potential case were the branch would be correctly predicted
and the cost of the executing the second store would be noticably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on a ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducable below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

I am going to watch performance numbers across the builtbots and will revert
this if anything unexpected comes up.

llvm-svn: 179957
2013-04-20 21:42:09 +00:00
Benjamin Kramer 519b2e3087 VecUtils: Clean up uses of dyn_cast.
llvm-svn: 179936
2013-04-20 10:36:17 +00:00
Benjamin Kramer 4600bcc337 SLPVectorizer: Strength reduce SmallVectors to ArrayRefs.
Avoids a couple of copies and allows more flexibility in the clients.

llvm-svn: 179935
2013-04-20 09:49:10 +00:00
Nadav Rotem ce2660d639 SLPVectorizer: Reduce the compile time by eliminating the search for some of the more expensive patterns. After this change will only check basic arithmetic trees that start at cmpinstr.
llvm-svn: 179933
2013-04-20 07:29:34 +00:00
Nadav Rotem 998e035cae refactor tryToVectorizePair to a new method that supports vectorization of lists.
llvm-svn: 179932
2013-04-20 07:22:58 +00:00
Nadav Rotem 890387289e Fix an unused variable warning.
llvm-svn: 179931
2013-04-20 06:40:28 +00:00
Nadav Rotem 83c7c41bc2 SLPVectorizer: Improve the cost model for loop invariant broadcast values.
llvm-svn: 179930
2013-04-20 06:13:47 +00:00
Nadav Rotem dfe1c93ca4 Report the number of stores that were found in the debug message.
llvm-svn: 179929
2013-04-20 05:23:11 +00:00
Nadav Rotem dfd8fcbb00 Fix the header comment.
llvm-svn: 179928
2013-04-20 05:18:51 +00:00
Nadav Rotem 5ed99674e9 Use 64bit arithmetic for calculating distance between pointers.
llvm-svn: 179927
2013-04-20 05:17:47 +00:00
Benjamin Kramer 630e6e1422 MergeFunc: Make pointer and integer types generate the same hash.
The logic that actually compares the types considers pointers and integers the
same if they are of the same size. This created a strange mismatch between hash
and reality and made the test case for this fail on some platforms (yay,
test cases).

llvm-svn: 179905
2013-04-19 23:06:44 +00:00
Arnold Schwaighofer 5146940316 LoopVectorizer: Use matcher from PatternMatch.h for the min/max patterns
Also make some static function class functions to avoid having to mention the
class namespace for enums all the time.

No functionality change intended.

llvm-svn: 179886
2013-04-19 21:03:36 +00:00
Jakub Staszak 99317268e2 Keep coding stanard. Don't use "else if" after "return".
llvm-svn: 179826
2013-04-19 01:18:04 +00:00
Bill Wendling 3b21eb69fb Implement a better fix for PR15185.
If the return type is a pointer and the call returns an integer, then do the
inttoptr convertions. And vice versa.

llvm-svn: 179817
2013-04-18 23:34:17 +00:00
Dmitri Gribenko d29ea04446 Fix a -Wdocumentation warning
llvm-svn: 179789
2013-04-18 20:13:04 +00:00
Anat Shemer 5570318f43 In the function InstCombiner::visitExtractElementInst() removed the limitation that extract is promoted over a cast only if the cast has only one use.
llvm-svn: 179786
2013-04-18 19:56:44 +00:00
Anat Shemer 0c95efad7e Added a function scalarizePHI() that sclarizes a vector phi instruction if it has only 2 uses: one to promote the vector phi in a loop and the other use is an extract operation of one element at a constant location.
llvm-svn: 179783
2013-04-18 19:35:39 +00:00
Chris Lattner 8cf09416ea Fix a comment, PR15777.
llvm-svn: 179775
2013-04-18 17:42:14 +00:00
Arnold Schwaighofer 4cd6aa110c LoopVectorizer: Recognize min/max reductions
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.

This appears to help bzip2 by about 1.5% on an imac12,2.

radar://12960601

llvm-svn: 179773
2013-04-18 17:22:34 +00:00
Benjamin Kramer 8df2cfb858 LoopVectorize: Use a set to avoid longer cycles in the reduction chain too.
Fixes PR15748.

llvm-svn: 179757
2013-04-18 14:29:13 +00:00
David Majnemer 81af06e003 Revert "Combine bit test + conditional or into simple math"
It is causing stage2 builds to fail, let's get them running again.

llvm-svn: 179750
2013-04-18 08:42:33 +00:00
David Majnemer bdf0caf6b1 Combine bit test + conditional or into simple math
Simplify:
(select (icmp eq (and X, C1), 0), Y, (or Y, C2))

Into:
(or (shl (and X, C1), C3), y)

Where:
C3 = Log(C2) - Log(C1)

If:
C1 and C2 are both powers of two

llvm-svn: 179748
2013-04-18 07:30:07 +00:00
Michael Gottesman 323964ca9e [objc-arc] Do not mismatch up retains inside a for loop with releases outside said for loop in the presense of differing provenance caused by escaping blocks.
This occurs due to an alloca representing a separate ownership from the
original pointer. Thus consider the following pseudo-IR:

  objc_retain(%a)
  for (...) {
    objc_retain(%a)
    %block <- %a
    F(%block)
    objc_release(%block)
  }
  objc_release(%a)

From the perspective of the optimizer, the %block is a separate
provenance from the original %a. Thus the optimizer pairs up the inner
retain for %a and the outer release from %a, resulting in segfaults.

This is fixed by noting that the signature of a mismatch of
retain/releases inside the for loop is a Use/CanRelease top down with an
None bottom up (since bottom up the Retain-CanRelease-Use-Release
sequence is completed by the inner objc_retain, but top down due to the
differing provenance from the objc_release said sequence is not
completed). In said case in CheckForCFGHazards, we now clear the state
of %a implying that no pairing will occur.

Additionally a test case is included.

rdar://12969722

llvm-svn: 179747
2013-04-18 05:39:45 +00:00
Michael Gottesman 9e5181393a Removed trailing whitespace.
llvm-svn: 179746
2013-04-18 04:34:11 +00:00
Michael Gottesman 4e88ce68ae [objc-arc] Added annotation option to only emit annotations for a specific ssa identifier.
llvm-svn: 179729
2013-04-17 21:59:41 +00:00
Michael Gottesman adb921affa Fixed typo.
llvm-svn: 179721
2013-04-17 21:03:53 +00:00
Michael Gottesman 6806b51ad2 [objc-arc] Added descriptions for EnableARCAnnotations, EnableCheckForCFGHazards, EnableARCOptimizations.
llvm-svn: 179718
2013-04-17 20:48:03 +00:00
Michael Gottesman ffef24f964 [objc-arc] Added an option to arc-annotations for turning off CheckForCFGHazard.
llvm-svn: 179717
2013-04-17 20:48:01 +00:00
Peter Collingbourne 37ae72b508 Do not optimise fprintf() calls if its return value is used.
Differential Revision: http://llvm-reviews.chandlerc.com/D620

llvm-svn: 179661
2013-04-17 02:01:10 +00:00
Hans Wennborg c9e1d99279 simplifycfg: Fix integer overflow converting switch into icmp.
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.

Patch by Jed Davis!

llvm-svn: 179587
2013-04-16 08:35:36 +00:00
Bill Wendling 3789171972 We are not able to bitcast a pointer to an integral value.
Two return types are not equivalent if one is a pointer and the other is an
integral. This is because we cannot bitcast a pointer to an integral value.
PR15185

llvm-svn: 179569
2013-04-15 22:33:50 +00:00
Nadav Rotem b9116e6966 SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops.
llvm-svn: 179562
2013-04-15 22:00:26 +00:00
Jim Grosbach 0f38c1e3a7 Fix a typo in comment.
llvm-svn: 179542
2013-04-15 17:40:48 +00:00
Nadav Rotem d4dcc003df Add an option -vectorize-slp-aggressive for running the BB vectorizer. Make -fslp-vectorize run the slp-vectorizer.
llvm-svn: 179508
2013-04-15 05:39:58 +00:00
Nadav Rotem a1e5e44eb3 Rename the slp-vectorizer clang/llvm flags. No functionality change.
llvm-svn: 179505
2013-04-15 04:54:42 +00:00
Nadav Rotem 5d393c416f SLPVectorizer: Add support for vectorizing trees that start at compare instructions.
llvm-svn: 179504
2013-04-15 04:25:27 +00:00
David Majnemer 1fae195557 Reorders two transforms that collide with each other
One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1

The problem is that there are certain values of C1 and C2 that
trigger both transforms but the first one blocks out the second,
this generates suboptimal code.

Reordering the transforms should be better in every case and
allows us to do interesting stuff like turn:
  %shr = lshr i32 %X, 4
  %and = and i32 %shr, 15
  %add = add i32 %and, -14
  %tobool = icmp ne i32 %add, 0

into:
  %and = and i32 %X, 240
  %tobool = icmp ne i32 %and, 224

llvm-svn: 179493
2013-04-14 21:15:43 +00:00
Benjamin Kramer 7d62ea86e5 Miscellaneous cleanups for VecUtils.h
llvm-svn: 179483
2013-04-14 09:33:08 +00:00
Nadav Rotem 3403c11529 SLP: Document the scalarization cost method.
llvm-svn: 179479
2013-04-14 07:22:22 +00:00
Nadav Rotem 54b413d157 SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree.
llvm-svn: 179475
2013-04-14 05:15:53 +00:00
Nadav Rotem 0b9cf8567b SLPVectorizer: add initial support for reduction variable vectorization.
llvm-svn: 179470
2013-04-14 03:22:20 +00:00
Benjamin Kramer adc1727c39 GlobalDCE: Fix an oversight in my last commit that could lead to crashes.
There is a Constant with non-constant operands: blockaddress.

llvm-svn: 179460
2013-04-13 16:11:14 +00:00
Benjamin Kramer 89ca4bc6d4 Fix a scalability issue with complex ConstantExprs.
This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExprs multiple times.

For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, it may consist of thousands of
nodes.

The testcase exercises this by creating an insanely large ConstantExprs out of
a loop. It's questionable if the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.

llvm-svn: 179458
2013-04-13 12:53:18 +00:00
Benjamin Kramer e89c705030 InstCombine: Check the operand types before merging fcmp ord & fcmp ord.
Fixes PR15737.

llvm-svn: 179417
2013-04-12 21:56:23 +00:00
Nadav Rotem 8543ba3e52 SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
llvm-svn: 179414
2013-04-12 21:16:54 +00:00
Nadav Rotem 4da0ab1d68 Add debug prints.
llvm-svn: 179412
2013-04-12 21:11:14 +00:00
David Majnemer 1a08accbb7 Simplify (A & ~B) in icmp if A is a power of 2
The transform will execute like so:
(A & ~B) == 0 --> (A & B) != 0
(A & ~B) != 0 --> (A & B) == 0

llvm-svn: 179386
2013-04-12 17:25:07 +00:00
Arnold Schwaighofer f9cea17f75 LoopVectorizer: integer division is not a reduction operation
Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
For example : (1 / 2) * 4 != 4/2.

Example:

int a[] = { 2, 5, 2, 2}
int x = 80;

for()
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20*0 = 0

radar://13640654

llvm-svn: 179381
2013-04-12 15:15:19 +00:00
David Majnemer b81cd63c4b Optimize icmp involving addition better
Allows LLVM to optimize sequences like the following:

%add = add nsw i32 %x, 1
%cmp = icmp sgt i32 %add, %y

into:

%cmp = icmp sge i32 %x, %y

as well as:

%add1 = add nsw i32 %x, 20
%add2 = add nsw i32 %y, 57
%cmp = icmp sge i32 %add1, %add2

into:

%add = add nsw i32 %y, 37
%cmp = icmp sle i32 %cmp, %x

llvm-svn: 179316
2013-04-11 20:05:46 +00:00
Benjamin Kramer a95f87494a Fix for wrong instcombine on vector insert/extract
When trying to collapse sequences of insertelement/extractelement
instructions into single shuffle instructions, there is one specific
case where the Instruction Combiner wrongly updates the resulting
Mask of shuffle indexes.

The problem is in function CollectShuffleElments.

If we have a sequence of insert/extract element instructions
like the one below:

  %tmp1 = extractelement <4 x float> %LHS, i32 0
  %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
  %tmp3 = extractelement <4 x float> %RHS, i32 2
  %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3

Where:
  . %RHS will have a mask of [4,5,6,7]
  . %LHS will have a mask of [0,1,2,3]

The Mask of shuffle indexes is wrongly computed to [4,1,6,7]
instead of [4,0,6,7].
When analyzing %tmp2 in order to compute the Mask for the
resulting shuffle instruction, the algorithm forgets to update
the mask index at position 1 with the index associated to the
element extracted from %LHS by instruction %tmp1.

Patch by Andrea DiBiagio!

llvm-svn: 179291
2013-04-11 15:10:09 +00:00
Alexey Samsonov a28f36c2e2 [ASan] Allow disabling init-order checks for globals by source file name.
llvm-svn: 179280
2013-04-11 13:20:00 +00:00
Benjamin Kramer c86fdf12e8 Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file.
llvm-svn: 179272
2013-04-11 11:36:36 +00:00
Nadav Rotem 73dffa4184 Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions.
llvm-svn: 179207
2013-04-10 19:41:36 +00:00
Nadav Rotem 88dd5f7a38 We require DataLayout for analyzing the size of stores.
llvm-svn: 179206
2013-04-10 18:57:27 +00:00
Joey Gouly 81259294be Change CloneFunctionInto to always clone Argument attributes induvidually,
rather than checking if the source and destination have the same number of
arguments and copying the attributes over directly.

llvm-svn: 179169
2013-04-10 10:37:38 +00:00
Bob Wilson 798a7709fc Fix some comment typos.
llvm-svn: 179132
2013-04-09 22:15:51 +00:00
Nadav Rotem 2d9dec322e Add support for bottom-up SLP vectorization infrastructure.
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).

  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.

  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

void SAXPY(int *x, int *y, int a, int i) {
  x[i]   = a * x[i]   + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}

llvm-svn: 179117
2013-04-09 19:44:35 +00:00
Shuxin Yang 331f01dcb4 Redo the fix Benjamin Kramer committed in r178793 about iterator invalidation in Reassociate.
I brazenly think this change is slightly simpler than r178793 because: 
  - no "state" in functor
  - "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]" 

  While I can reproduce the probelm in Valgrind, it is rather difficult to come up
a standalone testing case. The reason is that when an iterator is invalidated,
the stale invalidated elements are not yet clobbered by nonsense data, so the
optimizer can still proceed successfully. 

  Thank Benjamin for fixing this bug and generously providing the test case.

llvm-svn: 179062
2013-04-08 22:00:43 +00:00
Chandler Carruth 0e8a52d18f Fix PR15674 (and PR15603): a SROA think-o.
The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.

All of the existing tests continue to pass, and I've added a test from
the PR.

llvm-svn: 178974
2013-04-07 11:47:54 +00:00
Michael Gottesman 7924997c36 Removed trailing whitespace.
llvm-svn: 178932
2013-04-05 23:46:45 +00:00
Michael Gottesman 31ba23aa56 An objc_retain can serve as a use for a different pointer.
This is the counterpart to commit r160637, except it performs the action
in the bottomup portion of the data flow analysis.

llvm-svn: 178922
2013-04-05 22:54:32 +00:00
Michael Gottesman 1d8d25777d Properly model precise lifetime when given an incomplete dataflow sequence.
The normal dataflow sequence in the ARC optimizer consists of the following
states:

    Retain -> CanRelease -> Use -> Release

The optimizer before this patch stored the uses that determine the lifetime of
the retainable object pointer when it bottom up hits a retain or when top down
it hits a release. This is correct for an imprecise lifetime scenario since what
we are trying to do is remove retains/releases while making sure that no
``CanRelease'' (which is usually a call) deallocates the given pointer before we
get to the ``Use'' (since that would cause a segfault).

If we are considering the precise lifetime scenario though, this is not
correct. In such a situation, we *DO* care about the previous sequence, but
additionally, we wish to track the uses resulting from the following incomplete
sequences:

  Retain -> CanRelease -> Release   (TopDown)
  Retain <- Use <- Release          (BottomUp)

*NOTE* This patch looks large but the most of it consists of updating
test cases. Additionally this fix exposed an additional bug. I removed
the test case that expressed said bug and will recommit it with the fix
in a little bit.

llvm-svn: 178921
2013-04-05 22:54:28 +00:00
Jim Grosbach bdbd73460c Tidy up a bit. No functional change.
llvm-svn: 178915
2013-04-05 21:20:12 +00:00
Shuxin Yang 95adf5258f Disable the optimization about promoting vector-element-access with symbolic index.
This optimization is unstable at this moment; it 
  1) block us on a very important application
  2) PR15200
  3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
     (the CHECK command compare the output against wrong result)

   I personally believe this optimization should not have any impact on the
autovectorized code, as auto-vectorizer is supposed to put gather/scatter
in a "right" way.  Although in theory downstream optimizaters might reveal 
some gather/scatter optimization opportunities, the chance is quite slim.

   For the hand-crafted vectorizing code, in term of redundancy elimination,
load-CSE, copy-propagation and DSE can collectively achieve the same result,
but in much simpler way. On the other hand, these optimizers are able to 
improve the code in a incremental way; in contrast, SROA is sort of all-or-none
approach. However, SROA might slighly win in stack size, as it tries to figure 
out a stretch of memory tightenly cover the area accessed by the dynamic index.

 rdar://13174884
 PR15200

llvm-svn: 178912
2013-04-05 21:07:08 +00:00
Michael Gottesman bab49e976b Added two debug logging messages to VisitInstructionsTopDown to match VisitInstructionsBottomUp.
llvm-svn: 178895
2013-04-05 18:26:08 +00:00
Michael Gottesman 89279f8383 Cleaned up whitespace and made debug logging less verbose.
llvm-svn: 178893
2013-04-05 18:10:41 +00:00
Arnold Schwaighofer df6f67ed87 LoopVectorizer: Pass OperandValueKind information to the cost model
Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my pervious shift cost change
that made all shifts expensive on x86.

radar://13576547

llvm-svn: 178809
2013-04-04 23:26:27 +00:00
Benjamin Kramer dd67654af6 Reassociate: Avoid iterator invalidation.
OpndPtrs stored pointers into the Opnd vector that became invalid when the
vector grows. Store indices instead. Sadly I only have a large testcase that
only triggers under valgrind, so I didn't include it.

llvm-svn: 178793
2013-04-04 21:15:42 +00:00
Michael Gottesman 21a4ed3227 Refactored out the helper method FindPredecessorAutoreleaseWithSafePath from ObjCARCOpt::OptimizeReturns.
Now ObjCARCOpt::OptimizeReturns is easy to read and reason about.

llvm-svn: 178715
2013-04-03 23:39:14 +00:00
Michael Gottesman 6908db148b Refactored out the helper function FindPredecessorRetainWithSafePath from ObjCARCOpt::OptimizeReturns.
llvm-svn: 178714
2013-04-03 23:16:05 +00:00
Michael Gottesman c2d5bf5c53 Small cleanups.
Cleaned up trailing whitespace and added extra slashes in front of a
function level comment so that it follow the convention of having 3
slashes.

llvm-svn: 178712
2013-04-03 23:07:45 +00:00
Michael Gottesman 54dc7fdefb Refactored out a part of ObjCARCOpt::OptimizeReturns into its own method HasSafePathToPredecessorCall.
llvm-svn: 178710
2013-04-03 23:04:28 +00:00
Michael Gottesman 0a1748bb8c Removed an old comment.
llvm-svn: 178709
2013-04-03 23:04:24 +00:00
Michael Gottesman 43e7e00a68 Clean up arc annotations by moving the top/bottom BB annotations into conditional macros that no-op in Release mode instead of #ifdef sections of the code.
This is to follow the example of the DEBUG macro.

llvm-svn: 178705
2013-04-03 22:41:59 +00:00
Michael Gottesman b8c8836594 Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue.
The semantics of ARC implies that a pointer passed into an objc_autorelease
must live until some point (potentially down the stack) where an
autorelease pool is popped. On the other hand, an
objc_autoreleaseReturnValue just signifies that the object must live
until the end of the given function at least.

Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in
terms of the semantics of ARC* implying that performing the given
strength reduction without any knowledge of how this relates to
the autorelease pool pop that is further up the stack violates the
semantics of ARC.

*Even though objc_autoreleaseReturnValue if you know that no RV
optimization will occur is more computationally expensive.

llvm-svn: 178612
2013-04-03 02:57:24 +00:00
Michael Gottesman 624243914f Improved comment. No functionality change.
llvm-svn: 178605
2013-04-03 01:57:16 +00:00
Bill Wendling 88d06c3b2d Use a worklist to avoid a sneaky iterator invalidation.
The iterator could be invalidated when it's recursively deleting a whole bunch
of constant expressions in a constant initializer.

Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was
run on a `.ll' file, it wouldn't crash. This is why the test first pushes the
`.ll' file through `llvm-as' before feeding it to `opt'.

PR15440

llvm-svn: 178531
2013-04-02 08:16:45 +00:00
Shuxin Yang 6662fd0f15 Correct assertion condition
llvm-svn: 178484
2013-04-01 18:13:05 +00:00
Shuxin Yang 7b0c94e207 Implement XOR reassociation. It is based on following rules:
rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2),
     only useful when c1=c2
  rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
  rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
  rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2

 It reduces an application's size (in terms of # of instructions) by 8.9%.
 Reviwed by Pete Cooper. Thanks a lot!

 rdar://13212115  

llvm-svn: 178409
2013-03-30 02:15:01 +00:00
Michael Gottesman 3b8f877860 Add clang.arc.used to ModuleHasARC so ARC always runs if said call is present in a module.
clang.arc.used is an interesting call for ARC since ObjCARCContract
needs to run to remove said intrinsic to avoid a linker error (since the
call does not exist).

llvm-svn: 178369
2013-03-29 21:15:23 +00:00
Michael Gottesman 60f6b28c58 Removed trailing whitespace.
llvm-svn: 178329
2013-03-29 05:13:07 +00:00
Michael Gottesman ba64859e6e Removed dead code from ObjCARCOpts relating to tracking objc_retainBlocks through the ARC Dataflow analysis. By the time we get to the ARC dataflow analysis, any objc_retainBlock calls are not optimizable.
llvm-svn: 178306
2013-03-28 23:08:44 +00:00
Bill Wendling 85722f48da Minor simplification.
Go ahead and use the full path for both the .gcno and .gcda files.

llvm-svn: 178302
2013-03-28 22:40:08 +00:00
Michael Gottesman 49f9885a2a Non optimizable objc_retainBlock calls are not forwarding.
Since we handle optimizable objc_retainBlocks through strength reduction
in OptimizableIndividualCalls, we know that all code after that point
will only see non-optimizable objc_retainBlock calls. IsForwarding is
only called by functions after that point, so it is ok to just classify
objc_retainBlock as non-forwarding.

<rdar://problem/13249661>.

llvm-svn: 178285
2013-03-28 20:11:30 +00:00
Michael Gottesman 158fdf699e [ObjCARC] Strength reduce objc_retainBlock -> objc_retain if the objc_retainBlock is optimizable.
If an objc_retainBlock has the copy_on_escape metadata attached to it
AND if the block pointer argument only escapes down the stack, we are
allowed to strength reduce the objc_retainBlock to to an objc_retain and
thus optimize it.

Current there is logic in the ARC data flow analysis to handle
this case which is complicated and involved making distinctions in
between objc_retainBlock and objc_retain in certain places and
considering them the same in others.

This patch simplifies said code by:

1. Performing the strength reduction in the initial ARC peephole
analysis (ObjCARCOpts::OptimizeIndividualCalls).

2. Changes the ARC dataflow analysis (which runs after the peephole
analysis) to consider all objc_retainBlock calls to not be optimizable
(since if the call was optimizable, we would have strength reduced it
already).

This patch leaves in the infrastructure in the ARC dataflow analysis to
handle this case, which due to 2 will just be dead code. I am doing this
on purpose to separate the removal of the old code from the testing of
the new code.

<rdar://problem/13249661>.

llvm-svn: 178284
2013-03-28 20:11:19 +00:00
Kostya Serebryany 463aa81418 [tsan] make sure memset/memcpy/memmove are not inlined in tsan mode
llvm-svn: 178230
2013-03-28 11:21:13 +00:00
Akira Hatanaka 99866dd535 Check if Type is a vector before calling function Type::getVectorNumElements.
llvm-svn: 178208
2013-03-28 01:28:02 +00:00