Commit Graph

11073 Commits

Author SHA1 Message Date
Yunzhong Gao 05efa23294 No functionality change.
Replace "(255 & value)" with "(0xFF & value)" to improve clarity.

llvm-svn: 188941
2013-08-21 22:11:15 +00:00
Matt Arsenault 745101d666 Teach InstCombine about address spaces
llvm-svn: 188926
2013-08-21 19:53:10 +00:00
Matt Arsenault 745832dcc9 Use attribute helper function
llvm-svn: 188916
2013-08-21 18:54:50 +00:00
Matt Arsenault 3c71dabd88 Fix typo
llvm-svn: 188915
2013-08-21 18:54:47 +00:00
Bill Wendling 707f601fa5 Move registering the execution of a basic block to the beginning rather than the end.
There are situations which can affect the correctness (or at least expectation)
of the gcov output. For instance, if a call to __gcov_flush() occurs within a
block before the execution count is registered and then the program aborts in
some way, then that block will not be marked as executed. This is not normally
what the user expects.

If we move the code that's registering when a block is executed to the
beginning, we can catch these types of situations.

PR16893

llvm-svn: 188849
2013-08-20 23:52:00 +00:00
Arnold Schwaighofer e1f3ab69d1 SLPVectorizer: Fix invalid iterator errors
Update iterator when the SLP vectorizer changes the instructions in the basic
block by restarting the traversal of the basic block.

Patch by Yi Jiang!

Fixes PR 16899.

llvm-svn: 188832
2013-08-20 21:21:45 +00:00
Hal Finkel 0c5c01aa4a Add a llvm.copysign intrinsic
This adds a llvm.copysign intrinsic. We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.

In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.

In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-valued FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).

llvm-svn: 188728
2013-08-19 23:35:46 +00:00
Jakub Staszak b4eb6adebb Use pop_back_val() instead of both back() and pop_back().
llvm-svn: 188723
2013-08-19 22:47:55 +00:00
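Note on r188723: LLVM's SmallVector provides pop_back_val(), which returns the last element and removes it in one call. A minimal stand-in over std::vector is sketched below. The name popBackVal is hypothetical, and nothing here is taken from the call sites the commit touched.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for SmallVector::pop_back_val(), written against
// std::vector so the sketch compiles without LLVM headers.
template <typename T> T popBackVal(std::vector<T> &v) {
  assert(!v.empty() && "cannot pop from an empty vector");
  T last = std::move(v.back()); // what back() would have returned
  v.pop_back();                 // what pop_back() would have done
  return last;
}

// Typical worklist usage: take the last item and shrink the list in one step.
void checkPopBackVal() {
  std::vector<int> worklist{1, 2, 3};
  assert(popBackVal(worklist) == 3);
  assert(worklist.size() == 2);
}
```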
Matt Arsenault d79f7d9ea1 Teach InstCombine visitGetElementPtr about address spaces
llvm-svn: 188721
2013-08-19 22:17:40 +00:00
Matt Arsenault 98f34e3abe Cleanup visitGetElementPtr to make address space change easier
llvm-svn: 188720
2013-08-19 22:17:34 +00:00
Matt Arsenault 94a028aa43 commonPointerCast cleanups to make address space change easier
llvm-svn: 188719
2013-08-19 22:17:18 +00:00
Matt Arsenault 5aeae18e9d Revert non-test parts of r188507
Re-add the inboundsless tests I didn't add originally

llvm-svn: 188710
2013-08-19 21:40:31 +00:00
Peter Collingbourne aac65a313d Introduce SpecialCaseList::isIn overload for GlobalAliases.
Differential Revision: http://llvm-reviews.chandlerc.com/D1437

llvm-svn: 188688
2013-08-19 19:00:35 +00:00
Michael Kuperstein 4bb3f8f2e4 Adds missing TLI check for library simplification of
* pow(x, 0.5) -> fabs(sqrt(x)) 
* pow(2.0, x) -> exp2(x)

llvm-svn: 188656
2013-08-19 06:55:47 +00:00
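Note on r188656: the two rewrites named in the message can be spot-checked in plain C++. This is only a sketch with hypothetical helper names, not the LLVM simplifier code. The TLI check presumably exists so that the rewrite fires only when the target library actually provides the replacement function.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper names; plain C++, not LLVM's library-call simplifier.

// pow(x, 0.5) and its rewritten form fabs(sqrt(x)).
double powHalf(double x) { return std::pow(x, 0.5); }
double powHalfRewritten(double x) { return std::fabs(std::sqrt(x)); }

// pow(2.0, x) and its rewritten form exp2(x).
double powTwo(double x) { return std::pow(2.0, x); }
double powTwoRewritten(double x) { return std::exp2(x); }

// Spot-check the identities on inputs whose results are exactly representable.
void checkRewrites() {
  assert(powHalf(9.0) == powHalfRewritten(9.0));
  assert(powTwo(5.0) == powTwoRewritten(5.0));
  // sqrt(-0.0) is -0.0, so the fabs keeps the sign in line with pow's +0.0.
  assert(!std::signbit(powHalfRewritten(-0.0)));
}
```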
Peter Collingbourne 03c3324ccd Remove SpecialCaseList::findCategory.
It turned out that I didn't need this for DFSan.

llvm-svn: 188646
2013-08-19 00:24:20 +00:00
Joerg Sonnenberger 8e3050db51 PR 16899: Do not modify the basic block using the iterator, but keep the
next value. This avoids crashes due to invalidation.

Patch by Joey Gouly.

llvm-svn: 188605
2013-08-17 11:04:47 +00:00
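Note on r188605: the "keep the next value" pattern can be sketched with std::list standing in for a basic block's instruction list. That substitution is an assumption made for illustration, and eraseNegatives is a hypothetical name.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Erase every negative element while iterating. std::list stands in for a
// basic block's instruction list (an assumption for illustration only).
void eraseNegatives(std::list<int> &insts) {
  for (auto it = insts.begin(), end = insts.end(); it != end;) {
    auto next = std::next(it); // remember the successor before touching *it
    if (*it < 0)
      insts.erase(it);         // invalidates only 'it'; 'next' stays valid
    it = next;
  }
}

void checkEraseNegatives() {
  std::list<int> insts{1, -2, -3, 4};
  eraseNegatives(insts);
  assert((insts == std::list<int>{1, 4}));
}
```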
Jim Grosbach d0de8ace8a InstCombine: Use isAllOnesValue() instead of explicit -1.
llvm-svn: 188563
2013-08-16 17:03:36 +00:00
Jim Grosbach 20e3b9ac30 InstCombine: Simplify if(x!=0 && x!=-1).
When both constants are positive or both constants are negative,
InstCombine already simplifies comparisons like this, but when
it's exactly zero and -1, the operand sorting ends up reversed
and the pattern fails to match. Handle that special case.

Follow up for rdar://14689217

llvm-svn: 188512
2013-08-16 00:15:20 +00:00
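Note on r188512: the log does not state which form the comparison is simplified to. One standard way to express "x is neither 0 nor -1" as a single unsigned comparison is (x + 1) >u 1. The sketch below assumes that form and checks it exhaustively on 8-bit values. Function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The pattern from the commit title, on an 8-bit value (-1 is 0xFF).
bool twoCompares(uint8_t x) { return x != 0 && x != 0xFF; }

// One unsigned comparison: adding 1 maps 0xFF to 0 and 0 to 1, so exactly
// the two excluded values land at or below 1.
bool oneCompare(uint8_t x) { return static_cast<uint8_t>(x + 1) > 1; }

// Confirm both forms agree for every 8-bit input.
void checkAllInputs() {
  for (unsigned v = 0; v <= 0xFF; ++v) {
    uint8_t x = static_cast<uint8_t>(v);
    assert(twoCompares(x) == oneCompare(x));
  }
}
```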
Matt Arsenault 1de76773bc Don't do FoldCmpLoadFromIndexedGlobal for non inbounds GEPs
This path wasn't tested before without a datalayout,
so add some more tests and re-run with and without one.

llvm-svn: 188507
2013-08-15 23:11:07 +00:00
Matt Arsenault 5cae894a13 Fix spelling
llvm-svn: 188506
2013-08-15 23:11:03 +00:00
Yunzhong Gao c0c2b16932 Fixing a corner-case bug in strchr and strrchr lib call optimizations where
the input character is not converted to char before comparing with zero.

The patch was discussed in this thread:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184069.html

llvm-svn: 188489
2013-08-15 20:58:59 +00:00
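Note on r188489: strchr takes its character as an int and converts it to char before searching, so a value such as 256 looks for the terminating nul on platforms with 8-bit char. The sketch below contrasts a check on the raw int with a check after narrowing. Both helper names are hypothetical and this is not the LLVM code.

```cpp
#include <cassert>

// Does strchr(s, c) search for the terminating nul? The int argument is
// narrowed to char first, mirroring what strchr itself does.
// Assumes 8-bit char with the usual wrap-around narrowing.
bool searchesForTerminator(int c) { return static_cast<char>(c) == '\0'; }

// The buggy shape: testing the int directly misses values such as 256.
bool searchesForTerminatorNaive(int c) { return c == 0; }

void checkStrchrCorner() {
  assert(searchesForTerminator(0) && searchesForTerminatorNaive(0));
  assert(searchesForTerminator(256) && !searchesForTerminatorNaive(256));
  assert(!searchesForTerminator('a'));
}
```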
Peter Collingbourne 444c59e270 DataFlowSanitizer: Add a debugging feature to help us track nonzero labels.
Summary:
When the -dfsan-debug-nonzero-labels parameter is supplied, the code
is instrumented such that when a call parameter, return value or load
produces a nonzero label, the function __dfsan_nonzero_label is called.
The idea is that a debugger breakpoint can be set on this function
in a nominally label-free program to help identify any bugs in the
instrumentation pass causing labels to be introduced.

Reviewers: eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1405

llvm-svn: 188472
2013-08-15 18:51:12 +00:00
Mark Lacey a2626555f1 Fix small typo: s/succ/Succ/
llvm-svn: 188415
2013-08-14 22:11:42 +00:00
Peter Collingbourne 9d31d6f329 DataFlowSanitizer: Instrumentation for memset.
Differential Revision: http://llvm-reviews.chandlerc.com/D1395

llvm-svn: 188412
2013-08-14 20:51:38 +00:00
Peter Collingbourne 68162e7512 DataFlowSanitizer: greylist is now ABI list.
This replaces the old incomplete greylist functionality with an ABI
list, which can provide more detailed information about the ABI and
semantics of specific functions.  The pass treats every function in
the "uninstrumented" category in the ABI list file as conforming to
the "native" (i.e. unsanitized) ABI.  Unless the ABI list contains
additional categories for those functions, a call to one of those
functions will produce a warning message, as the labelling behaviour
of the function is unknown.  The other supported categories are
"functional", "discard" and "custom".

- "discard" -- This function does not write to (user-accessible) memory,
  and its return value is unlabelled.
- "functional" -- This function does not write to (user-accessible)
  memory, and the label of its return value is the union of the label of
  its arguments.
- "custom" -- Instead of calling the function, a custom wrapper __dfsw_F
  is called, where F is the name of the function.  This function may wrap
  the original function or provide its own implementation.

Differential Revision: http://llvm-reviews.chandlerc.com/D1345

llvm-svn: 188402
2013-08-14 18:54:12 +00:00
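Note on r188402: the return-label rules for the "discard" and "functional" categories can be sketched as follows. Modelling a label as a set of integers is an assumption made for illustration and says nothing about how DataFlowSanitizer represents labels. The "custom" category is not modelled.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Assumption: a label is modelled as a set of taint-source ids.
using Label = std::set<int>;

enum class Category { Discard, Functional };

// Label of the return value for a call to a function in the given category.
Label returnLabel(Category cat, const std::vector<Label> &argLabels) {
  Label result; // empty set means "unlabelled"
  if (cat == Category::Functional)
    for (const Label &l : argLabels)
      result.insert(l.begin(), l.end()); // union of the argument labels
  return result;
}

void checkCategories() {
  std::vector<Label> args{{1}, {2, 3}};
  assert(returnLabel(Category::Discard, args).empty());
  assert((returnLabel(Category::Functional, args) == Label{1, 2, 3}));
}
```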
Chandler Carruth 2de93afee3 Fix a really terrifying but improbable bug in mem2reg. If you have seen
extremely subtle miscompilations (such as a load getting replaced with
the value stored *below* the load within a basic block) related to
promoting an alloca to an SSA value, there is the dim possibility that
you hit this. Please let me know if you won this unfortunate lottery.

The first half of mem2reg's core logic (as it is used both in the
standalone mem2reg pass and in SROA) builds up a mapping from
'Instruction *' to the index of that instruction within its basic block.
This allows quickly establishing which stores dominate a particular load
even for large basic blocks. We cache this information throughout the
run of mem2reg over a function in order to amortize the cost of
computing it.

This is not in and of itself a strange pattern in LLVM. However, it
introduces a very important constraint: absolutely no instruction can be
deleted from the program without updating the mapping. Otherwise a newly
allocated instruction might get the same pointer address, and then end
up with a wrong index. Yes, LLVM routinely suffers from a *single
threaded* variant of the ABA problem. Most places in LLVM don't find
avoiding this an imposition because they don't both delete and create
new instructions iteratively, but mem2reg *loves* to do this... All the
time. Fortunately, the mem2reg code was really careful about updating
this cache to handle this eventuality... except when it comes to the
debug declare intrinsic. Oops. The fix is to invalidate that pointer in
the cache when we delete it, the same as we do when deleting alloca
instructions and other instructions.

I've also caused the same bug in new code while working on a fix to
PR16867, so this seems to be a really unfortunate pattern. Hopefully in
subsequent patches the deletion of dead instructions can be consolidated
sufficiently to make it less likely that we'll see future occurrences of
this bug.

Sorry for not having a test case, but I have literally no idea how to
reliably trigger this kind of thing. It may be single-threaded, but it
remains an ABA problem. It would require a really amazing number of
stars to align.

llvm-svn: 188367
2013-08-14 08:56:41 +00:00
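Note on r188367: the constraint described above, that a pointer-keyed cache must drop its entry before the object is freed, can be sketched as below. Inst, IndexCache and deleteInst are hypothetical stand-ins, not mem2reg's data structures.

```cpp
#include <cassert>
#include <unordered_map>

struct Inst { int opcode; }; // stand-in for llvm::Instruction

// Cache from instruction address to its index within a block.
class IndexCache {
  std::unordered_map<const Inst *, unsigned> indices;

public:
  void record(const Inst *i, unsigned idx) { indices[i] = idx; }

  bool lookup(const Inst *i, unsigned &idx) const {
    auto it = indices.find(i);
    if (it == indices.end())
      return false;
    idx = it->second;
    return true;
  }

  // Must run before the instruction is freed. Otherwise a later allocation
  // that reuses the address would inherit the stale index.
  void forget(const Inst *i) { indices.erase(i); }
};

// Deleting through this helper keeps the cache and the heap in sync.
void deleteInst(Inst *i, IndexCache &cache) {
  cache.forget(i);
  delete i;
}

void checkCacheInvalidation() {
  IndexCache cache;
  Inst *a = new Inst{1};
  cache.record(a, 7);
  unsigned idx = 0;
  assert(cache.lookup(a, idx) && idx == 7);
  deleteInst(a, cache);
  Inst *b = new Inst{2};         // may or may not reuse a's address
  assert(!cache.lookup(b, idx)); // either way, no stale index is found
  delete b;
}
```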
Matt Arsenault 9e3a6ca698 Fix always creating GEP with i32 indices
Use the pointer size if datalayout is available.
Use i64 if it's not, which is consistent with what other
places do when the pointer size is unknown.

The test doesn't really test this in a useful way
since it will be transformed to that later anyway,
but this now tests it for non-zero arrays and when
datalayout isn't available. The cases in
visitGetElementPtrInst should save an extra re-visit to
the newly created GEP since it won't need to cleanup after
itself.

llvm-svn: 188339
2013-08-14 00:24:38 +00:00
Matt Arsenault fc00f7eabd Use type helper functions instead of cast
llvm-svn: 188338
2013-08-14 00:24:34 +00:00
Matt Arsenault 640ff9dbcf Use array initializer, space around operator
llvm-svn: 188337
2013-08-14 00:24:05 +00:00
Hal Finkel 1a61f621da BBVectorize: Add initial stores to the write set when tracking uses
When computing the use set of a store, we need to add the store to the write
set prior to iterating over later instructions. Otherwise, if there is a later
aliasing load of that store, that load will not be tagged as a use, and bad
things will happen.

trackUsesOfI still adds later dependent stores of an instruction to that
instruction's write set, but it never sees the original instruction, and so
when tracking uses of a store, the store must be added to the write set by the
caller.

Fixes PR16834.

llvm-svn: 188329
2013-08-13 23:34:32 +00:00
Nick Lewycky c7776f737f Revert r187191, which broke opt -mem2reg on the testcases included in PR16867.
However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146
when SROA started sending more things directly down the PromoteMemToReg path.

In order to revert r187191, I also revert dependent revisions r187296, r187322
and r188146. Fixes PR16867. Does not add the testcases from that PR, but both
of them should get added for both mem2reg and sroa when this revert gets
unreverted.

llvm-svn: 188327
2013-08-13 22:51:58 +00:00
Dmitry Vyukov 96a7084620 dfsan: fix lint warnings
llvm-svn: 188293
2013-08-13 16:52:41 +00:00
Arnold Schwaighofer 124ccf3ad1 Also remove logic in LateVectorize
llvm-svn: 188285
2013-08-13 16:12:04 +00:00
Arnold Schwaighofer c14b59d1a1 Remove logic that decides whether to vectorize or not depending on O-levels
I have moved this logic into clang and opt.

llvm-svn: 188281
2013-08-13 15:51:25 +00:00
Peter Collingbourne 8d642de169 Reapply r188119 now that the bug it exposed is fixed.
llvm-svn: 188217
2013-08-12 22:38:43 +00:00
Peter Collingbourne fb3a2b4f97 DataFlowSanitizer: fix a use-after-free. Spotted by libgmalloc.
llvm-svn: 188216
2013-08-12 22:38:39 +00:00
Bill Wendling e1eaecd528 Move stack protector names to the same place.
llvm-svn: 188198
2013-08-12 20:09:37 +00:00
Nadav Rotem e23147bbd4 Fix PR16797 - Support PHINodes with multiple inputs from the same basic block.
Do not generate new vector values for the same entries because we know that the incoming values
from the same block must be identical.

llvm-svn: 188185
2013-08-12 17:46:44 +00:00
Alexey Samsonov 15dc0af78b Remove unused SpecialCaseList constructors
llvm-svn: 188171
2013-08-12 11:50:44 +00:00
Alexey Samsonov e4b5fb8851 Add SpecialCaseList::createOrDie() factory and use it in sanitizer passes
llvm-svn: 188169
2013-08-12 11:46:09 +00:00
Alexey Samsonov 9e4fdd2656 Introduce factory methods for SpecialCaseList
Summary:
Doing work in constructors is bad: this change suggests calling
SpecialCaseList::create(Path, Error) instead of
"new SpecialCaseList(Path)". Currently the latter may crash with
report_fatal_error, which is undesirable: sometimes we want to report
the error to the user gracefully, for example if they provide an incorrect
file as an argument of Clang's -fsanitize-blacklist flag.

Reviewers: pcc

Reviewed By: pcc

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1327

llvm-svn: 188156
2013-08-12 07:49:36 +00:00
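Note on r188156: the factory-method shape described in the summary might look roughly like the sketch below. ListSketch is a hypothetical class, and the empty-path check is a placeholder for whatever validation the real SpecialCaseList::create performs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class ListSketch {
  std::string path;

  // Private: callers must go through create(), which can fail gracefully.
  explicit ListSketch(std::string p) : path(std::move(p)) {}

public:
  // Returns nullptr and fills 'error' instead of aborting the process.
  static std::unique_ptr<ListSketch> create(const std::string &path,
                                            std::string &error) {
    if (path.empty()) { // placeholder validation, an assumption of this sketch
      error = "no special case list path given";
      return nullptr;
    }
    return std::unique_ptr<ListSketch>(new ListSketch(path));
  }

  const std::string &getPath() const { return path; }
};

void checkFactory() {
  std::string error;
  assert(!ListSketch::create("", error));
  assert(!error.empty()); // the caller can now report the problem itself
  std::string ok;
  auto list = ListSketch::create("blacklist.txt", ok);
  assert(list && list->getPath() == "blacklist.txt" && ok.empty());
}
```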
Richard Sandiford feb34713d5 Fix big-endian handling of integer-to-vector bitcasts in InstCombine
These functions used to assume that the lsb of an integer corresponds
to vector element 0, whereas for big-endian it's the other way around:
the msb is in the first element and the lsb is in the last element.

Fixes MultiSource/Benchmarks/mediabench/gsm/toast for z.

llvm-svn: 188155
2013-08-12 07:26:09 +00:00
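Note on r188155: the element order described above can be modelled arithmetically for a bitcast from i32 to <4 x i8>. The helper below is a hypothetical sketch, not InstCombine code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model "bitcast i32 %v to <4 x i8>": which byte ends up in which element.
std::array<uint8_t, 4> bitcastToBytes(uint32_t v, bool bigEndian) {
  std::array<uint8_t, 4> elts{};
  for (unsigned i = 0; i < 4; ++i) {
    // Little-endian: element 0 holds the lsb. Big-endian: element 0 holds the msb.
    unsigned shift = bigEndian ? 8 * (3 - i) : 8 * i;
    elts[i] = static_cast<uint8_t>(v >> shift);
  }
  return elts;
}

void checkElementOrder() {
  assert(bitcastToBytes(0x11223344u, false)[0] == 0x44);
  assert(bitcastToBytes(0x11223344u, true)[0] == 0x11);
  assert(bitcastToBytes(0x11223344u, true)[3] == 0x44);
}
```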
Chandler Carruth d7cd7e367e Re-instate r187323 which fast-tracks promotable allocas as soon as the
SROA-based analysis has enough information. This should work now that
both mem2reg *and* the SSAUpdater-based AllocaPromoter have been updated
to be able to promote the types of allocas that the SROA analysis
detects.

I've included tests for the AllocaPromoter that were only possible to
write once we fast-tracked promotable allocas without rewriting them.
This includes a test both for r187347 and r188145.

Original commit log for r187323:
"""
Now that mem2reg understands how to cope with a slightly wider set of uses of
an alloca, we can pre-compute promotability while analyzing an alloca for
splitting in SROA. That lets us short-circuit the common case of a bunch of
trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for
typical frontend-generated IR sequences I'm seeing. It gets the new SROA to
within 20% of ScalarRepl for such code. My current benchmark for these numbers
is PR15412, but it fits the general pattern of IR emitted by Clang so it should
be widely applicable.
"""

llvm-svn: 188146
2013-08-11 02:17:11 +00:00
Chandler Carruth c17283b407 Finish fixing the SSAUpdater-based AllocaPromoter strategy in SROA to cope with
the more general set of patterns that are now handled by mem2reg and that we
can detect quickly while doing SROA's initial analysis. Notably, this allows it
to promote through no-op bitcast and GEP sequences. A core part of the
SSAUpdater approach is the ability to test whether a particular instruction is
part of the set being promoted. Testing this becomes significantly more complex
in the world where the operand to every load and store isn't the alloca itself.
I ended up using the approach of walking up the def-chain until we find the
alloca. I benchmarked this against keeping a set of pointer operands and
keeping a set of the loads and stores we care about, and this one seemed faster
although the difference was very small.

No test case yet because currently the rewriting always "fixes" the inputs to
not require this. The next patch which re-enables early promotion of easy cases
in SROA will include a test case that specifically exercises this aspect of the
alloca promoter.

llvm-svn: 188145
2013-08-11 01:56:15 +00:00
Chandler Carruth 45b136f4cf Reformat some bits of AllocaPromoter and simplify the name and type of
our visiting datastructures in the AllocaPromoter/SSAUpdater path of
SROA. Also shift the order of clears around to be more consistent.

No functionality changed here, this is just a cleanup.

llvm-svn: 188144
2013-08-11 01:03:18 +00:00
Arnold Schwaighofer 3dcdb89d69 Revert r188119 "Kill some duplicated code for removing unreachable BBs."
It is breaking buildbots with libgmalloc enabled on Mac OS X.

$ cd llvm ; mkdir release ; cd release
$ ../configure --enable-optimized --prefix=$PWD/install
$ make
$ make check
$ Release+Asserts/bin/llvm-lit -v --param use_gmalloc=1 --param \
  gmalloc_path=/usr/lib/libgmalloc.dylib \
  ../test/Instrumentation/DataFlowSanitizer/args-unreachable-bb.ll

llvm-svn: 188142
2013-08-10 20:16:06 +00:00
Michael Gottesman d6ce6cbdac [objc-arc] Track if we encountered an additive overflow while computing {TopDown,BottomUp}PathCounts and do nothing if it occurred.
I fixed the aforementioned problems that came up on some of the linux boxes.
Major thanks to Nick Lewycky for his help debugging!

rdar://14590914

llvm-svn: 188122
2013-08-09 23:22:27 +00:00
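Note on r188122: tracking an additive overflow and bailing out when one occurred can be sketched as below. PathCount and safeToOptimize are hypothetical names, and the 32-bit counter width is an assumption. The objc-arc pass itself is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// A path counter that remembers whether any addition wrapped around.
struct PathCount {
  uint32_t count = 0;
  bool overflowed = false;

  void add(uint32_t n) {
    uint32_t sum = count + n; // unsigned arithmetic wraps, which is well defined
    if (sum < count)          // a wrapped sum is smaller than either operand
      overflowed = true;
    count = sum;
  }
};

// "Do nothing if it occurred": only act on counts that are trustworthy.
bool safeToOptimize(const PathCount &pc) { return !pc.overflowed; }

void checkOverflowTracking() {
  PathCount pc;
  pc.add(std::numeric_limits<uint32_t>::max());
  assert(safeToOptimize(pc));
  pc.add(1); // wraps to 0
  assert(!safeToOptimize(pc));
}
```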
Peter Collingbourne 32090aba06 Kill some duplicated code for removing unreachable BBs.
This moves removeUnreachableBlocksFromFn from SimplifyCFGPass.cpp
to Utils/Local.cpp and uses it to replace the implementation of
llvm::removeUnreachableBlocks, which appears to do a strict subset
of what removeUnreachableBlocksFromFn does.

Differential Revision: http://llvm-reviews.chandlerc.com/D1334

llvm-svn: 188119
2013-08-09 22:47:24 +00:00
Peter Collingbourne ae66d57bcf DataFlowSanitizer: Remove unreachable BBs so IR continues to verify
under the args ABI.

Differential Revision: http://llvm-reviews.chandlerc.com/D1316

llvm-svn: 188113
2013-08-09 21:42:53 +00:00
Jakub Staszak 23ec6a97d1 Mark obviously const methods. Also use reference for parameters when possible.
llvm-svn: 188103
2013-08-09 20:53:48 +00:00
Michael Gottesman 6663c7d5fc Revert "[objc-arc] Track if we encountered an additive overflow while computing {TopDown,BottomUp}PathCounts and do nothing if it occurred."
This reverts commit r187941.

The commit was passing on my os x box, but it is failing on some non-osx
platforms. I do not have time to look into it now, so I am reverting and will
recommit after I figure this out.

llvm-svn: 187946
2013-08-08 00:41:18 +00:00
Peter Collingbourne a5689e69af Fix ARM build.
llvm-svn: 187944
2013-08-08 00:15:27 +00:00
Michael Gottesman ddc89fcccd [objc-arc] Track if we encountered an additive overflow while computing {TopDown,BottomUp}PathCounts and do nothing if it occurred.
rdar://14590914

llvm-svn: 187941
2013-08-07 23:56:41 +00:00
Michael Gottesman 0fecf98955 [objc-arc] Change 4 iterator methods which return const_iterators to be const methods.
llvm-svn: 187940
2013-08-07 23:56:34 +00:00
Hal Finkel 171817ee8a Add ISD::FROUND for libm round()
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.

For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).

This will be used by the PowerPC backend in a follow-up commit.

llvm-svn: 187926
2013-08-07 22:49:12 +00:00
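Note on r187926: round() differs from nearbyint() in how it treats halfway cases, which is one reason it cannot simply reuse an existing node. The sketch below shows the difference in standard C++, assuming the default round-to-nearest-even floating-point environment. The wrapper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// round(): halfway cases go away from zero, regardless of rounding mode.
double roundHalfAway(double x) { return std::round(x); }

// nearbyint(): uses the current rounding mode (to nearest even by default).
double roundCurrentMode(double x) { return std::nearbyint(x); }

void checkRounding() {
  assert(roundHalfAway(2.5) == 3.0);
  assert(roundHalfAway(-2.5) == -3.0);
  assert(roundCurrentMode(2.5) == 2.0); // ties to even under the default mode
}
```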
Peter Collingbourne e5d5b0c71e DataFlowSanitizer; LLVM changes.
DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own.  Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Differential Revision: http://llvm-reviews.chandlerc.com/D965

llvm-svn: 187923
2013-08-07 22:47:18 +00:00
Benjamin Kramer 6a4976d3e0 JumpThreading: Turn a select instruction into branching if that allows threading one half of the select.
This is a common pattern coming out of simplifycfg generating gross code.

a:                                       ; preds = %entry
  %sel = select i1 %cmp1, double %add, double 0.000000e+00
  br label %b

b:
  %cond5 = phi double [ %sel, %a ], [ %sub, %entry ]
  %cmp6 = fcmp oeq double %cond5, 0.000000e+00
  br i1 %cmp6, label %if.then, label %if.end

becomes

a:
  br i1 %cmp1, label %b, label %if.then

b:
  %cond5 = phi double [ %sub, %entry ], [ %add, %a ]
  %cmp6 = fcmp oeq double %cond5, 0.000000e+00
  br i1 %cmp6, label %if.then, label %if.end

Skipping block b completely if possible.

llvm-svn: 187880
2013-08-07 10:29:38 +00:00
Bill Wendling 58f8cef83b Change the linkage of these global values to 'internal'.
The globals being generated here were given the 'private' linkage type. However,
this caused them to end up in different sections with the wrong prefix. E.g.,
they would be in the __TEXT,__const section with an 'L' prefix instead of an 'l'
(lowercase ell) prefix.

The problem is that the linker will eat a literal label with 'L'. If a weak
symbol is then placed into the __TEXT,__const section near that literal, then it
cannot distinguish between the literal and the weak symbol.

Some of the problems here were introduced because the address sanitizer converted
some C strings into constant initializers with trailing nuls. (Thus putting them
in the __const section with the wrong prefix.) The others were variables that
the address sanitizer created but simply had the wrong linkage type.

llvm-svn: 187827
2013-08-06 22:52:42 +00:00
Arnold Schwaighofer a7cd6bf3bb LoopVectorize: Allow vectorization of loops with lifetime markers
Patch by Marc Jessome!

llvm-svn: 187825
2013-08-06 22:37:52 +00:00
Jakub Staszak 27da123d66 Adjust file to the coding standard.
llvm-svn: 187808
2013-08-06 17:03:42 +00:00
Serge Pavlov 71044cbe16 Unbreak Debug build on Windows
llvm-svn: 187786
2013-08-06 08:44:18 +00:00
Tom Stellard aa664d9b92 Factor FlattenCFG out from SimplifyCFG
Patch by: Mei Ye

llvm-svn: 187764
2013-08-06 02:43:45 +00:00
Matt Arsenault ff7dc7248e Fix missing -*- C++ -*-s
llvm-svn: 187758
2013-08-06 00:16:21 +00:00
Peter Collingbourne bace606657 Introduce an optimisation for special case lists with large numbers of literal entries.
Our internal regex implementation does not cope with large numbers
of anchors very efficiently.  Given a ~3600-entry special case list,
regex compilation can take on the order of seconds.  This patch solves
the problem for the special case of patterns matching literal global
names (i.e. patterns with no regex metacharacters).  Rather than
forming regexes from literal global name patterns, add them to
a StringSet which is checked before matching against the regex.
This reduces regex compilation time by an order of roughly thousands
when reading the aforementioned special case list, according to a
completely unscientific study.

No test cases.  I figure that any new tests for this code should
check that regex metacharacters are properly recognised.  However,
I could not find any documentation which documents the fact that the
syntax of global names in special case lists is based on regexes.
The extent to which regex syntax is supported in special case lists
should probably be decided on/documented before writing tests.

Differential Revision: http://llvm-reviews.chandlerc.com/D1150

llvm-svn: 187732
2013-08-05 17:48:04 +00:00
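Note on r187732: the literal fast path can be sketched as below. std::unordered_set stands in for LLVM's StringSet and std::regex for the internal regex implementation. Keeping one compiled regex per pattern is a simplification of this sketch, and the metacharacter list is an assumption.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <unordered_set>
#include <vector>

class SpecialCaseSketch {
  std::unordered_set<std::string> literals; // checked first, no regex involved
  std::vector<std::regex> patterns;         // only patterns with metacharacters

  // Assumed metacharacter set for this sketch.
  static bool isLiteral(const std::string &s) {
    return s.find_first_of("()^$|*+?.[]\\{}") == std::string::npos;
  }

public:
  void add(const std::string &pattern) {
    if (isLiteral(pattern))
      literals.insert(pattern);       // cheap: no regex compilation at all
    else
      patterns.emplace_back(pattern); // compiled as an ECMAScript regex
  }

  bool matches(const std::string &name) const {
    if (literals.count(name))
      return true; // literal hit before any regex is consulted
    for (const std::regex &re : patterns)
      if (std::regex_match(name, re))
        return true;
    return false;
  }
};

void checkSpecialCases() {
  SpecialCaseSketch list;
  list.add("main");
  list.add("foo.*");
  assert(list.matches("main"));
  assert(list.matches("foobar"));
  assert(!list.matches("mainx"));
}
```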
Alexey Samsonov f52b717db3 80-cols
llvm-svn: 187725
2013-08-05 13:19:49 +00:00
Nadav Rotem 5defea90e6 SLPVectorizer: Fix PR16777. PHInodes may use multiple extracted values that come from different blocks.
Thanks Alexey Samsonov.

llvm-svn: 187663
2013-08-02 18:40:24 +00:00
Alexey Samsonov 9096968de5 Fix dereferencing end iterator in SimplifyCFG. Patch by Ye Mei.
llvm-svn: 187646
2013-08-02 08:06:43 +00:00
Matt Arsenault 87dc60761f Teach getOrEnforceKnownAlignment about address spaces
llvm-svn: 187629
2013-08-01 22:42:18 +00:00
Nadav Rotem e4e6e9ed47 Move the optlevel check to the frontend.
llvm-svn: 187628
2013-08-01 22:41:58 +00:00
Nadav Rotem 9153b3871d Only enable SLP-vectorization on O3 builds.
llvm-svn: 187595
2013-08-01 18:28:15 +00:00
Nadav Rotem 25f15358d2 80-col
llvm-svn: 187535
2013-07-31 22:17:45 +00:00
Owen Anderson c7be519dc0 Preserve fast-math flags when folding (fsub x, (fneg y)) to (fadd x, y).
llvm-svn: 187462
2013-07-30 23:53:17 +00:00
Matt Arsenault cacbb2377a Change behavior of calling bitcasted alias functions.
It will now only convert the arguments / return value and call
the underlying function if the types are able to be bitcasted.
This avoids using fp<->int conversions that would occur before.

llvm-svn: 187444
2013-07-30 20:45:05 +00:00
Nadav Rotem d9c74cc6d3 SLPVectorizer: update the debug location for the new instructions.
llvm-svn: 187363
2013-07-29 18:18:46 +00:00
Chandler Carruth cd7c8cdfa1 Teach the AllocaPromoter which is wrapped around the SSAUpdater
infrastructure to do promotion without a domtree the same smarts about
looking through GEPs, bitcasts, etc., that I just taught mem2reg about.
This way, if SROA chooses to promote an alloca which still has some
noisy instructions this code can cope with them.

I've not used as principled of an approach here for two reasons:
1) This code doesn't really need it as we were already set up to zip
   through the instructions used by the alloca.
2) I view the code here as more of a hack, and hopefully a temporary one.

The SSAUpdater path in SROA is a real sore point for me. It doesn't make
a lot of architectural sense for many reasons:
- We're likely to end up needing the domtree anyways in a subsequent
  pass, so why not compute it earlier and use it.
- In the future we'll likely end up needing the domtree for parts of the
  inliner itself.
- If we need to we could teach the inliner to preserve the domtree. Part
  of the re-work of the pass manager will allow this to be very powerful
  even in large SCCs with many functions.
- Ultimately, computing a domtree has gotten significantly faster since
  the original SSAUpdater-using code went into ScalarRepl. We no longer
  use domfrontiers, and much of domtree is lazily done based on queries
  rather than eagerly.
- At this point keeping the SSAUpdater-based promotion saves a total of
  0.7% on a build of the 'opt' tool for me. That's not a lot of
  performance given the complexity!

So I'm leaving this a bit ugly in the hope that eventually we just
remove all of this nonsense.

I can't even readily test this because this code isn't reachable except
through SROA. When I re-instate the patch that fast-tracks allocas
already suitable for promotion, I'll add a testcase there that failed
before this change. Before that, SROA will fix any test case I give it.

llvm-svn: 187347
2013-07-29 09:06:53 +00:00
Nadav Rotem 750e42cba3 Don't vectorize when the attribute NoImplicitFloat is used.
llvm-svn: 187340
2013-07-29 05:13:00 +00:00
Rafael Espindola caa776be91 Fix -Wdocumentation warnings.
llvm-svn: 187336
2013-07-28 23:43:28 +00:00
Chandler Carruth 6b55dbea86 Update comments for SSAUpdater to use the modern doxygen comment
standards for LLVM. Remove duplicated comments on the interface from the
implementation file (implementation comments are left there of course).
Also clean up, re-word, and fix a few typos and errors in the comments
spotted along the way.

This is in preparation for changes to these files and to keep the
uninteresting tidying in a separate commit.

llvm-svn: 187335
2013-07-28 22:00:33 +00:00
Chandler Carruth d31370e060 Temporarily revert r187323 until I update SSAUpdater to match mem2reg.
I forgot that we had two totally independent things here. :: sigh ::

llvm-svn: 187327
2013-07-28 09:05:49 +00:00
Chandler Carruth 9d96100ff0 Now that mem2reg understands how to cope with a slightly wider set of
uses of an alloca, we can pre-compute promotability while analyzing an
alloca for splitting in SROA. That lets us short-circuit the common case
of a bunch of trivially promotable allocas. This cuts 20% to 30% off the
run time of SROA for typical frontend-generated IR sequences I'm seeing.
It gets the new SROA to within 20% of ScalarRepl for such code. My
current benchmark for these numbers is PR15412, but it fits the general
pattern of IR emitted by Clang so it should be widely applicable.

llvm-svn: 187323
2013-07-28 08:27:12 +00:00
Chandler Carruth d5b806a27f Thread DataLayout through the callers and into mem2reg. This will be
useful in a subsequent patch, but causes an unfortunate amount of noise,
so I pulled it out into a separate patch.

llvm-svn: 187322
2013-07-28 06:43:11 +00:00
Nadav Rotem 3e50c68956 Update the comment
llvm-svn: 187316
2013-07-27 23:28:47 +00:00
Chandler Carruth 8e3c4dc50e Don't use all the #ifdefs to hide the stats counters and instead rely on
their being optimized out in debug mode. Realistically, this just isn't
going to be the slow part anyways. This also fixes unused variable
warnings that are breaking LLD build bots. =/ I didn't see these at
first, and kept losing track of the fact that they were broken.

llvm-svn: 187297
2013-07-27 10:17:49 +00:00
Chandler Carruth e8f5812a30 Merge the removal of dead instructions and lifetime markers with the
analysis of the alloca. We don't need to visit all the users twice for
this. We build up a kill list during the analysis and then just process
it afterward. This recovers the tiny bit of performance lost by moving
to the visitor based analysis system as it removes one entire use-list
walk from mem2reg. In some cases, this is now faster than mem2reg was
previously.

llvm-svn: 187296
2013-07-27 09:43:30 +00:00
Nick Lewycky 0b68245ec8 Reimplement isPotentiallyReachable to make nocapture deduction much stronger.
Adds unit tests for it too.

Split BasicBlockUtils into an analysis-half and a transforms-half, and put the
analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable
into llvm::isPotentiallyReachable and move it into Analysis/CFG.

llvm-svn: 187283
2013-07-27 01:24:00 +00:00
Tom Stellard 8b1e021e85 SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce the number of branches.  The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.

Patch by: Mei Ye

llvm-svn: 187278
2013-07-27 00:01:07 +00:00
Nadav Rotem cfd40da9b1 SLP Vectorizer: Don't vectorize really short chains because they are already handled by the SelectionDAG store-vectorizer, which does a better job of deciding when to vectorize.
llvm-svn: 187267
2013-07-26 23:07:55 +00:00
Nadav Rotem 9ce0f779bc SLP Vectorizer: Disable the vectorization of non-power-of-two chains, such as <3 x float>, because we don't have a good cost model for these types.
llvm-svn: 187265
2013-07-26 22:53:11 +00:00
Owen Anderson d6d4da09f7 Fix variable name.
llvm-svn: 187253
2013-07-26 22:06:21 +00:00
Owen Anderson e37c2e4d11 When InstCombine tries to fold away (fsub x, (fneg y)) into (fadd x, y), it is
also worthwhile for it to look through FP extensions and truncations, whose
application commutes with fneg.

llvm-svn: 187249
2013-07-26 21:40:29 +00:00
Stephen Lin 4ef1387221 Correct case of m_UIToFp to m_UIToFP to match instruction name, add m_SIToFP for consistency.
llvm-svn: 187225
2013-07-26 17:55:00 +00:00
Chandler Carruth 9af38fc247 Re-implement the analysis of uses in mem2reg to be significantly more
robust. It now uses an InstVisitor and worklist to actually walk the
uses of the Alloca transitively and detect the pattern which we can
directly promote: loads & stores of the whole alloca and instructions we
can completely ignore.

Also, with this new implementation teach both the predicate for testing
whether we can promote and the promotion engine itself to use the same
code so we no longer have strange divergence between the two code paths.

I've added some silly test cases to demonstrate that we can handle
slightly more degenerate code patterns now. See below for why this
is even interesting.

Performance impact: roughly 1% regression in the performance of SROA or
ScalarRepl on a large C++-ish test case where most of the allocas are
basically ready for promotion. The reason is because of silly redundant
work that I've left FIXMEs for and which I'll address in the next
commit. I wanted to separate this commit as it changes the behavior.
Once the redundant work in removing the dead uses of the alloca is
fixed, this code appears to be faster than the old version. =]

So why is this useful? Because the previous requirement for promotion
required a *specific* visit pattern of the uses of the alloca to verify:
we *had* to look for no more than 1 intervening use. The end goal is to
have SROA automatically detect when an alloca is already promotable and
directly hand it to the mem2reg machinery rather than trying to
partition and rewrite it. This is a 25% or more performance improvement
for SROA, and a significant chunk of the delta between it and
ScalarRepl. To get there, we need to make mem2reg actually capable of
promoting allocas which *look* promotable to SROA without having SROA do
tons of work to massage the code into just the right form.

This is actually the tip of the iceberg. There are tremendous potential
savings we can realize here by de-duplicating work between mem2reg and
SROA.

llvm-svn: 187191
2013-07-26 08:20:39 +00:00
Bill Schmidt 0a9170d931 [PowerPC] Support powerpc64le as a syntax-checking target.
This patch provides basic support for powerpc64le as an LLVM target.
However, use of this target will not actually generate little-endian
code.  Instead, use of the target will cause the correct little-endian
built-in defines to be generated, so that code that tests for
__LITTLE_ENDIAN__, for example, will be correctly parsed for
syntax-only testing.  Code generation will otherwise be the same as
powerpc64 (big-endian), for now.

The patch leaves open the possibility of creating a little-endian
PowerPC64 back end, but there is no immediate intent to create such a
thing.

The LLVM portions of this patch simply add ppc64le coverage everywhere
that ppc64 coverage currently exists.  There is nothing of any import
worth testing until such time as little-endian code generation is
implemented.  In the corresponding Clang patch, there is a new test
case variant to ensure that correct built-in defines for little-endian
code are generated.

llvm-svn: 187179
2013-07-26 01:35:43 +00:00
Rafael Espindola 17600e29fa Respect llvm.used in Internalize.
The language reference says that:

"If a symbol appears in the @llvm.used list, then the compiler,
assembler, and linker are required to treat the symbol as if there is
a reference to the symbol that it cannot see"

Since even the linker cannot see the reference, we must assume that
the reference can be using the symbol table. For example, a user can add
__attribute__((used)) to a debug helper function like dump and use it from
a debugger.

llvm-svn: 187103
2013-07-25 03:23:25 +00:00
Nick Lewycky 5b15037fc9 Check that TD isn't NULL before dereferencing it down this path.
llvm-svn: 187099
2013-07-25 02:55:14 +00:00
Rafael Espindola ec2375fb51 Make these methods const correct.
Thanks to Nick Lewycky for noticing it.

llvm-svn: 187098
2013-07-25 02:50:08 +00:00
Benjamin Kramer 328da33d19 TRE: Move class into anonymous namespace.
While there shrink a dangerously large SmallPtrSet.

llvm-svn: 187050
2013-07-24 16:12:08 +00:00
Chandler Carruth 58e25d3905 Fix a problem I introduced in r187029 where we would over-eagerly
schedule an alloca for another iteration in SROA. This only showed up
with a mixture of promotable and unpromotable selects and phis. Added
a test case for this.

llvm-svn: 187031
2013-07-24 12:12:17 +00:00
Chandler Carruth 83ea195d40 Fix PR16687 where we were incorrectly promoting an alloca that had
pending speculation for a phi node. The problem here is that we were
using growth of the speculation set as an indicator of whether
speculation would occur, and if the phi node is already in the set we
don't see it grow. This is a symptom of the fact that this signal is
a total hack.

Unfortunately, I couldn't really come up with a non-hacky way of
signaling that promotion remains valid *after* speculation occurs, such
that we only speculate when all else looks good for promotion. In the
end, I went with at least a much more explicit approach of doing the
work of queuing inside the phi and select processing and setting
a preposterously named flag to convey that we're in the special state of
requiring speculating before promotion.

Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing
a testcase for this from a pretty giant, nasty assert in a big
application. =] The testcase was excellent.

llvm-svn: 187029
2013-07-24 09:47:28 +00:00
Matt Arsenault f64212b281 Fix spelling
llvm-svn: 186997
2013-07-23 22:20:57 +00:00
Nick Lewycky 6ab9d936d5 Remove extraneous null statement. No functionality change!
llvm-svn: 186893
2013-07-22 23:38:27 +00:00
Jakub Staszak d4d94065e3 Use switch instead of if. No functionality change.
llvm-svn: 186892
2013-07-22 23:38:16 +00:00
Jakub Staszak 8e1a6e7d53 Remove trailing spaces.
llvm-svn: 186890
2013-07-22 23:16:36 +00:00
Nadav Rotem cf0dcdc71c When we vectorize across multiple basic blocks we may vectorize PHINodes that create a cycle. We already break the cycle on phi-nodes, but arithmetic operations are still duplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can.
llvm-svn: 186883
2013-07-22 22:18:07 +00:00
Jakub Staszak cb132face0 OldPtr is llvm::Instruction. Remove unneeded cast<>.
llvm-svn: 186880
2013-07-22 22:10:43 +00:00
Jakub Staszak 6b36db08f3 Change tabs to spaces.
llvm-svn: 186877
2013-07-22 21:11:30 +00:00
Matt Arsenault fb18323885 Fix spelling and grammar
llvm-svn: 186858
2013-07-22 18:59:58 +00:00
Nadav Rotem 8c45d4b27f Fix an obvious typo in the loop vectorizer where the cost model uses the wrong variable. The variable BlockCost is ignored.
We don't have tests for the effect of if-conversion loops because it requires a big test (that includes if-converted loops) and it is difficult to find and balance a loop to do the right thing.

llvm-svn: 186845
2013-07-22 17:10:48 +00:00
Nadav Rotem d7ff88a8d9 Delete unused helper functions.
llvm-svn: 186808
2013-07-22 05:19:22 +00:00
Benjamin Kramer 2fdb758ca8 mem2reg: Minor STL usage cleanup. No functionality change.
llvm-svn: 186790
2013-07-21 11:03:40 +00:00
Chandler Carruth 7aa9ebb546 Make the mem2reg interface use an ArrayRef as it keeps a copy of these
to iterate over.

llvm-svn: 186788
2013-07-21 08:37:58 +00:00
Nadav Rotem f6bb6a464c Revert a part of r186420. Don't forbid multiple store chains that merge.
llvm-svn: 186786
2013-07-21 06:12:57 +00:00
Chandler Carruth b1ca98c4d0 Hoist the rest of the logic for promoting single-store allocas into the
helper function. This leaves both trivial cases handled entirely in
helper functions and merely manages the list of allocas to process in
the run method.

The next step will be to handle all of the trivial promotion work prior
to even creating the core class and the subsequent simplifications that
enables.

llvm-svn: 186784
2013-07-21 01:52:33 +00:00
Chandler Carruth f9e7e1dd87 Hoist the rest of the logic for fully promoting allocas with all uses in
a single block into the helper routine. This takes advantage of the fact
that we can directly replace uses prior to any store with undef to
simplify matters and unconditionally promote allocas only used within
one block.

I've removed the special handling for the case of no stores existing.
This has no semantic effect but might slow things down. I'll fix that in
a later patch when I refactor this entire thing to be easier to manage
the different cases.

llvm-svn: 186783
2013-07-21 01:44:07 +00:00
Chandler Carruth e99f931516 Remove a method made dead by the prior refactoring.
llvm-svn: 186782
2013-07-21 00:01:34 +00:00
Chandler Carruth 420fafef93 Hoist the two trivial promotion routines out of the big class that
handles the general cases.

The hope is to refactor this so that we don't end up building the entire
class for the trivial cases. I also want to lift a lot of the early
pre-processing in the initial segment of run() into a separate routine,
and really none of it needs to happen inside the primary promotion
class.

These routines in particular used none of the actual state in the
promotion class, so they don't really make sense as members.

llvm-svn: 186781
2013-07-20 23:59:51 +00:00
Chandler Carruth 48e11fd76d Hoist the AllocaInfo struct to the top of the file.
This struct is nicely independent of everything else, and we already
needed a forward declaration here. It's simpler to just define it
immediately.

llvm-svn: 186780
2013-07-20 23:39:26 +00:00
Chandler Carruth 4711793e8a Sink a typedef and comparator down to the function that actually uses them.
llvm-svn: 186779
2013-07-20 23:36:19 +00:00
Rafael Espindola c2bb73fc8d Don't crash when llvm.compiler.used becomes empty.
GlobalOpt simplifies llvm.compiler.used by removing any members that are also
in the more strict llvm.used. Handle the special case where llvm.compiler.used
becomes empty.

llvm-svn: 186778
2013-07-20 23:33:15 +00:00
Chandler Carruth f3878f46ce Don't allocate the DIBuilder on the heap and remove all the complexity
that ensued from that.

llvm-svn: 186777
2013-07-20 23:33:06 +00:00
Chandler Carruth e62f211b77 Rename constructor parameters to follow the common member-shadowing
pattern and conform to the naming conventions.

llvm-svn: 186776
2013-07-20 23:23:47 +00:00
Chandler Carruth b3e8e6f10b Reformat the implementation of mem2reg with clang-format so that my
subsequent changes don't introduce inconsistencies.

llvm-svn: 186775
2013-07-20 23:20:08 +00:00
Chandler Carruth 985eb0b550 Remove a DenseMapInfo specialization for std::pair -- we have one of
those baked into DenseMap now.

llvm-svn: 186773
2013-07-20 23:09:05 +00:00
Chandler Carruth 019516109d Update mem2reg's comments to conform to the new doxygen standards. No
functionality changed.

llvm-svn: 186772
2013-07-20 22:20:05 +00:00
Benjamin Kramer 08e5070bf5 SROA: Microoptimization: Remove dead entries first, then sort.
While there replace an explicit struct with std::mem_fun.

llvm-svn: 186761
2013-07-20 08:38:34 +00:00
Stephen Lin a9b57f6bea InstCombine: call FoldOpIntoSelect for all floating binops, not just fmul
llvm-svn: 186759
2013-07-20 07:13:13 +00:00
Nadav Rotem e210839f5b fix an 80-col line.
llvm-svn: 186733
2013-07-19 23:14:01 +00:00
Nadav Rotem c069c25518 Use LLVM's ADTs that improve the compile time of this pass.
llvm-svn: 186732
2013-07-19 23:12:19 +00:00
Nadav Rotem 5c9a193a65 SLPVectorizer: Improve the compile time of isConsecutive by reordering the conditions that check GEPs and eliminate two of the calls to accumulateConstantOffset.
llvm-svn: 186731
2013-07-19 23:11:15 +00:00
Rafael Espindola 9aadcc4c0e s/compiler_used/compiler.used/.
We were incorrectly using compiler_used instead of compiler.used. Unfortunately
the passes using the broken name had tests also using the broken name.

llvm-svn: 186705
2013-07-19 18:44:51 +00:00
Chandler Carruth 6c321c131b Cleanup the stats counters for the new implementation. These actually
count the right things and have the right names.

llvm-svn: 186667
2013-07-19 10:57:36 +00:00
Chandler Carruth 1ed848d55c Fix another assert failure very similar to PR16651's test case. This
test case came from Benjamin and found the parallel bug in the vector
promotion code.

llvm-svn: 186666
2013-07-19 10:57:32 +00:00
Chandler Carruth 9f21fe1d65 Try to move to a more reasonable set of naming conventions given the new
implementation of the SROA algorithm. We were using the term 'partition'
in many places that no longer ever represented an actual partition, but
rather just an arbitrary slice of an alloca.

No functionality change intended here. Mostly just renaming of types,
functions, variables, and rewording of comments. Several comments were
rewritten to make a lot more sense in the new structure of things.

The stats are still weird and not reflective of how this really works.
I'll fix those up in a separate patch as it is a touch more semantic of
a change...

llvm-svn: 186659
2013-07-19 09:13:58 +00:00
Chandler Carruth 90a735d606 A long overdue cleanup in SROA to use 'DL' instead of 'TD' for the
DataLayout variables.

llvm-svn: 186656
2013-07-19 07:21:28 +00:00
Chandler Carruth 5955c9e4da Fix PR16651, an assert introduced in my recent re-work of the innards of
SROA.

The crux of the issue is that now we track uses of a partition of the
alloca in two places: the iterators over the partitioning uses and the
previously collected split uses vector. We weren't accounting for the
fact that the split uses might invalidate integer widening in ways other
than due to their width (in this case due to being volatile).

Further reduced testcase added to the tests.

llvm-svn: 186655
2013-07-19 07:12:23 +00:00
Eric Christopher 03b3e1118f Remove DIBuilder cache of variable TheCU and change the few
uses that wanted it. Also change the interface for createCompileUnit
to compensate. Fix comments that refer to TheCU as well.

llvm-svn: 186637
2013-07-19 00:51:47 +00:00
Nick Lewycky 03f3d34ffb Clean up some of this code a tiny bit, no functionality change.
llvm-svn: 186622
2013-07-18 22:32:32 +00:00
Eric Christopher a4b6cf14f6 Revert "Remove DIBuilder cache of variable TheCU and change the few"
This reverts commit r186599 as I didn't want to commit this yet.

llvm-svn: 186601
2013-07-18 19:13:06 +00:00
Eric Christopher d0b2150f01 Remove DIBuilder cache of variable TheCU and change the few
uses that wanted it. Also change the interface for createCompileUnit
to compensate. Fix comments that refer to TheCU as well.

llvm-svn: 186599
2013-07-18 19:11:29 +00:00
Nadav Rotem bb3398f000 Handle constants without going through SCEV.
llvm-svn: 186593
2013-07-18 18:34:21 +00:00
Nadav Rotem de2815a5f7 SLPVectorizer: Speedup isConsecutive by manually checking GEPs with multiple indices.
This brings the compile time of the SLP-Vectorizer to about 2.5% of OPT for my testcase.

llvm-svn: 186592
2013-07-18 18:20:45 +00:00
Chandler Carruth f0546402af Reapply r186316 with a fix for one bug where the code could walk off the
end of a vector. This was found with ASan. I've had one other report of
a crasher, but thus far been unable to reproduce the crash. It may well
be fixed with this version, and if not I'd like to get more information
from the build bots about what is happening.

See r186316 for the full commit log for the new implementation of the
SROA algorithm.

llvm-svn: 186565
2013-07-18 07:15:00 +00:00
Nadav Rotem 7d7036b8c6 SLPVectorizer: Speedup isConsecutive (that checks if two addresses are consecutive in memory) by checking for additional patterns that don't need to go through SCEV.
llvm-svn: 186563
2013-07-18 04:33:20 +00:00
Eric Christopher 7ab2c3ecb2 Add comparison operators for DIDescriptors to fix c++98 fallout
of operator bool change.

Also convert a variable in DebugIR.

llvm-svn: 186544
2013-07-17 23:25:22 +00:00
Nadav Rotem 43639e8492 Fix a comment.
llvm-svn: 186541
2013-07-17 22:41:16 +00:00
Stephen Lin 03f9fbbcd7 Restore r181216, which was partially reverted in r182499.
llvm-svn: 186533
2013-07-17 20:06:03 +00:00
Nadav Rotem 3072baeb9c Add a micro optimization to catch cases where the PtrA equals PtrB.
llvm-svn: 186531
2013-07-17 19:52:25 +00:00
Hal Finkel ec7cd26968 Fix comparisons of alloca alignment in inliner merging
Duncan pointed out a mistake in my fix in r186425 when only one of the allocas
being compared had the target-default alignment. This is essentially his
suggested solution. Thanks!

llvm-svn: 186510
2013-07-17 14:32:41 +00:00
Craig Topper 24048c9440 Mark a method 'const' and another 'static'.
llvm-svn: 186485
2013-07-17 03:54:53 +00:00
Craig Topper 1c4d667ca5 Make a few more static string pointers constant.
llvm-svn: 186484
2013-07-17 03:43:10 +00:00
Nadav Rotem 2202317fce SLPVectorizer: Accelerate the isConsecutive check by replacing the subtraction of the two values with a simple SCEV expression that adds the offset to one of the pointers that we compare.
llvm-svn: 186479
2013-07-17 00:48:31 +00:00
Nadav Rotem d2e8c4cdea flip the scev minus direction to simplify the code.
llvm-svn: 186466
2013-07-16 22:57:06 +00:00
Nadav Rotem 8f924f3891 SLPVectorizer: Improve the compile time of isConsecutive by adding a simple constant-gep check before using SCEV.
This check does not always work because not all of the GEPs use a constant offset, but it happens often enough to reduce the number of times we use SCEV.

llvm-svn: 186465
2013-07-16 22:51:07 +00:00
Rafael Espindola 6d35481c94 Add a wrapper for open.
This centralizes the handling of O_BINARY and opens the way for hiding more
differences (like how open behaves with directories).

llvm-svn: 186447
2013-07-16 19:44:17 +00:00
Peter Collingbourne 8b77f18da0 Make SpecialCaseList match full strings, as documented, using anchors.
Differential Revision: http://llvm-reviews.chandlerc.com/D1149

llvm-svn: 186431
2013-07-16 17:56:07 +00:00
Hal Finkel 9caa8f7ba7 When the inliner merges allocas, it must keep the larger alignment
For safety, the inliner cannot decrease the alignment on an alloca when
merging it with another.

I've included two variants of the test case for this: one with DataLayout
available, and one without. When DataLayout is not available, if only one of
the allocas uses the default alignment (getAlignment() == 0), then they cannot
be safely merged.
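
The rule described above can be sketched roughly as follows. This is an illustrative Python model, not the inliner's code: merged_alignment and abi_align are hypothetical names, an alignment of 0 stands for the target default as in getAlignment() == 0, and passing abi_align=None stands for DataLayout being unavailable.

```python
from typing import Optional


def merged_alignment(a: int, b: int,
                     abi_align: Optional[int] = None) -> Optional[int]:
    """Alignment for the merged alloca, or None if merging is unsafe.

    An alignment of 0 means "target default". abi_align is the ABI
    alignment of the alloca type when DataLayout is available.
    """
    if abi_align is not None:
        # DataLayout available: resolve defaults, then keep the larger one.
        a = a or abi_align
        b = b or abi_align
        return max(a, b)
    if a == 0 and b == 0:
        return 0          # both use the default, whatever it is
    if a == 0 or b == 0:
        return None       # cannot compare a default with an explicit value
    return max(a, b)      # both explicit: keep the larger alignment
```

For example, merged_alignment(0, 16) returns None, while merged_alignment(0, 16, abi_align=4) returns 16.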

llvm-svn: 186425
2013-07-16 17:10:55 +00:00
Nadav Rotem 26bf9a0c75 SLPVectorizer: Reduce the compile time of the consecutive store lookup.
Process groups of stores in chunks of 16.

llvm-svn: 186420
2013-07-16 15:25:17 +00:00
Craig Topper d3a34f81f8 Add 'const' qualifiers to static const char* variables.
llvm-svn: 186371
2013-07-16 01:17:10 +00:00
Nadav Rotem 1c1d6c1666 PR16628: Fix a bug in the code that merges compares.
Compares return i1 but they compare different types.

llvm-svn: 186359
2013-07-15 22:52:48 +00:00
Stephen Lin 837bba1c51 Remove trailing whitespace
llvm-svn: 186333
2013-07-15 17:55:02 +00:00
Chandler Carruth e3899f2c2c Revert r186316 while I track down an ASan failure and an assert from
a bot.

This reverts the commit which introduced a new implementation of the
fancy SROA pass designed to reduce its overhead. I'll skip the huge
commit log here, refer to r186316 if you're looking for how this all
works and why it works that way.

llvm-svn: 186332
2013-07-15 17:36:21 +00:00
Chandler Carruth e74ff4c643 Reimplement SROA yet again. Same fundamental principle, but a totally
different core implementation strategy.

Previously, SROA would build a relatively elaborate partitioning of an
alloca, associate uses with each partition, and then rewrite the uses of
each partition in an attempt to break apart the alloca into chunks that
could be promoted. This was very wasteful in terms of memory and compile
time because regardless of how complex the alloca or how much we're able
to do in breaking it up, all of the datastructure work to analyze the
partitioning was done up front.

The new implementation attempts to form partitions of the alloca lazily
and on the fly, rewriting the uses that make up that partition as it
goes. This has a few significant effects:
1) Much simpler data structures are used throughout.
2) No more double walk of the recursive use graph of the alloca, only
   walk it once.
3) No more complex algorithms for associating a particular use with
   a particular partition.
4) PHI and Select speculation is simplified and happens lazily.
5) More precise information is available about a specific use of the
   alloca, removing the need for some side datastructures.

Ultimately, I think this is a much better implementation. It removes
about 300 lines of code, but arguably removes more like 500 considering
that some code grew in the process of being factored apart and cleaned
up for this all to work.

I've re-used as much of the old implementation as possible, which
includes the lion's share of code in the form of the rewriting logic.
The interesting new logic centers around how the uses of a partition are
sorted, and split into actual partitions.

Each instruction using a pointer derived from the alloca gets
a 'Partition' entry. This name is totally wrong, but I'll do a rename in
a follow-up commit as there is already enough churn here. The entry
describes the offset range accessed and the nature of the access. Once
we have all of these entries we sort them in a very specific way:
increasing order of begin offset, followed by whether they are
splittable uses (memcpy, etc), followed by the end offset or whatever.
Sorting by splittability is important as it simplifies the collection of
uses into a partition.

Once we have these uses sorted, we walk from the beginning to the end
building up a range of uses that form a partition of the alloca.
Overlapping unsplittable uses are merged into a single partition while
splittable uses are broken apart and carried from one partition to the
next. A partition is also introduced to bridge splittable uses between
the unsplittable regions when necessary.
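
A rough Python sketch of the sort order and of the merging of overlapping unsplittable uses follows. It is only a model under stated assumptions: Use, sort_key and unsplittable_partitions are hypothetical names, the choice to put unsplittable uses before splittable ones and larger end offsets first is an assumption (the text above only gives the order of the criteria), and the carrying of splittable uses across partitions is left out entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Use:
    begin: int        # first byte offset accessed
    end: int          # one past the last byte offset accessed
    splittable: bool  # e.g. a memcpy-like use


def sort_key(u: Use) -> Tuple[int, bool, int]:
    # Begin offset first, then splittability, then end offset.
    # The direction of the last two criteria is an assumption.
    return (u.begin, u.splittable, -u.end)


def unsplittable_partitions(uses: List[Use]) -> List[Tuple[int, int]]:
    """Merge overlapping unsplittable uses into [begin, end) ranges."""
    parts: List[Tuple[int, int]] = []
    for u in sorted(uses, key=sort_key):
        if u.splittable:
            continue  # splittable uses are not modelled in this sketch
        if parts and u.begin < parts[-1][1]:
            # Overlaps the current partition: extend it.
            parts[-1] = (parts[-1][0], max(parts[-1][1], u.end))
        else:
            parts.append((u.begin, u.end))
    return parts
```

Because the uses are sorted by begin offset, a single forward walk is enough to find every overlap.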

I've looked at the performance PRs fairly closely. PR15471 no longer
will even load (the module is invalid). Not sure what is up there.
PR15412 improves by between 5% and 10%, however it is nearly impossible
to know what is holding it up as SROA (the entire pass) takes less time
than reading the IR for that test case. The analysis takes the same time
as running mem2reg on the final allocas. I suspect (without much
evidence) that the new implementation will scale much better however,
and it is just the small nature of the test cases that makes the changes
small and noisy. Either way, it is still simpler and cleaner I think.

llvm-svn: 186316
2013-07-15 10:30:19 +00:00
Craig Topper 06b3b6651e Add 'const' qualifier to some arrays.
llvm-svn: 186312
2013-07-15 08:02:13 +00:00
Craig Topper 5871321e49 Use llvm::array_lengthof to replace sizeof(array)/sizeof(array[0]).
llvm-svn: 186301
2013-07-15 04:27:47 +00:00
Nadav Rotem d9f3f4548e SLPVectorizer: change the order in which we search for vectorization candidates. Do stores first and PHIs second.
llvm-svn: 186277
2013-07-14 06:15:46 +00:00
Craig Topper b94011fd28 Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.
llvm-svn: 186274
2013-07-14 04:42:23 +00:00
Arnold Schwaighofer a92eeebde8 LoopVectorizer: Disallow reductions whose header phi is used outside the loop
If an outside loop user of the reduction value uses the header phi node we
cannot just reduce the vectorized phi value in the vector code epilog because
we would lose VF-1 reductions.

lp:
  p = phi (0, lv)
  lv = lv + 1
  ...
  brcond , lp, outside

outside:
  usr = add 0, p

(Say the loop iterates two times, the value of p coming out of the loop is one).

We cannot just transform this to:

vlp:
  p = phi (<0,0>, lv)
  lv = lv + <1,1>
  ..
  brcond , lp, outside

outside:
  p_reduced = p[0] + p[1];
  usr = add 0, p_reduced

(Because the original loop iterated two times the vectorized loop would iterate
one time, but p_reduced ends up being zero instead of one).

We would have to execute VF-1 iterations in the scalar remainder loop in such
cases. For now, just disable vectorization.
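
The two-iteration example above can be replayed with a small Python simulation. It is a sketch only: scalar_loop and naive_vectorized_loop are hypothetical names, the update is read as lv = p + 1, and the vector width VF is fixed at 2.

```python
from typing import List


def scalar_loop(trip_count: int) -> int:
    """Value of the header phi p as seen by a user outside the loop."""
    incoming = 0                  # value flowing into the phi
    p = 0
    for _ in range(trip_count):
        p = incoming              # p = phi (0, lv)
        lv = p + 1
        incoming = lv
    return p


def naive_vectorized_loop(trip_count: int, vf: int = 2) -> int:
    """Same loop with vf lanes, then a naive add-reduction of p."""
    incoming: List[int] = [0] * vf
    p: List[int] = [0] * vf
    for _ in range(trip_count // vf):
        p = incoming                  # p = phi (<0,0>, lv)
        lv = [x + 1 for x in p]       # lv = lv + <1,1>
        incoming = lv
    return sum(p)                     # p_reduced = p[0] + p[1]
```

With a trip count of two, scalar_loop returns 1 while naive_vectorized_loop returns 0, which is the mismatch described above.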

PR16522

llvm-svn: 186256
2013-07-13 19:09:29 +00:00
Andrew Trick 0ae8c94f8f LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander.
In general, one should always complete CFG modifications first, update
CFG-based analyses, like Dominators and LoopInfo, then generate
instruction sequences.

LoopVectorizer was creating a new loop, calling SCEVExpander to
generate checks, then updating LoopInfo. I just changed the order.

llvm-svn: 186241
2013-07-13 06:20:06 +00:00
Nick Lewycky 7459be6dc7 Add a microoptimization for urem.
llvm-svn: 186235
2013-07-13 01:16:47 +00:00
Joey Gouly a3250f22c2 Fix a crash in EvaluateInDifferentElementOrder where it would generate an
undef vector of the wrong type.

LGTM'd by Nick Lewycky on IRC.

llvm-svn: 186224
2013-07-12 23:08:06 +00:00
Andrew Trick a1e4118a46 LFTR improvement to avoid truncation.
This is a reimplementation of the patch originally in r186107.

llvm-svn: 186215
2013-07-12 22:08:48 +00:00
Andrew Trick 2b71848ffe Cleanup LFTR logic.
llvm-svn: 186214
2013-07-12 22:08:44 +00:00
Andrew Trick 466555e50d Cleanup: rename a variable to make the logic easier to follow.
llvm-svn: 186213
2013-07-12 22:08:41 +00:00
Arnold Schwaighofer 9da9a43af8 TargetTransformInfo: address calculation parameter for gather/scatter
Address calculation for gather/scatter in vectorized code can incur a
significant cost making vectorization unbeneficial. Add infrastructure to add
cost.
Tests and cost model for targets will be in follow-up commits.

radar://14351991

llvm-svn: 186187
2013-07-12 19:16:02 +00:00
Chandler Carruth cf3715cadd Revert "indvars: Improve LFTR by eliminating truncation when comparing
against a constant."

This reverts commit r186107. It didn't handle wrapping arithmetic in the
loop correctly and thus caused the following C program to count from
0 to UINT64_MAX instead of from 0 to 255 as intended:

  #include <stdio.h>
  int main() {
    unsigned char first = 0, last = 255;
    do { printf("%d\n", first); } while (first++ != last);
  }

Full test case and instructions to reproduce with just the -indvars pass
sent to the original review thread rather than to r186107's commit.

llvm-svn: 186152
2013-07-12 11:18:55 +00:00
Nadav Rotem 89c41bf06a SLPVectorizer: Sink and enable CSE for ExtractElements.
llvm-svn: 186145
2013-07-12 06:09:24 +00:00
Nadav Rotem fa3c2db211 SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes.
Before we could vectorize PHINodes, scanning successors was a good way of finding candidates. Now we can vectorize the phinodes, which is simpler.

llvm-svn: 186139
2013-07-12 00:04:18 +00:00
Nadav Rotem db06b139fd Remove an argument that we don't use anymore.
llvm-svn: 186116
2013-07-11 20:56:13 +00:00
Andrew Trick 3095993d6f indvars: Improve LFTR by eliminating truncation when comparing against a constant.
Patch by Michele Scandale!

Adds a special handling of the case where, during the loop exit
condition rewriting, the exit value is a constant of bitwidth lower
than the type of the induction variable: instead of introducing a
trunc operation in order to match correctly the operand types, it
allows to convert the constant value to an equivalent constant,
depending on the initial value of the induction variable and the trip
count, in order to have an equivalent comparison between the induction
variable and the new constant.

llvm-svn: 186107
2013-07-11 17:08:59 +00:00
Benjamin Kramer fc3ea6f4bc Don't use a potentially expensive shift if all we want is one set bit.
No functionality change.

llvm-svn: 186095
2013-07-11 16:05:50 +00:00
Arnold Schwaighofer e97c71b8fd LoopVectorize: Vectorize all accesses in address space zero with unit stride
We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).

Fixes PR16592.

llvm-svn: 186088
2013-07-11 15:21:55 +00:00
Duncan Sands e773c08021 TryToSimplifyUncondBranchFromEmptyBlock was checking that any common
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block.  This change
allows merging in the case where there is one or more incoming values
that are undef.  The undef values are rewritten to match the non-undef
value that flows from the other edge.  Patch by Mark Lacey.
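
The relaxed compatibility check can be pictured with a small Python sketch. It is not the SimplifyCFG code: UNDEF and merge_incoming are hypothetical stand-ins, and values are modelled as plain Python objects.

```python
from typing import Any, Tuple

UNDEF = "undef"  # stand-in for an undef incoming value


def merge_incoming(a: Any, b: Any) -> Tuple[bool, Any]:
    """Try to reconcile two incoming values from a common predecessor.

    Returns (True, value) when the blocks can be merged for this phi,
    with undef rewritten to the other value, or (False, None) when the
    two values genuinely conflict.
    """
    if a == UNDEF:
        return True, b
    if b == UNDEF:
        return True, a
    if a == b:
        return True, a
    return False, None
```

For instance, merge_incoming(UNDEF, 7) yields (True, 7), whereas merge_incoming(3, 7) yields (False, None).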

llvm-svn: 186069
2013-07-11 08:28:20 +00:00
Nadav Rotem 08efb262a9 Fix a warning.
llvm-svn: 186064
2013-07-11 05:39:02 +00:00
Nadav Rotem b8dd66f655 SLPVectorizer: refactor the code that places extracts. Place the code that decides where to put extracts in the build-tree phase. This allows us to take the cost of the extracts into account.
llvm-svn: 186058
2013-07-11 04:54:05 +00:00
Michael Gottesman b40db26eae Teach TailRecursionElimination to handle certain cases of nocapture escaping allocas.
Without the changes introduced into this patch, if TRE saw any allocas at all,
TRE would not perform TRE *or* mark callsites with the tail marker.

Because TRE runs after mem2reg, this inadequacy is not a death sentence. But
given a callsite A without escaping alloca argument, A may not be able to have
the tail marker placed on it due to a separate callsite B having a write-back
parameter passed in via an argument with the nocapture attribute.

Assume that B is the only other callsite besides A and B only has nocapture
escaping alloca arguments (*NOTE* B may have other arguments that are not passed
allocas). In this case not marking A with the tail marker is unnecessarily
conservative since:

  1. By assumption A has no escaping alloca arguments itself so it can not
     access the caller's stack via its arguments.

  2. Since all of B's escaping alloca arguments are passed as parameters with
     the nocapture attribute, we know that B does not stash said escaping
     allocas in a manner that outlives B itself and thus could be accessed
     indirectly by A.

With the changes introduced by this patch:

  1. If we see any escaping allocas passed as a capturing argument, we do
     nothing and bail early.

  2. If we do not see any escaping allocas passed as captured arguments but we
     do see escaping allocas passed as nocapture arguments:

       i. We do not perform TRE to avoid PR962 since the code generator produces
          significantly worse code for the dynamic allocas that would be created
          by the TRE algorithm.

       ii. If we do not return twice, mark call sites without escaping allocas
           with the tail marker. *NOTE* This excludes functions with escaping
           nocapture allocas.

  3. If we do not see any escaping allocas at all (whether captured or not):

       i. If we do not have usage of setjmp, mark all callsites with the tail
          marker.

       ii. If there are no dynamic/variable sized allocas in the function,
           attempt to perform TRE on all callsites in the function.
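
The three cases above can be summarized as a small decision function. This Python sketch is only a reading of the list: Decision and decide are hypothetical names, and "returns twice" and "usage of setjmp" are assumed to be the same condition and folded into a single returns_twice flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    mark_tail: str     # "none", "non_escaping_only" or "all"
    attempt_tre: bool


def decide(captured_escaping: bool, nocapture_escaping: bool,
           returns_twice: bool, has_dynamic_alloca: bool) -> Decision:
    # Case 1: an escaping alloca is passed as a capturing argument.
    if captured_escaping:
        return Decision("none", False)
    # Case 2: escaping allocas are passed only as nocapture arguments.
    if nocapture_escaping:
        mark = "none" if returns_twice else "non_escaping_only"
        return Decision(mark, False)
    # Case 3: no escaping allocas at all.
    mark = "none" if returns_twice else "all"
    return Decision(mark, not has_dynamic_alloca)
```

As a usage example, decide(False, True, False, False) gives Decision("non_escaping_only", False): tail markers on the call sites without escaping allocas, and no TRE.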

Based off of a patch by Nick Lewycky.

rdar://14324281.

llvm-svn: 186057
2013-07-11 04:40:01 +00:00
Michael Gottesman 6eb95dc2f7 [objc-arc] Changed 'mode: c++' => 'C++' at Nick Lewycky's suggestion. Also removed unnecessary mode: c++ lines from .cpp files.
llvm-svn: 186026
2013-07-10 18:49:00 +00:00
Peter Collingbourne 49062a97cf Implement categories for special case lists.
A special case list can now specify categories for specific globals,
which can be used to instruct an instrumentation pass to treat certain
functions or global variables in a specific way, such as by omitting
certain aspects of instrumentation while keeping others, or informing
the instrumentation pass that a specific uninstrumentable function
has certain semantics, thus allowing the pass to instrument callers
according to those semantics.

For example, AddressSanitizer now uses the "init" category instead of
global-init prefixes for globals whose initializers should not be
instrumented, but which in all other respects should be instrumented.

The motivating use case is DataFlowSanitizer, which will have a
number of different categories for uninstrumentable functions, such
as "functional" which specifies that a function has pure functional
semantics, or "discard" which indicates that a function's return
value should not be labelled.
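
A rough sketch of how category lookup could work. The line syntax `prefix:pattern=category` is an assumption made for illustration (the message does not spell out the file format), shell-style wildcards stand in for whatever matching the real class uses, and `parse_special_case_list` and `is_in` are hypothetical helper names.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def parse_special_case_list(text):
    """Map (prefix, category) to a list of patterns; category "" means none."""
    entries = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        prefix, _, rest = line.partition(":")
        pattern, _, category = rest.partition("=")
        entries[(prefix, category)].append(pattern)
    return entries


def is_in(entries, prefix, name, category=""):
    """True if `name` matches a pattern listed under this prefix and category."""
    return any(fnmatchcase(name, p) for p in entries.get((prefix, category), []))
```

With the input `global:cfg_*=init`, the query `is_in(rules, "global", "cfg_table", "init")` is true, while the same query without a category is false, so a pass can skip one aspect of instrumentation and keep the rest.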

Differential Revision: http://llvm-reviews.chandlerc.com/D1092

llvm-svn: 185978
2013-07-09 22:03:17 +00:00
Peter Collingbourne 2eb048d230 Introduce a SpecialCaseList ctor which takes a MemoryBuffer to make
it more unit testable, and fix memory leak in the other ctor.

Differential Revision: http://llvm-reviews.chandlerc.com/D1090

llvm-svn: 185976
2013-07-09 22:03:09 +00:00
Peter Collingbourne 015370e23a Rename BlackList class to SpecialCaseList and move it to Transforms/Utils.
Differential Revision: http://llvm-reviews.chandlerc.com/D1089

llvm-svn: 185975
2013-07-09 22:02:49 +00:00
Nadav Rotem d7b574e5b3 Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform.
llvm-svn: 185970
2013-07-09 21:38:08 +00:00
Nadav Rotem 861bef7dd0 Set the default insert point to the first instruction, and not to end()
llvm-svn: 185953
2013-07-09 17:55:36 +00:00
David Majnemer eeed73b981 InstCombine: Fix typo in comment for visitICmpInstWithInstAndIntCst
llvm-svn: 185916
2013-07-09 09:24:35 +00:00
David Majnemer 72d76275ac InstCombine: variations on 0xffffffff - x >= 4
The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)

These are nice, they get rid of the xor.
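
Both folds can be checked exhaustively at a small bit width. This is a brute-force sketch over 8-bit unsigned values, not the InstCombine code; `xor_folds_hold` and `GOOD_CONSTANTS` are names invented here.

```python
def xor_folds_hold(c, bits=8):
    """Check both folds for one constant C against every `bits`-wide X."""
    mask = (1 << bits) - 1
    c &= mask
    not_c = ~c & mask   # ~C in `bits` bits
    neg_c = -c & mask   # -C in `bits` bits
    for x in range(mask + 1):
        # (icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
        if ((x ^ c) > not_c) != (x < c):
            return False
        # (icmp ult (xor X, C), -C) -> (icmp uge X, C)
        if ((x ^ c) < neg_c) != (x >= c):
            return False
    return True


# Constants whose negation is a power of two: 0xFF, 0xFE, 0xFC, ..., 0x80.
GOOD_CONSTANTS = [(-(1 << k)) & 0xFF for k in range(8)]
```

Every entry of `GOOD_CONSTANTS` passes, while a constant such as 3, whose negation is not a power of two, fails the check.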

llvm-svn: 185915
2013-07-09 09:20:58 +00:00
David Majnemer 414d4e58aa InstCombine: X & -C != -C -> X <= u ~C
Tests were added in r185910 somehow.

llvm-svn: 185912
2013-07-09 08:09:32 +00:00
David Majnemer bafa537eb7 Commit r185909 was a misapplied patch, fix it
llvm-svn: 185910
2013-07-09 07:58:32 +00:00
David Majnemer f2a9a513c7 InstCombine: add more transforms
C1-X <u C2 -> (X|(C2-1)) == C1
C1-X >u C2 -> (X|C2) != C1
X-C1 <u C2 -> (X & -C2) == C1
X-C1 >u C2 -> (X & ~C2) != C1
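
The message lists the folds without their side conditions. The sketch below assumes the following ones and checks the folds exhaustively at 6 bits: for the `<u` folds, C2 is a power of two and the low bits of C1 selected by C2-1 are all ones (C1-X) or all zeros (X-C1); for the `>u` folds, C2+1 is a power of two with the same requirement on C1 & C2. The `>u` folds are checked with `!=` on the right-hand side, since that is the form that holds under these conditions. `counterexamples` is a hypothetical helper.

```python
def counterexamples(bits=6):
    """Return every (fold, C1, X, k) where the two sides disagree."""
    mask = (1 << bits) - 1
    bad = []
    for k in range(bits):
        p2 = 1 << k      # C2 when C2 is a power of two
        low = p2 - 1     # C2 when C2 + 1 is a power of two
        for c1 in range(mask + 1):
            for x in range(mask + 1):
                c1_minus_x = (c1 - x) & mask
                x_minus_c1 = (x - c1) & mask
                checks = []
                if c1 & low == low:  # assumed side condition for the C1-X folds
                    checks.append(("C1-X <u C2", c1_minus_x < p2, (x | low) == c1))
                    checks.append(("C1-X >u C2", c1_minus_x > low, (x | low) != c1))
                if c1 & low == 0:    # assumed side condition for the X-C1 folds
                    checks.append(("X-C1 <u C2", x_minus_c1 < p2, (x & (-p2 & mask)) == c1))
                    checks.append(("X-C1 >u C2", x_minus_c1 > low, (x & (~low & mask)) != c1))
                bad += [(name, c1, x, k) for name, lhs, rhs in checks if lhs != rhs]
    return bad
```

An empty result means no disagreement was found at this width under the assumed conditions.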

llvm-svn: 185909
2013-07-09 07:50:59 +00:00
Eli Bendersky 07b0e451ca Fix comment
llvm-svn: 185888
2013-07-08 23:57:07 +00:00
Nadav Rotem c9c57518ab This patch changes the saved IRBuilder insert point from BasicBlock::iterator to AssertingVH.
Commit 185883 fixes a bug in the IRBuilder that should fix the ASan bot. AssertingVH can help in exposing some RAUW problems.

Thanks Ben and Alexey!

llvm-svn: 185886
2013-07-08 23:31:13 +00:00
Michael Gottesman c1b648f6c0 [objc-arc] Fix assertion in EraseInstruction so that calls that are no-ops on null do not trigger the assert when passed null.
The specific case of interest is when objc_retainBlock is passed null.

llvm-svn: 185885
2013-07-08 23:30:23 +00:00
David Majnemer fa90a0b325 InstCombine: Fold X-C1 <u 2 -> (X & -2) == C1
Back in r179493 we determined that two transforms collided with each
other.  The fix back then was to reorder the transforms so that the
preferred transform would give it a try and then we would try the
secondary transform.  However, it was noted that the best approach would
canonicalize one transform into the other, removing the collision and
allowing us to optimize IR given to us in that form.

llvm-svn: 185808
2013-07-08 11:53:08 +00:00
Nadav Rotem 2ee35771a8 Clear the builder insert point between tree-vectorization phases.
llvm-svn: 185777
2013-07-07 14:57:18 +00:00
Nadav Rotem 2041b742d4 SLPVectorizer: Implement DCE as part of vectorization.
This is a complete rewrite of the bottom-up vectorization class.
Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization.
There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.

llvm-svn: 185774
2013-07-07 06:57:07 +00:00
Michael Gottesman 618df456e2 [objc-arc] Remove the alias analysis part of r185764.
Upon further reflection, the alias analysis part of r185764 is not a safe
change.

llvm-svn: 185770
2013-07-07 04:18:03 +00:00
Michael Gottesman a72630d453 [objc-arc] Teach the ARC optimizer that objc_sync_enter/objc_sync_exit do not modify the ref count of an objc object and additionally are inert for modref purposes.
llvm-svn: 185769
2013-07-07 01:52:55 +00:00
Michael Gottesman e557da26db [objc-arc] When we initialize ARCRuntimeEntryPoints, make sure we reset all references to entrypoint declarations as well.
llvm-svn: 185764
2013-07-06 18:43:05 +00:00
Benjamin Kramer 3d90a8f4f9 Reassociate: Remove unnecessary default operator=.
llvm-svn: 185757
2013-07-06 15:10:13 +00:00
Michael Gottesman 4d9439c73f [objc-arc] Performed some small cleanups in ARCRuntimeEntryPoints and added an llvm_unreachable after the switch to quiet -Wreturn-type errors.
llvm-svn: 185746
2013-07-06 02:18:56 +00:00
Michael Gottesman 574d521c85 [objc-arc] Renamed Module => TheModule in ARCRuntimeEntryPoints. Also did some small cleanups.
This fixes an issue that came up due to -fpermissive on the bots.

llvm-svn: 185744
2013-07-06 01:57:32 +00:00
Michael Gottesman 01df45056e Removed trailing whitespace.
llvm-svn: 185743
2013-07-06 01:41:35 +00:00
Michael Gottesman 0b912b2673 [objc-arc] Updated ObjCARCContract to use ARCRuntimeEntryPoints.
llvm-svn: 185742
2013-07-06 01:39:26 +00:00
Michael Gottesman 14acfacc48 [objc-arc] Updated ObjCARCOpts to use ARCRuntimeEntryPoints.
llvm-svn: 185741
2013-07-06 01:39:23 +00:00
Michael Gottesman a94186a4c0 [objc-arc] Refactor runtime entrypoint declaration creation.
This is the first patch in a series of 3 patches which clean up how we create
runtime function declarations in the ARC optimizer when they do not exist
already in the IR.

Currently we have a bunch of duplicated code in ObjCARCOpts, ObjCARCContract
that does this. This patch refactors that code into a separate class called
ARCRuntimeEntryPoints which lazily creates the declarations for said
entrypoints.

The next two patches will consist of the work of refactoring
ObjCARCContract/ObjCARCOpts to use this new code.

llvm-svn: 185740
2013-07-06 01:39:18 +00:00
Nick Lewycky cff2cf8e3a Fix annotation of unlink. Should fix builder.
llvm-svn: 185738
2013-07-06 00:59:28 +00:00
Nick Lewycky c2ec0725ce Extend 'readonly' and 'readnone' to work on function arguments as well as
functions. Make the function attributes pass add it to known library functions
and when it can deduce it.

llvm-svn: 185735
2013-07-06 00:29:58 +00:00
Rafael Espindola 155cf0f3a6 Use sys::fs::createTemporaryFile.
llvm-svn: 185719
2013-07-05 20:14:52 +00:00
Sylvestre Ledru 751447a3ac Remove a useless declaration (found by scan-build)
llvm-svn: 185709
2013-07-05 15:58:12 +00:00
David Majnemer c2a990bc00 InstCombine: (icmp eq B, 0) | (icmp ult A, B) -> (icmp ule A, B-1)
This transform allows us to turn IR that looks like:
  %1 = icmp eq i64 %b, 0
  %2 = icmp ult i64 %a, %b
  %3 = or i1 %1, %2
  ret i1 %3

into:
  %0 = add i64 %b, -1
  %1 = icmp uge i64 %0, %a
  ret i1 %1

which means we go from lowering:
        cmpq    %rsi, %rdi
        setb    %cl
        testq   %rsi, %rsi
        sete    %al
        orb     %cl, %al
        ret

to lowering:
        decq    %rsi
        cmpq    %rdi, %rsi
        setae   %al
        ret
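
The equivalence behind this fold relies on B-1 wrapping to the all-ones value when B is zero. A brute-force sketch at a small width (the function name is invented here):

```python
def or_of_compares_folds(bits=8):
    """Check (B == 0) | (A <u B)  ==  (A <=u B-1) for all `bits`-wide A and B."""
    mask = (1 << bits) - 1
    for a in range(mask + 1):
        for b in range(mask + 1):
            original = (b == 0) or (a < b)
            b_minus_1 = (b - 1) & mask  # wraps to all-ones when b == 0
            folded = a <= b_minus_1
            if original != folded:
                return False
    return True
```

When B is zero the folded compare is against the maximum unsigned value, which is always true, matching the `icmp eq` arm of the original `or`.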

llvm-svn: 185677
2013-07-05 00:31:17 +00:00
David Majnemer 37f8f445de InstCombine: Reimplementation of visitUDivOperand
This transform was originally added in r185257 but later removed in
r185415.  The original transform would create instructions speculatively
and then discard them if the speculation was proved incorrect.  This has
been replaced with a scheme that splits the transform into two parts:
preflight and fold.  While we preflight, we build up fold actions that
inform the folding stage on how to act.

llvm-svn: 185667
2013-07-04 21:17:49 +00:00
Benjamin Kramer 371722288c SimplifyCFG: Teach switch generation some patterns that instcombine forms.
This allows us to create switches even if instcombine has munged two of the
incoming compares into one and some bit twiddling. This was motivated by enum
compares that are common in clang.

llvm-svn: 185632
2013-07-04 14:22:02 +00:00
Nick Lewycky a833c667e2 Tabs to spaces. No functionality change.
llvm-svn: 185612
2013-07-04 03:51:53 +00:00
Craig Topper af0dea1347 Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185606
2013-07-04 01:31:24 +00:00
Craig Topper 31ee5866de Use SmallVectorImpl::iterator/const_iterator instead of SmallVector to avoid specifying the vector size.
llvm-svn: 185540
2013-07-03 15:07:05 +00:00
Evgeniy Stepanov dc6d7eb860 [msan] Unpoison stack allocations and undef values in blacklisted functions.
This changes behavior of -msan-poison-stack=0 flag from not poisoning stack
allocations to actively unpoisoning them.

llvm-svn: 185538
2013-07-03 14:39:14 +00:00
Michael Gottesman 2db11161a8 Added support in FunctionAttrs for adding relevant function/argument attributes for the posix call gettimeofday.
This implies annotating it as nounwind and its arguments as nocapture. To be
conservative, we do not annotate the arguments with noalias since some platforms
do not have restrict on the declaration for gettimeofday.

llvm-svn: 185502
2013-07-03 04:00:54 +00:00
Manman Ren d0e67aa1ce Debug Info: cleanup
llvm-svn: 185456
2013-07-02 18:37:35 +00:00
Hal Finkel fdbe161b1a Revert r185257 (InstCombine: Be more aggressive optimizing 'udiv' instrs with 'select' denoms)
I'm reverting this commit because:

 1. As discussed during review, it needs to be rewritten (to avoid creating and
then deleting instructions).

 2. This is causing optimizer crashes. Specifically, I'm seeing things like
this:

    While deleting: i1 %
    Use still stuck around after Def is destroyed:  <badref> = select i1 <badref>, i32 0, i32 1
    opt: /src/llvm-trunk/lib/IR/Value.cpp:79: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.

   I'd guess that these will go away once we're no longer creating/deleting
instructions here, but just in case, I'm adding a regression test.

Because the code is being rewritten, I've just XFAIL'd the original regression test. Original commit message:

	InstCombine: Be more aggressive optimizing 'udiv' instrs with 'select' denoms

	Real world code sometimes has the denominator of a 'udiv' be a
	'select'.  LLVM can handle such cases but only when the 'select'
	operands are symmetric in structure (both select operands are a constant
	power of two or a left shift, etc.).  This falls apart if we are dealt a
	'udiv' where the code is not symmetric or if the select operands lead us
	to more select instructions.

	Instead, we should treat the LHS and each select operand as a distinct
	divide operation and try to optimize them independently.  If we can
	simplify each operation, then we can replace the 'udiv' with, say, a
	'lshr' that has a new select with a bunch of new operands for the
	select.

llvm-svn: 185415
2013-07-02 05:21:11 +00:00
Nick Lewycky 26fcc51f15 Add missing break statements. Noticed by inspection.
llvm-svn: 185414
2013-07-02 05:02:56 +00:00
Manman Ren 74c188f026 Debug Info: clean up usage of Verify.
No functionality change. It should suffice to check the type of a debug info
metadata, instead of calling Verify.

llvm-svn: 185383
2013-07-01 21:02:01 +00:00
Arnold Schwaighofer ef51cf202b LoopVectorize: Math functions only read rounding mode
Math functions are marked as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.

llvm-svn: 185299
2013-07-01 00:54:44 +00:00
Stephen Lin 2e551adcd9 DeadArgumentElimination: keep return value on functions that have a live argument with the 'returned' attribute (rather than generate invalid IR); however, if both can be eliminated, both will be
llvm-svn: 185290
2013-06-30 20:26:21 +00:00
Benjamin Kramer 4093f29366 InstCombine: Also turn selects fed by an and into arithmetic when the types don't match.
Inserting a zext or trunc is sufficient. This pattern is somewhat common in
LLVM's pointer mangling code.

llvm-svn: 185270
2013-06-29 21:17:04 +00:00
Benjamin Kramer 4ab72f9b9a LoopVectorizer: Pack MemAccessInfo pairs.
llvm-svn: 185263
2013-06-29 17:52:08 +00:00
Benjamin Kramer 53545693d7 Move helper classes into anonymous namespaces.
llvm-svn: 185262
2013-06-29 17:02:06 +00:00
David Majnemer 5953d3712a InstCombine: FoldGEPICmp shouldn't change sign of base pointer comparison
Changing the sign when comparing the base pointer would introduce all
sorts of unexpected things like:
  %gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0
  %gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0
  %cmp.i = icmp ult i8* %gep.i, %gep2.i
  %cmp.i1 = icmp ult [1 x i8]* %a, %b
  %cmp = icmp ne i1 %cmp.i, %cmp.i1
  ret i1 %cmp

into:
  %cmp.i = icmp slt [1 x i8]* %a, %b
  %cmp.i1 = icmp ult [1 x i8]* %a, %b
  %cmp = xor i1 %cmp.i, %cmp.i1
  ret i1 %cmp

By preserving the original sign, we now get:
  ret i1 false

This fixes PR16483.

llvm-svn: 185259
2013-06-29 10:28:04 +00:00
David Majnemer 92a8a7d45a InstCombine: Small whitespace cleanup in FoldGEPICmp
llvm-svn: 185258
2013-06-29 09:45:35 +00:00
David Majnemer 797227eea6 InstCombine: Be more aggressive optimizing 'udiv' instrs with 'select' denoms
Real world code sometimes has the denominator of a 'udiv' be a
'select'.  LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.).  This falls apart if we are dealt a
'udiv' where the code is not symmetric or if the select operands lead us
to more select instructions.

Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently.  If we can
simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.

llvm-svn: 185257
2013-06-29 08:40:07 +00:00
Nadav Rotem 0a25727f31 We preserve the CFG and some of the analysis passes.
llvm-svn: 185251
2013-06-29 05:38:15 +00:00
Nadav Rotem e00343446c Update docs.
llvm-svn: 185250
2013-06-29 05:37:19 +00:00
David Majnemer b889e405eb InstCombine: Optimize (1 << X) Pred CstP2 to X Pred Log2(CstP2)
We may, after other optimizations, find ourselves with IR that looks
like:

  %shl = shl i32 1, %y
  %cmp = icmp ult i32 %shl, 32

Instead, we should just compare the shift count:

  %cmp = icmp ult i32 %y, 5
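
Because `1 << X` is strictly increasing for in-range shift amounts, comparing it against a power of two gives the same answer as comparing X against the log. The sketch below checks this for the unsigned and equality predicates only (signed predicates are not covered), with shift amounts restricted to values smaller than the bit width; all names are illustrative.

```python
import operator

# Unsigned and equality predicates only; signed ones are out of scope here.
PREDICATES = {
    "ult": operator.lt, "ule": operator.le,
    "ugt": operator.gt, "uge": operator.ge,
    "eq": operator.eq, "ne": operator.ne,
}


def shl_compare_mismatches(bits=8):
    """List (pred, y, log2) where (1 << y) pred 2**log2 differs from y pred log2."""
    bad = []
    for log2 in range(bits):
        cst = 1 << log2
        for y in range(bits):   # shift amounts >= bits are not modelled
            shl = 1 << y        # still fits in `bits` bits because y < bits
            for name, pred in PREDICATES.items():
                if pred(shl, cst) != pred(y, log2):
                    bad.append((name, y, log2))
    return bad
```

The example in the message corresponds to `bits=32`, `log2=5`, predicate `ult`.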

llvm-svn: 185242
2013-06-28 23:42:03 +00:00
Nadav Rotem 060be733a5 SLP Vectorizer: Add support for trees with external users.
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.

llvm-svn: 185230
2013-06-28 22:07:09 +00:00
Nadav Rotem 9ce3fedcdd LoopVectorizer: Refactor the code that checks if it is safe to predicate blocks.
In this code we keep track of pointers that we are allowed to read from, if they are accessed by non-predicated blocks.
We use this list to allow vectorization of conditional loads in predicated blocks because we know that these addresses don't segfault.

llvm-svn: 185214
2013-06-28 20:46:27 +00:00
Daniel Malea b17b1cd6f5 Remove needless include (unistd.h) in DebugIR pass
- should unbreak Windows builds

llvm-svn: 185198
2013-06-28 19:19:44 +00:00
Daniel Malea 0673464a92 Add missing header for DebugIR
- missed svn add...

llvm-svn: 185194
2013-06-28 19:07:59 +00:00
Daniel Malea 31321fa53d Remove limitation on DebugIR that made it require existing debug metadata.
- Build debug metadata for 'bare' Modules using DIBuilder
- DebugIR can be constructed to generate an IR file (to be seen by a debugger)
  or not in cases where the user already has an IR file on disk.

llvm-svn: 185193
2013-06-28 19:05:23 +00:00
Arnold Schwaighofer ce2c766f61 LoopVectorize: Pull dyn_cast into setDebugLocFromInst
llvm-svn: 185168
2013-06-28 17:14:48 +00:00
Arnold Schwaighofer 3b27b992ca LoopVectorize: Use static function instead of DebugLocSetter class
I used the class to safely reset the state of the builder's debug location.  I
think I have caught all places where we need to set the debug location to a new
one. Therefore, we can replace the class by a function that just sets the debug
location.

llvm-svn: 185165
2013-06-28 16:26:54 +00:00
Manman Ren 983a16c08a Debug Info: clean up usage of Verify.
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.

Also update testing cases to make them conform to the format of DI classes.

llvm-svn: 185135
2013-06-28 05:43:10 +00:00
Arnold Schwaighofer 12ecb331af LoopVectorize: Preserve debug location info
radar://14169017

llvm-svn: 185122
2013-06-28 00:38:54 +00:00
Matt Arsenault 5d2e85f6d7 Fix using arg_end() - arg_begin() instead of arg_size()
llvm-svn: 185121
2013-06-28 00:25:40 +00:00
Michael Gottesman 79b0967548 Revert "Revert "[APFloat] Removed APFloat constructor which initialized to either zero/NaN but allowed you to arbitrarily set the category of the float.""
This reverts commit r185099.

Looks like both the ppc-64 and mips bots are still failing after I reverted this
change.

Since:

1. The mips bot always performs a clean build,
2. The ppc64-bot failed again after a clean build (I asked the ppc-64
maintainers to clean the bot which they did... Thanks Will!),

I think it is safe to assume that this change was not the cause of the failures
that said builders were seeing. Thus I am recommitting.

llvm-svn: 185111
2013-06-27 21:58:19 +00:00
Michael Gottesman ccaf3321f1 Revert "[APFloat] Removed APFloat constructor which initialized to either zero/NaN but allowed you to arbitrarily set the category of the float."
This reverts commit r185095. This is causing a FileCheck failure on
the 3dnow intrinsics on at least the mips/ppc bots but not on the x86
bots.

Reverting while I figure out what is going on.

llvm-svn: 185099
2013-06-27 20:40:11 +00:00
Arnold Schwaighofer 38de7cd464 LoopVectorize: Cache edge masks created during if-conversion
Otherwise, we end up with an exponential IR blowup.
Fixes PR16472.
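
A toy model of why the cache matters. It assumes that an edge mask is built from the mask of its source block and that a block mask is built from the masks of all incoming edges; the function names and the diamond-chain CFG are made up for illustration and only count how many edge masks would be created.

```python
def diamond_chain(n):
    """Predecessor map for n diamonds: h0 -> {l0, r0} -> h1 -> ... -> hn."""
    preds = {}
    for i in range(n):
        preds[f"l{i}"] = [f"h{i}"]
        preds[f"r{i}"] = [f"h{i}"]
        preds[f"h{i + 1}"] = [f"l{i}", f"r{i}"]
    return preds


def count_edge_masks(preds, block, use_cache):
    """Count edge-mask computations needed to build the mask of `block`."""
    cache = set()
    created = 0

    def edge_mask(src, dst):
        nonlocal created
        if use_cache and (src, dst) in cache:
            return                 # reuse the mask built earlier
        block_mask(src)            # the edge mask depends on the source block mask
        created += 1               # one new mask instruction for this edge
        cache.add((src, dst))

    def block_mask(bb):
        for pred in preds.get(bb, []):
            edge_mask(pred, bb)

    block_mask(block)
    return created
```

In this model a chain of 10 diamonds needs 4092 edge masks without the cache and 40 with it, which is the exponential-versus-linear gap the message refers to.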

llvm-svn: 185097
2013-06-27 20:31:06 +00:00
Michael Gottesman 03255a1675 [APFloat] Removed APFloat constructor which initialized to either zero/NaN but allowed you to arbitrarily set the category of the float.
The category which an APFloat belongs to should be dependent on the
actual value that the APFloat has, not be arbitrarily passed in by the
user. This will prevent inconsistency bugs where the category and the
actual value in APFloat differ.

I also fixed up all of the references to this constructor (which were
only in LLVM).

llvm-svn: 185095
2013-06-27 19:50:52 +00:00
Arnold Schwaighofer a2dd195fb3 LoopVectorize: Use vectorized loop invariant gep index anchored in loop
Use vectorized instruction instead of original instruction anchored in the
original loop.

Fixes PR16452 and t2075.c of PR16455.

llvm-svn: 185081
2013-06-27 15:11:55 +00:00
Arnold Schwaighofer ccd6c9929b LoopVectorize: Don't store a reversed value in the vectorized value map
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.

This fixes 3 test cases of PR16455.

llvm-svn: 185051
2013-06-27 00:45:41 +00:00
Michael Gottesman 41748d7c86 Added support for the Builtin attribute.
The Builtin attribute is an attribute that can be placed on a function call site to signal that, even though a function is declared as being a builtin,

rdar://problem/13727199

llvm-svn: 185049
2013-06-27 00:25:01 +00:00
Nadav Rotem 8edefb3665 No need to use a Set when a vector would do.
llvm-svn: 185047
2013-06-27 00:14:13 +00:00
Nadav Rotem 93f880fb77 SLP: When searching for vectorization opportunities scan the blocks in post-order because we grow chains upwards.
llvm-svn: 185041
2013-06-26 23:44:45 +00:00
Nadav Rotem 7f0d6d7975 SLP: Don't erase instructions during vectorization because it prevents the outer loops from iterating over the instructions.
llvm-svn: 185040
2013-06-26 23:43:23 +00:00
Michael Gottesman c2af8d6273 In InstCombine{AddSub,MulDivRem} convert APFloat.isFiniteNonZero() && !APFloat.isDenormal => APFloat.isNormal.
llvm-svn: 185037
2013-06-26 23:17:31 +00:00
Eric Christopher b8c608ea39 Revert "Debug Info: clean up usage of Verify." as it's breaking bots.
This reverts commit r185020

llvm-svn: 185032
2013-06-26 22:44:57 +00:00
Manman Ren aa00ce0e8f Debug Info: clean up usage of Verify.
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify.

llvm-svn: 185020
2013-06-26 21:26:10 +00:00
Nadav Rotem 4c5b2d1de6 Erase all of the instructions that we RAUWed
llvm-svn: 184969
2013-06-26 17:16:09 +00:00
Nadav Rotem f4ca3994b8 Do not add cse-ed instructions into the visited map because we don't want to consider them as a candidate for replacement of instructions to be visited.
llvm-svn: 184966
2013-06-26 16:54:53 +00:00
Kostya Serebryany 5e276f9dbc [asan] workaround for PR16277: don't instrument AllocaInstr with alignment more than the redzone size
llvm-svn: 184928
2013-06-26 09:49:52 +00:00
Kostya Serebryany 9f5213f20f [asan] add option -asan-keep-uninstrumented-functions
llvm-svn: 184927
2013-06-26 09:18:17 +00:00
Nick Lewycky 5cd9538b90 dbgs() << Instruction doesn't print a newline on the end any more. Update these
debug statements to add a missing newline. Also canonicalize to '\n' instead of
"\n"; the latter calls a function with a loop, while the former does not.

llvm-svn: 184897
2013-06-26 00:30:18 +00:00
Nadav Rotem 0794acc1da SLPVectorizer: support slp-vectorization of PHINodes between basic blocks
llvm-svn: 184888
2013-06-25 23:04:09 +00:00
Bob Wilson acfc01dedf Fix SROA to avoid unnecessary scalar conversions for 1-element vectors.
When a 1-element vector alloca is promoted, a store instruction can often be
rewritten without converting the value to a scalar and using an insertelement
instruction to stuff it into the new alloca.  This patch just adds a check
to skip that conversion when it is unnecessary.  This turns out to be really
important for some ARM Neon operations where <1 x i64> is used to get around
the fact that i64 is not a legal type.

llvm-svn: 184870
2013-06-25 19:09:50 +00:00
Nadav Rotem 3de032a3b6 Fix a typo in the code that collected the costs recursively.
llvm-svn: 184827
2013-06-25 05:30:56 +00:00
Nadav Rotem 9c7c997a7e Rename the variable to fix a warning. Thanks Andy Gibbs.
llvm-svn: 184749
2013-06-24 15:59:47 +00:00
Arnold Schwaighofer b252c11ccc Reapply 184685 after the SetVector iteration order fix.
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.

"LoopVectorize: Use the dependence test utility class

We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us not to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. One or two of the TSVC
loop kernels also speed up.

radar://13681598"

llvm-svn: 184724
2013-06-24 12:09:15 +00:00
Arnold Schwaighofer 91472fa4fc LoopVectorize: Use SetVector for the access set
We are creating the runtime checks using this set so we need a deterministic
iteration order.

llvm-svn: 184723
2013-06-24 12:09:12 +00:00
Chandler Carruth 08e1b8742b Add a flag to defer vectorization into a phase after the inliner and its
CGSCC pass manager. This should insulate the inlining decisions from the
vectorization decisions, however it may have both compile time and code
size problems so it is just an experimental option right now.

I am adding this based on a discussion with Arnold; it seems worth having
this flag so that we can run some experiments to see whether this strategy
is workable. It may solve some of the regressions seen with the loop
vectorizer.

llvm-svn: 184698
2013-06-24 07:21:47 +00:00
Arnold Schwaighofer 58ca945f38 Revert "LoopVectorize: Use the dependence test utility class"
This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac.

We are seeing a stage2 and stage3 miscompare on some dragonegg bots.

llvm-svn: 184690
2013-06-24 06:10:41 +00:00
Arnold Schwaighofer b914a7e2ef LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us not to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. One or two of the TSVC
loop kernels also speed up.

radar://13681598
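
The effect of a constant dependence distance can be illustrated with a small simulation. It models a vectorized iteration as "load VF lanes, then store VF lanes", which is a simplification of what the vectorizer emits, and reduces products modulo a prime only to keep the numbers small; `run_scalar`, `run_vectorized`, and `MOD` are names invented for this sketch.

```python
MOD = 1_000_003  # keeps values small; not part of the loops in the message


def run_scalar(a, d1, d2, lo=8, hi=256):
    """a[i] = a[i+d1] * a[i+d2], one element at a time."""
    a = list(a)
    for i in range(lo, hi):
        a[i] = (a[i + d1] * a[i + d2]) % MOD
    return a


def run_vectorized(a, d1, d2, vf, lo=8, hi=256):
    """Same loop, but each step loads vf lanes before storing any of them."""
    if (hi - lo) % vf != 0:
        raise ValueError("trip count must be a multiple of vf in this sketch")
    a = list(a)
    for i in range(lo, hi, vf):
        x = [a[i + lane + d1] for lane in range(vf)]
        y = [a[i + lane + d2] for lane in range(vf)]
        for lane in range(vf):
            a[i + lane] = (x[lane] * y[lane]) % MOD
    return a
```

With `init = [j + 2 for j in range(264)]`, the forward loop (`d1=4, d2=8`) matches the scalar result at `vf=8`, and the backward loop (`d1=-4, d2=-8`) matches at `vf=4`. The backward loop at `vf=8` does not match, because lanes 4 to 7 would read elements that the same vector iteration has not stored yet.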

llvm-svn: 184685
2013-06-24 03:55:48 +00:00
Arnold Schwaighofer d517976758 LoopVectorize: Add utility class for checking dependency among accesses
This class checks dependences by subtracting two Scalar Evolution access
functions allowing us to catch very simple linear dependences.

The checker assumes source order in determining whether vectorization is safe.
We currently don't reorder accesses.
Positive true dependencies need to be a multiple of VF otherwise we impede
store-load forwarding.

llvm-svn: 184684
2013-06-24 03:55:45 +00:00
Arnold Schwaighofer d57419696d LoopVectorize: Add utility class for building sets of dependent accesses
Sets of dependent accesses are built by unioning sets based on underlying
objects. This class will be used by the upcoming dependence checker.
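
One common way to build such sets is a union-find structure keyed by underlying object. The sketch below is an assumption about the general technique, not the class added by this commit; `AccessSets` and `group_accesses` are hypothetical names, and accesses are represented as plain strings.

```python
class AccessSets:
    """Minimal union-find over hashable access identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_accesses(accesses):
    """accesses: list of (access, underlying_object) pairs; returns sorted groups."""
    sets = AccessSets()
    first_for_object = {}
    for access, obj in accesses:
        sets.find(access)  # register the access even if it stays alone
        leader = first_for_object.setdefault(obj, access)
        sets.union(leader, access)
    groups = {}
    for access in dict.fromkeys(a for a, _ in accesses):  # unique, in order
        groups.setdefault(sets.find(access), []).append(access)
    return sorted(sorted(g) for g in groups.values())
```

An access that may refer to two underlying objects is listed once per object, which merges the two sets; only accesses within one resulting set would then need a dependence check against each other.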

llvm-svn: 184683
2013-06-24 03:55:44 +00:00
Nadav Rotem 210e86d7c4 SLP Vectorizer: Add support for vectorizing parts of the tree.
Until now we detected the vectorizable tree and evaluated the cost of the
entire tree.  With this patch we can decide to trim-out branches of the tree
that are not profitable to vectorize.

Also, increase the max depth from 6 to 12. In the worst possible case, where all
of the code is made of diamond-shaped graphs, this can bring the cost to 2**10,
but diamonds are not very common.

llvm-svn: 184681
2013-06-24 02:52:43 +00:00
Nadav Rotem 0323925d51 SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences.
Make sure that we don't replace and RAUW two sequences if one does not dominate the other.

llvm-svn: 184674
2013-06-23 21:57:27 +00:00
Nadav Rotem 78428401e9 SLP Vectorizer: Erase instructions outside the vectorizeTree method.
The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization.

llvm-svn: 184671
2013-06-23 19:38:56 +00:00
Nadav Rotem eb65e67eea SLP Vectorizer: Implement a simple CSE optimization for the gather sequences.
llvm-svn: 184660
2013-06-23 06:15:46 +00:00
Nadav Rotem 80de0a28f1 SLP Vectorizer: Implement multi-block slp-vectorization.
Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).

llvm-svn: 184647
2013-06-22 21:34:10 +00:00
Benjamin Kramer 40d7f354b5 Revert "FunctionAttrs: Merge attributes once instead of doing it for every argument."
It doesn't work as I intended it to.  This reverts commit r184638.

llvm-svn: 184641
2013-06-22 16:56:32 +00:00
Benjamin Kramer 76b7bd0e75 FunctionAttrs: Merge attributes once instead of doing it for every argument.
It has become an expensive operation. No functionality change.

llvm-svn: 184638
2013-06-22 15:51:19 +00:00
Michael Gottesman 9799cf7fb3 [objc-arc-opts] Make IsTrackingImpreciseReleases a const method.
Thanks to Bill Wendling for pointing this out!

llvm-svn: 184593
2013-06-21 20:52:49 +00:00
Michael Gottesman e3943d0554 [objc-arc-opts] Now that PtrState.RRI is encapsulated in PtrState, make PtrState.RRI private and delete the TODO.
llvm-svn: 184587
2013-06-21 19:44:30 +00:00
Michael Gottesman 4f6ef11763 [objc-arc-opts] Encapsulated PtrState.RRI.{Calls,ReverseInsertPts} into several methods on PtrState.
llvm-svn: 184586
2013-06-21 19:44:27 +00:00
Michael Gottesman f040118167 [objcarcopts] Encapsulated PtrState.RRI.IsTrackingImpreciseRelease() => PtrState.IsTrackingImpreciseRelease().
llvm-svn: 184583
2013-06-21 19:12:38 +00:00
Michael Gottesman 2f2945973a [objcarcopts] Encapsulate PtrState.RRI.CFGHazardAfflicted via methods PtrState.{IsCFGHazardAfflicted,SetCFGHazardAfflicted}.
llvm-svn: 184582
2013-06-21 19:12:36 +00:00
Michael Gottesman f701d3f864 [objcarcopts] Encapsulate PtrState.RRI.ReleaseMetadata into the methods PtrState.GetReleaseMetadata() and PtrState.SetReleaseMetadata().
llvm-svn: 184534
2013-06-21 07:03:07 +00:00
Michael Gottesman b82a179606 [objcarcopts] Encapsulate PtrState.RRI.IsTailCallRelease into the method PtrState.IsTailCallRelease() and PtrState.SetTailCallRelease().
llvm-svn: 184533
2013-06-21 07:00:44 +00:00
Michael Gottesman 9313225e72 [objcarcopts] Encapsulate PtrState.RRI.KnownSafe in the methods PtrState.IsKnownSafe and PtrState.SetKnownSafe.
This is a part of a series of patches to encapsulate PtrState.RRI and
make PtrState.RRI a private field of PtrState.

*NOTE* This is actually the second commit in the patch stream. I should
have put this note on the first such commit r184528.

llvm-svn: 184532
2013-06-21 06:59:02 +00:00
Michael Gottesman b7deb4cd79 [objcarcopts] Some more minor code cleanups/comment additions.
llvm-svn: 184531
2013-06-21 06:54:31 +00:00
Michael Gottesman 4773a10cfb [objcarcopts] Refactor out the RRInfo merging code from PtrState into RRInfo::Merge.
I also added some comments and performed minor code cleanups.

llvm-svn: 184528
2013-06-21 05:42:08 +00:00
Nadav Rotem e1713e5fcf SLP Vectorizer: do not search for store-chains that are wider than the vector-register size.
llvm-svn: 184527
2013-06-21 04:18:13 +00:00
Meador Inge dfb08a2cb8 Remove the simplify-libcalls pass (finally)
This commit completely removes what is left of the simplify-libcalls
pass.  All of the functionality has now been migrated to the instcombine
and functionattrs passes.  The following C API functions are now NOPs:

  1. LLVMAddSimplifyLibCallsPass
  2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls

llvm-svn: 184459
2013-06-20 19:48:07 +00:00
Nadav Rotem b488beefeb Clang-format the SLP vectorizer. No functionality change.
llvm-svn: 184446
2013-06-20 17:54:36 +00:00
Nadav Rotem 14a89c5428 SLPVectorization: Add a basic support for cross-basic block slp vectorization.
We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent
hints for vectorization of other basic blocks.

llvm-svn: 184444
2013-06-20 17:41:45 +00:00
Nadav Rotem c41028a013 Change the debug type to match the debug type that is used by vecutils.cpp.
This change makes it easier to filter debug messages.

llvm-svn: 184440
2013-06-20 16:38:05 +00:00
Michael Gottesman 3cb77ab98a [APFloat] Converted all references to APFloat::isNormal => APFloat::isFiniteNonZero.
Turns out all the references were in llvm and not in clang.

llvm-svn: 184356
2013-06-19 21:23:18 +00:00
Bill Wendling 7a639ea2a4 Access the TargetLoweringInfo from the TargetMachine object instead of caching it. The TLI may change between functions. No functionality change.
llvm-svn: 184352
2013-06-19 21:07:11 +00:00
Matt Arsenault d46fce1141 Move StructurizeCFG out of R600 to generic Transforms.
Register it with PassManager

llvm-svn: 184343
2013-06-19 20:18:24 +00:00
Quentin Colombet 145eb97d3a LSR: Fix the parameters used to compute the scaling factor cost.
Prior to this change, the considered addressing modes may be invalid since the
maximum and minimum offsets were not taken into account.
This was causing an assertion failure.

The added test case exercises that behavior.

<rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal
addressing mode has an illegal cost!")

llvm-svn: 184341
2013-06-19 19:59:41 +00:00
Nadav Rotem 1e9668ea81 SLPVectorizer: handle scalars that are extracted from vectors (using ExtractElementInst).
llvm-svn: 184325
2013-06-19 17:33:16 +00:00
Nadav Rotem 86e848c849 SLPVectorizer: start constructing chains at stores that are not power of two.
The type <3 x i8> is common in graphics and we want to be able to vectorize it.

This change accelerates bullet by 12% and 471_omnetpp by 5%.

llvm-svn: 184317
2013-06-19 15:57:29 +00:00
Nadav Rotem e98da7f548 SLPVectorizer: vectorize compares and selects.
llvm-svn: 184282
2013-06-19 05:49:52 +00:00
Nadav Rotem 4f3224f3ed Document the return value and fix a typo.
llvm-svn: 184281
2013-06-19 05:47:33 +00:00
Nadav Rotem 1f96427da0 Scan the successor blocks and use the PHI nodes as a hint for possible chain roots.
llvm-svn: 184201
2013-06-18 15:58:05 +00:00
Nadav Rotem 3349feac4e Add a return value to make this function more useful.
llvm-svn: 184200
2013-06-18 15:57:12 +00:00
Nick Lewycky 0fdd01965e Fix nondeterminism in .gcno file generation.
llvm-svn: 184174
2013-06-18 06:38:21 +00:00
Pekka Jaaskelainen eb90fd1c3b Fix for a regression caused by the LoopVectorizer when
vectorizing loops with memory accesses to non-zero address spaces. It
simply dropped the AS info. Fixes PR16306.

llvm-svn: 184103
2013-06-17 18:49:06 +00:00
Nadav Rotem cde24ef389 Disable vectorization for -Oz.
llvm-svn: 184089
2013-06-17 17:22:40 +00:00
Nadav Rotem 7dd8210b71 Enable the loop vectorizer by default for -Os and -O2.
llvm-svn: 184084
2013-06-17 16:23:34 +00:00
Jakub Staszak 4898e62ac0 Use 0 instead of NULL.
llvm-svn: 184044
2013-06-15 12:20:44 +00:00
Benjamin Kramer 9ddfaf2be6 PruneEH: Only merge attribute sets when used. No functionality change.
llvm-svn: 184041
2013-06-15 10:55:39 +00:00
Derek Schuff ec9dc01b33 Fix DeleteDeadVarargs not to crash on functions referenced by BlockAddresses
This pass was assuming that if hasAddressTaken() returns false for a
function, the function's only uses are call sites.  That's not true
because there can be references by BlockAddresses too.

Fix the pass to handle this case.  Fix
BlockAddress::replaceUsesOfWithOnConstant() to allow a function's type
to be changed by RAUW'ing the function with a bitcast of the recreated
function.

Patch by Mark Seaborn.

llvm-svn: 183933
2013-06-13 19:51:17 +00:00
Rafael Espindola 8d30480344 Always remove an alias when we rename the target.
Should fix the dragonegg build bots.

llvm-svn: 183845
2013-06-12 16:45:47 +00:00
Rafael Espindola 3bc8e71909 Move PathV2.h to Path.h
Most clients have already been moved from Path V1 to V2. The ones using V1
now include PathV1.h explicitly.

llvm-svn: 183801
2013-06-11 22:21:28 +00:00
Rafael Espindola a82555c0f8 Change how globalopt handles aliases in llvm.used.
Instead of a custom implementation of replaceAllUsesWith, we just call
replaceAllUsesWith and recreate llvm.used and llvm.compiler-used.

This change is particularly interesting because it makes llvm see
through what clang is doing with static used functions in extern "C"
contexts. With this change, running clang -O2 in

extern "C" {
  __attribute__((used)) static void foo() {}
}

produces

@llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @foo to
i8*)], section "llvm.metadata"
define internal void @foo() #0 {
entry:
  ret void
}

llvm-svn: 183756
2013-06-11 17:48:06 +00:00
Tim Northover 64280fbba1 Make DeadArgumentElimination more conservative on variadic functions
Variadic functions are particularly fragile in the face of ABI changes, so this
limits how much the pass changes them

llvm-svn: 183625
2013-06-09 02:17:27 +00:00
Shuxin Yang 140d592d84 Fix a potential bug in r183584.
r183584 tries to derive some info from the code *AFTER* a call and apply
these derived info to the code *BEFORE* the call, which is not always safe
as the call in question may never return, and in this case, the derived
info is invalid.
  
  Thanks to Duncan for pointing out this potential bug.

rdar://14073661 

llvm-svn: 183606
2013-06-08 04:56:05 +00:00
Shuxin Yang bd254f2601 Fix an assertion in MemCpyOpt pass.
The MemCpyOpt pass is capable of optimizing:
      callee(&S); copy N bytes from S to D.
    into:
      callee(&D);
subject to some legality constraints. 

  The assertion is triggered when the compiler tries to evaluate "sizeof(typeof(D))",
while D is an opaque-typed, 'sret' formal argument of the function being compiled,
i.e. the signature of the function being compiled is something like this:
  T caller(...,%opaque* noalias nocapture sret %D, ...)

  The fix is that when we come across such a situation, instead of calling some
utility functions to get the size of D's type (which will crash), we simply
assume D has at least N bytes as implied by the copy-instruction.

rdar://14073661 

llvm-svn: 183584
2013-06-07 22:45:21 +00:00
Michael Gottesman 9e7261c874 [objc-arc] Ensure that the cfg path count does not overflow when we multiply TopDownPathCount/BottomUpPathCount.
rdar://12480535

llvm-svn: 183489
2013-06-07 06:16:49 +00:00
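Note on r183489: a minimal sketch of the kind of overflow guard the message describes, assuming 32-bit path counts. The helper name multiplyPathCounts is hypothetical and is not the code in ObjCARCOpts; it only illustrates widening before the multiply so the product cannot wrap silently.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not the actual ObjCARCOpts code): multiply two 32-bit
// path counts and report overflow instead of wrapping. Returns true and
// stores the product in `result` when it fits in 32 bits, false otherwise.
bool multiplyPathCounts(uint32_t topDown, uint32_t bottomUp, uint32_t &result) {
  // Widen to 64 bits so the multiplication itself cannot wrap.
  uint64_t product = static_cast<uint64_t>(topDown) * bottomUp;
  if (product > std::numeric_limits<uint32_t>::max())
    return false;
  result = static_cast<uint32_t>(product);
  return true;
}
```

A caller would treat a false return as "too many paths to reason about" and skip the optimization, which is one conservative way to handle the case.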
Jakub Staszak 96ff4d6d3b Simplify code. No functionality change.
llvm-svn: 183461
2013-06-06 23:34:59 +00:00
Nadav Rotem 99e529ea3c Jeffrey Yasskin volunteered to benchmark the vectorizer on -O2 or -Os when compiling chrome. This patch adds a new flag to enable vectorization on all levels and not only on -O3. It should go away once we make a decision.
llvm-svn: 183456
2013-06-06 22:35:47 +00:00
Jakub Staszak bddea11bc5 Re-apply "Use IRBuilder instead of ConstantInt methods." with the fixed issues.
llvm-svn: 183439
2013-06-06 20:18:46 +00:00
Rafael Espindola a7bbc0b740 Revert "Use IRBuilder instead of ConstantInt methods. It simplifies code a little bit."
This reverts commit 183328. It caused pr16244 and broke the bots.

llvm-svn: 183422
2013-06-06 17:03:05 +00:00
Jakub Staszak 9de494e0ee Remove unneeded cast<>.
llvm-svn: 183363
2013-06-06 00:49:57 +00:00
Jakub Staszak 461d1fe6fc Use IRBuilder instead of ConstantInt methods.
llvm-svn: 183360
2013-06-06 00:37:23 +00:00
Jakub Staszak 2f390b755a Use IRBuilder instead of ConstantInt methods. It simplifies code a little bit.
llvm-svn: 183328
2013-06-05 18:27:02 +00:00
David Majnemer 29130c5e8d IndVarSimplify: check if loop invariant expansion can trap
IndVarSimplify is willing to move divide instructions outside of their
loop bodies if they are invariant of the loop.  However, it may not be
safe to expand them if we do not know if they can trap.

Instead, check to see if it is not safe to expand the instruction and
skip the expansion.

This fixes PR16041.

Testcase by Rafael Ávila de Espíndola.

llvm-svn: 183239
2013-06-04 17:51:58 +00:00
Rafael Espindola a5e536ab0e Second part of pr16069
The problem this time seems to be a thinko. We were assuming that in the CFG

A
| \
|  B
| /
C

speculating the basic block B would cause only the phi value for the B->C edge
to be speculated. That is not true, the phi's are semantically in the edges, so
if the A->B->C path is taken, any code needed for A->C is not executed and we
have to consider it too when deciding to speculate B.

llvm-svn: 183226
2013-06-04 14:11:59 +00:00
Hans Wennborg 5cf30be6e4 Typo: s/caes/cases/ in SimplifyCFG
llvm-svn: 183219
2013-06-04 11:22:30 +00:00
Nick Lewycky 688d668e5c Delete dead safety check.
llvm-svn: 183167
2013-06-03 23:15:20 +00:00
David Majnemer c82f27af2a SimplifyCFG: Do not transform PHI to select if doing so would be unsafe
PR16069 is an interesting case where an incoming value to a PHI is a
trap value while also being a 'ConstantExpr'.

We do not consider this case when performing the 'HoistThenElseCodeToIf'
optimization.

Instead, make our modifications more conservative if we detect that we
cannot transform the PHI to a select.

llvm-svn: 183152
2013-06-03 20:43:12 +00:00
David Majnemer 8e7dd2f628 SimplifyCFG: Small cleanup, use ICmpInst::isEquality()
llvm-svn: 183151
2013-06-03 20:39:50 +00:00
Kostya Serebryany 9e62b301e6 [asan] ASan Linux MIPS32 support (llvm part), patch by Jyun-Yan Y
llvm-svn: 183104
2013-06-03 14:46:56 +00:00
Nick Lewycky 3f715e260a When determining the new index for an insertelement, we may not assume that an
index greater than the size of the vector is invalid. The shuffle may be
shrinking the size of the vector. Fixes a crash!

Also drop the maximum recursion depth of the safety check for this
optimization to five.

llvm-svn: 183080
2013-06-01 20:51:31 +00:00
David Majnemer 91142c485e SimplifyCFG: Fix typo in comment for ComputeSpeculationCost
llvm-svn: 183078
2013-06-01 19:43:23 +00:00
Benjamin Kramer 7c275640e7 Move getRealLinkageName to a common place and remove all the duplicates of it.
Also simplify code a bit while there. No functionality change.

llvm-svn: 183076
2013-06-01 17:51:14 +00:00
Arnold Schwaighofer 7b1b4db35e LoopVectorize: Change API call to get the backedge taken count
Use ScalarEvolution's getBackedgeTakenCount API instead of getExitCount since
that is really what we want to know. Using the more specific getExitCount was
safe because we made sure that there is only one exiting block.

No functionality change.

llvm-svn: 183047
2013-05-31 21:48:56 +00:00
Quentin Colombet bf490d4a32 Loop Strength Reduce: Scaling factor cost.
Account for the cost of scaling factor in Loop Strength Reduce when rating the
formulae. This uses a target hook.

The default implementation of the hook is: if the addressing mode is legal, the
scaling factor is free.

<rdar://problem/13806271>

llvm-svn: 183045
2013-05-31 21:29:03 +00:00
Arnold Schwaighofer 70a9be5297 LoopVectorize: PHIs with only outside users should prevent vectorization
We check that instructions in the loop don't have outside users (except if
they are reduction values). Unfortunately, we skipped this check for
if-convertible PHIs.

Fixes PR16184.

llvm-svn: 183035
2013-05-31 19:53:50 +00:00
Quentin Colombet 8aa7abe2ae Modify how the formulae are rated in Loop Strength Reduce.
Namely, check if the target allows folding more than one register in the
addressing mode and if yes, adjust the cost accordingly.

Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.

<rdar://problem/13973908>

llvm-svn: 183021
2013-05-31 17:20:29 +00:00
Rafael Espindola 65281bf36e Simplify multiplications by vectors whose elements are powers of 2.
Patch by Andrea Di Biagio.

llvm-svn: 183005
2013-05-31 14:27:15 +00:00
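Note on r183005: a small sketch of the arithmetic fact behind this simplification, assuming four unsigned 32-bit lanes. Multiplying a lane by 2^k gives the same result as shifting it left by k. The function name and the log2Factors parameter are illustrative only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Multiply each lane by a power-of-two factor, written as a shift.
// `log2Factors[i]` holds log2 of the i-th multiplier; this sketch assumes
// every entry is less than 32.
std::array<uint32_t, 4> mulByPow2Lanes(const std::array<uint32_t, 4> &x,
                                       const std::array<uint32_t, 4> &log2Factors) {
  std::array<uint32_t, 4> out{};
  for (std::size_t i = 0; i < x.size(); ++i)
    out[i] = x[i] << log2Factors[i]; // same value as x[i] * (1u << log2Factors[i])
  return out;
}
```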
Evgeniy Stepanov 888385e40f [msan] Handle mixed track-origins and keep-going settings (llvm part).
Before this change, each module defined a weak_odr global __msan_track_origins 
with a value of 1 if origin tracking is enabled, 0 if disabled. If there are 
modules with different values, any of them may win. If 0 wins, and there is at 
least one module with 1, the program will most likely crash.

With this change, __msan_track_origins is only emitted if origin tracking is 
on. Then runtime library detects if there is at least one module with origin 
tracking, and enables runtime support for it.

llvm-svn: 182997
2013-05-31 12:04:29 +00:00
Nick Lewycky a2b7720618 Reapply r182909 with a fix to the calculation of the new indices for
insertelement instructions.

llvm-svn: 182976
2013-05-31 00:59:42 +00:00
Evgeniy Stepanov 2c14269883 Revert r182909.
PR/16177

llvm-svn: 182919
2013-05-30 09:40:17 +00:00
Nick Lewycky d7f27094c0 Swizzle vector inputs if it helps us eliminate shuffles.
llvm-svn: 182909
2013-05-30 04:33:38 +00:00
NAKAMURA Takumi d11b42aaad LoopVectorize.cpp: Fix abuse of StringRef on Twine. Twine captures the pointer of StringRef.
llvm-svn: 182820
2013-05-29 03:13:47 +00:00
NAKAMURA Takumi d57ea87080 Whitespace.
llvm-svn: 182819
2013-05-29 03:13:41 +00:00
Paul Redmond 5fdf836ba4 Add support for llvm.vectorizer metadata
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic
  by making it the root of additional loop metadata.
  - Loop::isAnnotatedParallel now looks for llvm.loop and associated
    llvm.mem.parallel_loop_access
  - document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
  - document llvm.vectorizer.* metadata
  - add utility class LoopVectorizerHints for getting/setting loop metadata
  - use llvm.vectorizer.width=1 to indicate already vectorized instead of
    already_vectorized
- update existing tests that used llvm.loop.parallel and
  llvm.vectorizer.already_vectorized

Reviewed by: Nadav Rotem

llvm-svn: 182802
2013-05-28 20:00:34 +00:00
James Molloy f6f121e277 Extend RemapInstruction and friends to take an optional new parameter, a ValueMaterializer.
Extend LinkModules to pass a ValueMaterializer to RemapInstruction and friends to lazily create Functions for lazily linked globals. This is a big win when linking small modules with large (mostly unused) library modules.

llvm-svn: 182776
2013-05-28 15:17:05 +00:00
Evgeniy Stepanov fca012334b [msan] Fix argument shadow alignment.
llvm-svn: 182771
2013-05-28 13:07:43 +00:00
Michael J. Spencer df1ecbd734 Replace Count{Leading,Trailing}Zeros_{32,64} with count{Leading,Trailing}Zeros.
llvm-svn: 182680
2013-05-24 22:23:49 +00:00
Michael Gottesman e67f40c514 [objc-arc] KnownSafe does not imply that it is safe to perform code motion across CFG edges since even if it is safe to remove RR pairs, we may still be able to move a retain/release into a loop.
rdar://13949644

llvm-svn: 182670
2013-05-24 20:44:05 +00:00
Michael Gottesman 5a91bbf33a [objc-arc] Make sure that multiple owners is propagated correctly through the pass via the usage of a global data structure.
rdar://13750319

llvm-svn: 182669
2013-05-24 20:44:02 +00:00
Benjamin Kramer 6ac1e62377 LoopVectorize: LoopSimplify can't canonicalize loops with an indirectbr in it, don't assert on those cases.
Fixes PR16139.

llvm-svn: 182656
2013-05-24 18:05:35 +00:00
Joey Gouly b34294d0e4 Run clang-format over the scalarizePHI function.
llvm-svn: 182640
2013-05-24 12:33:28 +00:00
Joey Gouly 83699284be scalarizePHI needs to insert the next ExtractElement in the same block
as the BinaryOperator, *not* in the block where the IRBuilder is currently
inserting into. Fixes a bug where scalarizePHI would create instructions
that would not dominate all uses.

llvm-svn: 182639
2013-05-24 12:29:54 +00:00
Daniel Malea fddddbeab0 Re-implement DebugIR in a way that does not subclass AssemblyWriter:
- move AsmWriter.h from public headers into lib
- marked all AssemblyWriter functions as non-virtual; no need to override them
- DebugIR now "plugs into" AssemblyWriter with an AssemblyAnnotationWriter helper
- exposed flags to control hiding of a) debug metadata b) debug intrinsic calls

C/R: Paul Redmond

llvm-svn: 182617
2013-05-23 22:34:33 +00:00
Benjamin Kramer ad5c24f161 More symbols that should be static.
llvm-svn: 182590
2013-05-23 16:09:15 +00:00
Michael Gottesman 740db977f6 [objc-arc] Fixed number of prefixing slashes in some comments in a function from 3 to 2 to match the rest of ObjCARCOpts.
llvm-svn: 182557
2013-05-23 02:35:21 +00:00
Nadav Rotem 9e00eb38a2 SLPVectorizer: Change the order in which new instructions are added to the function.
We are not working on a DAG and I ran into a number of problems when I enabled the vectorizations of 'diamond-trees' (trees that share leafs).
* Improved the numbering API.
* Changed the placement of new instructions to the last root.
* Fixed a bug with external tree users with non-zero lane.
* Fixed a bug in the placement of in-tree users.

llvm-svn: 182508
2013-05-22 19:47:32 +00:00
Jean-Luc Duprat 0dda6f168c This is an update to a previous commit (r181216).
The earlier change list introduced the following inst combines:
B * (uitofp i1 C) -> select C, B, 0
A * (1 - uitofp i1 C) -> select C, 0, A
select C, 0, B + select C, A, 0 -> select C, A, B

Together these 3 changes would simplify :
A * (1 - uitofp i1 C) + B * uitofp i1 C 
down to :
select C, B, A

In practice we found that the first two substitutions can have a
negative effect on performance, because they reduce opportunities to
use FMA contractions; between the two options FMAs are often the
better choice.  This change list amends the previous one to enable
just these inst combines:

select C, B, 0 + select C, 0, A -> select C, B, A
A * (1 - uitofp i1 C) + B * uitofp i1 C -> select C, B, A

llvm-svn: 182499
2013-05-22 18:29:31 +00:00
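Note on r182499: a small sketch of why the linear blend and the select agree, assuming finite inputs. With C converted to 0.0 or 1.0, one product vanishes and the other passes its operand through unchanged. The function names are illustrative and not part of InstCombine.

```cpp
// Linear-blend form: yields b when c is true and a when c is false,
// for finite a and b.
float blend(bool c, float a, float b) {
  float m = c ? 1.0f : 0.0f; // models `uitofp i1 C`
  return a * (1.0f - m) + b * m;
}

// Select form that the combine produces.
float selectForm(bool c, float a, float b) {
  return c ? b : a;
}
```

The equivalence does not hold for every input: if the unselected operand is infinite or NaN, the blend multiplies it by zero and produces NaN, while the select ignores it.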
Arnold Schwaighofer 12b0d1cda0 LoopVectorize: Make Value pointers that could be RAUW'ed a VH
The Value pointers we store in the induction variable list can be RAUW'ed by a
call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing
in some other places where we store pointers that could potentially be RAUW'ed.

Fixes PR16073.

llvm-svn: 182485
2013-05-22 16:54:56 +00:00
Evgeniy Stepanov ebd7f8e7ef [msan] A no-op implementation of VarArg handling.
This stuff is used on platforms where MSan does not have a proper VarArg
implementation (anything other than x86_64 at the moment).

llvm-svn: 182375
2013-05-21 12:27:47 +00:00
Bill Wendling 5f4740390e Remove unused #include.
llvm-svn: 182315
2013-05-20 20:59:12 +00:00
Hal Finkel a969df84ab Rename LoopSimplify.h to LoopUtils.h
As discussed, LoopUtils.h is a better name.

llvm-svn: 182314
2013-05-20 20:46:30 +00:00
Hal Finkel a12d82b421 Expose InsertPreheaderForLoop from LoopSimplify to other passes
Other passes, PPC counter-loop formation for example, also need to add loop
preheaders outside of the regular loop simplification pass. This makes
InsertPreheaderForLoop a global function so that it can be used by other
passes.

No functionality change intended.

llvm-svn: 182299
2013-05-20 16:47:07 +00:00
Arnold Schwaighofer 693a1ca628 LoopVectorize: Handle single edge PHIs
We might encounter single edge PHIs - handle them with an identity select.

Fixes PR15990.

llvm-svn: 182199
2013-05-18 18:38:34 +00:00
Matt Arsenault 52ddb7bcdd Add missing -*- C++ -*- to headers
llvm-svn: 182164
2013-05-17 21:43:39 +00:00
Benjamin Kramer d84a63398e LoopVectorize: Simplify code. No functionality change.
llvm-svn: 182100
2013-05-17 14:48:17 +00:00
Evgeniy Stepanov 1e7643243d [msan] Switch TLS globals to initial-exec model.
They are always defined in the main executable.

llvm-svn: 181994
2013-05-16 09:14:05 +00:00
Arnold Schwaighofer 88e7fddc8c LoopVectorize: Move call of canHoistAllLoads to canVectorizeWithIfConvert
We only want to check this once, not for every conditional block in the loop.

No functionality change (except that we don't perform a check redundantly
anymore).

llvm-svn: 181942
2013-05-15 22:38:14 +00:00
Michael Gottesman b4e7f4d841 [objc-arc] Fixed a spelling error and made the statistic descriptions be consistent about their usage of periods.
llvm-svn: 181901
2013-05-15 17:43:03 +00:00
Arnold Schwaighofer 09cee97270 LoopVectorize: Fix comments
No functionality change.

llvm-svn: 181862
2013-05-15 02:02:45 +00:00
Arnold Schwaighofer 2d920477a4 LoopVectorize: Hoist conditional loads if possible
InstCombine can be uncooperative to vectorization and sink loads into
conditional blocks. This prevents vectorization.

Undo this optimization if there are unconditional memory accesses to the same
addresses in the loop.

radar://13815763

llvm-svn: 181860
2013-05-15 01:44:30 +00:00
Sylvestre Ledru 149e281aa8 Fix two typos
llvm-svn: 181848
2013-05-14 23:36:24 +00:00
Manman Ren b3c52fb45b GlobalOpt: fix an issue where CXAAtExitFn points to a deleted function.
CXAAtExitFn was set outside a loop and before optimizations where functions
can be deleted. This patch will set CXAAtExitFn inside the loop and after
optimizations.

Seg fault when running LTO because of accesses to a deleted function.
rdar://problem/13838828

llvm-svn: 181838
2013-05-14 21:52:44 +00:00
Michael Gottesman 0c8b562851 Removed trailing whitespace.
llvm-svn: 181760
2013-05-14 06:40:10 +00:00
Arnold Schwaighofer 2e7a922a15 LoopVectorize: Handle loops with multiple forward inductions
We used to give up if we saw two integer inductions. After this patch, we base
further induction variables on the chosen one like we do in the reverse
induction and pointer induction case.

Fixes PR15720.

radar://13851975

llvm-svn: 181746
2013-05-14 00:21:18 +00:00
Michael Gottesman f3f9e3b10a [objc-arc-opts] Added debug statements when we set and unset whether a pointer is known positive.
llvm-svn: 181745
2013-05-14 00:08:09 +00:00
Michael Gottesman a76143eeee [objc-arc-opts] In the presence of an alloca unconditionally remove RR pairs if and only if we are both KnownSafeBU/KnownSafeTD rather than just either or.
In the presence of a block being initialized, the frontend will emit the
objc_retain on the original pointer and the release on the pointer loaded from
the alloca. The optimizer will through the provenance analysis realize that the
two are related (albeit different), but since we only require KnownSafe in one
direction, will match the inner retain on the original pointer with the guard
release on the original pointer. This is fixed by ensuring that in the presence
of allocas we only unconditionally remove pointers if both our retain and our
release are KnownSafe (i.e. we are KnownSafe in both directions) since we must
deal with the possibility that the frontend will emit what (to the optimizer)
appears to be unbalanced retain/releases.

An example of the miscompile is:

  %A = alloca
  retain(%x)
  retain(%x) <--- Inner Retain
  store %x, %A
  %y = load %A
  ... DO STUFF ...
  release(%y)
  call void @use(%x)
  release(%x) <--- Guarding Release

getting optimized to:

  %A = alloca
  retain(%x)
  store %x, %A
  %y = load %A
  ... DO STUFF ...
  release(%y)
  call void @use(%x)

rdar://13750319

llvm-svn: 181743
2013-05-13 23:49:42 +00:00
Matt Beaumont-Gay e55d9492e3 Move a couple more statistics inside '#ifndef NDEBUG'.
Suppresses an unused-variable warning in -Asserts builds.

llvm-svn: 181733
2013-05-13 21:10:49 +00:00
Michael Gottesman 993fbf704a [objc-arc-opts] Add comment to BBState making it clear that get{TopDown,BottomUp}PtrState will create a new PtrState object if it does not find a PtrState for Arg.
llvm-svn: 181726
2013-05-13 19:40:39 +00:00
Michael Gottesman 9fc50b82a4 [objc-arc] Move the before optimization statistics gathering phase out of OptimizeIndividualCalls.
This makes the statistics gathering completely independent of the actual
optimization occurring, preventing any sort of bleeding over from occurring.

Additionally, it simplifies a switch statement in the non-statistic gathering case.

llvm-svn: 181719
2013-05-13 18:29:07 +00:00
Duncan Sands 0480b9b54e Suppress GCC compiler warnings in release builds about variables that are only
read in asserts.

llvm-svn: 181689
2013-05-13 07:50:47 +00:00
Nadav Rotem 33dcf0a70f SLPVectorizer: Swap LHS and RHS. No functionality change.
llvm-svn: 181684
2013-05-13 05:13:13 +00:00
Nadav Rotem ce42cc6d4d SLPVectorizer: Fix a bug in the code that generates extracts for values with multiple users.
The external user does not have to be in lane #0. We have to save the lane for each scalar so that we know which vector lane to extract.

llvm-svn: 181674
2013-05-12 22:58:45 +00:00
Nadav Rotem cbf6d24d50 SLPVectorizer: Clear the map that maps between scalars to vectors after each round of vectorization.
Testcase in the next commit.

llvm-svn: 181673
2013-05-12 22:55:57 +00:00
David Majnemer 6c30f49af3 InstCombine: Flip the order of two urem transforms
There are two transforms in visitUrem that conflict with each other.

*) One, if a divisor is a power of two, subtracts one from the divisor
   and turns it into a bitwise-and.
*) The other unwraps both operands if they are surrounded by zext
   instructions.

Flipping the order allows the subtraction to go beneath the sign
extension (zext is a zero extension, not a sign extension).

llvm-svn: 181668
2013-05-12 00:07:05 +00:00
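Note on r181668: a minimal sketch of the first transform, assuming 32-bit unsigned operands. When the divisor is a power of two, the remainder is the low bits of the dividend, so `x % d` equals `x & (d - 1)`. The helper names are illustrative and not taken from visitUrem.

```cpp
#include <cstdint>

// True if d is a nonzero power of two.
bool isPowerOfTwo(uint32_t d) {
  return d != 0 && (d & (d - 1)) == 0;
}

// Computes x % d, using the bitwise-and form when d is a power of two.
// The caller must pass d != 0, as for the plain % operator.
uint32_t uremViaMask(uint32_t x, uint32_t d) {
  if (isPowerOfTwo(d))
    return x & (d - 1); // subtract one from the divisor, then mask
  return x % d;
}
```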
Arnold Schwaighofer f2305e4467 LoopVectorize: Use the widest induction variable type
Use the widest induction type encountered for the canonical induction variable.

We used to turn the following loop into an empty loop because we used i8 as
induction variable type and truncated 1024 to 0 as trip count.

int a[1024];
void fail() {
  int reverse_induction = 1023;
  unsigned char forward_induction = 0;
  while ((reverse_induction) >= 0) {
    forward_induction++;
    a[reverse_induction] = forward_induction;
    --reverse_induction;
  }
}

radar://13862901

llvm-svn: 181667
2013-05-11 23:04:28 +00:00
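Note on r181667: a tiny sketch of the truncation the message describes. Narrowing 1024 to 8 bits keeps only the low byte, which is 0, so an i8 induction type would see a trip count of zero. The function name is illustrative only.

```cpp
#include <cstdint>

// Truncate a trip count to 8 bits, as using i8 for the induction variable
// would. Only the low 8 bits of the count survive.
uint8_t truncTripCountToI8(uint32_t tripCount) {
  return static_cast<uint8_t>(tripCount);
}
```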
Arnold Schwaighofer a544fefa32 LoopVectorize: Use variable instead of repeated function call
No functionality change intended.

llvm-svn: 181666
2013-05-11 23:04:26 +00:00
Arnold Schwaighofer 1ba84df437 LoopVectorize: Use IRBuilder interface in more places
No functionality change intended.

llvm-svn: 181665
2013-05-11 23:04:24 +00:00
David Majnemer 470b077bca InstCombine: Turn urem to bitwise-and more often
Use isKnownToBeAPowerOfTwo in visitUrem so that we may more aggressively
fold away urem instructions.

llvm-svn: 181661
2013-05-11 09:01:28 +00:00
Nadav Rotem cdfb48d2fe SLPVectorizer: Add support for trees with external users.
For example:
bar() {
  int a = A[i];
  int b = A[i+1];
  B[i] = a;
  B[i+1] = b;
  foo(a);  <--- a is used outside the vectorized expression.
}

llvm-svn: 181648
2013-05-10 22:59:33 +00:00
Nadav Rotem 0686e5cb05 Add a debug print
llvm-svn: 181647
2013-05-10 22:56:18 +00:00
Benjamin Kramer 14e915f7b4 InstCombine: Don't claim to be able to evaluate any shl in a zexted type.
The shift amount may be larger than the type leading to undefined behavior.
Limit the transform to constant shift amounts. While there update the bits to
clear in the result which may enable additional optimizations.

PR15959.

llvm-svn: 181604
2013-05-10 16:26:37 +00:00
Benjamin Kramer a6645e8b8f InstCombine: Verify the type before transforming uitofp into select.
PR15952.

llvm-svn: 181586
2013-05-10 09:16:52 +00:00
Dmitri Gribenko 9bf66a5fd0 Fix a documentation warning: \bried -> \brief
llvm-svn: 181551
2013-05-09 21:16:18 +00:00
Shuxin Yang 1d8d7e4d38 [GVN] Split critical-edge on the fly, instead of postponing edge-splitting to the next
iteration.

  This is one step toward non-iterative GVN. My local hack suggests that getting rid
of iteration will speed up GVN by 30%+ on a medium-sized input (2k LOC, C++).
I cannot explain why not 2x or more at this moment.

llvm-svn: 181532
2013-05-09 18:34:27 +00:00
Rafael Espindola 007521673b Don't replace an alias in llvm.used with its target.
When we replace an internal alias with its target, be careful not to
replace the entry in llvm.used (and llvm.compiler_used).

llvm-svn: 181524
2013-05-09 17:22:59 +00:00
Benjamin Kramer 21b972ae94 InstCombine: Don't just copy known bits from the first operand of an srem.
That's obviously wrong. Conservatively restrict it to the sign bit, which
matches the original intention of this analysis. Fixes PR15940.

llvm-svn: 181518
2013-05-09 16:32:32 +00:00
Arnold Schwaighofer 2e8c69cf97 LoopVectorizer: Don't assert on the absence of induction variables
A computable loop exit count does not imply the presence of an induction
variable. Scalar evolution can return a value for an infinite loop.

Fixes PR15926.

llvm-svn: 181495
2013-05-09 00:32:18 +00:00
Daniel Malea 3c5bed1670 Add DebugIR pass -- emits IR file and replace source lines with IR lines in MD
- requires existing debug information to be present
- fixes up file name and line number information in metadata
- emits a "<orig_filename>-debug.ll" succinct IR file (without !dbg metadata
  or debug intrinsics) that can be read by a debugger
- initialize pass in opt tool to enable the "-debug-ir" flag
- lit tests to follow

llvm-svn: 181467
2013-05-08 20:44:14 +00:00
Nick Lewycky 5fb1963f2a Fix a bug in codegenprep where it was losing track of values OptimizeMemoryInst
by switching to a ValueMap. Patch by Andrea DiBiagio!

llvm-svn: 181397
2013-05-08 09:00:10 +00:00
Arnold Schwaighofer 3610139ac5 LoopVectorizer: Improve reduction variable identification
The two nested loops were confusing and also conservative in identifying
reduction variables. This patch replaces them by a worklist based approach.

llvm-svn: 181369
2013-05-07 21:55:37 +00:00
Arnold Schwaighofer e78b76fbed LoopVectorize: getConsecutiveVector must respect signed arithmetic
We were passing an i32 to ConstantInt::get where an i64 was needed and we must
also pass the sign if we pass negative numbers. The start index passed to
getConsecutiveVector must also be signed.

Should fix PR15882.

llvm-svn: 181286
2013-05-07 04:37:05 +00:00
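Note on r181286: a small sketch of why the sign has to be passed along when a negative 32-bit value is widened to 64 bits. Zero extension turns -1 into 4294967295, while sign extension keeps it at -1. The helper names are illustrative and do not correspond to the LLVM API.

```cpp
#include <cstdint>

// Widen a 32-bit value to 64 bits by zero extension; a negative input
// becomes a large positive number.
uint64_t zeroExtend(int32_t v) {
  return static_cast<uint64_t>(static_cast<uint32_t>(v));
}

// Widen a 32-bit value to 64 bits by sign extension; a negative input keeps
// its value in two's complement.
uint64_t signExtend(int32_t v) {
  return static_cast<uint64_t>(static_cast<int64_t>(v));
}
```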
David Majnemer 70f286d95f InstCombine: (X ^ signbit) + C -> X + (signbit ^ C)
llvm-svn: 181249
2013-05-06 21:21:31 +00:00
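Note on r181249: a short sketch of why this fold is sound, assuming 32-bit wrapping arithmetic. Flipping the sign bit is the same as adding 2^31 modulo 2^32, so the xor can be moved from X onto the constant. The function names are illustrative only.

```cpp
#include <cstdint>

constexpr uint32_t kSignBit = 0x80000000u;

// Original form: (X ^ signbit) + C.
uint32_t beforeFold(uint32_t x, uint32_t c) {
  return (x ^ kSignBit) + c;
}

// Folded form: X + (signbit ^ C). The xor now applies to the constant, so a
// compiler can evaluate it ahead of time.
uint32_t afterFold(uint32_t x, uint32_t c) {
  return x + (kSignBit ^ c);
}
```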
Andrew Trick 9c72b071fe Rotate multi-exit loops even if the latch was simplified.
Test case by Michele Scandale!

Fixes PR10293: Load not hoisted out of loop with multiple exits.

There are a few regressions with this patch, now tracked by
rdar:13817079, and a roughly equal number of improvements. The
regressions are almost certainly bad luck because LoopRotate has very
little idea of whether rotation is profitable. Doing better requires a
more comprehensive solution.

This checkin is a quick fix that lacks generality (PR10293 has
a counter-example). But it trivially fixes the case in PR10293 without
interfering with other cases, and it does satisfy the criteria that
LoopRotate is a loop canonicalization pass that should avoid
heuristics and special cases.

I can think of two approaches that would probably be better in
the long run. Ultimately they may both make sense.

(1) LoopRotate should check that the current header would make a good
loop guard, and that the loop does not already have a sufficient
guard. The artificial SimplifiedLoopLatch check would be unnecessary,
and the design would be more general and canonical. Two difficulties:

- We need a strong guarantee that we won't endlessly rotate, so the
  analysis would need to be precise in order to avoid the
  SimplifiedLoopLatch precondition.

- Analyses like this are usually based on SCEV, which we don't want to
  rely on.

(2) Rotate on-demand in late loop passes. This could even be done by
shoving the loop back on the queue after the optimization that needs
it. This could work well when we find LICM opportunities in
multi-branch loops. This requires some work, and it doesn't really
solve the problem of SCEV wanting a loop guard before the analysis.

llvm-svn: 181230
2013-05-06 17:58:18 +00:00
Jean-Luc Duprat 3e4fc3ef24 Provide InstCombines for the following 3 cases:
A * (1 - (uitofp i1 C)) -> select C, 0, A
B * (uitofp i1 C) -> select C, B, 0
select C, 0, A + select C, B, 0 -> select C, B, A

These come up in code that has been hand-optimized from a select to a linear blend, 
on platforms where that may have mattered. We want to undo such changes 
with the following transform:
A*(1 - uitofp i1 C) + B*(uitofp i1 C) -> select C, B, A
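
As a rough illustration of the source pattern these combines target, the sketch below writes the linear blend and the select it collapses to as two hypothetical C++ helpers (`linearBlend` and `selectBlend` are made-up names, not LLVM code). When C is true the factor (1 - C) is zero, so the blend yields B, matching the `select C, B, A` form of the third case above. The two forms agree for finite inputs; they can differ when A or B is NaN or infinite, and the commit message does not say how the actual transform guards against that.

```cpp
// Hand-optimized form: a branch-free linear blend.
// `mask` models the result of `uitofp i1 C` (0.0 or 1.0).
float linearBlend(bool c, float a, float b) {
  float mask = c ? 1.0f : 0.0f;
  return a * (1.0f - mask) + b * mask;
}

// Form after the combine: a plain select on the condition.
float selectBlend(bool c, float a, float b) {
  return c ? b : a;
}
```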

llvm-svn: 181216
2013-05-06 16:55:50 +00:00
Nadav Rotem 632b25b743 Update the comment to mention that we use TTI.
llvm-svn: 181178
2013-05-06 03:06:36 +00:00
Nadav Rotem c70ef4e93c Revert r164763 because it introduces new shuffles.
Thanks Nick Lewycky for pointing this out.

llvm-svn: 181177
2013-05-06 02:39:09 +00:00
Rafael Espindola c229a4fff4 Fix const merging when an alias of a const is llvm.used.
We used to disable constant merging not only if a constant is llvm.used, but
also if an alias of a constant is llvm.used. This change fixes that.

llvm-svn: 181175
2013-05-06 01:48:55 +00:00
Benjamin Kramer 3e3f2a4b8d LoopVectorize: Print values instead of pointers in debug output.
llvm-svn: 181157
2013-05-05 14:54:52 +00:00
Arnold Schwaighofer d96e427eac LoopVectorize: Add support for floating point min/max reductions
Add support for min/max reductions when "no-nans-float-math" is enabled. This
allows us to assume we have ordered floating point math and treat ordered and
unordered predicates equally.

radar://13723044

llvm-svn: 181144
2013-05-05 01:54:48 +00:00
Arnold Schwaighofer f5183729db LoopVectorizer: Cleanup of minimum/maximum pattern match code
No need for setting the operands. The pointers are going to be bound by the
matcher.

radar://13723044

llvm-svn: 181142
2013-05-05 01:54:44 +00:00
Arnold Schwaighofer a670a0a3aa LoopVectorize: We don't need an identity element for min/max reductions
We can just use the initial element that feeds the reduction.

  max(max(x, y), z) == max(max(x,y), max(x,z))
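
The identity above can be sketched as follows, assuming a made-up vector width of four lanes: every lane is seeded with the start value rather than with an identity element, and because max is idempotent the repeated start value does not change the final result. `scalarMax` and `lanewiseMax` are illustrative names, not functions from the loop vectorizer.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Reference: ordinary scalar max reduction starting from `start`.
int scalarMax(int start, const std::vector<int>& v) {
  int m = start;
  for (int e : v) m = std::max(m, e);
  return m;
}

// Lane-wise reduction that mimics a vectorized loop.
int lanewiseMax(int start, const std::vector<int>& v) {
  constexpr std::size_t kLanes = 4;  // assumed vector width
  std::array<int, kLanes> lanes;
  lanes.fill(start);                 // seed every lane with the start value

  // "Vector" body: each lane reduces its own stride of the input.
  std::size_t i = 0;
  for (; i + kLanes <= v.size(); i += kLanes)
    for (std::size_t l = 0; l < kLanes; ++l)
      lanes[l] = std::max(lanes[l], v[i + l]);

  // Horizontal reduction across lanes, then the scalar remainder.
  int m = *std::max_element(lanes.begin(), lanes.end());
  for (; i < v.size(); ++i) m = std::max(m, v[i]);
  return m;
}
```

The same seeding would be wrong for an add reduction, where the start value would be counted once per lane; that is why add needs an identity element and min/max does not.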

radar://13723044

llvm-svn: 181141
2013-05-05 01:54:42 +00:00
Dmitri Gribenko 3238fb7595 Add ArrayRef constructor from None, and do the cleanups that this constructor enables
Patch by Robert Wilhelm.

llvm-svn: 181138
2013-05-05 00:40:33 +00:00
Nick Lewycky 881e9d62e2 Tabs to spaces. No functionality change.
llvm-svn: 181082
2013-05-04 01:08:15 +00:00
Shuxin Yang 637b9bebd4 Decompose GVN::processNonLocalLoad() (about 400 LOC) into smaller helper functions. No function change.
This function consists of the following steps:
   1. Collect dependent memory accesses.
   2. Analyze availability.
   3. Perform full redundancy elimination, or
   4. Perform PRE, depending on the availability.

 Steps 2, 3 and 4 are now moved to three helper routines.

llvm-svn: 181047
2013-05-03 19:17:26 +00:00
Nadav Rotem 4ce060b3da LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values.
By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements.

We can now vectorize this loop:

int foo(int *A, int *B, int n) {
  for (int i=0; i < n; i++) {
    int x = 9;
    if (A[i] > B[i]) {
      if (A[i] > 19) {
        x = 3;
      } else if (B[i] < 4 ) {
        x = 4;
      } else {
        x = 5;
      }
    }
    A[i] = x;
  }
}
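
A rough sketch of what if-conversion does to the body of this loop: the value stored to A[i] is a PHI with four incoming values (9, 3, 4, 5), and it can be rewritten as a chain of selects with no control flow. `pickBranchy` and `pickSelects` are hypothetical helpers that model a single iteration; they are not the vectorizer's output.

```cpp
// One loop iteration with the original nested control flow.
int pickBranchy(int a, int b) {
  int x = 9;
  if (a > b) {
    if (a > 19) {
      x = 3;
    } else if (b < 4) {
      x = 4;
    } else {
      x = 5;
    }
  }
  return x;
}

// The same iteration after if-conversion: every branch becomes a select,
// so the computation is straight-line and can be done per vector lane.
int pickSelects(int a, int b) {
  int inner = (a > 19) ? 3 : ((b < 4) ? 4 : 5);
  return (a > b) ? inner : 9;
}
```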

llvm-svn: 181037
2013-05-03 17:42:55 +00:00
Shuxin Yang af2c3ddf0d [GV] Remove dead code which is really difficult to decipher.
Actually it took me a couple of hours trying to make sense of it, only
to find that it is dead code.  I guess the original author used
"allSingleSucc" to indicate whether there are any critical edges emanating
from some blocks, and tried to perform code motion (actually speculation)
in the presence of these critical edges; but later on he/she changed his/her
mind and decided to perform edge-splitting first.

llvm-svn: 180951
2013-05-02 21:14:31 +00:00
Filip Pizlo dec20e43c0 This patch breaks up Wrap.h so that it does not have to include all of
the things, and renames it to CBindingWrapping.h.  I also moved 
CBindingWrapping.h into Support/.

This new file just contains the macros for defining different wrap/unwrap 
methods.

The calls to those macros, as well as any custom wrap/unwrap definitions 
(like for array of Values for example), are put into corresponding C++ 
headers.

Doing this required some #include surgery, since some .cpp files relied 
on the fact that including Wrap.h implicitly caused the inclusion of a 
bunch of other things.

This also now means that the C++ headers will include their corresponding 
C API headers; for example Value.h must include llvm-c/Core.h.  I think 
this is harmless, since the C API headers contain just external function 
declarations and some C types, so I don't believe there should be any 
nasty dependency issues here.

llvm-svn: 180881
2013-05-01 20:59:00 +00:00
Nadav Rotem 1e211913b5 SROA: Generate selects instead of shuffles when blending values because this is the canonical form.
Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often.

llvm-svn: 180875
2013-05-01 19:53:30 +00:00
Jim Grosbach d11584a7f7 Revert "InstCombine: Fold more shuffles of shuffles."
This reverts commit r180802

There's ongoing discussion about whether this is the right place to make
this transformation. Reverting for now while we figure it out.

llvm-svn: 180834
2013-05-01 00:25:27 +00:00
Richard Trieu 624c2ebcbb Fix a use after free. RI is freed before the call to getDebugLoc(). To
prevent this, capture the location before RI is freed.

llvm-svn: 180824
2013-04-30 22:45:10 +00:00
Nadav Rotem 9feda6071a Fix a typo
llvm-svn: 180806
2013-04-30 21:04:51 +00:00
Jim Grosbach 0b914fe839 InstCombine: Fold more shuffles of shuffles.
Always fold a shuffle-of-shuffle into a single shuffle when there's only one
input vector in the first place. Continue to be more conservative when there's
multiple inputs.

rdar://13402653
PR15866

llvm-svn: 180802
2013-04-30 20:43:52 +00:00
Adrian Prantl 8beccf9e6d Spelling. Thanks, Eric.
llvm-svn: 180794
2013-04-30 17:33:32 +00:00
Adrian Prantl 0941638a1b Set debug locations for branch instructions created during inlining, even
if the inlined function has multiple returns.

rdar://problem/12415623

llvm-svn: 180793
2013-04-30 17:08:16 +00:00
David Majnemer d73f37bb83 Fix a bug in foldSelectICmpAndOr.
Differences in bitwidth between X and Y could exist even if C1 and C2 have
the same Log2 representation.

llvm-svn: 180779
2013-04-30 10:36:33 +00:00
David Majnemer 8d048d0482 Fix "Combine bit test + conditional or into simple math"
This fixes the optimization introduced in r179748 and reverted in r179750.

While the optimization was sound, it did not properly respect differences in
bit-width.

llvm-svn: 180777
2013-04-30 08:57:58 +00:00
Arnold Schwaighofer 474df6d3ed SimplifyCFG: If convert single conditional stores
This resurrects r179957, but adds code that makes sure we don't touch
atomic/volatile stores:

This transformation will transform a conditional store with a preceding
unconditional store to the same location:

 a[i] = X
 may-alias with a[i] load
 if (cond)
   a[i] = Y

into an unconditional store.

 a[i] = X
 may-alias with a[i] load
 tmp = cond ? Y : X;
 a[i] = tmp
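
The before/after shapes above can be written out as two small functions. This is only a sketch with hypothetical names (`storeBranchy`, `storeSelect`); it leaves out the intervening may-alias load and assumes plain stores, since the commit says atomic and volatile stores are left untouched.

```cpp
// Before: an unconditional store followed by a conditional store
// to the same location.
void storeBranchy(int* a, int i, bool cond, int x, int y) {
  a[i] = x;
  if (cond)
    a[i] = y;
}

// After: the branch is replaced by a select feeding a second
// unconditional store.
void storeSelect(int* a, int i, bool cond, int x, int y) {
  a[i] = x;
  int tmp = cond ? y : x;
  a[i] = tmp;
}
```

Both versions leave the same value in a[i]; the second always executes two stores but has no branch to mispredict.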

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2 % performance improvement on an ARM swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducible below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

llvm-svn: 180731
2013-04-29 21:28:24 +00:00
Michael Gottesman 03cf3c8966 Add in some conditional compilation in order to silence an unused variable warning.
llvm-svn: 180700
2013-04-29 07:29:08 +00:00
Michael Gottesman 214ca90f8e [objc-arc] Apply the RV optimization to retains next to calls in ObjCARCContract instead of ObjCARCOpts.
Turning retains into retainRV calls disrupts the data flow analysis in
ObjCARCOpts. Thus we move it as late as we can by moving it into
ObjCARCContract.

We leave in the conversion from retainRV -> retain in ObjCARCOpt since
it enables the dataflow analysis.

rdar://10813093

llvm-svn: 180698
2013-04-29 06:53:53 +00:00
Michael Gottesman 9c11815978 Added statistics to count the number of retains/releases before/after optimization.
llvm-svn: 180697
2013-04-29 06:16:57 +00:00
Michael Gottesman 8005ad3f3e Removed trailing whitespace.
llvm-svn: 180696
2013-04-29 06:16:55 +00:00
Michael Gottesman 3e3977c49f Fix for r180693. = /.
llvm-svn: 180694
2013-04-29 05:25:39 +00:00
Michael Gottesman a87bb8f50b [objc-arc-annotations] Moved the disabling of call movement to ConnectTDBUTraversals so that I can prevent Changed = true from being set. This prevents an infinite loop.
llvm-svn: 180693
2013-04-29 05:13:13 +00:00
Shuxin Yang 04a4fd43aa Fix a XOR reassociation bug.
When the Reassociator optimizes "(x | C1)" ^ "(x & C2)", it may swap the two
subexpressions; however, it forgot to swap the cached constants (C1 and C2)
accordingly.
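
To see why the constants must stay paired with their subexpressions, here is a sketch built on a standard bitwise identity: since x | C1 == (x & ~C1) ^ C1, the whole expression simplifies to (x & (~C1 ^ C2)) ^ C1, which is not symmetric in C1 and C2. Whether Reassociate uses exactly this rewrite is an assumption; the helper names (`original`, `folded`, `foldedMatchesOriginal`) are illustrative.

```cpp
#include <cstdint>

// The expression as written: (x | C1) ^ (x & C2).
std::uint8_t original(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
  return static_cast<std::uint8_t>((x | c1) ^ (x & c2));
}

// Simplified form: (x & C3) ^ C1 with C3 = ~C1 ^ C2.
std::uint8_t folded(std::uint8_t x, std::uint8_t c1, std::uint8_t c2) {
  std::uint8_t c3 = static_cast<std::uint8_t>(~c1 ^ c2);
  return static_cast<std::uint8_t>((x & c3) ^ c1);
}

// Exhaustive check over all 8-bit values of x, C1 and C2.
bool foldedMatchesOriginal() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 256; ++c2)
        if (original(static_cast<std::uint8_t>(x),
                     static_cast<std::uint8_t>(c1),
                     static_cast<std::uint8_t>(c2)) !=
            folded(static_cast<std::uint8_t>(x),
                   static_cast<std::uint8_t>(c1),
                   static_cast<std::uint8_t>(c2)))
          return false;
  return true;
}
```

Passing the constants in the wrong order reproduces the kind of miscompile described here: with x = 0, C1 = 1, C2 = 0 the original expression is 1, while `folded(0, 0, 1)` yields 0.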

rdar://13739160

llvm-svn: 180676
2013-04-27 18:02:12 +00:00
Adrian Prantl d00333a4b2 fix a typo that due to cut&paste quadrupled itself
rdar://problem/13056109

llvm-svn: 180618
2013-04-26 18:10:50 +00:00
Adrian Prantl 29b9de7bf1 Bugfix for the debug intrinsic handling in InstCombiner:
Since we can't guarantee that the original dbg.declare intrinsic
is removed by LowerDbgDeclare(), we need to make sure that we are
not inserting the same dbg.value intrinsic over and over.
This removes tons of redundant DIEs when compiling optimized code.

rdar://problem/13056109

llvm-svn: 180615
2013-04-26 17:48:33 +00:00
Nadav Rotem 13306816fc LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes.
llvm-svn: 180593
2013-04-26 05:08:59 +00:00
Michael Gottesman 47cf8a4c12 Revert "[objc-arc] Added ImpreciseAutoreleaseSet to track autorelease calls that were once autoreleaseRV instructions."
This reverts commit r180222.

I think this might tie in with a different problem which will require a
different approach potentially. I am reverting this in case I need to go
down that second path.

My apologies for the noise. = /.

llvm-svn: 180590
2013-04-26 01:12:18 +00:00
Nadav Rotem f43cbeee15 LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers.
llvm-svn: 180570
2013-04-25 19:55:03 +00:00
Michael Gottesman fdb497a9b2 [objc-arc] Added ImpreciseAutoreleaseSet to track autorelease calls that were once autoreleaseRV instructions.
Due to the semantics of ARC, we must be extremely conservative with autorelease
calls inserted by the frontend since ARC gaurantees that said object will be in
the autorelease pool after that point, an optimization invariant that the
optimizer must respect.

On the other hand, we are allowed significantly more flexibility with
autoreleaseRV instructions.

Often times though this flexibility is disrupted by early transformations which
transform objc_autoreleaseRV => objc_autorelease if said instruction is no
longer being used as part of an RV pair (generally due to inlining). Since we
can not tell the difference in between an autorelease put into place by the
frontend and one created through said ``strength reduction'' we can not perform
these optimizations.

The addition of this set gets around said issues by allowing us to differentiate
in between said two cases.

rdar://problem/13697741.

llvm-svn: 180222
2013-04-24 22:18:18 +00:00
Michael Gottesman cd5b02701c Fixed comment typo.
llvm-svn: 180221
2013-04-24 22:18:15 +00:00