A common way to implement nearbyint is by fiddling with the floating
point environment and calling rint. This is used at least by the BSD
libm and musl. As such, canonicalizing the latter to the former will
create infinite loops for libm and generally pessimize performance, at
least when the generic C versions are used.
This change preserves the rint in the libcall translation and also
handles the domain truncation logic, so that rint with float argument
will be reduced to rintf etc.
llvm-svn: 299247
Since there is no sdiv in SCEV, an 'udiv' is a better canonical form than an 'sdiv' as the user of induction variable
Differential Revision: https://reviews.llvm.org/D31488
llvm-svn: 299118
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
This moves it to the iterator facade utilities giving it full random
access semantics, etc. It can also now be used with standard algorithms
like std::all_of and std::any_of and range adaptors like llvm::reverse.
Also make the semantics of iterating match what every other iterator
uses and forbid decrementing past the begin iterator. This was used as
a hacky way to work around iterator invalidation. However, every
instance trying to do this failed to actually avoid touching invalid
iterators despite the clear documentation that the removed and all
subsequent iterators become invalid including the end iterator. So I've
added a return of the next iterator to removeCase and rewritten the
loops that were doing this to correctly follow the iterator pattern of
either incremneting or removing and assigning fresh values to the
iterator and the end.
In one case we were trying to go backwards to make this cleaner but it
doesn't actually work. I've made that code match the code we use
everywhere else to remove cases as we iterate. This changes the order of
cases in one test output and I moved that test to CHECK-DAG so it
wouldn't care -- the order isn't semantically meaningful anyways.
llvm-svn: 298791
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat in some cases, but we don't know about that in the function
pass.
llvm-svn: 298731
Summary: Declarations need to be filtered out when counting functions.
Reviewers: eraman
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31336
llvm-svn: 298720
Library functions can have specific semantics that affect the behavior of
certain passes. DSE, for instance, gives special treatment to malloc-ed pointers
but not to pointers returned from an equivalently typed (but differently named)
function.
MetaRenamer ought not to alter program semantics, so library functions must
remain untouched.
Reviewers: mehdi_amini, majnemer, chandlerc, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D31304
llvm-svn: 298659
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
Summary: Inliner should update the branch_weights annotation to scale it to proper value.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D30767
llvm-svn: 298270
NFCI.
Summary:
This is ground work for the changes to enable coercion in NewGVN.
GVN doesn't care if they end up constant because it eliminates as it goes.
NewGVN cares.
IRBuilder and ConstantFolder deliberately present the same interface,
so we use this to our advantage to templatize our functions to make
them either constant only or not.
Reviewers: davide
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30928
llvm-svn: 298262
When InstCombine calls into SimplifyLibCalls and it createa putChar calls, we don't infer the attributes. And since SimplifyLibCalls doesn't use InstCombine's IRBuilder the calls doesn't end up in the worklist on this iteration of InstCombine. So it gets picked up on the next iteration where it causes an IR change. This of course causes InstCombine to run another iteration.
So this patch just gets the attributes right the first time. We already did this for puts and some other libcalls.
Differential Revision: https://reviews.llvm.org/D31094
llvm-svn: 298171
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
Differential Revision: https://reviews.llvm.org/D30121
llvm-svn: 298158
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.
In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D31052
llvm-svn: 298010
[Reapplies r297971 and punting on finding a better API for findDbgValues()]
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297994
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297971
Summary:
These are the functions used to determine when values of loads can be
extracted from stores, etc, and to perform the necessary insertions to
do this. There are no changes to the functions themselves except
reformatting, and one case where memdep was informed of a removed load
(which was pushed into the caller).
Reviewers: davide
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30478
llvm-svn: 297438
This reverts commit r296488.
As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.
llvm-svn: 297163
Summary:
We should check if loop size allows us to peel at least one iteration
before we do so.
Patch by Max Kazantsev!
Reviewers: sanjoy, mkuper, efriedma
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30632
llvm-svn: 297122
LoopInfo::getLoopFor returns nullptr if a BB is not in a loop and only
then can the loop be updated to contain the newly created BBs. Add the
missing nullptr check to SplitBlockAndInsertIfThen.
Within LLVM, the only user of this function that also passes a LoopInfo
to be updated is InnerLoopVectorizer::predicateInstructions().
As the method's name implies, the BB operataten on will always be within
a loop, but out-of-tree users may also use it differently (here: Polly).
All other uses of LoopInfo::getLoopFor in the file properly check its
return value for nullptr.
llvm-svn: 297016
Summary:
If a loop contains a Phi node which has an invariant input from back
edge, it is profitable to peel such loops (rather than unroll them) to
use the advantage that this Phi is always invariant starting from 2nd
iteration. After the 1st iteration is peeled, other optimizations can
potentially simplify calculations with this invariant.
Patch by Max Kazantsev!
Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames
Reviewed By: mkuper
Subscribers: mkuper, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30161
llvm-svn: 296898
Summary:
In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.
This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.
Patch by Max Kazantsev!
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Reviewed By: mkuper
Subscribers: mkuper, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30243
llvm-svn: 296897
ValueTracking is used for more thorough analysis of operands. Based on the
analysis, either run-time checks can be simplified (e.g. check only one operand
instead of two) or the transformation can be avoided. For example, it is quite
often the case that a divisor is promoted from a shorter type and run-time
checks for it are redundant.
With additional compile-time analysis of values, two special cases naturally
arise and are addressed by the patch:
1) Both operands are known to be short enough. Then, the long division can be
simply replaced with a short one without CFG modification.
2) If a division is unsigned and the dividend is known to be short then the
long division is not needed at all. Because if the divisor is too big for
short division then the quotient is obviously zero (and the remainder is
equal to the dividend). Actually, the division is not needed when
(divisor > dividend).
Differential Revision: https://reviews.llvm.org/D29897
llvm-svn: 296832
The most important goal of the patch is to break large insertFastDiv function
into separate pieces, so that later a different fast insertion logic can be
implemented using some of these pieces.
Differential Revision: https://reviews.llvm.org/D29896
llvm-svn: 296828
The LLVM backend cannot produce any debug info for an llvm::Function
without a DISubprogram attachment. When inlining a debug-info-carrying
function into a nodebug function, there is therefore no reason to keep
any debug info intrinsic calls or debug locations on the instructions.
This fixes a problem discovered in PR32042.
rdar://problem/30679307
llvm-svn: 296488
This was suggested in D27855: have the inliner add assumptions, so we don't
lose nonnull info provided by argument attributes.
This still doesn't solve PR28430 (dyn_cast), but this gets us closer.
https://reviews.llvm.org/D29999
llvm-svn: 296366
Summary:
Depends on D29606 and D29682
Makes us pass GVN's edge.ll (we also will pass a few other testcases
they just need cleaning up).
Thoughts on the Predicate* hiearchy of classes especially welcome :)
(it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something).
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29747
llvm-svn: 295889
Add updater to passes that now need it.
Move around code in MemorySSA to expose needed functions.
Summary: Mostly cleanup
Reviewers: george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30221
llvm-svn: 295887
Summary:
This lets one add aliasing stores to the updater.
(i'm next going to move the creation/etc functions to the updater)
Reviewers: george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30154
llvm-svn: 295677
Summary:
JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200
due to the following problem: the feature used the following algorithm for detection of
diamond patters:
1. Find a block with 2 predecessors;
2. Check that these blocks have a common single parent;
3. Check that the parent's terminator is a branch instruction.
The problem is that these checks are insufficient. They may pass for a non-diamond
construction in case if those two predecessors are actually the same block. This may
happen if parent's terminator is a br (either conditional or unconditional) to a block
that ends with "switch" instruction with exactly two branches going to one block.
This patch re-enables the JumpThreading for guards and fixes this issue by adding the
check that those found predecessors are actually different blocks. This guarantees that
parent's terminator is a conditional branch with exactly 2 different successors, which
is now ensured by assertions. It also adds two more tests for this situation (with parent's
terminator being a conditional and an unconditional branch).
Patch by Max Kazantsev!
Reviewers: anna, sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30036
llvm-svn: 295410
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to set the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.
Differential Revision: https://reviews.llvm.org/D29696
llvm-svn: 295115
Summary:
When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of
```
1 typedef struct _node_t {
2 struct _node_t *next;
3 } node_t;
4
5 extern node_t *root;
6
7 int foo() {
8 node_t *node, *tmp;
9 int ret = 0;
10
11 node = tmp = root->next;
12 while (node != root) {
13 while (node) {
14 tmp = node;
15 node = node->next;
16 ret++;
17 }
18 }
19
20 return ret;
21 }
```
, below is the basicblock corresponding to line 12 after Reassociate expressions pass:
```
while.cond: ; preds = %while.cond2, %entry
%node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
%ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
%cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```
As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is
```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```
, which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction.
Reviewers: dblaikie, samsonov, eugenis
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29867
llvm-svn: 295106
Summary:
This adds support for placing predicateinfo such that it affects critical edges.
This fixes the issues mentioned by Nuno on the mailing list.
Depends on D29519
Reviewers: davide, nlopes
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29606
llvm-svn: 294921
Summary:
This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html
When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations.
The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default.
Reviewers: probinson, aprantl, davidxl, hfinkel, echristo
Reviewed By: hfinkel
Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26420
llvm-svn: 294782
Summary:
This patch allows JumpThreading also thread through guards.
Virtually, guard(cond) is equivalent to the following construction:
if (cond) { do something } else {deoptimize}
Yet it is not explicitly converted into IFs before lowering.
This patch enables early threading through guards in simple cases.
Currently it covers the following situation:
if (cond1) {
// code A
} else {
// code B
}
// code C
guard(cond2)
// code D
If there is implication cond1 => cond2 or !cond1 => cond2, we can transform
this construction into the following:
if (cond1) {
// code A
// code C
} else {
// code B
// code C
guard(cond2)
}
// code D
Thus, removing the guard from one of execution branches.
Patch by Max Kazantsev!
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29620
llvm-svn: 294617
Summary:
This patch adds a utility to build extended SSA (see "ABCD: eliminating
array bounds checks on demand"), and an intrinsic to support it. This
is then used to get functionality equivalent to propagateEquality in
GVN, in NewGVN (without having to replace instructions as we go). It
would work similarly in SCCP or other passes. This has been talked
about a few times, so i built a real implementation and tried to
productionize it.
Copies are inserted for operands used in assumes and conditional
branches that are based on comparisons (see below for more)
Every use affected by the predicate is renamed to the appropriate
intrinsic result.
E.g.
%cmp = icmp eq i32 %x, 50
br i1 %cmp, label %true, label %false
true:
ret i32 %x
false:
ret i32 1
will become
%cmp = icmp eq i32, %x, 50
br i1 %cmp, label %true, label %false
true:
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
%x.0 = call @llvm.ssa_copy.i32(i32 %x)
ret i32 %x.0
false:
ret i23 1
(you can use -print-predicateinfo to get an annotated-with-predicateinfo dump)
This enables us to easily determine what operations are affected by a
given predicate, and how operations affected by a chain of
predicates.
Reviewers: davide, sanjoy
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D29519
Update for review comments
Fix a bug Nuno noticed where we are giving information about and/or on edges where the info is not useful and easy to use wrong
Update for review comments
llvm-svn: 294351
The importer was previously using ModuleLinker in a sort of "IRMover mode". Use
IRMover directly instead in order to remove a level of indirection.
I will remove all importing support from ModuleLinker in a separate
change.
Differential Revision: https://reviews.llvm.org/D29468
llvm-svn: 294014
Summary:
I have a similar patch up for review already (D29173). If you prefer I
can squash them both together.
Also I think there more potential for code sharing between
LoopUnroll.cpp and LoopUnrollRuntime.cpp. Do you think patches for
that would be worthwhile?
Reviewers: mkuper, mzolotukhin
Reviewed By: mkuper, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29311
llvm-svn: 293758
Summary:
rL293124 added the necessary infrastructure to properly add the cloned
top level loop to LoopInfo, which means we do not have to do it manually
in CloneLoopBlocks.
@mkuper sorry for not pointing this out during my review of D29156, I just
realized that today.
Reviewers: mzolotukhin, chandlerc, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits, mkuper
Differential Revision: https://reviews.llvm.org/D29173
llvm-svn: 293615
Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice.
Reviewers: aprantl, andreadb, danielcdh
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29062
llvm-svn: 293372
Summary: Extend the MemorySSAUpdater API to allow movement to arbitrary places
Reviewers: davide, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29239
llvm-svn: 293363
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MyClass::dump() {
// print stuff to dbgs()...
}
#endif
llvm-svn: 293359
insertUse, moveBefore and moveAfter operations.
Summary:
This creates a basic MemorySSA updater that handles arbitrary
insertion of uses and defs into MemorySSA, as well as arbitrary
movement around the CFG. It replaces the current splice API.
It can be made to handle arbitrary control flow changes.
Currently, it uses the same updater algorithm from D28934.
The main difference is because MemorySSA is single variable, we have
the complete def and use list, and don't need anyone to give it to us
as part of the API. We also have to rename stores below us in some
cases.
If we go that direction in that patch, i will merge all the updater
implementations (using an updater_traits or something to provide the
get* functions we use, called read*/write* in that patch).
Sadly, the current SSAUpdater algorithm is way too slow to use for
what we are doing here.
I have updated the tests we have to basically build memoryssa
incrementally using the updater api, and make sure it still comes out
the same.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29047
llvm-svn: 293356
Summary:
Some frontends emit a speculate-and-select idiom for sqrt, wherein they compute
sqrt(x), check if x is negative, and select NaN if it is:
%cmp = fcmp olt double %a, -0.000000e+00
%sqrt = call double @llvm.sqrt.f64(double %a)
%ret = select i1 %cmp, double 0x7FF8000000000000, double %sqrt
This is technically UB as the LangRef is written today if %a is ever less than
-0. But emitting code that's compliant with the current definition of sqrt
would require a branch, which would then prevent us from matching this idiom in
SelectionDAG (which we do today -- ISD::FSQRT has defined behavior on negative
inputs), because SelectionDAG looks at one BB at a time.
Nothing in LLVM takes advantage of this undefined behavior, as far as we can
tell, and the fact that llvm.sqrt has UB dates from its initial addition to the
LangRef.
Reviewers: arsenm, mehdi_amini, hfinkel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D28797
llvm-svn: 293242
Even when we don't create a remainder loop (that is, when we unroll by 2), we
may duplicate nested loops into the remainder. This is complicated by the fact
the remainder may itself be either inserted into an outer loop, or at the top
level. In the latter case, we may need to create new top-level loops.
Differential Revision: https://reviews.llvm.org/D29156
llvm-svn: 293124
Summary:
This is the first in a series of patches to add a simple, generalized updater to MemorySSA.
For MemorySSA, every def is may-def, instead of the normal must-def.
(the best way to think of memoryssa is "everything is really one variable, with different versions of that variable at different points in the program).
This means when updating, we end up having to do a bunch of work to touch defs below and above us.
In order to support this quickly, i have ilist'd all the defs for each block. ilist supports tags, so this is quite easy. the only slightly messy part is that you can't have two iplists for the same type that differ only whether they have the ownership part enabled or not, because the traits are for the value type.
The verifiers have been updated to test that the def order is correct.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29046
llvm-svn: 293085
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.
For example, SimplifyCFG used to do the following transformation:
(before)
if.then:
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
br label %if.end
if.else:
%1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
br label %if.end
(after)
%.sink = select i1 %tobool, i32 6, i32 8
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)
This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".
rdar://problem/30110806
Differential Revision: https://reviews.llvm.org/D29111
llvm-svn: 293025
a lazy-asserting PoisoningVH.
AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.
This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.
The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.
The rest is straight cleanup.
I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.
Differential Revision: https://reviews.llvm.org/D29006
llvm-svn: 292928
With this change dominator tree remains in sync after each step of loop
peeling.
Differential Revision: https://reviews.llvm.org/D29029
llvm-svn: 292895
Running non-LCSSA-preserving LoopSimplify followed by LCSSA on (roughly) the
same loop is incorrect, since LoopSimplify may break LCSSA arbitrarily higher
in the loop nest. Instead, run LCSSA first, and then run LCSSA-preserving
LoopSimplify on the result.
This fixes PR31718.
Differential Revision: https://reviews.llvm.org/D29055
llvm-svn: 292854
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
the library routine shared with the new PM and other code.
This assert checks that when LCSSA preservation is requested we start in
LCSSA form. Without this early assert, given *very* complex test cases
we can hit an assert or crash much later on when trying to preserve
LCSSA.
The new PM's loop simplify doesn't need to (and indeed can't) preserve
LCSSA as the new PM doesn't deal in transforms in the dependency graph.
But we asked the library to and shockingly, this didn't work very well!
Stop doing that. Now the assert will tell us immediately with existing
test cases. Before this, it took a pretty convoluted input to trigger
this.
However, sinking the assert also found a bug in LoopUnroll where we
asked simplifyLoop to preserve LCSSA *right before we reform it*. That's
kinda silly and unsurprising that it wasn't available. =D Stop doing
that too.
We also would assert that the unrolled loop was in LCSSA even if
preserving LCSSA was never requested! I don't have a test case or
anything here. I spotted it by inspection and it seems quite obvious. No
logic change anyways, that's just avoiding a spurrious assert.
llvm-svn: 292710
This adds the following to the new PM based inliner in PGO mode:
* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.
* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
Update the function entry count of the callee after inlining it into a
caller.
* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.
Differential revision: https://reviews.llvm.org/D28331
llvm-svn: 292666