Setting the "Debug Info Version" module flag makes it possible to pipe
synthetic debug info into llc, which is useful for testing backends.
llvm-svn: 333237
If the nsw flag is used in the absolute value then it is undefined for INT_MIN. For all other value it will produce a positive number. So we can assume the result is positive.
This breaks some InstCombine abs/nabs combining tests because we simplify the second compare from known bits rather than as the whole pattern. Looks like we can probably fix it by adding a neg+abs/nabs combine to just swap the select operands. Need to check alive to make sure there are no corner cases.
Differential Revision: https://reviews.llvm.org/D47041
llvm-svn: 333226
Reassociation of math ops in some contexts (especially vector contexts)
has generally only been happening when the 'fast' FMF was set. This
enables reassoication when only the finer grained controls 'reassoc' and
'nsz' are set.
Differential Revision: https://reviews.llvm.org/D47335
llvm-svn: 333221
Summary:
In LICM, CFG could be changed in splitPredecessorsOfLoopExit(), which update
only DT and LoopInfo. Therefore, we should preserve only DT and LoopInfo specifically,
instead of all analyses that depend on the CFG (setPreservesCFG()).
This change should fix PR37323.
Reviewers: uabelho, davide, dberlin, Ka-Ka
Reviewed By: dberlin
Subscribers: mzolotukhin, bjope, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46775
llvm-svn: 333198
The ARM/ARM64 AESE and AESD instructions have a builtin XOR as the first step in
the instruction. Therefore, if the AES key is zero and the AES data was
previously XORed, it can be combined into a single instruction.
Differential Revision: https://reviews.llvm.org/D47239
Patch by Michael Brase!
llvm-svn: 333193
Summary:
If NaryReassociate succeed it will, when replacing the old instruction
with the new instruction, also recursively delete trivially
dead instructions from the old instruction. However, if the input to the
NaryReassociate pass contain dead code it is not save to recursively
delete trivially deadinstructions as it might lead to deleting the newly
created instruction.
This patch will fix the problem by using WeakVH to detect this
rare case, when the newly created instruction is dead, and it will then
restart the basic block iteration from the beginning.
This fixes pr37539
Reviewers: tra, meheff, grosser, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47139
llvm-svn: 333155
Summary:
StructurizeCFG::orderNodes basically uses a reverse post-order (RPO) traversal of the region list to get the order.
The only problem with it is that sometimes backedges for outer loops will be visited before backedges for inner loops.
To solve this problem, a loop depth based approach has been used to make sure all blocks in this loop has been visited
before moving on to outer loop.
However, we found a problem for a SubRegion which is a loop itself:
--> BB1 --> BB2 --> BB3 -->
In this case, BB2 is a SubRegion (loop), and thus its loopdepth is different than that of BB1 and BB3. This fact will lead
BB2 to be placed in the wrong order.
In this work, we treat the SubRegion as a special case and use its exit block to determine the loop and its depth
to guard the sorting.
Reviewers:
arsenm, jlebar
Differential Revision:
https://reviews.llvm.org/D46912
llvm-svn: 333111
Summary:
Finally fixes [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]].
Now that the backend is all done, we can finally fold it!
The canonical unfolded masked merge pattern is
```(x & m) | (y & ~m)```
There is a second, equivalent variant:
```(x | ~m) & (y | m)```
Only one of them (the or-of-and's i think) is canonical.
And if the mask is not a constant, we should fold it to:
```((x ^ y) & M) ^ y```
https://rise4fun.com/Alive/ndQw
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: nicholas, RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D46814
llvm-svn: 333106
Loop unswitching makes substantial changes to a loop that can also affect cached
SCEV info in its outer loops as well, but it only cares to invalidate SCEV cache for the
innermost loop in case of full unswitching and does not invalidate anything at all in
case of trivial unswitching. As result, we may end up with incorrect data in cache.
Differential Revision: https://reviews.llvm.org/D46045
Reviewed By: mzolotukhin
llvm-svn: 333072
Summary:
Patch for capture tracking broke
bootstrap of clang with -fstict-vtable-pointers
which resulted in debbugging nightmare. It was fixed
https://reviews.llvm.org/D46900 but as it turned
out, there were other parts like inliner (computing of
noalias metadata) that I found after bootstraping with enabled
assertions.
Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar
Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D47088
llvm-svn: 333070
Also, produce the canonical IR abs (s<0) to be more efficient.
This is the libcall equivalent of the clang builtin change from:
rL333038
Pasting from that commit message:
The stdlib functions are defined in section 7.20.6.1 of the C standard with:
"If the result cannot be represented, the behavior is undefined."
That lets us mark the negation with 'nsw' because "sub i32 0, INT_MIN" would
be UB/poison.
llvm-svn: 333042
Summary: Previous patch does not care if a value is changed between calloc and strlen. This needs to be removed from InstCombine and maybe moved to DSE later after some rework.
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47218
llvm-svn: 333022
This patch fixes two bugs:
* test1: Previously assume(a >= 5) concluded that a == 5. That's only
valid for assume(a == 5)...
* test2: If operands were swapped, additional users were added to the
wrong cmp operand. This resulted in an "unsettled iteration"
assertion failure.
Patch by Nikita Popov
Differential Revision: https://reviews.llvm.org/D46974
llvm-svn: 333007
Summary:
When lowerswitch merge several cases into a new default block it's not
updating the PHI nodes accordingly. The code that update the PHI nodes
for the default edge only update the first entry and do not remove the
remaining ones, to make sure the number of entries match the number of
predecessors.
This is easily fixed by replacing the code that update the PHI node with
the already existing utility function for updating PHI nodes.
Reviewers: hans, reames, arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47055
llvm-svn: 332960
Summary:
In LoopVersioning::addPHINodes we need to iterate over all
users for a value "Inst", and if the user is outside of the
VersionedLoop we should replace the use of "Inst" by using
the value "PN" instead.
Replacing the use of "Inst" for a user of "Inst" also means
that Inst->users() is modified. So it is not safe to do the
replace while iterating over Inst->users() as we used to do.
This patch splits the task into two steps. First we iterate
over Inst->users() to find all users that should be updated.
Those users are saved into a local data structure on the stack.
And then, in the second step, we do the actual updates. This
time iterating over the local data structure.
Reviewers: mzolotukhin, anemet
Reviewed By: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47134
llvm-svn: 332958
We can eliminate old value if bound_ctrl = 1 and row_mask = bank_mask = 0xf.
This is alternative implementation working with the intrinsic in InstCombine.
Original review for past-ISel optimization: D46570.
Differential Revision: https://reviews.llvm.org/D46596
llvm-svn: 332956
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.
Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.
llvm-svn: 332953
In all cases, we're pulling the cast above the select.
That's not a good canonicalization if we're creating
a select that then mismatches the operand size of its
condition.
llvm-svn: 332883
Change matchSelectPattern to return X and -X for ABS/NABS in a well defined order. Adjust EarlyCSE to account for this. Ensure the SPF result is some kind of min/max and not abs/nabs in one place in InstCombine that made me nervous.
Prevously we returned the two operands of the compare part of the abs pattern. The RHS is always going to be a 0i, 1 or -1 constant. This isn't a very meaningful thing to return for any one. There's also some freedom in the abs pattern as to what happens when the value is equal to 0. This freedom led to early cse failing to match when different constants were used in otherwise equivalent operations. By returning the input and its negation in a defined order we can ensure an exact match. This also makes sure both patterns use the exact same subtract instruction for the negation. I believe CSE should evebntually make this happen and properly merge the nsw/nuw flags. But I'm not familiar with CSE and what order it does things in so it seemed like it might be good to really enforce that they were the same.
Differential Revision: https://reviews.llvm.org/D47037
llvm-svn: 332865
r332654 was reverted due to an unused function warning in
release build. This commit includes the same code with the
warning silenced.
Differential Revision: https://reviews.llvm.org/D44338
llvm-svn: 332860
Summary:
This patch fixes PR37526 by simplifying the newly generated LoadInst
instructions. If the pointer address is a bitcast from the pointer to
the NewType, we can just remove this extra bitcast instead of creating
the new one. This fixes the PR37526 + may speed up the whole compilation
process.
Reviewers: spatel, RKSimon, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47144
llvm-svn: 332855
We were previously using a DT in CVP through SimplifyQuery, but not requiring it in
the new pass manager. Hence it would crash if DT was not already available. This now
gets DT directly and plumbs it through to where it is used (instead of using it
through SQ).
llvm-svn: 332836
We already do this for min/max (see the blob above the diff),
so we should do the same for abs/nabs.
A sign-bit check (<s 0) is used as a predicate for other IR
transforms and it's likely the best for codegen.
This might solve the motivating cases for D47037 and D47041,
but I think those patches still make sense. We can't guarantee
this canonicalization if the icmp has more than one use.
Differential Revision: https://reviews.llvm.org/D47076
llvm-svn: 332819
In the patch rL329547, we have lifted the over-restrictive limitation on collected range
checks, allowing to work with range checks with the end of their range not being
provably non-negative. However it appeared that the non-negativity of this value was
assumed in the utility function `ClampedSubtract`. In particular, its reasoning is based
on the fact that `0 <= SINT_MAX - X`, which is not true if `X` is negative.
The function `ClampedSubtract` is only called twice, once with `X = 0` (which is OK)
and the second time with `X = IRC.getEnd()`, where we may now see the problem if
the end is actually a negative value. In this case, we may sometimes miscompile.
This patch is the conservative fix of the miscompile problem. Rather than rejecting
non-provably non-negative `getEnd()` values, we will check it for non-negativity in
runtime. For this, we use function `smax(smin(X, 0), -1) + 1` that is equal to `1` if `X`
is non-negative and is equal to 0 if `X` is negative. If we multiply `Begin, End` of safe
iteration space by this function calculated for `X = IRC.getEnd()`, we will get the original
`[Begin, End)` if `IRC.getEnd()` was non-negative (and, thus, `ClampedSubtract` worked
correctly) and the empty range `[0, 0)` in case if ` IRC.getEnd()` was negative.
So we in fact prohibit execution of the main loop if at least one of range checks was
made against a negative value (and we figured it out in runtime). It is still better than
what we have before (non-negativity had to be proved in compile time) and prevents
us from miscompile, however it is sometiles too restrictive for unsigned range checks
against a negative value (which in fact can be eliminated).
Once we re-implement `ClampedSubtract` in a way that it handles negative `X` correctly,
this limitation can be lifted, too.
Differential Revision: https://reviews.llvm.org/D46860
Reviewed By: samparker
llvm-svn: 332809
The evaluator goes through BB and creates global vars as temporary values to evaluate
results of LLVM instructions. It creates undef for alloca, however it assumes alloca
in addr space 0. If the next instruction is addrspace cast to 0, then we get an invalid
cast instruction.
This patch let the temp global var have an address space matching alloca addr space,
so that the valuation can be done.
Differential Revision: https://reviews.llvm.org/D47081
llvm-svn: 332794
Summary:
This feature is not needed, but it might be usefull in the future
to use metadata to mark what which function should support it
(and strip it when not).
Reviewers: rsmith, sanjoy, amharc, kuhar
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45419
llvm-svn: 332787
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.
Comes with a clang patch (D46881).
Patch by mike.dvoretsky
Differential Revision: https://reviews.llvm.org/D46882
llvm-svn: 332705
Summary:
Fix a case where FoldBranchToCommonDest() would bail out from doing CSE
when encountering a debug intrinsic. Handle that by skipping past the
debug intrinsics.
Also, as a minor refactoring, rename checkCSEInPredecessor() to
tryCSEWithPredecessor() to make it a bit more clear that the function
may remove instructions.
Reviewers: fhahn, craig.topper, dblaikie, xbolva00
Reviewed By: fhahn, xbolva00
Subscribers: vsk, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D46635
llvm-svn: 332698
CanProveNotTakenFirstIteration utility does not handle the case when
condition of the branch is a constant. Add its handling.
Reviewers: reames, anna, mkazantsev
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46996
llvm-svn: 332695
Patch #3 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
Expected to be NFC for the current inner loop vectorization path. It
introduces the basic algorithm to build the VPlan plain CFG (single-level
CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization
path using VPInstructions. It includes:
- VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now).
- VPlanVerifier: Main class with utilities to check the consistency of a H-CFG.
- VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan.
Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl.
Differential Revision: https://reviews.llvm.org/D44338
llvm-svn: 332654
Currently debugify prints it's output to stdout,
with this patch all the output generated goes to stderr.
This change lets us use debugify without taking away
the ability to pipe the output to other llvm tools.
llvm-svn: 332642
According to alive this is valid. I'm hoping to use this to make an assumption that the sign bit is zero after this sequence. The only way it wouldn't be is if the input was INT__MIN, but by preserving the flags we can make doing this to INT_MIN UB.
The nuw flags is weird because it creates such a contradiction that the original number would have to be positive meaning we could remove the select entirely, but we don't get that far.
Differential Revision: https://reviews.llvm.org/D46988
llvm-svn: 332623
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets.
Differential Revision: https://reviews.llvm.org/D46326
llvm-svn: 332610
Summary:
Require DominatorTree when requiring/preserving LoopInfo in the old pass manager
BreakCriticalEdges tries to keep LoopInfo and DominatorTree updated if they
exist. However, since commit r321653 and r321805, to update LoopInfo we
must have a DominatorTree, or we will hit an assert.
To fix this we now make a couple of passes that only required/preserved
LoopInfo also require DominatorTree.
This solves PR37334.
Reviewers: eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D46829
llvm-svn: 332583
The existing comment said that the functions were available only
on GNU/Linux (and on certain Android versions), but only checked
T.isGNUEnvironment() which also is true on MinGW (for arch-windows-gnu
triplets), which doesn't have such functions.
Existing checks in the initialize function in TargetLibraryInfo.cpp
also use only T.isOSLinux() to check for glibc features.
This fixes use of stdio on MinGW.
Differential Revision: https://reviews.llvm.org/D47002
llvm-svn: 332581
Summary:
The verifier accepts PHI nodes with multiple entries for the
same basic block, as long as the value is the same.
As seen in PR37203, SROA did not handle such PHI nodes properly
when speculating loads over the PHI, since it inserted multiple
loads in the predecessor block and changed the PHI into having
multiple entries for the same basic block, but with different
values.
This patch teaches SROA to reuse the same speculated load for
each PHI duplicate entry in such situations.
Resolves: https://bugs.llvm.org/show_bug.cgi?id=37203
Reviewers: uabelho, chandlerc, hfinkel, bkramer, efriedma
Reviewed By: efriedma
Subscribers: dberlin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D46426
llvm-svn: 332577
The current integer widening does not support rewriting partial split slices in rewriteIntegerStore (and rewriteIntegerLoad).
This patch adds explicit checks for this case in isIntegerWideningViableForSlice.
Before r322533, splitting is allowed only for the whole-alloca slice and hence the above case is implicitly rejected by another check `if (DL.getTypeStoreSize(ValueTy) > Size)` because whole-alloca slice is larger than the partition.
Differential Revision: https://reviews.llvm.org/D46750
llvm-svn: 332575
The canonicalization was restricted to shuffle masks with
a 1-to-1 mapping to the constant vector, but that disqualifies
the common splat pattern. This is part of solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
llvm-svn: 332479
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja
Reviewed By: rja
Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 332452
A catchswitch must be the only non-phi instruction in its basic block;
attempting to move a retain or release into a catchswitch basic block
will result in invalid IR. Explicitly mark a CFG hazard in this case to
prevent the code motion.
Differential Revision: https://reviews.llvm.org/D46482
llvm-svn: 332430
Author: Samuel Pitoiset
Without this patch, it appears to me that we are selecting
the wrong operand when inverting conditions. In the attached
test, it will select %tmp3 instead of %tmp4. To fix it, just
use 'A' as everywhere.
This fixes a regression introduced by
"[PatternMatch] define m_Not using m_Xor and cst_pred_ty"
https://reviews.llvm.org/D46351
llvm-svn: 332403
When two interposable functions are merged, we cannot replace
uses and have to emit calls to a common internal function. However,
writeThunk() will not actually emit a thunk if the function is too
small. This leaves us in a broken state where mergeTwoFunctions
already rewired the functions, but writeThunk doesn't do anything.
This patch changes the implementation so that:
* writeThunk() does just that.
* The direct replacement of calls is moved into mergeTwoFunctions()
into the non-interposable case only.
* isThunkProfitable() is extracted and will be called for
the non-iterposable case always, and in the interposable case
only if uses are still left after replacement.
This issue has been introduced in https://reviews.llvm.org/D34806,
where the code for checking thunk profitability has been moved.
Differential Revision: https://reviews.llvm.org/D46804
Reviewed By: whitequark
llvm-svn: 332342
This adds a -debugify-each mode to opt which, when enabled, wraps each
{Module,Function}Pass in a pipeline with logic to add, check, and strip
synthetic debug info for testing purposes.
This mode can be used to test complex pipelines for debug info bugs, or
to collect statistics about the number of debug values & locations lost
throughout various stages of a pipeline.
Patch by Son Tuan Vu!
Differential Revision: https://reviews.llvm.org/D46525
llvm-svn: 332312
Summary:
Part of the InstCombine code for simplifying GEPs looks through
addrspacecasts. However, this was done by updating a variable
also used by the next transformation, for marking GEPs as
inbounds. This led to replacing a GEP with a similar instruction
in a different addrspace, which caused an assertion failure in RAUW.
This caused julia issue https://github.com/JuliaLang/julia/issues/27055
Patch by Jeff Bezanson <jeff@juliacomputing.com>
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D46722
llvm-svn: 332302
This is a CodeExtractor improvement which adds support for extracting blocks
which have exception handling constructs if that is legal to do. CodeExtractor
performs validation checks to ensure that extraction is legal when it finds
invoke instructions or EH pads (landingpad, catchswitch, or cleanuppad) in
blocks to be extracted.
I have also added an option to allow extraction of blocks with alloca
instructions, but no validation is done for allocas. CodeExtractor caller has
to validate it himself before allowing alloca instructions to be extracted.
By default allocas are still not allowed in extraction blocks.
Differential Revision: https://reviews.llvm.org/D45904
llvm-svn: 332151
Let separate-const-offset-from-gep pass handle trunc() when it calculates
constant offset relative to base. The pass itself may insert trunc()
instructions when it canonicalises array indices to pointer-size integers
and needs to handle trunc() in order to evaluate the offset.
Differential Revision: https://reviews.llvm.org/D46732
llvm-svn: 332142
Summary:
This change adds handling of the atomic memset intrinsic to the
code path that simplifies the regular memset. In practice this means
that we will now also expand a small constant-length atomic memset
into a single unordered atomic store.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46660
llvm-svn: 332132
If the sprintf function is static (as on mingw-w64, where many stdio
functions are static inline wrappers), earlier optimization passes
could optimize out the return value altogether, and make it void,
which could break optimizations of this libcall that touch the
return value.
This fixes the issue discussed in PR37408 for the sprintf function.
Differential Revision: https://reviews.llvm.org/D46752
llvm-svn: 332106
Phi nodes can reside in live blocks but one of their incoming
arguments can come from a dead block. Dead blocks and reassociate
don't play nice together. In fact, reassociate performs an RPO
as a first step to avoid processing dead blocks.
The reason why Reassociate might not fixpoint when examining
dead blocks is that the following:
%xor0 = xor i16 %xor1, undef
%xor1 = xor i16 %xor0, undef
is perfectly valid LLVM IR (if it appears in a dead block),
so the worklist algorithm keeps pushing the two instructions for
reexamination. Note that this is not Reassociate fault, at least
not entirely. It's llvm that has a weird definition of dominance.
Fixes PR37390.
llvm-svn: 332100
Summary:
This change reworks the handling of atomic memcpy within the instcombine pass.
Previously, a constant length atomic memcpy would be lowered into loads & stores
as long as no more than 16 load/store pairs are created. This is quite different
from the lowering done for a non-atomic memcpy; which only ever lowers into a single
load/store pair of no more than 8 bytes. Larger constant-sized memcpy calls are
expanded to load/stores in later passes, such as SelectionDAG lowering.
In this change the behaviour for atomic memcpy is unified with non-atomic memcpy;
atomic memcpy is now treated in the same was as non-atomic memcpy has always been.
We leave it to later passes to lower longer-length atomic memcpy calls.
Due to the structure of the pass's handling of memtransfer intrinsics, this change
also gives us handling of atomic memmove that we did not previously have.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46658
llvm-svn: 332093
Summary:
https://bugs.llvm.org/show_bug.cgi?id=34897 demonstrates an incorrect
coroutine frame allocation elision in the coro-elide pass. The elision
is performed on the basis that the SSA variables from all llvm.coro.begin
are directly referenced in subsequent llvm.coro.destroy instructions.
However, this ignores the fact that the function may exit through paths
that do not run these destroy instructions. In the sample program from
PR34897, for example, the llvm.coro.destroy instruction is only
executed in exception handling code. When the coroutine function exits
normally, llvm.coro.destroy is not called. Eliding the allocation in
this case causes a subsequent reference to the coroutine handle from
outside of the function to access freed memory.
To fix the issue, when finding an llvm.coro.destroy for each llvm.coro.begin,
only consider llvm.coro.destroy that are executed along non-exceptional paths.
Test Plan:
1. Download the sample program from
https://bugs.llvm.org/show_bug.cgi?id=34897, compile it with
`clang++ -fcoroutines-ts -stdlib=libc++ -std=c++1z -O2`, and run it.
It should print `"run1\ncheck1\nrun2\ncheck2"` and then exit
successfully.
2. Compile https://godbolt.org/g/mCKfnr and confirm it is still
optimized to a single instruction, 'return 1190'.
3. `check-llvm`
Reviewers: rsmith, GorNishanov, eric_niebler
Reviewed By: GorNishanov
Subscribers: andrewrk, lewissbaker, EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D43242
llvm-svn: 332077
These rotates take the form
(x << (n & mask)) | (x >> (-n & mask)) where mask is bitwidth - 1.
If x has been promoted to a wider type than its original bit width due to type promotion we fail to narrower it and therefore don't recognize it as a rotate.
llvm-svn: 332068
We found current sampleFDO had a performance issue when triaging a regression.
For a callsite with inline instance in the profile, even if hot callsite inliner
cannot inline it, it may still execute enough times and should not be treated as
cold in regular inliner later. However, currently if such callsite is not inlined
by hot callsite inliner, and the BB where the callsite locates doesn't get
samples from other instructions inside of it, the callsite will have no profile
metadata annotated. In regular inliner cost analysis, if the callsite has no
profile annotated and its caller has profile information, it will be treated as
cold.
The fix changes the isCallsiteHot check and chooses to compare
CallsiteTotalSamples with hot cutoff value computed by ProfileSummaryInfo.
Differential Revision: https://reviews.llvm.org/D45377
llvm-svn: 332058
This reverts commit SVN r331889, which could trigger failed
assertions for cases where the snprintf function is declared
with a vaguely differing signature (e.g. being defined as
static inline), see PR37408.
llvm-svn: 332043
Accessing the members of a large data structures needs a lot of GEPs which
usually have large offsets due to the size of the underlying data structure. If
the offsets are too large to fit into the r+i addressing mode, these GEPs cannot
be sunk to their users' blocks and many extra registers are needed then to carry
the values of these GEPs.
This patch tries to split a large data struct starting from %base like the
following.
Before:
BB0:
%base =
BB1:
%gep0 = gep %base, off0
%gep1 = gep %base, off1
%gep2 = gep %base, off2
BB2:
%load1 = load %gep0
%load2 = load %gep1
%load3 = load %gep2
After:
BB0:
%base =
%new_base = gep %base, off0
BB1:
%new_gep0 = %new_base
%new_gep1 = gep %new_base, off1 - off0
%new_gep2 = gep %new_base, off2 - off0
BB2:
%load1 = load i32, i32* %new_gep0
%load2 = load i32, i32* %new_gep1
%load3 = load i32, i32* %new_gep2
In the above example, the struct is split into two parts. The first part still
starts from %base and the second part starts from %new_base. After the
splitting, %new_gep1 and %new_gep2 have smaller offsets and then can be sunk to
BB2 and folded into their users.
The algorithm to split data structure is simple and very similar to the work of
merging SExts. First, it collects GEPs that have large offsets when iterating
the blocks. Second, it splits the underlying data structures and updates the
collected GEPs to use smaller offsets.
Differential Revision: https://reviews.llvm.org/D42759
llvm-svn: 332015
Summary:
This change teaches DSE that the atomic memory intrinsics can be overwriten
partially in the same way as the non-atomic forms. Specifically, that the
atomic memcpy & memset can be shortened at the end and that the atomic memset
can be shortened at the beginning, if they partially overwritten
by later stores.
Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith, spatel, filcab, sanjoy
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45584
llvm-svn: 331991
Fixes bug https://bugs.llvm.org/show_bug.cgi?id=37339.
InlineAsm is only uniqued if the FunctionTypes are exactly the
same, while cmpTypes() for example considers all pointer types
in the default address space to be the same. For this reason
the end of cmpInlineAsm() can be reached.
This patch replaces the unreachable assertion with a check that
the function types are not identical.
Differential Revision: https://reviews.llvm.org/D46495
Reviewers: jfb
llvm-svn: 331990
This is a follow-up to D45986. As suggested there, we should match the "all-bits-set"
pattern in addition to "any-bits-set".
This was a little more complicated than I thought it would be initially because the
"and 1" instruction can be anywhere in the chain. Hopefully, the code comments make
that logic understandable, but if you see a way to simplify or improve that, it's
most appreciated.
This transforms patterns that emerge from bitfield tests as seen in PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
I think it would also help reduce the large test from:
D46336
D46595
but we need something to reassociate that case to the forms we're expecting here first.
Differential Revision: https://reviews.llvm.org/D46649
llvm-svn: 331937
The previous handling for guard widening in InstCombine was extremely restrictive. In particular, it didn't handle the common case where we had two guards separated by a single icmp. Handle this by scanning through a small fixed window of instructions to find the next guard if needed.
Differential Revision: https://reviews.llvm.org/D46203
llvm-svn: 331935
This is safe as long as the udiv is not exact. The pattern is not common in
C++ code, but comes up all the time in code generated by XLA's GPU backend.
Differential Revision: https://reviews.llvm.org/D46647
llvm-svn: 331933
Summary:
MergedLoadStoreMotion::mergeStores is using some heuristics
to limit the amount of stores that it tries to sink (see
MagicCompileTimeControl in MergedLoadStoreMotion.cpp). The
heuristic involves counting the number of instructions in
one of the basic blocks that is part of the transformation.
We now ignore dbg intrinsics when counting instruction for
the MagicCompileTimeControl heuristic. This to make sure that
the amount of stores that are sunk doesn't depend on the amount
of debug information (if -g is used or not).
Reviewers: Gerolf, davide, majnemer
Reviewed By: davide
Subscribers: dberlin, bjope, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46600
llvm-svn: 331852
In order to set breakpoints on labels and list source code around
labels, we need collect debug information for labels, i.e., label
name, the function label belong, line number in the file, and the
address label located. In order to keep these information in LLVM
IR and to allow backend to generate debug information correctly.
We create a new kind of metadata for labels, DILabel. The format
of DILabel is
!DILabel(scope: !1, name: "foo", file: !2, line: 3)
We hope to keep debug information as much as possible even the
code is optimized. So, we create a new kind of intrinsic for label
metadata to avoid the metadata is eliminated with basic block.
The intrinsic will keep existing if we keep it from optimized out.
The format of the intrinsic is
llvm.dbg.label(metadata !1)
It has only one argument, that is the DILabel metadata. The
intrinsic will follow the label immediately. Backend could get the
label metadata through the intrinsic's parameter.
We also create DIBuilder API for labels to be used by Frontend.
Frontend could use createLabel() to allocate DILabel objects, and use
insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.
Differential Revision: https://reviews.llvm.org/D45024
Patch by Hsiangkai Wang.
llvm-svn: 331841
Summary:
The current LowerInvoke pass cannot handle invoke instructions with a
funclet bundle operand. The order of operands for an invoke instruction
is {call arguments, callee, funclet operand (if any), normal dest,
unwind dest}. The current code assumes there is no funclet operand and
incorrectly includes a funclet operand into call arguments.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46242
llvm-svn: 331832
We enter MergeBlockIntoPredecessor with a block looking like this:
for.inc.us-lcssa: ; preds = %cond.end
%k.1.lcssa.ph = phi i32 [ %conv15, %cond.end ]
%t.3.lcssa.ph = phi i32 [ %k.1.lcssa.ph, %cond.end ]
br label %for.inc, !dbg !66
[note the first arg of the PHI being a PHI].
FoldSingleEntryPHINodes gets rid of both PHIs (calling, eraseFromParent).
But right before we call the function, we push into IncomingValues the
only argument of the PHIs, and shortly after we try to iterate over
something which has been invalidated before :(
The fix its not trying to remove PHIs which have an incoming value
coming from the same BB we're looking at.
Fixes PR37300 and rdar://problem/39910460
Differential Revision: https://reviews.llvm.org/D46568
llvm-svn: 331824
Summary:
Broadcast code generation emitted instructions in pre-header, while the instruction they are dependent on in the vector loop body.
This resulted in an IL verification error ---- value used before defined.
Reviewers: rengolin, fhahn, hfinkel
Reviewed By: rengolin, fhahn
Subscribers: dcaballe, Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D46302
llvm-svn: 331799
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.
Differential Revision: https://reviews.llvm.org/D45537
llvm-svn: 331783
Summary:
In formLCSSAForInstructions we speculatively add new PHI
nodes, that sometimes ends up without having any uses. It
has been discovered that sometimes an added PHI node can
appear as being unused in one iteration of the Worklist,
although it can end up being used by a PHI node added in
a later iteration. We now check, a second time, that the
PHI node still is unused before we remove it. This avoids
an assert about "Trying to remove a phi with uses." for the
added test case.
Reviewers: davide, mzolotukhin, mattd, dberlin
Reviewed By: mzolotukhin, dberlin
Subscribers: dberlin, mzolotukhin, davide, bjope, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D46422
llvm-svn: 331741
Summary: Makes this consistent with the old PM.
Reviewers: eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D46526
llvm-svn: 331709
to be needed: jump table sections are created with .cfi.jumptable suffix. With
this change each jump table is placed in a separate section, which allows the
linker to re-order them.
Differential Revision: https://reviews.llvm.org/D46537
llvm-svn: 331680
This pattern came up in D46494.
I'm pretty sure we want to canonicalize it from
(x | ~m) & (y & m)
to
(x & m) | (y & ~m)
https://rise4fun.com/Alive/TEM
llvm-svn: 331625
Summary:
launder.invariant.group has the same rules of capturing as
bitcast, gep, etc - the original value is not captured
if the returned pointer is not captured.
With this patch, we mark 40% more functions as noalias when compiling with -fstrict-vtable-pointers;
1078 vs 1778 (39.37%)
Reviewers: sanjoy, davide, nlewycky, majnemer, mehdi_amini
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D32673
llvm-svn: 331587
Computing this property within the existing walk ensures that the cost is linear with the size of the block. If we did this from within isGuaranteedToExecute, it would be quadratic without some very fancy caching.
This allows us to reliably catch a hoistable instruction within a header which may throw at some point *after* our hoistable instruction. It doesn't do anything for non-header cases, but given how common single block loops are, this seems very worthwhile.
llvm-svn: 331557
A catchswitch is both a pad and a terminator, meaning it must be the
only non-phi instruction in its basic block. When we're inserting a
bitcast in the incoming basic block for a phi, if that incoming block is
a catchswitch, we should go up the dominator tree to find a valid
insertion point rather than attempting to insert before the catchswitch
(which would result in invalid IR).
Differential Revision: https://reviews.llvm.org/D46412
llvm-svn: 331548
We currently recognize this idiom where x is signed and thus the shift in an ashr.
int cnt = 0;
while (x) {
x >>= 1; // arithmetic shift right
++cnt;
}
and turn it into (bitwidth - ctlz(x)). And if there is anything else in the loop we will create a new loop that runs that many times.
If x is initially negative, the shift result will never be 0 and thus the loop is infinite. If you put something with side effects in the loop, that side effect will now only happen bitwidth times instead of an infinite number of times.
So this transform is only safe for logical shift right (which we don't currently recognize) or if we can prove that x cannot be negative before the loop.
llvm-svn: 331493
Add logic for the special case when a cmp+select can clearly be
reduced to just a bitwise logic instruction, and remove an
over-reaching chunk of general purpose bit magic. The primary goal
is to remove cases where we are not improving the IR instruction
count when doing these select transforms, and in all cases here that
is true.
In the motivating 3-way compare tests, there are further improvements
because we can combine/propagate select values (not sure if that
belongs in instcombine, but it's there for now).
DAGCombiner has folds to turn some of these selects into bit magic,
so there should be no difference in the end result in those cases.
Not all constant combinations are handled there yet, however, so it
is possible that some targets will see more cmov/csel codegen with
this change in IR canonicalization.
Ideally, we'll go further to *not* turn selects into multiple
logic/math ops in instcombine, and we'll canonicalize to selects.
But we should make sure that this step does not result in regressions
first (and if it does, we should fix those in the backend).
The general direction for this change was discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105373.htmlhttp://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html
Alive proofs for the new bit magic:
https://rise4fun.com/Alive/XG7
Differential Revision: https://reviews.llvm.org/D46086
llvm-svn: 331486
The code fails to check that the same value is used twice. We only make sure the left hand side of the and is part of the loop recurrence. The 'x' in the subtract can be any value.
llvm-svn: 331436
This patch was temporarily reverted because it has exposed bug 37229 on
PowerPC platform. The bug is unrelated to the patch and was just a general
bug in the optimization done for PowerPC platform only. The bug was fixed
by the patch rL331410.
This patch returns the disabled commit since the bug was fixed.
llvm-svn: 331427
Summary:
Prior to this change, LLVM would in some cases emit *massive* writeout
functions with many 10s of 1000s of function calls in straight-line
code. This is a very wasteful way to represent what are fundamentally
loops and creates a number of scalability issues. Among other things,
register allocating these calls is extremely expensive. While D46127 makes this
less severe, we'll still run into scaling issues with this eventually. If not
in the compile time, just from the code size.
Now the pass builds up global data structures modeling the inputs to
these functions, and simply loops over the data structures calling the
relevant functions with those values. This ensures that the code size is
a fixed and only data size grows with larger amounts of coverage data.
A trivial change to IRBuilder is included to make it easier to build
the constants that make up the global data.
Reviewers: wmi, echristo
Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46357
llvm-svn: 331407
Summary:
Some of our internal testing detected a major compile time regression which I've
tracked down to:
r278938 - Revert "Reassociate: Reprocess RedoInsts after each inst".
It appears that processing long chains of reassociatable instructions causes
non-linear (potentially exponential) growth in the number of times an
instruction is revisited. For example, the included test revisits instructions
220 times in a 20-instruction test.
It appears that r278938 reversed the order instructions were visited and that
this is preventing scheduled revisits from being cancelled as a result of
visiting the instructions naturally during normal processing. However, simply
reversing the order also harmed the generated code. Upon closer inspection, it
was discovered that revisits occurred in the opposite order to the first pass
(Thanks to escha for spotting that).
This patch makes the revisit order consistent with the first pass which allows
more revisits to be cancelled. This does appear to have a small impact on the
generated code in few cases but it significantly reduces compile-time.
After this patch, our internal test that was most affected by the regression
dropped from ~2 million revisits to ~4k resulting in Reassociate having 0.46%
of the runtime it had before (99.54% improvement).
Here's the summaries reported by lnt for the LLVM test-suite with --benchmarking-only:
| metric | geomean before patch | geomean after patch | delta |
| ----- | ----- | ----- | ----- |
| compile time | 0.1956 | 0.1261 | -35.54% |
| execution time | 0.3240 | 0.3237 | - |
| code size | 7365.4459 | 7365.6079 | - |
The results have a few wins and losses on compile-time, mostly in the +/- 2.5% range. There was one outlier though:
| Performance Regressions - compile_time | Δ | Previous | Current |
| MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk | 9.82% | 2.0473 | 2.2483 |
Reviewers: javed.absar, dberlin
Reviewed By: dberlin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45734
llvm-svn: 331381
and (or (lshr X, C), ...), 1 --> (X & C') != 0
I initially thought about implementing the minimal pattern in instcombine as mentioned here:
https://bugs.llvm.org/show_bug.cgi?id=37098#c6
...but we need to do better to catch the more general sequence from the motivating test
(more than 2 bits in the compare). And a test-suite run with statistics showed that this
pattern only happened 2 times currently. It would potentially happen more often if
reassociation worked better (D45842), but it's probably still not too frequent?
This is small enough that I didn't see a need to create a whole new class/file within
AggressiveInstCombine. There are likely other relatively small matchers like what was
discussed in D44266 that would slide under foldUnusualPatterns() (name suggestions welcome).
We could potentially also consolidate matchers for ctpop, bswap, etc under here.
Differential Revision: https://reviews.llvm.org/D45986
llvm-svn: 331311
As mentioned in D45986, there's a potential ordering dependency
between instcombine and aggressive-instcombine for detecting these,
so I'm adding a few tests to confirm that the expected folds occur
using -O3 (because aggressive-instcombine only runs at -O3 currently).
llvm-svn: 331308
Summary:
This fixes a build break with r331269.
test/Transforms/LoopVectorize/pr23997.ll
should be in:
test/Transforms/LoopVectorize/X86/pr23997.ll
llvm-svn: 331281
Summary:
This is a fix for PR23997.
The loop vectorizer is not preserving the inbounds property of GEPs that it creates.
This is inhibiting some optimizations. This patch preserves the inbounds property in
the case where a load/store is being fed by an inbounds GEP.
Reviewers: mkuper, javed.absar, hsaito
Reviewed By: hsaito
Subscribers: dcaballe, hsaito, llvm-commits
Differential Revision: https://reviews.llvm.org/D46191
llvm-svn: 331269
phi is on lhs of a comparison op.
For the following testcase,
L1:
%t0 = add i32 %m, 7
%t3 = icmp eq i32* %t2, null
br i1 %t3, label %L3, label %L2
L2:
%t4 = load i32, i32* %t2, align 4
br label %L3
L3:
%t5 = phi i32 [ %t0, %L1 ], [ %t4, %L2 ]
%t6 = icmp eq i32 %t0, %t5
br i1 %t6, label %L4, label %L5
We know if we go through the path L1 --> L3, %t6 should always be true. However
currently, if the rhs of the eq comparison is phi, JumpThreading fails to
evaluate %t6 to true. And we know that Instcombine cannot guarantee always
canonicalizing phi to the left hand side of the comparison operation according
to the operand priority comparison mechanism in instcombine. The patch handles
the case when rhs of the comparison op is a phi.
Differential Revision: https://reviews.llvm.org/D46275
llvm-svn: 331266
instcombine should transform the relevant cases if the OverflowingBinaryOperator/PossiblyExactOperator can be proven to be safe.
Change-Id: I7aec62a31a894e465e00eb06aed80c3ea0c9dd45
llvm-svn: 331265
This test had values that differed in only in capitalization,
and that causes problems for the auto-generating check line
script. So I changed that in rL331226, but I accidentally
forgot to change a subsequent use of a param.
llvm-svn: 331228
Summary:
As discussed in D45733, we want to do this in InstCombine.
https://rise4fun.com/Alive/LGk
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: chandlerc, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D45867
llvm-svn: 331205
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
We should canonicalize the pattern to non-inverted mask.
https://rise4fun.com/Alive/Yol
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45664
llvm-svn: 331112
Summary:
Masked merge has a pattern of: `((x ^ y) & M) ^ y`.
But, there is no difference between `((x ^ y) & M) ^ y` and `((x ^ y) & ~M) ^ x`,
We should canonicalize the pattern to non-inverted mask.
Differential Revision: https://reviews.llvm.org/D45663
llvm-svn: 331111
The effect of doing so is not disrupting the LoopPassManager when mixing this pass with other loop passes. This should help locality of access substaintially and avoids the cost of computing PostDom.
The assumption here is that the full GuardWidening (which does use PostDom) is run as a canonicalization before loop opts and that this version is just catching cases exposed by other loop passes. (i.e. LoopPredication, IndVarSimplify, LoopUnswitch, etc..)
llvm-svn: 331094
This patch adds support for fragment expressions
TryToShrinkGlobalToBoolean() which were previously just dropped.
Thanks to Reid Kleckner for providing me a reproducer!
llvm-svn: 331086
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.
This works ok.
But it complicates writing of commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.
This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.
The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one because i guess all the current users are ok with existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, i'm storing pointer, not reference, because for some mysterious-to-me reason
it did not work with the reference.
The first one appears trivial, too.
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ **operands**:
1. match ~~`RHS`~~ **`LHS`** matcher to the ~~`first`~~ **`second`** operand of binary operator,
2. and then match ~~`LHS`~~ **`RHS`** matcher to the ~~`second`~ **`first`** operand of binary operator.
Surprisingly, `$ ninja check-llvm` still passes with this.
But i expect the bots will disagree..
The motivational unittest is included.
I'd like to use this in D45664.
Reviewers: spatel, craig.topper, arsenm, RKSimon
Reviewed By: craig.topper
Subscribers: xbolva00, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45828
llvm-svn: 331085
As suggested in D45842
(although still not sure if we're going to advance that),
we must invalidate references to instructions that have
been recycled (operands were changed, so result is different).
llvm-svn: 331083
We currently have a hard to solve analysis problem around the order of instructions within a potentially throwing block. We can't cheaply determine whether a given instruction is before the first potential throw in the block. While we're working on that in the background, special case the first instruction within the header.
why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice.
In a follow on patch, I tend to extend LICM with an alternate approach which works for any instruction in the header before the first throw, but this is the best I can come up with other users of the analysis (such as store promotion.)
Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV. Using the full walk is not suitable for a general solution.
llvm-svn: 331079
The idea is to have a pass which performs the same transformation as GuardWidening, but can be run within a loop pass manager without disrupting the pass manager structure. As demonstrated by the test case, this doesn't quite get there because of issues with post dom, but it gives a good step in the right direction. the motivation is purely to reduce compile time since we can now preserve locality during the loop walk.
This patch only includes a legacy pass. A follow up will add a new style pass as well.
llvm-svn: 331060
We currently support LCSSA PHI nodes in the outer loop exit, if their
incoming values do not come from the outer loop latch or if the
outer loop latch has a single predecessor. In that case, the outer loop latch
will be executed only if the inner loop gets executed. If we have multiple
predecessors for the outer loop latch, it may be executed even if the inner
loop does not get executed.
This is a first step to support the case described in
https://bugs.llvm.org/show_bug.cgi?id=30472
Reviewers: efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D43237
llvm-svn: 331037
Since PTX has grown a <2 x half> datatype vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.
This is still very limited, SLP vectorization happily creates <2 x half>
if it's a legal type but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win by reducing the amount of arithmetic instructions.
I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.
Differential Revision: https://reviews.llvm.org/D46130
llvm-svn: 331035
It doesn't unwind, and the wrong marking leads to the creation of an
.eh_frame section when it isn't necessary.
Differential Revision: https://reviews.llvm.org/D46082
llvm-svn: 331008
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 331002
Summary:
Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to
X % (C0 * C1). This is a common pattern seen in code generated by the XLA
GPU backend.
Add test cases for this new optimization.
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D45976
llvm-svn: 330992
remainder expressions as operands.
Summary:
Add test cases to prepare for the new optimization that Simplifies integer add
expression X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1).
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46017
llvm-svn: 330991
This reverts commit 023c8be90980e0180766196cba86f81608b35d38.
This patch triggers miscompile of zlib on PowerPC platform. Most likely it is
caused by some pre-backend PPC-specific pass, but we don't clearly know the
reason yet. So we temporally revert this patch with intention to return it
once the problem is resolved. See bug 37229 for details.
llvm-svn: 330893
Summary:
When performing indirect call promotion, current implementation inspects "all" parameters of the callsite and attemps to match with the formal argument type of the callee function. However, it is not possible to find the type for variable length arguments, and the compiler crashes when it attemps to match the type for variable lenght argument.
It seems that the bug is introduced with D40658. Prior to that, the type matching is performed only for the parameters whose ID is less than callee->getFunctionNumParams(). The attached test case will crash without the patch.
Reviewers: mssimpso, davidxl, davide
Reviewed By: mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46026
llvm-svn: 330844
As discussed in D45862, we want these folds sometimes
because they're good improvements.
But as we can see here, the current logic doesn't
check uses and doesn't produce optimal code in all
cases.
llvm-svn: 330837
This also means we have to check if the latch is the exiting block now,
as `transform` expects the latches to be the exiting blocks too.
https://bugs.llvm.org/show_bug.cgi?id=36586
Reviewers: efriedma, davide, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D45279
llvm-svn: 330806
Summary:
When Reassociate is rewriting an expression tree it may
reuse old binary expression nodes, for new expressions.
Whenever an expression node is reused, but with a non-trivial
change in the result, we need to invalidate any debug info
that is associated with the node.
If for example rewriting
x = mul a, b
y = mul c, x
into
x = mul c, b
y = mul a, x
we still get the same result for 'y', but 'x' is a new expression.
All debug info referring to 'x' must be invalidated (marked as
optimized out) since we no longer calculate the expected value.
As a side-effect this patch avoid (at least some) problems where
reassociate could end up creating IR with debug-use before def.
Earlier the dbg.value nodes where left untouched in the IR, while
the reused binary nodes where sinked to just before the root node
of the rewritten expression tree. See PR27273 for more info about
such problems.
Reviewers: dblaikie, aprantl, dexonsmith
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D45975
llvm-svn: 330804
These are all but 1 of the select-of-constant tests that appear
to be transformed within foldSelectICmpAnd() and the block above
it predicated by decomposeBitTestICmp().
As discussed in D45862 (and can be seen in several tests here),
we probably want to stop doing those transforms because they
can increase the instruction count without benefitting other
passes or codegen.
The 1 test not included here is a urem test where the bit hackery
allows us to remove a urem. To preserve killing that urem, we
should do some stronger known-bits analysis or pattern matching of
'urem x, (select-of-pow2-constants)'.
llvm-svn: 330768
Patch #2 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces the basic infrastructure to detect, legality check
and process outer loops annotated with hints for explicit vectorization.
All these changes are protected under the feature flag
-enable-vplan-native-path. This should make this patch NFC for the existing
inner loop vectorizer.
Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso.
Differential Revision: https://reviews.llvm.org/D42447
llvm-svn: 330739
After D43236, we started interchanging loops with empty dependence
matrices. In isProfitableForVectorization, we try to determine if
interchanging makes the loop dependences more friendly to the
vectorizer. If there are no dependences, we should not interchange,
based on that heuristic.
Reviewers: efriedma, mcrosier, karthikthecool, blitz.opensource
Reviewed By: mcrosier
Differential Revision: https://reviews.llvm.org/D45208
llvm-svn: 330738
The first step in fixing problems raised in D45862
is to make the problems visible. Now we can more easily
see/update cases where selects have been turned into
multiple instructions with no apparent improvement in
analysis or benefits for other passes (vectorization).
llvm-svn: 330731
The current version of the script uses regex for params.
This could mask a bug (param values got wrongly swapped),
but it seems unlikely in practice, so let's just update
the whole file to reduce diffs when there is a meaningful
change here.
llvm-svn: 330729
The memory location an invariant load is using can never be clobbered by
any store, so it's safe to move the load ahead of the store.
Differential Revision: https://reviews.llvm.org/D46011
llvm-svn: 330725
loop unswitch.
This code incorrectly added the header to the loop block set early. As
a consequence we would incorrectly conclude that a nested loop body had
already been visited when the header of the outer loop was the preheader
of the nested loop. In retrospect, adding the header eagerly doesn't
really make sense. It seems nicer to let the cycle be formed naturally.
This will catch crazy bugs in the CFG reconstruction where we can't
correctly form the cycle earlier rather than later, and makes the rest
of the logic just fall out.
I've also added various asserts that make these issues *much* easier to
debug.
llvm-svn: 330707
This code path can very clearly be called in a context where we have
baselined all the cloned blocks to a particular loop and are trying to
handle nested subloops. There is no harm in this, so just relax the
assert. I've added a test case that will make sure we actually exercise
this code path.
llvm-svn: 330680
I was reminded today that this patch got reverted in r301885. I can no
longer reproduce the failure that caused the revert locally (...almost
one year later), and the patch applied pretty cleanly, so I guess we'll
see if the bots still get angry about it.
The original breakage was InstSimplify complaining (in "assertion
failed" form) about getting passed some crazy IR when running `ninja
check-sanitizer`. I'm unable to find traces of what, exactly, said crazy
IR was. I suppose we'll find out pretty soon if that's still the case.
:)
Original commit:
Author: gbiv
Date: Mon May 1 18:12:08 2017
New Revision: 301880
URL: http://llvm.org/viewvc/llvm-project?rev=301880&view=rev
Log:
[InstSimplify] Handle selects of GEPs with 0 offset
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)
Differential Revision: https://reviews.llvm.org/D31435
llvm-svn: 330667
I'm not sure if this is where we should try to fold these
patterns inspired by:
https://bugs.llvm.org/show_bug.cgi?id=37098
...if this isn't the right place, we can move the tests.
llvm-svn: 330642
Summary: We do not need nonull attribute if we know an argument is going to be constant.
Reviewers: junbuml, davide, fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45608
llvm-svn: 330641
Summary:
Skip basic blocks not reachable from the entry node
in MemCpyOptPass::iterateOnFunction.
Code that is unreachable may have properties that do not exist
for reachable code (an instruction in a basic block can for
example be dominated by a later instruction in the same basic
block, for example if there is a single block loop).
MemCpyOptPass::processStore is only safe to use for reachable
basic blocks, since it may iterate past the basic block
beginning when used for unreachable blocks. By simply skipping
to optimize unreachable basic blocks we can avoid asserts such
as "Assertion `!NodePtr->isKnownSentinel()' failed."
in MemCpyOptPass::processStore.
The problem was detected by fuzz tests.
Reviewers: eli.friedman, dneilson, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D45889
llvm-svn: 330635
Summary:
This change teaches DSE that the atomic memory intrinsics are stores
that can be eliminated, and can allow other stores to be eliminated.
This change specifically does not teach DSE that these intrinsics
can be partially eliminated (i.e. length reduced, and dest/src changed);
that will be handled in another change.
Reviewers: mkazantsev, skatkov, apilipenko, efriedma, rsmith
Reviewed By: efriedma
Subscribers: dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D45535
llvm-svn: 330629
LoopRotate only invalidates innermost loops while the changes that it makes may
also affert any of this parents. With patch rL329047, SCEV becomes much smarter
about calculation of exit counts for outer loops, so we cannot assume that they are
not affected.
Differential Revision: https://reviews.llvm.org/D45945
llvm-svn: 330582
In the function `simplifyOneLoop` we optimistically assume that changes in the
inner loop only affect this very loop and have no impact on its parents. In fact,
after rL329047 has been merged, we can now calculate exit counts for outer
loops which may depend on inner loops. Thus, we need to invalidate all parents
when we do something to a loop.
There is an evidence of incorrect behavior of `simplifyOneLoop`: when we insert
`SE->verify()` check in the end of this funciton, it fails on a bunch of existing
test, in particular:
LLVM :: Transforms/LoopUnroll/peel-loop-not-forced.ll
LLVM :: Transforms/LoopUnroll/peel-loop-pgo.ll
LLVM :: Transforms/LoopUnroll/peel-loop.ll
LLVM :: Transforms/LoopUnroll/peel-loop2.ll
Note that previously we have fixed issues of this variety, see rL328483.
This patch makes this function invalidate the outermost loop properly.
Differential Revision: https://reviews.llvm.org/D45937
Reviewed By: chandlerc
llvm-svn: 330576
The condition this was asserting doesn't actually hold. I've added
comments to explain why, removed the assert, and added a fun test case
reduced out of 403.gcc.
llvm-svn: 330564
This is the last step in getting constant pattern matchers to allow
undef elements in constant vectors.
I'm adding a dedicated m_ZeroInt() function and building m_Zero() from
that. In most cases, calling code can be updated to use m_ZeroInt()
directly when there's no need to match pointers, but I'm leaving that
efficiency optimization as a follow-up step because it's not always
clear when that's ok.
There are just enough icmp folds in InstSimplify that can be used for
integer or pointer types, that we probably still want a generic m_Zero()
for those cases. Otherwise, we could eliminate it (and possibly add a
m_NullPtr() as an alias for isa<ConstantPointerNull>()).
We're conservatively returning a full zero vector (zeroinitializer) in
InstSimplify/InstCombine on some of these folds (see diffs in InstSimplify),
but I'm not sure if that's actually necessary in all cases. We may be
able to propagate an undef lane instead. One test where this happens is
marked with 'TODO'.
llvm-svn: 330550
When creating a call to storeStrong in ObjCARCContract, ensure the call
gets the correct funclet token, otherwise WinEHPrepare will turn the
call (and all subsequent instructions) into unreachable.
We already have logic to do this for the ARC autorelease elision marker;
factor that out into a common function that's used for both. These are
the only two places in this transform that create call instructions.
Differential Revision: https://reviews.llvm.org/D45857
llvm-svn: 330487
The callback used to create an ORE for the legacy PI pass caches the allocated
object in a unique_ptr in the runOnModule function, and returns a reference to
that object. Under certian circumstances we can end up holding onto that
reference after the OREs destruction. Rather then allowing the new and legacy
passes to create ORE object in diffrent ways, create the ORE at the point of
use.
Differential Revision: https://reviews.llvm.org/D43219
llvm-svn: 330473
It also adds a check making sure PHIs for operands are all in the same
block.
Patch by Daniel Berlin <dberlin@dberlin.org>
Reviewers: dberlin, davide
Differential Revision: https://reviews.llvm.org/D43865
llvm-svn: 330444
Reapply the patches with a fix. Thanks Ilya and Hans for the reproducer!
This reverts commit r330416.
The issue was that removing predecessors invalidated uses that we stored
for rewrite. The fix is to finish manipulating with CFG before we select
uses for rewrite.
llvm-svn: 330431
Revert r330413: "[SSAUpdaterBulk] Use SmallVector instead of DenseMap for storing rewrites."
Revert r330403 "Reapply "[PR16756] Use SSAUpdaterBulk in JumpThreading." one more time."
r330403 commit seems to crash clang during our integrate while doing PGO build with the following stacktrace:
#2 llvm::SSAUpdaterBulk::RewriteAllUses(llvm::DominatorTree*, llvm::SmallVectorImpl<llvm::PHINode*>*)
#3 llvm::JumpThreadingPass::ThreadEdge(llvm::BasicBlock*, llvm::SmallVectorImpl<llvm::BasicBlock*> const&, llvm::BasicBlock*)
#4 llvm::JumpThreadingPass::ProcessThreadableEdges(llvm::Value*, llvm::BasicBlock*, llvm::jumpthreading::ConstantPreference, llvm::Instruction*)
#5 llvm::JumpThreadingPass::ProcessBlock(llvm::BasicBlock*)
The crash happens while compiling 'lib/Analysis/CallGraph.cpp'.
r3340413 is reverted due to conflicting changes.
llvm-svn: 330416
Hopefully, changing set to vector removes nondeterminism detected by
some bots, or the new assert will catch something.
This reverts commit r330180.
llvm-svn: 330403
Summary:
This change fixes https://crbug.com/834474, a build failure caused by
LowerTypeTests not preserving .symver symbol versioning directives for
exported functions. Emit symver information to ThinLTO summary data and
then propagate symver directives for exported functions to the merged
module.
Emitting symver information to the summaries increases the size of
intermediate build artifacts for a Chromium build by less than 0.2%.
Reviewers: pcc
Reviewed By: pcc
Subscribers: tejohnson, mehdi_amini, eraman, llvm-commits, eugenis, kcc
Differential Revision: https://reviews.llvm.org/D45798
llvm-svn: 330387
Similar to rL330086, I don't know if we want to do these
transforms here, but we might as well have the tests
here either way to show that this pass is missing
potential functionality (intentionally or not).
llvm-svn: 330368
Summary:
This fixes the bug pointed out in review with non-trivial unswitching.
This also provides a basis that should make it pretty easy to finish
fleshing out a routine to scan an entire function body for irreducible
control flow, but this patch remains minimal for disabling loop
unswitch.
Reviewers: sanjoy, fedor.sergeev
Subscribers: mcrosier, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45754
llvm-svn: 330357
If those operands change, we might find a leader for ValueOp, which
could enable new phi-of-op creation.
This fixes a case where we missed creating a phi-of-ops node. With D43865
and this patch, bootstrapping clang/llvm works with -enable-newgvn, whereas
without it, the "value changed after iteration" assertion is triggered.
Reviewers: dberlin, davide
Reviewed By: dberlin
Differential Revision: https://reviews.llvm.org/D42180
llvm-svn: 330334
After investigation discussed in D45439, it would seem that the nsw
flag restriction is unnecessary in most cases. So the IsInductionVar
lambda has been removed, the functionality extracted, and now only
require nsw when using eq/ne predicates.
Differential Revision: https://reviews.llvm.org/D45617
llvm-svn: 330256
If a predicate does not become known after peeling, peeling is unlikely
to be beneficial.
Reviewers: mcrosier, efriedma, mkazantsev, junbuml
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D44983
llvm-svn: 330250
Summary:
When sinking an instruction in InstCombine we now also sink
the DbgInfoIntrinsics that are using the sunken value.
Example)
When sinking the load in this input
bb.X:
%0 = load i64, i64* %start, align 4, !dbg !31
tail call void @llvm.dbg.value(metadata i64 %0, ...)
br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
br label %for.body
we now also move the dbg.value, like this
bb.X:
br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
%0 = load i64, i64* %start, align 4, !dbg !31
tail call void @llvm.dbg.value(metadata i64 %0, ...)
br label %for.body
In the past we haven't moved the dbg.value so we got
bb.X:
tail call void @llvm.dbg.value(metadata i64 %0, ...)
br i1 %cond, label %for.end, label %for.body.lr.ph
for.body.lr.ph:
%0 = load i64, i64* %start, align 4, !dbg !31
br label %for.body
So in the past we got a debug-use before the def of %0.
And that dbg.value was also on the path jumping to %for.end, for
which %0 never was defined.
CodeGenPrepare normally comes to rescue later (when not moving
the dbg.value), since it moves dbg.value instrinsics quite
brutally, without really analysing if it is correct to move
the intrinsic (see PR31878).
So at the moment this patch isn't expected to have much impact,
besides that it is moving the dbg.value already in opt, making
the IR look more sane directly.
This can be seen as a preparation to (hopefully) make it possible
to turn off CodeGenPrepare::placeDbgValues later as a solution
to PR31878.
I also adjusted test/DebugInfo/X86/sdagsplit-1.ll to make the
IR in the test case up-to-date with this behavior in InstCombine.
Reviewers: rnk, vsk, aprantl
Reviewed By: vsk, aprantl
Subscribers: mattd, JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D45425
llvm-svn: 330243
The bitcast may be interfering with other combines or vectorization
as shown in PR16739:
https://bugs.llvm.org/show_bug.cgi?id=16739
Most pointer-related optimizations are probably able to look through
this bitcast, but removing the bitcast shrinks the IR, so it's at
least a size savings.
Differential Revision: https://reviews.llvm.org/D44833
llvm-svn: 330237
Track the debug locations of the incoming values to newly-created phis,
and apply merged debug locations to the phis.
A merged location will be on line 0, but will have the correct scope
set. This improves crash reporting when an inlined instruction with a
merged location triggers a machine exception. A debugger will be able to
narrow down the crash to the correct inlined scope, instead of simply
pointing to the outer scope of the caller.
Taken together with a change allows generating merged line-0 locations
for instructions which aren't calls, this results in a 0.5% increase in
the uncompressed size of the .debug_line section of a stage2+Release
build of clang (-O3 -g).
rdar://33858697
Differential Revision: https://reviews.llvm.org/D45397
llvm-svn: 330227
One more, hopefully the last, bug is fixed: when forming UsesToRewrite
we should ignore phi operands coming from edges that we want to delete.
This reverts r329910.
llvm-svn: 330175
We use getExtractWithExtendCost to calculate the cost of extractelement and
s|zext together when computing the extract cost after vectorization, but we
calculate the cost of extractelement and s|zext separately when computing the
scalar cost which is larger than it should be.
Differential Revision: https://reviews.llvm.org/D45469
llvm-svn: 330143
Two cleanups:
1. As noted in D45453, we had tests that don't need FMF that were misplaced in the 'fast-math.ll' test file.
2. This removes the final uses of dyn_castFNegVal, so that can be deleted. We use 'match' now.
llvm-svn: 330126
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.
The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.
https://rise4fun.com/Alive/Pkmg
Reviewers: spatel, craig.topper, eli.friedman, jingyue
Reviewed By: spatel
Subscribers: llvm-commits
Was reviewed as part of https://reviews.llvm.org/D45631
llvm-svn: 330103
Summary:
In order to get the whole fold as specified in [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]],
let's first handle the simple straight-forward things.
Let's start with the `and` -> `or` simplification.
The one obvious thing missing here: the constant mask is not handled.
I have an idea how to handle it, but it will require some thinking,
and is not strictly required here, so i've left that for later.
https://rise4fun.com/Alive/Pkmg
Reviewers: spatel, craig.topper, eli.friedman, jingyue
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45631
llvm-svn: 330101
These simplifications were previously enabled only with isFast(), but that
is more restrictive than required. Since r317488, FMF has 'reassoc' to
control these cases at a finer level.
llvm-svn: 330089
It debateable whether instcombine should be in the business of
reassociation, but it is currently.
These tests and PR37098 demonstrate a missing ability to do a
simple reassociation that allows eliminating shifts.
If we decide that functionality belongs somewhere else, then we
should still have some tests to show that we've intentionally
limited instcombine to *not* include this ability.
llvm-svn: 330086
(and plain 'or', for completeness sake.)
After submitting D45631, i have realized that it will *already*
affect 'and' pattern, and it was obvious that there were no
good test patterns to show that.
Since the masked-merge.ll is getting kinda big,
unify naming schemes a bit, and split into 'xor'/'and'/'or'
testfiles, with the only difference being the last operation.
llvm-svn: 330072
The function getMinimumVF(ElemWidth) will return the minimum VF for
a vector with elements of size ElemWidth bits. This value will only
apply to targets for which TTI::shouldMaximizeVectorBandwidth returns
true. The value of 0 indicates that there is no minimum VF.
Differential Revision: https://reviews.llvm.org/D45271
llvm-svn: 330062
This was intended since initially, but i did not really think
about it, and did not know how to force that. Now that the
xor->or fold is working (patch upcoming), this came up
to improve the test coverage.
A followup for rL330003, rL330007
https://bugs.llvm.org/show_bug.cgi?id=6773
llvm-svn: 330039
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.
I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]
This extends/generalizes that fold, so it is handled too.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45539
llvm-svn: 330001
This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics.
We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well.
llvm-svn: 329990
This lifts a restriction on DILocation::getMergedLocation(), allowing it
to create merged locations for instructions other than calls.
Instruction::applyMergedLocation() now defaults to creating merged
locations for all instructions.
The default behavior of getMergedLocation() is unchanged: callers which
invoke it directly are unaffected.
This change will enable a follow-up Mem2Reg fix which improves crash
reporting.
Differential Revision: https://reviews.llvm.org/D45396
llvm-svn: 329955
Created a helper function to query for non negative SCEVs. Uses the
SGE predicate to catch constants that could be interpreted as
negative.
Differential Revision: https://reviews.llvm.org/D45481
llvm-svn: 329907
Summary:
The fold added in D45108 did not account for the fact that
the and instruction is commutative, and if the mask is a variable,
the mask variable and the fold variable may be swapped.
I have noticed this by accident when looking into [[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]]
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45538
llvm-svn: 329901
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.
Without this, I get verifier errors on `@Test2`.
It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.
llvm-svn: 329868
Summary:
These tests show that DSE currently does nothing with the atomic memory
intrinsics. Future work will teach DSE how to simplify these.
llvm-svn: 329845
Summary:
In preparation for a future commit, this regenerates the test checks for
test/Transforms/DeadStoreElimination/OverwriteStoreBegin.ll
test/Transforms/DeadStoreElimination/OverwriteStoreEnd.ll
llvm-svn: 329839
Summary:
In preparation for a future commit, this regenerates the test checks for
test/Transforms/DeadStoreElimination/simple.ll
test/Transforms/DeadStoreElimination/memintrinsics.ll
llvm-svn: 329824
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
v2: - fix regressions in merge-stores.ll and multiple_tails.ll
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
This is based on an example that was recently posted on llvm-dev:
void *propagate_null(void* b, int* g) {
if (!b) {
return 0;
}
(*g)++;
return b;
}
https://godbolt.org/g/xYk3qG
The original code or constant propagation in other passes has obscured the fact
that the phi can be removed completely.
Differential Revision: https://reviews.llvm.org/D45448
llvm-svn: 329755
Fix PR36484, as suggested:
<quote>
during moves, mark the direct users of the erased things that were phis as "not to be optimized"
<quote>
llvm-svn: 329621
Summary:
We do not try to move the instructions and split the block till we
know the blocks can be split, i.e. BCE-cmp-insts can be separated from
non-BCE-cmp-insts.
Reviewers: davide, courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44443
llvm-svn: 329564
In IRCE, we have a very old legacy check that works when we collect comparisons that we
treat as range checks. It ensures that the value against which the indvar is compared is
loop invariant and is also positive.
This latter condition remained there since the times when IRCE was only able to handle
signed latch comparison. As the optimization evolved, it now learned how to intersect
signed or unsigned ranges, and this logic has no reliance on the fact that the right border
of each range should be positive.
The old implementation of this non-negativity check was also naive enough and just looked
into ranges (while most of other IRCE logic tries to use power of SCEV implications), so this
check did not allow to deal with the most simple case that looks like follows:
int size; // not known non-negative
int length; //known non-negative;
i = 0;
if (size != 0) {
do {
range_check(i < size);
range_check(i < length);
++i;
} while (i < size)
}
In this case, even if from some dominating conditions IRCE could parse loop
structure, it could only remove the range check against `length` and simply
ignored the check against `size`.
In this patch we remove this obsolete check. It will allow IRCE to pick comparison
against `size` as a potential range check and then let Range Intersection logic
decide whether it is OK to eliminate it or not.
Differential Revision: https://reviews.llvm.org/D45362
Reviewed By: samparker
llvm-svn: 329547
There are a pair of folds that try to merge fneg into fsub
with an intervening cast, but as shown in the FIXME tests,
they can create extra instructions.
llvm-svn: 329501
D45344 is proposing to remove the use restriction that made the calloc
transform safe, but it doesn't currently address the problematic example
given inD16337. Add a test to make sure that doesn't break.
llvm-svn: 329412
Summary:
Fixing an issue where initializations of globals where constructors use
casts were silently translated to 0-initialization.
Reviewers: davidxl, evgeny777
Reviewed By: evgeny777
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45198
llvm-svn: 329409
Inserting instrumentation between a musttail call and ret instruction
would create invalid IR. Instead, treat musttail calls as function
exits.
llvm-svn: 329385
This restores what was lost with rL73243 but without
re-introducing the bug that was present in the old code.
Note that we already have these transforms if the ops are
marked 'fast' (and I assume that's happening somewhere in
the code added with rL170471), but we clearly don't need
all of 'fast' for these transforms.
llvm-svn: 329362
A fold for this pattern was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374
...and apparently there were no tests that went with that fold.
llvm-svn: 329360
Summary:
More tests for D45108:
* One use tests
* allow shift to be a variable, too
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45329
llvm-svn: 329348
Summary:
This is a fix to PR37005.
Essentially, rL328539 ([InstCombine] reassociate loop invariant GEP chains to enable LICM) contains a bug
whereby it will convert:
%src = getelementptr inbounds i8, i8* %base, <2 x i64> %val
%res = getelementptr inbounds i8, <2 x i8*> %src, i64 %val2
into:
%src = getelementptr inbounds i8, i8* %base, i64 %val2
%res = getelementptr inbounds i8, <2 x i8*> %src, <2 x i64> %val
By swapping the index operands if the GEPs are in a loop, and %val is loop variant while %val2
is loop invariant.
This fix recreates new GEP instructions if the index operand swap would result in the type
of %src changing from vector to scalar, or vice versa.
Reviewers: sebpop, spatel
Reviewed By: sebpop
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45287
llvm-svn: 329331
There used to be a fold that would handle this case more generally,
but it was removed at rL73243 to fix PR4374:
https://bugs.llvm.org/show_bug.cgi?id=4374
llvm-svn: 329322
Using cstfp_pred_ty in the definition allows us to match vectors with undef elements.
This replicates the change for m_Not from D44076 / rL326823 and continues
towards making all pattern matchers allow undef elements in vectors.
llvm-svn: 329303
LoopInterchange relies on LoopInfo being up-to-date, so we should
preserve it after interchanging. This patch updates restructureLoops to
move the BBs of the interchanged loops to the right place.
Reviewers: davide, efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D45278
llvm-svn: 329264
Summary:
If the callsite is inside landing pad, do not perform callsite splitting.
Callsite splitting uses utility function llvm::DuplicateInstructionsInSplitBetween, which eventually calls llvm::SplitEdge. llvm::SplitEdge calls llvm::SplitCriticalEdge with an assumption that the function returns nullptr only when the target edge is not a critical edge (and further assumes that if the return value was not nullptr, the predecessor of the original target edge always has a single successor because critical edge splitting was successful). However, this assumtion is not true because SplitCriticalEdge returns nullptr if the destination block is a landing pad. This invalid assumption results assertion failure.
Fundamental solution might be fixing llvm::SplitEdge to not to rely on the invalid assumption. However, it'll involve a lot of work because current API assumes that llvm::SplitEdge never fails. Instead, this patch makes callsite splitting to not to attempt splitting if the callsite is in a landing pad.
Attached test case will crash with assertion failure without the fix.
Reviewers: fhahn, junbuml, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45130
llvm-svn: 329250
Summary: @llvm.icall.branch.funnel is musttail with variable number of
arguments. After inlining current backend can't separate call targets from call
arguments.
Reviewers: pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45116
llvm-svn: 329235
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329218
Summary:
Clang's __builtin_operator_new/delete was recently taught about the aligned allocation overloads (r328134). This patch makes LLVM aware of them as well.
This allows the compiler to perform certain optimizations including eliding new/delete calls.
Reviewers: rsmith, majnemer, dblaikie, vsk, bkramer
Reviewed By: bkramer
Subscribers: ckennelly, llvm-commits
Differential Revision: https://reviews.llvm.org/D44769
llvm-svn: 329215
Fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform, so the branch is
non-uniform.
This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.
As discovered after committing an earlier version of this change, this
exposes a subtle interaction between this pass and DivergenceAnalysis:
since we remove and re-create branch instructions, we can no longer rely
on DivergenceAnalysis for branches in subregions that were already
processed by the pass.
Explicitly remove branch instructions from DivergenceAnalysis to
avoid dangling pointers as a matter of defensive programming, and
change how we detect non-uniform subregions.
Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
Differential Revision: https://reviews.llvm.org/D43743
llvm-svn: 329165
This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis.
If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values
`L1, L2, ..., LN`, then if we prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)` then we can also
prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient
to prove the predicate for values that come from the same predecessor block.
The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0`
given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and
non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot.
Differential Revision: https://reviews.llvm.org/D44001
Reviewed By: anna
llvm-svn: 329150
Summary:
Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.
This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.
Reviewers: efriedma, jmolloy, ABataev
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39760
llvm-svn: 329142
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().
llvm-svn: 329121
Summary:
If an alloca need to be stored in the coroutine frame and it has an alignment specified and the alignment does not match the natural alignment of the alloca type. Insert appropriate padding into the coroutine frame to make sure that it gets requested alignment.
For example for a packet type (which natural alignment is 1), but alloca alignment is 8, we may need to insert a padding field with required number of bytes to make sure it is properly aligned.
```
%PackedStruct = type <{ i64 }>
...
%data = alloca %PackedStruct, align 8
```
If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:
```
%f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
```
Reviewers: rnk, lewissbaker, modocache
Reviewed By: modocache
Subscribers: EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D45221
llvm-svn: 329112
It also updates test/Transforms/LoopInterchange/call-instructions.ll
to use accesses where we can prove dependence after D35430.
Reviewers: sebpop, karthikthecool, blitz.opensource
Reviewed By: sebpop
Differential Revision: https://reviews.llvm.org/D45206
llvm-svn: 329111
Summary:
Folding patterns like:
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%ext = extractelement <4 x i8> %insvec, i32 0
%cond = icmp eq i32 %ext, 0
Combined with existing rules, this allows us to fold patterns like:
%insvec = insertelement <4 x i8> undef, i8 %val, i32 0
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%cond = icmp eq i8 %val, 0
When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.
Reviewers: spatel, anna, mkazantsev
Reviewed By: spatel
Subscribers: efriedma, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D44997
llvm-svn: 329087
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 329085
I have taken the opportunity to simplify some tests slightly and move
parts around.
It also brings back a few IR checks for interchangable loops.
Reviewers: karthikthecool, sebpop, grosser
Reviewed By: sebpop
Differential Revision: https://reviews.llvm.org/D45207
llvm-svn: 329081
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329042
We use two approaches for determining the minimum bitwidth.
* Demanded bits
* Value tracking
If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.
But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in
%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i
are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in
%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0
it doesn't make sense to shrink %tmp15 and we can skip the value tracking.
Ideas are from Matthew Simpson!
Differential Revision: https://reviews.llvm.org/D44868
llvm-svn: 329035
Summary:
When attempting to split a coroutine with 'hidden' visibility (for
example, a C++ coroutine that is inlined when compiled with the option
'-fvisibility-inlines-hidden'), LLVM would hit an assertion in
include/llvm/IR/GlobalValue.h:240: "local linkage requires default
visibility". The issue is that the visibility is copied from the source
of the function split in the `CloneFunctionInto` function, but the linkage
is not. To fix, create the new function first with external linkage,
then copy the linkage from the original function *after* `CloneFunctionInto`
is called.
Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the
explicit call to `setDSOLocal` can be removed in CoroSplit.cpp.
Test Plan: check-llvm
Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk
Reviewed By: rnk
Subscribers: llvm-commits, eric_niebler
Differential Revision: https://reviews.llvm.org/D44185
llvm-svn: 329033
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.
Reviewers: efriedma, majnemer, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45186
llvm-svn: 329027
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.
This patch clones the metadatas in the original function to the new function.
Differential Revision: https://reviews.llvm.org/D44127
llvm-svn: 328991
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.
Reviewers: EricWF, modocache, rnk, lewissbaker
Reviewed By: modocache
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45114
llvm-svn: 328986
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 328980
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no limit).
Reviewers: wmi
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45127
llvm-svn: 328934
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didnt simplify the loop to get there.
Differential Revision: https://reviews.llvm.org/D44199
llvm-svn: 328933
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.
Reviewers: eraman
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45062
llvm-svn: 328909
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 328854
In r312664 (D36404), JumpThreading stopped threading edges into
loop headers. Unfortunately, I observed a significant performance
regression as a result of this change. Upon further investigation,
the problematic pattern looked something like this (after
many high level optimizations):
while (true) {
bool cond = ...;
if (!cond) {
<body>
}
if (cond)
break;
}
Now, naturally we want jump threading to essentially eliminate the
second if check and hook up the edges appropriately. However, the
above mentioned change, prevented it from doing this because it would
have to thread an edge into the loop header.
Upon further investigation, what is happening is that since both branches
are threadable, JumpThreading picks one of them at arbitrarily. In my
case, because of the way that the IR ended up, it tended to pick
the one to the loop header, bailing out immediately after. However,
if it had picked the one to the exit block, everything would have
worked out fine (because the only remaining branch would then be folded,
not thraded which is acceptable).
Thus, to fix this problem, we can simply eliminate loop headers from
consideration as possible threading targets earlier, to make sure that
if there are multiple eligible branches, we can still thread one of
the ones that don't target a loop header.
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42260
llvm-svn: 328798
Fixed counter/weight overflow that leads to an assertion. Also fixed the help
string for pgo-emit-branch-prob option.
Differential Revision: https://reviews.llvm.org/D44809
llvm-svn: 328653
As a follow-up to r328480, this updates the logic for the decreasing
safety checks in a similar manner:
- CanBeMax is replaced by CannotBeMaxInLoop which queries
isLoopEntryGuardedByCond on the maximum value.
- SumCanReachMin is replaced by isSafeDecreasingBound which includes
some logic from parseLoopStructure and, again, has been updated to
use isLoopEntryGuardedByCond on the given bounds.
Differential Revision: https://reviews.llvm.org/D44776
llvm-svn: 328613
Currently, `getExact` fails if it sees two exit counts in different blocks. There is
no solid reason to do so, given that we only calculate exact non-taken count
for exiting blocks that dominate latch. Using this fact, we can simply take min
out of all exits of all blocks to get the exact taken count.
This patch makes the calculation more optimistic with enforcing our assumption
with asserts. It allows us to calculate exact backedge taken count in trivial loops
like
for (int i = 0; i < 100; i++) {
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44676
Reviewed By: fhahn
llvm-svn: 328611
This patch teaches `computeConstantDifference` handle calculation of constant
difference between `(X + C1)` and `(X + C2)` which is `(C2 - C1)`.
Differential Revision: https://reviews.llvm.org/D43759
Reviewed By: anna
llvm-svn: 328609
MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm, for
each block, it computes the previous definition for each predecessor,
then takes those definitions and combines them. But currently it doesn't
remember results which it already computed; this means it can visit the
same block multiple times, which adds up to exponential time overall.
To fix this, this patch adds a cache. If we computed the result for a
block already, we don't need to visit it again because we'll come up
with the same result. Well, unless we RAUW a MemoryPHI; in that case,
the TrackingVH will be updated automatically.
This matches the original source paper for this algorithm.
The testcase isn't really a test for the bug, but it adds coverage for
the case where tryRemoveTrivialPhi erases an existing PHI node. (It's
hard to write a good regression test for a performance issue.)
Differential Revision: https://reviews.llvm.org/D44715
llvm-svn: 328577
This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1
In this example %idx.ext1 is a loop invariant. It will be moved above the use of
loop induction variable %idx.ext such that it can be hoisted out of the loop by
LICM. The operands that have dependences carried by the loop will be sinked down
in the GEP chain. This patch will produce the following output:
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext
llvm-svn: 328539
This replaces a large chunk of code that was looking for compound
patterns that include these sub-patterns. Existing tests ensure that
all of the previous examples are still folded as expected.
We still need to loosen the FMF check.
llvm-svn: 328502
Current logic of loop SCEV invalidation in Loop Unroller implicitly relies on
fact that exit count of outer loops cannot rely on exiting blocks of
inner loops, which is true in current implementation of backedge taken count
calculation but is wrong in general. As result, when we only forget the loop that
we have just unrolled, we may still have cached data for its outer loops (in particular,
exit counts) which keeps references on blocks of inner loop that could have been
changed or even deleted.
The attached test demonstrates a situaton when after unrolling of innermost loop
the outermost loop contains a dangling pointer on non-existant block. The problem
shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV
smarter about exit count calculation. I am not sure if the bug exists without this patch,
it appears that now it is accidentally correct just because in practice exact backedge
taken count for outer loops with complex control flow inside is never calculated.
But when SCEV learns to do so, this problem shows up.
This patch replaces existing logic of SCEV loop invalidation with a correct one, which
happens to be invalidation of outermost loop (which also leads to invalidation of all
loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers
on removed blocks, or just outdated information that has changed after unrolling.
Differential Revision: https://reviews.llvm.org/D44818
Reviewed By: samparker
llvm-svn: 328483
CanBeMin is currently used which will report true for any unknown
values, but often a check is performed outside the loop which covers
this situation:
for (int i = 0; i < N; ++i)
...
if (N > 0)
for (int i = 0; i < N; ++i)
...
So I've add 'LoopGuardedAgainstMin' which reports whether N is
greater than the minimum value which then allows loop with a variable
loop count to be optimised. I've also moved the increasing bound
checking into its own function and replaced SumCanReachMax is another
isLoopEntryGuardedByCond function.
llvm-svn: 328480
We have thorough coverage of predicates and scalar types,
so we just need a sampling of vector tests to show that
things are working or not with vectors types.
llvm-svn: 328449
Summary:
This was motivated by absence of PrunEH functionality in new PM.
It was decided that a proper way to do PruneEH is to add NoUnwind inference
into PostOrderFunctionAttrs and then perform normal SimplifyCFG on top.
This change generalizes attribute handling implemented for (a removal of)
Convergent attribute, by introducing a generic builder-like class
AttributeInferer
It registers all the attribute inference requests, storing per-attribute
predicates into a vector, and then goes through an SCC Node, scanning all
the instructions for not breaking attribute assumptions.
The main idea is that as soon all the instructions from all the functions
of SCC Node conform to attribute assumptions then we are free to infer
the attribute as set for all the functions of SCC Node.
It handles two distinct cases of attributes:
- those that might break due to derefinement of the function code
for these attributes we are allowed to apply inference only if all the
functions are "exact definitions". Example - NoUnwind.
- those that do not care about derefinement
for these attributes we are allowed to apply inference as soon as we see
any function definition. Example - removal of Convergent attribute.
Also in this commit:
* Converted all the FunctionAttrs tests to use FileCheck and added new-PM
invocations to them
* FunctionAttrs/convergent.ll test demonstrates a difference in behavior between
new and old PM implementations. Marked with FIXME.
* PruneEH tests were converted to new-PM as well, using function-attrs+simplify-cfg
combo as intended
* some of "other" tests were updated since function-attrs now infers 'nounwind'
even for old PM pipeline
* -disable-nounwind-inference hidden option added as a possible workaround for a supposedly
rare case when nounwind being inferred by default presents a problem
Reviewers: chandlerc, jlebar
Reviewed By: jlebar
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D44415
llvm-svn: 328377
When building the SLP tree, we look for reuse among the vectorized tree
entries. However, each gather sequence is represented by a unique tree entry,
even though the sequence may be identical to another one. This means, for
example, that a gather sequence with two uses will be counted twice when
computing the cost of the tree. We should only count the cost of the definition
of a gather sequence rather than its uses. During code generation, the
redundant gather sequences are emitted, but we optimize them away with CSE. So
it looks like this problem just affects the cost model.
Differential Revision: https://reviews.llvm.org/D44742
llvm-svn: 328316
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 328307
Loop peeling also has an impact on the induction variables, so we should
benefit from induction variable simplification after peeling too.
Reviewers: sanjoy, bogner, mzolotukhin, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D43878
llvm-svn: 328301
Summary:
Revert r325687 workaround for PR36032 since
a fix was committed in r326154.
Reviewers: sbaranga
Differential Revision: http://reviews.llvm.org/D44768
From: Evgeny Stupachenko <evstupac@gmail.com>
<evgeny.v.stupachenko@intel.com>
llvm-svn: 328257
Summary:
When building with libFuzzer, converting control flow to selects or
obscuring the original operands of CMPs reduces the effectiveness of
libFuzzer's heuristics.
This patch provides an attribute to disable or modify certain optimizations
for optimal fuzzing signal.
Provides a less aggressive alternative to https://reviews.llvm.org/D44057.
Reviewers: vitalybuka, davide, arsenm, hfinkel
Reviewed By: vitalybuka
Subscribers: junbuml, mehdi_amini, wdng, javed.absar, hiraditya, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44232
llvm-svn: 328214
Summary:
LoopPredication is not profitable when the loop is known to always exit
through some block other than the latch block.
A coarse grained latch check can cause loop predication to predicate the
loop, and unconditionally deoptimize.
However, without predicating the loop, the guard may never fail within the
loop during the dynamic execution because the non-latch loop termination
condition exits the loop before the latch condition causes the loop to
exit.
We teach LP about this using BranchProfileInfo pass.
Reviewers: apilipenko, skatkov, mkazantsev, reames
Reviewed by: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44667
llvm-svn: 328210
There are at least 3 problems:
1. We're distributing across large patterns, but fail to do that for the minimal patterns.
2. We're not checking uses, so we may create more instructions than we eliminate.
3. We should be able to do these transforms with less than full 'fast' fmuls.
llvm-svn: 328152
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
MemCpyOpt pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder CreateMemCpy/CreateMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.
We also add a few tests to fill gaps in the testing of this pass.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816, rL327398, rL327421 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 328097
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Backed out for causing performance regressions. Re-landing
because we've determined that these regressions were noise.
Original Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 328096
The inline assembly generated for the ARC autorelease elision marker
must have a funclet token if it's emitted inside a funclet, otherwise
the inline assembly (and all subsequent code in the funclet) will be
marked unreachable by WinEHPrepare.
Note that this only applies for the non-O0 case, since at O0, clang
emits the autorelease elision marker itself rather than deferring to the
backend. The fix for clang is handled in a separate change.
Differential Revision: https://reviews.llvm.org/D44641
llvm-svn: 328042
Summary: Fix a bug in entry block shuffled to middle of the chain.
Reviewers: davide, courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44642
llvm-svn: 327971
Summary:
It turned out to be error-prone to expect the callers to handle that - better to
leave the decision to this routine and make the required data to be explicitly
passed to the function.
This handles the case that was missed in the r322473 and fixes the assert
mentioned in PR36524.
Reviewers: dorit, mssimpso, Ayal, dcaballe
Reviewed By: dcaballe
Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits
Differential Revision: https://reviews.llvm.org/D43812
llvm-svn: 327960
This is complicated by -0.0 and nan. This is based on the DAG patterns
as shown in D44091. I'm hoping that we can just remove those DAG folds
and always rely on IR canonicalization to handle the matching to fabs.
We would still need to delete the broken code from DAGCombiner to fix
PR36600:
https://bugs.llvm.org/show_bug.cgi?id=36600
Differential Revision: https://reviews.llvm.org/D44550
llvm-svn: 327858
This is re-land of https://reviews.llvm.org/rL327362 with a fix
and regression test.
The crash was due to it is possible that for found MDL loop,
LHS or RHS may contain an invariant unknown SCEV which
does not dominate the MDL. Please see regression
test for an example.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44553
llvm-svn: 327822
LICM deletes trivially dead instructions which it won't attempt to sink.
Attempt to salvage debug values which reference these instructions.
llvm-svn: 327800
As shown in the code comment, we don't need all of 'fast',
but we do need reassoc + nsz + nnan.
Differential Revision: https://reviews.llvm.org/D43765
llvm-svn: 327796
This builds on the work from https://reviews.llvm.org/D44287. It turned out supporting fcmp was much easier than I realized, so let's do that now.
As an aside, our -O3 handling of a floating point IVs leaves a lot to be desired. We do convert the float IV to an integer IV, but do so late enough that many other optimizations are missed (e.g. we don't vectorize).
Differential Revision: https://reviews.llvm.org/D44542
llvm-svn: 327722
JumpThreading iterates over F until the IR quiesces. Transforming
unreachable BBs increases compile time and it is also possible to
never stabilize causing JumpThreading to hang. An older attempt at
fixing this problem was D3991 where removeUnreachableBlocks(F)
was called before JumpThreading began. This has a few drawbacks:
* expensive - the routine attempts to fix up the IR to identify
additional BBs that can be removed along with unreachable BBs.
* aggressive - does not identify and preserve the shape of the IR.
At a minimum it does not preserve loop hierarchies.
* invasive - altering reachable blocks it may disrupt IR shapes
that could have otherwise been JumpThreaded.
This patch avoids removeUnreachableBlocks(F) and instead tracks
unreachable BBs in a SmallPtrSet using DominatorTree to validate the
initial state of all BBs. We then rely on subsequent passes to identify
and remove these unreachable blocks from F.
Reviewers: dberlin, sebpop, kuhar, dinesh.d
Reviewed by: sebpop, kuhar
Subscribers: hiraditya, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D44177
llvm-svn: 327713
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.
Differential Revision: https://reviews.llvm.org/D44490
llvm-svn: 327702
If the loop body contains conditions of the form IndVar < #constant, we
can remove the checks by peeling off #constant iterations.
This improves codegen for PR34364.
Reviewers: mkuper, mkazantsev, efriedma
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43876
llvm-svn: 327671
It is common to have conditional exits within a loop which are known not to be taken on some iterations, but not necessarily all. This patches extends our reasoning around guaranteed to execute (used when establishing whether it's safe to dereference a location from the preheader) to handle the case where an exit is known not to be taken on the first iteration and the instruction of interest *is* known to be taken on the first iteration.
This case comes up in two major ways:
* If we have a range check which we've been unable to eliminate, we frequently know that it doesn't fail on the first iteration.
* Pass ordering. We may have a check which will be eliminated through some sequence of other passes, but depending on the exact pass sequence we might never actually do so or we might miss other optimizations from passes run before the check is finally eliminated.
The initial version (here) is implemented via InstSimplify. At the moment, it catches a few cases, but misses a lot too. I added test cases for missing cases in InstSimplify which I'll follow up on separately. Longer term, we should probably wire SCEV through to here to get much smarter loop aware simplification of the first iteration predicate.
Differential Revision: https://reviews.llvm.org/D44287
llvm-svn: 327664
If we've already established an invariant scope with an earlier generation, we don't want to hide it in the scoped hash table with one with a later generation. I noticed this when working on the invariant-load handling, but it also applies to the invariant.start case as well.
Without this change, my previous patch for invariant-load regresses some cases, so I'm pushing this without waiting for review. This is why you don't make last minute tweaks to patches to catch "obvious cases" after it's already been reviewed. Bad Philip!
llvm-svn: 327655
This is a follow up to https://reviews.llvm.org/D43716 which rewrites the invariant load handling using the new infrastructure. It's slightly more powerful, but only in somewhat minor ways for the moment. It's not clear that DSE of stores to invariant locations is actually interesting since why would your IR have such a construct to start with?
Note: The submitted version is slightly different than the reviewed one. I realized the scope could start for an invariant load which was proven redundant and removed. Added a test case to illustrate that as well.
Differential Revision: https://reviews.llvm.org/D44497
llvm-svn: 327646
This matcher implementation appears to be slightly more efficient than
the generic constant check that it is replacing because every use was
for matching FP patterns, but the previous code would check int and
pointer type nulls too.
llvm-svn: 327627
From the LangRef definition for frem:
"The value produced is the floating-point remainder of the two operands.
This is the same output as a libm ‘fmod‘ function, but without any
possibility of setting errno. The remainder has the same sign as the
dividend. This instruction is assumed to execute in the default
floating-point environment."
llvm-svn: 327626
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", the HoistThenElseCodeToIf routine will attempt to merge
the debug location associated with the two original copies of the hoisted
instruction.
This is a problem in the special case where the hoisted instruction is a
debug info intrinsic, since for those the debug location is considered
part of the intrinsic and attempting to modify it may resut in invalid
IR. This is the underlying cause of PR36410.
This patch fixes the problem by handling debug info intrinsics specially:
instead of hoisting one copy and merging the two locations, the code now
simply hoists both copies, each with its original location intact. Note
that this is still only done in the case where both original copies are
otherwise (i.e. apart from location metadata) identical.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D44312
llvm-svn: 327622
There are two nontrivial details here:
* Loop structure update interface is quite different with new pass manager,
so the code to add new loops was factored out
* BranchProbabilityInfo is not a loop analysis, so it can not be just getResult'ed from
within the loop pass. It cant even be queried through getCachedResult as LoopCanonicalization
sequence (e.g. LoopSimplify) might invalidate BPI results.
Complete solution for BPI will likely take some time to discuss and figure out,
so for now this was partially solved by making BPI optional in IRCE
(skipping a couple of profitability checks if it is absent).
Most of the IRCE tests got their corresponding new-pass-manager variant enabled.
Only two of them depend on BPI, both marked with TODO, to be turned on when BPI
starts being available for loop passes.
Reviewers: chandlerc, mkazantsev, sanjoy, asbirlea
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D43795
llvm-svn: 327619
Summary:
Before this patch call graph is like this in the LoopUnrollPass:
tryToUnrollLoop
ApproximateLoopSize
collectEphemeralValues
/* Use collected ephemeral values */
computeUnrollCount
analyzeLoopUnrollCost
/* Bail out from the analysis if loop contains CallInst */
This patch moves collection of the ephemeral values to the tryToUnrollLoop
function and passes the collected values into both ApproximateLoopsize (as
before) and additionally starts using them in analyzeLoopUnrollCost:
tryToUnrollLoop
collectEphemeralValues
ApproximateLoopSize(EphValues)
/* Use EphValues */
computeUnrollCount(EphValues)
analyzeLoopUnrollCost(EphValues)
/* Ignore ephemeral values - they don't contribute to the final cost */
/* Bail out from the analysis if loop contains CallInst */
Reviewers: mzolotukhin, evstupac, sanjoy
Reviewed By: evstupac
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43931
llvm-svn: 327617
These should all be folded. The vector tests need to have
m_AnyZero updated to ignore undef elements, but we need to
be careful not to return the existing value in that case
and unintentionally propagate undef.
llvm-svn: 327585
If we have an invariant.start with no corresponding invariant.end, then the memory location becomes invariant indefinitely after the invariant.start. As a result, anything dominated by the start is guaranteed to see the value the memory location had when the invariant.start executed.
This patch adds an AvailableInvariants table which tracks the generation a particular memory location became invariant and then uses that information to allow value forwarding that would otherwise be disallowed by potentially aliasing stores. (Reminder: In EarlyCSE everything clobbers everything by default.)
This should be compatible with the MemorySSA variant, but design is generational. We can and should add first class support for invariant.start within MemorySSA at a later time. I took a quick look at doing so, but probably need some input from a MemorySSA expert.
Differential Revision: https://reviews.llvm.org/D43716
llvm-svn: 327577
As shown in:
https://bugs.llvm.org/show_bug.cgi?id=27151
...the existing fold could miscompile when X is NaN.
The fold was also dependent on 'ninf' but that's not necessary.
From IEEE-754 (with default rounding which we can assume for these opcodes):
"When the sum of two operands with opposite signs (or the difference of two
operands with like signs) is exactly zero, the sign of that sum (or difference)
shall be +0...However, x + x = x − (−x) retains the same sign as x even when
x is zero."
llvm-svn: 327575
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682https://reviews.llvm.org/D44390https://godbolt.org/g/oKvT5H
Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `uitofp`+`bitcast`+`icmp` can be evaluated to a boolean:
* `slt 0`
* `sgt -1`
I did not check vectors, but i'm guessing it's the same there.
{F5889242}
Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.
Reviewers: spatel, majnemer, efriedma, arsenm
Reviewed By: spatel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D44421
llvm-svn: 327535
Summary:
This pattern came up in PR36682 / D44390
https://bugs.llvm.org/show_bug.cgi?id=36682https://reviews.llvm.org/D44390https://godbolt.org/g/oKvT5H
Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `uitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
I did not check vectors, but i'm guessing it's the same there.
{F5889189}
Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.
Generated with
{F5889196}
Reviewers: spatel, majnemer, efriedma, arsenm
Reviewed By: spatel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D44416
llvm-svn: 327534
Summary:
It is possible for LVI to encounter instructions that are not in valid
SSA form and reference themselves. One example is the following:
%tmp4 = and i1 %tmp4, undef
Before this patch LVI would recurse until running out of stack memory
and crashed. This patch marks these self-referential instructions as
Overdefined and aborts analysis on the instruction.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33357
Reviewers: craig.topper, anna, efriedma, dberlin, sebpop, kuhar
Reviewed by: dberlin
Subscribers: uabelho, spatel, a.elovikov, fhahn, eli.friedman, mzolotukhin, spop, evandro, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34135
llvm-svn: 327432
This was supposed to be an NFC refactoring that will eventually allow
eliminating the isFast() predicate, but there's a rare possibility
that we would pessimize the code as shown in the test case because
we failed to check 'hasOneUse()' properly. This version also removes
an inefficiency of the old code; we would look for:
(X * C) * C1 --> X * (C * C1)
...but that pattern is always handled by
SimplifyAssociativeOrCommutative().
llvm-svn: 327404
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
SROA pass to cease using the old getAlignment() & setAlignment() APIs of MemoryIntrinsic in
favour of getting source & dest specific alignments through the new API. This allows us
to enhance visitMemTransferInst to be more aggressive setting the alignment in memcpy
calls that it creates, as well as to only change the alignment of a memcpy/memmove
argument that it replaces.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960, rL325816 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: chandlerc, bollu, efriedma
Reviewed By: efriedma
Subscribers: efriedma, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D42974
llvm-svn: 327398
It is a revert of rL327362 which causes build bot failures with assert like
Assertion `isAvailableAtLoopEntry(RHS, L) && "RHS is not available at Loop Entry"' failed.
llvm-svn: 327363
IsKnownPredicate is updated to implement the following algorithm
proposed by @sanjoy and @mkazantsev :
isKnownPredicate(Pred, LHS, RHS) {
Collect set S all loops on which either LHS or RHS depend.
If S is non-empty
a. Let PD be the element of S which is dominated by all other elements of S
b. Let E(LHS) be value of LHS on entry of PD.
To get E(LHS), we should just take LHS and replace all AddRecs that
are attached to PD on with their entry values.
Define E(RHS) in the same way.
c. Let B(LHS) be value of L on backedge of PD.
To get B(LHS), we should just take LHS and replace all AddRecs that
are attached to PD on with their backedge values.
Define B(RHS) in the same way.
d. Note that E(LHS) and E(RHS) are automatically available on entry of PD,
so we can assert on that.
e. Return true if isLoopEntryGuardedByCond(Pred, E(LHS), E(RHS)) &&
isLoopBackedgeGuardedByCond(Pred, B(LHS), B(RHS))
Return true if Pred, L, R is known from ranges, splitting etc.
}
This is follow-up for https://reviews.llvm.org/D42417.
Reviewers: sanjoy, mkazantsev, reames
Reviewed By: sanjoy, mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43507
llvm-svn: 327362
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.
Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.
(Reland with a suspected fix for a unit test failure I haven't been able
to reproduce locally)
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D43690
llvm-svn: 327360
In the case that the CallInst that is being moved has an associated
operand bundle which is a funclet, the move will construct an invalid
instruction. The new site will have a different token and needs to be
reassociated with the new instruction.
Unfortunately, there is no way to alter the bundle after the
construction of the instruction. Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.
llvm-svn: 327336
This is the FP equivalent of D42818. Use it for the few cases in InstSimplify
with -0.0 folds (that's the only current use of m_NegZero()).
Differential Revision: https://reviews.llvm.org/D43792
llvm-svn: 327307
Summary:
This pattern came up in PR36682:
https://bugs.llvm.org/show_bug.cgi?id=36682https://godbolt.org/g/LhuD9A
Tests for proposed fix in D44367.
Looking at the IR pattern in question, as per [[ https://github.com/rutgers-apl/alive-nj | alive-nj ]], for all the type combinations i checked
(input: `i16`, `i32`, `i64`; intermediate: `half`/`i16`, `float`/`i32`, `double`/`i64`)
for the following `icmp` comparisons the `sitofp`+`bitcast` can be dropped:
* `eq 0`
* `ne 0`
* `slt 0`
* `sle 0`
* `sge 0`
* `sgt 0`
* `slt 1`
* `sge 1`
* `sle -1`
* `sgt -1`
I did not check vectors, but i'm guessing it's the same there.
{F5887419}
Thus all these cases are in the testcase (along with the vector variant with additional `undef` element in the middle).
There are no negative patterns here (unless alive-nj lied/is broken), all of these should be optimized.
Generated with {F5887551}
Reviewers: spatel, majnemer, efriedma, arsenm
Reviewed By: spatel
Subscribers: nlopes, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D44390
llvm-svn: 327301
This reverts r326908, originally landed as D44102.
Reverted for causing performance regressions on x86. (These regressions
are not yet understood.)
llvm-svn: 327252
When we replace the Phi we created with matched ones it is possible that
there are two identical phi nodes in IR. And matcher is smart enough to find that
new created phi matches both of them. So we try to replace our phi node with
matched ones twice and what is bad we delete our phi node twice causing a crash.
As soon as we found that we have two identical Phi nodes it makes sense to do
a clean-up and replace one phi node by other one.
The patch implements it.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43758
llvm-svn: 327250
With the updated LangRef ( D44216 / rL327138 ) in place, we can proceed with more constant folding.
I'm intentionally taking the conservative path here: no matter what the constant or the FMF, we can
always fold to NaN. This is because the undef operand can be chosen as NaN, and in our simplified
default FP env, nothing else happens - NaN just propagates to the result. If we find some way/need
to propagate undef instead, that can be added subsequently.
The tests show that we always choose the same quiet NaN constant (0x7FF8000000000000 in IR text).
There were suggestions to improve that with a 'NaN' string token or not always print a 64-bit hex
value, but those are independent changes. We might also consider setting/propagating the payload of
NaN constants as an enhancement.
Differential Revision: https://reviews.llvm.org/D44308
llvm-svn: 327208
Use isInlineViable to prevent inlining of functions with non-inlinable
constructs, in case cost analysis is skipped.
Reviewers: efriedma, sfertile, davide, davidxl
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42846
llvm-svn: 327207
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", there is no need to require that debug intrinsics match
before moving them (and merging them). Instead, we can simply always keep
all debug intrinsics from both sides of the "if".
This fixes PR36410, which describes a problem where as a result of the attempt
to merge debug locations for two debug intrinsics we end up with an invalid
intrinsic, where the scope indicated in the !dbg location no longer matches
the scope of the variable tracked by the intrinsic.
In addition, this has the benefit that we no longer throw away information
that is actually still valid, helping to generate better debug data.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D44312
llvm-svn: 327175
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.
This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.
The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.
For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html
Differential Revision: https://reviews.llvm.org/D42453
llvm-svn: 327163
Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64.
This patch supports ds_read_b128 instruction pattern and generation of this instruction.
In the vectorizer, this patch also widen the vector length so that vectorizer generates
128 bit loads for local address-space which gets translated to ds_read_b128.
Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128.
Author: FarhanaAleen
Reviewed By: rampitec, arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44210
llvm-svn: 327153
In r263618, JumpThreading learned to look trough simple cast instructions, but
only if the source of those cast instructions was a phi/cmp i1 (in an effort to
limit compile time effects). I think this condition is too restrictive. For
switches with limited value range, InstCombine will readily introduce an extra
trunc instruction to a smaller integer type (e.g. from i8 to i2), leaving us in
the somewhat perverse situation that jump-threading would work before running
instcombine, but not after. Since instcombine produces this pattern, I think we
need to consider it canonical and support it in JumpThreading. In general,
for limiting recursion, I think the existing restriction to phi and cmp nodes
should be sufficient to avoid looking through unprofitable chains of
instructions.
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42262
llvm-svn: 327150
There is no point in lowering a dbg.declare describing an alloca that
has volatile loads or stores as users, since the alloca cannot be
elided. Lowering the dbg.declare will result in larger debug info that
may also have worse coverage than just describing the alloca.
rdar://problem/34496278
llvm-svn: 327092
These are uncontroversial and independent of a proposed LangRef edits (D44216).
I tried to fix tests that would fold away:
rL327004
rL327028
rL327030
rL327034
I'm not sure if the Reassociate tests are meaningless yet, but they probably will be
as we add more folds, so if anyone has suggestions or wants to fix those, please do.
Differential Revision: https://reviews.llvm.org/D44258
llvm-svn: 327058
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.
Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.
I've also enabled EliminateAvailableExternally for all optimization
levels, I believe it being disabled for O1 was an oversight.
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D43690
llvm-svn: 327041
These are based on:
https://bugs.llvm.org/show_bug.cgi?id=35875
It's not clear if/how instcombine can reduce these,
but we should have the tests here either way to
document current behavior.
llvm-svn: 327039
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Backed out for failing an assert in clang bootstrap builds. Re-landing
with a fix for handling non-power-of-two inputs (e.g. udiv i24).
Original Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 326908
The purpose of this patch is to have LSR generate better code on Power.
This is done by overriding isLSRCostLess.
Differential Revision: https://reviews.llvm.org/D40855
llvm-svn: 326906
Breaks bootstrap builds: clang built with this patch asserts while
building MCDwarf.cpp: Assertion `castIsValid(op, S, Ty) && "Invalid
cast!"' failed.
llvm-svn: 326900
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Reviewers: spatel, sanjoy
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 326898
The LoadStoreVectorizer thought that <1 x T> and T were the same types
when merging stores, leading to a crash later.
Patch by Erik Hogeman.
Differential Revision: https://reviews.llvm.org/D44014
llvm-svn: 326884
It's been quite some time the Dependence Analysis (DA) is broken,
as it uses the GEP representation to "identify" multi-dimensional arrays.
It even wrongly detects multi-dimensional arrays in single nested loops:
from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6
;; for (long int i = 0; i < 50; i++) {
;; A[i][3*i - 6] = i;
;; *B++ = A[i][i];
DA used to detect two subscripts, which makes no sense in the LLVM IR
or in C/C++ semantics, as there are no guarantees as in Fortran of
subscripts not overlapping into a next array dimension:
maximum nesting levels = 1
SrcPtrSCEV = %A
DstPtrSCEV = %A
using GEPs
subscript 0
src = {0,+,1}<nuw><nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
subscript 1
src = {-6,+,3}<nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
Separable = {}
Coupled = {1}
With the current patch, DA will correctly work on only one dimension:
maximum nesting levels = 1
SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body>
DstSCEV = {%A,+,404}<%for.body>
subscript 0
src = {(-2424 + %A)<nsw>,+,1212}<%for.body>
dst = {%A,+,404}<%for.body>
class = 1
loops = {1}
Separable = {0}
Coupled = {}
This change removes all uses of GEP from DA, and we now only rely
on the SCEV representation.
The patch does not turn on -da-delinearize by default, and so the DA analysis
will be more conservative in the case of multi-dimensional memory accesses in
nested loops.
I disabled some interchange tests, as the DA is not able to disambiguate
the dependence anymore. To make DA stronger, we may need to
compute a bound on the number of iterations based on the access functions
and array dimensions.
The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to
avoid checking for snippets of LLVM IR: this form of checking is very hard to
maintain. Instead, we now check for output of the pass that are more meaningful
than dozens of lines of LLVM IR. Some tests now require -debug messages and thus
only enabled with asserts.
Patch written by Sebastian Pop and Aditya Kumar.
Differential Revision: https://reviews.llvm.org/D35430
llvm-svn: 326837
Using cst_pred_ty in the definition allows us to match vectors with undef elements.
This is a continuation of an effort to make all pattern matchers allow undef elements in vectors:
rL325437
rL325466
D43792
Differential Revision: https://reviews.llvm.org/D44076
llvm-svn: 326823
Change doCallSiteSplitting to iterate until we reach the terminator instruction.
tryToSplitCallSite can replace BB's terminator in case BB is a successor of
itself. Then IE will be invalidated and we also have to check the current
terminator.
Reviewers: junbuml, davidxl, davide, fhahn
Reviewed By: fhahn, junbuml
Differential Revision: https://reviews.llvm.org/D43824
llvm-svn: 326793
Summary:
RewriteStatepointsForGC collects parse points for further processing.
During the collection if a callsite is found in an unreachable block
(DominatorTree::isReachableFromEntry()) then all unreachable blocks are
removed by removeUnreachableBlocks(). Some of the removed blocks could
have been reachable according to DominatorTree::isReachableFromEntry().
In this case the collected parse points became stale and resulted in a
crash when accessed.
The fix is to unconditionally canonicalize the IR to
removeUnreachableBlocks and then collect the parse points.
The added test crashes with the old version and passes with this patch.
Patch by Yevgeny Rouban!
Reviewed by: Anna
Differential Revision: https://reviews.llvm.org/D43929
llvm-svn: 326748
Summary:
Presently, InstCombiner::foldICmpWithCastAndCast() implicitly assumes that it is
only invoked with icmp instructions of integer type. If that assumption is broken,
and it is called with an icmp of vector type, then it fails (asserts/crashes).
This patch addresses the deficiency. It allows it to simplify
icmp (ptrtoint x), (ptrtoint/c) of vector type into a compare of the inputs,
much as is done when the type is integer.
Reviewers: apilipenko, fedor.sergeev, mkazantsev, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44063
llvm-svn: 326730
This patch teaches getMinimumFPType to support shrinking a vector of ConstantFPs. This should improve our ability to combine vector fptrunc with fp binops.
Differential Revision: https://reviews.llvm.org/D43774
llvm-svn: 326729
getCompare returns true, false or undef constants if the comparison can
be evaluated, or nullptr if it cannot. This is in line with what
ConstantExpr::getCompare returns. It also allows us to use
ConstantExpr::getCompare for comparing constants.
Reviewers: davide, mssimpso, dberlin, anna
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D43761
llvm-svn: 326720
Summary:
We can discard initial blocks that do other work
We do not need to limit ourselves to just the first block in the chain.
Reviewers: courbet, davide
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44029
llvm-svn: 326698
Summary:
`musttail` calls can't be naively splitted. The split blocks must
include not only the call instruction itself, but also (optional)
`bitcast` and `return` instructions that follow it.
Clone `bitcast` and `ret`, place them into the split blocks, and
remove the tail block when done.
Reviewers: junbuml, mcrosier, davidxl, davide, fhahn
Reviewed By: fhahn
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D43729
llvm-svn: 326666
This caused some links to fail with ThinLTO due to missing symbols as
well as causing some binaries to have failures at runtime. We're working
with the author to get a test case, but want to get the tree green
again.
Further, it appears to introduce a data race. While the test usage of
threads was disabled in r325361 & r325362, that isn't an acceptable fix.
I've reverted both of these as well. This code needs to be thread safe.
Test cases for this are already on the original commit thread.
llvm-svn: 326638
The code was checking that all of the instructions in the
sequence are 'fast', but that's not necessary. The final
multiply is all that we need to check (tests adjusted).
The fmul doesn't need to be fully 'fast' either, but that
can be another patch.
llvm-svn: 326608
This narrow fold was added with no motivation or test cases
a bit over 5 years ago. Removing a constant operand is a
good canonicalization? We should handle Y*2.0 too then?
llvm-svn: 326606
If we are only truncating bits from the extend we should be able to just use a smaller extend.
If we are truncating more than the extend we should be able to just use a fptrunc since the presense of the fpextend shouldn't affect rounding.
Differential Revision: https://reviews.llvm.org/D43970
llvm-svn: 326595
Currently when AllowRemainder is disabled, pragma unroll count is not
respected even though there is no remainder. This bug causes a loop
fully unrolled in many cases even though the user specifies a unroll
count. Especially it affects OpenCL/CUDA since in many cases a loop
contains convergent instructions and currently AllowRemainder is
disabled for such loops.
Differential Revision: https://reviews.llvm.org/D43826
llvm-svn: 326585
This includes the test cases from D43970 and additional tests for combining (fptrunc (binop (fpext), (fpext))) where the pre-extended types don't match the trunc and therefore can't be completely removed.
llvm-svn: 326528
The loads and stores were getting the data and storing the results. There's no reason we can't just use function arguments and return.
llvm-svn: 326515
This is a retry of r326502 with updates to the reassociate
test file that I missed the first time.
@test15_reassoc in the supposed -reassociate test file
(except that it tests 2 other passes too...) shows that
there's no clear responsiblity for reassociation transforms.
Instcombine now gets that case, but only because the
constant values are identical. Otherwise, it would still
miss that pattern.
Reassociate doesn't get that case because it hasn't been
updated to use less than 'fast' FMF.
llvm-svn: 326513
I forgot that I added tests for 'reassoc' to -reassociate, but
suprisingly that file calls -instcombine too, so it is affected.
I'll update that file and try again.
llvm-svn: 326510
The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN`
can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`.
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D43810
llvm-svn: 326418
Do not replace results of `musttail` calls with a constant if the
call itself can't be removed.
Do not zap returns of `musttail` callees, if the call site can't be
removed and replaced with a constant.
Do not zap returns of `musttail`-calling blocks, this breaks
invariant too.
Patch by Fedor Indutny
Differential Revision: https://reviews.llvm.org/D43695
llvm-svn: 326404
When the function has musttail call - its cc is fixed to be equal to the
cc of the musttail callee. In such case (and in the case of the musttail
callee), GlobalOpt should not change the cc to fastcc as it will break
the invariant.
This fixes PR36546
Patch by: Fedor Indutny (indutny)
Differential revision: https://reviews.llvm.org/D43859
llvm-svn: 326376
Summary:
Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually lead to a broken LLVM module.
The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when the its parent block
is processed and second time when the last block is processed.
We end up having 2 same BCECmpBlock in the merge queue. And eventually lead to a broken LLVM module.
Reviewers: courbet, davide
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43825
llvm-svn: 326318
This is similar to what's done in computeKnownBits and computeSignBits. Don't do anything fancy just collect information valid for any element.
Differential Revision: https://reviews.llvm.org/D43789
llvm-svn: 326237
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.
llvm-svn: 326221
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.
llvm-svn: 326217
In case we update a ValuePHI node created earlier, we could update it
based on a different OpPHI which could be in a different block.
We need to update the TempToBlock mapping reflecting the new block,
otherwise we would end up placing the new phi node in a wrong block.
This problem is exposed by the test case in
https://bugs.llvm.org/show_bug.cgi?id=36504.
This patch fixes a slightly simpler problem than in the bug report. In
the bug's re-producer, the additional problem is that we are re-using a
ValuePHI node with to few incoming values for the new OpPHI. If this
patch makes sense, I will follow it up with a patch that creates a new
PHI node if the existing PHI node has a different number of incoming
values.
Reviewers: davide, dberlin
Reviewed By: dberlin
Differential Revision: https://reviews.llvm.org/D43770
llvm-svn: 326181
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.
Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.
Differential Revision: https://reviews.llvm.org/D43733
llvm-svn: 326133
Note: gcc appears to allow this fold with -freciprocal-math alone,
but clang/llvm require more than that with this patch. The wording
in the definitions seems fuzzy enough that it could go either way,
but we'll err on the conservative side of FMF interpretation.
This patch also changes the newly created fmul to have FMF propagated
by the last fdiv rather than intersecting the FMF of the fdivs. This
matches the behavior of other folds near here. The new fmul is only
used to produce an intermediate op for the final fdiv result, so it
shouldn't be any stricter than that result. The previous behavior
could result in dropping FMF via other folds in instcombine or CSE.
Differential Revision: https://reviews.llvm.org/D43398
llvm-svn: 326098
All SIMD architectures can emulate masked load/store/gather/scatter
through element-wise condition check, scalar load/store, and
insert/extract. Therefore, bailing out of vectorization as legality
failure, when they return false, is incorrect. We should proceed to cost
model and determine profitability.
This patch is to address the vectorizer's architectural limitation
described above. As such, I tried to keep the cost model and
vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning
should be done separately.
Please see
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for
RFC and the discussions.
Closes D43208.
Patch by: Hideki Saito <hideki.saito@intel.com>
llvm-svn: 326079
Summary:
This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA.
I noticed a case where a value carried across a loop was reported as <optimized out>.
Specifically this case:
```
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
```
In the above case, after all of the interesting computation completes our value
is reported as "optimized out." This change will add a dbg.value to correct this.
This patch also moves the dbg.value insertion routine from LoopRotation.cpp
into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA).
Reviewers: mzolotukhin, aprantl, vsk, davide
Reviewed By: aprantl, vsk
Subscribers: dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 325926
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.
This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.
Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
Reviewers: arsenm, rampitec, jlebar
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D40546
llvm-svn: 325881
Summary:
MemDep caches results that signify that a dependence is non-local, and
there is currently no way to invalidate such cache entries.
Unfortunately, when MLSM sinks a store that can result in a non-local
dependence becoming a local one, and then MemDep gives wrong answers.
The easiest way out here is to just say that MLSM does indeed not
preserve MemDep results.
Reviewers: davide, Gerolf
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43177
llvm-svn: 325880
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: hfinkel, bollu, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D43081
llvm-svn: 325816
- Fix for bug 36078.
- Prevent the functionattrs, function-attrs, globalopt and argpromotion passes
from changing naked functions.
- These passes can perform some alterations to the functions that should not be
applied. An example is removing parameters that are seemingly not used because
they are only referenced in the inline assembly. Another example is marking
the function as fastcc.
llvm-svn: 325788
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.
llvm-svn: 325717
of turning SCEVUnknowns of PHIs into AddRecExprs.
This feature is now hidden behind the -scev-version-unknown flag.
Fixes PR36032 and PR35432.
llvm-svn: 325687
This results in 15 additional unique source variables in a stage2 build
of FileCheck (at '-Os -g'), with a negligible increase in the size of
the .debug_loc section.
llvm-svn: 325660
There are too many perf regressions resulting from this, so we need to
investigate (and add tests for) targets like ARM and AArch64 before
trying to reinstate.
llvm-svn: 325658
Summary:
We used to remove the first memmove in cases like this:
memmove(p, p+2, 8);
memmove(p, p+2, 8);
which is incorrect. Fix this by changing isPossibleSelfRead to what was most
likely the intended behavior.
Historical note: the buggy code was added in https://reviews.llvm.org/rL120974
to address PR8728.
Reviewers: rsmith
Subscribers: mcrosier, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D43425
llvm-svn: 325641
This is usually not a problem because this code's main purpose is
eliminating unused new/delete pairs. We got deletes of nullptr or
nobuiltin deletes of builtin new wrong though.
llvm-svn: 325630
It's possible that we could allow this either 'arcp' or 'reassoc' alone, but this
should be conservatively better than what we have right now. GCC allows this with
only -freciprocal-math.
The last test is changed to show a case that is expected to fold, but we need D43398.
llvm-svn: 325533
This change was mentioned at least as far back as:
https://bugs.llvm.org/show_bug.cgi?id=26837#c26
...and I found a real program that is harmed by this:
Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
https://bugs.llvm.org/show_bug.cgi?id=36280
...but the change here appears to solve that bug only accidentally.
The div/rem costs for x86 look very wrong in some cases, but that's already true,
so we can fix those in follow-up patches. There's also evidence that more cost model
changes are needed to solve SLP problems as shown in D42981, but that's an independent
problem (though the solution may be adjusted after this change is made).
Differential Revision: https://reviews.llvm.org/D43079
llvm-svn: 325515
With this patch in place, when a new-format TBAA tag is available
for a memory-transfer intrinsic call, we prefer propagating that
new-format tag. Otherwise, we fallback to the old approach where
we try to construct a proper TBAA access tag from 'tbaa.struct'
metadata.
Differential Revision: https://reviews.llvm.org/D41543
llvm-svn: 325488
Loosening the matcher definition reveals a subtle bug in InstSimplify (we should not
assume that because an operand constant matches that it's safe to return it as a result).
So I'm making that change here too (that diff could be independent, but I'm not sure how
to reveal it before the matcher change).
This also seems like a good reason to *not* include matchers that capture the value.
We don't want to encourage the potential misstep of propagating undef values when it's
not allowed/intended.
I didn't include the capture variant option here or in the related rL325437 (m_One),
but it already exists for other constant matchers.
llvm-svn: 325466
Summary:
The LazyValueInfo pass caches a copy of the DominatorTree when available.
Whenever there are pending DominatorTree updates within JumpThreading's
DeferredDominance object we cannot use the cached DT for LVI analysis.
This commit adds the new methods enableDT() and disableDT() to LVI.
JumpThreading also sets the appropriate usage model before calling LVI
analysis methods.
Fixes https://bugs.llvm.org/show_bug.cgi?id=36133
Reviewers: sebpop, dberlin, kuhar
Reviewed by: sebpop, kuhar
Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov
Differential Revision: https://reviews.llvm.org/D42717
llvm-svn: 325356
Now that we have the new TBAA metadata format that is capable of
representing accesses to aggregates, we can propagate TBAA access
tags from memory setting and transferring intrinsics to load and
store instructions and vice versa.
Since SROA produces lots of new loads and stores on optimized
builds, this change significantly decreases the share of
undecorated memory accesses on such builds.
Differential Revision: https://reviews.llvm.org/D41563
llvm-svn: 325329
In r325063, we salvaged debug values from dying instructions in
GVN::processBlock() and GVN::performScalarPRE().
The change in performScalarPRE(), while correct, is unhelpful. It
introduced a call to salvageDebugInfo() which was immediately followed
by a RAUW, meaning it prevented the RAUW from efficiently updating
dbg.value intrinsics. This commit reverts the mistake and tightens up
the affected test case.
llvm-svn: 325308
This results in small increases in the size of the .debug_loc section
and the number of unique source variables in a stage2 build of opt.
llvm-svn: 325301
Summary:
The behavior described in Coroutines TS `[dcl.fct.def.coroutine]/7`
allows coroutine parameters to be passed into allocator functions.
The instructions to store values into the alloca'd parameters must not
be moved past the frame allocation, otherwise uninitialized values are
passed to the allocator.
Test Plan: `check-llvm`
Reviewers: rsmith, GorNishanov, eric_niebler
Reviewed By: GorNishanov
Subscribers: compnerd, EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D43000
llvm-svn: 325285
The variable name 'AllowReassociate' is a lie at this point because
it's set to 'isFast()' which is more than the 'reassoc' FMF after
rL317488.
In D41286, we showed that this transform may be valid even with strict
math by brute force checking every 32-bit float result.
There's a potential problem here because we're replacing with a tan()
libcall rather than a hypothetical LLVM tan intrinsic. So we might
set errno when we should be guaranteed not to do that. But that's
independent of this change.
llvm-svn: 325247
There is a more powerful but still simple function `isKnownViaSimpleReasoning ` that
does constant range check and few more additional checks. We use it some places (e.g.
when proving implications) and in some other places we only check constant ranges.
Currently, indvar simplifier fails to remove the check in following loop:
int inc = ...;
for (int i = inc, j = inc - 1; i < 200; ++i, ++j)
if (i > j) { ... }
This patch replaces all usages of `isKnownPredicateViaConstantRanges` with
`isKnownViaSimpleReasoning` to have smarter proofs. In particular, it fixes the
case above.
Reviewed-By: sanjoy
Differential Revision: https://reviews.llvm.org/D43175
llvm-svn: 325214
The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe.
Fixes PR36362.
Differential Revision: https://reviews.llvm.org/D43276
llvm-svn: 325148
Summary:
Reversed loads are handled as gathering. But we can just reshuffle
these values. Patch adds support for vectorization of reversed loads.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43022
llvm-svn: 325134
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.
Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41860
llvm-svn: 325126
We can use incremental dominator tree updates to avoid re-calculating
the dominator tree after interchanging 2 loops.
Reviewers: dmgreen, kuhar
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D43176
llvm-svn: 325122
Preserve debug info from a dead 'and' instruction with a constant.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D43163
llvm-svn: 325119
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
This preserves an additional 581 unique source variables in a stage2
build of clang (according to `llvm-dwarfdump --statistics`). It
increases the size of the .debug_loc section by 0.1% (or 87139 bytes).
Differential Revision: https://reviews.llvm.org/D43255
llvm-svn: 325063
This replaces the bit-tracking based fold that did the same thing,
but it only worked for scalars and not directly.
There is no evidence in existing regression tests that the greater
power of bit-tracking was needed here, but we should be aware of
this potential loss of optimization.
llvm-svn: 325062
The scalar folds are done indirectly and use potentially
expensive value tracking calls. That can be improved
along with the enhancement to support vector types.
llvm-svn: 325051
This is both a functional improvement for vectors and an
efficiency improvement for scalars. The existing code below
the new folds does the same thing for scalars, but in an
indirect and expensive way.
llvm-svn: 325048
The InstCombine integer mul test file had tests that belong in InstSimplify
(including fmul tests). Move things to where they belong and auto-generate
complete checks for everything.
llvm-svn: 325037
According to `llvm-dwarfdump --statistics` this salvages 43 additional
unique source variables in a stage2 build of clang. It increases the
size of the .debug_loc section by 0.002% (or 2864 bytes).
Differential Revision: https://reviews.llvm.org/D43220
llvm-svn: 325035
Due to a typo in D41565, mutable TBAA tags created with
createMutableTBAAAccessTag() lose their base types. This patch
fixes that typo and updates tests respectively.
Differential Revision: https://reviews.llvm.org/D42364
llvm-svn: 325008
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.
Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41860
llvm-svn: 325001
In cases where the OuterMostLoopLatchBI only has a single successor,
accessing the second successor will fail.
This fixes a failure when building the test-suite with loop-interchange
enabled.
Reviewers: mcrosier, karthikthecool, davide
Reviewed by: karthikthecool
Differential Revision: https://reviews.llvm.org/D42906
llvm-svn: 324994
We already try to salvage debug values from no-op bitcasts and inttoptr
instructions: we should handle ptrtoint instructions as well.
This saves an additional 24,444 debug values in a stage2 build of clang,
and (according to llvm-dwarfdump --statistics) provides an additional
289 unique source variables.
llvm-svn: 324982
Here are the number of additional debug values salvaged in a stage2
build of clang:
63 SALVAGE: MUL
1250 SALVAGE: SDIV
(No values were salvaged from `srem` instructions in this experiment,
but it's a simple case to handle so we might as well.)
llvm-svn: 324976
Here are the number of additional debug values salvaged in a stage2
build of clang:
1912 SALVAGE: ASHR
405 SALVAGE: LSHR
249 SALVAGE: SHL
llvm-svn: 324975
These intrinsic folds were added with D41381, but only allowed with isFast().
That's more than necessary because FMF has 'reassoc' to apply to these
kinds of folds after D39304, and that's all we need in these cases.
Differential Revision: https://reviews.llvm.org/D43160
llvm-svn: 324967
The diff to use 'reassoc' is part of D43160; it should not have
been made with rL324961. Reverting that part here, so we'll
see the intended diff with the code change.
llvm-svn: 324963
Some tests didn't add much value because we already show stronger
constraints for the folds in other tests, so the weaker versions
were deleted.
Moved the remaining tests into 1 file because the folds are
very similar and handled from 1 place in the code.
llvm-svn: 324961
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
InstCombine pass to cease using the deprecated MemoryIntrinsic::getAlignment() method, and
instead we use the separate getSourceAlignment and getDestAlignment APIs to simplify
the source and destination alignment attributes separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: majnemer, bollu, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D42871
llvm-svn: 324960
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
creation of memcpys in the SafeStack pass to set the alignment of the destination object to
its stack alignment while separately setting the source byval arguments alignment to its
alignment.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. (rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: eugenis, bollu
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42710
llvm-svn: 324955
This is similar to the instsimplify fold added with D42385
( rL323716 )
...but this can't be in instsimplify because we're creating/morphing
a different instruction.
llvm-svn: 324927
Update BlockColors after splitting predecessors. Do not allow splitting
EHPad for sinking when the BlockColors is not empty, so we can
simply assign predecessor's color to the new block.
Fixes PR36184
llvm-svn: 324916
Summary:
For better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement ` instructions as
the cost of the scalar code.
Reviewers: RKSimon, spatel, hfinkel, mkuper
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42657
llvm-svn: 324893
The related cases for (X * Y) / X were handled in rL124487.
https://rise4fun.com/Alive/6k9
The division in these tests is subsequently eliminated by existing instcombines
for 1/X.
llvm-svn: 324843
Summary:
This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit and in the IR.
This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.
Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and a emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it.
This should make any issues with gpr<->mask sequences the same between integer and fp. Meaning we only have to fix them once.
Reviewers: spatel, delena, RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43137
llvm-svn: 324827
Summary:
If -pass-remarks=loop-vectorize, atomic ops will be seen by
analyzeInterleaving(), even though canVectorizeMemory() == false. This
is because we are requesting extra analysis instead of bailing out.
In such a case, we end up with a Group in both Load- and StoreGroups,
and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups.
The fix is to include mayWriteToMemory() when validating that two
instructions are the same kind of memory operation.
Reviewers: mssimpso, davidxl
Reviewed By: davidxl
Subscribers: hsaito, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D43064
llvm-svn: 324786
Extend salvageDebugInfo to preserve the debug info from a dead 'or'
with a constant.
Patch by Ismail Badawi!
Differential Revision: https://reviews.llvm.org/D43129
llvm-svn: 324764
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
LoopIdiom pass to cease using the old IRBuilder CreateMemCpy single-alignment APIs in
favour of the new API that allows setting source and destination alignments independently.
This allows us to be slightly more aggressive in setting the alignment of memcpy calls that
loop idiom creates.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324626
The last assume in the test says that %B12 is 0.
The first assume says that %and1 is less than %B12.
Therefore, %and1 is unsigned less than 0...does not compute.
That means this line:
Known.Zero.setHighBits(RHSKnown.countMinLeadingZeros() + 1);
...tries to set more bits than exist.
Differential Revision: https://reviews.llvm.org/D43052
llvm-svn: 324610
Refactor getLogBase2Vector into getLogBase2 to accept all scalars/vectors. Generalize from ConstantDataVector to support all constant vectors.
llvm-svn: 324603
Add support of uge and sge latch condition to Loop Prediction for
reverse loops.
Reviewers: apilipenko, mkazantsev, sanjoy, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42837
llvm-svn: 324589
The commit rL308422 introduces a restriction for folding unconditional
branches. Specifically if empty block with unconditional branch leads to
header of the loop then elimination of this basic block is prohibited.
However it seems this condition is redundantly strict.
If elimination of this basic block does not introduce more back edges
then we can eliminate this block.
The patch implements this relax of restriction.
The test profile/Linux/counter_promo_nest.c in compiler-rt project
is updated to meet this change.
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324572
Summary:
Loops with inequality comparers, such as:
// unsigned bound
for (unsigned i = 1; i < bound; ++i) {...}
have getSmallConstantMaxTripCount report a large maximum static
trip count - in this case, 0xffff fffe. However, profiling info
may show that the trip count is much smaller, and thus
counter-recommend vectorization.
This change:
- flips loop-vectorize-with-block-frequency on by default.
- validates profiled loop frequency data supports vectorization,
when static info appears to not counter-recommend it. Absence
of profile data means we rely on static data, just as we've
done so far.
Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal
Reviewed By: davidxl
Subscribers: bkramer, llvm-commits
Differential Revision: https://reviews.llvm.org/D42946
llvm-svn: 324543
The failures happened because of assert which was overconfident about
SCEV's proving capabilities and is generally not valid.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324473
Sometimes `isLoopEntryGuardedByCond` cannot prove predicate `a > b` directly.
But it is a common situation when `a >= b` is known from ranges and `a != b` is
known from a dominating condition. Thia patch teaches SCEV to sum these facts
together and prove strict comparison via non-strict one.
Differential Revision: https://reviews.llvm.org/D42835
llvm-svn: 324453
Before r324429 we essentially didn't have a verification of LCSSA, so
no wonder that it has been broken: currently loop-sink breaks it (the
attached test illustrates the failure).
It was detected during a stage2 RA build, so to unbreak it I'm disabling
the check for now.
llvm-svn: 324445
Generalize existing constant matching to work with non-uniform constant vectors as well.
Differential Revision: https://reviews.llvm.org/D42818
llvm-svn: 324369
- Fix condition for detecting that a complex basic block was the first in
the chain.
- Add tests.
This was caught by buildbots when submitting rL324319.
llvm-svn: 324341
This patch moves ThinLTOBitcodeWriter/module-asm.ll test case into x86 directory to avoid a test failure when x86 backend is not enabled.
llvm-svn: 324316
If the inline asm provides the definition of a symbol, this can result
in duplicate symbol errors.
Differential Revision: https://reviews.llvm.org/D42944
llvm-svn: 324313
Summary:
Removing the dropped symbols will prevent indirect call promotion in the
ThinLTO Backend from adding a new reference to a symbol, which can
result in linker unsats. This can happen when we compile with a sample
profile collected from one binary by used for another, which may have
profiled targets that aren't used in the new binary.
Note that until dropDeadSymbols handles variables and aliases (in
progress), we may not be able to remove the declaration and can still
have an issue.
Reviewers: grimar, davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D42816
llvm-svn: 324299
In the motivating case from PR35681 and represented by the macro-fuse-cmp test:
https://bugs.llvm.org/show_bug.cgi?id=35681
...there's a 37 -> 31 byte size win for the loop because we eliminate the big base
address offsets.
SPEC2017 on Ryzen shows no significant perf difference.
Differential Revision: https://reviews.llvm.org/D42607
llvm-svn: 324289
As PR36225 shows, we definitely don't want to enable the
canEvaluate* logic with phis.
There's still a question of whether we should just revert
r324014 completely because it exposes a compile-time sinkhole
(although that problem might exist independently).
llvm-svn: 324266
This is the instcombine part of unsigned saturation canonicalization.
Backend patches already commited:
https://reviews.llvm.org/D37510https://reviews.llvm.org/D37534
It converts unsigned saturated subtraction patterns to forms recognized
by the backend:
(a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)
(b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)
(a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)
((a > b) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)
((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
Patch by Yulia Koval!
Differential Revision: https://reviews.llvm.org/D41480
llvm-svn: 324255
This broke the Chromium build; see PR36238.
> This patch is an enhancement to propagate dbg.value information when
> Phis are created on behalf of LCSSA. I noticed a case where a value
> carried across a loop was reported as <optimized out>.
>
> Specifically this case:
>
> int bar(int x, int y) {
> return x + y;
> }
>
> int foo(int size) {
> int val = 0;
> for (int i = 0; i < size; ++i) {
> val = bar(val, i); // Both val and i are correct
> }
> return val; // <optimized out>
> }
>
> In the above case, after all of the interesting computation completes
> our value is reported as "optimized out." This change will add a
> dbg.value to correct this.
>
> This patch also moves the dbg.value insertion routine from
> LoopRotation.cpp into Local.cpp, so that we can share it in both places
> (LoopRotation and LCSSA).
>
> Patch by Matt Davis!
>
> Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 324247
The patch causes the failure of the test
compiler-rt/test/profile/Linux/counter_promo_nest.c
To unblock buildbot, revert the patch while investigation is in progress.
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324214
The commit rL308422 introduces a restriction for folding unconditional
branches. Specifically if empty block with unconditional branch leads to
header of the loop then elimination of this basic block is prohibited.
However it seems this condition is redundantly strict.
If elimination of this basic block does not introduce more back edges
then we can eliminate this block.
The patch implements this relax of restriction.
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: pacxx
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42691
llvm-svn: 324208
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
To bugs additionally fixed:
assert is moved after the check whether loop is not a nullptr.
Usage of isLoopEntryGuardedByCond in ScalarEvolution::isImpliedCondOperandsViaNoOverflow
is guarded by isAvailableAtLoopEntry.
Reviewers: sanjoy, mkazantsev, anna, dorit, reames
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42417
llvm-svn: 324204
When using the partial inliner, we might have attributes for forwarded
varargs, but the CodeExtractor does not create an empty argument
attribute set for regular arguments in that case, because it does not know
of the additional arguments. So in case we have attributes for VarArgs, we
also have to make sure we create (empty) attributes for all regular arguments.
This fixes PR36210.
llvm-svn: 324197
The type-shrinking logic in reduction detection, although narrow in scope, is
also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies
the approach to rely on the demanded bits and value tracking analyses, if
available. We currently perform type-shrinking separately for reductions and
other instructions in the loop. Long-term, we should probably think about
computing minimal bit widths in a more complete way for the loops we want to
vectorize.
PR35734
Differential Revision: https://reviews.llvm.org/D42309
llvm-svn: 324195
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 324174
Without extra instructions and uses, swapMayExposeCSEOpportunities() would change
the icmp (as seen in the check lines), so we were not actually testing patterns
that should be handled by D41480.
llvm-svn: 324143
This is the enhancement suggested in D42536 to fix a shortcoming in
regular InstCombine's canEvaluate* functionality.
When we have multiple uses of a value, but they're all in one instruction, we can
allow that expression to be narrowed or widened for the same cost as a single-use
value.
AFAICT, this can only matter for multiply: sub/and/or/xor/select would be simplified
away if the operands are the same value; add becomes shl; shifts with a variable shift
amount aren't handled.
Differential Revision: https://reviews.llvm.org/D42739
llvm-svn: 324014
This, in instcombine, allows conversions to i8/i16/i32 (very
common cases) even if the resulting type is not legal according
to the data layout. This can often open up extra combine
opportunities.
Differential Revision: https://reviews.llvm.org/D42424
llvm-svn: 323951
Summary:
Before emitting code for scaled registers, we prevent
SCEVExpander from hoisting any scaled addressing mode
by emitting all the bases first. However, these bases
are being forced to the final type, resulting in some
odd code.
For example, if the type of the base is an integer and
the final type is a pointer, we will emit an inttoptr
for the base, a ptrtoint for the scale, and then a
'reverse' GEP where the GEP pointer is actually the base
integer and the index is the pointer. It's more intuitive
to use the pointer as a pointer and the integer as index.
Patch by: Bevin Hansson
Reviewers: atrick, qcolombet, sanjoy
Reviewed By: qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42103
llvm-svn: 323946
This covers the case where TruncInst leaf node is a constant expression.
See PR36121 for more details.
Differential Revision: https://reviews.llvm.org/D42622
llvm-svn: 323926
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
CodeGenPrepare pass to be more aggressive in improving the source and destination alignments
of memcpy/memmove/memset by exploiting our new ability to record independent alignments
for each argument.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 323891
Because dead code may contain non-standard IR that causes infinite looping or crashes in underlying analysis.
See PR36134 for more details.
Differential Revision: https://reviews.llvm.org/D42683
llvm-svn: 323862
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.
Differential Revision: https://reviews.llvm.org/D42526
llvm-svn: 323797
When removing return value Dead Argument Elimination pass clobbers first
llvm.dbg.value’s argument for live arguments of that function by replacing
it with nullptr. In the next pass it will be deleted, so debug location
about those arguments are lost. This change fixes it.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D42541
llvm-svn: 323784
candidates with coldcc attribute.
This recommits r322721 reverted due to sanitizer memory leak build bot failures.
Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 323778
Summary:
There's an asymmetry in the definitions of findBaseDefiningValueOfVector() and
findBaseDefiningValue() of RS4GC. The later handles call and invoke instructions,
and the former does not. This appears to be simple oversight. This patch remedies
the oversight by adding the call and invoke cases to findBaseDefiningValueOfVector().
Reviewers: DaniilSuchkov, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42653
llvm-svn: 323764
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323662
This pretty much reverts r322006, except that we keep the test,
because we work around the issue exposed in a different way (a
recursion limit in value tracking). There's still probably some
sequence that exposes this problem, and the proper way to fix that
for somebody who has time is outlined in the code review.
llvm-svn: 323630
This prevents functions accessing varargs from being inlined if they
have the alwaysinline attribute.
Reviewers: efriedma, rnk, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42556
llvm-svn: 323619
A cast from A to B is eliminable if its result is casted to C, and if
the pair of casts could just be expressed as a single cast. E.g here,
%c1 is eliminable:
%c1 = zext i16 %A to i32
%c2 = sext i32 %c1 to i64
InstCombine optimizes away eliminable casts. This patch teaches it to
insert a dbg.value intrinsic pointing to the final result, so that local
variables pointing to the eliminable result are preserved.
Differential Revision: https://reviews.llvm.org/D42566
llvm-svn: 323570
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323530
Inserting a dbg.value instruction at the start of a basic block with a
landingpad instruction triggers a verifier failure. We should be OK if
we insert the instruction a bit later.
Speculative fix for the bot failure described here:
https://reviews.llvm.org/D42551
llvm-svn: 323482
This patch is an enhancement to propagate dbg.value information when
Phis are created on behalf of LCSSA. I noticed a case where a value
carried across a loop was reported as <optimized out>.
Specifically this case:
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
In the above case, after all of the interesting computation completes
our value is reported as "optimized out." This change will add a
dbg.value to correct this.
This patch also moves the dbg.value insertion routine from
LoopRotation.cpp into Local.cpp, so that we can share it in both places
(LoopRotation and LCSSA).
Patch by Matt Davis!
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 323472
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323441
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792
Alive proofs for the cases handled here as well as the bitwise
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97https://rise4fun.com/Alive/Lc5Ehttps://rise4fun.com/Alive/kdf
llvm-svn: 323437
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323430
The only part of the datalayout that should matter for these tests
is the part that specifies the legal int widths ('n*'). But there
was a bug - that part of the string was not correctly separated with
the expected '-' character, so we were testing as if there were no
legal int widths at all. Removed the leading cruft so we have some
legal ints to test with.
I noticed this while testing a potential change to the way we
transform shifts and sexts in D42424.
llvm-svn: 323377
It causes regressions in various OpenGL test suites.
Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.
Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323348
Summary:
If any vector divisor element is undef, we can arbitrarily choose it be
zero which would make the div/rem an undef value by definition.
Reviewers: spatel, reames
Reviewed By: spatel
Subscribers: magabari, llvm-commits
Differential Revision: https://reviews.llvm.org/D42485
llvm-svn: 323343
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.
For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.
It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321
This patch removes assert that SCEV is able to prove that a value is
non-negative. In fact, SCEV can sometimes be unable to do this because
its cache does not update properly. This assert will be returned once this
problem is resolved.
llvm-svn: 323309
This was probably fixed long ago, but I don't see a test
that lines up with the example and target in the bug report:
https://bugs.llvm.org/show_bug.cgi?id=13837
...so adding it here.
llvm-svn: 323269
Summary:
Currently, there is no way to extract a basic block from a function easily. This patch
extends llvm-extract to extract the specified basic block(s).
Reviewers: loladiro, rafael, bogner
Reviewed By: bogner
Subscribers: hintonda, mgorny, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D41638
llvm-svn: 323266
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323246
If in complex addressing mode the difference is in GV then
base reg should not be installed because we plan to use
base reg as a merge point of different GVs.
This is a fix for PR35980.
Reviewers: reames, john.brawn, santosh
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42230
llvm-svn: 323192
Summary:
First, we need to explain the core of the vulnerability. Note that this
is a very incomplete description, please see the Project Zero blog post
for details:
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
The basis for branch target injection is to direct speculative execution
of the processor to some "gadget" of executable code by poisoning the
prediction of indirect branches with the address of that gadget. The
gadget in turn contains an operation that provides a side channel for
reading data. Most commonly, this will look like a load of secret data
followed by a branch on the loaded value and then a load of some
predictable cache line. The attacker then uses timing of the processors
cache to determine which direction the branch took *in the speculative
execution*, and in turn what one bit of the loaded value was. Due to the
nature of these timing side channels and the branch predictor on Intel
processors, this allows an attacker to leak data only accessible to
a privileged domain (like the kernel) back into an unprivileged domain.
The goal is simple: avoid generating code which contains an indirect
branch that could have its prediction poisoned by an attacker. In many
cases, the compiler can simply use directed conditional branches and
a small search tree. LLVM already has support for lowering switches in
this way and the first step of this patch is to disable jump-table
lowering of switches and introduce a pass to rewrite explicit indirectbr
sequences into a switch over integers.
However, there is no fully general alternative to indirect calls. We
introduce a new construct we call a "retpoline" to implement indirect
calls in a non-speculatable way. It can be thought of loosely as
a trampoline for indirect calls which uses the RET instruction on x86.
Further, we arrange for a specific call->ret sequence which ensures the
processor predicts the return to go to a controlled, known location. The
retpoline then "smashes" the return address pushed onto the stack by the
call with the desired target of the original indirect call. The result
is a predicted return to the next instruction after a call (which can be
used to trap speculative execution within an infinite loop) and an
actual indirect branch to an arbitrary address.
On 64-bit x86 ABIs, this is especially easily done in the compiler by
using a guaranteed scratch register to pass the target into this device.
For 32-bit ABIs there isn't a guaranteed scratch register and so several
different retpoline variants are introduced to use a scratch register if
one is available in the calling convention and to otherwise use direct
stack push/pop sequences to pass the target address.
This "retpoline" mitigation is fully described in the following blog
post: https://support.google.com/faqs/answer/7625886
We also support a target feature that disables emission of the retpoline
thunk by the compiler to allow for custom thunks if users want them.
These are particularly useful in environments like kernels that
routinely do hot-patching on boot and want to hot-patch their thunk to
different code sequences. They can write this custom thunk and use
`-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this
case, on x86-64 thu thunk names must be:
```
__llvm_external_retpoline_r11
```
or on 32-bit:
```
__llvm_external_retpoline_eax
__llvm_external_retpoline_ecx
__llvm_external_retpoline_edx
__llvm_external_retpoline_push
```
And the target of the retpoline is passed in the named register, or in
the case of the `push` suffix on the top of the stack via a `pushl`
instruction.
There is one other important source of indirect branches in x86 ELF
binaries: the PLT. These patches also include support for LLD to
generate PLT entries that perform a retpoline-style indirection.
The only other indirect branches remaining that we are aware of are from
precompiled runtimes (such as crt0.o and similar). The ones we have
found are not really attackable, and so we have not focused on them
here, but eventually these runtimes should also be replicated for
retpoline-ed configurations for completeness.
For kernels or other freestanding or fully static executables, the
compiler switch `-mretpoline` is sufficient to fully mitigate this
particular attack. For dynamic executables, you must compile *all*
libraries with `-mretpoline` and additionally link the dynamic
executable and all shared libraries with LLD and pass `-z retpolineplt`
(or use similar functionality from some other linker). We strongly
recommend also using `-z now` as non-lazy binding allows the
retpoline-mitigated PLT to be substantially smaller.
When manually apply similar transformations to `-mretpoline` to the
Linux kernel we observed very small performance hits to applications
running typical workloads, and relatively minor hits (approximately 2%)
even for extremely syscall-heavy applications. This is largely due to
the small number of indirect branches that occur in performance
sensitive paths of the kernel.
When using these patches on statically linked applications, especially
C++ applications, you should expect to see a much more dramatic
performance hit. For microbenchmarks that are switch, indirect-, or
virtual-call heavy we have seen overheads ranging from 10% to 50%.
However, real-world workloads exhibit substantially lower performance
impact. Notably, techniques such as PGO and ThinLTO dramatically reduce
the impact of hot indirect calls (by speculatively promoting them to
direct calls) and allow optimized search trees to be used to lower
switches. If you need to deploy these techniques in C++ applications, we
*strongly* recommend that you ensure all hot call targets are statically
linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well
tuned servers using all of these techniques saw 5% - 10% overhead from
the use of retpoline.
We will add detailed documentation covering these components in
subsequent patches, but wanted to make the core functionality available
as soon as possible. Happy for more code review, but we'd really like to
get these patches landed and backported ASAP for obvious reasons. We're
planning to backport this to both 6.0 and 5.0 release streams and get
a 5.0 release with just this cherry picked ASAP for distros and vendors.
This patch is the work of a number of people over the past month: Eric, Reid,
Rui, and myself. I'm mailing it out as a single commit due to the time
sensitive nature of landing this and the need to backport it. Huge thanks to
everyone who helped out here, and everyone at Intel who helped out in
discussions about how to craft this. Also, credit goes to Paul Turner (at
Google, but not an LLVM contributor) for much of the underlying retpoline
design.
Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D41723
llvm-svn: 323155
ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without check
that SCEV is available at entry point of the loop. It is incorrect and fixed by patch.
Reviewers: sanjoy, mkazantsev, anna, dorit
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42165
llvm-svn: 323077
...when the shift is known to not overflow with the matching
signed-ness of the division.
This closes an optimization gap caused by canonicalizing mul
by power-of-2 to shl as shown in PR35709:
https://bugs.llvm.org/show_bug.cgi?id=35709
Patch by Anton Bikineev!
Differential Revision: https://reviews.llvm.org/D42032
llvm-svn: 323068
This is the 'rem' counterpart to D42032 and would be folded by
D42341.
Patch by Anton Bikineev.
Differential Revision: https://reviews.llvm.org/D42342
llvm-svn: 323067
This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.
I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10.
This has been split from D41895 with the remainder in a follow up commit.
llvm-svn: 323015
to @objc_autorelease if its operand is a PHI and the PHI has an
equivalent value that is used by a return instruction.
For example, ARC optimizer shouldn't replace the call in the following
example, as doing so breaks the AutoreleaseRV/RetainRV optimization:
%v1 = bitcast i32* %v0 to i8*
br label %bb3
bb2:
%v3 = bitcast i32* %v2 to i8*
br label %bb3
bb3:
%p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ]
%retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p
%v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p)
ret i32* %retval
Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's
operand uses with its value so that the call gets tail-called.
rdar://problem/15894705
llvm-svn: 323009
Summary:
This patch attempts to fix the DomTree incremental insertion bug found here [[ https://bugs.llvm.org/show_bug.cgi?id=35969 | PR35969 ]] .
When performing an insertion into a piece of unreachable CFG, we may find the same not at different levels. When this happens, the node can turn out to be affected when we find it starting from a node with a lower level in the tree. The level at which we start visitation affects if we consider a node affected or not.
This patch tracks the lowest level at which each node was visited during insertion and allows it to be visited multiple times, if it can cause it to be considered affected.
Reviewers: brzycki, davide, dberlin, grosser
Reviewed By: brzycki
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42231
llvm-svn: 322993
Summary:
If the vectorized tree has truncate to minimum required bit width and
the vector type of the cast operation after the truncation is the same
as the vector type of the cast operands, count cost of the vector cast
operation as 0, because this cast will be later removed.
Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner.
Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41948
llvm-svn: 322946
Three (or more) operand getelementptrs could plausibly also be handled, but
handling only two-operand fits in easily with the existing BinaryOperator
handling.
Differential Revision: https://reviews.llvm.org/D39958
llvm-svn: 322930
This is similar to r322317, but for visibility. It is not as neat
because we have to special case extern_weak.
The idea is the same as the previous change, make the transition to
explicit dso_local easier for the frontends. With this they only have
to add dso_local to symbols where we need some external information to
decide if it is dso_local (like it being part of an ELF executable).
llvm-svn: 322806
candidates with coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.
llvm-svn: 322662
This removes some duplication from splitCallSite and makes it easier to
add additional code dealing with each predecessor. It also allows us to
split for more than 2 predecessors, although that is not enabled for
now.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41858
llvm-svn: 322599
Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D33954
llvm-svn: 322579
This patch fixes the assertion failure in SROA reported in PR35657.
PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522.
The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset with the existing one. But the code checks only type of the alloca and then check the offset using an assert.
In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets.
This patch makes the check of the offset in the if condition, and re-enables the splitting for non-whole-alloca slices.
Differential Revision: https://reviews.llvm.org/D41981
llvm-svn: 322533
Summary:
This method is supposed to be called for IVs that have casts in their use-def
chains that are completely ignored after vectorization under PSE. However, for
truncates of such IVs the same InductionDescriptor is used during
creation/widening of both original IV based on PHINode and new IV based on
TruncInst.
This leads to unintended second call to recordVectorLoopValueForInductionCast
with a VectorLoopVal set to the newly created IV for a trunc and causes an
assert due to attempt to store new information for already existing entry in the
map. This is wrong and should not be done.
Fixes PR35773.
Reviewers: dorit, Ayal, mssimpso
Reviewed By: dorit
Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D41913
llvm-svn: 322473
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perform the
preversation was minimally altered and simply marked as
preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements such as threading across loop headers.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 322401
Currently, IRC contains `Begin` and `Step` as SCEVs and `End` as value.
Aside from that, `End` can also be `nullptr` which can be later conditionally
converted into a non-null SCEV.
To make this logic more transparent, this patch makes `End` a SCEV and
calculates it early, so that it is never a null.
Differential Revision: https://reviews.llvm.org/D39590
llvm-svn: 322364
This is a fix for PR35884.
When we want to delete dead loop we must clean uses in unreachable blocks
otherwise we'll get an assert during deletion of instructions from the loop.
Reviewers: anna, davide
Reviewed By: anna
Subscribers: llvm-commits, lebedev.ri
Differential Revision: https://reviews.llvm.org/D41943
llvm-svn: 322357
This doesn't handle the more complicated case in the bug report yet:
https://bugs.llvm.org/show_bug.cgi?id=35790
For that, we have to match / look through a cast.
llvm-svn: 322327
While updating clang tests for having clang set dso_local I noticed
that:
- There are *a lot* of tests to update.
- Many of the updates are redundant.
They are redundant because a GV is "obviously dso_local". This patch
starts formalizing that a bit by requiring that internal and private
GVs be dso_local too. Since they all are, we don't have to print
dso_local to the textual representation, making it a bit more compact
and easier to read.
llvm-svn: 322317
LoadInst isn't enough; we need to include intrinsics that perform loads too.
All side-effecting intrinsics and such are already covered by the isSafe
check, so we just need to care about things that read from memory.
D41960, originally from D33179.
llvm-svn: 322311
parent function
Ideally we should merge the attributes from the functions somehow, but
this is obviously an improvement over taking random attributes from the
caller which will trip up the verifier if they're nonsensical for an
unary intrinsic call.
llvm-svn: 322284
This was originally planned as the fix for:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but simpler transforms handled that case, so I implemented a
lesser solution. It turns out we need to handle the case with 'not'
ops too because the real code example that we are trying to solve:
https://bugs.llvm.org/show_bug.cgi?id=35875
...has extra uses of the intermediate values, so we can't rely on
smaller canonicalizations to get us to the goal.
As with rL321672, I've tried to show every possibility in the
codegen tests because that's the simplest way to prove we're doing
the right thing in the wide variety of permutations of this pattern.
We can also show an InstCombine win because we added a fold for
this case in:
rL321998 / D41603
An Alive proof for one variant of the pattern to show that the
InstCombine and codegen results are correct:
https://rise4fun.com/Alive/vd1
Name: min3_nots
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%cmpxz = icmp slt i8 %nx, %nz
%minxz = select i1 %cmpxz, i8 %nx, i8 %nz
%cmpyz = icmp slt i8 %ny, %nz
%minyz = select i1 %cmpyz, i8 %ny, i8 %nz
%cmpyx = icmp slt i8 %y, %x
%r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
%cmpxyz = icmp slt i8 %minxz, %ny
%r = select i1 %cmpxyz, i8 %minxz, i8 %ny
Name: min3_nots_alt
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%cmpxz = icmp slt i8 %nx, %nz
%minxz = select i1 %cmpxz, i8 %nx, i8 %nz
%cmpyz = icmp slt i8 %ny, %nz
%minyz = select i1 %cmpyz, i8 %ny, i8 %nz
%cmpyx = icmp slt i8 %y, %x
%r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
%xz = icmp sgt i8 %x, %z
%maxxz = select i1 %xz, i8 %x, i8 %z
%xyz = icmp sgt i8 %maxxz, %y
%maxxyz = select i1 %xyz, i8 %maxxz, i8 %y
%r = xor i8 %maxxyz, -1
llvm-svn: 322283
D41353 / D41233 are proposing to alter the shl/and canonicalization,
but I think that would just move an existing pattern-matching hole
to a different place.
llvm-svn: 322206
Summary:
After teaching InlineCost more about address spaces ()
another fault was detected in the inliner. If an argument has
the byval attribute the parameter might be copied to an alloca.
That part seems to work fine even if the argument has a different
address space than the alloca address space. However, if the
address spaces differ, then the inlined function still might
refer to the parameter using the original address space (the
inliner does not handle that situation very well).
This patch avoids the problem by simply disallowing inlining
when there are byval arguments with address space that differs
from the alloca address space.
I'm not really sure how to transform the code if we want to
get inlining for this situation. I assume that it never has
been working, and that the fixes in r321809 just exposed an
old problem.
Fault found by skatkov (Serguei Katkov). It is mentioned in
follow up comments to https://reviews.llvm.org/D40455.
Reviewers: skatkov
Reviewed By: skatkov
Subscribers: uabelho, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D41898
llvm-svn: 322181
Summary:
LowerTypeTests moves some function definitions from individual object
files to the merged module, leaving a stub to be called in the merged
module's jump table. If an alias was pointing to such a function
definition LowerTypeTests would fail because the alias would be left
without a definition to point to.
This change 1) emits information about aliases to the ThinLTO summary,
2) replaces aliases pointing to function definitions that are moved to
the merged module with function declarations, and 3) re-emits those
aliases in the merged module pointing to the correct function
definitions.
The patch does not correctly fix all possible mis-uses of aliases in
LowerTypeTests. For example, it does not handle aliases with a different
type from the pointed to function.
The addition of alias data increases the size of Chrome build artifacts
by less than 1%.
Reviewers: pcc
Reviewed By: pcc
Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc
Differential Revision: https://reviews.llvm.org/D41741
llvm-svn: 322139
Summary:
When performing constant propagation for call instructions we have historically replaced all uses of the return from a call, but not removed the call itself. This is required for correctness if the calls have side effects, however the compiler should be able to safely remove calls that don't have side effects.
This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time.
Reviewers: davide, sanjoy, bruno, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38856
llvm-svn: 322125
Add powerpc- (32-bit) as XFAIL for tests that are documented either in-
line or via commit messages as expected to fail on big-endian systems.
Tests not documented in-line are documented in commit messages as
follows:
r211172 - test/tools/llvm-cov/llvm-cov.test
r247920 - test/Transforms/SampleProfile/gcc-simple.ll
llvm-svn: 322114
Summary:
This pass synthesizes function entry counts by traversing the callgraph
and using the relative block frequencies of the callsites. The intended
use of these counts is in inlining to determine hot/cold callsites in
the absence of profile information.
The pass is split into two files with the code that propagates the
counts in a callgraph in a Utils file. I plan to add support for
propagation in the thinlto link phase and the propagation code will be
shared and hence this split. I did not add support to the old PM since
hot callsite determination in inlining is not possible in old PM
(although we could use hot callee heuristic with synthetic counts in the
old PM it is not worth the effort tuning it)
Reviewers: davidxl, silvas
Subscribers: mgorny, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D41604
llvm-svn: 322110
Summary:
If the vector type is transformed to non-vector single type, the compile
may crash trying to get vector information about non-vector type.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41862
llvm-svn: 322106
Because of potential UB (known bits conflicts with an llvm.assume),
we have to check rather than assert here because InstSimplify doesn't
kill the compare:
https://bugs.llvm.org/show_bug.cgi?id=35846
llvm-svn: 322104
EarlyCSE did not try to salvage debug info during erasing of instructions.
This change fixes it.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D41496
llvm-svn: 322083
SCEV tracks the correspondence of created SCEV to original instruction.
However during creation of SCEV it is possible that nuw/nsw/exact flags are
lost.
As a result during expansion of the SCEV the instruction with nuw/nsw/exact
will be used where it was expected and we produce poison incorreclty.
Reviewers: sanjoy, mkazantsev, sebpop, jbhateja
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41578
llvm-svn: 322058
If the offset is differ in two addressing mode we can continue only if
ScaleReg is not set due to we will use it as merge of different offsets.
It should fix PR35799 and PR35805.
Reviewers: john.brawn, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41227
llvm-svn: 322056
This patch was part of:
https://reviews.llvm.org/D41338
...but we can expose the bug in IR via constant propagation
as shown in the test. Unless the triple includes 'linux', we
should not fold these because the functions don't exist on
other platforms (yet?).
llvm-svn: 322010
This is an attempt of fixing PR35807.
Due to the non-standard definition of dominance in LLVM, where uses in
unreachable blocks are dominated by anything, you can have, in an
unreachable block:
%patatino = OP1 %patatino, CONSTANT
When `SimplifyInstruction` receives a PHI where an incoming value is of
the aforementioned form, in some cases, loops indefinitely.
What I propose here instead is keeping track of the incoming values
from unreachable blocks, and replacing them with undef. It fixes this
case, and it seems to be good regardless (even if we can't prove that
the value is constant, as it's coming from an unreachable block, we
can ignore it).
Differential Revision: https://reviews.llvm.org/D41812
llvm-svn: 322006
There is precedence for factorization transforms in instcombine for FP ops with fast-math.
We also have similar logic in foldSPFofSPF().
It would take more work to add this to reassociate because that's specialized for binops,
and min/max are not binops (or even single instructions). Also, I don't have evidence that
larger min/max trees than this exist in real code, but if we find that's true, we might
want to reorganize where/how we do this optimization.
In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have:
int test(int xc, int xm, int xy) {
int xk;
if (xc < xm)
xk = xc < xy ? xc : xy;
else
xk = xm < xy ? xm : xy;
return xk;
}
This patch solves that problem because we recognize more min/max patterns after rL321672
https://rise4fun.com/Alive/Qjnehttps://rise4fun.com/Alive/3yg
Differential Revision: https://reviews.llvm.org/D41603
llvm-svn: 321998
Summary:
Fixes the bug with incorrect handling of InsertValue|InsertElement
instrucions in SLP vectorizer. Currently, we may use incorrect
ExtractElement instructions as the operands of the original
InsertValue|InsertElement instructions.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41767
llvm-svn: 321994
Summary:
If the vectorized value is marked as extra reduction argument, its users
are not considered as external users. Patch fixes this.
Reviewers: mkuper, hfinkel, RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41786
llvm-svn: 321993
The approach was never discussed, I wasn't able to reproduce this
non-determinism, and the original author went AWOL.
After a discussion on the ML, Philip suggested to revert this.
llvm-svn: 321974
In addition to target-dependent attributes, we can also preserve a
white-listed subset of target independent function attributes. The white-list
excludes problematic attributes, most prominently:
* attributes related to memory accesses, as alloca instructions
could be moved in/out of the extracted block
* control-flow dependent attributes, like no_return or thunk, as the
relerelevant instructions might or might not get extracted.
Thanks @efriedma and @aemerson for providing a set of attributes that cannot be
propagated.
Reviewers: efriedma, davidxl, davide, silvas
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41334
llvm-svn: 321961
If the varargs are not accessed by a function, we can inline the
function.
Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D41335
llvm-svn: 321940
In the minimal case, this won't remove instructions, but it still improves
uses of existing values.
In the motivating example from PR35834, it does remove instructions, and
sets that case up to be optimized by something like D41603:
https://reviews.llvm.org/D41603
llvm-svn: 321936
This is the last step needed to fix PR33325:
https://bugs.llvm.org/show_bug.cgi?id=33325
We're trading branch and compares for loads and logic ops.
This makes the code smaller and hopefully faster in most cases.
The 24-byte test shows an interesting construct: we load the trailing scalar
elements into vector registers and generate the same pcmpeq+movmsk code that
we expected for a pair of full vector elements (see the 32- and 64-byte tests).
Differential Revision: https://reviews.llvm.org/D41714
llvm-svn: 321934
Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b),
the use checking and operand creation were off. We were potentially creating
repeated identical instructions of existing values. This led to infinite
looping after I added the extra folds.
By using the simpler m_Not matcher and not creating new 'not' ops for a and b,
we avoid that problem. It's possible that not using IsFreeToInvert() here is
more limiting than the simpler matcher, but there are no tests for anything
more exotic. It's also possible that we should relax the use checking further
to handle a case like PR35834:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but we can make that a follow-up if it is needed.
llvm-svn: 321882
Summary:
See D37528 for a previous (non-deferred) version of this
patch and its description.
Preserves dominance in a deferred manner using a new class
DeferredDominance. This reduces the performance impact of
updating the DominatorTree at every edge insertion and
deletion. A user may call DDT->flush() within JumpThreading
for an up-to-date DT. This patch currently has one flush()
at the end of runImpl() to ensure DT is preserved across
the pass.
LVI is also preserved to help subsequent passes such as
CorrelatedValuePropagation. LVI is simpler to maintain and
is done immediately (not deferred). The code to perfom the
preversation was minimally altered and was simply marked
as preserved for the PassManager to be informed.
This extends the analysis available to JumpThreading for
future enhancements. One example is loop boundary threading.
Reviewers: dberlin, kuhar, sebpop
Reviewed By: kuhar, sebpop
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D40146
llvm-svn: 321825
Summary:
I basically copied this patch from here:
https://reviews.llvm.org/D1251
But I skipped some of the refactoring to make the patch more clean.
The new outer3/inner3 test case in ptr-diff.ll triggers the
following assert without this patch:
lib/IR/Constants.cpp:1834: static llvm::Constant *llvm::ConstantExpr::getCompare(unsigned short, llvm::Constant *, llvm::Constant *, bool): Assertion `C1->getType() == C2->getType() && "Op types should be identical!"' failed.
The other new test cases makes sure that there is code coverage
for all modifications in InlineCost.cpp (getting different values
due to not fetching sizes for address space zero). I only guarantee
code coverage for those tests. The tests are not written in a way
that they would break if not having the corrections in
InlineCost.cpp. I found it quite hard to fine tune the tests into
getting different results based on the pointer sizes (except for
the test case where we hit an assert if not teaching InlineCost
about address spaces).
Reviewers: chandlerc, arsenm, haicheng
Reviewed By: arsenm
Subscribers: wdng, eraman, llvm-commits, haicheng
Differential Revision: https://reviews.llvm.org/D40455
llvm-svn: 321809
These were inspired by a very old review I'm about to abandon (https://reviews.llvm.org/D7061). Several of the test cases from that worked without modification and expanding test coverage of such cases is always worthwhile.
llvm-svn: 321764
The work order was changed in r228186 from SCC order
to RPO with an arbitrary sorting function. The sorting
function attempted to move inner loop nodes earlier. This
was was apparently relying on an assumption that every block
in a given loop / the same loop depth would be seen before
visiting another loop. In the broken testcase, a block
outside of the loop was encountered before moving onto
another block in the same loop. The testcase would then
structurize such that one blocks unconditional successor
could never be reached.
Revert to plain RPO for the analysis phase. This fixes
detecting edges as backedges that aren't really.
The processing phase does use another visited set, and
I'm unclear on whether the order there is as important.
An arbitrary order doesn't work, and triggers some infinite
loops. The reversed RPO list seems to work and is closer
to the order that was used before, minus the arbitary
custom sorting.
A few of the changed tests now produce smaller code,
and a few are slightly worse looking.
llvm-svn: 321751