At least one buildbot was able to actually trigger that assert
on the top of the function. Will investigate.
This reverts commit r339610.
llvm-svn: 339612
Summary:
This comes with `Implicit Conversion Sanitizer - integer sign change` (D50250):
```
signed char test(unsigned int x) { return x; }
```
`clang++ -fsanitize=implicit-conversion -S -emit-llvm -o - /tmp/test.cpp -O3`
* Old: {F6904292}
* With this patch: {F6904294}
General pattern:
X & Y
Where `Y` is checking that all the high bits (covered by a mask `4294967168`)
are uniform, i.e. `%arg & 4294967168` can be either `4294967168` or `0`
Pattern can be one of:
%t = add i32 %arg, 128
%r = icmp ult i32 %t, 256
Or
%t0 = shl i32 %arg, 24
%t1 = ashr i32 %t0, 24
%r = icmp eq i32 %t1, %arg
Or
%t0 = trunc i32 %arg to i8
%t1 = sext i8 %t0 to i32
%r = icmp eq i32 %t1, %arg
This pattern is a signed truncation check.
And `X` is checking that some bit in that same mask is zero.
I.e. can be one of:
%r = icmp sgt i32 %arg, -1
Or
%t = and i32 %arg, 2147483648
%r = icmp eq i32 %t, 0
Since we are checking that all the bits in that mask are the same,
and a particular bit is zero, what we are really checking is that all the
masked bits are zero.
So this should be transformed to:
%r = icmp ult i32 %arg, 128
https://rise4fun.com/Alive/3Ou
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: RKSimon, erichkeane, vsk, llvm-commits
Differential Revision: https://reviews.llvm.org/D50465
llvm-svn: 339610
This is a very partial fix for the reported problem. I suspect
we do not get this fold in most motivating cases because most of
the time, the libcall would have been replaced by an intrinsic,
and that optimization is handled elsewhere...but maybe it should
be handled here?
llvm-svn: 339604
This is a second part of D49974 that handles widening of conditional branches that
have very likely `false` branch.
Differential Revision: https://reviews.llvm.org/D50040
Reviewed By: reames
llvm-svn: 339537
Summary: computeKnownBits is expensive. The cases that would be detected by the computeKnownBits portion of haveNoCommonBitsSet were already handled by the earlier call to SimplifyDemandedInstructionBits.
Reviewers: spatel, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50604
llvm-svn: 339531
Try to improve the computed counts when it has been explicitly set by a pragma
or command line option. This moves the code around, so that first call to
computeUnrollCount to get a sensible count and override that if explicit unroll
and jam counts are specified.
Also added some extra debug messages for when unroll and jamming is disabled.
Differential Revision: https://reviews.llvm.org/D50075
llvm-svn: 339501
Pulled out a separate function for some code that calculates
if an inner loop iteration count is invariant to it's outer
loop.
Differential Revision: https://reviews.llvm.org/D50063
llvm-svn: 339500
My previous change moved some code upwards which caused an assert in debug mode
because the global value didn't necessarily have an initializer. Don't do that.
llvm-svn: 339485
If we have an assume which is known to execute and whose operand is invariant, we can lift that into the pre-header. So long as we don't change which paths the assume executes on, this is a legal transformation. It's likely to be a useful canonicalization as other transforms only look for dominating assumes.
Differential Revision: https://reviews.llvm.org/D50364
llvm-svn: 339481
This is a retry of rL339439 with a fix for the problem that
caused the original commit to be reverted at rL339446.
That problem was that the compare can be integer while
the binop is FP or vice-versa, so we need to use the binop
type when we ask for the identity constant.
A test to guard against the problem was added at rL339453.
llvm-svn: 339469
Summary: Similar to asan's flag, it can be used to disable the use of ifunc to access hwasan shadow address.
Reviewers: vitalybuka, kcc
Subscribers: srhines, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50544
llvm-svn: 339447
If code is compiled for X86 without SSE support, the register save area
doesn't contain FPU registers, so `AMD64FpEndOffset` should be equal to
`AMD64GpEndOffset`.
llvm-svn: 339414
This makes the code easier to read and will make an upcoming patch I have easier to review because that patch needed this refactoring to reuse some of the functions.
llvm-svn: 339391
The motivating case is an otherwise dead loop with a fence in it. At the moment, this goes all the way through the optimizer and we end up emitting an entirely pointless loop on x86. This case may seem a bit contrived, but we've seen it in real code as the result of otherwise reasonable lowering strategies combined w/thread local memory optimizations (such as escape analysis).
To handle this simple case, we can teach LICM to hoist must execute fences when there is no other memory operation within the loop.
Differential Revision: https://reviews.llvm.org/D50489
llvm-svn: 339378
Summary:
LoopSimplifyCFG should update ScEv for all loops after a block is deleted.
If the deleted block "Succ" is part of L, then it is part of all parent loops, so forget topmost loop.
Reviewers: greened, mkazantsev, sanjoy
Subscribers: jlebar, javed.absar, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D50422
llvm-svn: 339363
The inalloca parameter has to be the only parameter passed in memory.
Changing the convention to fastcc can break that.
At some point we should teach global opt how to optimize ABI attributes
like inalloca and maybe byval. These attributes are mainly used to match
C ABIs. They are harder for LLVM to optimize and they don't always
generate the best code.
Fixes PR38487
llvm-svn: 339360
Summary: DenseMap's operator[] performs an insertion if the entry isn't found. The second phase of ConstantMerge isn't trying to insert anything: it's just looking to see if the first phased performed an insertion. Use find instead, avoiding insertion of every single global initializer in the map of constants. This has the side-effect of making all entries in CMap non-null (because only global declarations would have null initializers, and that would be a bug).
Subscribers: dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D50476
llvm-svn: 339309
This accounts for the missing IR fold noted in D50195. We don't need any fast-math to enable the negation transform.
FP negation can always be folded into an fmul/fdiv constant to eliminate the fneg.
I've limited this to one-use to ensure that we are eliminating an instruction rather than replacing fneg by a
potentially expensive fdiv or fmul.
Differential Revision: https://reviews.llvm.org/D50417
llvm-svn: 339248
Summary:
https://rise4fun.com/Alive/IT3
Comes up in the [most ugliest] `signed int` -> `signed char` case of
`-fsanitize=implicit-conversion` (https://reviews.llvm.org/D50250)
Previously, we were stuck with `not`: {F6867736}
But now we are able to completely get rid of it: {F6867737}
(FIXME: why are we loosing the metadata? that seems wrong/strange.)
Here, we only want to do that it we will be able to completely
get rid of that 'not'.
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: vsk, erichkeane, llvm-commits
Differential Revision: https://reviews.llvm.org/D50301
llvm-svn: 339243
Summary:
Reworked the previously committed patch to insert shuffles for reused
extract element instructions in the correct position. Previous logic was
incorrect, and might lead to the crash with PHIs and EH instructions.
Reviewers: efriedma, javed.absar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50143
llvm-svn: 339166
In combineMetadata, we should be able to preserve K's nonnull metadata,
if K does not move. This condition should hold for all replacements by
NewGVN/GVN, but I added a bunch of assertions to verify that.
Fixes PR35038.
There probably are additional kinds of metadata that could be preserved
using similar reasoning. This is follow-up work.
Reviewers: dberlin, davide, efriedma, nlopes
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D47339
llvm-svn: 339149
This function is shared between both implementations. I am not sure if
Utils/Local.h is the best place though.
Reviewers: davide, dberlin, efriedma, xbolva00
Reviewed By: efriedma, xbolva00
Differential Revision: https://reviews.llvm.org/D47337
llvm-svn: 339138
Logic for tracking implicit control flow instructions was added to GVN to
perform PRE optimizations correctly. It appears that GVN is not the only
optimization that sometimes does PRE, so this logic is required in other
places (such as Jump Threading).
This is an NFC patch that encapsulates all ICF-related logic in a dedicated
utility class separated from GVN.
Differential Revision: https://reviews.llvm.org/D40293
llvm-svn: 339086
Properly shrink `pow()` to `powf()` as a binary function and, when no other
simplification applies, do not discard it.
Differential revision: https://reviews.llvm.org/D50113
llvm-svn: 339046
If there is a frequently taken branch dominated by a guard, and its condition is available
at the point of the guard, we can widen guard with condition of this branch and convert
the branch into unconditional:
guard(cond1)
if (cond2) {
// taken in 99.9% cases
// do something
} else {
// do something else
}
Converts to
guard(cond1 && cond2)
// do something
Differential Revision: https://reviews.llvm.org/D49974
Reviewed By: reames
llvm-svn: 338988
In the past, DbgInfoIntrinsic has a strong assumption that these
intrinsics all have variables and expressions attached to them.
However, it is too strong to derive the class for other debug entities.
Now, it has problems for debug labels.
In order to make DbgInfoIntrinsic as a base class for 'debug info', I
create a class for 'variable debug info', DbgVariableIntrinsic.
DbgDeclareInst, DbgAddrIntrinsic, and DbgValueInst will be derived from it.
Differential Revision: https://reviews.llvm.org/D50220
llvm-svn: 338984
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338969
Summary:
Previously, in the NewPM pipeline, TailCallElim recalculates the DomTree when it modifies any instruction in the Function.
For example,
```
CallInst *CI = dyn_cast<CallInst>(&I);
...
CI->setTailCall();
Modified = true;
...
if (!Modified || ...)
return PreservedAnalyses::all();
```
After applying this patch, the DomTree only recalculates if needed (plus an extra insertEdge() + an extra deleteEdge() call).
When optimizing SQLite with `-passes="default<O3>"` pipeline of the newPM, the number of DomTree recalculation decreases by 6.2%, the number of nodes visited by DFS decreases by 2.9%. The time used by DomTree will decrease approximately 1%~2.5% after applying the patch.
Statistics:
```
Before the patch:
23010 dom-tree-stats - Number of DomTree recalculations
489264 dom-tree-stats - Number of nodes visited by DFS -- DomTree
After the patch:
21581 dom-tree-stats - Number of DomTree recalculations
475088 dom-tree-stats - Number of nodes visited by DFS -- DomTree
```
Reviewers: kuhar, dmgreen, brzycki, grosser, davide
Reviewed By: kuhar, brzycki
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49982
llvm-svn: 338954
Merge the helper functions for shrinking unary and binary functions into a
single one, while keeping all their functionality. Otherwise, NFC.
llvm-svn: 338905
In r337830 I added SCEV checks to enable us to insert fewer bounds checks. Unfortunately, this sometimes crashes when multiple bounds checks are added due to SCEV caching issues. This patch splits the bounds checking pass into two phases, one that computes all the conditions (using SCEV checks) and the other that adds the new instructions.
Differential Revision: https://reviews.llvm.org/D49946
llvm-svn: 338902
Summary:
Previously, `removeUnreachableBlocks` still returns true (which indicates the CFG is changed) even when all the unreachable blocks found is awaiting deletion in the DDT class.
This makes code pattern like
```
// Code modified from lib/Transforms/Scalar/SimplifyCFGPass.cpp
bool EverChanged = removeUnreachableBlocks(F, nullptr, DDT);
...
do {
EverChanged = someMightHappenModifications();
EverChanged |= removeUnreachableBlocks(F, nullptr, DDT);
} while (EverChanged);
```
become a dead loop.
Fix this by detecting whether a BasicBlock is already awaiting deletion.
Reviewers: kuhar, brzycki, dmgreen, grosser, davide
Reviewed By: kuhar, brzycki
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49738
llvm-svn: 338882
Summary:
This patch is the second in a series of patches related to the [[ http://lists.llvm.org/pipermail/llvm-dev/2018-June/123883.html | RFC - A new dominator tree updater for LLVM ]].
It converts passes (e.g. adce/jump-threading) and various functions which currently accept DDT in local.cpp and BasicBlockUtils.cpp to use the new DomTreeUpdater class.
These converted functions in utils can accept DomTreeUpdater with either UpdateStrategy and can deal with both DT and PDT held by the DomTreeUpdater.
Reviewers: brzycki, kuhar, dmgreen, grosser, davide
Reviewed By: brzycki
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48967
llvm-svn: 338814
This one requires a bit of explaination. It's not every day you simply delete code to implement an optimization. :)
The transform in question is sinking an instruction from a loop to the uses in loop exiting blocks. We know (from LCSSA) that all of the uses outside the loop must be phi nodes, and after predecessor splitting, we know all phi users must have a single operand. Since the use must be strictly dominated by the def, we know from the definition of dominance/ssa that the exit block must execute along a (non-strict) subset of paths which reach the def. As a result, duplicating a potentially faulting instruction can not *introduce* a fault that didn't previously exist in the program.
The full story is that this patch builds on "rL338671: [LICM] Factor out fault legality from canHoistOrSinkInst [NFC]" which pulled this logic out of a common helper routine. As best I can tell, this check was originally added to the helper function for hoisting legality, later an incorrect fastpath for loads/calls was added, and then the bug was fixed by duplicating the fault safety check in the hoist path. This left the redundant check in the common code to pessimize sinking for no reason. I split it out in an NFC, and am not removing the unneccessary check. I wanted there to be something easy to revert in case I missed something.
Reviewed by: Anna Thomas (in person)
llvm-svn: 338794
Adds some cleaned up debug messages from back when I was writing this.
Hopefully useful to others (and myself) as to why unroll and jam is not
transforming as expected.
Differential Revision: https://reviews.llvm.org/D50062
llvm-svn: 338676
This method has three callers, each of which wanted distinct handling:
1) Sinking into a loop is moving an instruction known to execute before a loop into the loop. We don't need to worry about introducing a fault at all in this case.
2) Hoisting from a loop into a preheader already duplicated the check in the caller.
3) Sinking from the loop into an exit block was the only true user of the code within the routine. For the moment, this has just been lifted into the caller, but up next is examining the logic more carefully. Whitelisting of loads and calls - while consistent with the previous code - is rather suspicious. Either way, a behavior change is worthy of it's own patch.
llvm-svn: 338671
Originally, this was part of a larger refactoring I'd planned, but had to abandoned. I figured the minor improvement in readability was worthwhile.
llvm-svn: 338663
(Previously reverted in r338442)
I'm told that the breakage came from us using an x86 triple on configs
that didn't have x86 enabled. This is remedied by moving the
debugcounter test to an x86 directory (where there's also a
opt-bisect-isel.ll test for similar reasons).
I can't repro the reverse-iteration failure mentioned in the revert with
this patch, so I assume that a misconfiguration on my end is what caused
that.
Original commit message:
Add DebugCounters to DivRemPairs
For people who don't use DebugCounters, NFCI.
Patch by Zhizhou Yang!
Differential Revision: https://reviews.llvm.org/D50033
llvm-svn: 338653
This patch just extract code into a separate function to remove some
duplication between the old and new pass manager pipeline. Due to the
different CGSCC iterators used, not all code duplication was eliminated.
llvm-svn: 338585
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338494
Workaround bug where the InstCombine pass was asserting on the IR added in lit
test, where we have a bitcast instruction after a GEP from an addrspace cast.
The second bitcast in the test was getting combined into
`bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)*`, which looks like it should
be an addrspace cast instruction instead. Otherwise if control flow is allowed
to continue as it is now we create a GEP instruction
`<badref> = getelementptr inbounds <16 x i32>, <16 x i32>* %0, i32 0`. However
because the type of this instruction doesn't match the address space we hit an
assert when replacing the bitcast with that GEP.
```
void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
```
Differential Revision: https://reviews.llvm.org/D50058
llvm-svn: 338395
Summary:
When inserting lcssa Phi Nodes in the exit block
mak sure to preserve the original instructions DL.
Reviewers: vsk
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D50009
llvm-svn: 338391
Summary:
This patch improves Inliner to provide causes/reasons for negative inline decisions.
1. It adds one new message field to InlineCost to report causes for Always and Never instances. All Never and Always instantiations must provide a simple message.
2. Several functions that used to return the inlining results as boolean are changed to return InlineResult which carries the cause for negative decision.
3. Changed remark priniting and debug output messages to provide the additional messages and related inline cost.
4. Adjusted tests for changed printing.
Patch by: yrouban (Yevgeny Rouban)
Reviewers: craig.topper, sammccall, sgraenitz, NutshellySima, shchenz, chandlerc, apilipenko, javed.absar, tejohnson, dblaikie, sanjoy, eraman, xbolva00
Reviewed By: tejohnson, xbolva00
Subscribers: xbolva00, llvm-commits, arsenm, mehdi_amini, eraman, haicheng, steven_wu, dexonsmith
Differential Revision: https://reviews.llvm.org/D49412
llvm-svn: 338387
Summary:
If the ExtractElement instructions can be optimized out during the
vectorization and we need to reshuffle the parent vector, this
ShuffleInstruction may be inserted in the wrong place causing compiler
to produce incorrect code.
Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49928
llvm-svn: 338380
This fold was written in an odd way and tried to avoid
an endless loop by bailing out on all constants instead
of the supposedly problematic case of -1. But (X & -1)
should always be simplified before we reach here, so I'm
not sure how that is a problem.
There were no tests for the commuted patterns, so I added
those at rL338364.
llvm-svn: 338367
The patch introduces loop analysis (VPLoopInfo/VPLoop) for VPBlockBases.
This analysis will be necessary to perform some H-CFG transformations and
detect and introduce regions representing a loop in the H-CFG.
Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48816
llvm-svn: 338346
The patch introduces dominator analysis for VPBlockBases and extend
VPlan's GraphTraits specialization with the required interfaces. Dominator
analysis will be necessary to perform some H-CFG transformations and
to introduce VPLoopInfo (LoopInfo analysis on top of the VPlan representation).
Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D48815
llvm-svn: 338310
These are reassociated versions of the same pattern and
similar transforms as in rL338200 and rL338118.
The motivation is identical to those commits:
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.
llvm-svn: 338221
https://rise4fun.com/Alive/jDd
Patterns with add/sub combos can be improved using
'not' ops. This is better for analysis and may lead
to follow-on transforms because 'xor' and 'add' are
commutative/associative. It can also help codegen.
llvm-svn: 338200
We now, from clang, can turn arrays of
static short g_data[] = {16, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0};
into structs of the form
@g_data = internal global <{ [8 x i16], [8 x i16] }> ...
GlobalOpt will incorrectly SROA it, not realising that the access to the first
element may overflow into the second. This fixes it by checking geps more
thoroughly.
I believe this makes the globalsra-partial.ll test case invalid as the %i value
could be out of bounds. I've re-purposed it as a negative test for this case.
Differential Revision: https://reviews.llvm.org/D49816
llvm-svn: 338192
Summary:
Fixing 2 issues with the DT update in trivial branch switching, though I don't have a case where DT update fails.
1. After splitting ParentBB->UnswitchedBB edge, new edges become: ParentBB->LoopExitBB->UnswitchedBB, so remove ParentBB->LoopExitBB edge.
2. AFAIU, for multiple CFG changes, DT should be updated using batch updates, vs consecutive addEdge and removeEdge calls.
Reviewers: chandlerc, kuhar
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49925
llvm-svn: 338180
The tests with constants show a missing optimization.
Analysis for adds is better than subs, so this can also
help with other transforms. And codegen is better with
adds for targets like x86 (destructive ops, no sub-from).
https://rise4fun.com/Alive/llK
llvm-svn: 338118
This is a follow-up for the patch rL335020. When we replace compares against
trunc with compares against wide IV, we can also replace signed predicates with
unsigned where it is legal.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D48763
llvm-svn: 338115
LowerDbgDeclare inserts a dbg.value before each use of an address
described by a dbg.declare. When inserting a dbg.value before a CallInst
use, however, it fails to append DW_OP_deref to the DIExpression.
The DW_OP_deref is needed to reflect the fact that a dbg.value describes
a source variable directly (as opposed to a dbg.declare, which relies on
pointer indirection).
This patch adds in the DW_OP_deref where needed. This results in the
correct values being shown during a debug session for a program compiled
with ASan and optimizations (see https://reviews.llvm.org/D49520). Note
that ConvertDebugDeclareToDebugValue is already correct -- no changes
there were needed.
One complication is that SelectionDAG is unable to distinguish between
direct and indirect frame-index (FRAMEIX) SDDbgValues. This patch also
fixes this long-standing issue in order to not regress integration tests
relying on the incorrect assumption that all frame-index SDDbgValues are
indirect. This is a necessary fix: the newly-added DW_OP_derefs cannot
be lowered properly otherwise. Basically the fix prevents a direct
SDDbgValue with DIExpression(DW_OP_deref) from being dereferenced twice
by a debugger. There were a handful of tests relying on this incorrect
"FRAMEIX => indirect" assumption which actually had incorrect
DW_AT_locations: these are all fixed up in this patch.
Testing:
- check-llvm, and an end-to-end test using lldb to debug an optimized
program.
- Existing unit tests for DIExpression::appendToStack fully cover the
new DIExpression::append utility.
- check-debuginfo (the debug info integration tests)
Differential Revision: https://reviews.llvm.org/D49454
llvm-svn: 338069
Create a processHeaderPhiOperands for analysing the instructions
in the aft blocks that must be moved before the loop.
Differential Revision: https://reviews.llvm.org/D49061
llvm-svn: 338033
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this
patch`areConsecutivePointers` method would attempt to handle select
instructions. This leads to an increased number of successful
vectorizations.
Technically, select instructions could appear in index arithmetic as
well, however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in IR shapes computing integral
indices in general than in what's common in pointer computations, and
it appears that it's quite unreliable to do anything short of making
select instructions first class citizens of Scalar Evolution, which
for the purposes of this patch is most definitely an overkill.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D49428
llvm-svn: 337965
r337828 resolves a PredicateInfo issue with unnamed types.
Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
llvm-svn: 337904
This ports the profiling runtime on Fuchsia and enables the
instrumentation. Unlike on other platforms, Fuchsia doesn't use
files to dump the instrumentation data since on Fuchsia, filesystem
may not be accessible to the instrumented process. We instead use
the data sink to pass the profiling data to the system the same
sanitizer runtimes do.
Differential Revision: https://reviews.llvm.org/D47208
llvm-svn: 337881
Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash).
Reviewers: sbaranga, mssimpso, hsaito
Reviewed By: hsaito
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49461
llvm-svn: 337861
This patch uses SCEV to avoid inserting some bounds checks when they are not needed. This slightly improves the performance of code compiled with the bounds check sanitizer.
Differential Revision: https://reviews.llvm.org/D49602
llvm-svn: 337830
This is a workaround and it would be better to fix this generally, but
doing it generally is quite tricky. See D48541 and PR38117.
Doing it in PredicateInfo directly allows us to use the type address to
differentiate different unnamed types, because neither the created
declarations nor the ssa_copy calls should be visible after
PredicateInfo got destroyed.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D49126
llvm-svn: 337828
Summary:
Without this change, the WholeProgramDevirt pass, which requires the
TargetLibraryInfo, will construct one from the default triple.
Fixes PR38139.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49278
llvm-svn: 337750
This patch makes debug counters keep track of the total number of times
we've called `shouldExecute` for each counter, so it's easier to build
automated tooling on top of these.
A patch to print these counts is coming soon.
Patch by Zhizhou Yang!
Differential Revision: https://reviews.llvm.org/D49560
llvm-svn: 337748
In ConstructSSAForLoadSet if an available value is actually the load that we're
doing SSA construction to eliminate, then we can omit it as SSAUpdate will add
in the value for the phi that will be replacing it anyway. This can result in
simpler IR which can allow further optimisation.
Differential Revision: https://reviews.llvm.org/D44160
llvm-svn: 337686
Bug fix for PR36787. When reasoning if it's safe to hoist a load we
want to make sure that the defining memory access dominates the new
insertion point of the hoisted instruction. safeToHoistLdSt calls
firstInBB(InsertionPoint,DefiningAccess) which returns false if
InsertionPoint == DefiningAccess, and therefore it falsely thinks
it's safe to hoist.
Differential Revision: https://reviews.llvm.org/D49555
llvm-svn: 337674
We tested different cap values with a recent commit of Chromium. Our results show that the 32-byte cap yields the smallest binary and all the caps yield similar performance.
Based on the results, we propose to change the cap value to 32-byte.
Patch by Zhaomo Yang!
Differential Revision: https://reviews.llvm.org/D49405
llvm-svn: 337622
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)
llvm-svn: 337606
When pointer checking is enabled, it's important that every pointer is
checked before its value is used.
For stores MSan used to generate code that calculates shadow/origin
addresses from a pointer before checking it.
For userspace this isn't a problem, because the shadow calculation code
is quite simple and compiler is able to move it after the check on -O2.
But for KMSAN getShadowOriginPtr() creates a runtime call, so we want the
check to be performed strictly before that call.
Swapping materializeChecks() and materializeStores() resolves the issue:
both functions insert code before the given IR location, so the new
insertion order guarantees that the code calculating shadow address is
between the address check and the memory access.
llvm-svn: 337571
This version contains a fix to add values for which the state in ParamState change
to the worklist if the state in ValueState did not change. To avoid adding the
same value multiple times, mergeInValue returns true, if it added the value to
the worklist. The value is added to the worklist depending on its state in
ValueState.
Original message:
For comparisons with parameters, we can use the ParamState lattice
elements which also provide constant range information. This improves
the code for PR33253 further and gets us closer to use
ValueLatticeElement for all values.
Also, as we are using the range information in the solver directly, we
do not need tryToReplaceWithConstantRange afterwards anymore.
Reviewers: dberlin, mssimpso, davide, efriedma
Reviewed By: mssimpso
Differential Revision: https://reviews.llvm.org/D43762
llvm-svn: 337548
It's more aggressive than we need to be, and leads to strange
workarounds in other places like call return value inference. Instead,
just directly mark an edge viable.
Tests by Florian Hahn.
Differential Revision: https://reviews.llvm.org/D49408
llvm-svn: 337507
This is mostly a preparation work for adding a limited support for
select instructions. It proved to be difficult to do due to size and
irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I
believe.
It also turned out that these changes make it simpler to finish one of
the TODOs and fix a number of other small issues, namely:
1. Looking through bitcasts to a type of a different size (requires
careful tracking of the original load/store size and some math
converting sizes in bytes to expected differences in indices of GEPs).
2. Reusing partial analysis of pointers done by first attempt in proving
them consecutive instead of starting from scratch. This added limited
support for nested GEPs co-existing with difficult sext/zext
instructions. This also required a careful handling of negative
differences between constant parts of offsets.
3. Handing a case where the first pointer index is not an add, but
something else (a function parameter for instance).
I observe an increased number of successful vectorizations on a large
set of shader programs. Only few shaders are affected, but those that
are affected sport >5% less loads and stores than before the patch.
Reviewed By: rampitec
Differential-Revision: https://reviews.llvm.org/D49342
llvm-svn: 337489
Summary: Currently, isConsecutiveAccess() detects two pointers(PtrA and PtrB) as consecutive by
comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or
both of them are not factorized. But isConsecutiveAccess() fails if one of the
pointers is factorized but the other one is not.
Here is an example:
PtrA = 4 * (A + B)
PtrB = 4 + 4A + 4B
This patch uses getMinusSCEV() to compute the distance between two pointers.
getMinusSCEV() allows combining the expressions and computing the simplified distance.
Author: FarhanaAleen
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D49516
llvm-svn: 337471
Summary:
Enable these passes for CFI and WPD in ThinLTO and LTO with the new pass
manager. Add a couple of tests for both PMs based on the clang tests
tools/clang/test/CodeGen/thinlto-distributed-cfi*.ll, but just test
through llvm-lto2 and not with distributed ThinLTO.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D49429
llvm-svn: 337461
This prevents gold from printing a warning when trying to export
these symbols via the asan dynamic list after ThinLTO promotes them
from private symbols to external symbols with hidden visibility.
Differential Revision: https://reviews.llvm.org/D49498
llvm-svn: 337428
Summary:
The optimizer is 10%+ slower with vs without debuginfo. I started checking where
the difference is coming from.
I compiled sqlite3.c with and without debug info from CTMark and compare the time difference.
I use Xcode Instrument to find where time is spent. This brings about 20ms, out of ~20s.
Reviewers: davide, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D49337
llvm-svn: 337416
Pulled out from D49225, we have a lot of repeated scalar cost calculations, often with arguments that don't look the same but turn out to be.
llvm-svn: 337390
InstCombine has a cast transform that matches a cast-of-select:
Orig = cast (Src = select Cond TV FV)
And tries to replace it with a select which has the cast folded in:
NewSel = select Cond (cast TV) (cast FV)
The combiner does RAUW(Orig, NewSel), so any debug values for Orig would
survive the transform. But debug values for Src would be lost.
This patch teaches InstCombine to replace all debug uses of Src with
NewSel (taking care of doing any necessary DIExpression rewriting).
Differential Revision: https://reviews.llvm.org/D49270
llvm-svn: 337310
Once we resolved an undef in a function we can run Solve, which could
lead to finding a constant return value for the function, which in turn
could turn undefs into constants in other functions that call it, before
resolving undefs there.
Computationally the amount of work we are doing stays the same, just the
order we process things is slightly different and potentially there are
a few less undefs to resolve.
We are still relying on the order of functions in the IR, which means
depending on the order, we are able to resolve the optimal undef first
or not. For example, if @test1 comes before @testf, we find the constant
return value of @testf too late and we cannot use it while solving
@test1.
This on its own does not lead to more constants removed in the
test-suite, probably because currently we have to be very lucky to visit
applicable functions in the right order.
Maybe we manage to come up with a better way of resolving undefs in more
'profitable' functions first.
Reviewers: efriedma, mssimpso, davide
Reviewed By: efriedma, davide
Differential Revision: https://reviews.llvm.org/D49385
llvm-svn: 337283
TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed its better to limit horizontal reduction to integer/float vector types only.
llvm-svn: 337280
Similarly to rL336736, at least one more C API function does not
properly get declared as extern "C" due to a missing header, causing
name mangling and linking errors.
This patch fixes calls to LLVMAddAggressiveInstCombinerPass().
Differential Revision: https://reviews.llvm.org/D49416
Reviewed By: whitequark
llvm-svn: 337264
Summary:
[[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]]
As discussed in https://reviews.llvm.org/D49179#1158957 and later,
the IR for 'check for [no] signed truncation' pattern can be improved:
https://rise4fun.com/Alive/gBf
^ that pattern will be produced by Implicit Integer Truncation sanitizer,
https://reviews.llvm.org/D48958https://bugs.llvm.org/show_bug.cgi?id=21530
in signed case, therefore it is probably a good idea to improve it.
Proofs for this transform: https://rise4fun.com/Alive/mgu
This transform is surprisingly frustrating.
This does not deal with non-splat shift amounts, or with undef shift amounts.
I've outlined what i think the solution should be:
```
// Potential handling of non-splats: for each element:
// * if both are undef, replace with constant 0.
// Because (1<<0) is OK and is 1, and ((1<<0)>>1) is also OK and is 0.
// * if both are not undef, and are different, bailout.
// * else, only one is undef, then pick the non-undef one.
```
The DAGCombine will reverse this transform, see
https://reviews.llvm.org/D49266
Reviewers: spatel, craig.topper
Reviewed By: spatel
Subscribers: JDevlieghere, rkruppe, llvm-commits
Differential Revision: https://reviews.llvm.org/D49320
llvm-svn: 337190
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.
Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).
There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.
I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.
(I tried a few other variations of the change but they also didn't
improve time or peak memory).
The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D48670
llvm-svn: 337184
This patch introduces createUserspaceApi() that creates function/global
declarations for symbols used by MSan in the userspace.
This is a step towards the upcoming KMSAN implementation patch.
Reviewed at https://reviews.llvm.org/D49292
llvm-svn: 337155