In the case that the CallInst that is being moved has an associated
operand bundle which is a funclet, the move will construct an invalid
instruction. The new site will have a different token and needs to be
reassociated with the new instruction.
Unfortunately, there is no way to alter the bundle after the
construction of the instruction. Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.
llvm-svn: 327336
LoopInstSimplify is unused and untested. Reading through the commit
history the pass also seems to have a high maintenance burden.
It would be best to retire the pass for now. It should be easy to
recover if we need something similar in the future.
Differential Revision: https://reviews.llvm.org/D44053
llvm-svn: 327329
Summary:
ProvenanceAnalysis::related(A, B) currently memoizes its results, and on big
tests the cache grows too large, and we're spending most of the time
growing/looking through DenseMap.
This patch reduces the size of the cache by normalizing keys first: we do that
by calling GetUnderlyingObjCPtr on the input values. The results of
GetUnderlyingObjCPtr are also memoized in a separate cache.
The patch doesn't bring noticable changes to compile time on CTMark, however
significantly helps one of our internal tests.
Reviewers: gottesmm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44270
llvm-svn: 327328
getNumUses is a linear time operation. It traverses the user linked list to the end and counts as it goes. Since we are only interested in small constant counts, we should use hasNUses or hasNUsesMore more that terminate the traversal as soon as it can provide the answer.
There are still two other locations in InstCombine, but changing those would force a rebase of D44266 which if accepted would remove them.
Differential Revision: https://reviews.llvm.org/D44398
llvm-svn: 327315
getNumUses is a linear operation. It walks a linked list to get a count. So in this case its better to just ask if there are any users rather than how many.
llvm-svn: 327314
This reverts r326908, originally landed as D44102.
Reverted for causing performance regressions on x86. (These regressions
are not yet understood.)
llvm-svn: 327252
Use isInlineViable to prevent inlining of functions with non-inlinable
constructs, in case cost analysis is skipped.
Reviewers: efriedma, sfertile, davide, davidxl
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D42846
llvm-svn: 327207
When hoisting common code from the "then" and "else" branches of a condition
to before the "if", there is no need to require that debug intrinsics match
before moving them (and merging them). Instead, we can simply always keep
all debug intrinsics from both sides of the "if".
This fixes PR36410, which describes a problem where as a result of the attempt
to merge debug locations for two debug intrinsics we end up with an invalid
intrinsic, where the scope indicated in the !dbg location no longer matches
the scope of the variable tracked by the intrinsic.
In addition, this has the benefit that we no longer throw away information
that is actually still valid, helping to generate better debug data.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D44312
llvm-svn: 327175
There are six separate instances of getPointerOperand() utility.
LoopVectorize.cpp has one of them,
and I don't want to create a 7th one while I'm trying to move
LoopVectorizationLegality into a separate file
(eventual objective is to move it to Analysis tree).
See http://lists.llvm.org/pipermail/llvm-dev/2018-February/120999.html
for llvm-dev discussions
Closes D43323.
Patch by Hideki Saito <hideki.saito@intel.com>.
llvm-svn: 327173
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.
This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.
The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.
For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html
Differential Revision: https://reviews.llvm.org/D42453
llvm-svn: 327163
In r263618, JumpThreading learned to look trough simple cast instructions, but
only if the source of those cast instructions was a phi/cmp i1 (in an effort to
limit compile time effects). I think this condition is too restrictive. For
switches with limited value range, InstCombine will readily introduce an extra
trunc instruction to a smaller integer type (e.g. from i8 to i2), leaving us in
the somewhat perverse situation that jump-threading would work before running
instcombine, but not after. Since instcombine produces this pattern, I think we
need to consider it canonical and support it in JumpThreading. In general,
for limiting recursion, I think the existing restriction to phi and cmp nodes
should be sufficient to avoid looking through unprofitable chains of
instructions.
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42262
llvm-svn: 327150
Fixes PR36311.
See more detailed analysis in
https://bugs.llvm.org/show_bug.cgi?id=36311.
isUniform() information is recomputed after LV started transforming the
underlying IR and that triggered an assert in SCEV.
From vectorizer's architectural perspective, such information, while
still useful in vector code gen, should not be recomputed after the
start of transforming the LLVM IR. Instead, we should collect and cache
such information during the analysis phase of LV and use the cached info
during code gen.
From the symptom perspective, this assert as it stands right now is not
very useful. Legality already rejected loops that would trigger the
assert. As such, commenting out the assert is NFC from vectorizer's
functionality perspective. On top of that, just above the assertion, we
check for unit-strided load/store or
gather scatter. Addresses can't be uniform below that check.
From vectorization theory point of view, we don't have to reject all
cases of stores to uniform addresses. Eventually, we should support
safe/profitable cases.
This patch resolves the issue by removing the useless assertion that is
invoking LAA's isUniform() that requires up-to-date DomTree ---- once
vector code gen starts modifying CFG, we don't have an up-to-date
DomTree.
Patch by Hideki Saito <hideki.saito@intel.com>.
llvm-svn: 327109
There is no point in lowering a dbg.declare describing an alloca that
has volatile loads or stores as users, since the alloca cannot be
elided. Lowering the dbg.declare will result in larger debug info that
may also have worse coverage than just describing the alloca.
rdar://problem/34496278
llvm-svn: 327092
This fixes a false positive ODR violation that is reported by ASan when using LTO. In cases, where two constant globals have the same value, LTO will merge them, which breaks ASan's ODR detection.
Differential Revision: https://reviews.llvm.org/D43959
llvm-svn: 327061
This fixes a false positive ODR violation that is reported by ASan when using LTO. In cases, where two constant globals have the same value, LTO will merge them, which breaks ASan's ODR detection.
Differential Revision: https://reviews.llvm.org/D43959
llvm-svn: 327053
Summary:
This change fixes PR36483. The bug was originally introduced by a change
that marked non-prevailing symbols dead. This broke LowerTypeTests
handling of available_externally functions, which are non-prevailing.
LowerTypeTests uses liveness information to avoid emitting thunks for
unused functions.
Marking available_externally functions dead is incorrect, the functions
are used though the function definitions are not. This change keeps them
live, and lets the EliminateAvailableExternally/GlobalDCE passes remove
them later instead.
I've also enabled EliminateAvailableExternally for all optimization
levels, I believe it being disabled for O1 was an oversight.
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: grimar, mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D43690
llvm-svn: 327041
This fixes a false positive ODR violation that is reported by ASan when using LTO. In cases, where two constant globals have the same value, LTO will merge them, which breaks ASan's ODR detection.
Differential Revision: https://reviews.llvm.org/D43959
llvm-svn: 327029
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.
Author: FarhanaAleen
Reviewed By: rampitec
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44179
llvm-svn: 326910
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Backed out for failing an assert in clang bootstrap builds. Re-landing
with a fix for handling non-power-of-two inputs (e.g. udiv i24).
Original Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 326908
Breaks bootstrap builds: clang built with this patch asserts while
building MCDwarf.cpp: Assertion `castIsValid(op, S, Ty) && "Invalid
cast!"' failed.
llvm-svn: 326900
Summary:
If the operands of a udiv/urem can be proved to fit within a smaller
power-of-two-sized type, reduce the width of the udiv/urem.
Reviewers: spatel, sanjoy
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44102
llvm-svn: 326898
The LoadStoreVectorizer thought that <1 x T> and T were the same types
when merging stores, leading to a crash later.
Patch by Erik Hogeman.
Differential Revision: https://reviews.llvm.org/D44014
llvm-svn: 326884
It's been quite some time the Dependence Analysis (DA) is broken,
as it uses the GEP representation to "identify" multi-dimensional arrays.
It even wrongly detects multi-dimensional arrays in single nested loops:
from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6
;; for (long int i = 0; i < 50; i++) {
;; A[i][3*i - 6] = i;
;; *B++ = A[i][i];
DA used to detect two subscripts, which makes no sense in the LLVM IR
or in C/C++ semantics, as there are no guarantees as in Fortran of
subscripts not overlapping into a next array dimension:
maximum nesting levels = 1
SrcPtrSCEV = %A
DstPtrSCEV = %A
using GEPs
subscript 0
src = {0,+,1}<nuw><nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
subscript 1
src = {-6,+,3}<nsw><%for.body>
dst = {0,+,1}<nuw><nsw><%for.body>
class = 1
loops = {1}
Separable = {}
Coupled = {1}
With the current patch, DA will correctly work on only one dimension:
maximum nesting levels = 1
SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body>
DstSCEV = {%A,+,404}<%for.body>
subscript 0
src = {(-2424 + %A)<nsw>,+,1212}<%for.body>
dst = {%A,+,404}<%for.body>
class = 1
loops = {1}
Separable = {0}
Coupled = {}
This change removes all uses of GEP from DA, and we now only rely
on the SCEV representation.
The patch does not turn on -da-delinearize by default, and so the DA analysis
will be more conservative in the case of multi-dimensional memory accesses in
nested loops.
I disabled some interchange tests, as the DA is not able to disambiguate
the dependence anymore. To make DA stronger, we may need to
compute a bound on the number of iterations based on the access functions
and array dimensions.
The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to
avoid checking for snippets of LLVM IR: this form of checking is very hard to
maintain. Instead, we now check for output of the pass that are more meaningful
than dozens of lines of LLVM IR. Some tests now require -debug messages and thus
only enabled with asserts.
Patch written by Sebastian Pop and Aditya Kumar.
Differential Revision: https://reviews.llvm.org/D35430
llvm-svn: 326837
Most of the folds based on SelectPatternResult belong in InstSimplify rather than
InstCombine, so the helper code should be available to other passes/analysis.
llvm-svn: 326812
Change doCallSiteSplitting to iterate until we reach the terminator instruction.
tryToSplitCallSite can replace BB's terminator in case BB is a successor of
itself. Then IE will be invalidated and we also have to check the current
terminator.
Reviewers: junbuml, davidxl, davide, fhahn
Reviewed By: fhahn, junbuml
Differential Revision: https://reviews.llvm.org/D43824
llvm-svn: 326793
In case PredBB == BB and StopAt == BB's terminator, StopAt != &*BI will
fail, because BB's terminator instruction gets replaced.
By using BB.getTerminator() we get the current terminator which we can use
to compare.
Reviewers: sanjoy, anna, reames
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D43822
llvm-svn: 326779
Summary:
RewriteStatepointsForGC collects parse points for further processing.
During the collection if a callsite is found in an unreachable block
(DominatorTree::isReachableFromEntry()) then all unreachable blocks are
removed by removeUnreachableBlocks(). Some of the removed blocks could
have been reachable according to DominatorTree::isReachableFromEntry().
In this case the collected parse points became stale and resulted in a
crash when accessed.
The fix is to unconditionally canonicalize the IR to
removeUnreachableBlocks and then collect the parse points.
The added test crashes with the old version and passes with this patch.
Patch by Yevgeny Rouban!
Reviewed by: Anna
Differential Revision: https://reviews.llvm.org/D43929
llvm-svn: 326748
Summary:
Presently, InstCombiner::foldICmpWithCastAndCast() implicitly assumes that it is
only invoked with icmp instructions of integer type. If that assumption is broken,
and it is called with an icmp of vector type, then it fails (asserts/crashes).
This patch addresses the deficiency. It allows it to simplify
icmp (ptrtoint x), (ptrtoint/c) of vector type into a compare of the inputs,
much as is done when the type is integer.
Reviewers: apilipenko, fedor.sergeev, mkazantsev, anna
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44063
llvm-svn: 326730
This patch teaches getMinimumFPType to support shrinking a vector of ConstantFPs. This should improve our ability to combine vector fptrunc with fp binops.
Differential Revision: https://reviews.llvm.org/D43774
llvm-svn: 326729
getCompare returns true, false or undef constants if the comparison can
be evaluated, or nullptr if it cannot. This is in line with what
ConstantExpr::getCompare returns. It also allows us to use
ConstantExpr::getCompare for comparing constants.
Reviewers: davide, mssimpso, dberlin, anna
Reviewed By: davide
Differential Revision: https://reviews.llvm.org/D43761
llvm-svn: 326720
Summary:
We can discard initial blocks that do other work
We do not need to limit ourselves to just the first block in the chain.
Reviewers: courbet, davide
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44029
llvm-svn: 326698
Iterating through predecessors of `TailBB` while removing their
terminators leads to use after-free, because the predecessor list is
changing on each removal.
llvm-svn: 326668
Summary:
`musttail` calls can't be naively splitted. The split blocks must
include not only the call instruction itself, but also (optional)
`bitcast` and `return` instructions that follow it.
Clone `bitcast` and `ret`, place them into the split blocks, and
remove the tail block when done.
Reviewers: junbuml, mcrosier, davidxl, davide, fhahn
Reviewed By: fhahn
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D43729
llvm-svn: 326666
This caused some links to fail with ThinLTO due to missing symbols as
well as causing some binaries to have failures at runtime. We're working
with the author to get a test case, but want to get the tree green
again.
Further, it appears to introduce a data race. While the test usage of
threads was disabled in r325361 & r325362, that isn't an acceptable fix.
I've reverted both of these as well. This code needs to be thread safe.
Test cases for this are already on the original commit thread.
llvm-svn: 326638
In stage2 -O3 builds of llc, this results in small but measurable
increases in the number of variables with locations, and in the number
of unique source variables overall.
(According to llvm-dwarfdump --statistics, there are 123 additional
variables with locations, which is just a 0.006% improvement).
The size of the .debug_loc section of the llc dsym increases by 0.004%.
llvm-svn: 326629
In stage2 -O3 builds of llc, this results in a 0.3% increase in the
number of variables with locations, and a 0.2% increase in the number of
unique source variables overall.
The size of the .debug_loc section of the llc dsym increases by 0.5%.
llvm-svn: 326621
Instead of returning the smaller FP constant we now return the minimal Type the constant can fit into. We also return the Type of the input to any fp extends. The legality checks are then done on just the size of these Types. If we find something profitable we then emit FPTruncs in front of the smaller binop and assume those FPTruncs will be constant folded or combined with any ConstantFPs or fpextends.
Differential Revision: https://reviews.llvm.org/D44038
llvm-svn: 326617
The code was checking that all of the instructions in the
sequence are 'fast', but that's not necessary. The final
multiply is all that we need to check (tests adjusted).
The fmul doesn't need to be fully 'fast' either, but that
can be another patch.
llvm-svn: 326608
Currently when AllowRemainder is disabled, pragma unroll count is not
respected even though there is no remainder. This bug causes a loop
fully unrolled in many cases even though the user specifies a unroll
count. Especially it affects OpenCL/CUDA since in many cases a loop
contains convergent instructions and currently AllowRemainder is
disabled for such loops.
Differential Revision: https://reviews.llvm.org/D43826
llvm-svn: 326585
This patch adds support for detecting outer loops with irreducible control
flow in LV. Current detection uses SCCs and only works for innermost loops.
This patch adds a utility function that works on any CFG, given its RPO
traversal and its LoopInfoBase. This function is a generalization
of isIrreducibleCFG from lib/CodeGen/ShrinkWrap.cpp. The code in
lib/CodeGen/ShrinkWrap.cpp is also updated to use the new generic utility
function.
Patch by Diego Caballero <diego.caballero@intel.com>
Differential Revision: https://reviews.llvm.org/D40874
llvm-svn: 326568
This is a retry of r326502 with updates to the reassociate
test file that I missed the first time.
@test15_reassoc in the supposed -reassociate test file
(except that it tests 2 other passes too...) shows that
there's no clear responsiblity for reassociation transforms.
Instcombine now gets that case, but only because the
constant values are identical. Otherwise, it would still
miss that pattern.
Reassociate doesn't get that case because it hasn't been
updated to use less than 'fast' FMF.
llvm-svn: 326513
I forgot that I added tests for 'reassoc' to -reassociate, but
suprisingly that file calls -instcombine too, so it is affected.
I'll update that file and try again.
llvm-svn: 326510
Do not replace results of `musttail` calls with a constant if the
call itself can't be removed.
Do not zap returns of `musttail` callees, if the call site can't be
removed and replaced with a constant.
Do not zap returns of `musttail`-calling blocks, this breaks
invariant too.
Patch by Fedor Indutny
Differential Revision: https://reviews.llvm.org/D43695
llvm-svn: 326404
When the function has musttail call - its cc is fixed to be equal to the
cc of the musttail callee. In such case (and in the case of the musttail
callee), GlobalOpt should not change the cc to fastcc as it will break
the invariant.
This fixes PR36546
Patch by: Fedor Indutny (indutny)
Differential revision: https://reviews.llvm.org/D43859
llvm-svn: 326376
Currently this code's control flow very much assumes that there are no meaningful checks after determining that it's a ConstantFP. So whenever it wants to stop it just does "return V". But V is also the variable name it uses when it wants to return a new value. So 'return V' appears multiple times with different meanings.
This patch just moves all the code into a helper function and returns nullptr when it wants to stop.
I've split this from D43774 while I try to figure out how to best handle the vector case there. But this change by itself at least seemed like a readability improvement.
Differential Revision: https://reviews.llvm.org/D43833
llvm-svn: 326361
The API verification tool tapi has difficulty processing frameworks
which enable code coverage, but which have no code. The profile lowering
pass does not emit the runtime hook in this case because no counters are
lowered.
While the hook is not needed for program correctness (the profile
runtime doesn't have to be linked in), it's needed to allow tapi to
validate the exported symbol set of instrumented binaries.
It was not possible to add a workaround in tapi for empty binaries due
to an architectural issue: tapi generates its expected symbol set before
it inspects a binary. Changing that model has a higher cost than simply
forcing llvm to always emit the runtime hook.
rdar://36076904
Differential Revision: https://reviews.llvm.org/D43794
llvm-svn: 326350
Also, rename 'foldOpWithConstantIntoOperand' because that's annoyingly
vague. The constant check is redundant in some cases, but it allows
removing duplication for most of the calls.
llvm-svn: 326329
Summary:
Fix a bug in MergeICmp that can lead to a BCECmp block being processed more than once and eventually lead to a broken LLVM module.
The problem is that if the non-constant value is not produced by the last block, the producer will be processed once when the its parent block
is processed and second time when the last block is processed.
We end up having 2 same BCECmpBlock in the merge queue. And eventually lead to a broken LLVM module.
Reviewers: courbet, davide
Reviewed By: courbet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43825
llvm-svn: 326318
Removes verifyDomTree, using assert(verify()) everywhere instead, and
changes verify a little to always run IsSameAsFreshTree first in order
to print good output when we find errors. Also adds verifyAnalysis for
PostDomTrees, which will allow checking of PostDomTrees it the same way
we check DomTrees and MachineDomTrees.
Differential Revision: https://reviews.llvm.org/D41298
llvm-svn: 326315
In case we update a ValuePHI node created earlier, we could update it
based on a different OpPHI which could be in a different block.
We need to update the TempToBlock mapping reflecting the new block,
otherwise we would end up placing the new phi node in a wrong block.
This problem is exposed by the test case in
https://bugs.llvm.org/show_bug.cgi?id=36504.
This patch fixes a slightly simpler problem than in the bug report. In
the bug's re-producer, the additional problem is that we are re-using a
ValuePHI node with to few incoming values for the new OpPHI. If this
patch makes sense, I will follow it up with a patch that creates a new
PHI node if the existing PHI node has a different number of incoming
values.
Reviewers: davide, dberlin
Reviewed By: dberlin
Differential Revision: https://reviews.llvm.org/D43770
llvm-svn: 326181
Note: gcc appears to allow this fold with -freciprocal-math alone,
but clang/llvm require more than that with this patch. The wording
in the definitions seems fuzzy enough that it could go either way,
but we'll err on the conservative side of FMF interpretation.
This patch also changes the newly created fmul to have FMF propagated
by the last fdiv rather than intersecting the FMF of the fdivs. This
matches the behavior of other folds near here. The new fmul is only
used to produce an intermediate op for the final fdiv result, so it
shouldn't be any stricter than that result. The previous behavior
could result in dropping FMF via other folds in instcombine or CSE.
Differential Revision: https://reviews.llvm.org/D43398
llvm-svn: 326098
All SIMD architectures can emulate masked load/store/gather/scatter
through element-wise condition check, scalar load/store, and
insert/extract. Therefore, bailing out of vectorization as legality
failure, when they return false, is incorrect. We should proceed to cost
model and determine profitability.
This patch is to address the vectorizer's architectural limitation
described above. As such, I tried to keep the cost model and
vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning
should be done separately.
Please see
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for
RFC and the discussions.
Closes D43208.
Patch by: Hideki Saito <hideki.saito@intel.com>
llvm-svn: 326079
The dependency matrix is only empty if no conflicting load/store
instructions have been found. In that case, it is safe to interchange.
For the LLVM test-suite, after this change around 1900 loops are
interchanged, whereas it is 15 before this change. On cortex-a57,
this gives an improvement of -0.57% on the geomean execution
time of SPEC2006, SPEC2000 and the test-suite. There are a
few small perf regressions, but I think we can improve on those
by making the cost model better.
Reviewers: karthikthecool, mcrosier
Reviewed by: karthikthecool
Differential Revision: https://reviews.llvm.org/D43236
llvm-svn: 326077
Summary:
This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA.
I noticed a case where a value carried across a loop was reported as <optimized out>.
Specifically this case:
```
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
```
In the above case, after all of the interesting computation completes our value
is reported as "optimized out." This change will add a dbg.value to correct this.
This patch also moves the dbg.value insertion routine from LoopRotation.cpp
into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA).
Reviewers: mzolotukhin, aprantl, vsk, davide
Reviewed By: aprantl, vsk
Subscribers: dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 325926
The existing code was inefficiently looking for 'nsz' variants.
That's unnecessary because we canonicalize those to the expected
form with -0.0.
We may also want to adjust or remove the fold that sinks negation.
We don't do that for fdiv (or integer ops?). That should be uniform?
It may also lead to missed optimization as in PR21914:
https://bugs.llvm.org/show_bug.cgi?id=21914
...or we just have to fix other passes to avoid that problem.
llvm-svn: 325924
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.
This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.
Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
Reviewers: arsenm, rampitec, jlebar
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D40546
llvm-svn: 325881
Summary:
MemDep caches results that signify that a dependence is non-local, and
there is currently no way to invalidate such cache entries.
Unfortunately, when MLSM sinks a store that can result in a non-local
dependence becoming a local one, and then MemDep gives wrong answers.
The easiest way out here is to just say that MLSM does indeed not
preserve MemDep results.
Reviewers: davide, Gerolf
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43177
llvm-svn: 325880
When DataFlowSanitizer transforms a call to a custom function, the
new call has extra parameters. The attributes on parameters must be
updated to take the new position of each parameter into account.
Patch by Sam Kerner!
Differential Revision: https://reviews.llvm.org/D43132
llvm-svn: 325820
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: hfinkel, bollu, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D43081
llvm-svn: 325816
- Fix for bug 36078.
- Prevent the functionattrs, function-attrs, globalopt and argpromotion passes
from changing naked functions.
- These passes can perform some alterations to the functions that should not be
applied. An example is removing parameters that are seemingly not used because
they are only referenced in the inline assembly. Another example is marking
the function as fastcc.
llvm-svn: 325788
Summary:
Exposing getOffset and findFunctionSamples as members of
SampleProfile. They are intimately tied to design choices of the
sample profile format - using offsets instead of line numbers, and
traversing inlined functions stack, respectively.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43605
llvm-svn: 325747
According to the current coverage report salvageDebugInfo() is called
5.12 million times during testing and almost always returns early.
The early return depends on LocalAsMetadata::getIfExists returning null,
which involves a DenseMap lookup in an LLVMContextImpl. We can probably
speed this up by simply checking the IsUsedByMD bit in Value.
llvm-svn: 325738
This patch changes hwasan inline instrumentation:
Fixes address untagging for shadow address calculation (use 0xFF instead of 0x00 for the top byte).
Emits brk instruction instead of hlt for the kernel and user space.
Use 0x900 instead of 0x100 for brk immediate (0x100 - 0x800 are unavailable in the kernel).
Fixes and adds appropriate tests.
Patch by Andrey Konovalov.
Differential Revision: https://reviews.llvm.org/D43135
llvm-svn: 325711
This results in 15 additional unique source variables in a stage2 build
of FileCheck (at '-Os -g'), with a negligible increase in the size of
the .debug_loc section.
llvm-svn: 325660