Summary:
We have two copies of createPrivateGlobalForString (in asan and in esan).
This change merges them into one. NFC
Reviewers: vitalybuka
Reviewed By: vitalybuka
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53178
llvm-svn: 344314
This patch ports the legacy pass manager to the new one to take advantage of
the benefits of the new PM. This involved moving a lot of the declarations for
`AddressSantizer` to a header so that it can be publicly used via
PassRegistry.def which I believe contains all the passes managed by the new PM.
This patch essentially decouples the instrumentation from the legacy PM such
hat it can be used by both legacy and new PM infrastructure.
Differential Revision: https://reviews.llvm.org/D52739
llvm-svn: 344274
InstCombine keeps a worklist and assumes that optimizations don't
eraseFromParent() the instruction, which SimplifyLibCalls violates. This change
adds a new callback to SimplifyLibCalls to let clients specify their own hander
for erasing actions.
Differential Revision: https://reviews.llvm.org/D52729
llvm-svn: 344251
This is the umin alternative to the umax code from rL344237. We use
DeMorgans law on the umax case to bring us to the same thing on umin,
but using countLeadingOnes, not countLeadingZeros.
Differential Revision: https://reviews.llvm.org/D53036
llvm-svn: 344239
Use the demanded bits of umax(A,C) to prove we can just use A so long as the
lowest non-zero bit of DemandMask is higher than the highest non-zero bit of C
Differential Revision: https://reviews.llvm.org/D53033
llvm-svn: 344237
We assign indices sequentially for seen instructions, so we can just use
a vector and push back the seen instructions. No need for using a
DenseMap.
Reviewers: hsaito, rengolin, nadav, dcaballe
Reviewed By: rengolin
Differential Revision: https://reviews.llvm.org/D53089
llvm-svn: 344233
We can avoid doing some unnecessary work by skipping debug instructions
in a few loops. It also helps to ensure debug instructions do not
prevent vectorization, although I do not have any concrete test cases
for that.
Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk
Reviewed By: rengolin, dcaballe
Differential Revision: https://reviews.llvm.org/D53091
llvm-svn: 344232
Summary:
Right now there is no hit counter on the line of function.
So the idea is add the line of the function to all the lines covered by the entry block.
Tests in compiler-rt/profile will be fixed in another patch: https://reviews.llvm.org/D49854
Reviewers: marco-c, davidxl
Reviewed By: marco-c
Subscribers: sylvestre.ledru, llvm-commits
Differential Revision: https://reviews.llvm.org/D49853
llvm-svn: 344228
There is a transform that may replace `lshr (x+1), 1` with `lshr x, 1` in case
if it can prove that the result will be the same. However the initial instruction
might have an `exact` flag set, and it now should be dropped unless we prove
that it may hold. Incorrectly set `exact` attribute may then produce poison.
Differential Revision: https://reviews.llvm.org/D53061
Reviewed By: sanjoy
llvm-svn: 344223
This can be used to preserve profiling information across codebase
changes that have widespread impact on mangled names, but across which
most profiling data should still be usable. For example, when switching
from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI,
or even from a 32-bit to a 64-bit build.
The user can provide a remapping file specifying parts of mangled names
that should be treated as equivalent (eg, std::__1 should be treated as
equivalent to std::__cxx11), and profile data will be treated as
applying to a particular function if its name is equivalent to the name
of a function in the profile data under the provided equivalences. See
the documentation change for a description of how this is configured.
Remapping is supported for both sample-based profiling and instruction
profiling. We do not support remapping indirect branch target
information, but all other profile data should be remapped
appropriately.
Support is only added for the new pass manager. If someone wants to also
add support for this for the old pass manager, doing so should be
straightforward.
This is the LLVM side of Clang r344199.
Reviewers: davidxl, tejohnson, dlj, erik.pilkington
Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51249
llvm-svn: 344200
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).
This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.
llvm-svn: 344186
I've added a new test case that causes the scalarizer to try and use
dead-and-erased values - caused by the basic blocks not being in
domination order within the function. To fix this, instead of iterating
through the blocks in function order, I walk them in reverse post order.
Differential Revision: https://reviews.llvm.org/D52540
llvm-svn: 344128
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D52887
llvm-svn: 344120
There are places where we need to merge multiple LocationSizes of
different sizes into one, and get a sensible result.
There are other places where we want to optimize aggressively based on
the value of a LocationSizes (e.g. how can a store of four bytes be to
an area of storage that's only two bytes large?)
This patch makes LocationSize hold an 'imprecise' bit to note whether
the LocationSize can be treated as an upper-bound and lower-bound for
the size of a location, or just an upper-bound.
This concludes the series of patches leading up to this. The most recent
of which is r344108.
Fixes PR36228.
Differential Revision: https://reviews.llvm.org/D44748
llvm-svn: 344114
This is the second in a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context. The first change being r344012.
Since I was requested to do all of this with post-commit review, this is
about as small as I can make this patch.
This patch makes LocationSize into an actual type that wraps a uint64_t;
users are required to call getValue() in order to get the size now. If
the LocationSize has an Unknown size (e.g. if LocSize ==
MemoryLocation::UnknownSize), getValue() will assert.
This also adds DenseMap specializations for LocationInfo, which required
taking two more values from the set of values LocationInfo can
represent. Hence, heavy users of multi-exabyte arrays or structs may
observe slightly lower-quality code as a result of this change.
The intent is for getValue()s to be very close to a corresponding
hasValue() (which is often spelled `!= MemoryLocation::UnknownSize`).
Sadly, small diff context appears to crop that out sometimes, and the
last change in DSE does require a bit of nonlocal reasoning about
control-flow. :/
This also removes an assert, since it's now redundant with the assert in
getValue().
llvm-svn: 344013
This is one of a series of changes intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context.
Since I was requested to do all of this with post-commit review, this is
about as small as I can make it (beyond committing changes to these few
files separately, but they're incredibly similar in spirit, so...)
On its own, this change doesn't make a great deal of sense. I plan on
having a follow-up Real Soon Now(TM) to make the bits here make more
sense. :)
In particular, the next change in this series is meant to make
LocationSize an actual type, which you have to call .getValue() on in
order to get at the uint64_t inside. Hence, this change refactors code
so that:
- we only need to call the soon-to-come getValue() once in most cases,
and
- said call to getValue() happens very closely to a piece of code that
checks if the LocationSize has a value (e.g. if it's != UnknownSize).
llvm-svn: 344012
In r339636 the alias analysis rules were changed with regards to tail calls
and byval arguments. Previously, tail calls were assumed not to alias
allocas from the current frame. This has been updated, to not assume this
for arguments with the byval attribute.
This patch aligns TailCallElim with the new rule. Tail marking can now be
more aggressive and mark more calls as tails, e.g.:
define void @test() {
%f = alloca %struct.foo
call void @bar(%struct.foo* byval %f)
ret void
}
define void @test2(%struct.foo* byval %f) {
call void @bar(%struct.foo* byval %f)
ret void
}
define void @test3(%struct.foo* byval %f) {
%agg.tmp = alloca %struct.foo
%0 = bitcast %struct.foo* %agg.tmp to i8*
%1 = bitcast %struct.foo* %f to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
call void @bar(%struct.foo* byval %agg.tmp)
ret void
}
The problematic case where a byval parameter is captured by a call is still
handled correctly, and will not be marked as a tail (see PR7272).
llvm-svn: 343986
Summary:
If we have a symbol with (linkonce|weak)_odr linkage, we do not want
to dead strip it even it is not prevailing.
IR level (linkonce|weak)_odr symbol can become non-prevailing when we mix
ELF objects and IR objects where the (linkonce|weak)_odr symbol in the ELF
object is prevailing and the ones in the IR objects are not. Stripping
them will prevent us from doing optimizations with them.
By not dead stripping them, We will convert these symbols to
available_externally linkage as a result of non-prevailing and eventually
dropping them after inlining.
I modified cache-prevailing.ll to use linkonce linkage as it is
testing whether cache prevailing bit is effective or not, not
we should treat linkonce_odr alive or not
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D52893
llvm-svn: 343970
The IRBuilder CreateIntrinsic method wouldn't allow you to specify the
types that you wanted the intrinsic to be mangled with. To fix this
I've:
- Added an ArrayRef<Type *> member to both CreateIntrinsic overloads.
- Used that array to pass into the Intrinsic::getDeclaration call.
- Added a CreateUnaryIntrinsic to replace the most common use of
CreateIntrinsic where the type was auto-deduced from operand 0.
- Added a bunch more unit tests to test Create*Intrinsic calls that
weren't being tested (including the FMF flag that wasn't checked).
This was suggested as part of the AMDGPU specific atomic optimizer
review (https://reviews.llvm.org/D51969).
Differential Revision: https://reviews.llvm.org/D52087
llvm-svn: 343962
Currently running the @insertelem_after_gep function below through the InstCombine pass with opt produces invalid IR.
Input:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t1 = bitcast <16 x i32>* %t0 to [16 x i32]*
%t2 = addrspacecast [16 x i32]* %t1 to [16 x i32] addrspace(3)*
%t3 = getelementptr inbounds [16 x i32], [16 x i32] addrspace(3)* %t2, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @extern_vec_pointers_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Output:
```
define void @insertelem_after_gep(<16 x i32>* %t0) {
%t3 = getelementptr inbounds <16 x i32>, <16 x i32>* %t0, i64 0, i64 0
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
call void @my_extern_func(<16 x i32 addrspace(3)*> %t4)
ret void
}
```
Which although causes no complaints when produced, isn't valid IR as the insertelement use of the %t3 GEP expects an address space.
```
opt: /tmp/bad.ll:52:73: error: '%t3' defined with type 'i32*' but expected 'i32 addrspace(3)*'
%t4 = insertelement <16 x i32 addrspace(3)*> undef, i32 addrspace(3)* %t3, i32 0
```
I've fixed this by adding an addrspacecast after the GEP in the InstCombine pass, and including a check for this type mismatch to the verifier.
Reviewers: spatel, lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52294
llvm-svn: 343956
At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.
The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.
This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.
Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.
Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito
llvm-svn: 343954
Call getOperandInfo() instead of using (near) duplicated code in
LoopVectorizationCostModel::getInstructionCost().
This gets the OperandValueKind and OperandValueProperties values for a Value
passed as operand to an arithmetic instruction.
getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but
is now instead a public member.
Review: Florian Hahn
https://reviews.llvm.org/D52883
llvm-svn: 343852
Summary:
At some point in the past the recursion in DominatesMergePoint used to pass null for AggressiveInsts as part of the recursion. It no longer does this. So there is no way for AggressiveInsts to be null.
This passes it by reference and removes the null check to make this explicit.
Reviewers: efriedma, reames
Reviewed By: efriedma
Subscribers: xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D52575
llvm-svn: 343828
We established the (unfortunately complicated) rules for UB/poison
propagation with vector ops in:
D48893
D48987
D49047
It's clear from the affected tests that we are potentially creating
poison where none existed before the transforms. For add/sub/mul,
the answer is simple: just drop the flags because the extra undef
vector lanes are generally more valuable for analysis and codegen.
llvm-svn: 343819
Summary:
The llvm::SimplifyCFG function creates a SimplifyCFGOpt object and calls run on it. There were numerous places reached from this run function that called back out llvm::SimplifyCFG which would create another SimplifyCFGOpt object. This is an inefficient use of stack space at minimum. We are also not passing along the LoopHeaders pointer passed into the outer llvm::SimplifyCFG call. So if its not null we lose it on the first recursion and get nullptr from there on.
This patch adds an outer loop around the main BasicBlock simplifying code and adds a flag to the SimplifyCFGOpt class that can be set by to request another iteration. I don't think we can iterate based just on the change flag alone since some of the simplifications delete a basic block entirely leaving nothing to iterate on.
Reviewers: bogner, eli.friedman, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52760
llvm-svn: 343816
This is a follow-up to rL343482 / D52439.
This was a pattern that initially caused the commit to be reverted because
the transform requires a bitcast as shown here.
llvm-svn: 343794
We're a long way from D50992 and D51553, but this is where we have to start.
We weren't back-propagating undefs into binop constant values for anything but
add/sub/mul/and/or/xor.
This is likely because we have to be careful about not introducing UB/poison
with div/rem/shift. But I suspect we already are getting the poison part wrong
for add/sub/mul (although it may not be possible to expose the bug currently
because we use SimplifyDemandedVectorElts from a limited set of opcodes).
See the discussion/implementation from D48987 and D49047.
This patch just enables functionality for FP ops because those do not have
UB/poison potential.
llvm-svn: 343727
1. Fix include ordering.
2. Improve variable name (width is bitwidth not number-of-elements).
3. Add local Opcode variable to reduce code duplication.
llvm-svn: 343694
Modified the testcases to use both pass managers
Use single commandline flag for both pass managers.
Differential Revision: https://reviews.llvm.org/D52708
Reviewers: sebpop, tejohnson, brzycki, SirishP
Reviewed By: tejohnson, brzycki
llvm-svn: 343662
This is an attempt to get out of a local-minimum that instcombine currently
gets stuck in. We essentially combine two optimisations at once, ~a - ~b = b-a
and min(~a, ~b) = ~max(a, b), only doing the transform if the result is at
least neutral. This involves using IsFreeToInvert, which has been expanded a
little to include selects that can be easily inverted.
This is trying to fix PR35875, using the ideas from Sanjay. It is a large
improvement to one of our rgb to cmy kernels.
Differential Revision: https://reviews.llvm.org/D52177
llvm-svn: 343569
getNumUses is linear in the number of uses. Since we're looking for a specific use count, we can use hasNUses which will stop as soon as it determines there are more than N uses instead of walking all of them.
llvm-svn: 343550
This reverts commit r342387 as it's showing significant performance
regressions in a number of benchmarks. Followed up with the
committer and original thread with an example and will get performance
numbers before recommitting.
llvm-svn: 343522
Summary:
This is a continuation of the fix for PR34627 "InstCombine assertion at vector gep/icmp folding". (I just realized bugpoint had fuzzed the original test for me, so I had fixed another trigger of the same assert in adjacent code in InstCombine.)
This patch avoids optimizing an icmp (to look only at the base pointers) when the resulting icmp would have a different type.
The patch adds a testcase and also cleans up and shrinks the pre-existing test for the adjacent assert trigger.
Reviewers: lebedev.ri, majnemer, spatel
Reviewed By: lebedev.ri
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52494
llvm-svn: 343486
This was originally committed at rL343407, but reverted at
rL343458 because it crashed trying to handle a case where
the destination type is FP. This version of the patch adds
a check for that possibility. Tests added at rL343480.
Original commit message:
This transform is requested for the backend in:
https://bugs.llvm.org/show_bug.cgi?id=39016
...but I figured it was worth doing in IR too, and it's probably
easier to implement here, so that's this patch.
In the simplest case, we are just truncating a scalar value. If the
extract index doesn't correspond to the LSBs of the scalar, then we
have to shift-right before the truncate. Endian-ness makes this tricky,
but hopefully the ASCII-art helps visualize the transform.
Differential Revision: https://reviews.llvm.org/D52439
llvm-svn: 343482