If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D84630
addGlobalValueSummary can check newly added FunctionSummary
and set HasParamAccess to mark that generateParamAccessSummary
is needed.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D85182
In GlobalISel, if you have a load into a small type with a range, you'll hit
an assert if you try to compute known bits on it starting at a larger type.
e.g.
```
%x:_(s8) = G_LOAD %whatever(p0) :: (load 1 ... !range !n)
...
%y:_(s32) = G_SOMETHING %x
```
When we walk through G_SOMETHING and hit the load, the width of our known bits
is 32. However, the width of the range is going to be 8. This will cause us
to hit an assert.
To fix this, make computeKnownBitsFromRangeMetadata zero extend or truncate
the range type to match the bitwidth of the known bits we're calculating.
Add a testcase in CodeGen/GlobalISel/KnownBitsTest.cpp to reflect that this
works now.
https://reviews.llvm.org/D85375
We don't want mandatory events in the training log. We do want to handle
them, to keep the native size accounting accurate, but that's all.
Fixed the code, also expanded the test to capture this.
Differential Revision: https://reviews.llvm.org/D85373
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
https://rise4fun.com/Alive/pZEr
Name: mul nuw with icmp eq
Pre: (C2 %u C1) != 0
%a = mul nuw i8 %x, C1
%r = icmp eq i8 %a, C2
=>
%r = false
Name: mul nuw with icmp ne
Pre: (C2 %u C1) != 0
%a = mul nuw i8 %x, C1
%r = icmp ne i8 %a, C2
=>
%r = true
There are potentially several other transforms we need to add based on:
D51625
...but it doesn't look like there was follow-up to that patch.
This reverts commit e9761688e4. It breaks the build:
```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
return ReductionOperations;
```
These were implementation detail, but become necessary for generic data
copying.
Also added const variations to them, and move assignment, since we had a
move ctor (and the move assignment helps in a subsequent patch).
Differential Revision: https://reviews.llvm.org/D85262
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiplying them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.
In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
Differential Revision: https://reviews.llvm.org/D75069
This is one more NFC part extracted from D79485. Normal and SCC based loops have very different representation and have to be handled separatly each time we deal with loops. D79485 is going to introduce much more extensive use of loops what will be problematic with out this change.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84838
Added a mechanism to check the element type, get the total element
count, and the size of an element.
Differential Revision: https://reviews.llvm.org/D85250
This revision adds the following peephole optimization
and it's negation:
%a = urem i64 %x, %y
%b = icmp ule i64 %a, %x
====>
%b = true
With John Regehr's help this optimization was checked with Alive2
which suggests it should be valid.
This pattern occurs in the bound checks of Rust code, the program
const N: usize = 3;
const T = u8;
pub fn split_mutiple(slice: &[T]) -> (&[T], &[T]) {
let len = slice.len() / N;
slice.split_at(len * N)
}
the method call slice.split_at will check that len * N is within
the bounds of slice, this bounds check is after some transformations
turned into the urem seen above and then LLVM fails to optimize it
any further. Adding this optimization would cause this bounds check
to be fully optimized away.
ref: https://github.com/rust-lang/rust/issues/74938
Differential Revision: https://reviews.llvm.org/D85092
This is based on the existing code for the non-intrinsic idioms
in InstCombine.
The vector constant constraint is non-obvious: undefs should be
ok in the outer call, but they can't propagate safely from the
inner call in all cases. Example:
https://alive2.llvm.org/ce/z/-2bVbM
define <2 x i8> @src(<2 x i8> %x) {
%0:
%m = umin <2 x i8> %x, { 7, undef }
%m2 = umin <2 x i8> { 9, 9 }, %m
ret <2 x i8> %m2
}
=>
define <2 x i8> @tgt(<2 x i8> %x) {
%0:
%m = umin <2 x i8> %x, { 7, undef }
ret <2 x i8> %m
}
Transformation doesn't verify!
ERROR: Value mismatch
Example:
<2 x i8> %x = < undef, undef >
Source:
<2 x i8> %m = < #x00 (0) [based on undef value], #x00 (0) >
<2 x i8> %m2 = < #x00 (0), #x00 (0) >
Target:
<2 x i8> %m = < #x07 (7), #x10 (16) >
Source value: < #x00 (0), #x00 (0) >
Target value: < #x07 (7), #x10 (16) >
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.
Differential Revision: https://reviews.llvm.org/D82998
Merging alias results from different paths, when a path did phi
translation is not necesarily correct. Conservatively terminate such paths.
Aimed to fix PR46156.
Differential Revision: https://reviews.llvm.org/D84905
A JSON->TensorSpec utility we will use subsequently to specify
additional outputs needed for certain training scenarios.
Differential Revision: https://reviews.llvm.org/D84976
In some cases, it seems like we can get rid of unnecessary s/umins by
using information from the loop guards (unless I am missing something).
One place where this seems to be helpful in practice is when computing
loop trip counts. This patch just changes howManyGreaterThans for now.
Note that this requires a loop for which we can check 'is guarded'.
On SPEC2000/SPEC2006/MultiSource, there are some notable changes for
some programs in the number of loops unrolled and trip counts computed.
```
Same hash: 179 (filtered out)
Remaining: 58
Metric: scalar-evolution.NumTripCountsComputed
Program base patch diff
test-suite...langs-C/compiler/compiler.test 25.00 31.00 24.0%
test-suite.../Applications/SPASS/SPASS.test 2020.00 2323.00 15.0%
test-suite...langs-C/allroots/allroots.test 29.00 32.00 10.3%
test-suite.../Prolangs-C/loader/loader.test 17.00 18.00 5.9%
test-suite...fice-ispell/office-ispell.test 253.00 265.00 4.7%
test-suite...006/450.soplex/450.soplex.test 3552.00 3692.00 3.9%
test-suite...chmarks/MallocBench/gs/gs.test 453.00 470.00 3.8%
test-suite...ngs-C/assembler/assembler.test 29.00 30.00 3.4%
test-suite.../Benchmarks/Ptrdist/bc/bc.test 263.00 270.00 2.7%
test-suite...rks/FreeBench/pifft/pifft.test 722.00 741.00 2.6%
test-suite...count/automotive-bitcount.test 41.00 42.00 2.4%
test-suite...0/253.perlbmk/253.perlbmk.test 1417.00 1451.00 2.4%
test-suite...000/197.parser/197.parser.test 387.00 396.00 2.3%
test-suite...lications/sqlite3/sqlite3.test 1168.00 1189.00 1.8%
test-suite...000/255.vortex/255.vortex.test 173.00 176.00 1.7%
Metric: loop-unroll.NumUnrolled
Program base patch diff
test-suite...langs-C/compiler/compiler.test 1.00 3.00 200.0%
test-suite.../Applications/SPASS/SPASS.test 134.00 234.00 74.6%
test-suite...count/automotive-bitcount.test 3.00 4.00 33.3%
test-suite.../Prolangs-C/loader/loader.test 3.00 4.00 33.3%
test-suite...langs-C/allroots/allroots.test 3.00 4.00 33.3%
test-suite...Source/Benchmarks/sim/sim.test 10.00 12.00 20.0%
test-suite...fice-ispell/office-ispell.test 21.00 25.00 19.0%
test-suite.../Benchmarks/Ptrdist/bc/bc.test 32.00 38.00 18.8%
test-suite...006/450.soplex/450.soplex.test 300.00 352.00 17.3%
test-suite...rks/FreeBench/pifft/pifft.test 60.00 69.00 15.0%
test-suite...chmarks/MallocBench/gs/gs.test 57.00 63.00 10.5%
test-suite...ngs-C/assembler/assembler.test 10.00 11.00 10.0%
test-suite...0/253.perlbmk/253.perlbmk.test 145.00 157.00 8.3%
test-suite...000/197.parser/197.parser.test 43.00 46.00 7.0%
test-suite...TimberWolfMC/timberwolfmc.test 205.00 214.00 4.4%
Geomean difference 7.6%
```
Fixes https://bugs.llvm.org/show_bug.cgi?id=46939
Fixes https://bugs.llvm.org/show_bug.cgi?id=46924 on X86.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D85046
It's always safe to pick the earlier abs regardless of the nsw flag. We'll just lose it if it is on the outer abs but not the inner abs.
Differential Revision: https://reviews.llvm.org/D85053
abs() should be rare enough that using value tracking is not going
to be a compile-time cost burden, so use it to reduce a variety of
potential patterns. We do this in DAGCombiner too.
Differential Revision: https://reviews.llvm.org/D85043
Add the optimizations we have in the SelectionDAG version.
Known non-negative copies all known bits. Any known one other than
the sign bit makes result non-negative.
Differential Revision: https://reviews.llvm.org/D85000
If absolute value needs turn a negative number into a positive number it reduces the number of sign bits by at most 1.
Differential Revision: https://reviews.llvm.org/D84971
findAllocaForValue uses AllocaForValue to cache resolved values.
The function is used only to resolve arguments of lifetime
intrinsic which usually are not fare for allocas. So result reuse
is likely unnoticeable.
In followup patches I'd like to replace the function with
GetUnderlyingObjects.
Depends on D84616.
Differential Revision: https://reviews.llvm.org/D84617
This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do.
Differential Revision: https://reviews.llvm.org/D84963
Currently we skip alias sets with only reads or a single write and no
reads, but still add the pointers to the list of pointers in RtCheck.
This can lead to cases where we try to access a pointer that does not
exist when grouping checks. In most cases, the way we access
PositionMap masked that, as the value would default to index 0.
But in the example in PR46854 it causes a crash.
This patch updates the logic to avoid adding pointers for alias sets
that do not need any checks. It makes things slightly more verbose, by
first checking the numbers of reads/writes and bailing out early if we don't
need checks for the alias set.
I think this makes the logic a bit simpler to follow.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D84608
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder)
This patch move debug logging for pass as a PassInstrument callback. We could be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.
The test fixes looks messy. It includes changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
Further abstracting the specification of a tensor, to more easily
support different types and shapes of tensor, and also to perform
initialization up-front, at TFModelEvaluator construction time.
Differential Revision: https://reviews.llvm.org/D84685
This adds a common API for compute constant ranges of intrinsics.
The intention here is that
a) we can reuse the same code across different passes that handle
constant ranges, i.e. this can be reused in SCCP
b) we only have to add knowledge about supported intrinsics to
ConstantRange, not any consumers.
Differential Revision: https://reviews.llvm.org/D84587
This matches the behavior of simplify calls for regular opcodes -
rely on ConstantFolding before spending time on folds with variables.
I am not aware of any diffs from this re-ordering currently, but there was
potential for unintended behavior from the min/max intrinsics because that
code is implicitly assuming that only 1 of the input operands is constant.
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types. Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization. Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.
For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.
To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.
Original patch by Pierre van Houtryve
Differential Revision: https://reviews.llvm.org/D79162
This is the main icmp simplification shortcoming seen in D84655.
Alive2 agrees that the basic examples are correct at least:
define <2 x i1> @src(<2 x i8> %x) {
%0:
%r = icmp sle <2 x i8> { undef, 128 }, %x
ret <2 x i1> %r
}
=>
define <2 x i1> @tgt(<2 x i8> %x) {
%0:
ret <2 x i1> { 1, 1 }
}
Transformation seems to be correct!
define <2 x i1> @src(<2 x i32> %X) {
%0:
%A = or <2 x i32> %X, { 63, 63 }
%B = icmp ult <2 x i32> %A, { undef, 50 }
ret <2 x i1> %B
}
=>
define <2 x i1> @tgt(<2 x i32> %X) {
%0:
ret <2 x i1> { 0, 0 }
}
Transformation seems to be correct!
https://alive2.llvm.org/ce/z/omt2eehttps://alive2.llvm.org/ce/z/GW4nP_
Differential Revision: https://reviews.llvm.org/D84762
There is a silly mistake where release() is used instead of reset() for free resources of unique pointer.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D84747
In order to facilitate review of D79485 here is a small NFC change which restructures code around handling of SCCs in BPI.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84514
Summary:
Use getChildren() method in GraphDiff instead of GraphTraits.
This simplifies the code and allows for refactorigns inside GraphDiff.
All usecase need not have a light-weight/copyable range.
Clean GraphTraits implementation.
Reviewers: dblaikie
Subscribers: hiraditya, llvm-commits, george.burgess.iv
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84562
Summary:
Try not to resize vector of call records in a call graph node when
replacing call edge. That would prevent invalidation of iterators
stored in the CG SCC pass manager's scc_iterator.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84295
This is a simple patch that adds constant folding for freeze
instruction.
IIUC, it isn't needed to update ConstantFold.cpp because there is no freeze
constexpr.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84597
This is a simple patch that makes canCreateUndefOrPoison use
Instruction::isBinaryOp because BinaryOperator inherits Instruction.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84596
This is the first of two patches to address PR46753. We basically allow
mem2reg to promote allocas that are used in doppable instructions, for
now that means `llvm.assume`. The uses of the alloca (or a bitcast or
zero offset GEP from there) are replaced by `undef` in the droppable
instructions.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D83976
Summary: To match NewPM name. Also the new name is clearer and more consistent.
Subscribers: jvesely, nhaehnle, hiraditya, asbirlea, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D84542
Make sure we do not call
constainsConstantExpression/containsUndefElement on ConstantExpression,
which is not supported.
In particular, containsUndefElement/constainsConstantExpression are only
supported on constants which are supported by getAggregateElement.
Unfortunately there's no convenient way to check if a constant supports
getAggregateElement, so just check for non-constantexpressions with
vector type. Other users of those functions do so too.
Reviewers: spatel, nikic, craig.topper, lebedev.ri, jdoerfert, aqjune
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84512
PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list.
This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.
(This reverts commit a5e0194709, and
corrects author).
Rename the pass to be able to extend it to function properties other than inliner features.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D82044
Rename the pass to be able to extend it to function properties other than inliner features.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D82044
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows to move about 3000 lines out from InstCombine to the targets.
Differential Revision: https://reviews.llvm.org/D81728
This assert was added to verify assumption that GEP's SCEV will be of pointer type,
basing on fact that it should be a SCEVAddExpr with (at least) last operand being
pointer. Two notes:
- GEP's SCEV does not have to be a SCEVAddExpr after all simplifications;
- In current state, GEP's SCEV does not have to have at least one pointer operands
(all of them can become int during the transforms).
However, we might want to be at a point where it is true. We are currently removing
this assert and will try to enumerate the cases where "is pointer" notion might be
lost during the transforms. When all of them are fixed, we can return it.
Differential Revision: https://reviews.llvm.org/D84294
Reviewed By: lebedev.ri
.. in isGuaranteedNotToBeUndefOrPoison.
This caused early exit of isGuaranteedNotToBeUndefOrPoison, making it return
imprecise result.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84251
Outside of compiler-rt (where it's arguably an anti-pattern too),
LLVM tries to keep its build files as simple as possible. See e.g.
llvm/docs/SupportLibrary.rst, "Code Organization".
Differential Revision: https://reviews.llvm.org/D84243
We can sometimes get into the situation where the operand to a vctp
intrinsic becomes constant, such as after a loop is fully unrolled. This
adds the constant folding needed for them, allowing them to simplify
away and hopefully simplifying remaining instructions.
Differential Revision: https://reviews.llvm.org/D84110
Summary:
This is the InlineAdvisor used in 'development' mode. It enables two
scenarios:
- loading models via a command-line parameter, thus allowing for rapid
training iteration, where models can be used for the next exploration
phase without requiring recompiling the compiler. This trades off some
compilation speed for the added flexibility.
- collecting training logs, in the form of tensorflow.SequenceExample
protobufs. We generate these as textual protobufs, which simplifies
generation and testing. The protobufs may then be readily consumed by a
tensorflow-based training algorithm.
To speed up training, training logs may also be collected from the
'default' training policy. In that case, this InlineAdvisor does not
use a model.
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html
Reviewers: jdoerfert, davidxl
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83733
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.
This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.
My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.
This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.
Background:
There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.
It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.
This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.
Additional context is in D79744.
The getAllOnesValue can only handle things that are bitcast from a
ConstantInt, while here we bitcast through a pointer, so we may see more
complex objects (like Array or Struct).
Differential Revision: https://reviews.llvm.org/D83870
This patch
- adds `canCreateUndefOrPoison`
- refactors `canCreatePoison` so it can deal with constantexprs
`canCreateUndefOrPoison` will be used at D83926.
Reviewed By: nikic, jdoerfert
Differential Revision: https://reviews.llvm.org/D84007
Summary:
This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose.
Subscribers: mgorny, aprantl, hiraditya, llvm-commits
Tags: #llvm
Resubmit for https://reviews.llvm.org/D84086
This is a step towards trying to remove unnecessary FP compares
with infinity when compiling with -ffinite-math-only or similar.
I'm intentionally not checking FMF on the fcmp itself because
I'm assuming that will go away eventually.
The analysis part of this was added with rGcd481136 for use with
isKnownNeverNaN. Similarly, that could be an enhancement here to
get predicates like 'one' and 'ueq'.
Differential Revision: https://reviews.llvm.org/D84035
Summary:
This change added a new inline advisor that takes optimization remarks for previous inlining as input, and provide the decision as advice so current inlining can replay inline decision of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites. The change can be useful for Inliner tuning.
A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inliner advisor with SampleProfileLoader's inline decision for replay. The new inline advisor can also be used by regular CGSCC inliner later if needed.
Reviewers: davidxl, mtrofin, wmi, hoy
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83743
Many tests use opt's -analyze feature, which does not translate well to
NPM and has better alternatives. The alternative here is to explicitly
add a pass that calls ScalarEvolution::print().
The legacy pass manager RUNs aren't changing, but they are now pinned to
the legacy pass manager. For each legacy pass manager RUN, I added a
corresponding NPM RUN using the 'print<scalar-evolution>' pass. For
compatibility with update_analyze_test_checks.py and existing test
CHECKs, 'print<scalar-evolution>' now prints what -analyze prints per
function.
This was generated by the following Python script and failures were
manually fixed up:
import sys
for i in sys.argv:
with open(i, 'r') as f:
s = f.read()
with open(i, 'w') as f:
for l in s.splitlines():
if "RUN:" in l and ' -analyze ' in l and '\\' not in l:
f.write(l.replace(' -analyze ', ' -analyze -enable-new-pm=0 '))
f.write('\n')
f.write(l.replace(' -analyze ', ' -disable-output ').replace(' -scalar-evolution ', ' "-passes=print<scalar-evolution>" ').replace(" | ", " 2>&1 | "))
f.write('\n')
else:
f.write(l)
There are a couple failures still in ScalarEvolution under NPM, but
those are due to other unrelated naming conflicts.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D83798
When the byref attribute is added, there will need to be two similar
functions for the existing cases which have an associate value copy,
and byref which does not. Most, but not all of the existing uses will
use the existing version.
The associated size function added by D82679 also needs to
contextually differ, and will help eliminate a few places still
relying on pointee element types.
The IR doesn't have a proper concept of invalid pointers, and "null"
constants are just all zeros (though it really needs one).
I think it's not possible to break this for AMDGPU due to the copy
semantics of byval. If you have an original stack object at 0, the
byval copy will be placed above it so I don't think it's really
possible to hit a 0 address.
As shown in D82998, the basic-aa-recphi option can cause miscompiles for
gep's with negative constants. The option checks for recursive phi, that
recurse through a contant gep. If it finds one, it performs aliasing
calculations using the other phi operands with an unknown size, to
specify that an unknown number of elements after the initial value are
potentially accessed. This works fine expect where the constant is
negative, as the size is still considered to be positive. So this patch
expands the check to make sure that the constant is also positive.
Differential Revision: https://reviews.llvm.org/D83576
This reverts most of the following patches due to reports of miscompiles.
I've left the added test cases with comments updated to be FIXMEs.
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Summary:
This change avoids exposing tensorflow types when including TFUtils.h.
They are just an implementation detail, and don't need to be used
directly when implementing an analysis requiring ML model evaluation.
The TFUtils APIs, while generically typed, are still not exposed unless
the tensorflow C library is present, as they currently have no use
otherwise.
Reviewers: mehdi_amini, davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83843
Since D82572, we keep "reference" edges for callback call sites. While
not strictly necessary they can improve the traversal order. However, we
did not update them properly in case a pass removed the callback call
site which caused a verification error (PR46687). With this patch we
update these reference edges properly during the invocation of
`CallGraphSCCPass::RefreshCallGraph` in non-checking mode.
Reviewed By: sdmitriev
Differential Revision: https://reviews.llvm.org/D83718
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.
A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, adding breaks CGSCC
pass restrictions and results to a broken CallGraph.
Reviewers: jdoerfert
Subscribers: hiraditya, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83370
Summary:
Add debug counter and stats counter to assume queries and assume builder
here is the collected stats on a build of check-llvm + check-clang.
"assume-builder.NumAssumeBuilt": 2720879,
"assume-builder.NumAssumesMerged": 761396,
"assume-builder.NumAssumesRemoved": 1576212,
"assume-builder.NumBundlesInAssumes": 6518809,
"assume-queries.NumAssumeQueries": 85566380,
"assume-queries.NumUsefullAssumeQueries": 2727360,
the NumUsefullAssumeQueries stat is actually pessimistic because in a few places queries
ask to keep providing information to try to get better information. and this isn't counted
as a usefull query evem tho it can be usefull
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83506
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.
Differential Revision: https://reviews.llvm.org/D83709
Here we teach the ConstantFolding analysis pass that it is not legal to
replace a load of a bitcast constant (having a non-integral addrspace)
with a bitcast of the value of that constant (with a different
non-integral addrspace).
But also teach it that certain bit patterns are always known and
convertable (a fact it already uses elsewhere). This required us to also
fix a globalopt test, since, after this change, LLVM is able to realize
that the test actually is a valid transform (NULL is always a known
bit-pattern) and so it doesn't need to emit the failure remarks for it.
Also simplify some of the negative tests for transforms by avoiding a
type change in their bitcast, and add positive versions of the same
tests, to show that they otherwise should work.
Differential Revision: https://reviews.llvm.org/D59730
This reverts commit 9908a3b9f5.
The fix was to exclude the content of TFUtils.h (automatically
included in the LLVM_Analysis module, when LLVM_ENABLE_MODULES is enabled).
Differential Revision: https://reviews.llvm.org/D82817
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This is in preparation for the 'development' mode advisor. We currently
want to track what the default policy's decision would have been, this
refactoring makes it easier to do that.
Also compacted the checkpoints (variables) to one file (plus the index).
This reduces the binary model files to just the variables and their
index. The index is very small. The variables are serialized float
arrays. When updated through training, the changes are very likely
unlocalized, so there's very little value in them being anything else
than binary.
Summary:
This is an experimental ML-based native size estimator, necessary for
computing partial rewards during -Oz inliner policy training. Data
extraction for model training will be provided in a separate patch.
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html
Reviewers: davidxl, jdoerfert
Subscribers: mgorny, hiraditya, mgrang, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82817
Summary:
eraseBlock is trying to erase all probability info for the given BB.
This info is stored in a DenseMap organized like so:
using Edge = std::pair<const BasicBlock *, unsigned>;
DenseMap<Edge, BranchProbability> Probs;
where the unsigned in the Edge key is the successor id.
It was walking through every single map entry, checking if the BB in the
key's pair matched the given BB. Much more efficient is to do what
another method (getEdgeProbability) was already doing, which is to walk
the successors of the BB, and simply do a map lookup on the key formed
from each <BB, successor id> pair.
Doing this dropped the overall compile time for a file containing a
very large function by around 32%.
Reviewers: davidxl, xur
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83596
Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.
Author: sidbav (Sidharth Baveja)
Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)
Reviewed By: Meinersbur (Michael Kruse)
Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80580
This silences the warning below:
llvm-project/llvm/lib/Analysis/DomTreeUpdater.cpp:510:20: warning: loop variable 'BB' is always a copy because the range of type 'const SmallPtrSet<llvm::BasicBlock *, 8>' does not return a reference [-Wrange-loop-analysis]
for (const auto &BB : DeletedBBs) {
^
llvm-project/llvm/lib/Analysis/DomTreeUpdater.cpp:510:8: note: use non-reference type 'llvm::BasicBlock *'
for (const auto &BB : DeletedBBs) {
^~~~~~~~~~~~~~~~
1 warning generated.
Summary: This patch moves OrderedInstructions to CodeMoverUtils as It was
the only place where OrderedInstructions is required.
Authored By: RithikSharma
Reviewer: Whitney, bmahjour, etiotto, fhahn, nikic
Reviewed By: Whitney, nikic
Subscribers: mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80643
Change file static function getEntryForPercentile to be a static member function
in ProfileSummaryBuilder so it can be used by other files.
Differential Revision: https://reviews.llvm.org/D83439
Summary:
Ignore callback uses when adding a callback function
in the CallGraph. Callback functions are typically
created when outlining, e.g. for OpenMP, so they have
internal scope and linkage. They should not be added
to the ExternalCallingNode since they are only callable
by the specified caller function at creation time.
A CGSCC pass, such as OpenMPOpt, may need to update
the CallGraph by adding a new outlined callback function.
Without ignoring callback uses, adding breaks CGSCC
pass restrictions and results to a broken CallGraph.
Reviewers: jdoerfert
Subscribers: hiraditya, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83370
Follow up from the transform being removed in D83360. If X is probably not poison, then the transform is safe.
Still plan to remove or adjust the code from ConstantFolding after this.
Differential Revision: https://reviews.llvm.org/D83440
We can't fold to the non-undef value unless we know it isn't poison. So check each element with isGuaranteedNotToBeUndefOrPoison. This currently rules out all constant expressions.
Differential Revision: https://reviews.llvm.org/D83442
This option was added a while back, to help improve AA around pointer
phi loops. It looks for phi(gep(phi, const), x) loops, checking if x can
then prove more precise aliasing info.
Differential Revision: https://reviews.llvm.org/D82998
These represent the same thing but 64BIT only showed up from
getHostCPUFeatures providing a list of featuers to clang. While
EM64T showed up from getting the features for a named CPU.
EM64T didn't have a string specifically so it would not be passed
up to clang when getting features for a named CPU. While 64bit
needed a name since that's how it is index.
Merge them by filtering 64bit out before sending features to clang
for named CPUs.
If a loop is in a function marked OptSize, Loop Access Analysis should refrain
from generating runtime checks for unit strides that will version the loop.
If a loop is in a function marked OptSize and its vectorization is enabled, it
should be vectorized w/o any versioning.
Fixes PR46228.
Differential Revision: https://reviews.llvm.org/D81345