Negative numbers are represented using DW_OP_consts along with signed representation
of the number as the argument.
Test case IR is generated using Fortran front-end.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99273
Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot.
This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D99550
Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example:
1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them.
2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining.
3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph.
4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. while this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions.
Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case 4.
I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmark such as Xalanbmk and GCC, the win is bigger, above 2%.
The change is an enhancement to https://reviews.llvm.org/D95988.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D99351
tries to check for the
absence of a sequence of instructions with several CHECK-NOT with
one of
those directives using a variable defined in another.
LLVM test CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll tries to
check for the absence of a sequence of instructions with several
CHECK-NOT with one of those directives using a variable defined in
another. However, CHECK-NOT are checked independently so that is using
a variable defined in a pattern that should not occur in the input. The
bug was then copied over in
Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector-inseltpoison.ll
This commit removes the definition and uses of variable to check each
line independently, making the check stronger than the current one.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D99597
LLVM test Transforms/LoopVectorize/X86/x86-pr39099.ll tries to check for
the absence of a sequence of instructions with several CHECK-NOT with
one of those directives using a variable defined in another. However
CHECK-NOT are checked independently so that is using a variable defined
in a pattern that should not occur in the input.
This commit only checks for the absence of a widened load which rules
out the presence of the whole sequence and does not involve an undefined
variable.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D99583
LLVM test Transforms/HardwareLoops/ARM/structure.ll tries to check for
the absence of a sequence of instructions with several CHECK-NOT with
one of those directives using a variable defined in another. However
CHECK-NOT are checked independently so that is using a variable defined
in a pattern that should not occur in the input.
This commit only checks for the absence of llvm.loop.decrement.i32 which
rules out the presence of the whole sequence and does not involve an
undefined variable.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D99591
This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 .
`select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later
transformations confused.
Simplify the branch condition to not have the form.
This patch adds support for the vectorization of induction variables when
using scalable vectors, which required the following changes:
1. Removed assert from InnerLoopVectorizer::getStepVector.
2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use
a runtime determined value for VF and removed an assert.
3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable
vectors. I did this by calculating the full vector value for each Part
of the unroll factor (UF) and caching this in the VP state. This means
that we are always able to extract an arbitrary element from the vector
if necessary. In addition to this, I also permitted the caching of the
individual lane values themselves for the known minimum number of elements
in the same way we do for fixed width vectors. This is a further
optimisation that improves the code quality since it avoids unnecessary
extractelement operations when extracting the first lane.
4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while
testing some code paths I noticed this is currently broken for scalable
vectors.
Various tests to support different cases have been added here:
Transforms/LoopVectorize/AArch64/sve-inductions.ll
Differential Revision: https://reviews.llvm.org/D98715
MCJIT served well as the default JIT engine in lli for a long time, but the code is getting old and maintenance efforts don't seem to be in sight. In the meantime Orc became mature enough to fill that gap. The newly added greddy mode is very similar to the execution model of MCJIT. It should work as a drop-in replacement for common JIT tasks.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D98931
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Differential Revision: https://reviews.llvm.org/D94355
during profile update.
When we inline a function and update the profile, the value profiles of the
indirect call in the inliner and inlinee will be scaled. In
https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we start
using the magic number NOMORE_ICP_MAGICNUM (-1) to mark targets which have
been promoted. The magic number shouldn't be scaled during the profile update.
Although the problem has been suppressed by https://reviews.llvm.org/D98187
for SampleFDO, which stops profile update for inlining in sampleFDO, the patch
is still wanted since it will be more consistent to handle the magic number
properly in profile update.
Differential Revision: https://reviews.llvm.org/D99394
Re-apply 25fbe803d4, with a small update to emit the right remark
class.
Original message:
[LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.
A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime to consider their cost more accurately.
Reviewed By: lebedev.ri
This is a 2nd try of:
3c8473ba53
which was reverted at:
a26312f9d4
because of crashing.
This version includes extra code and tests to avoid the known
crashing examples as discussed in PR49730.
Original commit message:
As noted in D98152, we need to patch SLP to avoid regressions when
we start canonicalizing to integer min/max intrinsics.
Most of the real work to make this possible was in:
7202f47508
Differential Revision: https://reviews.llvm.org/D98981
This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.
A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime to consider their cost more accurately.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D98634
Using $ breaks demangling of the symbols. For example,
$ c++filt _Z3foov\$123
_Z3foov$123
This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.
Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:
$ c++filt _Z3foov.123
foo() [clone .123]
This is already done in some places; try to do it everywhere.
Differential revision: https://reviews.llvm.org/D97484
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
This is a small patch to make FoldBranchToCommonDest poison-safe by default.
After fc3f0c9c, only two syntactic changes are needed to fix unit tests.
This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99452
This was originally just an XFAIL test, but I modified it
to check output. To make that bot-friendly, I'm moving it
to the x86 dir since it specified an x86 target.
This reverts commit 3c8473ba53 and includes test diffs to
maintain testing status.
There's at least 1 place that was not updated with 7202f47508 ,
so we can crash mismatching select and intrinsics as shown in
PR49730.
The tests, test/Transforms/InstCombine/AArch64/sve-*,
have been shown to not be AArch64 specific. These tests
have been renamed and moved to reflect this.
Differential Revision: https://reviews.llvm.org/D99253
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.
unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(...
After some investigation it seems that there are only these cases that occur
in practice:
1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast with scalar uses, or b) this is an update to an induction variable
that remains scalar.
I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. For all other cases I have added an assert that none of the
users needs scalarising, which didn't fire in any unit tests.
Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.
Differential Revision: https://reviews.llvm.org/D98512
In DeadArgumentElimination pass, if a function's argument is never used, corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code.
To avoid this undefined behavior when change caller's parameter to undef, this patch removes noundef attribute and other attributes imply noundef on param/arg.
Differential Revision: https://reviews.llvm.org/D98899
The SCEV commit b46c085d2b [NFCI] SCEVExpander:
emit intrinsics for integral {u,s}{min,max} SCEV expressions
seems to reveal a new crash in SLPVectorizer.
SLP crashes expecting a SelectInst as an externally used value
but umin() call is found.
The patch relaxes the assumption to make the IR flag propagation safe.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D99328
D95598 added a cost model for broadcast shuffle, which should enable loops
such as the following to vectorize, where the load of b[42] is invariant
and can be done using a scalar load + splat:
for (int i=0; i<n; ++i)
a[i] = b[i] + b[42];
This patch adds tests to verify that we can vectorize such loops.
Reviewed By: joechrisellis
Differential Revision: https://reviews.llvm.org/D98506